Designing AI Quality Systems

A Four-Part Framework for AI-Generated Commerce Video

December 2025 · Kenneth Hung

A four-part case study asking: How do we ensure AI-generated video quality for social media commerce?

Using TikTok Shop as a test case, the exploration proposes a three-layer quality framework: Floor (binary safety gate), Ceiling (numeric optimization target), Style (categorical contextual fit). The framework is designed to surface bias, build user trust, and target GMV lift.

Part 1 introduces the framework
Part 2 scales it across eight verticals and three markets
Part 3 applies it to cross-regional adaptation (one product, three markets)
Part 4 (coming soon) applies it to authenticity, making AI-generated content feel human

The case studies show what AI Behavior Design could look like as a practice. They cover four things: setting safety thresholds that adapt to different content categories, defining quality by benchmarking against the work already performing well, breaking style into specific attributes a model can recognize, and turning human judgment into signals an AI system can actually learn from.

The result is a design exploration of an emerging discipline that sits at the intersection of policy, product, data, and ML engineering.

Part 1: Floor / Ceiling / Style for AI-Generated Commerce Video

Introduces a three-layer quality framework for AI-generated commerce video: Floor as a binary safety gate, Ceiling as the numeric optimization target benchmarked against top performers, Style as categorical match across format, aesthetic, and product category.
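As a rough illustration of how the three layers could fit together, here is a minimal Python sketch. It is not taken from the case study: the names `QualityResult` and `evaluate_video`, the idea of expressing Ceiling as a ratio against a top-performer benchmark capped at 1.0, and the choice to skip Ceiling and Style when Floor fails are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityResult:
    floor_passed: bool                       # Floor: binary safety gate
    ceiling: Optional[float]                 # Ceiling: numeric score vs. benchmark
    style_match: Optional[dict[str, bool]]   # Style: categorical match per attribute


def evaluate_video(
    safety_checks: dict[str, bool],
    score: float,
    benchmark: float,
    style: dict[str, str],
    target_style: dict[str, str],
) -> QualityResult:
    """Hypothetical evaluator: gate on Floor, then score Ceiling and Style."""
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")

    # Floor is binary: a single failed safety check blocks the video.
    floor_passed = all(safety_checks.values())
    if not floor_passed:
        return QualityResult(False, None, None)

    # Ceiling is numeric: the video's score relative to the benchmark set
    # by top performers, capped at 1.0 (an assumption of this sketch).
    ceiling = min(score / benchmark, 1.0)

    # Style is categorical: each target attribute either matches or does not.
    style_match = {key: style.get(key) == value for key, value in target_style.items()}
    return QualityResult(True, ceiling, style_match)
```

In this sketch, `target_style` would carry the three attributes named above (format, aesthetic, product category), each with a categorical label. The early return on a failed Floor reflects the reading that a safety gate sits in front of any optimization.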

Covers how each layer maps to a distinct annotation method, how the framework turns human judgment into machine-learnable signals, and where bias enters when quality systems codify behavior without asking who that behavior serves.

The result is both a quality specification and a responsible AI structure.

View Part 1

Part 2: Scaling the Framework Across Verticals, Markets, and Execution

Picks up where Part 1 left off and covers how the framework would scale: eight verticals with their own Floor / Ceiling / Style configurations, three markets launching the same product under different regulatory regimes, and ten failure modes to design against at scale.

Outlines AI Behavior Design as a discipline with four responsibilities, a 90-day plan to stand up the function, and a sample rubric criterion illustrating what production rubric authoring would look like.

Part 2 is both a proposed scaling architecture for AI quality systems and a conceptual portrait of an emerging design discipline that sits at the intersection of policy, product, data, and ML engineering.

View Part 2

Part 3: Cross-Regional Adaptation. From Framework to Applied Concept

User's problem: "I want to sell these sneakers across three different regions (APAC, US/EU, and LATAM) to grow GMV. But I don't know how to create video content that fits each region's aesthetic and drives sales."

The solution: A concept design for AIGC Studio, a mobile-first AI video generation tool for TikTok Shop. A built-in AI Creative Director suggests target audiences, creative direction, and creative briefs per region, based on top-performing patterns per market. The AI quality framework from Part 2 runs invisibly across the seller's five-step flow. Includes an interactive prototype you can test live.

Grounded in human-in-the-loop design, the work spans AI Behavior Design, product and UX architecture for AI-native tools, cross-cultural content design, and design system discipline, and it documents honestly where current AIGC technology actually lands.

View Part 3

Part 4: Authenticity Through Imperfection. Making AI Content Feel Human

Explores a reusable framework designed to make AI-generated TikTok content feel like authentic human creation rather than AI production. Diagnoses five failure modes (script, footage, performance, audio, editing) where AI content reads as too perfect, and applies a three-layer prompt architecture (Floor, Ceiling, Style) for constraining AI generation toward imperfection.

Walks through a five-scene narrative template (pain point → failed solutions → discovery → proof → CTA) with full visual prompts, scripts, humanization rules, and regenerate conditions per scene. Includes a template library showing how the framework could adapt across narrative formats (problem-solution, before/after, unboxing, review, GRWM).
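One way to picture the five-scene template as data is sketched below in Python. The field names (`visual_prompt`, `script`, `humanization_rules`, `regenerate_if`) mirror the per-scene elements listed above, but the `Scene` class, the `validate_template` helper, and the validation rules themselves are hypothetical and not drawn from Part 4.

```python
from dataclasses import dataclass, field

# Beat order as described in the template: pain point through CTA.
SCENE_ORDER = ("pain point", "failed solutions", "discovery", "proof", "CTA")


@dataclass
class Scene:
    beat: str
    visual_prompt: str
    script: str
    humanization_rules: list[str] = field(default_factory=list)
    regenerate_if: list[str] = field(default_factory=list)


def validate_template(scenes: list[Scene]) -> list[str]:
    """Return a list of problems; an empty list means the template is complete."""
    problems: list[str] = []

    # The five beats must all be present, in narrative order.
    beats = tuple(scene.beat for scene in scenes)
    if beats != SCENE_ORDER:
        problems.append(f"expected beats {SCENE_ORDER}, got {beats}")

    # Every scene needs at least one humanization rule and one regenerate condition.
    for scene in scenes:
        if not scene.humanization_rules:
            problems.append(f"{scene.beat}: no humanization rules")
        if not scene.regenerate_if:
            problems.append(f"{scene.beat}: no regenerate conditions")
    return problems
```

A check like this would run before generation, so an incomplete scene is caught while it is still cheap to fix. Other narrative formats in the template library would presumably swap in their own beat order.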

This is a design study of prompt architecture for AIGC trust signals, and it makes the case that authenticity is a framework problem, not a model problem.

Coming soon
