Designing AI Quality Systems

Part 2: Scaling the Framework Across Verticals, Markets, and Execution

December 18, 2025 · Kenneth Hung · 20 min read

Part 1 introduced Floor / Style / Ceiling as a quality framework for AI-generated content. Part 2 is about what happens when you try to deploy that framework across real categories, real markets, and real teams.

Three sections walk through the operational layer:

  1. How the framework adapts across eight product verticals, from beauty to health

  2. How it scales across markets with different regulatory regimes, stress-tested by launching immunity gummies in the US, Brazil, and Indonesia

  3. What the practice of AI Behavior Design actually looks like in production

1. Category Specificity: Why One Model Can't Serve All Verticals

Scaling Across Verticals

Category-Specific Signals

The three-layer structure is universal. The signal values inside it are not. Every category has a different core challenge — and each needs its own signal definitions and training data. One generic model can't serve them all.

💄 Beauty & Personal Care
Core Challenge: Believable results
🚫 Floor: No false claims + real skin
📈 Ceiling: Before/after + texture close-up
🎨 Style: Tutorial + Clean aesthetic
🎬 AIGC Output
Hook: "Watch this in 3 seconds…"
Edit: Slow-mo texture close-up

👗 Fashion & Apparel
Core Challenge: Wearability & styling
🚫 Floor: Worn demo + size info
📈 Ceiling: Multi-angle + scene cuts
🎨 Style: Transformation + Y2K aesthetic
🎬 AIGC Output
Hook: "One piece, 3 ways to wear it…"
Edit: Beat-synced outfit changes

🍜 Food & Beverages
Core Challenge: Craveability
🚫 Floor: Real consumption + clear package
📈 Ceiling: ASMR audio + appetite shots
🎨 Style: ASMR + Cottagecore
🎬 AIGC Output
Hook: "Wait until you hear this bite…"
Edit: Amplified ASMR + captions

🌿 Health Products
Core Challenge: Trust without claims
🚫 Floor: No medical claims + cert display
📈 Ceiling: Ingredient education + use case
🎨 Style: Education + Quiet luxury
🎬 AIGC Output
Hook: "Here's what's actually inside…"
Edit: Animated ingredient breakdown

🏠 Home & Living
Core Challenge: Use-case pain points
🚫 Floor: Function demo + size reference
📈 Ceiling: Problem→Solution + before/after
🎨 Style: Before/After + Japandi
🎬 AIGC Output
Hook: "This problem is finally solved…"
Edit: Split-screen before/after

💻 Tech & Electronics
Core Challenge: Proof & comparison
🚫 Floor: Function demo + specs shown
📈 Ceiling: Competitor comparison + real tests
🎨 Style: Unboxing + Cyberpunk
🎬 AIGC Output
Hook: "Real-world comparison is in…"
Edit: Animated data chart overlay

🐾 Pet Supplies
Core Challenge: Real pet reaction
🚫 Floor: Pet using product + safety
📈 Ceiling: Cute reaction + owner interaction
🎨 Style: Daily life + Warm aesthetic
🎬 AIGC Output
Hook: "Watch their reaction…"
Edit: Slow-mo pet expression

🧸 Toys & Collectibles
Core Challenge: Reveal payoff
🚫 Floor: Unbox + product details
📈 Ceiling: Surprise reveal + creative play
🎨 Style: Unboxing + Dopamine
🎬 AIGC Output
Hook: "You won't believe what's inside…"
Edit: Reveal-moment FX

Same universal framework. Different local signals.

The structure is universal.
The signals are not.

A single model trained on all categories' data will average toward the largest category and underserve the others. Per-category Ceiling signals keep performance balanced across the vertical mix.
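One way to picture "same structure, different signals" is a per-vertical config that shares a single schema. The sketch below is illustrative only: the class name `VerticalSignals`, its field names, and the `SIGNALS` registry are assumptions, and the values are copied from the Beauty and Health cards above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerticalSignals:
    """One schema for every vertical; only the values differ."""
    core_challenge: str
    floor: tuple[str, ...]    # must-pass requirements
    ceiling: tuple[str, ...]  # patterns associated with top-performing content
    style: tuple[str, ...]    # format + aesthetic to match


# Values come from the category cards; the registry itself is hypothetical.
SIGNALS = {
    "beauty": VerticalSignals(
        core_challenge="Believable results",
        floor=("No false claims", "Real skin"),
        ceiling=("Before/after", "Texture close-up"),
        style=("Tutorial", "Clean aesthetic"),
    ),
    "health": VerticalSignals(
        core_challenge="Trust without claims",
        floor=("No medical claims", "Cert display"),
        ceiling=("Ingredient education", "Use case"),
        style=("Education", "Quiet luxury"),
    ),
}


def signals_for(vertical: str) -> VerticalSignals:
    """Look up a vertical's signals; fail loudly rather than fall back to a generic default."""
    if vertical not in SIGNALS:
        raise KeyError(f"No signal definitions for vertical {vertical!r}")
    return SIGNALS[vertical]
```

Raising on an unknown vertical, rather than returning a generic default, mirrors the argument that one averaged definition underserves the smaller categories.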

Deep Dive · Per Vertical

How the three-layer framework operationalizes for each vertical, from signal definitions to AIGC production roles to measurable outcomes. Each stage below lists the eight verticals in this order: Beauty & Personal Care, Pet Supplies, Fashion & Apparel, Food & Beverages, Toys & Collectibles, Tech & Electronics, Health Products, Home & Living.

Stage 1 of 3
Define the quality target
🚫 Floor · Must pass · Beauty & Personal Care
Product Clarity: Full product visible
Real Skin 👤: No heavy filters
No False Claims 👤: No exaggerated efficacy
Usage Demo: Creator applies product
Lighting Quality: Result clearly visible

🚫 Floor · Must pass · Pet Supplies
Real Pet Use: Actual pet using product
Pet Safe 👤: Non-toxic safe materials
Size Fit: Breed/weight specified
Product Clear: Full product visible
Pet Comfort 👤: No distress/discomfort

🚫 Floor · Must pass · Fashion & Apparel
Worn Demo: Must show on body
Size Info: Size/height/weight shown
Real Fit 👤: No slimming filters
Fabric Visible: Fabric texture clear
No False Claims 👤: Accurate material claims

🚫 Floor · Must pass · Food & Beverages
Real Tasting 👤: Must show actual eating
Package Visible: Full package shown
Fresh Ingredients: No spoilage signs
Hygienic Setting 👤: Clean environment
No False Claims: No exaggerated health claims

🚫 Floor · Must pass · Toys & Collectibles
Full Product: Complete product shown
Real Size: Hand-held size reference
Brand/IP Clear: Brand or IP visible
Contents Info: What's included/specs
Safety/Age 👤: Kids items: age/safety noted

🚫 Floor · Must pass · Tech & Electronics
Function Demo: Show actual operation
Specs Clear: Key specs visible
Size Compare: Hand-held/object reference
Real Device 👤: Not renders/mockups
Compatibility: Compatible devices/OS

🚫 Floor · Must pass · Health Products
No Medical Claims 👤: No cure/treatment claims
Credentials Shown 👤: Certifications visible
Ingredients Clear: Formula clearly listed
Target Users 👤: Clear who can/cannot use
No Exaggeration 👤: Realistic expectations

🚫 Floor · Must pass · Home & Living
Function Demo: Show actual use result
Size Reference: Hand-held/object comparison
Easy to Use: Simple operation shown
Real Results 👤: No speed-up/special FX
Use Cases 👤: Kitchen/bath/storage context

📈 Ceiling · Benchmark top 5% · Beauty & Personal Care
3s Hook: Instant result reveal
Before/After 👤: Real transformation
Texture Close-up: Slow-mo texture detail
Color Swatches: 3+ color options
Multi-product Layering: Routine across products

📈 Ceiling · Benchmark top 5% · Pet Supplies
Cute Reaction: Surprise/happy expression
Before/After 👤: Grooming/cleaning result
Multi-Pet: Different breeds/sizes
Owner-Pet Bond: Heartwarming interaction
Problem Solved: Pet parent pain point

📈 Ceiling · Benchmark top 5% · Fashion & Apparel
Multi-Angle: Front/side/back views
Movement: Walk/turn/sit demo
Scene Changes: 2+ styling scenarios
Outfit Styling: Full outfit coordination
Beat Sync: Beat-matched transitions

📈 Ceiling · Benchmark top 5% · Food & Beverages
ASMR Audio: Amplified crunch/chew
Appetite Shots: Steam/pull/cut reveals
Prep Process: Full cooking/brewing
Taste Description: Vivid sensory words
Genuine Reaction 👤: Authentic taste reaction

📈 Ceiling · Benchmark top 5% · Toys & Collectibles
Unboxing Reveal: Card pull/blind box surprise
Collection Value 👤: Rarity/limited/hidden edition
Detail Shots: Craftsmanship close-ups
Series Display: Full set/collection shown
Play Demo: Actual play experience

📈 Ceiling · Benchmark top 5% · Tech & Electronics
Unboxing: Full unpack + accessories
Real-World Test: Actual use case testing
Comparison: vs competitor/old version
Hidden Features: "X features you didn't know"
Sound Design: Amplified startup/alert sounds

📈 Ceiling · Benchmark top 5% · Health Products
Ingredient Education 👤: Expert knowledge simplified
Use Scenarios: Real-life context shown
Data Visualization: Charts & infographics
Expert Endorsement 👤: Doctor/expert backing
Testimonials 👤: Authentic user stories

📈 Ceiling · Benchmark top 5% · Home & Living
Problem→Solution: Show frustration, then fix
Satisfying Moment: Cleaning/organizing/cutting joy
Before/After: Usage result comparison
Multi-Use: Show multiple functions
Surprise Reveal: Unexpected result wow

🎨 Style · Match context · Beauty & Personal Care
Format: 📚 Tutorial · 🔄 Before/After · ⚡ Fast-cut
Aesthetic: 💧 Clean · 🌈 Dopamine · 🤍 Quiet Luxury
Audience 👤: 💃 18-25 Trendy · 💼 25-35 Pro · 👵 35+ Anti-age
Scene Vibe: ✨ Studio · 🏠 Home · 🌿 Outdoor
Music: 🎵 Trending · 🎹 Soft BGM · 🎧 ASMR/Silent

🎨 Style · Match context · Pet Supplies
Format: 📸 Cute Daily · 💗 Pet Care Tutorial · 📊 Before/After
Aesthetic: 💗 Warm Cozy · 💧 Clean Fresh · ✨ Playful Cute
Audience 👤: 🐶 Dog Parent · 🐱 Cat Parent · 🐹 Small Pet Parent
Scene Vibe: 🏠 Home Daily · 🌿 Outdoor Walk · 🛁 Grooming
Music: 🎵 Upbeat Happy · 🎹 Warm Healing · ✨ Cute BGM

🎨 Style · Match context · Fashion & Apparel
Format: 🔄 Transformation · 📚 Styling Tutorial · ⚡ Fast-cut Change
Aesthetic: 🌟 Y2K Retro · 🤍 Minimalist · 🔥 Streetwear
Audience 👤: 🎒 Student · 💼 Office Wear · 👗 Mature Elegant
Scene Vibe: 🏙️ Street Style · 🏠 Home Try-on · ✨ Runway Vibe
Music: 🎵 Strong Beat · 🎹 Soft BGM · 🎧 Trending Beats

🎨 Style · Match context · Food & Beverages
Format: 🎧 ASMR · 📚 Cooking Tutorial · ⚡ Quick Taste
Aesthetic: 🌾 Cottagecore · 💗 Warm Appetite · 💧 Fresh Healthy
Audience 👤: 🍿 Snack Lover · 🥗 Health Focus · 🍽️ Family Meals
Scene Vibe: 🏠 Home Kitchen · 🌳 Outdoor Picnic · 🫖 Tea Time
Music: 🔇 ASMR Silent · 🎵 Upbeat · 🎹 Healing BGM

🎨 Style · Match context · Toys & Collectibles
Format: 📦 Unboxing Reveal · 🏆 Collection Show · 🎮 Play Demo
Aesthetic: 🌈 Dopamine · 🎨 Refined Display · ✨ Colorful Fun
Audience 👤: 🎯 Collectors · 🃏 Card Hobbyist · 👨‍👩‍👧 Parents
Scene Vibe: 🛒 Display Shelf · 📦 Unboxing Set · 🏠 Home Play
Music: 🎵 Trendy Beat · 🎉 Surprise SFX · 🧒 Playful Kids

🎨 Style · Match context · Tech & Electronics
Format: 📦 Unboxing Review · ⚡ Quick Demo · 🔄 Comparison
Aesthetic: 🌃 Cyberpunk · 🖤 Tech Black · 🤍 Minimal White
Audience 👤: 🧑‍💻 Tech Enthusiast · 💼 Business Pro · 🎒 Student
Scene Vibe: 🖥️ Desktop Setup · 🎒 Outdoor Portable · 🏠 Home Use
Music: 🎛️ Electronic Beat · 🔔 Product Sounds · 🎵 Tech BGM

🎨 Style · Match context · Health Products
Format: 📚 Education · 🧪 Ingredient Analysis · 📖 User Story
Aesthetic: 🤍 Quiet Luxury · 🌿 Natural Fresh · 🔬 Scientific
Audience 👤: 💼 Office Wellness · 🏃 Fitness · 🧓 Senior Health
Scene Vibe: 🏥 Clinical · 🏠 Home Daily · 🏃 Sports Setting
Music: 🎹 Calm Soft · 🤍 Quiet Pro · 🌿 Healing

🎨 Style · Match context · Home & Living
Format: 🔄 Before/After · ⚡ Quick Demo · 📚 How-to Tutorial
Aesthetic: 🍃 Japandi · ✨ Satisfying · 🤍 Clean Minimal
Audience 👤: 👩‍🍳 Kitchen Pro · 🧼 Cleaning Fan · 📦 Organization Lover
Scene Vibe: 🍳 Kitchen · 🛁 Bathroom · 🏠 Home Daily
Music: 🎵 Upbeat · 🎧 ASMR Effect · ✨ Satisfying BGM

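The difference between a must-pass Floor and a benchmark Ceiling can be sketched in a few lines. This is a minimal illustration, not the production logic: the function names, the three-state gate, and the "share of signals present" score are all assumptions. Only the check names come from the Beauty lists above.

```python
from typing import Optional

# Check names copied from the Beauty Floor and Ceiling lists above.
BEAUTY_FLOOR = ["Product Clarity", "Real Skin", "No False Claims",
                "Usage Demo", "Lighting Quality"]
BEAUTY_CEILING = ["3s Hook", "Before/After", "Texture Close-up",
                  "Color Swatches", "Multi-product Layering"]


def floor_gate(checks: list[str], results: dict[str, Optional[bool]]) -> str:
    """Floor is binary: any False blocks the video outright.

    A missing or None result (for example a 👤 check still waiting on a
    human label) is never counted as a pass; it holds the video for review.
    """
    if any(results.get(name) is False for name in checks):
        return "fail"
    if any(results.get(name) is None for name in checks):
        return "needs_review"
    return "pass"


def ceiling_score(signals: list[str], present: set[str]) -> float:
    """Share of Ceiling signals detected in the video (an assumed, simple score)."""
    return sum(name in present for name in signals) / len(signals)


results = {"Product Clarity": True, "Real Skin": None, "No False Claims": True,
           "Usage Demo": True, "Lighting Quality": True}
print(floor_gate(BEAUTY_FLOOR, results))                              # needs_review
print(ceiling_score(BEAUTY_CEILING, {"3s Hook", "Texture Close-up"}))  # 0.4
```

The gate returns a category rather than a number on purpose: a Floor result is never averaged with anything else, while the Ceiling score is free to be graded.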
Stage 2 of 3
Generate content against the target
Beauty & Personal Care
📝 Script Gen
Floor: Filter banned words ("whitening," "fade dark spots")
Ceiling: Inject before/after narrative structure
Style: Match skin type + age tone 👤
🎬 Video Gen
Floor: Real skin, no heavy filters 👤
Ceiling: Auto texture close-up + application sequence
Style: Apply Clean / Dopamine aesthetic template
🤖 Avatar
Floor: No medical implications 👤
Ceiling: Trust swatch + reaction sequence 👤
Style: Glam expert / Skincare pro persona
✂️ Smart Edit
Floor: Remove over-filtered clips
Ceiling: Find texture + reveal moments
Style: Tutorial pacing + beat template
🔄 Variants
Floor: All versions: no false claims
Ceiling: Test color / skin-type hooks
Style: Adapt for age segments 👤

Pet Supplies
📝 Script Gen
Floor: Enforce breed/size info
Ceiling: Inject pain point + cute moment
Style: Match pet type tone
🎬 Video Gen
Floor: Ensure no pet distress 👤
Ceiling: Auto capture cute reaction moments
Style: Apply warm/playful aesthetic
🤖 Avatar
Floor: Correct product usage demo
Ceiling: Pet interaction + surprise reaction
Style: Pet parent / Pet expert persona
✂️ Smart Edit
Floor: Remove distress/forced clips
Ceiling: Find cute highlight + reaction moments
Style: Cute pacing + healing template
🔄 Variants
Floor: Consistent safety info all versions
Ceiling: Test different cute/pain hooks
Style: Adapt for pet type segments 👤

Fashion & Apparel
📝 Script Gen
Floor: Enforce size/height info
Ceiling: Inject multi-scene structure
Style: Match audience tone 👤
🎬 Video Gen
Floor: Ensure full body visible
Ceiling: Auto multi-angle transitions
Style: Apply Y2K / Minimalist aesthetic
🤖 Avatar
Floor: Realistic body proportions 👤
Ceiling: Confident pose + turn sequence
Style: Fashion blogger / Girl-next-door 👤
✂️ Smart Edit
Floor: Remove unflattering clips
Ceiling: Find best movement moments
Style: Beat-sync outfit template
🔄 Variants
Floor: Consistent size info all versions
Ceiling: Test different styling hooks
Style: Adapt for audience segments 👤

Food & Beverages
📝 Script Gen
Floor: Filter health exaggerations
Ceiling: Inject sensory words + appetite triggers
Style: Match snack / health / family tone 👤
🎬 Video Gen
Floor: Ensure fresh food appearance
Ceiling: Auto appetite shots + steam
Style: Apply cottagecore / warm aesthetic
🤖 Avatar
Floor: Hygienic appearance
Ceiling: Authentic tasting + enjoyment 👤
Style: Mukbang host / Home chef persona 👤
✂️ Smart Edit
Floor: Remove unhygienic clips
Ceiling: Find bite / cut / pull moments
Style: ASMR pacing + audio boost
🔄 Variants
Floor: All versions food-safe compliant
Ceiling: Test flavor / scene hooks
Style: Adapt for audience segments 👤

Toys & Collectibles
📝 Script Gen
Floor: Enforce brand / IP info
Ceiling: Inject unboxing suspense + reveal
Style: Match collector / card / parent tone
🎬 Video Gen
Floor: Ensure size reference clear
Ceiling: Auto unboxing + detail close-ups
Style: Apply cool / refined aesthetic
🤖 Avatar
Floor: Accurate product info delivery
Ceiling: Unbox surprise + collector explain
Style: Collector / Hobbyist persona 👤
✂️ Smart Edit
Floor: Remove unclear product info clips
Ceiling: Find reveal + reaction highlights
Style: Suspense pacing + SFX template
🔄 Variants
Floor: Consistent brand / IP all versions
Ceiling: Test rarity / series hooks
Style: Adapt for collector / card / parent audiences 👤

Tech & Electronics
📝 Script Gen
Floor: Enforce key specs info
Ceiling: Inject hidden features + comparison
Style: Match enthusiast / business / student tone 👤
🎬 Video Gen
Floor: Ensure real device footage
Ceiling: Auto unboxing + test scenes
Style: Apply tech black / minimal aesthetic
🤖 Avatar
Floor: Accurate operation demo
Ceiling: Expert explain + discovery reaction
Style: Tech blogger / Gadget pro persona 👤
✂️ Smart Edit
Floor: Remove unclear specs / function clips
Ceiling: Find unboxing + test moments
Style: Tech pacing + sound design template
🔄 Variants
Floor: Consistent specs all versions
Ceiling: Test different feature / scene hooks
Style: Adapt for enthusiast / business / student audiences 👤

Health Products
📝 Script Gen
Floor: Force-filter medical terms 👤
Ceiling: Inject ingredient edu + scenarios
Style: Match wellness audience tone 👤
🎬 Video Gen
Floor: Ensure credentials visible 👤
Ceiling: Auto ingredient chart animations
Style: Apply luxury / scientific aesthetic
🤖 Avatar
Floor: Professional, no medical implied 👤
Ceiling: Expert delivery + trust expression 👤
Style: Nutritionist / Health advisor persona 👤
✂️ Smart Edit
Floor: Remove any medical claims 👤
Ceiling: Find edu highlights + data moments
Style: Pro pacing + chart transition
🔄 Variants
Floor: All versions compliance-checked 👤
Ceiling: Test ingredient / scenario hooks
Style: Adapt for audience segments 👤

Home & Living
📝 Script Gen
Floor: Enforce usage instructions
Ceiling: Inject problem→solution + wow structure
Style: Match kitchen / clean / organize tone
🎬 Video Gen
Floor: Ensure hand-held size reference
Ceiling: Auto satisfying moment shots
Style: Apply satisfying / clean aesthetic
🤖 Avatar
Floor: Clear operation demonstration
Ceiling: Pain empathy + surprise reaction
Style: Home hack / Lifestyle blogger persona 👤
✂️ Smart Edit
Floor: Remove clips where function is unclear or the effect is blurry
Ceiling: Find satisfying highlight + comparison moments
Style: Before/after + satisfying pacing template
🔄 Variants
Floor: Consistent function info all versions
Ceiling: Test different pain point / use hooks
Style: Adapt for kitchen / cleaning / organizing audiences 👤

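The Script Gen Floor row for beauty ("Filter banned words") is the simplest of these cells to sketch as code. The example below is a hedged illustration: only the two banned phrases are quoted in the text, while the `BANNED_PHRASES` mapping, the function names, and the case-insensitive whole-word matching are assumptions. Real compliance checking would need far more than string matching.

```python
import re

# The two beauty phrases are quoted in the Script Gen row; the mapping is hypothetical.
BANNED_PHRASES = {
    "beauty": ["whitening", "fade dark spots"],
}


def find_banned(script: str, vertical: str) -> list[str]:
    """Return the banned phrases found in a script (case-insensitive, whole-word match)."""
    hits = []
    for phrase in BANNED_PHRASES.get(vertical, []):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, script, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


def floor_ok(script: str, vertical: str) -> bool:
    """Floor is binary: a single banned phrase fails the script."""
    return not find_banned(script, vertical)
```

For example, `find_banned("Our Whitening serum will fade dark spots fast", "beauty")` returns both phrases, so `floor_ok` is `False` for that script.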
Stage 3 of 3
Learn from outcomes
Beauty & Personal Care
3s Retention: +28%* (58% → 74%)
What it measures: Watch-rate at the 3s mark, TikTok's algorithmic surfacing threshold.
Key drivers: 🎬 Video texture + ✂️ Edit pacing
Completion: +37%* (38% → 52%)
What it measures: Full-video watch rate, a proxy for re-exposure in the algorithm.
Key drivers: 📝 Before/after + 🎬 Swatches
CVR: +64%* (2.8% → 4.6%)
What it measures: Click-to-purchase within session, the bottom-line outcome.
Key drivers: 🤖 Avatar trust + 🔄 Variant targeting

Pet Supplies
3s Retention: +23%* (62% → 76%)
What it measures: Watch-rate at the 3s mark. Pet visuals already hook fast; lift comes from holding viewers past it.
Key drivers: 🎬 Cute hook + ✂️ Reaction moments
Completion: +33%* (42% → 56%)
What it measures: Full-video watch rate. Owner-pet bond moments sustain attention to completion.
Key drivers: 📝 Pain-point setup + 🎬 Bond moments
CVR: +58%* (2.4% → 3.8%)
What it measures: Click-to-purchase within session. Pet purchases skew deliberate; conversion is the harder lift.
Key drivers: 📝 Problem-solve script + 🔄 Pet-type variants

Fashion & Apparel
3s Retention: +31%* (52% → 68%)
What it measures: Watch-rate at the 3s mark. Fashion's visual hook is competitive but not unique; multi-angle + beat sync lift it past the bar.
Key drivers: 🎬 Multi-angle + ✂️ Beat sync
Completion: +47%* (34% → 50%)
What it measures: Full-video watch rate. Scene changes and styling sequences drive viewers all the way through.
Key drivers: 📝 Scene structure + 🎬 Movement demo
CVR: +55%* (1.8% → 2.8%)
What it measures: Click-to-purchase within session. Fashion is impulse-friendly, but a high return rate keeps lift moderate vs beauty.
Key drivers: 🤖 Avatar confidence + 🔄 Variant targeting

Food & Beverages
3s Retention: +33%* (54% → 72%)
What it measures: Watch-rate at the 3s mark. Food's visual appetite hook is among the strongest of any category on TikTok.
Key drivers: 🎬 Appetite shots + ✂️ ASMR audio
Completion: +50%* (36% → 54%)
What it measures: Full-video watch rate. ASMR audio + prep process pull viewers all the way through.
Key drivers: 📝 Sensory script + 🎬 Prep process
CVR: +72%* (1.8% → 3.1%)
What it measures: Click-to-purchase within session. Strong impulse + sensory triggers drive the highest realistic CVR lift across verticals.
Key drivers: 🤖 Genuine reaction + 🔄 Variant targeting

Toys & Collectibles
3s Retention: +30%* (56% → 73%)
What it measures: Watch-rate at the 3s mark. Unboxing suspense and the reveal mechanic are TikTok-native hooks with strong stopping power.
Key drivers: 🎬 Unboxing suspense + ✂️ Surprise SFX
Completion: +47%* (38% → 56%)
What it measures: Full-video watch rate. The reveal-payoff structure pulls viewers all the way to the unbox moment.
Key drivers: 📝 Reveal structure + 🎬 Series display
CVR: +65%* (1.6% → 2.6%)
What it measures: Click-to-purchase within session. Collector psychology drives strong conversion; FOMO on limited drops accelerates it.
Key drivers: 🤖 Rarity demo + 🔄 Variant targeting

Tech & Electronics
3s Retention: +28%* (50% → 64%)
What it measures: Watch-rate at the 3s mark. Tech viewers scroll for specs; unboxing hook + sound design pull them in.
Key drivers: 🎬 Unboxing hook + ✂️ Sound design
Completion: +47%* (30% → 44%)
What it measures: Full-video watch rate. Hidden features and comparison structure reward viewers who stay through to the end.
Key drivers: 📝 Hidden features + 🎬 Real-world test
CVR: +42%* (1.2% → 1.7%)
What it measures: Click-to-purchase within session. Tech is research-heavy; viewers cross-reference reviews before purchasing, which caps video-driven CVR.
Key drivers: 🤖 Expert credibility + 🔄 Variant targeting

Health Products
3s Retention: +29%* (48% → 62%)
What it measures: Watch-rate at the 3s mark. Health content struggles to hook fast; credential displays and data visualizations earn attention.
Key drivers: 🎬 Credential display + ✂️ Data visualization
Completion: +50%* (28% → 42%)
What it measures: Full-video watch rate. Education structure and scenario examples reward sustained attention; viewers stay to learn.
Key drivers: 📝 Education script + 🎬 Scenario examples
CVR: +40%* (1.0% → 1.4%)
What it measures: Click-to-purchase within session. Health buyers are skeptical, deliberate, and often consult doctors first; video CVR is expected to be the lowest of the eight verticals.
Key drivers: 🤖 Expert trust + 🔄 Audience targeting

Home & Living
3s Retention: +29%* (55% → 71%)
What it measures: Watch-rate at the 3s mark. Home pain-point hooks ("does your sink look like this?") + satisfying visuals work in tandem.
Key drivers: 🎬 Pain-point hook + ✂️ Satisfying moments
Completion: +50%* (34% → 51%)
What it measures: Full-video watch rate. The problem→solution arc rewards viewers who stay for the payoff moment.
Key drivers: 📝 Problem→solution arc + 🎬 Before/after comparison
CVR: +50%* (1.4% → 2.1%)
What it measures: Click-to-purchase within session. Home buyers are aspirational but moderately deliberate; impulse-friendly when problem-solving is clear.
Key drivers: 🤖 Surprise reaction + 🔄 Use-case targeting

*Hypothesized relative lift over baseline

System learns: outcomes refine signal weights
⚠️ Hypothesized values — validate with internal data · 👤 Requires human labeling
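The headline lifts above are relative changes, not percentage-point differences, which is easy to misread next to the before and after rates. A short worked example, using the Beauty figures and a hypothetical helper named `relative_lift`:

```python
def relative_lift(baseline: float, target: float) -> float:
    """Relative lift in percent: (target - baseline) / baseline * 100."""
    return (target - baseline) / baseline * 100


# (baseline %, target %) pairs copied from the Beauty metrics above.
beauty = {
    "3s Retention": (58, 74),
    "Completion": (38, 52),
    "CVR": (2.8, 4.6),
}

for metric, (base, target) in beauty.items():
    points = target - base
    print(f"{metric}: {relative_lift(base, target):.0f}% relative, "
          f"{points:.1f} points absolute")
```

For Beauty this reproduces the stated +28%, +37%, and +64%. Recomputing the other verticals from their displayed rates lands within about three points of each stated lift, which may reflect rounding in the displayed rates.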

2. Scaling Globally: What Stays Universal, What Localizes

The framework is designed for global scale by separating what transfers from what doesn't.

Universal & Local

Global Framework Architecture

Some layers are globally unified, others are market-customized.

🌐 Infrastructure Layer · ~95% global reuse
Universal core. Fast new-market launch base.
Core Infrastructure (Universal): Compute, storage, networking
Base Model (Universal): Foundation model weights & serving
Pipeline Architecture (Universal): Generation, evaluation, deployment flow
Monitoring Framework (Universal): Telemetry, logging, alerting

🚫 Floor Layer · ~60% global reuse
Universal framework + local rules. Strict localization avoids legal risk.
Technical Quality (Universal): Image/audio fidelity standards
Regulatory Compliance (Regional): GDPR, FTC, advertising law
Cultural Taboos (Local): Religion, political sensitivities

📈 Ceiling Layer · ~70% global reuse
Structure unified, numbers localized. Reusable scaffolding for fast scaling.
Funnel Logic (Universal): Hook → Demo → CTA structure
Benchmark Numbers (Regional): Retention thresholds vary by market
Conversion Patterns (Local): Purchase path, payment, trust signals

🎨 Style Layer · ~20% global reuse
Highly localized. Needs local team input. Content earns relevance.
Regional Aesthetics (Regional): Y2K, Quiet Luxury, Cottagecore
Cultural References (Local): Memes, symbols, in-jokes
Language & Tone (Local): Translation, register, slang
Platform Expression (Local): Native pacing, format conventions

*Hypothesized reuse ratios based on cross-market case studies
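The Universal / Regional / Local split maps naturally onto layered configuration, where each more specific layer overrides the broader one. The sketch below assumes a "Global + APAC + ID" locale stack like the one named in the pipeline walkthrough later in this part. The layer contents, the key names, and placing the retention threshold at the regional level are all illustrative assumptions.

```python
from typing import Any

# Hypothetical layer contents; only the Universal / Regional / Local split comes from the text.
LAYERS: dict[str, dict[str, Any]] = {
    # Universal: funnel structure is fixed; None marks values a market must fill in.
    "global": {"funnel": ["hook", "demo", "cta"],
               "min_3s_retention": None, "language": None},
    # Regional: benchmark numbers (assumed value).
    "apac": {"min_3s_retention": 0.80},
    # Local: language and market-specific requirements.
    "id": {"language": "id", "halal_logo_required": True},
}


def resolve(stack: list[str]) -> dict[str, Any]:
    """Merge layers left to right so more specific layers override broader ones."""
    config: dict[str, Any] = {}
    for name in stack:
        config.update(LAYERS[name])
    return config


def unresolved(config: dict[str, Any]) -> list[str]:
    """Keys still None after merging: values the market must supply before launch."""
    return sorted(key for key, value in config.items() if value is None)
```

With this shape, `unresolved(resolve(["global"]))` lists exactly what a new market still has to localize, which is one way to make the reuse ratios above concrete.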

Stress-Testing the Framework: Launching Immunity Gummies in Three Markets

Immunity gummies are deliberately chosen as the hardest case. The product carries serious regulatory weight (FDA, ANVISA, BPOM) and high cultural sensitivity (Halal certification, trust signals), and its failure surface is broad (legal action, consumer rejection, commercial flop). If the framework holds up here, it should hold up in less demanding categories.

Universal & Local · Applied

Three Markets, One Framework

The framework applied to one product: immunity gummies launching in the US, Brazil, and Indonesia. The structure is global. Every signal value is local.

🎯 Why immunity gummies are the right example
Floor stakes are clearest: Failure means legal action, takedowns, brand crisis. Not "low conversion."
Cross-market difference is extreme: Gelatin source (Halal in Indonesia), efficacy claims (FDA vs ANVISA vs BPOM). There is no gray area; the content must adapt.
All three layers are indispensable: Floor is the entry ticket, Ceiling trust patterns differ entirely, and Style cultural symbols vary widely.
🇺🇸 United States · FDA / FTC regime

🚫 Floor · FDA / FTC
  • Dietary supplements need no pre-market approval
  • Can say "supports immune health" (structure/function claim)
  • Cannot say "prevents colds" or use "boosts immunity" to imply more than support
  • Banned: "cure," "treat," "prevent disease"
  • Ingredients in Supplement Facts format
  • DSHEA disclaimer required: "This statement has not been evaluated by the Food and Drug Administration. This product is not intended to diagnose, treat, cure, or prevent any disease."
⚠ Failure scenario: "Clinically proven to cure colds" in a US gummies ad. Violates FTC substantiation rules. FDA enforcement action and brand penalty.

📈 Ceiling · "Ingredient + Data" trust
3s retention: ≥70%
Product shown: Appears on screen within 5s
Trust source: Scientific evidence + Supplement Facts + NSF/USP certification + doctor endorsement
Hero line: "Doctor-formulated. Just clean ingredients."
CTA: Soft, "view ingredient details"
Duration: 30–45 seconds (audience is willing to read ingredients)
⚠ Failure scenario: Showing the product without ingredient transparency to a US audience. Viewers expect to see what's in it. Trust collapses, retention drops, conversion fails.

🎨 Style · "Clean Science"
Visual: Clean, professional, credible
Color: White / light green / natural
Models: Diverse skin tones, inclusivity
Music: Lo-fi / acoustic
Symbols: Self-care, wellness
⚠ Failure scenario: Loud, high-energy family styling deployed in the US market. Reads as foreign advertising. US audience expects clinical, ingredient-focused tone. "This isn't for me."

🇧🇷 Brazil · ANVISA regime

🚫 Floor · ANVISA
  • Food supplements require prior ANVISA notification (not registration)
  • Can say "fonte de vitamina C" (vitamin C source)
  • Cannot say "previne gripes" (prevents colds)
  • Cannot say "fortalece imunidade" (strengthens immunity) without clinical evidence
  • Strict upper nutrient limits per population group (IN 28/2018, Annex IV)
  • 100% Portuguese labels + mandatory warning "Este produto não é um medicamento" ("This product is not a medicine")
⚠ Failure scenario: "Previne gripes" (prevents colds) in a Brazil ad. ANVISA classifies it as an unauthorized medical claim. Takedown, fine, and risk to the product's ANVISA notification.

📈 Ceiling · "Relationship network" trust
3s retention: ≥75%
Product shown: Appears on screen within 7s (longer setup)
Trust source: Family recommendation + community endorsement
Hero line: "Minha mãe me recomendou" ("My mother recommended this")
CTA: Direct, "Compre 30% desconto" (Buy, 30% off)
Duration: 45–60 seconds (full story arc)
⚠ Failure scenario: Soft "tap to view details" CTA in Brazil. Brazilian audience expects a discount and a direct purchase ask. The CTA doesn't match the relationship-trust culture. Conversion drops.

🎨 Style · "Energia Familiar"
Visual: Warm, vibrant, family-centered
Color: Warm yellow / orange / sunlight
Models: Local faces + family scenes
Music: Funk / Pop brasileiro
Symbols: Família, Energia
⚠ Failure scenario: Cold, clinical US-style aesthetic deployed in Brazil. Minimal style reads as distant and uncaring. "This brand doesn't get us." Family doesn't endorse, social trust collapses.

🇮🇩 Indonesia · BPOM + Halal (BPJPH) regime

🚫 Floor · BPOM + Halal (BPJPH)
  • BPOM ML certification is the basic entry requirement
  • Halal certification mandatory (87% Muslim market). BPJPH issues, LPPOM MUI inspects
  • Gelatin source is the core question:
      ✗ Pork gelatin → automatic rejection (haram)
      ✓ Bovine (Halal-slaughtered) / fish / plant-based
  • Halal logo (BPJPH purple, or legacy MUI green through Oct 2026) must be visible
⚠ Failure scenario: Pork gelatin product launched in Indonesia. Muslim consumers detect the gelatin source. Social media crisis, lawsuit risk, permanent brand damage. The Halal floor is non-negotiable.

📈 Ceiling · "KOL + Religious authority" trust
3s retention: ≥80% (highly competitive feed)
Product shown: Appears on screen within 3s, together with the Halal logo
Trust source: KOL endorsement + visible Halal certification
Hero line: "Sudah bersertifikat Halal" ("Already Halal certified")
CTA: Social proof, "Sudah terjual 10.000+" ("Already 10,000+ sold")
Duration: 15–30 seconds (fast pace)
⚠ Failure scenario: Product IS Halal certified but the video doesn't emphasize the Halal logo. Muslim consumers can't verify, default to uncertainty, won't purchase. Trust must be visible, not assumed.

🎨 Style · "Halal Wellness"
Visual: Fresh, natural, religiously appropriate
Color: Green / white (positive Islamic associations)
Models: Hijab + Non-Hijab versions
Music: Pop Indo / Dangdut
Symbols: Halal, Berkah (blessing)
⚠ Failure scenario: Only non-hijab models in an Indonesia launch. Muslim women feel unrepresented. "This brand doesn't respect our culture." Negative word of mouth, sales tank.

Three-layer framework for health products: Floor is non-negotiable, Ceiling adapts to trust culture, Style is highly localized.

🚫 Floor = legal redline 📈 Ceiling = trust pattern 🎨 Style = cultural resonance
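The Ceiling rows of the three market cards share one shape and differ only in numbers, so they can be held in a single structure and checked against a draft video. A minimal sketch, assuming the hypothetical names `CeilingTarget` and `ceiling_gaps`; the numbers are copied from the cards, and the retention target is stored but not checked because it can only be measured after publishing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CeilingTarget:
    min_3s_retention: float      # target share of viewers still watching at 3s
    product_by_s: float          # product must be on screen by this second
    duration_s: tuple[int, int]  # inclusive (min, max) video length in seconds


# Numbers copied from the three market cards; the structure is an assumption.
CEILING = {
    "US": CeilingTarget(0.70, 5, (30, 45)),
    "BR": CeilingTarget(0.75, 7, (45, 60)),
    "ID": CeilingTarget(0.80, 3, (15, 30)),
}


def ceiling_gaps(market: str, duration_s: float, product_first_seen_s: float) -> list[str]:
    """List the ways a draft misses its market's Ceiling template (empty list = on target)."""
    target = CEILING[market]
    low, high = target.duration_s
    gaps = []
    if not low <= duration_s <= high:
        gaps.append(f"duration {duration_s}s outside {low}-{high}s")
    if product_first_seen_s > target.product_by_s:
        gaps.append(f"product appears at {product_first_seen_s}s, "
                    f"target <= {target.product_by_s}s")
    return gaps
```

A 22-second draft that shows the product at 2 seconds has no gaps for Indonesia but is too short for the US template, which is the "one framework, local values" point in miniature.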

10 Failure Modes to Design for Upfront

Failure modes from the World

The framework can break in predictable ways when scaling globally. The first set of failures comes from the diversity of markets, cultures, and commerce contexts the framework must adapt to.

Failure modes from the System

The second set comes from how the framework operates internally — its data, its scale, its people, and how the optimization itself can go wrong.

Gen AI Video Pipeline

How the Framework Generates a Market-Ready Video

The Floor / Ceiling / Style framework operates end-to-end as a generation pipeline. Indonesia is walked through step by step because it is the most complex of the three markets.

Step 1 · Brief → Market config
  • Brief: launch immunity gummies in Indonesia
  • Resolves to: BPOM ML + Halal + Indonesian
  • Channel: TikTok Shop
  • Locale stack: Global + APAC + ID
Step 2 · 🚫 Floor input check
  • BPOM ML cert?
  • Halal cert (BPJPH)?
  • Gelatin source?
  • Pork gelatin → reject
Step 3 · 📈 Ceiling template
  • 15–30 second fast pace
  • Product + Halal within 3s
  • KOL recommend hook
  • Urgency + social proof CTA
Step 4 · 🎨 Style application
  • Green / white palette
  • Halal logo upper right
  • Hijab version
  • Pop Indo music
Step 5 · 🚫 Floor output check
  • Forbidden medical claims removed?
  • Halal logo visible 3+ seconds?
  • Models religiously appropriate?
  • → Pass: ship
  • ↻ Single fail: regenerate
  • ⟲ Pattern fail: framework refinement
📹 Output · Indonesia market video
Duration: 22 seconds
Hook: KOL + Halal close-up
Trust signals: Halal logo + "50,000+ sold"
CTA: "Diskon 25% hari ini!" ("25% off today!")
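The five steps above can be sketched as a small control loop. This is an illustration under stated assumptions, not the production pipeline: `generate` stands in for Steps 3 and 4, the video field names are hypothetical, and treating repeated failure within `max_attempts` as a "pattern fail" is one possible reading of that term.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_GELATIN = {"bovine_halal", "fish", "plant"}  # from the Indonesia Floor card


@dataclass
class Product:
    bpom_ml_cert: bool
    halal_cert: bool
    gelatin_source: str


def floor_input_check(product: Product) -> list[str]:
    """Step 2: reject before any generation if the product itself cannot pass."""
    problems = []
    if not product.bpom_ml_cert:
        problems.append("missing BPOM ML cert")
    if not product.halal_cert:
        problems.append("missing Halal cert (BPJPH)")
    if product.gelatin_source not in ALLOWED_GELATIN:
        problems.append(f"gelatin source not allowed: {product.gelatin_source}")
    return problems


def floor_output_check(video: dict) -> list[str]:
    """Step 5: checks on the generated video (field names are hypothetical)."""
    problems = []
    if video.get("medical_claims"):
        problems.append("medical claims present")
    if video.get("halal_logo_visible_s", 0) < 3:
        problems.append("Halal logo visible < 3s")
    if not video.get("models_appropriate", False):
        problems.append("models not religiously appropriate")
    return problems


def run(product: Product, generate: Callable[[], dict], max_attempts: int = 3) -> str:
    """Return 'rejected', 'ship', or 'framework_refinement'."""
    if floor_input_check(product):
        return "rejected"
    for _ in range(max_attempts):
        video = generate()  # stands in for Steps 3-4: Ceiling template + Style application
        if not floor_output_check(video):
            return "ship"
        # Single fail: loop again and regenerate.
    # Assumption: failing every attempt is treated as a pattern fail.
    return "framework_refinement"
```

Note that the Floor runs twice, once on the product and once on the output, so a pork-gelatin product never reaches generation at all.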

3. The Practice of AI Behavior Design

The AI Behavior Designer's role

The Floor / Ceiling / Style framework doesn't write itself. Someone has to decide that "no medical claims" is a hard binary, not a 0–100 score, and that the line shifts when the category does.

That decision is behavior design. It isn't policy work (which writes the rules upstream) and it isn't engineering (which trains the models). It's the judgment work that translates platform standards into AIGC pipeline behavior. Across all the diagrams above, that work breaks into four responsibilities:

The Role · In Context

Where AI Behavior Design Operates

The framework runs as code. The AI Behavior Designer designs how it should behave. Here is what the role owns, decides, and updates across the pipeline.

AI Behavior Design is a discipline in formation. Across companies it goes by many names — Model Behavior, Model Policy, Responsible AI, Trust & Safety, Content Integrity. The work converges on the same problem: ensuring AI-generated content quality at scale, where the surface of design isn't UI but the system's behavior itself. At each framework layer, the AI Behavior Designer translates domain expertise into artifacts that ship as code. Across all layers, the role aggregates findings into framework refinements.

🌐 Infrastructure
Behavior designer
Specifies behavioral capabilities the system must support. Translates desired model behaviors into infrastructure requirements.
Specifies
Multimodality coverage, latency tolerances, multilingual scope, context length needs
Delivers
Behavioral capability spec
Required model behaviors translated into infrastructure asks. Latency targets, modality support, multilingual scope, context length.
per release planning cycle
🚫 Floor
Behavior designer
Translates regulatory requirements into testable rubrics that ship as code. Bridges legal/policy intent and engineering implementation.
Translates
Specific banned phrasing, mandatory disclosure language, certification visibility tests, per-market compliance criteria
Delivers
Compliance rubrics
Per-market regulatory criteria translated into testable rules. Banned phrasing, mandatory disclosures, certification visibility requirements.
updated when regulations change
📈 Ceiling
Behavior designer
Translates behavioral targets into eval rubrics the pipeline scores against. Bridges product targets and pipeline scoring.
Translates
Retention benchmarks per market, trust signal scoring criteria, performance thresholds, eval rubrics for each layer
Delivers
Eval benchmarks
Retention thresholds, trust signal scoring, duration calibrations. The behavioral targets the pipeline aims for in each market.
quarterly recalibration
🎨 Style
Behavior designer
Codifies aesthetic intuition into taxonomies and match rules. Bridges cultural insight and structured rubrics.
Codifies
Visual / color / music / symbol vocabularies per market, cultural appropriateness criteria, taxonomies for style-to-market matching
Delivers
Style match taxonomies
Aesthetic vocabularies per market. Visual, color, music, and symbol rules. Cultural appropriateness criteria.
updated with cultural shifts
🧩 All layers
Behavior designer
Aggregates findings across markets and layers into framework refinements. Closes the loop between production outputs and framework evolution.
Aggregates
Recurring failure patterns across markets, ambiguous edge cases, drift indicators, framework version requirements
Delivers
Eval datasets
Test cases that probe each layer's rules. Pass/fail criteria. Edge cases that surface ambiguity in the rubric.
continuous
Failure pattern reports
Aggregated findings from production outputs. Recurring failures, drift indicators, recommended framework updates.
per cycle (weekly/monthly)
Framework version notes
What changed, what failure pattern triggered the change, what the change should produce in the next cycle.
per release
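One way to picture an eval dataset is as a list of labeled test cases run against a rule, where any disagreement between the rule and the label points at either a bad rule or an ambiguous rubric. The sketch below assumes a hypothetical `EvalCase` record and a deliberately blunt toy rule; neither comes from a real pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    layer: str          # "floor", "ceiling", or "style"
    text: str
    expected_pass: bool
    note: str = ""


def run_eval(cases, rule: Callable[[str], bool]):
    """Return the ids of cases where the rule disagrees with the label."""
    return [c.case_id for c in cases if rule(c.text) != c.expected_pass]


# Hypothetical toy rule: a script passes if it avoids the substring "cure".
def no_cure_claim(text: str) -> bool:
    return "cure" not in text.lower()


cases = [
    EvalCase("F-001", "floor", "Supports your daily routine.", True),
    EvalCase("F-002", "floor", "Cures fatigue overnight.", False),
    # Edge case: the substring match also flags a harmless word.
    EvalCase("F-003", "floor", "Keep it secure in the cap.", True,
             "substring edge case"),
]

mismatches = run_eval(cases, no_cure_claim)
print(mismatches)  # ids where rule and label disagree
```

Here the edge case F-003 is the one that disagrees, which is the kind of ambiguity the eval dataset exists to surface before the rule reaches production.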
The feedback loop the role operates within
Outputs do not retrain the model. They reveal patterns. The AI Behavior Designer translates those patterns into framework updates that the pipeline then runs against.
📹
Outputs accumulate
Thousands of generated videos per market per week
📊
Patterns detected
Recurring failures, drift, ambiguous cases surface
🧠
Behavior Designer reviews
Identifies whether the failure is a rubric gap, a threshold issue, or a model limitation
✍️
Framework refined
Updates Floor rules, Ceiling benchmarks, or Style match rules
🚀
Pipeline updated
New framework version runs against the next batch of generation requests
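The "patterns detected" and "Behavior Designer reviews" steps can be sketched as a simple aggregation followed by a human-assigned diagnosis. The record shape, the `min_count` threshold, and the sample data are assumptions for illustration; only the three diagnosis categories come from the loop described above.

```python
from collections import Counter
from enum import Enum


class Diagnosis(Enum):
    """The three review outcomes named in the loop."""
    RUBRIC_GAP = "rubric gap"
    THRESHOLD_ISSUE = "threshold issue"
    MODEL_LIMITATION = "model limitation"


def recurring_failures(records, min_count=3):
    """Count failed checks per (market, criterion); keep the recurring ones."""
    counts = Counter(
        (r["market"], r["criterion"]) for r in records if not r["passed"]
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical eval records; a real pipeline would emit far more per week.
records = (
    [{"market": "US", "criterion": "BC-09", "passed": False}] * 4
    + [{"market": "US", "criterion": "BC-04", "passed": True}] * 5
    + [{"market": "BR", "criterion": "BC-09", "passed": False}] * 1
)

patterns = recurring_failures(records)

# The designer's review attaches a diagnosis to each surfaced pattern.
# The choice of RUBRIC_GAP here is purely illustrative.
review = {patterns[0][0]: Diagnosis.RUBRIC_GAP}
print(patterns, review)
```

The aggregation is mechanical; the diagnosis is the judgment call the role exists to make, so it stays a human-assigned field rather than a computed one.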

The system generates videos. The framework defines what "good" means. The AI Behavior Designer keeps that definition accurate as markets evolve, content trends shift, and regulations change.

These rubrics live at the evaluation layer, not the training layer. The model itself is trained upstream via RLHF, DPO, RLAIF, Constitutional AI, fine-tuning, and reward modeling — those are the lab-level levers. Floor / Ceiling / Style rubrics define "good" for the system that runs in production, and failure patterns reveal what to update at the framework layer.

See a sample rubric criterion: BC-04 Result-First Hook (Beauty Vertical, Ceiling Layer)

Illustrative excerpt showing the depth and format of production rubric authoring. Real rubrics span 8-12 criteria per vertical-market combination, with parallel artifacts for calibration data, annotator guidelines, and eval pipeline specs. Parts 3 and 4 will apply criteria like this one to AI-generated videos.

The first 90 days

The first 90 days for an AI Behavior Designer standing up the function: audit, MVP, validation. The scope isn't the whole AIGC pipeline. It's the eval/rubric layer that turns Floor / Ceiling / Style into a working signal system, starting with one pilot vertical.

The 90-day deliverable: a replicable framework for one category, with data showing it works.

Org structure

How the AI Behavior Design role lives within an organization: who it works with, how the work gets measured, and how the function scales beyond MVP. The role isn't a solo function. It's the connective tissue between policy, engineering, data science, and operations.

Continue to Parts 3 & 4

The framework above is theory until applied. Parts 3 and 4 each apply Floor / Ceiling / Style to a different use case.

Part 3: Cross-Regional Adaptation. One product, three markets, one quality system. The framework in scaling mode.

Part 4: Authenticity Through Imperfection (coming soon!). Making AI-generated content feel human. The framework in trust mode.

Read in any order. Both stand alone.

Together they demonstrate two things at once:

  • The framework working

  • The AI fluency that defining quality systems for generative AI now requires

Appendix

Appendix · Adjacent System

How AI Behavior Design Feeds RLHF

The AI Behavior Designer defines rubrics at the evaluation layer. Those rubrics feed RLHF training via preference labels, reward signal definitions, and post-deployment observations. Here is how that handoff works across the 4-stage loop.

🎨 Design
Defines standards · Interprets results
📊 DS
Statistical validation · Correlations
⚙️ Eng
Model training · A/B infra
Scope: The AI Behavior Designer does not train models — ML researchers and Eng do. The Designer's surface is rubric definition (what training optimizes toward) and rubric interpretation (why a model update worked or didn't). DS handles validation, Eng handles infrastructure.
1
🎬 Generate Variants
Design defines · What to generate
  • Floor: product visible in 2s
  • Ceiling: hook must attract
  • Style: beauty uses aspirational tone
📦 Signal Framework
3 layers × 8 verticals
📦 Style Guide
Best-practice examples
2
📊 Preference Signal
Design defines · How to judge
  • Good/bad criteria
  • Rating rubric (1-5 scale)
  • Edge case handling
📦 Annotation Guide
Labeler instructions
📦 Eval Rubric
Multi-dimensional scoring
3
⚖️ Update Reward Model
Design interprets · Why it works
  • Reusable patterns across verticals
  • Recommend rubric reweighting
  • Flag preference data gaps
📦 Weight Updates
Signal adjustment log
📦 Insight Report
Why it works
4
🚀 Model Iteration
Design observes · Next iteration
  • Output distribution shifts
  • Behavioral regressions to investigate
  • Next-iteration hypotheses
📦 Drift Report
Where outputs are shifting
📦 Hypothesis Spec
Next experiment plan
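The handoff from stage 2 to stage 3 can be pictured as turning rubric scores into preference labels. The source does not specify how that conversion is done; the sketch below assumes one common approach, where two variants form a (chosen, rejected) pair only if their scores on the 1-5 scale differ by at least a margin. `preference_pairs`, `min_margin`, and the variant scores are all hypothetical.

```python
from itertools import combinations


def preference_pairs(scores: dict[str, float], min_margin: float = 1.0):
    """Build (chosen, rejected) pairs from rubric scores on a 1-5 scale.

    Pairs whose score gap is below min_margin are skipped, since a
    near-tie carries little preference signal.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        diff = scores[a] - scores[b]
        if abs(diff) >= min_margin:
            pairs.append((a, b) if diff > 0 else (b, a))
    return pairs


# Hypothetical rubric scores for three generated variants.
scores = {"variant_a": 4.5, "variant_b": 2.0, "variant_c": 4.0}

pairs = preference_pairs(scores)
print(pairs)
```

With these numbers, variant_a and variant_c each beat variant_b, while the a-versus-c comparison is dropped as too close to call. Where to set the margin is the kind of edge-case decision the annotation guide would need to document.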
🔬 Real Example: Beauty Hook Format Validation
The AI Behavior Designer owns observation, hypothesis design, insight, and rubric recommendation (steps 1, 2, 4, 5). DS owns the A/B data (step 3). ML researchers operationalize the rubric update.
① Observe
Analyzed top performers
Top 10% of beauty videos used "result-first" hooks (show outcome before problem).
② Hypothesis
Two hook formats
A: "One swipe and it's gone" (result-first)
B: "Can't cover those dark circles?" (pain-point)
③ Data
A/B test results
3s retention: A 67% vs B 42%
CVR: A 3.2% vs B 1.8%
④ Insight
User mental model
Users already know their pain. They scroll looking for solutions, not validation of their problem.
⑤ Update
Recommend signal weight
The AI Behavior Designer proposes a +25% weight adjustment for "result-first" hooks in the next Beauty Ceiling rubric. ML researchers integrate the updated rubric into reward model training.
Core value: AI Behavior Design writes the reward function in human language. ML training learns to operationalize it. The rubric layer and the training layer are different surfaces, with the same goal: machine-learnable creative judgment.
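To make the step ③ numbers easier to compare, the short sketch below computes the absolute and relative lift of format A over format B from the reported rates. The helper name `lift` is an assumption. Sample sizes are not reported in the example, so statistical significance cannot be assessed from these figures alone.

```python
def lift(a: float, b: float) -> tuple[float, float]:
    """Return (absolute difference, relative lift of a over b)."""
    return a - b, (a - b) / b


# Rates reported in step 3 of the example above.
ret_abs, ret_rel = lift(0.67, 0.42)     # 3s retention, A vs B
cvr_abs, cvr_rel = lift(0.032, 0.018)   # conversion rate, A vs B

print(f"3s retention: +{ret_abs * 100:.1f} pts, {ret_rel:.1%} relative")
print(f"CVR: +{cvr_abs * 100:.1f} pts, {cvr_rel:.1%} relative")
# 3s retention: +25.0 pts, 59.5% relative
# CVR: +1.4 pts, 77.8% relative
```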
Appendix · Sample Artifact

Sample Eval Dashboard

What an eval run output looks like at the operations level. Illustrative mockup showing a single run against the Beauty US-EU rubric (including BC-04 from the linked criterion document). Real dashboards are interactive, with filter controls, drill-downs, and time-range selectors; this is a static representative view.

Eval Dashboard — Beauty / US-EU v3.2 rubric × generation-model 4.1
Run eval-2026-05-08-001 · n 487 · 2026-05-08 14:23
Floor pass rate
87.3%
▼ 2.1pts vs prior run
Mean Ceiling score
3.41
▲ 0.18 vs prior run
Style match rate
79.5%
— flat
Inter-rater κ
0.78
▲ 0.03 vs prior cal.
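For readers unfamiliar with the inter-rater κ tile: the dashboard does not say which agreement statistic it uses, so the sketch below assumes Cohen's kappa for two raters, which corrects raw agreement for the agreement expected by chance. The ratings are made-up pass/fail labels, not data from this run.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 1 = pass, 0 = fail, ten items, two disagreements.
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this toy case the raters agree on 80% of items, but half of that agreement is expected by chance, which is why kappa comes out well below the raw agreement rate.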
Floor pass rate by vertical
target ≥ 90%
Beauty: 91.2%
Fashion: 89.4%
Food: 93.1%
Health: 72.4%
Home: 90.2%
Tech: 83.5%
Pet: 94.3%
Toys: 91.0%
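A small sketch of the check behind this panel: compare each vertical's Floor pass rate with the 90% target and list the ones that fall short, worst first. The values are the ones shown in the mockup; the variable names are assumptions.

```python
# Floor pass rates (%) by vertical, as shown in the mockup.
floor_pass = {
    "Beauty": 91.2, "Fashion": 89.4, "Food": 93.1, "Health": 72.4,
    "Home": 90.2, "Tech": 83.5, "Pet": 94.3, "Toys": 91.0,
}
TARGET = 90.0

# Verticals below target, sorted from largest shortfall to smallest.
below = sorted(
    ((vertical, rate) for vertical, rate in floor_pass.items() if rate < TARGET),
    key=lambda item: item[1],
)

for vertical, rate in below:
    print(f"{vertical}: {rate:.1f}% ({TARGET - rate:.1f} pts below target)")
# Health: 72.4% (17.6 pts below target)
# Tech: 83.5% (6.5 pts below target)
# Fashion: 89.4% (0.6 pts below target)
```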
Ceiling score distribution
all verticals, n=487
Score 0: 12
Score 1: 42
Score 2: 87
Score 3: 154
Score 4: 132
Score 5: 60
Distribution skews toward 3-4 (mid-range). Mode at score 3.
Low tail (0-1) = 11.1% of outputs. Right shift vs. prior run.
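The summary lines above can be reproduced from the histogram counts. A minimal sketch, using the counts shown in the mockup; the variable names are assumptions.

```python
# Number of outputs at each Ceiling score (0-5), as shown in the mockup.
counts = {0: 12, 1: 42, 2: 87, 3: 154, 4: 132, 5: 60}

total = sum(counts.values())                  # 487, matching n
mode = max(counts, key=counts.get)            # most frequent score
low_tail = (counts[0] + counts[1]) / total    # share of outputs scoring 0-1
mid_range = (counts[3] + counts[4]) / total   # share of outputs scoring 3-4

print(f"n = {total}, mode = {mode}")
print(f"low tail (0-1) = {low_tail:.1%}")
print(f"mid-range (3-4) = {mid_range:.1%}")
# n = 487, mode = 3
# low tail (0-1) = 11.1%
# mid-range (3-4) = 58.7%
```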
Floor pass rate · last 8 runs
2026-03 → 2026-05
(Line chart of the last 8 runs; y-axis from 75% to 95%.)
Latest run (2026-05-08): 87.3% — regression of 2.1pts. Investigate.
Top Ceiling criteria by score drag
Beauty vertical
ID     Criterion            Mean   Δ prior
BC-09  Claim verifiability  2.41   −0.34
BC-07  Face visibility      3.12   −0.18
BC-11  Pattern saturation   3.28   ±0.02
BC-04  Result-first hook    3.89   +0.22
BC-02  Brand aesthetic      4.14   +0.07
Action items from this run
3 items · review 2026-05-15
P1
Health vertical Floor failure spike — 72% pass (target 90%). Root cause likely new acne-treatment claim policy update not propagated to Floor rubric. Cross-validate FC-03 against current FDA guidance and reissue Floor v2.4.
@m-okonkwo · 09-19
P2
BC-09 (Claim verifiability) regression — −0.34 vs prior, dragging aggregate Ceiling. Suspect generation-model 4.1 over-produces efficacy claims without disclaimers. Recalibrate BC-09 with stricter disclaimer-presence anchor; coordinate with ML team on next fine-tune.
@r-patel · 09-25
P3
BC-04 (Result-first hook) improvement — +0.22 confirms result-first weighting from prior iteration is working. Hold weight at +25%; schedule next saturation check in Q4 to prevent BC-11 (pattern saturation) drift.
@e-tanaka · 10-15