Amor Fati

Stress-Testing an AIGC Video System

by Breaking It on Purpose

Jan 10, 2026 · Kenneth Hung · 8 min read

Over the holidays, I did what I always do with new creative tools—pushed them until they broke.

I've spent years as a Product Creative Director shipping at scale—consumer effects, creator tools, APIs, templates—leading UX teams that built AR/AI experiences for billions of users at Meta. That work taught me something: you don't understand a system by following the happy path. You understand it by finding the edges.

So I took Google Flow and gave myself one constraint: a single image—my LinkedIn profile photo—as the only identity reference. From that, I built a surreal short called Amor Fati, inspired by my turbulent childhood.

The film served two purposes:

  • 1. Creative Challenge

    Could I push one identity reference across wildly different aesthetics—cyberpunk, classical painting meets sci-fi, horror cinematics, video game environments—while maintaining an emotional arc?

    The scenes are intentionally dense—layered environments, symbolic imagery, deliberate pacing—because I wanted to see if my creative instincts could translate through generative tools.

  • 2. Technical Stress Test

    I pushed Veo 3.1 with complex VFX transitions, aggressive morphing sequences, multi-axis camera movements, dense scene compositions, and rapid environmental shifts.

    Not to see what the system does well, but to find where it strains—and what that reveals about the road ahead.

All images and clips were generated in Google Flow using Veo 3.1 and Nano Banana Pro.
Exported at native resolution (1280×720).

High-res upscaling was tested but introduced visible artifacts (see notes below).
Final assembly and timing edits were completed in iMovie due to limited fine-grained editing in Scene Builder.

What I Learned
(And What I'd Build Next)

Before assembling the final cut, I ran hundreds of prompt variations—testing structure, hierarchy, reference density, and temporal framing.

Not to find the "best" prompt, but to map the system's behavior: where identity stabilizes, where continuity degrades, how frame-to-frame interpolation compares to full scene regeneration.

That exploration shaped both the film and the observations below.
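
For readers who want to try a similar mapping, a simple prompt grid is one way to structure the exploration: cross the axes you care about and generate the full matrix. The sketch below is a minimal illustration; the axes mirror the ones named above, but the component strings are placeholders, not the actual prompts used in the film.

```python
# Minimal sketch of a prompt-variation grid for mapping system behavior.
# The axes mirror what was explored (structure, reference density, temporal framing);
# the strings themselves are illustrative placeholders, not prompts from the film.
from itertools import product

reference_density = ["single face reference", "face reference + logo reference"]
structure = ["subject-first description", "camera-first description"]
temporal_framing = ["one continuous take", "hard cut at the 4-second mark"]

variants = [
    f"{s}; {refs}; {t}; neon-lit alley, rain, 35mm, shallow depth of field"
    for refs, s, t in product(reference_density, structure, temporal_framing)
]

for i, prompt in enumerate(variants, 1):
    print(f"{i:02d}. {prompt}")
```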

  • 1. Identity Persistence Is the Core Design Tension

    Using my own face was intentional—I am the character. Across multiple scenes with heavy morphing and frame-to-frame generation, identity drift still appeared. Even with restructured prompts, explicit negative constraints, and reinforcement, the system would introduce a different Asian male face.

    This isn't a bug—it's a structural challenge. Existing approaches—fine-tuning (LoRA, DreamBooth), embedding-based methods (IP-Adapter, InstantID)—each trade off consistency against flexibility; a minimal sketch of the embedding-based route appears at the end of this section. For narrative, advertising, or branded content, solving this is non-negotiable.

    What would actually unlock this:

    — Project-level identity references that persist across scenes
    — Scene-level reference stacking (not constant re-uploads)
    — Explicit reference carryover for frame-to-frame generation

    Solving identity persistence doesn't just improve quality—it expands the category of stories creators can tell. This is where AI video shifts from demo to production tool.
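
    For context on what the embedding-based approaches mentioned above look like in practice, here is a minimal image-level sketch using IP-Adapter via Hugging Face diffusers. The model IDs, weight file, and scale value are assumptions for illustration, and this conditions single images rather than video, which is exactly the gap that project-level identity references would need to close.

```python
# Minimal sketch: embedding-based identity conditioning with IP-Adapter in diffusers.
# Model/repo IDs, the weight file name, and the 0.6 scale are illustrative assumptions.
# This works on single images; video-level identity persistence is the open problem.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # higher = stronger identity lock, lower = more prompt freedom

face = load_image("profile_photo.jpg")  # the single identity reference
frame = pipe(
    prompt="cyberpunk alley at night, neon rain, cinematic lighting, 35mm",
    ip_adapter_image=face,
    num_inference_steps=30,
).images[0]
frame.save("identity_test.png")
```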

  • 2. Anatomical Consistency Isn't Solved Yet

    In one scene featuring Guanyin, the intended motion was simple: water pours from a vase held in her left hand. Despite explicit prompts, masking, and negative constraints, the model repeatedly switched which hand performed the action.

    This isn't a prompting failure—it's architectural. Current models don't maintain persistent skeletal tracking across frames. "Left hand" and "right hand" aren't stable internal concepts—the system optimizes for gesture realism frame-by-frame, not anatomical continuity over time.

    I tried multiple approaches: anchoring the water origin spatially, keeping hands static, removing the pouring gesture initiation entirely. None reliably solved it.

    The reality:

    Until models maintain object-anchored reasoning across time, this remains a design-around constraint—not a prompting problem with a prompting solution.

  • 3. Scene Composition > Clip Generation

    Flow's Scene Builder is intuitive and fast. "Extend" seamlessly continues from the last frame; "Jump" maintains identity across cuts—both smart solutions to the 8-second generation limit.

    But these solve continuity within a scene. The real unlock is transitions between scenes.

    Most AI videos rely on jump cuts because that's what the tools make easy. Cinematic storytelling lives in the in-between moments: the match cut, the morph, the breath between scenes. Right now, those require manual work outside the tool.

    What I'd build:

    A lightweight transition layer—cut, match, morph, dissolve—where the AI generates the bridge between two different scenes. Select clip A, select clip B, choose transition type, generate. This reduces post-production fragmentation, encourages intentional pacing, and moves the tool from clip assembly toward actual scene composition.
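
    To make that interface concrete, here is a hypothetical sketch of what such a transition layer could expose. Nothing like this exists in Flow today; every name below is illustrative.

```python
# Hypothetical sketch of a scene-to-scene transition layer; all names are illustrative.
from dataclasses import dataclass
from enum import Enum

class TransitionType(Enum):
    CUT = "cut"
    MATCH = "match"
    MORPH = "morph"
    DISSOLVE = "dissolve"

@dataclass(frozen=True)
class BridgeRequest:
    clip_a_id: str                          # outgoing clip; its last frames seed the bridge
    clip_b_id: str                          # incoming clip; its first frames anchor the target
    transition: TransitionType
    duration_s: float = 1.5                 # length of the generated in-between
    identity_ref_ids: tuple[str, ...] = ()  # carry the same identity anchors across the bridge

def generate_bridge(request: BridgeRequest) -> str:
    """Submit a bridge-generation job between two existing clips; return the new clip id."""
    raise NotImplementedError("illustrative interface only")
```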

  • 4. From Fragments to Narrative

    Sound design and music work well at the 8-second scene level. In another Flow project, I explored Latin-inspired scoring, and the tonal quality held up.

    The challenge is continuity. Each scene behaves as an isolated fragment—no throughline, no arc. And audio has harder unsolved problems: dialogue and lip-sync, voice consistency across scenes, music beats and rhythms that align across cuts, sound effects that match generated environments.

    The unlock:

    A final assembly layer for global music, transitions, voiceover, and audio continuity. That's where creators shift from assembling clips to shaping narrative. That's where AI video stops feeling experimental and starts feeling intentional.

  • 5. Export Quality Has a "Soft vs. Artificial" Problem

    Flow offers three export options: 270p animated GIF (not practical for production), 720p original (soft but coherent), and 1080p upscaled (sharper but artifact-prone). The 1080p neural upscaling often over-synthesizes, hallucinating texture detail across the entire image that wasn't in the original. On faces, the effect is especially damaging: the model adds what look like wrinkles and skin imperfections, making faces appear unnaturally aged or degraded.

    The 720p original feels more visually coherent—just too soft for final delivery. Creators shouldn't have to choose between soft but natural and sharp but hallucinated.

    This compounds in Scene Builder, where exports appear limited to 720p with no option for controlled upscaling at the scene level. That forces extra post-processing steps and inconsistent workflows—exactly the kind of pipeline fragmentation that pulls creators out of the tool.

    What would help:

    — Multiple upscaling profiles (neutral / filmic / sharp) with different synthesis aggressiveness
    — 1080p+ exports directly from Scene Builder
    — Face-aware upscaling that preserves rather than invents texture
    — Quality parity with tools like Topaz
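
    Until something like that exists, one interim option is classical, non-neural upscaling of the 720p export: it invents no texture, so faces stay natural, at the cost of remaining soft. A minimal sketch wrapping ffmpeg, with illustrative file paths:

```python
# Interim workaround sketch: classical Lanczos upscaling via ffmpeg (no neural synthesis).
# File paths and the target resolution are illustrative; requires ffmpeg on the PATH.
import subprocess

def upscale_neutral(src: str, dst: str, width: int = 1920, height: int = 1080) -> None:
    """Upscale a 720p export to 1080p without synthesizing new texture detail."""
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            "-vf", f"scale={width}:{height}:flags=lanczos",
            "-c:a", "copy",  # leave the audio track untouched
            dst,
        ],
        check=True,
    )

upscale_neutral("scene_720p.mp4", "scene_1080p.mp4")
```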

  • 6. Provenance Drift Is the Unspoken Problem

    Early prompts often reference named artists—Dalí, Escher, Ocampo—to steer tone and visual language. But once initial images are generated, those AI outputs become the references for subsequent iterations. The lineage collapses into synthetic intermediates.

    Downstream, I encountered something unexpected: three different scene images, each with what looked like a signature in the lower-right corner—similar in style but not identical. I hadn't prompted for authorship or attribution.

    Generative models synthesize from learned patterns rather than retrieving stored works. But signatures appearing across multiple generations blur that distinction—raising questions about how stylistic influence propagates and re-materializes in ways neither creator nor platform can explain.

    This is provenance drift: as creators iterate through AI-generated references, visibility into where influence originated rapidly degrades. For personal work, acceptable. For commercial contexts, ambiguity around attribution and compensation becomes harder to ignore.

    This matters beyond ethics. The industry is splitting: AI-native production agencies are emerging rapidly, while traditional players—illustrators, VFX houses, unionized talent, brand legal teams—remain skeptical. On social media, the criticism is loud: AI "steals" artists' work, trained on their images without consent or compensation. Whether you agree or not, the perception is real and it's blocking adoption. Without provenance clarity, this divide deepens. The tools that build trust infrastructure—not just capability—will be the ones that bridge it.

    What the industry needs:

    — Influence tracking across generation chains
    — Opt-in artist participation framework
    — Mechanisms to acknowledge and compensate sources of inspiration

    As generative tools mature, ethics isn't separate from UX—it's a question of observability. Building trust in AI-native creative systems may depend on it.
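
    Structurally, influence tracking could be as simple as a lineage record per asset, so that artists named in early prompts stay visible even after outputs become the references. A hypothetical sketch follows; none of this exists in current tools, and all names are illustrative.

```python
# Hypothetical sketch of influence tracking across a generation chain; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class AssetLineage:
    asset_id: str
    prompt: str
    named_influences: list[str] = field(default_factory=list)   # e.g. ["Dalí", "Escher"]
    parent_asset_ids: list[str] = field(default_factory=list)   # AI outputs reused as references

def inherited_influences(asset_id: str, index: dict[str, AssetLineage]) -> set[str]:
    """Walk the parent chain so influences named upstream remain attached downstream."""
    influences: set[str] = set()
    seen: set[str] = set()
    stack = [asset_id]
    while stack:
        current = stack.pop()
        if current in seen or current not in index:
            continue
        seen.add(current)
        node = index[current]
        influences.update(node.named_influences)
        stack.extend(node.parent_asset_ids)
    return influences
```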

The Unsexy Problem:
Observability at Scale

Building Amor Fati required generating hundreds—possibly thousands—of images and video clips. At that volume, UX friction compounds.

What I noticed

  • Visual indicators for selected assets (outlines, hearts) don't always clearly reflect state across views

  • Scene-level highlights show inclusion, but the distinction between active and historical assets can blur

  • Even with filtering and favorites, forming a clear project-level view is difficult

For solo creators, this is manageable. For agencies, studios, or anyone with budget accountability, it's a blocker.

What I'd want to see

  • Total assets generated per project

  • Credit usage by model/quality tier

  • Cost per scene, cost per minute

  • Aggregate project cost with drill-down

These aren't just operational metrics—they're learning tools. They reveal which prompt structures and generation approaches produce the best results. They help creators plan, optimize, and scale.
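
As a sketch of the rollup I'd want, assuming per-generation records the platform already holds internally; the field names and structure below are illustrative, not Flow's actual data model.

```python
# Illustrative sketch of a project-level cost/usage rollup from per-generation records.
# Field names, tiers, and credit figures are assumptions, not Flow's actual data model.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class GenerationRecord:
    scene_id: str
    model: str          # e.g. "veo-3.1" or "nano-banana-pro"
    quality_tier: str   # e.g. "fast" or "quality"
    credits: float
    duration_s: float   # seconds of generated footage (0 for still images)

def project_rollup(records: list[GenerationRecord]) -> dict:
    """Aggregate credits per scene and per model/tier, plus credits per generated minute."""
    per_scene: dict[str, float] = defaultdict(float)
    per_model_tier: dict[str, float] = defaultdict(float)
    total_credits = 0.0
    total_seconds = 0.0
    for r in records:
        per_scene[r.scene_id] += r.credits
        per_model_tier[f"{r.model}/{r.quality_tier}"] += r.credits
        total_credits += r.credits
        total_seconds += r.duration_s
    return {
        "total_credits": total_credits,
        "credits_per_scene": dict(per_scene),
        "credits_by_model_tier": dict(per_model_tier),
        "credits_per_minute": total_credits / (total_seconds / 60) if total_seconds else None,
    }
```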

Platform outcomes this enables

  • Identity-stable projects → higher completion rates, more credits consumed per project, lower churn

  • Reduced tool-switching → deeper engagement, stronger retention, higher lifetime value (LTV)

  • End-to-end workflows → professional-tier adoption, higher ARPU, team/enterprise expansion

I suspect the Flow team is already thinking about this as they expand toward professional use cases. Observability, control, and cost clarity will be critical for turning a powerful tool into a platform.

Visual Direction: Prompting as Cinematography

One face. One logo. Zero environment references.

These scenes were built entirely through prompting—framing, lighting, color, composition, mood. This is what creative direction looks like when your only tool is language.

(A known limitation: text generation remains unreliable. Some of the Chinese characters in these scenes are gibberish—a reminder that current models see text as texture, not meaning.)

Reference 1: Self-portrait (LinkedIn profile photo)

Reference 2: Logo (wardrobe detail)

Outlook + Process

  • Where This Is Going

    AI video is at an inflection point. We're moving from generation (make me a clip) to systems (help me build a film). The tools that win will be the ones that solve:

    1. Identity — persistent characters across scenes and sessions

    2. Continuity — transitions, pacing, narrative throughline

    3. Control — project-level management, cost visibility, exportable quality

    4. Integration — audio, voiceover, and post as first-class citizens

    Google Flow has strong foundations. The UX is thoughtful, the creative ceiling is high, and the Scene Builder points toward the right abstraction. What comes next is the shift from creative toy to creative infrastructure.

  • The Process Behind the Film

    Building Amor Fati meant developing a working methodology for constrained AI filmmaking:

    Single identity anchor: One reference image, used consistently

    Prompt iteration as exploration: Not optimizing for output, but mapping system behavior

    Deliberate breaking: Pushing morph intensity, scene complexity, and temporal jumps to find failure modes

    Assembly as authorship: The edit—pacing, juxtaposition, sound—is where the film actually gets made

    If there's interest, I'm happy to share the full prompt structures, constraint frameworks, and assembly decisions that went into this project.

Let's Talk

I'm looking for my next leadership role in AI-generated content—building the systems that take this from experimental to essential.

If you're working on these problems, I'd like to hear what you're seeing. And if you're building a team to tackle identity, continuity, or end-to-end creative workflows, let's talk.
