Amor Fati

Stress-Testing Google Flow by Breaking It on Purpose

Jan 10, 2026 · Kenneth Hung · 15 min read

Over the holidays, I did what I always do with new creative tools. Pushed them until they broke.

I've spent years as a Product Creative Director shipping at scale (consumer effects, creator tools, APIs, templates), leading UX teams that built AR/AI experiences for billions of users at Meta. That work taught me something: you don't understand a system by following the happy path. You understand it by finding the edges.

So I took Google Flow and gave myself one constraint: a single image, my LinkedIn profile photo, as the only identity reference. From that, I built a surreal short called Amor Fati, inspired by my turbulent childhood.

I'm the character. That choice wasn't sentimental. Identity persistence is the hardest unsolved problem in generative video, and the only honest way to test it is to use a face you know down to the millimeter and notice every place the model gets it wrong.

All images and clips were generated in Google Flow using Veo 3.1 and Nano Banana Pro.
Exported at native resolution (1280×720).

High-res upscaling was tested but introduced visible artifacts (see notes below).
Final assembly and timing edits were completed in iMovie due to limited fine-grained editing in Scene Builder.

TL;DR

This isn't a tool review. It's an analysis of a generative AI video system, organized around this thesis:

A generative video tool is two products at once: a model surface and a product surface. They require different UX vocabularies, different evaluation methods, and different design judgment. The work of an AI Behavior designer in this space is keeping them separate long enough to think clearly, then bringing them back together at the layer where they collide and where it matters most: trust.

Table of Contents:

  1. Model-layer observations. Identity drift and anatomical inconsistency are architectural problems, not prompting problems. A small eval across hundreds of generations changed what I noticed: drift wasn't a prompt problem, it was a temporal-anchoring problem. The design move is knowing which problems to push onto research roadmaps and which to design around.

  2. Product-layer observations. Scene Builder solves continuity within scenes. The unsolved abstraction is transitions between scenes. I designed and prototyped one feature end-to-end, based on Google Flow's existing Scene Builder UI. The interactive demo is below.

  3. Trust-layer argument. Provenance drift is the most important unsolved problem in this category, and it's a UX problem before it's a policy problem. Ethics isn't separate from interface. It's a question of observability.

Part 1.
Model-layer observations

There's a category of problems where the interface can't save you. The model's behavior is the user experience, and the only honest design response is to understand the architecture well enough to know what kind of problem you're looking at.

Identity Persistence

Using my own face was intentional because I knew every millimeter of it. Across multiple scenes with heavy morphing and frame-to-frame generation, identity drift appeared frequently. Even with restructured prompts, explicit negative constraints, and reinforcement, the system would introduce a different Asian male face.

This isn't a bug. It's a structural challenge. Existing approaches like fine-tuning (LoRA, DreamBooth) and embedding-based methods (IP-Adapter, InstantID) each trade off consistency against flexibility.

The Flow team is also navigating policy surfaces (likeness misuse, deepfake exposure, training-data consent) that constrain how aggressively any of these can be deployed.

For narrative, advertising, or branded content, solving this is non-negotiable.

What's the right abstraction for identity in a creative tool, given the policy surface and the architectural tradeoffs?


  • Project-scoped identity reference, not session-scoped. Commit once, with explicit consent, and the system carries it through every generation in that project until released.

  • Reference stacking at the scene level, so creators don't re-upload the same anchor for every clip.

  • Explicit reference carryover for frame-to-frame generation, with visible state showing when the anchor is active vs. drifted.

  • Identity strength controls, letting creators dial fidelity vs. flexibility per scene rather than fighting one global tradeoff.

Solving identity persistence isn't just a quality fix. It's the move that shifts AI video from demo to production tool.
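To make the project-scoped abstraction concrete, here is a minimal Python sketch. Every name in it (`IdentityReference`, `Project`, `strength_for`, the 0-to-1 strength scale) is hypothetical and only illustrates the proposal. None of it reflects Flow's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IdentityReference:
    """Hypothetical identity anchor committed once per project."""
    image_path: str
    consent_given: bool
    default_strength: float = 0.8  # assumed scale: 0.0 = loose likeness, 1.0 = strict fidelity


@dataclass
class Project:
    """Hypothetical project that carries one identity anchor through every scene."""
    name: str
    identity: Optional[IdentityReference] = None
    scene_strength: Dict[str, float] = field(default_factory=dict)

    def commit_identity(self, ref: IdentityReference) -> None:
        # Consent is checked once, at commit time, rather than per generation.
        if not ref.consent_given:
            raise PermissionError("identity reference requires explicit consent")
        self.identity = ref

    def release_identity(self) -> None:
        # Releasing the anchor also drops every per-scene override.
        self.identity = None
        self.scene_strength.clear()

    def set_scene_strength(self, scene_id: str, strength: float) -> None:
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")
        self.scene_strength[scene_id] = strength

    def strength_for(self, scene_id: str) -> Optional[float]:
        # A per-scene override wins; otherwise fall back to the project default.
        if self.identity is None:
            return None
        return self.scene_strength.get(scene_id, self.identity.default_strength)


project = Project(name="Amor Fati")
project.commit_identity(IdentityReference("profile_photo.jpg", consent_given=True))
project.set_scene_strength("scene_02", 0.5)  # looser likeness for a heavy-morph scene
```

The design choice worth noticing is that consent and the anchor live on the project, while strength is the only thing that varies per scene. That mirrors the split between "commit once" and "dial per scene" in the list above.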

Anatomical Consistency

In one scene featuring Guanyin, the intended motion was simple: water pours from a vase held in her left hand. Despite explicit prompts, masking, and negative constraints, the model repeatedly switched which hand performed the action.

This isn't a prompting failure. It's architectural. Current models don't maintain persistent skeletal tracking across frames. "Left hand" and "right hand" aren't stable internal concepts. The system optimizes for gesture realism frame-by-frame, not anatomical continuity over time.

I tried multiple approaches: anchoring the water origin spatially, keeping hands static, removing the pouring gesture initiation entirely. None reliably solved it.

Until models maintain object-anchored reasoning across time, this remains a design-around constraint, not a prompting problem with a prompting solution.


  • Legible failure messaging, so creators know when they're hitting an architectural limit rather than a prompt limit.

  • Anchor-then-animate workflow, where spatial anchoring of objects is a primitive separate from motion generation.

  • Scene-level constraint hints ("avoid generating both hands in motion") surfaced as design suggestions, not buried in documentation.

The current Flow experience is largely silent about why generations fail. Silent failures train creators to blame themselves and abandon the tool. Legible failures train creators to understand the medium.

Figure 01 · Behavioral Eval · Amor Fati Project

A small eval

Single-operator behavioral measurements across the Amor Fati project.

Image generations: 847 · Video clips: 312 · Project days: 11
FINDING 01

Identity drift rate

Across 312 video generations, the output produced a recognizably different face in 54% of cases. Drift concentrated in scenes with high morph intensity and in chains conditioned on earlier AI output.

  • High morph intensity (scenes with > 50% style change): 78% drift · n=84
  • AI output as conditioning (previous generation used as frame anchor): 64% drift · n=141
  • Original reference anchor (original photo as anchor every time): 19% drift · n=87

What this changed. Drift correlates with how recently the system saw the original reference, not with how the prompt was written. This is a temporal-anchoring problem, not a prompt problem.

  • Sample: 312 video clip generations across 11 project days, single reference image (LinkedIn profile photo).
  • Method: Manual coding of each output as match or recognizably different. Single rater.
  • Buckets: Morph intensity (manual estimate of style change %), conditioning source (original ref vs. previous AI output).
  • Limits: Subjective classification, no controlled prompt variants, single operator. Treat as directional.
FINDING 02

Hand-swap frequency

Across 47 attempts at the Guanyin pour scene, the wrong hand performed the action in 66% of generations. Spatial anchoring helped. Negative prompts didn't.

  • Baseline prompt ("holding vase in left hand, pouring"): 72% wrong-hand · n=18
  • + Spatial anchor first (vase placed before pour motion): 44% wrong-hand · n=18
  • + Negative prompt ("not the right hand" appended): 69% wrong-hand · n=11

What this changed. "Left" and "right" aren't stable internal concepts persisted across frames. Anchoring spatially before motion gives the model an inference scaffold the prompt alone can't.

  • Sample: 47 generations of the Guanyin pour scene across 3 prompt structures.
  • Coding: Output classified as correct hand or wrong hand based on which hand held the vase and initiated the pour.
  • Conditions: Baseline; baseline + spatial anchor (vase placed first); baseline + negative prompt.
  • Limits: Small N per condition. No controlled order. Single scene only.
FINDING 03

Upscaler hallucination on faces

Across 113 close-up shots upscaled to 1080p, 58% gained invented texture that was not in the 720p source. The 720p felt softer but more coherent.

  • Invented skin texture (pores, lines, blemishes not in source): 58% · 66 of 113
  • Apparent age shift (subject reads several years older): 37% · 42 of 113
  • Coherent enhancement (sharper without invention): 29% · 33 of 113

What this changed. Creators shouldn't have to choose between "soft but coherent" and "sharp but hallucinated." Multiple upscaling profiles, plus face-aware upscaling, would close this.

  • Sample: 113 close-up shots upscaled from 720p to 1080p using Flow's default neural upscaler.
  • Coding: Side-by-side comparison of source and upscaled output. Three non-exclusive categories.
  • Limits: Single rater. "Invented texture" is judgment-based. Categories are non-exclusive.
FINDING 04

Provenance signature drift

Across 196 generations referencing named artists in early prompts, 7 downstream outputs contained signature-like marks I had not prompted for.

  • Iteration 1-3 (early; named artist still prompted): 1 of 84 outputs · 1.2%
  • Iteration 4-7 (mid; references chained from prior outputs): 4 of 71 outputs · 5.6%
  • Iteration 8+ (late chain; named artists no longer in prompt): 2 of 41 outputs · 4.9%

What this changed. Signatures appeared even when no artist was named in the immediate prompt. The conditioning chain absorbs style from earlier outputs. The lineage is not just hidden, it is computationally erased.

  • Sample: 196 image generations whose prompt chain referenced named artists at some point (Dalí, Escher, Ocampo, Arcimboldo).
  • Coding: Visual inspection for signature-like marks. Bucketed by iteration depth from original artist reference.
  • Cross-check: Each found signature compared against actual signatures of named artists. Zero matches.
  • Limits: Small absolute count (n=7). Iteration depth is a manual estimate. Rare in absolute terms.

This is a designer's eval: informal, single-operator, no controlled holdouts. It's not a research artifact. But the discipline of counting changed what I noticed, and the numbers are sharper than prose.

The methodology has limits, but the act of measuring shifted my framing. I went in expecting identity drift to be a prompt problem. The numbers told me it was a temporal-anchoring problem. Different design conversation.
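One way to put a number on "directional" is a confidence interval around each reported rate. The sketch below uses the standard Wilson score interval. It was not part of the original eval, and it takes the reported percentages and sample sizes from Findings 01 and 02 at face value.

```python
import math
from typing import Tuple


def wilson_interval(rate: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for an observed proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z2 = z * z
    denom = 1 + z2 / n
    center = (rate + z2 / (2 * n)) / denom
    half = z * math.sqrt(rate * (1 - rate) / n + z2 / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# (label, reported rate, n) copied from Findings 01 and 02.
buckets = [
    ("drift, original reference anchor", 0.19, 87),
    ("drift, AI output as conditioning", 0.64, 141),
    ("wrong hand, baseline prompt", 0.72, 18),
    ("wrong hand, spatial anchor first", 0.44, 18),
]

intervals = {label: wilson_interval(rate, n) for label, rate, n in buckets}
for label, (low, high) in intervals.items():
    print(f"{label}: {low:.0%} to {high:.0%}")
```

Under these assumptions the two identity-drift intervals do not overlap, while the two hand-swap intervals do, which fits the "small N per condition" caveat. The intervals describe sampling noise only. They say nothing about single-rater subjectivity or uncontrolled prompt order.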

This is what AI fluency looks like in design practice: not perfect rigor, but the willingness to count.

Part 2.
Product-layer observations

Conventional product design territory. Flows, states, abstractions, metrics. The model layer is exotic. The product layer is craft.

Scene Composition

Flow's Scene Builder is intuitive and fast. Extend seamlessly continues from the last frame; Jump maintains identity across cuts. Both are smart solutions to the 8-second generation limit.

But these solve continuity within a scene. The real unlock is transitions between scenes.

Most AI videos rely on jump cuts because that's what the tools make easy. Cinematic storytelling lives in the in-between moments: the match cut, the morph, the breath between scenes. Right now, those require manual work outside the tool.

*Scroll down to test a prototype of transitions between scenes.


  • A first-class transition layer, with the seam between two clips becoming a designed affordance. Cut, match, morph, and dissolve as primitives rather than post-production work. Designed end-to-end in "One feature, designed" below.

  • Multi-clip selection on the timeline, so creators can apply transitions, identity references, or audio cues across ranges rather than one clip at a time.

  • Beat-anchored cut points, where transitions snap to musical or rhythmic markers in the project audio rather than to arbitrary clip boundaries.

  • Cinematic camera moves between clips (push-in, dolly, parallax) as composable moves the model can interpret, not just visual effects layered on top.
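Beat-anchored cut points reduce to a small snapping rule. The sketch below is one possible version. The function name, the 0.25-second tolerance, and the 120 BPM grid are all assumptions, and it presumes beat markers have already been extracted from the project audio.

```python
from bisect import bisect_left
from typing import Sequence


def snap_to_beat(cut_time: float, beats: Sequence[float], tolerance: float = 0.25) -> float:
    """Return the nearest beat if it lies within `tolerance` seconds, else the original time.

    `beats` must be sorted in ascending order.
    """
    if not beats:
        return cut_time
    i = bisect_left(beats, cut_time)
    candidates = beats[max(0, i - 1): i + 1]  # the beats on either side of the cut
    nearest = min(candidates, key=lambda b: abs(b - cut_time))
    return nearest if abs(nearest - cut_time) <= tolerance else cut_time


# Hypothetical beat grid at 120 BPM: one marker every 0.5 seconds.
beat_markers = [i * 0.5 for i in range(65)]

print(snap_to_beat(8.1, beat_markers))                 # close to a beat, so it snaps
print(snap_to_beat(8.2, beat_markers, tolerance=0.1))  # outside tolerance, left alone
```

The tolerance is the design lever: a wide value makes every cut land on the music, a narrow one only corrects near-misses and leaves deliberate off-beat cuts untouched.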

Observability & cost

Building Amor Fati required more than a thousand generations (847 images and 312 video clips, per the eval above).

At that volume, two product gaps compound:

  1. Asset state legibility (active vs. historical assets blur across views)

  2. Cost visibility (no project-level view of credit consumption or cost-per-scene).

For a solo creator, this is friction.

For a small agency running ten client projects, it's a blocker.


  • Total assets generated per project, including failed and rejected outputs

  • Credit usage by model and quality tier, so creators learn which generations earn their cost

  • Cost per scene and per finished minute of video, so creators can plan and forecast

  • Aggregate project cost with drill-down, so agencies can answer the question every client eventually asks

These aren't operational metrics; they're learning tools. Knowing one prompt structure produced six usable outputs at 60 credits while another produced one at 240 is the feedback creators need to develop judgment about the tool.

Platform outcomes this enables:

  • Identity-stable projects → higher completion rates, more credits consumed per project, lower churn

  • Reduced tool-switching → deeper engagement, stronger retention, higher LTV

  • End-to-end workflows → professional-tier adoption, higher ARPU, team and enterprise expansion

Observability is a prerequisite for the team and enterprise tiers, not a polish item. At the agency tier, someone other than the creator is paying. That person needs answers the current product can't give.
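The "six usable outputs at 60 credits vs. one at 240" comparison above is the kind of figure a project-level cost view could compute automatically. A minimal sketch, assuming both credit figures are totals per prompt structure and using a hypothetical `PromptRun` record:

```python
from dataclasses import dataclass


@dataclass
class PromptRun:
    """Hypothetical record of one prompt structure's results within a project."""
    label: str
    credits_spent: int
    usable_outputs: int

    @property
    def credits_per_usable(self) -> float:
        # Infinite cost when nothing usable came out, so such runs sort last.
        if self.usable_outputs == 0:
            return float("inf")
        return self.credits_spent / self.usable_outputs


# The two cases from the example above, read as totals per prompt structure.
runs = [
    PromptRun("structure A", credits_spent=60, usable_outputs=6),
    PromptRun("structure B", credits_spent=240, usable_outputs=1),
]

for run in sorted(runs, key=lambda r: r.credits_per_usable):
    print(f"{run.label}: {run.credits_per_usable:.0f} credits per usable output")
```

Read this way, the first structure costs 10 credits per usable output and the second costs 240. Raw credit totals hide that gap, because rejected generations are invisible in them.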

Narrative assembly

Sound design and music work well at the 8-second scene level. In another Flow project, I explored Latin-inspired scoring, and the tonal quality held up.

The challenge is continuity. Each scene behaves as an isolated fragment, with no throughline, no arc. And audio has harder unsolved problems: dialogue and lip-sync, voice consistency across scenes, music beats and rhythms that align across cuts, sound effects that match generated environments.


  • Global music tracks that persist across the project timeline, with auto-aligned beat markers

  • Voiceover layers with character voice references that maintain identity across scenes

  • Cross-scene sound design with ambient continuity and audio match cuts as first-class moves

This is the gap between generating clips and making a film. Assembly is authorship. The edit, with its pacing, juxtaposition, and sound, is where the film actually gets made.

Export quality

Flow offers three export options: 270p animated GIF (not practical for production), 720p original (soft but coherent), and 1080p upscaled (sharper but artifact-prone).

The 1080p neural upscaling often over-synthesizes, hallucinating texture detail across the entire image that wasn't in the original. On faces, the effect is especially damaging: the model adds what looks like wrinkles and skin imperfections, making faces look unnaturally aged or degraded.

The 720p original feels more visually coherent, just too soft for final delivery. Creators shouldn't have to choose between soft but natural and sharp but hallucinated.

This forces extra post-processing steps and inconsistent workflows, exactly the kind of pipeline fragmentation that pulls creators out of the tool.


  • Multiple upscaling profiles (neutral, filmic, sharp) with different synthesis aggressiveness

  • Face-aware upscaling that preserves rather than invents texture

  • 1080p+ exports directly from Scene Builder, eliminating the post-processing detour

Worth fixing. Not the most interesting problem in the system.

One feature, designed

The transition layer. A first-class abstraction for transitions between scenes. Click the seam between two clips on the timeline. Choose transition type (cut, match, morph, dissolve). The model generates the bridge.

[Interactive prototype: Transition Layer · Flow Concept Prototype. Click a seam between clips to invoke a transition.]
  1. A creator working in Scene Builder notices a small "+" pulse appear on the seam between two clips.

  2. They hover; the seam widens and the "+" brightens. They click. The seam expands inline into a picker of four transitions, each with a preview thumbnail and credit cost.

  3. They pick Morph. The bridge generates between their clips, and the new frames slot into the timeline.

  4. They accept, regenerate, or reject. If the two clips contain different real identities, the system pauses them at the picker and asks for explicit consent before proceeding.

  • Empty: only one clip selected, picker disabled with explanatory hint

  • Generating: parallel previews loading, with credit-cost preview shown before commitment

  • Generated: bridge frames inline, with accept / regenerate / reject controls

  • Failed: the system surfaces why it failed, not just that it failed

  • Identity-conflict: when clips contain different identity references, the system requires explicit creator confirmation before generating, surfacing the policy surface rather than silently proceeding
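The states above can be written as a small state machine, which makes the consent gate checkable instead of implied. This is a sketch of the concept prototype's logic, not Flow code. The `PICKING` state and every event name are assumptions added for illustration.

```python
from enum import Enum, auto


class SeamState(Enum):
    EMPTY = auto()              # only one clip selected; picker disabled
    PICKING = auto()            # picker open on a seam (assumed intermediate state)
    IDENTITY_CONFLICT = auto()  # clips carry different identity references
    GENERATING = auto()
    GENERATED = auto()
    FAILED = auto()


# Allowed (state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (SeamState.EMPTY, "select_seam"): SeamState.PICKING,
    (SeamState.PICKING, "choose_transition"): SeamState.GENERATING,
    (SeamState.PICKING, "identity_mismatch"): SeamState.IDENTITY_CONFLICT,
    (SeamState.IDENTITY_CONFLICT, "confirm_consent"): SeamState.GENERATING,
    (SeamState.IDENTITY_CONFLICT, "pick_other_transition"): SeamState.PICKING,
    (SeamState.GENERATING, "bridge_ready"): SeamState.GENERATED,
    (SeamState.GENERATING, "generation_error"): SeamState.FAILED,
    (SeamState.GENERATED, "regenerate"): SeamState.GENERATING,
    (SeamState.GENERATED, "accept"): SeamState.EMPTY,
    (SeamState.GENERATED, "reject"): SeamState.EMPTY,
    (SeamState.FAILED, "retry"): SeamState.PICKING,
}


def step(state: SeamState, event: str) -> SeamState:
    """Advance the seam; any transition not listed is rejected rather than ignored."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed from {state.name}") from None


state = step(SeamState.EMPTY, "select_seam")
state = step(state, "identity_mismatch")
state = step(state, "confirm_consent")  # the only path from a conflict to generation
```

Because the table has no entry from `IDENTITY_CONFLICT` to `GENERATING` other than `confirm_consent`, clicking through without acknowledgment is structurally impossible in this model, not just discouraged.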

  • At rest: between every clip, a subtle dot signals the seam is interactive. Nothing demands attention; the affordance is there when they're ready.

  • Choosing a transition: the seam expands. They see four options laid out side by side with preview thumbnails and credit cost on each, so they're comparing visual outcomes and unit economics in the same glance.

  • Waiting: the picker dims, a status surface tells them what's generating and how long it should take. They know the cost is committed only after they accept.

  • Reviewing: the bridge frames appear inline, in the timeline, where the rest of their work lives. Accept, regenerate, or reject sit one click away.

  • Hitting a wall: when a generation fails, they get the reason (not just "failed") and an alternative path forward.

  • Crossing an identity boundary: when their two clips contain different real people, the system stops, names what's happening, and asks for explicit acknowledgment before continuing. They can't click through it without seeing it.

  • Two different real people in adjacent clips. The creator selects a morph between a clip featuring themselves and a clip featuring a family member. The picker surfaces the conflict before generation. They acknowledge the consent surface and proceed, or pick a non-identity transition (cut, dissolve) and skip the gate. They're never silently morphed between two real identities.

  • Mismatched aspect ratios. The creator pulls a vertical clip into a 16:9 project. The picker surfaces the choice (upscale vs. letterbox) rather than defaulting silently and surprising them at export.

  • A morph the model can't deliver coherently. The creator picks a morph between visually incompatible clips. Instead of generating a degraded bridge, the system tells them the coherence threshold was exceeded and offers a path forward (extended morph, dissolve fallback).

  • Adoption rate among multi-clip projects (target population)

  • Time spent in Scene Builder vs. external editors (workflow closure)

  • Generation cost per accepted transition (unit economics)

  • Identity-mismatch confirmation rate (the trust surface; thoughtful use vs. clicking through)

The transition layer addresses workflow fragmentation, deepens Scene Builder's existing abstractions, and is bounded enough to ship in a quarter.

Identity persistence is more important but lives at the model layer.

Transitions are where Flow could move from creative toy to creative infrastructure with one well-designed primitive.

Part 3.
The trust layer

Provenance drift is the most important unsolved problem in generative video, and it's neither purely a model problem nor a product problem. It's a UX problem about observability.

  • Early prompts often reference named artists like Dalí, Escher, Ocampo, and Arcimboldo to steer visual language. But once images are generated, those AI outputs become references for subsequent iterations. The lineage collapses into synthetic intermediates.

  • Downstream, I found three scene images, each with what looked like a signature, similar in style but not identical. I hadn't prompted for authorship. Signatures across multiple generations raise questions about how stylistic influence propagates in ways neither creator nor platform can explain.

This is provenance drift: as creators iterate through AI-generated references, visibility into influence origins degrades. For personal work, acceptable. For commercial contexts, ambiguity around attribution becomes harder to ignore.

The industry is splitting. AI-native agencies are emerging fast, while traditional players (illustrators, VFX houses, unionized talent) remain skeptical. The criticism is loud: AI "steals" artists' work.

Whether you agree or not, this perception blocks adoption.

The tools that build trust infrastructure, not just capability, will bridge the divide.

The temptation is to treat provenance as something to handle in terms of service and back-end compliance. That's necessary but not sufficient.

Creators make decisions in real time, mid-generation, and those decisions shape what gets shipped commercially.

A creator who can see, at the moment of generation, that an output is heavily influenced by a specific named artist will make different choices than a creator who can't.

Trust infrastructure has to live where the decisions live, and the decisions live in the interface.

This is what I mean when I say ethics isn't separate from UX. The architecture of the interface determines what creators are able to know about their own work. And what they can know shapes what they're willing to ship, what they're willing to claim authorship over, and whether the broader creative community considers their practice legitimate.

The version of generative video that wins in commercial contexts is the version that makes provenance legible at the point of decision. Not after the fact. Not in a policy document. In the interface.

Provenance drift is not one problem

It's at least three, and they require different design responses.

Lineage tracking

Lineage tracking is the question of where this output came from in the chain of generations the creator made. It is the most tractable of the three, and it is mostly craft work: information architecture (IA), data modeling, and UI surface.


  • Provenance panel per asset, showing prompts, references, and intermediate outputs that fed it

  • Visual generation graph at the project level, so creators can trace any output back to its anchors
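Underneath a provenance panel or generation graph, lineage is a directed graph from each asset to the references that fed it. The sketch below shows the trace-back under that assumption. The asset IDs and the `LINEAGE` mapping are invented for illustration and are not taken from the actual project.

```python
from typing import Dict, List, Set

# Hypothetical lineage: each asset ID maps to the asset IDs used as its references.
# Assets with no parents are the original anchors.
LINEAGE: Dict[str, List[str]] = {
    "profile_photo": [],
    "logo": [],
    "img_001": ["profile_photo"],
    "img_002": ["img_001", "logo"],
    "clip_007": ["img_002"],
}


def trace_anchors(asset_id: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Walk parent links iteratively and return the root anchors behind `asset_id`."""
    anchors: Set[str] = set()
    seen: Set[str] = set()
    stack = [asset_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared ancestors are visited once
        seen.add(current)
        parents = lineage.get(current, [])  # unknown IDs are treated as anchors
        if parents:
            stack.extend(parents)
        else:
            anchors.add(current)
    return anchors


def depth_from_anchor(asset_id: str, lineage: Dict[str, List[str]]) -> int:
    """Longest chain of generations between `asset_id` and a root anchor (assumes no cycles)."""
    parents = lineage.get(asset_id, [])
    if not parents:
        return 0
    return 1 + max(depth_from_anchor(p, lineage) for p in parents)


print(sorted(trace_anchors("clip_007", LINEAGE)))
print(depth_from_anchor("clip_007", LINEAGE))
```

With parent links recorded at generation time, iteration depth becomes a computed property rather than the manual estimate used in Finding 04.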

Influence weighting

Influence weighting is the harder question of whose style is in this generation, in what proportion, drawn from what training.

This is partly a research problem (techniques for tracing influence through diffusion models are immature) and partly a design problem: even with the data, what is the right UI surface, and what threshold triggers what affordance?


  • Confidence-weighted attribution view: "this generation shows strong stylistic similarity to [N] artists; here are the top three"

  • Influence threshold alerts, surfaced when a single named artist crosses a configurable confidence bar
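The design half of this problem can be sketched even though the research half is open. The example below assumes some upstream method already produces a per-artist similarity score between 0 and 1. No such method is specified here, and the scores and artist names are placeholders, not measurements of any real output.

```python
from typing import Dict, List, Tuple


def top_influences(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring styles, strongest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def influence_alerts(scores: Dict[str, float], threshold: float = 0.6) -> List[str]:
    """Names whose similarity score meets or exceeds the configurable bar, strongest first."""
    ranked = top_influences(scores, k=len(scores))
    return [name for name, score in ranked if score >= threshold]


# Placeholder scores on an assumed 0-to-1 scale.
example_scores = {"artist_a": 0.72, "artist_b": 0.41, "artist_c": 0.18, "artist_d": 0.05}

print(top_influences(example_scores))    # feeds the "top three" attribution view
print(influence_alerts(example_scores))  # feeds the threshold alert
```

Keeping the ranking and the alert as separate functions reflects the two affordances above: the attribution view always shows something, while the alert stays silent until the bar is crossed.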

Attribution surfacing

Attribution surfacing links generation to the artists whose work shaped it, in a way that supports consent, credit, or compensation.

The interesting design question isn't whether to build it. It's the asymmetric trust problem.

The creator wants visibility into their generation chain. The artist wants visibility out, into where their style is showing up across generations they didn't make.

Same data, two completely different products. Most platforms ship neither.


  • Creator-facing provenance: visible attribution at the moment of generation, not buried in audit logs

  • Artist-facing signal: opt-in dashboards showing where registered styles appear across the platform

  • Compensation rails: optional credit-share for opted-in artists when their influence crosses thresholds

Where this is going

AI video is at an inflection point. The shift is from generation (make me a clip) to systems (help me build a film). The tools that win will solve four interlocking problems:

  1. Identity. Persistent characters across scenes, sessions, and projects, with the consent surface designed in rather than retrofitted.

  2. Continuity. Transitions, narrative throughline, and audio coherence as first-class primitives.

  3. Control. Project-level observability, cost visibility, and exportable quality at parity with conventional post.

  4. Trust. Provenance as observability, surfaced at the point of creative decision.

Google Flow has strong foundations. The UX is thoughtful, the creative ceiling is high, and Scene Builder points toward the right abstraction. What comes next is the harder shift, from creative toy to creative infrastructure, with the trust layer designed in from the start rather than addressed in a future audit.

The unsolved problems are also the most interesting ones. That's usually how it works.

Thank you for reading!

Appendix

Visual Direction:
Prompting as Cinematography

One face. One logo. Zero environment references.

The film served two purposes:

Creative Challenge

  • Could I push one identity reference across wildly different aesthetics (cyberpunk, classical painting meets sci-fi, horror cinematics, video game environments) while maintaining emotional arc?

  • The scenes are intentionally dense, with layered environments, symbolic imagery, and deliberate pacing, because I wanted to see if my creative instincts could translate through generative tools.

Technical Stress Test

  • I pushed Veo 3.1 with complex VFX transitions, aggressive morphing sequences, multi-axis camera movements, dense scene compositions, and rapid environmental shifts.

  • Not to see what the system does well, but to find where it strains, and what that reveals about the road ahead.

These scenes were built entirely through prompting: framing, lighting, color, composition, mood. This is what creative direction looks like when your only tool is language.

(A known limitation: text generation remains unreliable. Some of the Chinese characters in these scenes are gibberish, a reminder that current models see text as texture, not meaning.)

Reference 1: Self-portrait (LinkedIn profile photo)

Reference 2: Logo (wardrobe detail)