I’ve been experimenting with AI video creation, and it honestly feels like magic. What used to require specialized tools, production experience, and a decent amount of coordination can now start with a prompt, a reference image, a voice recording, and a willingness to iterate. This must be the feeling of a non-coder vibe coding for the first time.

For this experiment, I wanted to see if I could create a short cinematic video where a digital version of me teaches a concept while moving through a scene. The final result came from a HeyGen avatar created from one picture and a voice recording, combined with Seedance video generation through HeyGen. I also tested the same prompt in Google’s Gemini Omni using the same picture and voice.

The interesting part is that each tool exposed a different strength and failure mode.

Video generated by HeyGen from a one-shot prompt

The prompt is the new production brief

Here’s the one-shot prompt I used:

Will (avatar) moves through a massive early industrial factory floor. Belts, gears, pulleys, brass machines, and workers moving parts from station to station. Warm sparks from machinery. Golden industrial light. Camera tracks alongside him continuously throughout — medium close-up, never static. No music. Will moves through the factory as he teaches. He places one hand briefly on a moving conveyor rail. He picks up a small metal part — examines it briefly — sets it back into the workflow and keeps walking. Natural fluid movement throughout. He looks directly at camera the entire time. Will teaches directly to camera while moving.

Exact words only:
"AI agents are not magic employees. They are workflows with a brain in the middle. If the steps are vague, the agent wanders. If the inputs, tools, checks, and handoffs are clear, suddenly it starts looking useful."

At 11 seconds — camera tracking alongside him — a weathered wooden door becomes visible ahead. It is already there. Set into a brick factory wall between two machines. It has always been there. He approaches it still looking at camera still talking. He pushes it open — warm office light already fully established beyond it. Bookshelves visible. Window with garden. Microphone. He walks through still looking at camera. Crosses to his desk. Sits naturally into exact seated position from reference image. Looks at camera. Ready. Camera tracks continuously from factory floor through door to seated position. Never cuts. Never loses his face. Clear and stable facial features. Widescreen. Shallow depth of field. Film grain. 4K HD. No blur. No ghosting. No music.

This prompt is doing the work that used to be spread across a creative brief, storyboard, shot list, script, camera direction, and editing notes. It defines the scene, movement, camera behavior, dialogue, timing, lighting, facial continuity, and constraints.
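
One way to keep a brief like this manageable across iterations is to treat its parts as structured data. Below is a minimal sketch in Python, assuming a hypothetical `ShotBrief` class (it is not part of HeyGen, Seedance, or Gemini). It only joins the sections into plain text that you would paste into the tool yourself, and the field values are shortened excerpts of the prompt above.

```python
from dataclasses import dataclass, field


@dataclass
class ShotBrief:
    """Hypothetical container for the sections of a one-shot video prompt."""
    scene: str
    camera: str
    actions: list[str]
    script: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the sections into one plain-text prompt."""
        parts = [
            self.scene,
            self.camera,
            " ".join(self.actions),
            f'Exact words only:\n"{self.script}"',
            " ".join(self.constraints),
        ]
        # Skip empty sections so the prompt has no blank gaps.
        return "\n\n".join(p for p in parts if p.strip())


brief = ShotBrief(
    scene="Will (avatar) moves through a massive early industrial factory floor.",
    camera="Camera tracks alongside him continuously throughout, medium close-up, never static.",
    actions=[
        "He places one hand briefly on a moving conveyor rail.",
        "He picks up a small metal part, examines it briefly, and sets it back.",
    ],
    script="AI agents are not magic employees. They are workflows with a brain in the middle.",
    constraints=["Never cuts.", "Never loses his face.", "No music."],
)
prompt = brief.render()
print(prompt)
```

Keeping the sections separate makes it easier to change one thing per attempt, for example tightening only the camera line when a tool keeps introducing cuts, and then compare the results.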

How I did it with HeyGen

If you’ve ever tried recording yourself on video, you know the setup: lighting, camera anxiety, microphone issues, retakes, editing, and the somehow universal truth that recording a 30-second clip can easily take 45 minutes.

HeyGen is built for businesses and creators who need to produce polished videos without turning every idea into a full production project. It can turn scripts into videos, photos into videos, upscale content, auto-edit clips, and handle many of the workflows you would expect around explainer videos, product videos, training content, and social content.

Because of that focus, HeyGen has a strong avatar layer. I gave it a picture and a short voice recording, and it created a usable digital version of me that could be applied across different video formats. That is the real unlock: once the avatar exists, the workflow becomes repeatable. You can create founder updates, product explainers, sales videos, training modules, or multilingual versions of the same message without recording from scratch every time.

What Seedance added through HeyGen

Seedance was the cinematic layer inside the HeyGen workflow. The factory floor, moving camera, industrial lighting, physical motion, door transition, and office reveal came from the video generation side. That is what changed the output from a basic talking-head avatar into something closer to a short visual story.

I did not use Seedance as a separate standalone workflow. I used it directly through HeyGen’s Seedance integration, which made the process much easier. That combination is important: HeyGen handled “make this feel like me,” while Seedance handled “make the scene feel alive.”

Together, they got much closer to the result I wanted than either layer would have on its own. The avatar gave the video identity. The generated scene gave it motion and metaphor.

Google Gemini Omni: what it did well, and where it struggled

I also tested the same prompt in Google Gemini Omni using the same picture and voice. The avatar likeness actually worked pretty well. It looked like me, which was a pleasant surprise and a big improvement over what I expected.

Where Omni really shines is workflow simplicity. Uploading the avatar image and voice felt more integrated. With HeyGen, avatar creation and video generation are more separate steps. With Omni, the process felt easier and more unified: give it the image, give it the voice, give it the prompt, and iterate from there.

The tradeoff was prompt adherence. My prompt specifically asked for a continuous tracking shot with no cuts, but Omni still produced frames that felt cut together rather than one seamless camera move. The output was good, but it took four tries, and even then it did not fully preserve the “single continuous shot” requirement.

That is an important distinction. Omni was strong on ease of use and likeness. HeyGen + Seedance was stronger for getting closer to the specific cinematic structure I wanted. Neither workflow was perfect, but each revealed where the tools are heading.

Quick comparison: Gemini Omni vs. HeyGen

| Category | Google Gemini Omni | HeyGen + Seedance |
| --- | --- | --- |
| Strength | Fast, integrated avatar + voice recording in one go | More controlled avatar video with cinematic scene generation |
| Weakness | Can ignore specific shot constraints | Workflow has more setup steps |
| Video length | 10 seconds max | 15 seconds max |
| Avatar setup | Easier. Upload image and voice in one flow | More structured. Avatar creation and video generation feel like separate steps |
| Likeness | Very good, but showing my body gave it away | Excellent, even the watch I’m wearing made it into the video |
| Prompt adherence | Good, but struggled with “single continuous shot” and introduced cuts; maybe my prompt wasn’t good | Better for getting closer to the intended cinematic structure |
| Cinematic control | Promising, but less reliable for precise camera continuity | Stronger when using Seedance through HeyGen for motion and scene design |
| Business use case | Fast experiments, concept clips, quick drafts | Founder videos, explainers, training, sales content, repeatable creator workflows |
| My takeaway | Very promising and easy to use | Better final result for this specific video |

The bigger lesson: vibe coding and vibe video are more alike than they look

AI video creation is starting to feel a lot like vibe coding. With vibe coding, the prompt is only part of the workflow. You still need the right tool, whether that is Claude Code, Codex, Cursor, Replit, or whatever comes next. You still need to understand the system you are building, and you still need to know when the output is wrong, fragile, overcomplicated, or quietly making a mess behind the scenes.

AI video has the same pattern. The prompt matters, but the tool matters too. HeyGen, Seedance, and Gemini Omni are not interchangeable. Each has different strengths, assumptions, and failure modes. The same prompt can produce something impressive in one workflow and something unusable in another.

And just like with coding, judgment is the real multiplier. You need to know what good looks like. You need taste. You need to understand pacing, message clarity, visual metaphor, audience expectations, and where the model is likely to drift. You need to decide what is worth fixing and what is good enough to ship.

This is the part people sometimes miss about AI tools. They lower the barrier to starting, but they do not remove the need for judgment. In fact, they make judgment more important because the iteration cycle is so fast. You can generate five versions quickly, but you still need to know which one works and why.

What this changes

The big shift is not just that AI can generate video. The big shift is that video creation is becoming prompt-native and iteration-native.

Before, the hard parts were equipment, editing tools, production skill, filming, and time. Now, a lot of that friction is compressed. The new bottleneck is clearer: can you describe what you want, choose the right tool, evaluate the output, and iterate toward something useful?

That is a meaningful change for builders, founders, marketers, educators, and anyone who has ideas but not a production team sitting around waiting for instructions. You can test ideas in minutes, turn abstract concepts into visual stories, create content without waiting on a full production cycle, and experiment with formats that would have been too expensive or annoying before.

The first output will still be weird sometimes. That is part of the deal. But the speed of learning is the unlock.

My takeaway

AI video is not replacing creativity. It is compressing the distance between an idea and something that’s not AI slop.

That does not make everyone instantly great at video, just like AI coding tools do not make everyone instantly great at software. But they do give more people the ability to experiment, prototype, and publish.

And once that distance collapses, the builders have an advantage. Builders are not waiting for permission, perfect tools, or a studio budget. We are trying stuff, learning what works, and getting a little better each time.

Right now, that might be the whole game.
