HeyGen Avatar IV: Deepfakes Are Point-and-Click Now.
All you need is one image. Is this a problem?
TL;DR
HeyGen’s Avatar IV creates lifelike, expressive talking avatars from a single image that can lip-sync to any audio or script—are deepfakes too easy now?
What is it?
Avatar IV is a new AI avatar model from HeyGen:

Not so long ago, creating a custom avatar with HeyGen or Synthesia required you to record a training video and submit it for professional processing.
Now, it takes one image, one script (written or recorded), and a few minutes. That’s it:
As you can see, the avatars don’t just sync their lips to the voice track—their movements and microexpressions match the tone and content of the audio.
It’s quite uncanny.
How do you use it?
The process is disarmingly simple.
On the main dashboard, click the Photo to Video with Avatar IV option:
This brings up a pop-up where you can complete the entire procedure in one go:
From here, you just:
Upload a single image.
Type out your script (or upload a pre-recorded voice clip).
Select a template voice (if not using the pre-recorded audio above).
Here’s how that might look:
Now you just click “Generate video” and wait a minute or so. Then you get your video:
Ignore the frozen background and the voice mismatch for a second.
This is pretty solid for close to zero effort on my part.
Anyone can create up to three 10-second clips a month for free, while paid accounts can make 30-second videos.
Why should you care?
Because deepfakes are quickly heading into “So easy, your grandma’s goldfish can make one” territory.
To be sure, talking avatars are far from a brand-new concept.
D-ID has let people create custom talking avatars for years. So has Synthesia.
Even the single-image-to-talking-avatar tech isn’t that new.
Last year alone, I wrote about three such models:
March 3, 2024: Alibaba’s Emote Portrait Alive (EMO)
March 17, 2024: Google’s VLOGGER
April 21, 2024: Microsoft’s VASA-1
But even then, the feature struck me as borderline creepy.
I could see why those models were research papers rather than consumer products.
To wit, here’s exactly what I said about VASA-1 (emphasis added):
“VASA-1 is outright scary. Given just a single image of a person paired with an audio clip, VASA-1 makes a realistic talking head of that person. Optional controls let you adjust the speaker’s emotion, camera view, and more. Understandably, Microsoft is not planning to make the model available at this time.”
Just one year later, thanks to Avatar IV, this technology is now mainstream.
HeyGen’s version stands out in several ways:
It’s highly accessible through a clean point-and-click interface.
It can feel quite realistic thanks to lifelike microexpressions.
It exists inside a popular product with at least 3 million monthly active users (and the potential to reach 130 million more as a native app inside Canva).
As such, Avatar IV reaches the average user in a way the research demos and previews above never could.
But why take my word for it?
Everything I say above is the literal sales pitch, straight from the horse’s mouth:

“A video that feels real, not rendered.”
Indeed.
Look, I’m a big-time AI enthusiast.
I even have a newsletter about AI; you might’ve heard of it.
And while I tend to be cautiously optimistic about tech, sometimes it feels like we’re making increasingly high-impact tools available too broadly, too quickly.
Sure, my test video above isn’t going to fool anyone. I picked an image with a busy background that doesn’t get properly animated and a random, generic AI voice.
But what happens when you pair HeyGen’s Avatar IV with a better starting image and recent voice-cloning tech? I’ve covered several such tools, many of which can clone a voice from only 10 seconds of input audio.
It doesn’t take a huge mental leap to imagine all sorts of unpleasant shenanigans:
Fake UGC influencers
Political or celebrity deepfakes
Scams featuring friends and family
Last February, a Zoom deepfake convinced a finance worker to pay $25 million to scammers. This year, a would-be scammer needs just one image.
Even if you don’t think HeyGen’s avatars are that convincing, how long will it take for them to evolve? At the current pace of AI developments, next Tuesday is a safe bet.
So while I appreciate all the positive use cases like guided tutorials or personalized messages, our guardrails had better catch up with this new reality soon.
What do you think?
🫵 Over to you…
Have you tried Avatar IV yet? If so, what did you think of the output?
What’s your general take on this tech? Are you worried about it enabling deepfakes and scams? Or do you trust that we’ll figure it out? Share your hopes and your fears!
Leave a comment or drop me a line at whytryai@substack.com.
Thanks for reading!
If you enjoy my writing, here’s how you can help:
❤️Like this post if it resonates with you.
🔄Share it to help others discover my newsletter.
🗣️Comment below—I love hearing your opinions.
Why Try AI is a passion project, and I’m grateful to those who help keep it going. If you’d like to support my work and unlock cool perks, consider a paid subscription:
I agree that it seems hard to find a positive use for this kind of thing. What I could envision happening is essentially the end of the video testimonial for product reviews. Nobody will ever believe a "busy mom's" TikTok style review for an egg-whisker, skin cream, hair supplement etc again.
It will eventually make consumers extremely cynical of everything they're ever shown and basically break advertising.
Maybe that's okay?
Yes, but, but, what if we don't want to make a video of that man with the stubbly beard??
Seriously, thanks for this review Daniel. Interesting as usual. Given the fairly severe time limits placed on these accounts, it looks like processing power may be the primary limiting factor on services of this kind, for now.
Here's the value I see in this DeepFake stuff. It's a form of emerging tech that everyone can understand and see the potential problems with. The next step should be to help readers understand that DeepFake tech is just one of a thousand different problematic technologies that are emerging from an accelerating knowledge explosion.
I just watched a documentary on Amazon Prime about DeepMind, a leading AI development company. It gives you insight into how such industry leaders think. They're going for broke in every direction they can. They make vague polite little noises about their supposed "concerns," but that seems to have little to no impact upon their desire to race ahead as fast as possible.
https://www.amazon.com/Thinking-Game-Greg-Kohs/dp/B0DV8XKWG8
Another example beyond AI is CRISPR, which is making genetic engineering ever easier, ever cheaper, and thus ever more accessible to ever more people. Don't worry about DeepFakes, worry about your next door neighbor creating new life forms in his garage workshop. The people who brought us this emerging threat got a Nobel Prize for it.
I've been writing about this kind of thing for over a decade and engaging in conversation with the most educated people who will talk to me.
https://www.tannytalk.com/p/our-relationship-with-knowledge
From that experience I've concluded that as a culture we're not even vaguely ready for what is coming. These developments are just too big for us to grasp with mere logic. It's going to take some kind of epic calamity to get us to take any of this seriously. Human beings learn primarily through pain.
So am I worried about DeepFakes? No, I'm worried about the much larger picture that DeepFakes are a tiny example of. And at age 73, I'm worrying less and less about that too, as I have a "get out of jail free" card.