HeyGen Avatar IV: Deepfakes Are Point-and-Click Now.
All you need is one image. Is this a problem?
TL;DR
HeyGen’s Avatar IV creates lifelike, expressive talking avatars from a single image that can lip-sync to any audio or script—are deepfakes too easy now?
What is it?
Avatar IV is a new AI avatar model from HeyGen:

Not so long ago, creating a custom avatar with HeyGen or Synthesia required you to record a training video and submit it for professional processing.
Now, it takes one image, one script (written or recorded), and a few minutes. That’s it:
As you can see, the avatars don’t just sync their lips to the voice track—their movements and microexpressions match the tone and content of the audio.
It’s quite uncanny.
How do you use it?
The process is disarmingly simple.
On the main dashboard, click the Photo to Video with Avatar IV option:
This brings up a pop-up where you can complete the entire procedure in one go:
From here, you just:
Upload a single image.
Type out your script (or upload a pre-recorded voice clip).
Select a template voice (if not using the pre-recorded audio above).
Here’s how that might look:
Now you just click “Generate video” and wait a minute or so. Then you get your video:
Ignore the frozen background and the voice mismatch for a second.
This is pretty solid for close to zero effort on my part.
Anyone can create up to three 10-second clips a month for free, while paid accounts can make 30-second videos.
Why should you care?
Because deepfakes are quickly heading into “So easy, your grandma’s goldfish can make one” territory.
To be sure, talking avatars are far from a brand-new concept.
D-ID has let people create custom talking avatars for years. So has Synthesia.
Even the single-image-to-talking-avatar tech isn’t that new.
Last year alone, I wrote about three such models:
March 3, 2024: Alibaba’s Emote Portrait Alive (EMO)
March 17, 2024: Google’s VLOGGER
April 21, 2024: Microsoft’s VASA-1
But even then, the feature struck me as borderline creepy.
I could see why those models were research papers rather than consumer products.
To wit, here’s exactly what I said about VASA-1 (emphasis added):
“VASA-1 is outright scary. Given just a single image of a person paired with an audio clip, VASA-1 makes a realistic talking head of that person. Optional controls let you adjust the speaker’s emotion, camera view, and more. Understandably, Microsoft is not planning to make the model available at this time.”
Just one year later, thanks to Avatar IV, this technology is now mainstream.
HeyGen’s version stands out in several ways:
It’s highly accessible through a clean point-and-click interface.
It can feel quite realistic thanks to lifelike microexpressions.
It exists inside a popular product with at least 3 million monthly active users (and the potential to reach 130 million more as a native app inside Canva).
As such, Avatar IV reaches the average user in a way the research demos and previews above never could.
But why take my word for it?
Everything I say above is the literal sales pitch, straight from the horse’s mouth:

“A video that feels real, not rendered.”
Indeed.
Look, I’m a big-time AI enthusiast.
I even have a newsletter about AI; you might’ve heard of it.
And while I tend to be cautiously optimistic about tech, sometimes it feels like we’re making increasingly high-impact tools available too broadly, too quickly.
Sure, my test video above isn’t going to fool anyone. I picked an image with a busy background that doesn’t get properly animated and a random, generic AI voice.
But what happens when you pair HeyGen’s Avatar IV with a better starting image and recent voice-cloning tech? I’ve covered several such tools, many of which can clone a voice from only 10 seconds of input audio.
It doesn’t take a huge mental leap to imagine all sorts of unpleasant shenanigans:
Fake UGC influencers
Political or celebrity deepfakes
Scams featuring friends and family
Last February, a Zoom deepfake convinced a finance worker to pay $25 million to scammers. This year, a would-be scammer needs just one image.
Even if you don’t think HeyGen’s avatars are that convincing, how long will it take for them to evolve? At the current pace of AI developments, next Tuesday is a safe bet.
So while I appreciate all the positive use cases like guided tutorials or personalized messages, our guardrails had better catch up with this new reality soon.
What do you think?
🫵 Over to you…
Have you tried Avatar IV yet? If so, what did you think of the output?
What’s your general take on this tech? Are you worried about it enabling deepfakes and scams? Or do you trust that we’ll figure it out? Share your hopes and your fears!
Leave a comment or drop me a line at whytryai@substack.com.
Thanks for reading!
If you enjoy my writing, here’s how you can help:
❤️Like this post if it resonates with you.
🔄Share it to help others discover my newsletter.
🗣️Comment below—I love hearing your opinions.
Why Try AI is a passion project, and I’m grateful to those who help keep it going. If you’d like to support my work and unlock cool perks, consider a paid subscription:
I agree that it seems hard to find a positive use for this kind of thing. What I could envision happening is essentially the end of the video testimonial for product reviews. Nobody will ever believe a "busy mom's" TikTok style review for an egg-whisker, skin cream, hair supplement etc again.
It will eventually make consumers extremely cynical of everything they're ever shown and basically break advertising.
Maybe that's okay?
Yes, but, but, what if we don't want to make a video of that man with the stubbly beard??
Seriously, thanks for this review Daniel. Interesting as usual. Given the fairly severe time limits placed on these accounts, it looks like processing power may be the primary limiting factor on services of this kind, for now.
Here's the value I see in this DeepFake stuff. It's a form of emerging tech that everyone can understand and see the potential problems with. The next step should be to help readers understand that DeepFake tech is just one of a thousand different problematic technologies that are emerging from an accelerating knowledge explosion.
I just watched a documentary on Amazon Prime about DeepMind, a leading AI development company. It gives you insight into how such industry leaders think. They're going for broke in every direction they can. They make vague polite little noises about their supposed "concerns," but that seems to have little to no impact upon their desire to race ahead as fast as possible.
https://www.amazon.com/Thinking-Game-Greg-Kohs/dp/B0DV8XKWG8
Another example beyond AI is CRISPR, which is making genetic engineering ever easier, ever cheaper, and thus ever more accessible to ever more people. Don't worry about DeepFakes, worry about your next door neighbor creating new life forms in his garage workshop. The people who brought us this emerging threat got a Nobel Prize for it.
I've been writing about this kind of thing for over a decade and engaging in conversation with the most educated people who will talk to me.
https://www.tannytalk.com/p/our-relationship-with-knowledge
From that experience I've concluded that as a culture we're not even vaguely ready for what is coming. These developments are just too big for us to grasp with mere logic. It's going to take some kind of epic calamity to get us to take any of this seriously. Human beings learn primarily through pain.
So am I worried about DeepFakes? No, I'm worried about the much larger picture that DeepFakes are a tiny example of. And at age 73, I'm worrying less and less about that too, as I have a "get out of jail free" card.