Complete Beginner’s Guide to AI Video
What you need to know about text-to-video models and how to prompt them.
Note: A prior version of this article first appeared as a guest post for the AI Supremacy newsletter in April 2025. Since then, many video models have been updated and new entrants have come out. The AI video space has arguably seen the most rapid progress of any AI field this year.
While certain model info is now outdated, this guide still holds up as a solid intro to how these models work and how to prompt them effectively.
Hey, check this out:
"What is this nightmare fuel, Daniel?" you rightfully ask.
That, friend, is an AI-generated video from late 2022 of a baby sloth using a computer.
It’s just one of several creepy demo clips from Meta’s announcement of its Make-A-Video system that could turn text descriptions into short videos.
Believe it or not, that sloth video was considered state-of-the-art at the time.
Remember: Text-to-image models were just starting to get decent, so text-to-video ones seemed like science fiction.
Now let's fast-forward to today:
That’s a video for the same “sloth” prompt created by Hailuo AI, one of the many modern video models.
We’ve come a long way in just over two years, haven’t we?
Today, anyone with an idea can instantly turn a short text description into a video that's often indistinguishable from reality.
If you haven't already tried text-to-video models, this intro guide is for you.
Here's what you'll learn if you stick around:
Introduction to AI video: What can today's models do?
Prompting basics: My quick start guide to prompting video models.
Major AI video models: A comprehensive list of video models on the market, with video output comparisons.
Let’s go!
Introduction to AI video
Before we dive headfirst into the enchanted world of AI video models, I’d like to highlight that there are currently somewhat different flavors of text-to-video platforms.
We can broadly slot them into three categories¹:
1. Animated avatars
These text-to-video sites let you provide a script to be read out by a lifelike AI persona powered by a text-to-speech model under the hood. Such platforms typically have a business slant, since animated talking heads are especially useful for training videos, presentations, and so on.
Prominent examples: Synthesia and HeyGen
2. Slideshow creators
These tools turn a script or even a vague request into a complete video narrative by stitching together stock images, stock footage, AI-generated clips, B-roll, and more, typically accompanied by a voiceover.
Prominent examples: Pictory and InVideo
3. True AI video models
These foundational models can generate brand-new video scenes from nothing but a prompt, much like text-to-image models do for pictures.
This third category is what I’ll be diving into below!
Now that we’re on the same page, let’s look at the video models in the third category—what they can do and how to use them.
What can AI video models do?
Today’s video models turn different inputs into video clips, and many of them offer additional features to control the output.
Let’s look at what they’re capable of.
1. Text-to-video generation
This is the most basic feature: Every video model lets you describe a scene with words and then generates a video based on your description.
If you’ve ever generated AI pictures using a text-to-image tool, you’ll likely feel right at home with text-to-video prompting.
2. Image-to-video generation
Most video models let you go beyond pure text description by uploading a reference image.
They’ll use this image as the first frame of the resulting video, which gives you more precise control over the output. You anchor your scene with the starting image, then you provide a text prompt that describes how it should develop from there. I compared 17 image-to-video models a few months ago.
With some of these, you can also provide the last frame to create a video transition between the two frames.
Note that even if you use anchor images, you’ll usually want text prompts to describe the scene and any transition between the frames. It helps to understand the anatomy of such text prompts, so I’ll look at prompting basics in the “Text-to-video prompting 101” section.
3. Advanced features
In addition to text-to-video and image-to-video capabilities, certain platforms offer extra features, including:
Motion brush: Select separate areas of the scene to animate individually.
Lip syncing: Make your characters talk using voice input.
Retexturing: Reimagine the entire look of an underlying driving video.
Recropping: Change the aspect ratio by filling in gaps using AI.
Special effects: Insert reference objects into videos, apply preset visual effects, and more.
For this intro guide, I’ll focus on the core text-to-video functionality.
Text-to-video prompting 101
No matter which video model you decide to work with, you won’t get far without knowing the core concept of prompting them via text. Even if you plan to rely on reference images for your AI videos, you’ll still have to describe the desired action using words.
As such, it’s useful to have a framework for thinking about text-to-video prompts and pick up a few best practices.
But don’t worry: It really isn’t rocket science, as you’re about to see.
The anatomy of a video prompt: My “5S Framework”
You’ll find plenty of guides on prompting AI video models.
Adobe recently published one: “Writing effective text prompts for video generation.”
After reading many guides and playing with lots of video models, I condensed their lessons into a “5S Framework” for thinking about text-to-video prompts:
Subject: The main character(s) or object(s). Who or what are they? How do they look?
Sequence: What actions do they take or what happens to them?
Setting: Where and when does the scene take place? (Time of day, season, location, etc.)
Shot: What’s the initial angle and framing? How does the camera move and zoom during the scene?
Style: What’s the aesthetic of the video? Is it a live-action movie, a Pixar cartoon, or a claymation film? This may include details like lighting, type of camera film, mood, etc.
Let’s look at an example prompt and break it down into its five “S” components:
Here’s what happens when we run this prompt through an AI video model:
Note that you won’t always need to specify all five “S” elements in your prompt. Just the subject and sequence may often be enough.
For instance, if you’re making B-roll footage of a person watching TV, a simple “woman watching TV” is plenty to go on. The AI video model will fill in the blanks and give you a passable filler clip.
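On the flip side, when a scene really matters, spelling out all five elements gives the model much more to work with. Here's an illustrative prompt (a made-up example, not from any particular model's docs) with every "S" labeled:

Prompt: A grizzled old lighthouse keeper (Subject) lights his lantern and climbs a spiral staircase (Sequence) inside a stone lighthouse at dusk during a storm (Setting), slow upward tracking shot following him (Shot), moody live-action cinematography with warm lantern light (Style).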
Don’t be afraid to experiment.
This brings us to…
Prompting best practices
Here are a few rule-of-thumb guidelines for working with text-to-video prompts:
1. Start simple, then iterate
Don’t overengineer your scene from the get-go.
Write a shorter prompt first and see what output the video model generates. This will give you an idea of the default style of your chosen video model. Then you can build up your prompt and specify any missing concepts.
2. Focus on the must-have details
Figure out which elements of your scene are critical and describe those in greater detail.
For less important aspects, the video model will fill in the blanks on its own. It might even inspire you by imagining elements of the scene you didn’t think to describe.
3. Reduce scene complexity
Generally speaking, you want a single character performing a single action per scene.
Trying to create a clip with too many moving parts increases the chance of things getting all wonky. AI models aren’t perfect, so don’t expect them to handle nuanced choreography just yet.
Exceptions include situations that naturally involve many people, such as shots of crowds at a concert, battlefield scenes, and so on.
4. Avoid 5S conflicts
You want your 5S elements to tell a coherent story together.
So don’t ask for a wide, zoomed-out establishing shot while also describing a single character’s fingernails in great detail. Keep your requests consistent to avoid confusing the AI.
5. Stick to short clips
Most video models let you specify the desired length of the output clip, typically between 5 and 10 seconds.
I recommend selecting the minimum available length for a few reasons:
You will burn through fewer AI credits (longer clips tend to cost more).
You’ll risk fewer errors. The longer the clip, the higher the chance of it going off the rails.
You can always extend a clip that works. Some platforms let you do this natively; in others, you can use the last frame of your original clip as the starting frame of an extension clip (see the sketch after this list).
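If you go the manual route, you first need to pull that final frame out of your clip. Here's a minimal Python sketch using OpenCV (assuming `pip install opencv-python`; the file names are placeholders, and your platform's export format may differ):

```python
# Minimal sketch: extract the last frame of a generated clip so it can be
# uploaded as the starting frame of an extension clip.
# Assumes OpenCV is installed (pip install opencv-python).
# "my_clip.mp4" and "last_frame.png" are placeholder file names.
import cv2

def extract_last_frame(video_path: str, image_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    last_frame = None
    # Read sequentially to the end; seeking straight to the final frame
    # can be unreliable with some codecs.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        last_frame = frame
    cap.release()
    if last_frame is None:
        raise ValueError(f"No readable frames in {video_path}")
    cv2.imwrite(image_path, last_frame)

extract_last_frame("my_clip.mp4", "last_frame.png")
```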
6. Use a starting image where possible
If your model supports image-to-video generation, use it!
It’s cheaper and faster to create lots of AI images than video clips. So, for any given scene, first create your desired starting image in one of the many text-to-image models.
When you have the starting frame you’re happy with, feed it to the video model to animate the action. This also lets you avoid describing certain 5S elements—the video model will automatically take cues from your starting image.
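One practical caveat (a rule of thumb rather than a universal requirement): many platforms expect your starting image to match the aspect ratio of the output video. If yours doesn't, a quick center crop does the trick. Here's a minimal Python sketch using Pillow (`pip install pillow`; file names are placeholders):

```python
# Minimal sketch: center-crop a starting image to a 16:9 aspect ratio
# before feeding it to an image-to-video model.
# Assumes Pillow is installed (pip install pillow).
from PIL import Image

def center_crop_to_ratio(path_in: str, path_out: str, ratio: float = 16 / 9) -> None:
    img = Image.open(path_in)
    w, h = img.size
    if w / h > ratio:
        # Image is too wide: trim the sides.
        new_w = int(h * ratio)
        left = (w - new_w) // 2
        box = (left, 0, left + new_w, h)
    else:
        # Image is too tall: trim the top and bottom.
        new_h = int(w / ratio)
        top = (h - new_h) // 2
        box = (0, top, w, top + new_h)
    img.crop(box).save(path_out)

center_crop_to_ratio("start_frame.png", "start_frame_16x9.png")
```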
7. Try the same prompt in many models
Video models are different, so they’ll produce different outputs from the same text prompt.
If you’re not getting good results in one model or want to explore a range of options, run the same prompt in a few separate models to see which one better aligns with your vision.
This will also help you figure out which models work best for specific purposes.
And speaking of different AI video models...
List of AI video models & test clips
Note: This article first ran in early April 2025. Since then, many models have been updated and new entrants have come out, including the paradigm-shifting Veo 3 from Google. I have chosen to keep this section as-is to serve as an “early 2025” snapshot and to highlight just how quickly the AI video space is evolving. (My test of AI image-to-video tools is slightly more up to date.)
In late 2023, there were about six text-to-video models on the market.
Today, that number has more than doubled.
In addition to Western AI labs, many frontier video models now come from Chinese providers.
So let’s take stock of the current video model landscape:

I’ll introduce the major models and tell you where you can try them. To spice things up, I’ll ask each model to create a video for the following simple prompt:
Prompt: Drunk pirate from the 1800s trying to use a modern laptop.
This will also help to demonstrate why it’s worth trying the same prompt in multiple models.
Let’s go!
1. Adobe Firefly Video
Firefly Video is a relatively new US entrant: It launched in public beta in mid-October 2024.
Firefly Video’s claim to fame is that it was trained on licensed and public-domain content, so it’s commercially safe and you shouldn’t run into copyright issues. It also lets you select from preset camera angles and movements, making it easier to request the framing you want.
If you already use Adobe Creative Cloud products in your workflow, you can run Firefly Video natively inside them.
Adobe Firefly Video at a glance:
Maker: Adobe
Country: United States
Launched: October 2024 (public beta)
Free to try? Yes
Limitations of the free tier: Only two free video clips.
Pricing: Starts at $9.99/month (Firefly Standard).
Advanced features: Camera angles and motion presets, start frame/end frame input, video translation, and integration with Adobe Creative Cloud products.
Where to try: firefly.adobe.com
2. Genmo Mochi 1
As a platform, Genmo has been around since 2023, but its new Mochi 1 video model is only a few months old. Notably, Mochi 1 is an open-source model, meaning anyone can freely download, modify, and run it.
So if your computer is powerful enough and you’re tech-savvy enough, you can download Mochi 1 from Hugging Face. The rest of us can simply go to Genmo’s official site to generate a few free videos per day.
Genmo Mochi 1 at a glance:
Maker: Genmo
Country: United States
Launched: October 2024
Free to try? Yes (open-source)
Limitations of the free tier: Two generations per day, watermark, low queue priority.
Pricing: Starts at $8/month (Lite)
Advanced features: Stealth mode for private video creation.
Where to try: genmo.ai
3. Hailuo AI video-01
Note: Superseded by Hailuo 02 in June 2025.
Hailuo AI video-01 comes from a Chinese AI company called MiniMax. The model only came out in September 2024 and instantly earned its place among the top AI video models.
The Hailuo AI platform lets you make videos from text prompts, use starting frames, and upload reference images of characters or objects to include in your videos. There’s also a native text-to-image model.
Hailuo AI video-01 at a glance:
Maker: MiniMax
Country: China
Launched: September 2024
Free to try? Yes
Limitations of the free tier: Three generations per day, watermark, low queue priority.
Pricing: Starts at $9.99/month for annual plans.
Advanced features: Subject reference uploads, advanced camera controls, optional auto-prompt optimization, and image-to-video creation.
Where to try: hailuoai.video
4. HunyuanVideo
HunyuanVideo is an open-source model from China released by Tencent in December 2024. It’s the largest open-source video model on the market and performs admirably compared with top-tier competitors like Runway Gen-3.
It was initially a pure text-to-video model, but it gained image-to-video capabilities in March 2025, so you can now use starting frames for video generation as well.
HunyuanVideo at a glance:
Maker: Tencent
Country: China
Launched: December 2024
Free to try? Yes (open-source)
Limitations of the free tier: N/A
Pricing: N/A
Advanced features: Video-to-audio synthesis; avatar animation control; high-resolution video generation
Where to try: Hugging Face (select one of the spaces in the right-hand column)
5. Kling AI v1.6
Note: Superseded by Kling 2.1 in May 2025.
Kling was the first Chinese video model to be seen as a serious contender globally. Its creator, Kuaishou AI, released the original model in June 2024. A new 1.6 version came out in December 2024 with improved output quality.
The Kling AI site also functions as a creator platform with advanced options like character lip sync, special effects, and an “Elements” feature that lets you combine up to four reference images in a single video.
Kling v1.6 at a glance:
Maker: Kuaishou
Country: China
Launched: December 2024
Free to try? Yes
Limitations of the free tier: Only standard features, no access to higher-quality “Professional” mode, limited number of credits, watermark, no upscaling.
Pricing: Starts at $6.99 per month
Advanced features: Lip syncing, camera controls, video extension, special effects, merging multiple reference images in one video.
Where to try: klingai.com
6. Luma Ray2
Luma Dream Machine has also been around since June 2024. The newer Ray2 model, however, came out as recently as January 2025. It’s much better at handling finer details like hand movements and facial expressions.
Ray2 is an expensive model to run, so you can’t access it with Luma’s free plan. The platform itself has a minimalist interface but offers many advanced options like reference images and keyframe transitions.
Luma Ray2 at a glance:
Maker: Luma Labs
Country: United States
Launched: January 2025
Free to try? No (but Ray 1.6 is free)
Limitations of the free tier: 720p resolution, no access to the newer Ray2 model, limited number of credits, non-commercial use, lower queue priority, max 5-second clips.
Pricing: Starts at $6.99 per month for annual plans.
Advanced features: Image-to-video, keyframe transition, shot and camera movement presets, loop video option.
Where to try: lumalabs.ai
7. Pika Labs 2.2
Pika Labs is one of the veterans in AI video and has been around since late 2023.
In my experience, Pika’s video model doesn’t have the most polished output quality. To wit: Our pirate above apparently gets possessed by demons halfway through the clip, and that’s representative of my general impression of Pika’s videos.
But Pika partially makes up for this by being the “fun” platform with many unique effects. You can insert objects into videos, swap objects out, and select from dozens of preset video transitions to apply.
Pika Labs 2.2 at a glance:
Maker: Pika Labs
Country: United States
Launched: 2023
Free to try? No (but version 1.5 is free)
Limitations of the free tier: Limited generations, lower queue priority, only version 1.5, no access to special effects.
Pricing: Starts at $8 per month for annual plans.
Advanced features: First/last frame, character swapping, object additions, advanced camera controls, and dozens of preset special effects to apply to a video.
Where to try: pika.art
8. PixVerse V4
Note: Superseded by PixVerse V4.5 in May 2025.
Made by the Beijing-based company AIsphere, PixVerse reminds me quite a bit of Pika: It might not have the best-quality videos, but its platform adds plenty of bells and whistles like video restyling, reference character uploads, and dozens of preset special effects.
What’s more, PixVerse is one of the few platforms that lets you automatically add new sound effects and speech to your video clips.
PixVerse V4 at a glance:
Maker: AIsphere
Country: China
Launched: February 2024
Free to try? Yes
Limitations of the free tier: Watermark, lower queue priority, few daily generations, SD resolution, limited access to special effects.
Pricing: Starts at $8 per month for annual plans.
Advanced features: Add audio effects and speech, image restyling, dozens of preset effects, keyframe transitions, character reference, and video extension.
Where to try: app.pixverse.ai
9. Runway Gen-3 Alpha
Note: Superseded by Runway Gen-4 in April 2025.
Runway was the original AI video platform, releasing its first models as far back as early 2023. The latest version, Gen-3 Alpha, is still widely considered one of the top AI video models.
Over time, Runway evolved into a full-fledged, all-in-one platform that lets creators generate images, videos, and 3D objects, add audio, make professional edits, and a lot more. Unfortunately, this comes with a steeper price and strict limitations on free accounts.
Gen-3 Alpha at a glance:
Maker: Runway
Country: United States
Launched: June 2024
Free to try? No (but Gen-2 and Gen-3 Alpha Turbo are)
Limitations of the free tier: Limited credits, lower queue priority, no access to Gen-3, watermark, no upscaling, no custom voices.
Pricing: Starts at $12 per month for annual plans.
Advanced features: Image, audio, and video generation, dozens of professional film editing tools, and “Act One” feature that lets you map your own character video onto the AI-generated one.
Where to try: runwayml.com
10. SkyReels V1
Note: Superseded by SkyReels V2 in April 2025.
SkyReels V1 is technically not its own model. Instead, it’s a fine-tuned version of HunyuanVideo above. In short, its maker, SkyworkAI, took the HunyuanVideo model and trained it further on 10 million high-quality film and movie clips.
As a result, SkyReels V1 can more accurately reflect facial expressions and emotions and is generally better at creating videos of humans. So it might be better suited for scenes that involve close-up shots of faces, dialogues between people, and the like. The SkyReels platform also provides advanced tools for creating storylines and drama-focused video content.
SkyReels V1 at a glance:
Maker: SkyworkAI
Country: Singapore
Launched: February 2025
Free to try? Yes (open-source)
Limitations of the free tier (only applicable on skyreels.ai): Limited credits, slower generations, watermark, no access to advanced features.
Pricing: Free model, but platform prices start at $7.90 per month for annual plans.
Advanced features: Storyboarding, audio and music creation, image-to-video, lip syncing, script-to-shots functionality.
Where to try: skyreels.ai
11. Sora
When OpenAI first teased Sora back in February 2024, it looked head and shoulders above the competition. But the company took almost a year to finally release Sora in December 2024, by which point many Chinese and US competitors had comparable or better models.
While currently not a top-tier model, Sora is decent at generating certain types of videos, offers a few styling presets, and has a “Storyboard” feature where you can stitch multiple clips together into a storyline. If you already pay for a ChatGPT Plus or Pro account, you can access Sora at no extra cost.
Sora at a glance:
Maker: OpenAI
Country: United States
Launched: December 2024
Free to try? No
Limitations of the free tier: N/A
Pricing: Starts at $20 per month (requires a ChatGPT Plus account)
Advanced features: Starting image, remix video, video transitions, looping videos, Storyboard workflow.
Where to try: sora.com
12. Veo 2
Note: Superseded by Veo 3 in May 2025.
Google’s Veo 2 is now widely seen as the best video model on the market. (But check out Wan 2.1 below.)
A few lucky testers like Ethan Mollick have had early access since December 2024. Their verdict: Veo 2 videos look and feel incredibly realistic and follow directions precisely and accurately. Sadly, access is currently managed via a waitlist.
The model is also available to run per-use at sites like fal.ai, but at a steep price of $2.50 per five seconds of video, so it’s definitely not for everyone just yet.
Veo 2 at a glance:
Maker: Google
Country: United States
Launched: December 2024
Free to try? No (waitlist)
Limitations of the free tier: N/A
Pricing: $2.50 per 5-second clip
Advanced features: N/A
Where to try: Join the waitlist at labs.google.com (or pay per clip at fal.ai)
13. Vidu 2.0
Note: Superseded by Vidu Q1 in April 2025.
Vidu is a collaboration between the Chinese AI startup ShengShu Technology and Tsinghua University. Version 2.0 is a relatively new model that came out in January 2025.
Its claim to fame is ultra-fast generation speed: It takes just 10 seconds to generate a video clip. Vidu is also great at understanding precise instructions written in natural language. The vidu.com platform has a good range of tools that include video templates for repeatable scenes and more.
Vidu 2.0 at a glance:
Maker: ShengShu Technology & Tsinghua University
Country: China
Launched: January 2025
Free to try? Yes
Limitations of the free tier: Standard resolution, max 4-second clips, watermark, non-commercial use only.
Pricing: Starts at $8 per month for annual plans.
Advanced features: Image-to-video features, reference characters, and dozens of video templates.
Where to try: vidu.com
14. Wan 2.1
Note: Superseded by Wan 2.2 in July 2025.
Wan 2.1 from Alibaba is the newest model on our list and arguably the most impressive. It’s an open-source model that currently sits at #1 on the VBench Leaderboard, which evaluates the quality of video models. (This leaderboard doesn’t include Google’s Veo 2, so it’s hard to know exactly how the two compare.) It is quite a feat for an open-source model to leave most other models in the dust.
You can download and use the model locally (if your computer can handle it) at no cost from GitHub. Otherwise, you can also use Wan 2.1 via the official wan.video platform, which gives you free daily credit refills.
Wan 2.1 at a glance:
Maker: Alibaba
Country: China
Launched: February 2025
Free to try? Yes (open-source)
Limitations of the free tier: N/A
Pricing: N/A (free daily credits)
Advanced features: Image-to-video, automatic prompt enhancement suggestions, “Inspiration Mode” that takes creative liberties with your prompt, and sound effects creation.
Where to try: wan.video
Where to go from here?
Phew, that was a lot!
But look at everything we’ve done together:
Learned about the different categories of AI video
Discovered what AI video platforms can do
Picked up a few text-to-video prompting basics and best practices
Got a solid overview of all major AI models on the market
Before I wrap up, I want to leave you with a few recommendations.
If you’d like to try AI video for yourself but aren’t sure where to start, I propose one of the following options.
1. Krea: Many models in one place
Krea.ai is a one-stop shop for creative AI tools like image and video models.
It lets you test many of the above models in a single place.
Aside from the “pro only” models, the rest are available to try for free with a daily credit limit.
In addition to text-to-video, Krea offers several image models, lets you train custom AI models, and gives you access to many other creative AI features.
It’s the easiest way to get a taste of different options without having to switch between multiple sites.
2. Wan.video: Top-tier model at no cost
Alternatively, you can go for wan.video if you want free access to one of the best models on the market.
You get 50 free credits per day to spend on AI videos or AI images, and the platform can even generate relevant sound effects along with your video.
The only downside is that you may run into long wait times during peak hours.
So head on over to krea.ai or wan.video and put some of what you’ve learned today to the test.
I hope you found this introduction to AI video helpful and inspiring.
If you end up creating some especially fun videos, I’d love to see them—feel free to share below!
Thanks for reading!
If you enjoy my writing, here’s how you can help:
❤️Like this post if it resonates with you.
🔄Share it to help others discover this newsletter.
🗣️Comment below—I love hearing your opinions.
Why Try AI is a passion project, and I’m grateful to those who help keep it going. If you’d like to support my work and unlock cool perks, consider a paid subscription:
¹ In reality, things aren’t quite as black-and-white as I’m making them sound. For instance, some sites that I call “slideshow creators” (e.g., Hedra) also offer their own underlying text-to-video models or incorporate third-party tools for video generation. But the categorization is still useful as a mental model of the different options.