Beginner's Guide to Google Whisk

How does Whisk work and what can you do with it?

May 29, 2025


Howdy, AI visioneers!

It’s the last Thursday of the month, which was traditionally dedicated to new Midjourney terms but has evolved into broader visual explorations.

Today, I wanted to play with a tool that’s been kicking around for half a year but didn’t exactly make waves: Google’s experimental, funky image-mixing platform, Whisk.

Whisk launched in late December 2024.

But three relatively recent developments have made it worth revisiting now:

February 2025: Whisk expanded to 100+ countries.
April 2025: We got the ability to animate images with Veo 2.
May 2025: Whisk switched to using the much-improved Imagen 4 image model.

So Whisk is now both more widely available and more powerful.

Let’s see what you can do with it!

What is Whisk?

In short, Whisk lets you blend uploaded images in different ways by using them as subject, scene, or style references. Here:

As of April, you can also bring these images to life by animating them using Veo 2 [1]:

At the time of writing, Whisk is available in the following countries:

We’re working to bring our tools to as many people as possible. See below for a list of countries the site is currently available to 18+ users in. American Samoa, Angola, Antigua and Barbuda, Argentina, Australia, Bahamas, Barbados, Belize, Benin, Bhutan, Bolivia, Botswana, Brazil, Brunei, Burkina Faso, Burundi, Cabo Verde, Cambodia, Cameroon, Canada, Central African Republic, Chile, Christmas Island, Cocos (Keeling) Islands, Colombia, Congo-Brazzaville Republic of the Congo, Congo-Kinshasa Democratic Republic of the Congo, Cook Islands, Costa Rica, Côte d'Ivoire, Dominica, Dominican Republic, Ecuador, El Salvador, Equatorial Guinea, Eswatini, Ethiopia, Fiji, Gabon, Ghana, Grenada, Guam, Guatemala, Guinea, Guyana, Heard Island and McDonald Islands, Honduras, Jamaica, Japan, Kenya, Kiribati, Laos, Lesotho, Liberia, Madagascar, Malawi, Malaysia, Mali, Mauritius, Mexico, Micronesia, Mozambique, Namibia, Nauru, Nepal, New Zealand, Nicaragua, Niger, Nigeria, Niue, Norfolk Island, Northern Mariana Islands, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Puerto Rico, Rwanda, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and the Grenadines, Samoa, São Tomé and Príncipe, Senegal, Seychelles, Sierra Leone, Singapore, Solomon Islands, South Africa, South Korea, South Sudan, Sri Lanka, Tanzania, The Gambia, Tokelau, Tonga, Trinidad and Tobago, Türkiye, Tuvalu, U.K., U.S., U.S. Virgin Islands, Uganda, Uruguay, Vanuatu, Venezuela, Zambia, and Zimbabwe. Whisk (our latest image generation tool) is available in all the countries listed above except for the UK. Flow is available to early access users in the U.S. — Source: **Google**

The list for the “Animate” feature is shorter:

Whisk Animate is available in the following countries: American Samoa, Angola, Antigua and Barbuda, Argentina, Australia, Bahamas, Belize, Benin, Bolivia, Botswana, Brazil, Burkina Faso, Cabo Verde, Cambodia, Cameroon, Canada, Chile, Côte d'Ivoire, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Fiji, Gabon, Ghana, Guam, Guatemala, Honduras, Jamaica, Japan, Kenya, Laos, Malaysia, Mali, Mauritius, Mexico, Mozambique, Namibia, Nepal, New Zealand, Nicaragua, Niger, Nigeria, Northern Mariana Islands, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Puerto Rico, Rwanda, Senegal, Seychelles, Sierra Leone, Singapore, South Africa, South Korea, Sri Lanka, Tanzania, Tonga, Trinidad and Tobago, Türkiye, U.S. Virgin Islands, Uganda, United States, Uruguay, Venezuela, Zambia, and Zimbabwe. — Source: **Google**

But, as with most things in AI, you can circumvent country restrictions with a VPN. I’m using my go-to NordVPN (affiliate link) for this post.

How do you use it?

Head to labs.google/fx/tools/whisk and log in with your Google account.

You’ll see a minimalist interface below, which has two main components:

A left-side column for uploading your image references (subject, scene, and style)
A classic text prompt box at the bottom (with aspect ratio settings)

Whisk interface with side-bar image references and a text prompt box at the bottom

At this point, you can mix and match the above elements in any combination, and this flexibility is exactly what makes Whisk a whole lot of fun to play with!

Let’s look at the many ways you can use Whisk.

Types of prompting in Whisk

You can prompt Whisk using:

Text only
Image reference(s) only
A hybrid of text prompt and image references
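If it helps to see the three modes side by side, here's a minimal Python sketch that names the mode for a given combination of inputs. The `WhiskInput` class and `prompting_mode` function are names I made up for illustration. Whisk itself is used through its web interface, so treat this as a mental model rather than anything Whisk exposes.

```python
from dataclasses import dataclass, field


@dataclass
class WhiskInput:
    """Hypothetical container for what you put into Whisk's interface."""
    text_prompt: str = ""
    # Reference images keyed by role: "subject", "scene", or "style".
    references: dict[str, str] = field(default_factory=dict)


def prompting_mode(inp: WhiskInput) -> str:
    """Name which of the three prompting types the input corresponds to."""
    has_text = bool(inp.text_prompt.strip())
    has_refs = bool(inp.references)
    if has_text and has_refs:
        return "text + image"
    if has_text:
        return "text only"
    if has_refs:
        return "image only"
    return "nothing to generate"


print(prompting_mode(WhiskInput(text_prompt="Cartoon mouse at a bistro table")))
print(prompting_mode(WhiskInput(references={"subject": "robot.png"})))
```

All reference roles are optional in this sketch, which mirrors how you can leave any of the boxes empty.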

1. Text-to-image prompting

This should be familiar to any readers who have used a text-to-image model before.

You describe what you want in the text box, and Whisk spits out an image based on that.

Let’s try the following prompt:

Cartoon image of a mouse sitting at an outdoor bistro table holding a menu. The mouse looks up from the menu and says, "Whisk me up some ice cream images, Tom. Make them fluffy!" The waiter is a cat in a red sweater, standing next to the table and holding a tablet with pictures of ice cream on it.

Here’s how that looks in Whisk:

Google Whisk prompt: "Cartoon image of a mouse sitting at an outdoor bistro table holding a menu. The mouse looks up from the menu and says, "Whisk me up some ice cream images, Tom. Make them fluffy!" The waiter is a cat in a red sweater, standing next to the table and holding a tablet with pictures of ice cream on it."

Optionally, you can adjust the aspect ratio by clicking the middle “screen” icon:

Aspect ratios in Whisk: Square, portrait, landscape

We’ll go with 16:9 for that widescreen look.

Here’s one of the results:

Whisk image for the following prompt: "Cartoon image of a mouse sitting at an outdoor bistro table holding a menu. The mouse looks up from the menu and says "Whisk me up some ice cream images, Tom. Make them fluffy!" The waiter is a cat in a red sweater, standing next to the table and holding a tablet with pictures of ice cream on it."

See? Simples!

2. Image mixing

This is kind of like no-prompt prompting…but for images!

Expand the left-side column to reveal the three image reference spots:

For each of these, you can either add a text description, upload your own reference image, or click the “die” icon at the top to get a random preset image from Google [2]:

Text, image, and "roll the dice" options of adding references in Google Whisk

I rolled the dice for all three and ended up with:

Subject: Sad blue robot
Scene: Cafe on a snowy cliff
Style: Colorful geometric shapes

Whisk image prompt: Subject: Sad blue robot Scene: Cafe on a snowy cliff Style: Colorful geometric shapes

At this point, it’s a simple matter of clicking the Submit button without providing any additional text prompt:

"Submit" button in Whisk, next to an empty text prompt box

Whisk gets to work and spits out some images (two images at a time):

A grid of several robot images based on subject, scene, and style references in Whisk

If you’ve never prompted image models before, this is the easiest and fastest way to experiment: Simply whisk a bunch of images together and see what happens!

We sure have moved on from the early days of “splatterprompting” and endless walls of text descriptors, haven’t we?

3. Text + image prompting

Finally, you can combine the two options to have more control over the output.

Let’s keep our three reference images and add the following text prompt:

The robot is holding a green balloon and talking to a purple dog

Whisk prompt: The robot is holding a green balloon and talking to a purple dog

Here’s how that might look:

Whisk image of a robot and a purple dog with a green balloon

Great!

Our scene and subject references still act as visual anchors [3], but the text prompt lets us add new details.

Feature overview

Here are a few more things you can do in Whisk.

1. Use preset styles

This feature is a bit tucked away in the top-left hamburger menu:

Click that, and you’ll see a Load Template dropdown:

This lets you pick from a few ready-made presets:

While this sounds rather advanced, “templates” are basically just glorified style reference images. For instance, picking “Sticker” populates the style reference box with a reference image of a cat sticker…and that’s about it:

It’s nice to have a few reliable styles to pick from, but I wish there were a way to save your own presets, like you can with Sora for GPT-4o images.

2. Refine an image

When you hover over a finished image in Whisk, you’ll see buttons at the bottom that let you flag, delete, download, like, or share a generation:

Google Whisk image of a robot with "Animate" and "Refine" buttons at the top

At the top, there are a few action buttons, including Refine:

Clicking it brings up a new prompt box, but instead of having to prompt your entire scene from scratch, it lets you describe the changes you’d like to make:

"Refine" prompt in Whisk, asking for a balloon to become red

After I asked for a red balloon, here’s what I got:

Robot in a snowy cafe with a red balloon and a purple dog

Note that the two images are similar but not identical.

That’s because Whisk doesn’t simply modify targeted areas.

Instead, it regenerates the entire image with requested changes while sticking closely to the original layout and description.

3. Animate an image

Now it’s time for the good stuff: Using Veo 2 to animate your images.

Click the Animate button at the top to bring up a text box, then describe the action:

The balloon flies away. The robot and the dog look up at it with sad expressions.

After a while, your image comes to life:

Man, that balloon sure took its sweet time before finally flying off, but we did get what we asked for!

Note: Whisk gives you 10 free Veo 2 generations per month. Use them sparingly.

4. Share and remix creations

Finally, you can share a creation with others and let them tweak it.

Simply click the Share button in the bottom-right corner:


This creates a shareable link for others to use:

"Share this recipe" view in Google Whisk

When you send a link to someone, they’ll not only see your original image but also a Make Your Own button to remix it:

Shared creation from Google Whisk of a blue robot holding a green balloon

The process works like the Refine button above and lets anyone request tweaks.

Want to try remixing my robot? You can do that right here:

Remix my robot

Bonus tips and tricks

Here are two things you can do in Whisk that are somewhat hidden.

1. View and tweak the underlying prompt

Unlike AI tools that use image references directly, Whisk converts all the text prompts and image references into a longer text prompt under the hood.

You don’t see this prompt by default, but let me show you how to view it and modify it to your liking.

Let’s first create an image of a turtle swimming in the ocean by combining a scene reference of an ocean and a short text prompt:

Cute cartoon turtle swimming

Cute cartoon turtle swimming in the ocean - Google Whisk snapshot using a scene reference

Now, when you hover over an image, next to the Animate and Refine buttons, there’s this understated “notepad” icon:

Clicking the icon zeros in on the image and displays the text prompt Whisk used to create it:

Green turtle swimming in a blue ocean in Google Whisk, with a prompt at the bottom

Note: You can also bring up this prompt view by clicking on the image itself.

The prompt appears to be grayed out and fixed, but you can actually click right into it and make changes:

Highlighting the turtle as a subject in Google Whisk

Let’s change our turtle to an orca while keeping everything else as is:

Replacing the turtle with an orca in Whisk

Now we click Generate and see what happens:

Whisk result for an orca swimming in the ocean

Pretty neat!
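If you like thinking in code, here's a toy Python sketch of the idea behind this trick: everything you feed Whisk ends up as one text prompt, so tweaking the result is just editing text. The `compose_prompt` function, the output format, and the ocean description are all assumptions I invented for illustration. Whisk's real prompts are written by Google's models and are far more detailed.

```python
def compose_prompt(text_prompt: str, descriptions: dict[str, str]) -> str:
    """Toy stand-in for Whisk's hidden step: merge everything into one text prompt.

    `descriptions` maps a role ("subject", "scene", "style") to a text
    description of that reference image. The wording and layout used here
    are assumptions, not Whisk's actual format.
    """
    parts = []
    cleaned = text_prompt.strip().rstrip(".")
    if cleaned:
        parts.append(cleaned)
    for role in ("subject", "scene", "style"):
        if role in descriptions:
            parts.append(f"{role.capitalize()}: {descriptions[role]}")
    if not parts:
        return ""
    return ". ".join(parts) + "."


# Made-up description standing in for the ocean scene reference.
prompt = compose_prompt("Cute cartoon turtle swimming", {"scene": "a sunlit blue ocean"})
print(prompt)

# Editing the underlying prompt is plain text editing before you regenerate.
print(prompt.replace("turtle", "orca"))
```

Swapping one word leaves the rest of the prompt untouched, which is why the orca image keeps the same ocean setting.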

2. Blend multiple subjects

Did you know that you’re not limited to just one subject reference?

Above the “Subject” box, there’s a little “+” icon.

Clicking it adds new “Subject” boxes. Let’s add two more:

Now, we’ll use the “die” icon to populate the three boxes with reference images of a dino, our blue robot, and a fancy teacup:

Whisk with three subjects: dinosaur, robot, and teacup

We won’t add any scene or style reference images, but we’ll write a short guiding prompt:

Cartoon dino and robot drink tea in a park


Let’s take a look at the result:

A cartoon green Tyrannosaurus Rex in cowboy boots and a sad robot sit at a picnic table in a park, drinking tea from a floating sea urchin-like teacup. The bright green cartoon dinosaur, with a light brown belly, large head, long tail, and its mouth slightly open showing its tongue, wears black and white cow-patterned cowboy boots. Beside it, a squat, rounded robot with chipped and faded blue paint revealing grey metal, a drooping yellow wire, a delicate oval face with drooping thin antennae, large round eyes reflecting the park, a small curved frowning mouth, a bulbous nose, two small stubby arms with three-fingered hands, and small round feet sits wearily. On the picnic table, a round-bellied, pastel pink teacup resembling a sea urchin with lighter pastel blue-lavender bumps and a curved handle floats slightly above a matching iridescent pastel saucer, both containing and hinting at pale tea with faint steam rising. The background shows a lush green cartoon park setting with trees and foliage under a bright sky.

That worked!

You can also use this in combination with scene and/or style references, like so:

Blending two subjects and a scene and a style in Google Whisk

Let’s run it without an additional text prompt and see what happens:

In the style of an epic comic book with cool pop art flair and fun, bold colors, a green Tyrannosaurus Rex in black and white cow-patterned cowboy boots runs through a fantastical pink tennis court overgrown with lush vegetation, while a squat, rounded blue robot with chipped paint and drooping antennae stands melancholically nearby. The green Tyrannosaurus Rex, rendered with thick, dynamic lines and saturated hues, has a light brown belly, a large head with its mouth open revealing its tongue, and a long tail stretched out behind it as it runs with straight legs, its black and white cow-patterned cowboy boots prominent. The robot, its once vibrant blue paint now chipped to reveal grey metal and outlined with bold linework, has large, round eyes reflecting the scene, a delicate oval face with drooping antennae, a small curved mouth in a frown, and clumsy three-fingered hands at the ends of its stubby arms, with a thin yellow wire dangling from its shoulder. They are situated on a vibrant pink tennis court with white lines and a loosely hanging black net, surrounded by dense, emerald green foliage and vines. In the background, ornate pink and cream stone buildings, overgrown with greenery, rise towards a pale blue sky with wispy white clouds, suggesting a peaceful atmosphere. A stone staircase with ornate handrails descends to the left, adorned with blossoming pink flowers and a small white tree, while a similar staircase ascends to the right, framed by a large pink flowering bush. The scene is bathed in soft, warm light, with strategic Ben-Day dots adding highlights, and incorporates geometric shapes and a dynamic composition characteristic of the pop art comic style.

That worked, too!

But I discovered a few caveats to this functionality that I’d like to share with you:

Max 8 references in total: You can’t submit more than 8 reference images. Doing so will throw up this error:
Max one scene and one style reference: You can blend multiple subjects, but only one scene and one style reference can be active when you submit a prompt. [4]
More than 4 subject references = unreliable results: In theory, you can add up to 8 subject reference images if you don’t use any style or scene references. In practice, I found that Whisk struggles to consistently render more than four subjects at a time. [5]
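The caveats above boil down to a small checklist. Here's a hedged Python sketch of it: `check_references` is a hypothetical helper, and the numbers come from my own testing described in this post rather than from any official Whisk documentation.

```python
MAX_REFERENCES = 8      # total active reference images per submission (observed limit)
RELIABLE_SUBJECTS = 4   # beyond this, results were hit-and-miss in my tests


def check_references(subjects: int, scenes: int = 0, styles: int = 0) -> list[str]:
    """Return errors and warnings for a planned set of active references."""
    issues = []
    if scenes > 1:
        issues.append("error: only one scene reference can be active")
    if styles > 1:
        issues.append("error: only one style reference can be active")
    if subjects + scenes + styles > MAX_REFERENCES:
        issues.append("error: more than 8 references in total")
    if subjects > RELIABLE_SUBJECTS:
        issues.append("warning: more than 4 subjects is often unreliable")
    return issues


# Dino, robot, and teacup with no scene or style: nothing to flag.
print(check_references(subjects=3))

# Six subjects is allowed but likely to need rerolls.
print(check_references(subjects=6))
```

An empty list means the combination stays within the limits I ran into. Warnings are not hard stops, just a hint that you may need a few rerolls.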

So keep the above limitations in mind and use Whisk more as a tool for fun and inspiration than for creating controllable, polished end products.

🫵 Over to you…

Have you used Whisk before? What do you think of it? If you have some Whisk tips and tricks to share, I’m all ears!

Leave a comment or drop me a line at whytryai@substack.com.

Thanks for reading!

If you enjoy my writing, here’s how you can help:

❤️Like this post if it resonates with you.
🔄Share it to help others discover this newsletter.
🗣️Comment below—I love hearing your opinions.

Why Try AI is a passion project, and I’m grateful to those who help keep it going. If you’d like to support my work and unlock cool perks, consider a paid subscription:

[1] “God tier” in my recent review of free image-to-video models.

[2] All reference boxes are optional. You can add a subject and a style without a scene, or a scene and a subject with no style, or… well, you get it.

[3] Note that the style reference isn’t as prominent in this image. Longer prompts may dilute the impact of certain elements.

[4] Whisk lets you add multiple “Scene” and “Style” boxes, but you can only tick one of them at a time when submitting a prompt to generate the image.

[5] I once succeeded in getting 6 subjects into a scene, but anything over 4 is usually a gamble and requires multiple rerolls.
