Sunday Rundown #109: Lip Syncing & Dan Diesel
Sunday Bonus #69: Get workflow improvement suggestions from Gemini
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): an exclusive segment for my paid subscribers.
In case you missed it, here’s this week’s Thursday deep dive:
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Adobe added Gemini 2.5 Flash Image to Firefly and Express so you can generate, remix, and edit visuals faster inside the apps.
Alibaba launched Wan-S2V, a model that turns a single input image and an audio clip into a cinematic video with lifelike lip movements. (Try on Hugging Face.)
ByteDance open-sourced USO, an image model that can faithfully transfer any style to any subject while keeping characters consistent. (Try on Hugging Face.)
Character.AI rolled out PipSqueak, a tuned language model that makes chats more fun and coherent.
Google news:
Gemini 2.5 Flash Image (aka “Nano Banana”) is now widely regarded as the best image model, excelling at character consistency and multi-turn edits.
Google Translate now offers spoken live translations and personalized practice across 70+ languages (available in the US, Mexico, and India for now).
Google Vids now comes with AI avatars and image-to-video features (powered by Veo) that can turn photos and scripts into narrated clips.
NotebookLM expanded Audio and Video Overviews to 80+ languages, offering full-length narration and slideshows.
HeyGen upgraded its “Digital Twin” feature to run on Avatar IV, resulting in lifelike video with realistic gestures and expressions.
Lindy launched Lindy Build, an AI app builder that iteratively tests its own work to deliver finished, bug-free results.
Microsoft news:
The company launched two in-house models: an LLM called MAI-1-preview and a speech-generation model called MAI-Voice-1. (Try in Copilot Labs.)
VibeVoice is an open‑source text-to-speech system that turns written scripts into up to 90 minutes of expressive, multi-speaker audio. (Try the demo.)
Nous Research released Hermes 4, a “neutrally-aligned” hybrid reasoning model that outperforms many other open-weight models. (Try it here.)
OpenAI news:
Codex has new developer-focused features, including a new IDE extension, GitHub code reviews, and more.
gpt-realtime is a natural-sounding, expressive speech-to-speech model that is better at following instructions and using tools.
Perplexity rolled out a paid Comet Plus subscription that lets users access content from premium publishers, who in turn get 80% of the revenue.
PixVerse released V5 of its video model with smoother motion and sharper visuals. (Try for free until September 1.)
Sync Labs introduced Lipsync‑2‑Pro, a state-of-the-art model that can edit speech in a video with extremely lifelike lip-syncing.
Tencent open‑sourced HunyuanVideo‑Foley, a model that automatically generates synchronized sound for an uploaded video clip. (Try on Hugging Face.)
xAI dropped Grok‑Code‑Fast‑1, a fast and cheap coding model that excels at agentic tasks.
🔬 AI research
Cool stuff you might get to try one day:
Anthropic is piloting a Claude for Chrome extension that can summarize pages, draft replies, and take limited actions on websites. (Join the waitlist.)
Krea is rolling out a Realtime Video feature that can turn your canvas inputs into live video at 12+ frames per second. (Join the waitlist.)
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“An ‘AI Bubble’? What Altman Actually Said, the Facts and Nano Banana” [VIDEO]—good episode by AI Explained.
“Stax” [TOOL]—an experimental tool by Google that lets developers benchmark and evaluate LLMs for their specific needs.
“The Top 100 Gen AI Consumer Apps” [REFERENCE]—list of top 50 GenAI web products and top 50 GenAI mobile apps by Andreessen Horowitz.
“Tips for getting the best image generation and editing” [ARTICLE]—useful tips for working with Gemini 2.5 Flash Image from Google.
🤦‍♂️ AI fail of the week
I tried the “Create a replica” challenge with GPT-4o image generation.
I am Groot have some regrets.
Send me your AI fail for a chance to be featured in an upcoming Sunday Rundown.
💰 Sunday Bonus #69: How to improve any recorded workflow using Gemini
Both Gemini and ChatGPT now have live video chat modes that let you have a voice conversation while streaming your camera feed or sharing your screen.
Pretty handy, but here’s the catch: these live modes don’t actually watch your video feed in real time. Instead, they capture static snapshots at regular intervals and combine them with your ongoing audio conversation to infer what’s happening.
That may work for quick, in-the-moment help. But if your workflow involves lots of small steps or longer sequences of actions, live modes won’t cut it. (Check out these entertaining live tests to see why.)

Yet there’s something much better suited for this job: Gemini’s ability to parse an uploaded video recording. I already shared an effective way to use this feature to gain detailed insights from video meetings.
Today, I’ll show how you can use Gemini to improve the way you do stuff. Simply upload a screen recording of yourself working on a task, and it can:
Spot inefficiencies you might not be aware of
Suggest ways to streamline your process
Recommend specific features of a digital tool, app, etc.
Propose alternative tools for the task
…and more
Below, I’ll share the exact process and my starter guide for turning Gemini into your personal workflow auditor. (Plus a few workarounds for some of Gemini’s limitations.)
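If you’d rather script this than work in the Gemini app, the same idea also works against the Gemini API. Here’s a minimal sketch, assuming the google-genai Python SDK; the model name, file path, and prompt wording are my own illustrative choices, not part of the guide below.

```python
# Minimal sketch: asking Gemini to audit a screen-recorded workflow.
# Assumes the google-genai SDK (pip install google-genai); the model name,
# file path, and prompt below are illustrative, not prescriptive.
import time

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # or set the GOOGLE_API_KEY env var

# Upload the screen recording via the Files API (required for larger videos);
# uploads are processed asynchronously, so poll until the file is ready.
video = client.files.upload(file="screen_recording.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = client.files.get(name=video.name)

# Ask for a workflow audit: inefficiencies, streamlining ideas, tool features.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        video,
        "Act as a workflow auditor. Watch this screen recording of me "
        "performing a task. List inefficiencies you notice, suggest ways "
        "to streamline the process, and recommend app features or "
        "alternative tools that could help.",
    ],
)
print(response.text)
```

The same recording-plus-prompt pattern works in the Gemini app, too: upload the video, paste the audit prompt, and iterate on the answers.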