Sunday Rundown #96: LLMs & Limbs
Sunday Bonus #56: My swipe file with 45+ o3 use cases.
Heads up: This Thursday, May 1, I’ll hold a casual live Q&A to showcase two web-based agents: Genspark Super Agent and Proxy. If there’s time, I’ll also tackle pre-submitted and live questions. It’s free to join, so sign up here.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): an exclusive segment for my paid subscribers.
I’ve been away for Easter, so we have two weeks of news to catch up on.
Let’s get to it.
🗞️ AI news
Here are the past two weeks in AI.
👩‍💻 AI releases
New stuff you can try right now:
Firefly Image 4 is a new version of Adobe’s text-to-image model, also available in a better-quality Ultra version for paid accounts. (Try it here.)
The Firefly platform has been overhauled and lets you run third-party models, like Google Imagen 3 and Veo 2, OpenAI image generation, and Flux 1.1 Pro.
Text to Vector lets you create editable and scalable vector graphics from text prompts.
Firefly Boards (join waitlist) is a multiplayer canvas that lets you create storyboards, moodboards, and brainstorm collaboratively with others.
Alibaba open-sourced an upgraded video model, Wan 2.1-FLF2V-14B, which can now use first and last frames as inputs.
Anthropic launched two new features for paid Claude users:
Research that lets Claude perform agentic deep dives into different topics (similar to “Deep Research” features from other providers).
Integration with Google Workspace that lets Claude pull insights from Gmail, Calendar, and more.
ByteDance released Seedream 3.0, its next-gen text-to-image model with 2K native resolution. It is #2 on the AI Image Arena Leaderboard, just behind GPT-4o. I tested Seedream 3.0 with a few style modifiers several days ago. (Try it for free.)
Character.AI introduced a video model called AvatarFX with expressive facial expressions and native voice generation.
Cohere released Embed 4, a SOTA multimodal embedding model for enterprises that enables more accurate search across text and images.
Descript is rolling out an agentic video editor that can automatically tweak videos in response to your natural language chat commands. (Apply to test.)
ElevenLabs added call transfers, letting voice agents seamlessly share information and calls with each other.
Google news:
Gemini 2.5 Flash is a hybrid reasoning model with a “thinking” toggle that lets users balance quality, speed, and cost. (Free in Google AI Studio.)
Gemma 3 QAT are optimized versions of the Gemma 3 family that can run on consumer-grade devices.
Music AI Sandbox is now powered by the new Lyria 2 music model and is expanding to more US-based creators. (Sign up for the waitlist.)
Veo 2 is now available to Gemini Advanced users on gemini.google.com and inside Whisk, as well as a free rate-limited version inside Google AI Studio.
Kling AI released a massive Phase 2.0 update, dramatically upgrading its text-to-image and video models and adding many new editing features.
KREA AI’s new Stage tool lets you create and edit entire interactive 3D environments from text or image input.
Luma Labs added the option to pick from preset, consistent camera angles for your AI-generated videos.
Microsoft news:
The Computer Use feature is rolling out in an early research preview, letting Copilot Studio agents navigate systems through their graphical user interfaces.
The Microsoft 365 Copilot app received many updates, including AI-powered search, an Agent Store, Copilot Notebooks, and more.
OpenAI news:
o3 and o4-mini are new best-in-class reasoning models that can intelligently use multiple tools to solve problems on the fly.
GPT-4.1 (and smaller mini and nano versions) is available in the API, showing strong coding performance and better instruction following.
The new Library inside ChatGPT lets you view and manage your images generated by GPT-4o. (See my swipe file for use case ideas.)
Codex CLI is a lightweight, open-source command-line coding agent that can read, modify, and run code on your device.
Developers can now access GPT-4o native image generation in the API through the gpt-image-1 model.
A lightweight version of Deep Research, powered by o4-mini, is rolling out to free users. Paid accounts get a higher quota of standard Deep Research.
Perplexity launched an iOS Voice Assistant that can answer questions and perform basic tasks on your phone.
Play AI now has a Voice Changer that can clone a voice in 10 seconds while preserving tone and emotion.
Tencent released an upgraded Hunyuan 3D v2.5 model with 10x more mesh faces and ultra-high-definition modeling.
xAI news:
Grok can now remember past conversations to provide personalized responses.
Grok 3 Mini is now available in the API and tops reasoning model benchmarks while being 5x cheaper than competitors.
Grok Studio works just like “Canvas” in ChatGPT and outputs documents, code, etc., in a separate window.
Grok Vision is capable of multilingual audio and real-time search in Voice mode.
Vidu AI released the Vidu Q1 video model with sharper visuals, keyframe transitions, sound effects, and more.
🔬 AI research
Cool stuff you might get to try one day:
ByteDance teased lots of research previews:
Seaweed is a relatively small, 7B-parameter video model that can produce high-quality clips that are competitive with larger models.
Seed-Thinking v1.5 is a reasoning mixture-of-experts model that achieves high marks in STEM and coding benchmarks.
UI-TARS 1.5 is an open-source multimodal agent that can reason visually and perform diverse tasks in any operating system.
📖 AI resources
Helpful AI tools and sites that teach you about AI:
“OpenAI Preparedness Framework” [PDF] - an updated version of OpenAI’s process for reducing risks from advanced AI capabilities.
“Search Arena” [REFERENCE] - LM Arena’s new leaderboard for ranking search-augmented large language models.
“The Urgency of Interpretability” [ARTICLE] - an essay by Anthropic’s Dario Amodei highlighting the urgent need to understand AI’s inner workings.
“Values in the Wild” [PDF] - an analysis of 700K anonymized Claude conversations to uncover its expressed values by Anthropic.
🔀 AI random
Other notable AI stories of the week:
OpenAI rumors:
The company is reportedly in talks to acquire the coding platform Windsurf after previously considering Cursor.
Sam Altman might be looking at starting a social media platform similar to X.
📝 Suddenly, a surprise survey spawns…
Please help make Why Try AI better. Let me know what works and what doesn’t:
🤦‍♂️ AI fail of the week
It started out so well…
💰 Sunday Bonus #56: 45+ practical use cases for o3 (swipe file)
This week, I'm keeping my streak alive with yet another helpful interactive page.
Today, we’re diving into OpenAI’s o3.
o3 is more than just a powerful reasoning model—its standout feature is that it can use tools. It can search the web, run Python code, create images, etc., making independent decisions about what’s needed for the task at hand.
But what can you use o3 for?
To answer that, I dug into hands-on articles and videos focusing on o3 use cases.
Then I teamed up with o3 itself (meta!) to brainstorm additional use cases, consolidate all the details, and turn everything into a nice, interactive swipe file.
We ended up with over 45 use cases in total:
You can filter by category, search by keyword, and one-click copy sample prompts to try them yourself—just as with the GPT-4o swipe file.
I hope you find it handy!