Sunday Rundown #96: LLMs & Limbs

Sunday Bonus #56: My swipe file with 45+ o3 use cases.

Daniel Nest
Apr 27, 2025


Heads up: This Thursday, May 1, I’ll hold a casual live Q&A to showcase two web-based agents: Genspark Super Agent and Proxy. If there’s time, I’ll also tackle pre-submitted and live questions. It’s free to join, so sign up here.



Happy Sunday, friends!

Welcome back to the weekly look at generative AI that covers the following:

  • Sunday Rundown (free): this week’s AI news + a fun AI fail.

  • Sunday Bonus (paid): an exclusive segment for my paid subscribers.

Every Sunday Bonus in one place

I’ve been away for Easter, so we have two weeks of news to catch up on.

Let’s get to it.

🗞️ AI news

Here are the past two weeks in AI.

👩‍💻 AI releases

New stuff you can try right now:

  1. Adobe news:

    1. Firefly Image 4 is a new version of Adobe’s text-to-image model, also available in a better-quality Ultra version for paid accounts. (Try it here.)

    2. The Firefly platform has been overhauled and lets you run third-party models, like Google Imagen 3 and Veo 2, OpenAI image generation, and Flux 1.1 Pro.

    3. Text to Vector lets you create editable and scalable vector graphics from text prompts.

    4. Firefly Boards (join waitlist) is a multiplayer canvas that lets you create storyboards and moodboards and brainstorm collaboratively with others.

  2. Alibaba open-sourced an upgraded video model, Wan 2.1-FLF2V-14B, which can now use first and last frames as inputs.

  3. Anthropic launched two new features for paid Claude users:

    1. Research that lets Claude perform agentic deep dives into different topics (similar to “Deep Research” features from other providers).

    2. Integration with Google Workspace that lets Claude pull insights from Gmail, Calendar, and more.

  4. ByteDance released Seedream 3.0, its next-gen text-to-image model with 2K native resolution. It is #2 on the AI Image Arena Leaderboard, just behind GPT-4o. I tested Seedream 3.0 with a few style modifiers several days ago. (Try it for free.)

  5. Character.AI introduced a video model called AvatarFX with lifelike facial expressions and native voice generation.

  6. Cohere released Embed 4, a SOTA multimodal embedding model for enterprises that enables more accurate search across text and images.

  7. Descript is rolling out an agentic video editor that can automatically tweak videos in response to your natural language chat commands. (Apply to test.)

  8. ElevenLabs added call transfers, letting voice agents seamlessly share information and calls with each other.

  9. Google news:

    1. Gemini 2.5 Flash is a hybrid reasoning model with a “thinking” toggle that lets users balance quality, speed, and cost. (Free in Google AI Studio.)

    2. Gemma 3 QAT models are quantization-aware-trained versions of the Gemma 3 family that can run on consumer-grade devices.

    3. Music AI Sandbox is now powered by the new Lyria 2 music model and is expanding to more US-based creators. (Sign up for the waitlist.)

    4. Veo 2 is now available to Gemini Advanced users on gemini.google.com and inside Whisk, with a free, rate-limited version in Google AI Studio.

  10. Kling AI released a massive Phase 2.0 update, dramatically upgrading its text-to-image and video models and adding many new editing features.

  11. KREA AI’s new Stage tool lets you create and edit entire interactive 3D environments from text or image input.

  12. Luma Labs added the option to pick from preset, consistent camera angles for your AI-generated videos.

  13. Microsoft news:

    1. A computer use feature is rolling out in early research preview, letting Copilot Studio agents navigate systems that use a graphical user interface.

    2. The 365 Copilot app saw many updates, including AI-powered search, an Agent Store, Copilot Notebooks, and more.

  14. OpenAI news:

    1. o3 and o4-mini are new best-in-class reasoning models that can intelligently use multiple tools to solve problems on the fly.

    2. GPT-4.1 (and smaller mini and nano versions) is available in the API, showing strong coding performance and better instruction following.

    3. The new Library inside ChatGPT lets you view and manage your images generated by GPT-4o. (See my swipe file for use case ideas.)

    4. Codex CLI is a lightweight, open-source command-line coding agent that can read, modify, and run code on your device.

    5. Developers can now access GPT-4o native image generation in the API through the gpt-image-1 model (see the short code sketch after this list).

    6. A lightweight version of Deep Research, powered by o4-mini, is rolling out to free users. Paid accounts get a higher quota of standard Deep Research.

  15. Perplexity launched an iOS Voice Assistant that can answer questions and perform basic tasks on your phone.

  16. Play AI now has a Voice Changer that can clone a voice in 10 seconds while preserving tone and emotion.

  17. Tencent released an upgraded Hunyuan 3D v2.5 model with 10x more mesh faces and ultra-high-definition modeling.

  18. xAI news:

    1. Grok can now remember past conversations to provide personalized responses.

    2. Grok 3 Mini is now available in the API and tops reasoning model benchmarks while being 5x cheaper than competitors.

    3. Grok Studio works just like “Canvas” in ChatGPT and outputs documents, code, etc., in a separate window.

    4. Grok Vision gives Grok real-time visual understanding, arriving alongside multilingual audio and real-time search in Voice mode.

  19. Vidu AI released the Vidu Q1 video model with sharper visuals, keyframe transitions, sound effects, and more.
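
To make the gpt-image-1 item above concrete, here's a minimal sketch of calling it through OpenAI's official Python SDK. The prompt, size, and output filename are my own placeholders; check OpenAI's API reference for the full parameter list.

```python
# Minimal sketch: generating an image with gpt-image-1 via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in your environment; prompt, size, and filename
# are placeholders, not values from the announcement.
import base64

from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1",
    prompt="A watercolor illustration of a lighthouse at dawn",
    size="1024x1024",
)

# gpt-image-1 returns base64-encoded image data rather than a URL.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("lighthouse.png", "wb") as f:
    f.write(image_bytes)
```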


🔬 AI research

Cool stuff you might get to try one day:

  1. ByteDance teased lots of research previews:

    1. Seaweed is a relatively small, 7B-parameter video model that can produce high-quality clips that are competitive with larger models.

    2. Seed-Thinking v1.5 is a reasoning mixture-of-experts model that achieves high marks in STEM and coding benchmarks.

    3. UI-TARS 1.5 is an open-source multimodal agent that can reason visually and perform diverse tasks in any operating system.


📖 AI resources

Helpful AI tools and sites that teach you about AI:

  1. “OpenAI Preparedness Framework” [PDF] - an updated version of OpenAI’s process for reducing risks from advanced AI capabilities.

  2. “Search Arena” [REFERENCE] - LM Arena’s new leaderboard for ranking search-augmented large language models.

  3. “The Urgency of Interpretability” [ARTICLE] - an essay by Anthropic’s Dario Amodei highlighting the urgent need to understand AI’s inner workings.

  4. “Values in the Wild” [PDF] - Anthropic’s analysis of 700K anonymized Claude conversations to uncover the values Claude expresses in practice.


🔀 AI random

Other notable AI stories of the week:

  1. OpenAI rumors:

    1. The company is reportedly in talks to acquire the coding platform Windsurf after previously considering Cursor.

    2. Sam Altman might be looking at starting a social media platform similar to X.


📝 Suddenly, a surprise survey spawns…

Please help make Why Try AI better. Let me know what works and what doesn’t:

Share your feedback


🤦‍♂️ AI fail of the week

It started out so well…



💰 Sunday Bonus #56: 45+ practical use cases for o3 (swipe file)

This week, I'm keeping my streak alive with yet another helpful interactive page.

The last two were:

  1. Swipe file of 90+ GPT-4o image generation use cases.

  2. Comparison page for 15 video models.

Today, we’re diving into OpenAI’s o3.

o3 is more than just a powerful reasoning model—its standout feature is that it can use tools. It can search the web, run Python code, create images, etc., making independent decisions about what’s needed for the task at hand.
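
For the API-curious, here's a rough sketch of what handing o3 a tool looks like through OpenAI's Python SDK and Responses API. The web-search tool type, the prompt, and o3's availability on a given account tier are assumptions on my part; consult OpenAI's docs for the current details.

```python
# Rough sketch: asking o3 to solve a task and letting it decide when to use tools.
# Assumes the OpenAI Python SDK with OPENAI_API_KEY set; the web-search tool
# type and o3 access are assumptions that may vary by account tier.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    # Offer a web-search tool; the model chooses whether and when to call it.
    tools=[{"type": "web_search_preview"}],
    input="What were the most notable AI model releases in the past two weeks?",
)

# output_text aggregates the final text output from the response.
print(response.output_text)
```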

But what can you use o3 for?

To answer that, I dug into hands-on articles and videos focusing on o3 use cases.

Then I teamed up with o3 itself (meta!) to brainstorm additional use cases, consolidate all the details, and turn everything into a nice, interactive swipe file.

We ended up with over 45 use cases in total:

Here are three sample use cases (the file's categories cover business, coding, creative, learning, research, utility, and vision):

  1. Image Text Extractor (vision): turn any legible words in an image into clean, line-broken text.
     Prompt template: “Extract the text from this image and return it as plain, line-broken text.”
     Example: “Extract the text from this photo of a faded WWII postcard and return it as plain, line-broken text.”
     Possible use cases: read handwritten prescriptions, transcribe old manuscripts, capture lecture-slide text.

  2. Location-From-Photo (vision): guess where a photo was taken by analysing architecture, signage, or landscape clues.
     Prompt template: “Identify the likely location of this image and explain the visual clues you used.”
     Example: “Identify the likely city or region for this café-street photo and explain the tile and signage clues.”
     Possible use cases: guess holiday snaps, verify stock-photo origins, spot movie locations.

  3. Puzzle Solver (vision): solve mazes, hidden-object scenes, or spot-the-difference images.
     Prompt template: “Solve the visual puzzle in this image and describe the solution or object positions.”
     Example: “Solve this 200×200-pixel maze and describe the solution path.”
     Possible use cases: kids’ activity pages, retro puzzle books, game-jam ideas.

You can filter by category, search by keyword, and one-click copy sample prompts to try them yourself—just as with the GPT-4o swipe file.

I hope you find it handy!
