10X AI (Issue #24): Fuyu-8B, PlayHT Turbo, Two Fun Tools, and a Multitalented Bear

PLUS: Google's language learning, Pi with Internet access, Claude worldwide, new Midjourney upscalers, and getting better images out of DALL-E 3.

Daniel Nest

Oct 22, 2023

Happy Sunday!

Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.

Let’s get to it.

🗞️ AI news

This week, instead of one or two blockbuster announcements from major companies, we’ve had a lot of smaller ones.

1. New Fuyu-8B multimodal model

Free ChatGPT users don’t have access to ChatGPT Vision, but the number of free alternatives keeps growing.

This week, a company called Adept released Fuyu-8B, the smallest of their multimodal models. The model is fast and performs well across image question-answering benchmarks:

Multimodal benchmarks with Fuyu-8B scores against competitors — Source: Adept

It appears to be quite accurate in my anecdotal testing:

Want to try it for yourself? Here’s a free Hugging Face demo.

2. PlayHT’s speedy text-to-speech model

PlayHT, one of the frontrunners in text-to-speech, just released a ridiculously fast model called PlayHT Turbo.

It can convert text into speech in practically real time. Most of the latency comes from the network connection rather than the model’s intrinsic speed.

In my own test, PlayHT Turbo generated output in under 0.5 seconds:

Here’s the free demo I used above, with a bunch of voices and emotions. Enjoy!

3. Google takes on Duolingo. Sort of.

Lately, Google has been incorporating more features directly into its search environment. The latest one lets English learners start practice sessions from search results related to language queries:

The goal is to bring learning into the appropriate context. To begin with, this feature will become available to Android users in Argentina, Colombia, India, Indonesia, Mexico, and Venezuela. More countries are likely to follow soon.

4. Claude is available in more countries

Anthropic’s Claude still boasts the largest context window of any chatbot: 100K tokens, to be exact.

And now, users in 95 countries can access the model. (Denmark’s not on that list, which means you can look forward to me complaining about that for a while.)

There’s a free version anyone—except me—can try, so go check out Claude.

5. Pi can now browse

Inflection’s Pi chatbot is known for its friendly and supportive personality.

As of this week, it’s also plugged into the Internet and can access up-to-date info on any topic.

If you haven’t tried Pi yet, you now have another reason to check it out.

6. Midjourney releases a new built-in upscaler

We haven’t had a new Midjourney model release since Version 5.2.

But the company has been keeping us happy with additions like Vary (Region).

This week, Midjourney finally launched an upscaler for Version 5 (and above).

Under any individual image, you should now see options to upscale it by 2x or 4x:

Neon cityscape in Midjourney with 2x and 4x upscale buttons visible

Unlike Version 4 upscalers, which altered the image itself in the process, the new ones are supposed to stick as closely as possible to the original.

🛠️ AI tools

Today, I have space for just two tools, so I’m sharing a couple of fun ones I came across recently.

7. Upside Down Diffusion

Hey, remember the classic “princess or old lady” optical illusion? Where you can see one or the other by simply rotating the picture?

Upside Down Diffusion lets you make your own rotating illusions with any two subjects of your choice.

I tried having a skeleton turn into a mummy…

Skeleton image that turns into a mummy if you rotate it — Skeleton vs. Mummy

…a squirrel on a branch that becomes King Kong…

Squirrel image that turns into King Kong if you rotate it — Squirrel vs. King Kong

…and, of course, the classic:

Princess that turns into an old lady when you rotate the image — Princess vs. old lady

What will you try?

(Thanks to Zeng for sharing the tool in one of her PicAisso posts.)

Check out Upside Down Diffusion

8. Riffusion

Not to be confused with the namesake text-to-music model I tested back in June, Riffusion can create a music track in any genre that incorporates whatever lyrics you give it.

(So it’s very similar to Suno’s Chirp, which I recently covered, but with a proper interface for those who might be avoiding Discord.)

The process is simple. You feed Riffusion the lyrics and describe the sound or genre you want:

Then you get three options to pick from. Here’s my favorite:

You can save your riff as either a video or an audio file. But what’s really cool is the ability to split the track into stems with a single click:

This gives you each individual instrument or voice as a separate audio file:

Curious? It’s free to try, so go for it:

Check out Riffusion

💡 AI tip

Here’s this week’s tip.

9. Nudge ChatGPT into giving better DALL-E 3 prompts

ChatGPT Plus is already pre-prompted to create its own detailed DALL-E 3 descriptions when you request an image. But YouTuber Glibatree came up with a set of detailed custom instructions that take ChatGPT’s DALL-E 3 descriptions even further:

And Glibatree was generous enough to share the exact instructions. Simply paste them into the appropriate “Custom Instructions” sections to have ChatGPT write better prompts for DALL-E 3.

Can’t use Custom Instructions because you’re already using them for something else? No problem: simply copy and paste Glibatree’s text into any new DALL-E 3 chat before asking it to generate images.

Have fun!

🤦‍♂️ 10. AI fail of the week

I asked for an “ice skating bear,” but this is so much better.

Bear with three hind legs on rollerskates standing on top of a skateboard with ice skates attached

Liked the post? Help me grow Why Try AI by sharing it with others!

Previous issue of 10X AI:

10X AI (Issue #23): Adobe MAX, AI Reference Sites, and a Moonwalking Dinosaur

Daniel Nest

October 15, 2023

Rationaltail

My heart goes out to you for not having access to Claude! 🥲 The huge context window makes it my favorite LLM. My favorite experiment so far was to paste in ‘The Circular Ruins’ by Jorge Luis Borges and ask for some critical analysis. Claude actually helped me understand the themes and symbols in this obscure story better! I hope Claude comes to your neck of the woods soon. Maybe try a VPN?
