15 Comments
Dec 16, 2023 · Liked by Daniel Nest

I like that you provide a side-by-side comparison. Would you mind sharing your prompts?

And do you think each model has distinct features that the others don't?

author

Hey man, all the prompts are listed at the start of the post!

It's hard to generalize when it comes to models. Some are great at text rendering (Ideogram, DALL-E 3, and Imagen), while Midjourney is still the best at photographic images. And so on. But as I write at the end, the quality of the models is converging fast.

Dec 16, 2023 · Liked by Daniel Nest

My bad, I skipped it. Thanks for the reply.

author

No stress, give me a shout if you have questions about specific image models. I've written a bunch about Midjourney, Stable Diffusion, and DALL-E 3 especially.


You said it well here: "The most obvious conclusion here is that text-to-image models are converging. At the start of the year, there were clear leaders. Now, it’s often impossible to tell AI image models apart in terms of quality."

I feel this way about all generative AI. The worst model today is probably better across the board than the best model a year ago, and it's only going to get more competitive from here on out.

Super cool that Google is getting words right! I am sure we'll look back on this problem as trivial one day, but it's sure frustrating today.

author
Dec 14, 2023 · edited Dec 14, 2023

Absolutely. The rate of progress is crazy. When I got into the game, Stable Diffusion was just starting to approach something resembling realism. Now every model pumps out images that can be mistaken for real photos if you don't look closely enough.

Yeah Imagen was really the big surprise here. And the showcase for Imagen 2 looks crazy: https://deepmind.google/technologies/imagen-2/


Impressive stuff. If one does it, they'll all do it, I suppose... is there room for proprietary differentiators?

author

Well funnily enough, most of these ARE proprietary models trained separately. But since, in the end, they all use the available images to train on, it's little wonder that there's some convergence.

I think differentiation will come more and more not from the image quality itself but from the unique styling (MJ Style Tuner is a great example) and all the additional features within the tool that serves the model (inpainting, outpainting, animating, remixing, etc.).


I guess the reason I was asking is the Sidekick II. This phone kicked the shit out of every other phone on the market in terms of typing and in terms of predictive text, for like 5 years in a row, WAY longer than it should have since it got leapfrogged by new technology.

But T-Mobile owned the rights to that programmable text, which was just so much better than anything else like it, well before text-to-speech was viable. I kept my SKII for like 4 years, even as new technology was surpassing it, and while hanging out with innovator-types (I might qualify as an early adopter of a lot of emerging tech, but I'm never ever at the vanguard).

I'm curious if someone will patent a technique and then keep others from using it, and if that will even matter.

author
Dec 14, 2023 · edited Dec 14, 2023

Yeah there's probably some room for proprietary features, but I doubt they'll come from how the images look. Because ultimately, the best we can really get is "just like a photograph" or "just like a real painting," and we're seemingly about 90% there already.

So proprietary stuff will have to be the user interface and the additional bells and whistles that let designers and other users work with the models in a meaningful way. Which is where existing companies like Adobe with a suite of products used by design professionals might end up having an advantage. Would you rather generate images in Midjourney's clunky Discord interface, export them out, then try to work on them in a third-party tool? Or simply open up Adobe Photoshop and handle the entire process from image generation to remixing to editing, upscaling, and finalizing in a single app?


Good roundup of AI image generators Daniel, thanks. I wasn't aware of some of these.

All I can add is my usual mention of Dezgo.com. As you know, it's powered by Stable Diffusion. I'm not qualified to compare it to other SD installations. I like it mainly because of the web interface, which seems very accessible. No signup, no BS, simple interface etc.

I used the free version for a while. It was kinda slow and includes ads, but I still found it useful.

Then I upgraded to the paid version. Much faster, no ads, additional features. It's pay as you go, and seems very reasonable. I've generated a couple hundred images so far for about 2 bucks.

I get the feeling that SD is being left behind by other models, but you would be the one to speak to that.

author

Yeah, Dezgo is a good option for SD and other spinoff models, but it doesn't seem to offer SDXL. Leonardo and Playground are great sites with lots of free credits, but they do require account creation (free), which I know you're not a huge fan of!

I wouldn't say SD is falling behind, especially if you consider the XL version - remember that there are also many improved spinoff models that build on vanilla SDXL and specialize in different styles and mediums.


Dezgo offers "Text to Image XL", though I can't tell you what that means. I don't really know what SDXL is either, so I can't compare Dezgo to anything else in any meaningful way. DALL-E does seem to offer more control over the image, as you taught me earlier.

author

If you have a Microsoft account, just go to https://www.bing.com/create and try generating some images. It's free, you won't need any additional accounts, and it uses DALL-E 3 which follows your prompt very closely. Let me know how it works out for you!


Yes, I did that earlier, thanks to your guidance. I thought DALL-E was pretty impressive, but I wandered off when they started throttling usage without explanation. I've been meaning to purchase the paid version of DALL-E, but just haven't gotten there yet. Lots going on here other than the Internet.
