I like that you provide a side-by-side comparison. Do you mind sharing your prompts?
And do you think each model has distinct features that the others don't?
Hey man, all the prompts are listed at the start of the post!
It's hard to generalize when it comes to models. Some are great at text rendering (Ideogram, DALL-E 3, and Imagen), while Midjourney is still the best at photographic images. And so on. But as I write at the end, the quality of the models is converging fast.
My bad, I skipped it. Thanks for the reply.
No stress, give me a shout if you have questions about specific image models. I've written a bunch about Midjourney, Stable Diffusion, and DALL-E 3 especially.
You said it well here: "The most obvious conclusion here is that text-to-image models are converging. At the start of the year, there were clear leaders. Now, it’s often impossible to tell AI image models apart in terms of quality."
I feel this way about all generative AI. The worst model today is probably better than the best model a year ago all across the board, and it's only going to get more competitive from here on out.
Super cool that Google is getting words right! I am sure we'll look back on this problem as trivial one day, but it's sure frustrating today.
Absolutely. The rate of progress is crazy. When I got into the game, Stable Diffusion was just starting to approach something resembling realism. Now every model pumps out images that can be mistaken for real photos if you don't look closely enough.
Yeah Imagen was really the big surprise here. And the showcase for Imagen 2 looks crazy: https://deepmind.google/technologies/imagen-2/
Impressive stuff. If one does it, they'll all do it, I suppose... is there room for proprietary differentiators?
Well funnily enough, most of these ARE proprietary models trained separately. But since, in the end, they all use the available images to train on, it's little wonder that there's some convergence.
I think differentiation will come more and more not from the image quality itself but the unique styling (MJ Style Tuner is a great example) and all the additional features within the tool that serves the model (inpainting, outpainting, animating, remixing, etc.)
I guess the reason I was asking is the Sidekick II. This phone kicked the shit out of every other phone on the market in terms of typing and in terms of predictive text, for like 5 years in a row, WAY longer than it should have since it got leapfrogged by new technology.
But T-Mobile owned the rights to that programmable text, which was just so much better than anything else like it, well before text-to-speech was viable. I kept my SKII for like 4 years, even as new technology was surpassing it, and while hanging out with innovator-types (I might qualify as an early adopter of a lot of emerging tech, but I'm never ever at the vanguard).
I'm curious if someone will patent a technique and then keep others from using it, and if that will even matter.
Yeah there's probably some room for proprietary features, but I doubt they'll come from how the images look. Because ultimately, the best we can really get is "just like a photograph" or "just like a real painting," and we're seemingly about 90% there already.
So proprietary stuff will have to be the user interface and the additional bells and whistles that let designers and other users work with the models in a meaningful way. Which is where existing companies like Adobe with a suite of products used by design professionals might end up having an advantage. Would you rather generate images in Midjourney's clunky Discord interface, export them out, then try to work on them in a third-party tool? Or simply open up Adobe Photoshop and handle the entire process from image generation to remixing to editing, upscaling, and finalizing in a single app?
Yeah, Dezgo is a good option for SD and other spinoff models, but it doesn't seem to offer SDXL. Leonardo and Playground are great sites with lots of free credits, but they do require account creation (free), which I know you're not a huge fan of!
I wouldn't say SD is falling behind, especially if you consider the XL version - remember that there are also many improved spinoff models that build on vanilla SDXL and specialize in different styles and mediums.
If you have a Microsoft account, just go to https://www.bing.com/create and try generating some images. It's free, you won't need any additional accounts, and it uses DALL-E 3, which follows your prompt very closely. Let me know how it works out for you!