Ha! I'm so pleased I was right about it being 4o! This may be my most useless and niche skill XD
It's very interesting what you said about the "below the surface"... The thought that they're planning plotlines ahead (even in non-reasoning models) suggests much more of an "inner world" than I typically think of LLMs as having.
I wonder if you might have got better results if you had asked each model to give just one version, but perhaps asked in three chats, or regenerated the response. Asking for three at once might make it focus on producing sufficient variation between them, distracting it from just getting on and writing the story.
Ha, you gotta add "GPT-4o Spotting Expert" to your CV right away!
Yeah, I can't speculate on how much actual world-building takes place below the surface when LLMs spit out a story, but it's clear that when you stick to a single model, there's a level of self-consistency that's harder to replicate when you pass the baton from one LLM to the next.
Good point about asking for multiple variations in one chat vs. re-rolling the dice. Maybe some fodder for future experiments. But my main goal was exactly to find out whether LLMs can "fool" us even without any specialized techniques or approaches. The answer appears to be "yes."
Section 4 is pretty good, LLM Daniel. You are a stylish writer in your own right; while you may have turned it down for this, I wonder if you've spent so much time with 4o that it's picked up a little bit of your mojo?
I'm sorry, but as a large language model, I am not allowed to discuss my system prompts, my writing style, and their implications. But I am happy to assist you with the second part of your comment:
I actually intentionally ran my experiment in a "Temporary Chat" within ChatGPT, specifically to avoid it having access to prior chats, my custom instructions, memories, etc. Because I didn't want to affect its "vanilla" default voice. So what you're reading should be standard, out-of-the-box GPT-4o.
Trickster!!
Hey while I've got you, a question. I'm so lazy I never switch off of the default model (4o). I pretty much exclusively use ChatGPT for research and making pictures. I get a little tired of the super-happy-enthusiastic thing that 4o is. Would I be better off using o3?
Dude, o3 all the way. I'm pretty much using o3 as my default model.
It self-adjusts the number of thinking tokens it spends based on the task. So for quick, simple questions, it'll respond almost as fast as standard models. For complex tasks, it'll use tools and do thorough research to give you a well-reasoned answer. It's great for anything that goes beyond simple chat, and it doesn't have the "happy sycophantic chatty go go" vibe of default GPT-4o.
If you have the chance, take o3 for a spin.
Daniel, your experiment is brilliant—not just for showcasing LLM quirks, but for revealing what truly makes writing human. Your self-aware humor and layered observations turn a tech test into a creative critique. Section #4's subtlety proves your point: authenticity lives in the unsaid. More of this, please.
Glad you enjoyed it, Alex!
Daniel,
I've been experimenting with Claude, ChatGPT and Gemini, doing much the same thing. I agree with you about sticking with one LLM at a time. As a writer, you can build up your own rhythm. And the LLM will pick up on the "map" you're describing (what's important to you). One question: Do any of the LLMs you're using have specific custom instructions, either in your settings or in the project you set up? That might cover the part of the iceberg under the water. I've found that uploading documents that tell the LLMs who I am as a writer, what my story is about (for me / for the audience), and how I want it to work with me helps to get great results.
Because, as you've noted, this isn't a great result. It's... okay. Okay enough to be an experiment. But as writers, we can get more out of AI.
Thanks for all your great posts / work!
Hey Fred,
Thanks for sharing your experience. You've worked much more with LLMs on fiction since that's your main focus, so it's great to have your insights!
To answer your question: No, there are no custom instructions or other context beyond the prompt. (I even intentionally opened GPT-4o in "Temporary Chat" mode to prevent it from tapping into its usual memory.) That was the whole point: I wanted to test "vanilla" LLM output without attempts to feed them any world-building elements, tone-of-voice guidelines, etc.
My goal was precisely to establish whether LLMs can pass as humans "out of the box," without advanced prompting techniques, additional context, etc.
I should've probably made that clearer from the get-go.
I have no doubt that any LLM can become a much better writer with proper context, your writing style, story map, and so on.
But it's quite telling that even without all that, LLMs are no longer as easy to identify as many of us like to believe!