Clear, concrete, and direct advice. Yes, as John says below, it'd be nice to have a single button that activates every anti-sycophancy setting at once, but we can adapt these measures to the kind of conversation we're actually having with the chatbot. So Daniel's advice is something to incorporate into our everyday use of these tools.
Yeah, what I'd suggest is figuring out which of the tips work for your situation, then baking them into a top-level "custom instructions" prompt so that they apply in all of your conversations. Would love to hear which options work best for you.
It’s a pity that there’s not just a “zero mode” which might avoid all this palaver. But I do not think that this is an original thought on my part!
I guess memory off + "temporary/incognito" chat is the closest, as it simulates vanilla LLM behavior without your background. But of course, that doesn't switch off the built-in sycophancy from RLHF anyway.
There are also examples of people hard-coding custom instructions that tell the model to be critical, to list pros and cons instead of only positives, etc., but all of this is just putting lipstick on a pig in some way.
Thanks Daniel.