18 Comments
Feb 8 · Liked by Daniel Nest

Absolutely one of the most helpful articles I have read on this subject. Good job!


Thanks for clarifying this!

I am left with a question, though: why not call them "examples" instead of "shots"? Is it just so we fancy prompt engineer types can sneer at the rest of the population with our obviously superior knowledge, lording our power over the Luddites of the world the way the arcane academies of the ancient world controlled information?
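For what it's worth, the jargon does map onto something concrete: a "shot" is simply an example placed in the prompt ahead of the real request. Below is a minimal sketch using plain Python string-building and made-up translation examples; no particular model API is assumed.

```python
# The "shots" in a few-shot prompt are just worked examples pasted in
# before the actual question. These examples are invented for illustration.

examples = [  # each (input, output) pair is one "shot"
    ("Translate to French: Good morning", "Bonjour"),
    ("Translate to French: Thank you", "Merci"),
]

task = "Translate to French: See you tomorrow"

# Build a 2-shot prompt: two solved examples, then the real request.
prompt = "\n\n".join(f"{q}\n{a}" for q, a in examples) + f"\n\n{task}\n"

print(prompt)  # this string is what would be sent to the model
```

Calling them "examples" would work just as well; "shot" is simply the term the research papers settled on.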


One thing, though: while I get that the article explains "how" n-shot works, a purpose-built system will (for the foreseeable future) be vastly superior to an LLM and the n-shot strategy.

The lack of contextual, semantic, cultural, and data boundaries in an LLM is dangerous, and it's only "not dangerous" for the people holding the keys to the doom-machine - i.e., I can see Sam from OpenAI asking everyone, "Why would you not give us your data?"

I know why.

Because then AI becomes stuck in 2021.

And no amount of n-shot prompting will render a useful answer to a human 10 years from now.


Simple and easy to understand.

Feb 8 · Liked by Daniel Nest

I'm curious how tests like the ones shown in your initial chart allow the LLMs to use different numbers of shots for the same test. How can it be a fair or accurate analysis to give Claude 2 a 0-shot rating on GSM8K, while Grok gets 8 shots?
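For readers wondering what the setting changes mechanically: the reported shot count is how many solved examples are prepended to each test question before the model answers, so a 0-shot and an 8-shot score come from differently constructed prompts. A rough sketch below, with invented word problems standing in for real GSM8K items and no specific evaluation harness assumed; it illustrates the mechanics only, not whether the comparison is fair.

```python
# Sketch of what the n-shot setting changes during benchmark evaluation.
# The questions below are made up for illustration; they are not GSM8K items.

solved_examples = [
    ("A bag holds 3 red and 5 blue marbles. How many marbles in total?",
     "3 + 5 = 8. The answer is 8."),
    ("Tom reads 12 pages a day. How many pages does he read in 4 days?",
     "12 * 4 = 48. The answer is 48."),
    # ...a real 8-shot evaluation would prepend 8 such solved examples.
]

test_question = "A train travels 60 km per hour for 3 hours. How far does it go?"

def build_eval_prompt(n_shots: int) -> str:
    """Prepend n solved examples (the 'shots') to the test question."""
    shots = solved_examples[:n_shots]
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {test_question}\nA:")
    return "\n\n".join(parts)

print(build_eval_prompt(0))  # 0-shot: the model sees only the test question
print("---")
print(build_eval_prompt(2))  # n-shot: the model also sees n worked solutions
```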
