Why Try AI

3 Free Sites to Compare LLMs

Pit LLMs against each other and compare their output.

Daniel Nest
Sep 19, 2024

By this point, most of us have a preferred language model to chat with.

Many stick to the good old GPT-4o, others swear by Claude 3.5 Sonnet, and a few fringe freaks still prefer talking to other humans.

But do you truly know how well your chosen model compares to other LLMs, or are you using it purely out of habit?

Chances are it’s the latter.

So for today’s post, I’ve dug up a few free sites that let you pit LLMs against each other to find out which model is best for a specific question or task.

For my demo purposes, our question will be this silly nonsense:

“What is the capital of Paris?”

I want to see how LLMs handle human stupidity.

Let’s roll!

Why Try AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

1. LMSYS Chatbot Arena’s “Side-By-Side”

LMSYS Chatbot Arena front page

The LMSYS Chatbot Arena’s Leaderboard is widely used to rank LLMs based on blind tests by real users.

But it also has a side-by-side section that lets you chat with two selected models and compare them.

How to use

1. Navigate to lmarena.ai.

2. Click the “Arena (side-by-side)” tab at the top:

LMSYS Chatbot Arena side-by-side tab

3. Select the two models you want to compare from their respective dropdowns:

LMSYS Chatbot Arena side-by-side model selection

4. Input your prompt into the box at the bottom:

LMSYS Chatbot Arena prompt input and model settings

5. Run and compare the results:

LMSYS Chatbot Arena side-by-side responses
“How rude, GPT! Why can’t you be more like your polite brother Claude?”

As you can see, it’s a straightforward process.

You can even continue the chat to see how the models handle repeated interactions:

LMSYS Chatbot Arena follow-up chats

The good

  • Great selection (68 models)

  • Can tweak some model settings like temperature, output tokens, etc.

  • Can work with uploaded documents or images

  • Convenient side-by-side comparison view

  • No sign-up required

The bad

  • Intermittent errors and runtime issues with specific models

  • No ability to set a system prompt for the models

Check out LMSYS Arena
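
If you'd rather script this kind of side-by-side comparison yourself, the core idea is simple: send the same prompt to each model and line up the replies. Here's a minimal sketch; the two model functions are placeholders I've made up to keep it self-contained, and in practice you'd swap in real API calls to your providers of choice.

```python
# Minimal sketch of a local "side-by-side" comparison harness.
# The two model functions below are stand-ins (assumptions);
# in practice you would replace them with real API calls.

def model_a(prompt: str) -> str:
    # Placeholder standing in for, e.g., a GPT-4o API call.
    return "[model A] Paris is a city, so it has no capital."

def model_b(prompt: str) -> str:
    # Placeholder standing in for, e.g., a Claude 3.5 Sonnet API call.
    return "[model B] Paris is itself the capital of France."

def side_by_side(prompt: str, models: dict) -> dict:
    """Send the same prompt to every model and collect the replies."""
    return {name: fn(prompt) for name, fn in models.items()}

results = side_by_side(
    "What is the capital of Paris?",
    {"Model A": model_a, "Model B": model_b},
)
for name, reply in results.items():
    print(f"{name}: {reply}")
```

That's essentially all a side-by-side tool does under the hood, minus the polished UI, model dropdowns, and settings sliders.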

2. Modelbench

Modelbench front page

Modelbench primarily targets developers and professionals building with LLMs, but it also offers a rather beginner-friendly way to compare model outputs.

How to use

1. Navigate to modelbench.ai.

2. Click the “Sign Up For Free Today” button:

Modelbench sign-up button

3. Provide your details or sign in using your Google account:

Modelbench sign-up form

4. In the Playground, click the “Compare With” button in the top-right to create the side-by-side view:

Modelbench playground Compare With selection

5. Select your two models from the dropdown:

Modelbench playground individual model selection

6. Input your prompt into the box at the bottom:

Modelbench playground prompt input

7. Run and compare the results:

Modelbench playground side-by-side results output

Just as with LMSYS Arena, you can respond with follow-up messages to continue the conversation:

Modelbench playground side-by-side follow-up chat
Good for you, Llama 3, sticking to your guns (correctly)!

The good

  • Convenient side-by-side comparison view

  • Massive selection (180+ models)

  • Can modify and adjust system prompts per model

  • Can work with uploaded images

  • Can tweak every model setting (temperature, tokens, Top-P, and so on)

  • Follow-up messages can be different per model

  • Additional test tools within the Workbench environment

The bad

  • Free trial is limited to 7 days

  • Requires sign-up

Check out Modelbench

3. Wordware’s “Try all the models”

Wordware's "Try all the models" app

This one’s a bit of a different beast.

Wordware lets anyone build AI apps using natural language. With this featured app called “Try all the models for a single question,” you can…well, it’s in the name.

How to use

1. Navigate directly to the Wordware app via this link.

2. Type your question into the “QUESTION” box:

Wordware's "Try all the models" app input prompt box

3. Click “Run App” and wait for your results.

Wordware's "Try all the models" app - evaluated and ranked verdict by Claude 3 Opus

Yup, that’s all there is to it!

The stand-out feature of this app is that all responses are analyzed, evaluated, and ranked by Claude 3 Opus, which then gives you its verdict on which model best handled your specific question.

The good

  • Test over a dozen models in one go

  • Helpful evaluation and ranking by Claude 3 Opus

  • Super simple, one-input-box interface

  • No sign-up required

The bad

  • Can’t add or deselect models (it runs all models for every question)

  • A limited number of tested models

  • No helpful side-by-side view (you have to scroll through responses)

  • No follow-up interactions

  • No ability to upload files

  • Long wait time for the responses and evaluation to finish processing

Check out Wordware

Verdict

I think each site has its own strengths when it comes to comparing LLMs, so your choice ultimately depends on your needs.

Modelbench is the most complete and robust tool, but you’ll need to sign up and eventually pay. If you’re a professional user building with LLMs, this one’s a no-brainer.

LMSYS Chatbot Arena is a great free alternative that offers many of the same capabilities without any sign-up or payment requirements.

Wordware’s “Try all the models” is perfect if you want to test multiple models on a single task or question with as little effort as possible. It even helps you make sense of the results.

So go ahead and take them for a spin. I’m curious to hear your thoughts.


🫵 Over to you…

Which of the tools makes the most sense for your needs? Are you aware of any other sites where one can compare LLMs for free?

Leave a comment or shoot me an email at whytryai@substack.com.

Leave a comment
