Overview
When building an AI application, it is important to select the right model based on factors such as cost and performance. This template lets you easily test and compare LLMs from OpenAI, Google, Anthropic, and more.
What is LLM model testing and evaluation?
LLM model testing and evaluation involves assessing a model's performance on specific tasks, such as text generation, summarization, or question-answering, using metrics such as token usage, throughput, and response length.
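As a concrete illustration, a Postman test script can compute metrics like these from a model's response. The sketch below assumes an OpenAI-style chat completion payload; the response fields (usage.total_tokens, choices[0].message.content) and the variable names are assumptions that would differ for other providers:

```javascript
// Minimal sketch of a Postman test script for capturing evaluation metrics.
// Assumes an OpenAI-style chat completion response; adjust the field names
// (usage.total_tokens, choices[0].message.content) for other providers.
const body = pm.response.json();

const totalTokens = body.usage ? body.usage.total_tokens : 0;
const content = body.choices && body.choices[0]
    ? body.choices[0].message.content
    : "";

// Postman reports response time in milliseconds; derive a rough throughput.
const elapsedSeconds = pm.response.responseTime / 1000;
const tokensPerSecond = elapsedSeconds > 0 ? totalTokens / elapsedSeconds : 0;

pm.test("Response returned a completion", function () {
    pm.response.to.have.status(200);
    pm.expect(content.length).to.be.above(0);
});

// Store the metrics so later requests or a visualizer can compare providers.
pm.collectionVariables.set("lastTotalTokens", totalTokens);
pm.collectionVariables.set("lastTokensPerSecond", tokensPerSecond.toFixed(2));
pm.collectionVariables.set("lastContentLength", content.length);
```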
What does the LLM model evaluation template contain?
The LLM Model Evaluation template contains a collection of requests and reusable scripts for popular AI providers: OpenAI, Google, Cohere, Anthropic, Amazon, and Groq. The scripts, together with the Collection Runner, power visualizations that make it easy to compare benchmarks across providers.
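To illustrate how such scripts can drive visualizations, here is a minimal sketch using Postman's built-in pm.visualizer API. The metric variables are hypothetical placeholders, not the template's actual internals:

```javascript
// Minimal sketch of rendering collected metrics with Postman's visualizer.
// The variable names below are placeholders, not the template's internals.
const template = `
    <table>
        <tr><th>Provider</th><th>Tokens/sec</th><th>Total tokens</th></tr>
        {{#each rows}}
        <tr><td>{{provider}}</td><td>{{tokensPerSecond}}</td><td>{{totalTokens}}</td></tr>
        {{/each}}
    </table>`;

pm.visualizer.set(template, {
    rows: [
        {
            provider: "openai",
            tokensPerSecond: pm.collectionVariables.get("lastTokensPerSecond"),
            totalTokens: pm.collectionVariables.get("lastTotalTokens")
        }
    ]
});
```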
How do I use the LLM model evaluation template?
Step 1. Account setup: Log in or sign up for an account with the AI providers of your choice.
Step 2. Fetch or generate API keys: For each provider you chose, grab an existing API key or generate a new one.
Step 3. Store API keys: Save the API keys as variables in Postman so the requests can use them.
Step 4. Use and update variables: Familiarize yourself with the existing collection-level variables and update them as necessary.
Step 5. Create a Collection Runner file: To test multiple scenarios, create a custom JSON file (or copy one of the provided examples) to use as input for each iteration of the collection run; a minimal sketch of such a file follows this list.
Step 6. Run the Collection Runner: Using the data file, perform a collection run.
Step 7. View the results: Visualize the results or download them as a CSV.
Step 8. Reset the results: Clear the cache to reset the results.
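Postman's Collection Runner accepts a JSON data file containing an array of objects, one object per iteration. A minimal sketch of such a file, assuming the collection reads a prompt and a token limit per iteration (the field names are illustrative, not the template's required schema):

```json
[
  { "prompt": "Summarize the plot of Hamlet in two sentences.", "max_tokens": 128 },
  { "prompt": "Explain recursion to a five-year-old.", "max_tokens": 128 },
  { "prompt": "Translate 'good morning' into French.", "max_tokens": 64 }
]
```

During the run, each field is available to requests as a variable (for example, {{prompt}} in a request body, or pm.iterationData.get("prompt") in a script), so every iteration exercises the models with a different scenario.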
Frequently Asked Questions
Who can benefit from using the LLM Model Evaluation template?
Any developer looking to incorporate AI into their applications who wants to test and evaluate a new model or a new version of a model, or to compare models across AI providers.
Any developer consuming AI models.
Any developer who needs help getting started with testing and comparing different models.
Any developer who already uses an AI provider like OpenAI and wants to test and evaluate new models and versions.
Which LLM models and AI providers can I compare using this template?
With this template, you can compare LLMs from leading AI providers, including:
OpenAI (e.g., GPT-4, GPT-3.5)
Google (e.g., PaLM models)
Anthropic (e.g., Claude)
Cohere
Amazon
Groq
This list includes some of the most popular providers, but the template is flexible and can be extended to include other providers as needed.