Overview
When building an AI application, it is important to select the right model based on factors such as cost and performance. This template lets you easily test and compare LLMs from OpenAI, Google, Anthropic, and more.
What is LLM model testing and evaluation?
LLM model testing and evaluation involves assessing a model's performance on specific tasks, such as text generation, summarization, or question-answering, using metrics such as token usage, throughput, and response length.
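As a concrete illustration, a Postman test script can compute metrics like these from a model's response. The sketch below assumes an OpenAI-style chat completion payload; the response fields (usage.total_tokens, choices[0].message.content) and the variable names are assumptions that would differ for other providers:

```javascript
// Minimal sketch of a Postman test script for capturing evaluation metrics.
// Assumes an OpenAI-style chat completion response; adjust the field names
// (usage.total_tokens, choices[0].message.content) for other providers.
const body = pm.response.json();

const totalTokens = body.usage ? body.usage.total_tokens : 0;
const content = body.choices && body.choices[0]
    ? body.choices[0].message.content
    : "";

// Postman reports response time in milliseconds; derive a rough throughput.
const elapsedSeconds = pm.response.responseTime / 1000;
const tokensPerSecond = elapsedSeconds > 0 ? totalTokens / elapsedSeconds : 0;

pm.test("Response returned a completion", function () {
    pm.response.to.have.status(200);
    pm.expect(content.length).to.be.above(0);
});

// Store the metrics so later requests or a visualizer can compare providers.
pm.collectionVariables.set("lastTotalTokens", totalTokens);
pm.collectionVariables.set("lastTokensPerSecond", tokensPerSecond.toFixed(2));
pm.collectionVariables.set("lastContentLength", content.length);
```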
What does the LLM model evaluation template contain?
The LLM Model Evaluation template contains a collection of requests and reusable scripts for popular AI providers: OpenAI, Google, Cohere, Anthropic, Amazon, and Groq. The scripts, together with the Collection Runner, power visualizations that make it easy to compare benchmarks across providers.
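To illustrate how such scripts can drive visualizations, here is a minimal sketch using Postman's built-in pm.visualizer API. The metric variables are hypothetical placeholders, not the template's actual internals:

```javascript
// Minimal sketch of rendering collected metrics with Postman's visualizer.
// The variable names below are placeholders, not the template's internals.
const template = `
    <table>
        <tr><th>Provider</th><th>Tokens/sec</th><th>Total tokens</th></tr>
        {{#each rows}}
        <tr><td>{{provider}}</td><td>{{tokensPerSecond}}</td><td>{{totalTokens}}</td></tr>
        {{/each}}
    </table>`;

pm.visualizer.set(template, {
    rows: [
        {
            provider: "openai",
            tokensPerSecond: pm.collectionVariables.get("lastTokensPerSecond"),
            totalTokens: pm.collectionVariables.get("lastTotalTokens")
        }
    ]
});
```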
How do I use the LLM model evaluation template?
Step 1. Account setup: Log in or sign up for an account with the AI providers of your choice.
Step 2. Fetch or generate API keys: For each provider you chose, grab an existing API key or generate a new one.
Step 3. Store API keys: Save the API keys as variables in Postman so the requests can use them.
Step 4. Use and update variables: Familiarize yourself with the existing collection-level variables and update them as necessary.
Step 5. Create a Collection Runner file: To test multiple scenarios, create a custom JSON file (or copy one of the provided examples) to use as input for each iteration of the collection run; a minimal sketch of such a file follows this list.
Step 6. Run the Collection Runner: Using the data file, perform a collection run.
Step 7. View the results: Visualize the results or download them as a CSV.
Step 8. Reset the results: Clear the cache to reset the results.
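Postman's Collection Runner accepts a JSON data file containing an array of objects, one object per iteration. A minimal sketch of such a file, assuming the collection reads a prompt and a token limit per iteration (the field names are illustrative, not the template's required schema):

```json
[
  { "prompt": "Summarize the plot of Hamlet in two sentences.", "max_tokens": 128 },
  { "prompt": "Explain recursion to a five-year-old.", "max_tokens": 128 },
  { "prompt": "Translate 'good morning' into French.", "max_tokens": 64 }
]
```

During the run, each field is available to requests as a variable (for example, {{prompt}} in a request body, or pm.iterationData.get("prompt") in a script), so every iteration exercises the models with a different scenario.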
Frequently Asked Questions
Who can benefit from using the LLM Model Evaluation template?
Any developer looking to incorporate AI into their applications who wants to test and evaluate a new model or a new version of a model, or to compare models across AI providers.
Any developer consuming AI models.
Any developer who needs help getting started with testing and comparing different models.
Any developer who already uses an AI provider like OpenAI and wants to test and evaluate new models and versions.
Which LLM models and AI providers can I compare using this template?
With this template, you can compare LLMs from leading AI providers, including:
OpenAI (e.g., GPT-4, GPT-3.5)
Google (e.g., PaLM models)
Anthropic (e.g., Claude)
Cohere
Amazon
Groq
This list includes some of the most popular providers, but the template is flexible and can be extended to include other providers as needed.