This guide walks you through setting up a benchmarking provider using the TrueFoundry platform for load testing and performance evaluation.

1. Deploy the Benchmarking Service

  • Deploy this repository as a service on TrueFoundry.
  • Configure environment variables:
    1. TOKEN_COUNT: Number of tokens to return in responses (default: 100)
    2. LATENCY: Response latency in seconds (default: 0)
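
The behavior these two variables control can be sketched as a minimal stub handler; the env var names TOKEN_COUNT and LATENCY come from the guide, while the handler name and response shape are illustrative assumptions:

```python
import os
import time

# Read the two environment variables described above (defaults match the guide).
TOKEN_COUNT = int(os.environ.get("TOKEN_COUNT", "100"))
LATENCY = float(os.environ.get("LATENCY", "0"))

def handle_completion(prompt: str) -> dict:
    """Simulate a completion: wait LATENCY seconds, then return TOKEN_COUNT tokens."""
    time.sleep(LATENCY)  # fixed artificial latency per request
    text = " ".join(["tok"] * TOKEN_COUNT)  # dummy tokens of the configured count
    return {
        "object": "text_completion",
        "choices": [{"index": 0, "text": text}],
        "usage": {"completion_tokens": TOKEN_COUNT},
    }
```

With the defaults, each call returns immediately with 100 dummy tokens; raising LATENCY lets you benchmark how the gateway behaves against a slow upstream.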

2. Create a Provider Account

2.1. Navigate to the AI Gateway

  • In the TrueFoundry Dashboard, go to AI Gateway → Models
  • Select Self Hosted Models as your model provider

2.2. Configure Models

  • Model Type: Select Vllm-Openai
  • Base URL: Enter the URL of your deployed benchmarking service
  • Name: Give your model a clear display name
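
Once registered, the model can be called through the gateway's OpenAI-compatible endpoint. A minimal sketch of assembling such a request follows; the base URL, model name, and token are placeholders, not real values:

```python
import json

# Illustrative placeholders only -- substitute your own gateway URL, model
# display name (configured above), and API token.
GATEWAY_BASE_URL = "https://your-gateway.example.com/api/llm"
MODEL_NAME = "benchmark-stub"
API_TOKEN = "tfy-xxxx"

def build_completion_request(prompt: str, max_tokens: int = 100):
    """Assemble URL, headers, and JSON body for an OpenAI-style chat completion."""
    url = f"{GATEWAY_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    return url, headers, body
```

The resulting URL, headers, and body can be passed to any HTTP client (e.g. `urllib.request` or `requests`) when generating load in the next step.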

3. Generate Load Traffic

Create a client that generates traffic at your desired RPS (requests per second). Any HTTP client or load-testing tool will work, for example:
  • Locust: For advanced load testing scenarios
  • Custom scripts: Using any HTTP client library
Keep the provider service on auto-scaling so it can absorb high-RPS traffic without being throttled.
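
A custom script along these lines can pace requests at a fixed RPS; the function names are assumptions, and the `time.sleep` stand-in below would be replaced with a real HTTP call to the gateway:

```python
import threading
import time

def run_load(send_fn, rps: int, duration_s: float) -> int:
    """Fire send_fn at roughly `rps` requests per second for `duration_s` seconds.

    Each request runs in its own thread so a slow response never delays the
    next request's start time. Returns the number of requests issued.
    """
    interval = 1.0 / rps
    deadline = time.monotonic() + duration_s
    next_tick = time.monotonic()
    threads, issued = [], 0
    while next_tick < deadline:
        t = threading.Thread(target=send_fn)
        t.start()
        threads.append(t)
        issued += 1
        # Schedule the next request relative to the original start time,
        # so pacing does not drift if a loop iteration runs slow.
        next_tick += interval
        time.sleep(max(0.0, next_tick - time.monotonic()))
    for t in threads:
        t.join()
    return issued

if __name__ == "__main__":
    # Stand-in for an HTTP call to the gateway (replace with a real request).
    issued = run_load(lambda: time.sleep(0.01), rps=50, duration_s=1.0)
    print(f"issued {issued} requests")
```

For more elaborate scenarios (ramp-up, user classes, percentile reporting), Locust provides these out of the box.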