# Chat Completions: Multimodal

> Send images, audio, video, and PDFs with the Chat Completions API, plus vision model support

## Working with Multimodal Inputs

The API supports several media types: images, audio, video, and PDF documents.

<AccordionGroup>
  <Accordion title="Images">
    **Supported Providers:** OpenAI, Bedrock, Anthropic, Google Vertex, Google Gemini

    Send images as part of your chat completion requests using either URLs or base64 encoding:

    ### Using Image URLs

    ```python theme={"dark"}
    from openai import OpenAI

    client = OpenAI(
        api_key="your_truefoundry_api_key",
        base_url="{GATEWAY_BASE_URL}"
    )

    response = client.chat.completions.create(
        model="openai-main/gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://example.com/image.jpg"
                        }
                    }
                ]
            }
        ]
    )
    ```

    ### Using Base64 Encoded Images

    ```python theme={"dark"}
    import base64

    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    # Reuses the `client` created in the previous example.
    response = client.chat.completions.create(
        model="openai-main/gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{encode_image('image.jpeg')}"
                        }
                    }
                ]
            }
        ]
    )
    ```
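
    The hard-coded `image/jpeg` prefix above only fits JPEG files. As a minimal sketch, assuming you want the MIME type inferred from the file extension, the hypothetical helper `image_to_data_url` below (not part of the OpenAI SDK or the gateway) builds the data URL with Python's standard `mimetypes` module:

    ```python
    import base64
    import mimetypes


    def image_to_data_url(image_path):
        """Hypothetical helper: return a base64 data URL for a local image file."""
        # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
        mime_type, _ = mimetypes.guess_type(image_path)
        if mime_type is None or not mime_type.startswith("image/"):
            raise ValueError(f"Cannot infer an image MIME type for {image_path!r}")
        with open(image_path, "rb") as image_file:
            encoded = base64.b64encode(image_file.read()).decode("utf-8")
        return f"data:{mime_type};base64,{encoded}"
    ```

    The returned string can be used as the `url` value of an `image_url` content part, in place of the f-string in the example above.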
  </Accordion>

  <Accordion title="Media Resolution">
    **Supported Providers:** `OpenAI`, `Azure OpenAI`, `Google Gemini`, `Google Vertex AI`, `xAI`

    The `detail` parameter in the `image_url` object allows you to control the resolution at which images are processed. This helps balance between response quality, latency, and cost.

    **Supported Values:** `low`, `high`, `auto`

    ### Example Usage

    ```python theme={"dark"}
    import base64

    from openai import OpenAI

    API_KEY = "your_truefoundry_api_key"
    BASE_URL = "{GATEWAY_BASE_URL}"

    # Read and encode the image as base64
    with open("test-img.png", "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL
    )

    response = client.chat.completions.create(
        model="test-123/gemini-3-pro-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                            "detail": "low"  # Options: "low", "high", "auto"
                        }
                    }
                ]
            }
        ]
    )

    print(response.choices[0].message)
    ```

    <Note>
      For Google Gemini and Vertex AI providers, the `detail` parameter is automatically translated to the `mediaResolution` parameter:

      * `"low"` → `MEDIA_RESOLUTION_LOW` (64 tokens)
      * `"high"` → `MEDIA_RESOLUTION_HIGH` (256+ tokens with scaling)
      * `"auto"` or omitted → No explicit media resolution (model decides)
    </Note>
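
    The mapping in the note above can be written out as a small lookup. This is only an illustrative sketch of the documented translation, not the gateway's actual implementation; the names `DETAIL_TO_MEDIA_RESOLUTION` and `detail_to_media_resolution` are hypothetical:

    ```python
    # Documented translation of `detail` values for Google Gemini and Vertex AI.
    DETAIL_TO_MEDIA_RESOLUTION = {
        "low": "MEDIA_RESOLUTION_LOW",
        "high": "MEDIA_RESOLUTION_HIGH",
    }


    def detail_to_media_resolution(detail=None):
        """Hypothetical helper: return the media resolution for a `detail` value.

        Returns None for "auto" or an omitted value, meaning no explicit
        media resolution is set and the model decides.
        """
        if detail is None or detail == "auto":
            return None
        if detail not in DETAIL_TO_MEDIA_RESOLUTION:
            raise ValueError(f"Unsupported detail value: {detail!r}")
        return DETAIL_TO_MEDIA_RESOLUTION[detail]
    ```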
  </Accordion>

  <Accordion title="Audio">
    **Supported Models:** Google Gemini models (`Gemini 2.0 Flash`, etc.)

    Send audio files in a supported format such as MP3 or WAV, either by URL or as base64-encoded data:

    ### Using Audio URLs

    ```python theme={"dark"}
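    # Reuses the `client` created in the Images example.
    # Audio is sent through an `image_url` content part.
    response = client.chat.completions.create(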
        model="internal-google/gemini-2-0-flash",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe this audio"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://example.com/audio.wav",
                            "mime_type": "audio/wav"  # required for Gemini models
                        }
                    }
                ]
            }
        ]
    )
    ```

    ### Using Base64 Encoded Audio

    ```python theme={"dark"}
    import base64

    def encode_audio(audio_path):
        with open(audio_path, "rb") as audio_file:
            return base64.b64encode(audio_file.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="internal-google/gemini-2-0-flash",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe this audio"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:audio/wav;base64,{encode_audio('audio.wav')}"
                        }
                    }
                ]
            }
        ]
    )
    ```
  </Accordion>

  <Accordion title="Video">
    **Supported Models:** Google Gemini models (`Gemini 2.0 Flash`, etc.)

    Google Gemini models process video natively. Send a video by URL or as base64-encoded data:

    ### Using Video URLs

    ```python theme={"dark"}
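    # Reuses the `client` created in the Images example.
    # Video is sent through an `image_url` content part.
    response = client.chat.completions.create(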
        model="internal-google/gemini-2-0-flash",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe what's happening in this video"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.youtube.com/watch?v=example",
                            "mime_type": "video/mp4"  # required for Gemini models
                        }
                    }
                ]
            }
        ]
    )
    ```

    ### Using Base64 Encoded Video

    ```python theme={"dark"}
    import base64

    def encode_video(video_path):
        with open(video_path, "rb") as video_file:
            return base64.b64encode(video_file.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="internal-google/gemini-2-0-flash",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe what's happening in this video"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:video/mp4;base64,{encode_video('video.mp4')}",
                            "mime_type": "video/mp4"  # required for Gemini models
                        }
                    }
                ]
            }
        ]
    )
    ```
  </Accordion>

  <Accordion title="PDF Documents">
    **Supported Providers:** OpenAI, Bedrock, Anthropic, Google Vertex, Google Gemini

    Models can analyze and extract information from PDF files. Send the PDF as a base64-encoded data URL in a `file` content part:

    ### Using Base64 Encoded PDF

    ```python theme={"dark"}
    from openai import OpenAI

    client = OpenAI(
        api_key="your_truefoundry_api_key",
        base_url="{GATEWAY_BASE_URL}"
    )

    import base64

    # Read the PDF and encode it as a base64 string
    with open("sample.pdf", "rb") as pdf_file:
        file_data = base64.b64encode(pdf_file.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="tfy-ai-anthropic/claude-4-sonnet-20250514",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What data is in this file?"},
                    {
                        "type": "file",
                        "file": {
                            "filename": "sample.pdf",
                            "file_data": f"data:application/pdf;base64,{file_data}",
                        }
                    },
                ]
            }
        ]
    )

    print(response.choices[0].message.content)
    ```
  </Accordion>
</AccordionGroup>
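
The examples above follow one pattern: images, audio, and video travel in an `image_url` content part, while PDFs travel in a `file` content part. As a rough sketch of that pattern, the hypothetical helper `build_media_part` below (an illustration, not a gateway or SDK function) picks the part shape from the MIME type. It assumes the media is already available as a URL or a base64 data URL, and it sets `mime_type` on audio and video parts as the URL examples do for Gemini models. The default filename `document.pdf` is likewise an assumption.

```python
def build_media_part(url, mime_type, filename=None):
    """Hypothetical helper: build one content part for a chat message.

    `url` is an http(s) URL or a base64 data URL; for PDFs it is
    expected to be a base64 data URL, as in the PDF example.
    """
    if mime_type == "application/pdf":
        # PDFs use the `file` part with `filename` and `file_data`.
        return {
            "type": "file",
            "file": {"filename": filename or "document.pdf", "file_data": url},
        }
    if mime_type.startswith("image/"):
        return {"type": "image_url", "image_url": {"url": url}}
    if mime_type.startswith(("audio/", "video/")):
        # Audio and video reuse the `image_url` part and carry a `mime_type`.
        return {"type": "image_url", "image_url": {"url": url, "mime_type": mime_type}}
    raise ValueError(f"Unsupported MIME type: {mime_type!r}")
```

A user message can then combine a text part with the returned part in its `content` list, exactly as the examples above do by hand.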

### Vision

TrueFoundry supports vision models from all integrated providers as they become available. These models can analyze and interpret images alongside text, enabling multimodal AI applications.

| Provider     | Models                                                                                                                                    |
| :----------- | :---------------------------------------------------------------------------------------------------------------------------------------- |
| OpenAI       | `gpt-4-vision-preview, gpt-4o, gpt-4o-mini`                                                                                               |
| Anthropic    | `claude-3-sonnet, claude-3-haiku, claude-3-opus, claude-3.5-sonnet, claude-3.5-haiku, claude-4-opus, claude-4-sonnet, claude-3-7-sonnet`  |
| Gemini       | `gemini-1.0-pro-vision, gemini-1.5-flash, gemini-1.5-flash-8b, gemini-1.5-pro, gemini-2.5-pro, gemini-2.5-flash`                          |
| AWS Bedrock  | `anthropic.claude-3-5-sonnet, anthropic.claude-3-5-haiku, anthropic.claude-3-5-sonnet-20240620-v1:0`                                      |
| Azure OpenAI | `gpt-4-vision-preview, gpt-4o, gpt-4o-mini`                                                                                               |
| xAI          | `grok-2-vision-1212`                                                                                                                      |

### Using Vision Models with OpenAI SDK

```python theme={"dark"}
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="{GATEWAY_BASE_URL}"
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message)
```