# Prompt Caching

> Optimize API usage and reduce costs by caching prompt prefixes.

Prompt caching lets a request resume from a previously processed prefix of your prompt instead of reprocessing that prefix every time. This reduces processing time and cost for repetitive tasks and for prompts that share consistent elements across requests.

<Note>
  Currently, only Anthropic models support this caching feature. See [Anthropic documentation](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) for more details.
</Note>

## Minimum Cacheable Length

| Model                                                                               | Minimum Token Length |
| ----------------------------------------------------------------------------------- | -------------------- |
| Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5, Claude Opus 3 | 1024 tokens          |
| Claude Haiku 3.5, Claude Haiku 3                                                    | 2048 tokens          |
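
The thresholds in the table can be encoded as a small lookup to decide whether marking a prefix for caching is worthwhile. This is a minimal sketch: the family labels and the `meets_cache_minimum` helper are hypothetical names, and the token count is assumed to come from whatever tokenizer or usage data you already have.

```python
# Minimum cacheable prompt length per model family, copied from the table above.
# The keys are illustrative labels, not gateway model identifiers.
MIN_CACHEABLE_TOKENS = {
    "claude-opus-4": 1024,
    "claude-sonnet-4": 1024,
    "claude-sonnet-3.7": 1024,
    "claude-sonnet-3.5": 1024,
    "claude-opus-3": 1024,
    "claude-haiku-3.5": 2048,
    "claude-haiku-3": 2048,
}


def meets_cache_minimum(model_family: str, prefix_tokens: int) -> bool:
    """Return True if a prefix of `prefix_tokens` tokens is long enough to cache."""
    return prefix_tokens >= MIN_CACHEABLE_TOKENS[model_family]
```

For example, `meets_cache_minimum("claude-haiku-3.5", 1500)` is `False`, because that family needs at least 2048 tokens.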

## Usage

<Note>
  This feature is only available through direct REST API calls. The OpenAI SDK doesn't recognize the `cache_control` field.
</Note>

Add the `cache_control` parameter to any message content you want to cache:

```python lines theme={"dark"}
import requests

# Placeholders: replace with your gateway base URL and your TrueFoundry API key
URL = "{GATEWAY_BASE_URL}/chat/completions"
API_KEY = "TFY_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-TFY-LOGGING-CONFIG": '{"enabled": true}'
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<TEXT_TO_CACHE>",
                    "cache_control": {
                        "type": "ephemeral"
                    }
                }
            ]
        }
    ],
    "model": "MODEL_NAME",
    "stream": True
}

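# The payload sets "stream": True, so read the HTTP response incrementally as well
response = requests.post(URL, headers=headers, json=payload, stream=True)
response.raise_for_status()

# Print each non-empty streamed line as it arrives
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))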
```
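
In practice the cached prefix is usually followed by text that changes per request. The sketch below builds such a payload. `build_cached_payload` is a hypothetical helper, and it assumes the gateway accepts several text blocks in one message, with only the block carrying `cache_control` (and anything before it) treated as the cacheable prefix, as described in the Anthropic documentation.

```python
def build_cached_payload(model: str, cached_text: str, question: str) -> dict:
    """Build a chat payload whose first text block is marked for caching.

    Hypothetical helper: `cached_text` is the long, stable prefix and
    `question` is the part that varies between requests.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": cached_text,
                        # Marks the end of the prefix to cache
                        "cache_control": {"type": "ephemeral"},
                    },
                    # Left unmarked, so it is not part of the cached prefix
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
```

The returned dictionary can be passed as the `json` argument of `requests.post`, in the same way as `payload` in the example above.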

## Monitoring Cache Performance

To monitor cache performance, check the following fields under `usage` in the response (or in the `message_start` event when streaming):

* `cache_creation_input_tokens`: Tokens written to the cache when creating a new entry
* `cache_read_input_tokens`: Tokens retrieved from the cache for this request
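
A quick way to act on these fields is to summarize them per request. The following is a minimal sketch, assuming you have already parsed the `usage` object into a Python dictionary. `summarize_cache_usage` is a hypothetical helper, and the numbers in the usage example are made up for illustration.

```python
def summarize_cache_usage(usage: dict) -> dict:
    """Summarize cache activity from a parsed `usage` object.

    Missing or null fields are treated as 0, since a response may omit them
    when caching was not used.
    """
    created = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = created + read
    return {
        "cache_creation_input_tokens": created,
        "cache_read_input_tokens": read,
        # Any tokens read from the cache count as a hit
        "cache_hit": read > 0,
        # Share of cache-related tokens that were reads rather than writes
        "read_fraction": read / total if total else 0.0,
    }


# Illustrative values only: the first request writes the cache, the second reads it
first = summarize_cache_usage({"cache_creation_input_tokens": 2000, "cache_read_input_tokens": 0})
second = summarize_cache_usage({"cache_creation_input_tokens": 0, "cache_read_input_tokens": 2000})
```

Here `first["cache_hit"]` is `False` and `second["cache_hit"]` is `True`, which matches the expected pattern of one cache write followed by reads on later requests with the same prefix.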

<img src="https://mintcdn.com/truefoundry/jw406UAsc7ErYUq8/images/Screenshot2025-07-25at3.50.35PM.png?fit=max&auto=format&n=jw406UAsc7ErYUq8&q=85&s=c1d1fd247606e1d69577c16cab6c017a" alt="Cache performance metrics" width="2536" height="1310" data-path="images/Screenshot2025-07-25at3.50.35PM.png" />
