# Reasoning Models (Claude Only)

> Access the reasoning (thinking) tokens that Claude models produce alongside their answers.

TrueFoundry AI Gateway provides access to model reasoning processes through thinking/reasoning tokens, currently available for `Claude 3.7 Sonnet` (via `Anthropic`, `AWS Bedrock`, and `Google Vertex AI`).

These models can return their intermediate reasoning as thinking tokens, so you can see step by step how they arrive at an answer.

## Enabling Reasoning Tokens

To enable thinking/reasoning tokens, your request must include:

1. The header: `X-TFY-STRICT-OPENAI: false`
2. A `thinking` field in the request body

```python lines theme={"dark"}
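import requests

# Placeholders: substitute your gateway base URL and API key.
GATEWAY_BASE_URL = "<GATEWAY_BASE_URL>"
API_KEY = "<API_KEY>"

url = f"{GATEWAY_BASE_URL}/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    # Required for thinking blocks to be returned
    "X-TFY-STRICT-OPENAI": "false"
}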

payload = {
    "messages": [
        {"role": "user", "content": "How to compute 3^3^3?"}
    ],
    "model": "anthropic/claude-3-7",
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000
    },
    "max_tokens": 18000
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors early
print(response.json())
```

<Note>
  When the `X-TFY-STRICT-OPENAI` header is set to `false`, the response no longer follows the OpenAI response schema: it can carry `thinking` data, which the OpenAI chat completion format does not define.
</Note>
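The request body above can also be assembled by a small helper that checks the thinking budget before sending. This is a minimal sketch: `build_thinking_payload` is a hypothetical name, not part of any TrueFoundry SDK. The two checks reflect Anthropic's documented constraints for extended thinking (a minimum budget of 1,024 tokens, and `budget_tokens` below `max_tokens`); whether the gateway validates these itself is an assumption this page does not confirm.

```python
from typing import Any, Dict, List

# Anthropic's documented minimum for the thinking budget (assumed to apply via the gateway).
MIN_BUDGET_TOKENS = 1024


def build_thinking_payload(
    model: str,
    messages: List[Dict[str, Any]],
    budget_tokens: int,
    max_tokens: int,
) -> Dict[str, Any]:
    """Build a chat completion body with thinking enabled (hypothetical helper)."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        # The thinking budget counts toward max_tokens, so leave room for the answer.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": model,
        "messages": messages,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "max_tokens": max_tokens,
    }


payload = build_thinking_payload(
    model="anthropic/claude-3-7",
    messages=[{"role": "user", "content": "How to compute 3^3^3?"}],
    budget_tokens=16000,
    max_tokens=18000,
)
```

The resulting `payload` has the same shape as the request body shown earlier and can be passed to `requests.post(..., json=payload)`.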

## Response Format

When reasoning tokens are enabled, `message.content` is a list of typed blocks: a `thinking` block with the model's reasoning and a `text` block with the final answer.

```json lines theme={"dark"}
{
  "id": "1742890579083",
  "object": "chat.completion",
  "created": 1742890579,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "thinking",
            "thinking": "The user has asked a complex question about quantum mechanics. To provide a useful answer, I should first break down the core concepts and then explain them in simple terms before diving into advanced details."
          },
          {
            "type": "text",
            "text": "Quantum mechanics is a branch of physics that explains how particles behave at very small scales. Unlike classical physics, where objects have definite positions and velocities, quantum particles exist in a superposition of states until measured. Would you like a more detailed explanation or examples?"
          }
        ]
      },
      "finish_reason": "end_turn"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 180,
    "total_tokens": 225
  }
}
```
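The `content` list in the response above can be split into reasoning and answer text with a short helper. This is a sketch that assumes the response shape shown above; `split_thinking` is a hypothetical helper name, and the fallback for a plain-string `content` is an assumption about responses without thinking blocks.

```python
from typing import Any, Dict, Tuple


def split_thinking(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (thinking, text) from the first choice of a chat completion."""
    content = response["choices"][0]["message"]["content"]

    # Assumption: without thinking blocks, content may be a plain string.
    if isinstance(content, str):
        return "", content

    thinking_parts, text_parts = [], []
    for block in content:
        if block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    return "".join(thinking_parts), "".join(text_parts)


# Abbreviated sample in the same shape as the response above.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": [
                    {"type": "thinking", "thinking": "Break the question down first."},
                    {"type": "text", "text": "Quantum mechanics describes small scales."},
                ],
            },
            "finish_reason": "end_turn",
        }
    ]
}

thinking, text = split_thinking(sample_response)
print("Reasoning:", thinking)
print("Answer:", text)
```

Keeping the two parts separate makes it easy to log the reasoning for debugging while showing only the answer text to end users.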

## Streaming with Reasoning Tokens

In streaming responses, the thinking chunks are sent before the content chunks.

### Thinking Token Chunk

```json lines theme={"dark"}
{
  "id": "aws-1742890615621",
  "object": "chat.completion.chunk",
  "created": 1742890615,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "thinking": "The user is asking about the differences between AI and machine learning. I should start by defining AI in general and then narrow down to how ML fits into it."
      }
    }
  ]
}
```

### Content Token Chunk

```json lines theme={"dark"}
{
  "id": "aws-1742890615621",
  "object": "chat.completion.chunk",
  "created": 1742890615,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Artificial Intelligence (AI) is a broad field of computer science focused on building systems that can perform tasks requiring human intelligence. Machine Learning (ML) is a subset of AI that enables computers to learn patterns from data and improve performance over time without explicit programming."
      }
    }
  ]
}
```

As the examples show, each chunk's `delta` carries either a `thinking` field or a `content` field, and chunks from the same stream share the same `id`.
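The two chunk types can be accumulated separately as they arrive. The sketch below assumes each chunk has already been decoded into a dictionary with the shape shown above; decoding the stream from the wire is not covered here, and `accumulate_stream` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, Tuple


def accumulate_stream(chunks: Iterable[Dict[str, Any]]) -> Tuple[str, str]:
    """Concatenate thinking and content deltas from decoded stream chunks."""
    thinking_parts, content_parts = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # A delta carries either a thinking fragment or a content fragment.
            if delta.get("thinking"):
                thinking_parts.append(delta["thinking"])
            if delta.get("content"):
                content_parts.append(delta["content"])
    return "".join(thinking_parts), "".join(content_parts)


# Abbreviated sample chunks in the same shape as the examples above.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "thinking": "Define AI, "}}]},
    {"choices": [{"index": 0, "delta": {"role": "assistant", "thinking": "then ML."}}]},
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "AI is broad; "}}]},
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "ML is a subset."}}]},
]

streamed_thinking, streamed_content = accumulate_stream(sample_chunks)
print("Reasoning:", streamed_thinking)
print("Answer:", streamed_content)
```

Because the helper checks each field independently, it does not depend on the order in which thinking and content chunks arrive.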
