Code Mode - TrueFoundry Docs

Why Code Mode?

When an agent uses tools the standard way, every tool call is a separate round-trip with the model: the agent calls one MCP tool, the full JSON response enters the conversation, the model reasons over that JSON, and then it calls the next tool, copying IDs and computing counts in prose along the way. For tasks that aggregate tool output or chain several calls together, this approach is slow, fills the context window with intermediate JSON the user doesn’t care about, and is prone to small errors such as miscounts and mistyped IDs.

Take a common task: “How many open PRs does each contributor have on this repo?” The user only cares about a small summary table, but the underlying GitHub tool returns a full record per PR: title, labels, reviewers, timestamps, and more. Without Code Mode, every one of those records lands in context and the model has to count from prose. With Code Mode, the agent calls the same tool inside a script, runs Counter on the author logins, and prints only the table.
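To make the context cost concrete, here is a minimal sketch that compares the size of a raw tool response with the size of the summary a script would print. The PR records are made-up sample data, the field names are assumptions for illustration rather than the real GitHub tool output, and character count is only a rough proxy for tokens.

```python
import json
from collections import Counter

# Hypothetical sample records standing in for a real tool response.
AUTHORS = ["alice", "bob", "alice", "carol", "alice", "bob"]
SAMPLE_PRS = [
    {
        "number": 100 + i,
        "title": f"Update docs page {i}",
        "labels": ["docs", "needs-review"],
        "user": {"login": login},
        "created_at": "2024-01-01T00:00:00Z",
    }
    for i, login in enumerate(AUTHORS)
]


def summarize(prs):
    """Return a small 'author: count' table as text."""
    counts = Counter(pr["user"]["login"] for pr in prs)
    return "\n".join(f"{login}: {n}" for login, n in counts.most_common())


# Size of what the model would see in each mode (characters, not tokens).
raw_chars = len(json.dumps(SAMPLE_PRS))
summary = summarize(SAMPLE_PRS)

print(summary)
print(f"raw response: {raw_chars} chars, summary: {len(summary)} chars")
```

Even with six short records the summary is a small fraction of the raw payload, and the gap widens as the number of records and fields per record grows.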

What is Code Mode?

Code Mode collapses tool round-trips into a single script. Using the Sandbox, the agent writes Python that calls MCP tools through an in-sandbox MCP client, processes the responses in code, and prints only what the user actually needs. The harness also exposes helpers for the agent to discover available tools and inspect their output schemas before writing the script.

A Code Mode script reads naturally. The agent picks a tool, calls it, and prints only the fields it cares about:

import asyncio
from mcp_client import call_tool

async def main():
    result = await call_tool(
        "github-mcp",
        "list_pull_requests",
        body={"owner": "truefoundry", "repo": "docs-mintlify", "state": "open"},
    )
    for pr in result["pull_requests"]:
        print(pr["number"], pr["title"], pr["user"]["login"])

asyncio.run(main())

End-to-end, the script runs in the sandbox, MCP tool calls flow through the gateway, and only the printed output returns to the agent’s context.

Code Mode is enabled by default. The agent decides at runtime whether a task is worth running in code or whether a single direct tool call is enough.
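To try the script pattern outside the sandbox, one option is to inject a stand-in for call_tool. The sketch below is illustrative only: fake_call_tool, list_open_prs, and the canned response are hypothetical, and only the call shape (server name, tool name, body) is taken from the script above.

```python
import asyncio

# Hypothetical canned responses keyed by (server, tool). Not real tool output.
CANNED = {
    ("github-mcp", "list_pull_requests"): {
        "pull_requests": [
            {"number": 1, "title": "Fix typo", "user": {"login": "alice"}},
            {"number": 2, "title": "Add guide", "user": {"login": "bob"}},
        ]
    }
}


async def fake_call_tool(server, tool, body=None):
    """Stand-in for the sandbox's call_tool; returns canned data only."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return CANNED[(server, tool)]


async def list_open_prs(call_tool):
    """Same logic as the script above, with call_tool passed in for testing."""
    result = await call_tool(
        "github-mcp",
        "list_pull_requests",
        body={"owner": "truefoundry", "repo": "docs-mintlify", "state": "open"},
    )
    return [
        (pr["number"], pr["title"], pr["user"]["login"])
        for pr in result["pull_requests"]
    ]


rows = asyncio.run(list_open_prs(fake_call_tool))
for row in rows:
    print(*row)
```

Passing the tool-calling function as an argument keeps the processing logic testable without a gateway connection; inside the sandbox the real call_tool would be passed instead.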

When does the Agent use Code Mode?

The agent picks Code Mode when the task fits one of two patterns where running the work in code is materially better than reasoning over raw JSON in chat:

Aggregate or format tool output — counts, group-bys, sums, filters, or formatted tables over the response of a single tool call.
Chain tool calls — one tool’s output feeds another tool’s input, and the intermediate JSON is plumbing the user doesn’t need to see.

Each pattern is illustrated with a worked example below.

Aggregate or format tool output

When a task asks for a count, group-by, sum, filter, or formatted table over the response of a tool call, doing the math in code is far more reliable than asking the model to do it from prose, and it keeps raw records out of context entirely.

Example: The user wants open PRs on a repo grouped by contributor, with a count per author. The user only cares about a small summary table, but the tool response contains a full record per PR with titles, labels, and metadata.

	With Code Mode	Without Code Mode
Approach	Call the tool inside a script, group with `Counter` on author login, print only the summary table.	Full tool response enters the conversation. The model reads every PR and produces counts per author in natural language.
Cost	Only the summary table reaches the model — minimal tokens.	Dozens of PR records (titles, labels, metadata) sit in context even though the user only asked for counts — context bloat.
Reliability	Counts are computed by code, so they are exact.	The model can miscount PRs, merge two usernames, or invent a contributor that wasn’t in the response — group-by and count tasks are especially hallucination-prone when done from prose.

Here is the full agent trace for this example:

User

How many open PRs does each contributor have on truefoundry/docs-mintlify?

Assistant

Let me check the output schema for the GitHub list-pull-requests tool, then count PRs per author in one script.

The agent calls get_tool_output_schema so that it knows the shape of the tool response before writing a Code Mode script. With a defined outputSchema, the script can read and transform fields safely instead of guessing keys or types from raw JSON.

get_tool_output_schema

Tool Call

{
  "mcp_server": "github-mcp",
  "tool_name": "list_pull_requests"
}

Tool Response

{
  "outputSchema": {
    "type": "object",
    "properties": {
      "pull_requests": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "number": { "type": "integer" },
            "title": { "type": "string" },
            "user": {
              "type": "object",
              "properties": {
                "login": { "type": "string" }
              },
              "required": ["login"]
            }
          },
          "required": ["number", "title", "user"]
        }
      }
    },
    "required": ["pull_requests"]
  }
}
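As a sketch of how an outputSchema like the one above translates into the field paths a script can rely on, the helper below walks the schema recursively. field_paths is a hypothetical helper, not part of the harness, and it handles only the properties, items, and required keywords.

```python
# The outputSchema from the tool response above, as a Python dict.
SCHEMA = {
    "type": "object",
    "properties": {
        "pull_requests": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "number": {"type": "integer"},
                    "title": {"type": "string"},
                    "user": {
                        "type": "object",
                        "properties": {"login": {"type": "string"}},
                        "required": ["login"],
                    },
                },
                "required": ["number", "title", "user"],
            },
        }
    },
    "required": ["pull_requests"],
}


def field_paths(schema, prefix=""):
    """Yield (path, type, required) for each property in a JSON Schema."""
    required = set(schema.get("required", []))
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        yield path, sub.get("type", "any"), name in required
        if sub.get("type") == "object":
            # Nested object: descend into its own properties.
            yield from field_paths(sub, path)
        elif sub.get("type") == "array" and isinstance(sub.get("items"), dict):
            # Array: descend into the item schema, marking the path with [].
            yield from field_paths(sub["items"], path + "[]")


for path, type_name, is_required in field_paths(SCHEMA):
    marker = "required" if is_required else "optional"
    print(f"{path}: {type_name} ({marker})")
```

Every path reported as required can be indexed directly, as in pr["user"]["login"]; optional paths are better read with .get() and a default.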

Assistant

sandbox

Tool Call

cat << 'EOF' > /tmp/prs_by_author.py
import asyncio
from collections import Counter
from mcp_client import call_tool

async def main():
    result = await call_tool(
        "github-mcp",
        "list_pull_requests",
        body={
            "owner": "truefoundry",
            "repo": "docs-mintlify",
            "state": "open"
        }
    )

    prs = result.get("pull_requests", [])
    counts = Counter(pr["user"]["login"] for pr in prs)

    print(f"Open PRs: {len(prs)}\n")
    print(f"{'Author':<24} {'Count':>6}")
    print("-" * 32)
    for login, count in counts.most_common():
        print(f"{login:<24} {count:>6}")

asyncio.run(main())
EOF
python3 /tmp/prs_by_author.py

Tool Response

Open PRs: 7

Author                    Count
--------------------------------
alice-dev                     3
bob-eng                       2
carol-docs                    1
deb-docs                      1

Assistant

On truefoundry/docs-mintlify there are 7 open PRs across 4 contributors: alice-dev (3), bob-eng (2), carol-docs (1), and deb-docs (1).
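The same pattern extends from counting to the other operations listed for this category, such as filtering and grouping. The sketch below is an illustration under assumptions: the sample records are made up to match the outputSchema in the trace, and skipping titles that start with "WIP" is a hypothetical filter, not something the tool or the harness defines.

```python
from collections import defaultdict

# Hypothetical records shaped like the outputSchema shown in the trace.
SAMPLE_PRS = [
    {"number": 11, "title": "Fix broken link", "user": {"login": "alice-dev"}},
    {"number": 12, "title": "Add new guide", "user": {"login": "bob-eng"}},
    {"number": 13, "title": "Update quickstart", "user": {"login": "alice-dev"}},
    {"number": 14, "title": "WIP: API page", "user": {"login": "alice-dev"}},
]


def group_numbers_by_author(prs, skip_prefix="WIP"):
    """Group PR numbers by author login, skipping titles with skip_prefix."""
    groups = defaultdict(list)
    for pr in prs:
        if pr["title"].startswith(skip_prefix):
            continue  # filter step: drop work-in-progress PRs
        groups[pr["user"]["login"]].append(pr["number"])
    return dict(groups)


def format_table(groups):
    """Render one row per author, most PRs first, ties broken by login."""
    lines = [f"{'Author':<16} {'PRs':>4}  Numbers"]
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for login, numbers in ordered:
        listed = ", ".join(f"#{n}" for n in numbers)
        lines.append(f"{login:<16} {len(numbers):>4}  {listed}")
    return "\n".join(lines)


groups = group_numbers_by_author(SAMPLE_PRS)
print(format_table(groups))
```

Because the filter and the grouping both run in code, only the rendered table would reach the model's context, regardless of how many records were dropped along the way.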

Chain tool calls

When one tool’s output feeds the input of another (for example, resolving an entity’s ID before fetching its details), Code Mode lets the agent make both calls in a single script. The intermediate response never enters the conversation, so the model never has to retype an ID and no extra model turn sits between the tool calls.

Example: The user wants to know which metrics are available for app my-app in the tfy-test-cluster:test-ws workspace. To answer this, the agent first has to resolve the app to its internal id and workspaceId, then use those values to fetch the chart list.

	With Code Mode	Without Code Mode
Approach	Run `list_applications` and `list_app_metric_charts` in one script. The script reads `id` and `workspaceId` from the first response and passes them straight into the second call, then prints only the chart table.	Call the first tool, wait for a model turn, copy IDs out of the JSON into the second tool call, then wait for another turn.
Cost	Intermediate response stays in the sandbox — only the final chart table reaches context.	Full applications response (pagination, every app field) sits in the conversation before the second call runs — context bloat.
Latency	One model turn covers both tool calls.	Each tool hop needs another model call in between — extra latency.
Reliability	IDs flow inside the script — no copy-paste.	The model can typo an `applicationId` or `workspaceId`, pick the wrong app from a long list, or drop a required field so the second call fails — hallucination risk.

Here is the full agent trace for this example:

User

What metrics are available for app my-app in the tfy-test-cluster:test-ws workspace?

Assistant

I need schemas for list_applications and list_app_metric_charts, then I will resolve the app and list its charts in one script.

get_tool_output_schema

Tool Call

{
  "mcp_server": "truefoundry-mcp",
  "tool_name": "list_applications"
}

Tool Response

{
  "outputSchema": {
    "type": "object",
    "properties": {
      "data": { "type": "array" },
      "pagination": { "type": "object" }
    },
    "required": ["data", "pagination"]
  }
}

get_tool_output_schema

Tool Call

{
  "mcp_server": "truefoundry-mcp",
  "tool_name": "list_app_metric_charts"
}

Tool Response

{
  "outputSchema": {
    "type": "object",
    "properties": {
      "step": { "type": "string" },
      "graphs": { "type": "array" }
    },
    "required": ["step", "graphs"]
  }
}

Assistant

sandbox

Tool Call

cat << 'EOF' > /tmp/get_metrics.py
import asyncio
from mcp_client import call_tool

async def main():
    apps_result = await call_tool(
        "truefoundry-mcp",
        "list_applications",
        body={
            "applicationName": "my-app",
            "workspaceFqn": "tfy-test-cluster:test-ws"
        }
    )

    apps = apps_result.get("data", [])
    if not apps:
        print("No application found")
        return

    app = apps[0]
    app_id = app["id"]
    workspace_id = app["workspaceId"]
    print(f"Found app: {app['name']} | ID: {app_id} | WorkspaceID: {workspace_id}")

    charts_result = await call_tool(
        "truefoundry-mcp",
        "list_app_metric_charts",
        body={
            "applicationId": app_id,
            "workspaceId": workspace_id,
            "filterEntity": "application"
        }
    )

    graphs = charts_result.get("graphs", [])
    step = charts_result.get("step", "N/A")
    print(f"\nQuery step: {step}")
    print(f"Charts available: {len(graphs)}\n")
    print(f"{'#':<4} {'Chart ID':<50} {'Name':<45} {'Type':<25}")
    print("-" * 130)
    for i, g in enumerate(graphs, 1):
        print(f"{i:<4} {g['chart']:<50} {g['name']:<45} {g['chartType']:<25}")

asyncio.run(main())
EOF
python3 /tmp/get_metrics.py

Tool Response

Found app: my-app | ID: sm3ce3pewjy647a04fsp14z3 | WorkspaceID: cm1j7d9x5002101tu35bvadt8

Query step: 1800s
Charts available: 12

#    Chart ID                                           Name                                          Type
----------------------------------------------------------------------------------------------------------------------------------
1    cpuUsage                                           Total CPU Usage                               line
2    memoryUsage                                        Memory Usage                                  line
3    networkBytes                                       Network Usage                                 line
4    requestVolume                                      Request Volume                                stacked_area
5    requestDuration                                    Request Duration                              line
6    diskThroughput                                     Disk Throughput                               line
7    podStatus                                          Pod Status                                    stacked_area
8    quantileOverTimeCpuUsage                           Peak and P50 CPU Usage                        line
9    cpuThrottling                                      CPU Throttling                                line
10   containerRestarts                                  Container Restarts                            bubble
11   probeFailures                                      Readiness/Liveness probe failures             bubble
12   imagePullTime                                      Image Pull Time                               bubble

Assistant

my-app in tfy-test-cluster:test-ws has 12 metric charts (CPU, memory, network, request volume, and others). Query step is 1800s.
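The script in the trace takes the first element of the applications list. If the first call could ever return more than one entry, a stricter selection step keeps the chain from silently resolving the wrong app. The sketch below is one possible approach, not the harness's behavior: pick_application is a hypothetical helper, the sample entries are made up, and the field names (id, name, workspaceId) are taken from the script in the trace.

```python
# Hypothetical items shaped like entries of the "data" array.
SAMPLE_APPS = [
    {"id": "app-1", "name": "my-app", "workspaceId": "ws-1"},
    {"id": "app-2", "name": "my-app-staging", "workspaceId": "ws-1"},
]


def pick_application(apps, name):
    """Return the single app whose name matches exactly, or raise."""
    matches = [app for app in apps if app.get("name") == name]
    if not matches:
        raise LookupError(f"No application named {name!r}")
    if len(matches) > 1:
        ids = ", ".join(app["id"] for app in matches)
        raise LookupError(f"Multiple applications named {name!r}: {ids}")
    return matches[0]


app = pick_application(SAMPLE_APPS, "my-app")
# These two values would feed the second tool call in the chain.
print(app["id"], app["workspaceId"])
```

Raising on zero or multiple matches means the script fails with a short, readable message instead of passing a wrong ID into the second call.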
