When an agent is connected to many MCP servers, each with dozens of tools, the full set of tool definitions can consume a large portion of the context window before the user even sends a message. Deferred Tool Loading solves this by giving the agent only the names and descriptions of MCP servers upfront and letting it discover and load individual tool definitions on demand. This keeps the context window lean at the start of every conversation while still giving the agent access to the full breadth of tools when it needs them.

The problem

  • Each tool definition — name, description, input schema, output schema — consumes tokens from the available context window.
  • With many MCP servers and tools, the context window can fill up before any user interaction happens.
  • Most interactions only need a small subset of the available tools, not all of them.
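
To make the cost concrete, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes about 4 characters per token of serialized JSON (a common heuristic, not a real tokenizer), and the tool definitions and server summary are hypothetical placeholders invented for illustration.

```python
import json


def estimate_tokens(obj) -> int:
    """Rough estimate: ~4 characters per token of serialized JSON (heuristic only)."""
    return len(json.dumps(obj)) // 4


def make_tool(i: int) -> dict:
    """Build a hypothetical tool definition; real definitions are often larger."""
    return {
        "name": f"tool_{i}",
        "description": "Hypothetical tool used only to illustrate definition size.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "workspaceFqn": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["workspaceFqn"],
        },
    }


# Upfront loading: every tool definition is placed in context.
all_tools = [make_tool(i) for i in range(100)]

# Deferred loading: only the server's name and description are placed in context.
server_summary = {
    "name": "example-server",
    "description": "Hypothetical MCP server with 100 tools.",
}

upfront_cost = estimate_tokens(all_tools)
deferred_cost = estimate_tokens(server_summary)

print(f"upfront: ~{upfront_cost} tokens, deferred: ~{deferred_cost} tokens")
```

In deferred mode the meta tools themselves also take up some context, and each definition loaded later adds to it, so the real saving depends on how many tools a conversation actually touches.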

For deferred MCP servers, the model starts with only each server's name and description in context. Individual tool definitions are withheld; instead, the model is given meta tools (list_tools, get_tool_info, and call_tool in the example below) for discovering and invoking tools. The Agent Harness loads tool definitions on demand and runs the actual MCP tool when asked.
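
As a minimal sketch of the idea, the three meta tools can be modeled as lookups over a registry. This is not the Agent Harness implementation: the in-memory `REGISTRY`, the `example-server` entry, its `add` and `echo` tools, and the `tool_input` parameter name are all assumptions made for illustration. A real harness would forward these calls to the MCP server.

```python
from typing import Any, Callable, Dict, List, Tuple

ToolEntry = Tuple[Dict[str, Any], Callable[..., Any]]

# Hypothetical registry: server name -> tool name -> (definition, handler).
REGISTRY: Dict[str, Dict[str, ToolEntry]] = {
    "example-server": {
        "add": (
            {
                "description": "Add two integers.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "a": {"type": "integer"},
                        "b": {"type": "integer"},
                    },
                    "required": ["a", "b"],
                },
            },
            lambda a, b: a + b,
        ),
        "echo": (
            {
                "description": "Return the given text unchanged.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"text": {"type": "string"}},
                    "required": ["text"],
                },
            },
            lambda text: text,
        ),
    }
}


def list_tools(mcp_server: str) -> List[str]:
    """Return only tool names, which is cheap to place in context."""
    return sorted(REGISTRY[mcp_server])


def get_tool_info(mcp_server: str, tool_name: str) -> Dict[str, Any]:
    """Return the full definition of one tool, loaded only when requested."""
    definition, _handler = REGISTRY[mcp_server][tool_name]
    return definition


def call_tool(mcp_server: str, tool_name: str, tool_input: Dict[str, Any]) -> Any:
    """Run the underlying tool with the given input."""
    _definition, handler = REGISTRY[mcp_server][tool_name]
    return handler(**tool_input)


print(list_tools("example-server"))
print(get_tool_info("example-server", "add")["description"])
print(call_tool("example-server", "add", {"a": 2, "b": 3}))
```

The model pays for a full definition only when it calls `get_tool_info`, which is what keeps the initial context small.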

Example

In this example, truefoundry-mcp (100+ tools) is added in “deferred” mode, and a web_search MCP server with 2 tools is added without “deferred” mode.

User

What vllm version is “nomic-embed-multimodal-7b-1” app in “tfy-test-cluster:test-ws” workspace using? Can you please get me the github release notes link of that version?

Assistant

The agent first lists all tools available on the truefoundry-mcp MCP server.

list_tools

Tool Call
{
  "mcp_server": "truefoundry-mcp"
}
Tool Response
truefoundry-mcp: list_applications,search_workspaces,get_workspace,get_application,list_app_metric_charts,list_app_metric_charts, ...

Assistant

The agent then gets the tool information for the list_applications tool.

get_tool_info

Tool Call
{
  "mcp_server": "truefoundry-mcp",
  "tool_name": "list_applications"
}
Tool Response
{
  "outputSchema": {
    "type": "object",
    "properties": {
      "data": { "type": "array", ... },
      "pagination": { "type": "object", ... }
    },
    "required": ["data", "pagination"]
  }
}

Assistant

The agent then calls the list_applications tool using the meta tool call_tool to find the application by its name.
Now I have the tool info. Let me find the application.

call_tool

Tool Call
{
  "mcp_server": "truefoundry-mcp",
  "tool_name": "list_applications",
  "input": {
    "workspaceFqn": "tfy-test-cluster:test-ws",
    "applicationName": "nomic-embed-multimodal-7b-1"
  }
}
Tool Response
{
  "data": [
    {
      "name": "nomic-embed-multimodal-7b-1",
      "type": "service",
      "lastDeployment": {
        "manifest": {
          ...
          "image": {
            "type": "image",
            "image_uri": "public.ecr.aws/truefoundrycloud/vllm/vllm-openai:v0.13.0"
            ...
          },
          ...
        }
      }
    }
  ],
  "pagination": { ... }
}

Assistant

I can see the vLLM version from the image URI: v0.13.0. Now let me get the GitHub release notes link for that version.
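
The version here is simply the tag portion of the container image reference. A minimal sketch of that extraction follows; the helper name `image_tag` is hypothetical, and in the example above the model does this step itself rather than through code.

```python
from typing import Optional


def image_tag(image_uri: str) -> Optional[str]:
    """Return the tag of a container image reference, or None if it has no tag."""
    # Drop a digest suffix such as "@sha256:..." if one is present.
    name = image_uri.split("@", 1)[0]
    # Only look at the last path segment, so a registry port
    # (e.g. "localhost:5000/app") is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


uri = "public.ecr.aws/truefoundrycloud/vllm/vllm-openai:v0.13.0"
print(image_tag(uri))
```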

search

Assistant