What is context engineering?
Context engineering is providing the right information in the right format so that the agent can accomplish tasks reliably. In Agent Harness, context is everything the model sees at each step, from system instructions and skills to tool results and conversation history. Getting context right directly affects agent quality. With too little context, the agent lacks the information it needs to act; with too much, the context overflows the model’s context window or buries the relevant signal in noise. Agent Harness manages this balance automatically.
What fills the agent context
Each turn, the model receives a stacked context built from multiple sources:
| Context Source | When Loaded | Controllable? |
|---|---|---|
| System prompt and instructions | Always at start | Yes — edit in agent builder |
| Skills (SKILL.md content) | Preloaded or on-demand (progressive disclosure) | Yes — per-skill preload toggle |
| MCP tool definitions | Preloaded or deferred | Yes — per-server preload toggle |
| Conversation history | Grows each turn | Managed by compaction |
| Tool call results | After each tool execution | Managed by offloading |
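The table can be read as an ordered stack. The sketch below is a hypothetical illustration, not the Agent Harness API: it concatenates the sources in a fixed order and reports approximate token usage per source, using an assumed heuristic of roughly four characters per token (real tokenizers differ). A per-source breakdown like this is one way to see which source dominates the context.

```python
# Hypothetical sketch: stack context sources in order and estimate usage.
# The 4-characters-per-token heuristic is an assumption, not a real tokenizer.

SOURCE_ORDER = ["system", "skills", "tools", "history", "tool_results"]


def approx_tokens(text: str) -> int:
    """Very rough token estimate (about 4 characters per token)."""
    return 0 if not text else max(1, len(text) // 4)


def stack_context(sources: dict[str, list[str]]) -> tuple[list[str], dict[str, int]]:
    """Concatenate sources in SOURCE_ORDER and report approximate tokens per source."""
    context: list[str] = []
    usage: dict[str, int] = {}
    for key in SOURCE_ORDER:
        items = sources.get(key, [])
        context.extend(items)
        usage[key] = sum(approx_tokens(item) for item in items)
    return context, usage


if __name__ == "__main__":
    ctx, usage = stack_context({
        "system": ["You are a helpful agent."],
        "history": ["user: hi"],
    })
    print(ctx)
    print(usage)
```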
Context management strategies
Agent Harness uses a layered set of strategies to keep context within model limits while maximizing the agent’s ability to reason. Each strategy targets a different source of context growth.
Large Tool Result Offloading
When a tool returns more data than fits comfortably in context, the harness writes the full output to a sandbox file and replaces it with a short reference. The agent can read back details on demand.
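A minimal sketch of the offloading step, assuming a character-count threshold and a plain directory standing in for the sandbox filesystem. The threshold, preview length, file naming, and the names `offload_if_large` and `read_back` are all hypothetical; the harness’s actual offloading criteria are not described here.

```python
import os
import tempfile
import uuid

# Hypothetical limits; the harness's real thresholds are not documented here.
OFFLOAD_THRESHOLD_CHARS = 2_000
PREVIEW_CHARS = 200


def offload_if_large(result: str, sandbox_dir: str) -> str:
    """Return small results unchanged; write large ones to a file and
    return a short reference with a preview instead."""
    if len(result) <= OFFLOAD_THRESHOLD_CHARS:
        return result
    path = os.path.join(sandbox_dir, f"tool_result_{uuid.uuid4().hex}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(result)
    return (
        f"[Output of {len(result)} chars offloaded to {path}. "
        f"Preview: {result[:PREVIEW_CHARS]!r}]"
    )


def read_back(path: str, start: int = 0, length: int = 500) -> str:
    """Let the agent read a slice of an offloaded result on demand."""
    with open(path, encoding="utf-8") as f:
        return f.read()[start:start + length]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        print(offload_if_large("small result", sandbox))
        print(offload_if_large("row\n" * 1_000, sandbox)[:60])
```

Only the short reference string enters context; the agent pays for the full data only when it calls something like `read_back` for a specific slice.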
Subagents
Delegate focused subtasks to parallel subagents, each with its own clean context. Only the final result flows back to the root agent — intermediate tool calls never touch the main context.
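The isolation property can be sketched in a few lines. In this hypothetical example, each step is a plain function standing in for an LLM turn or tool call, subagents run in parallel threads via `concurrent.futures`, and the names `Agent`, `run_subagent`, and `delegate` are illustrative rather than part of Agent Harness. Each subagent accumulates its own messages, but only its final output is appended to the root agent’s context.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

# A step stands in for one LLM turn or tool call (hypothetical).
Step = Callable[[str], str]


@dataclass
class Agent:
    name: str
    messages: list[str] = field(default_factory=list)


def run_subagent(task: str, steps: list[Step]) -> str:
    """Run a subtask in a fresh context and return only the final output."""
    sub = Agent(name="subagent")  # clean context: nothing inherited from the root
    sub.messages.append(f"task: {task}")
    output = task
    for step in steps:
        output = step(output)
        sub.messages.append(f"step result: {output}")  # stays local to the subagent
    return output


def delegate(root: Agent, tasks: list[tuple[str, list[Step]]]) -> None:
    """Run subagents in parallel; append only their final results to the root."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda t: run_subagent(*t), tasks))
    for result in results:
        root.messages.append(f"subagent result: {result}")


if __name__ == "__main__":
    root = Agent(name="root")
    delegate(root, [("abc", [str.upper, lambda s: s + "!"]), ("xyz", [lambda s: s[::-1]])])
    print(root.messages)
```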
Deferred Tool Loading
Instead of loading all tool definitions upfront, give the agent only MCP server names and descriptions. Tool schemas are loaded on demand when the agent actually needs them.
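A hypothetical registry illustrating deferred loading: at startup it exposes one summary line per MCP server, and it fetches a server’s tool schemas the first time the agent asks for them, caching the result. The `fetch_schemas` callable stands in for whatever actually retrieves tool definitions (in MCP, a server lists its tools in response to a `tools/list` request); the class and method names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DeferredServer:
    name: str
    description: str
    fetch_schemas: Callable[[], list[dict]]  # hypothetical loader for tool schemas
    _cache: Optional[list[dict]] = field(default=None, init=False, repr=False)

    def summary(self) -> str:
        """What the agent sees at startup: name and description only."""
        return f"{self.name}: {self.description}"

    def load_schemas(self) -> list[dict]:
        """Fetch tool schemas on first use and cache them."""
        if self._cache is None:
            self._cache = self.fetch_schemas()
        return self._cache


class ToolRegistry:
    def __init__(self, servers: list[DeferredServer]) -> None:
        self.servers = {s.name: s for s in servers}

    def startup_context(self) -> list[str]:
        return [s.summary() for s in self.servers.values()]

    def expand(self, server_name: str) -> list[dict]:
        """Called when the agent decides it needs a server's tools."""
        return self.servers[server_name].load_schemas()


if __name__ == "__main__":
    registry = ToolRegistry([
        DeferredServer("github", "Issues and pull requests", lambda: [{"name": "search_issues"}]),
    ])
    print(registry.startup_context())  # ['github: Issues and pull requests']
    print(registry.expand("github"))   # [{'name': 'search_issues'}]
```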
Code Mode
The agent calls MCP tools from Python scripts in the sandbox, processing and aggregating results in code. Only the printed summary enters context — not the raw tool output.
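A sketch of the kind of script the agent might write in Code Mode. `list_issues` is a hypothetical stand-in for an MCP tool made callable from the sandbox and returns canned data here; how Agent Harness actually exposes MCP tools to scripts is not covered in this section. The raw records stay inside the script, and only the one-line summary printed at the end would enter the agent’s context.

```python
from collections import Counter


def list_issues(repo: str) -> list[dict]:
    """Hypothetical stand-in for an MCP tool; returns canned records."""
    return [
        {"id": 1, "state": "open", "label": "bug"},
        {"id": 2, "state": "closed", "label": "bug"},
        {"id": 3, "state": "open", "label": "feature"},
        {"id": 4, "state": "open", "label": "bug"},
    ]


def summarize_open_issues(repo: str) -> str:
    """Aggregate raw tool output in code and return a compact summary."""
    issues = list_issues(repo)  # raw output never leaves the script
    open_by_label = Counter(i["label"] for i in issues if i["state"] == "open")
    parts = ", ".join(f"{label}={n}" for label, n in sorted(open_by_label.items()))
    return f"{repo}: {sum(open_by_label.values())} open issues ({parts})"


if __name__ == "__main__":
    # Only this printed line would enter the agent's context.
    print(summarize_open_issues("acme/api"))  # acme/api: 3 open issues (bug=2, feature=1)
```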
Context compaction (summarization)
When context usage approaches the model’s limit (e.g. 85% of max tokens) and no more content is eligible for offloading, the harness triggers compaction:
- An LLM generates a structured summary of the conversation, including intent, artifacts created, and next steps.
- The summary replaces the full conversation history in the agent’s working context.
- Recent messages are preserved to maintain continuity.
Compaction is a lossy operation — fine-grained details from early messages may be reduced to summaries. For tasks that depend on exact earlier outputs, the agent can re-read offloaded files from the sandbox.
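The trigger and replacement logic can be sketched as follows. This is a hypothetical illustration: the 85% ratio is the example figure from the text, the token count uses a crude 4-characters-per-token estimate, `KEEP_RECENT` is an arbitrary choice, and `summarize` stands in for the LLM call that produces the structured summary. For instance, with a 128,000-token window, an 85% trigger fires at 0.85 × 128,000 = 108,800 tokens.

```python
from typing import Callable

# Hypothetical parameters for illustration.
TRIGGER_RATIO = 0.85  # compact once usage reaches 85% of the window
KEEP_RECENT = 4       # most recent messages kept verbatim


def approx_tokens(messages: list[str]) -> int:
    """Crude estimate: about 4 characters per token, plus 1 per message."""
    return sum(len(m) // 4 + 1 for m in messages)


def maybe_compact(
    messages: list[str],
    max_tokens: int,
    summarize: Callable[[list[str]], str],
    offloadable_remaining: bool = False,
) -> list[str]:
    """Replace older history with a summary when the window is nearly full."""
    if offloadable_remaining:
        return messages  # offloading is tried before compaction
    if approx_tokens(messages) < TRIGGER_RATIO * max_tokens:
        return messages  # still under the threshold
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    if not older:
        return messages  # nothing old enough to summarize
    return [f"[summary] {summarize(older)}"] + recent


if __name__ == "__main__":
    history = ["x" * 40] * 10  # about 110 estimated tokens
    compacted = maybe_compact(history, 100, lambda old: f"{len(old)} earlier messages")
    print(len(compacted), compacted[0])
```

Keeping the most recent messages verbatim is what preserves continuity; everything older survives only in whatever form the summarizer produces, which is why compaction is lossy.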
Skill preloading
Skills follow a progressive disclosure pattern by default: only each skill’s name and description are loaded at startup, and the full SKILL.md body is loaded when the agent decides the skill is relevant.
You can override this per-skill with the Preload toggle:
- Off (default): Only metadata exposed upfront. Full body loaded on demand. Saves context.
- On: Full SKILL.md preloaded at start. Uses more context but avoids an extra turn.
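A hypothetical sketch of the two preload settings. Each skill is described by a name, a description, and the path to its SKILL.md; skills with `preload=False` contribute only a metadata line at startup, and `load_skill` reads the full body when the agent decides the skill is relevant. The `SkillEntry` type and both functions are illustrative, not the Agent Harness configuration format.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillEntry:
    name: str
    description: str
    path: Path             # location of the skill's SKILL.md
    preload: bool = False  # mirrors the per-skill Preload toggle (Off by default)


def startup_context(skills: list[SkillEntry]) -> list[str]:
    """Full body for preloaded skills; name and description for the rest."""
    return [
        s.path.read_text(encoding="utf-8") if s.preload else f"skill {s.name}: {s.description}"
        for s in skills
    ]


def load_skill(skill: SkillEntry) -> str:
    """On-demand load of the full body (the step that costs an extra turn)."""
    return skill.path.read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        md = Path(d) / "SKILL.md"
        md.write_text("# Report writing\nUse the team template.", encoding="utf-8")
        lazy = SkillEntry("reports", "Write weekly reports", md)
        eager = SkillEntry("reports", "Write weekly reports", md, preload=True)
        print(startup_context([lazy]))   # metadata only
        print(startup_context([eager]))  # full SKILL.md body
```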
How the strategies work together
In practice, these strategies compose. A single agent task might:
- Defer tool loading so that only a handful of schemas are in context at startup.
- Delegate a research subtask to a subagent that makes dozens of tool calls in isolation.
- Offload a large tool result within the subagent to a sandbox file.
- Summarize the subagent’s work into a concise result returned to the root agent.
- Compact the root agent’s history if context is still approaching limits.
Best practices
- Start lean — Use deferred tool loading and progressive skill disclosure by default. Preload only short, always-relevant skills.
- Delegate heavy work — Use subagents for multi-step tasks with large outputs to keep the main agent’s context clean.
- Trust offloading — Large tool results are automatically offloaded to files. The agent can search or read back what it needs.
- Let Code Mode handle aggregation — When the agent needs to process, filter, or aggregate tool output, Code Mode keeps raw data out of context entirely.
- Monitor context usage — Traces show context size at each step, helping you identify where context is consumed.
- Balance preloading — If the agent frequently needs a specific skill or tool set, preloading saves latency at the cost of context space.