Context Engineering - TrueFoundry Docs

What is context engineering?

Context engineering is providing the right information in the right format so the agent can accomplish tasks reliably. In Agent Harness, context is everything the model sees at each step — from system instructions and skills to tool results and conversation history. Getting context right directly impacts agent quality. Too little context and the agent lacks information to act. Too much and it overflows the model’s context window or drowns the signal in noise. Agent Harness automates the balance.

What fills the agent context

Each turn, the model receives a stacked context built from multiple sources:

Context Source	When Loaded	Controllable?
System prompt and instructions	Always at start	Yes — edit in agent builder
Skills (SKILL.md content)	Preloaded or on-demand (progressive disclosure)	Yes — per-skill preload toggle
MCP tool definitions	Preloaded or deferred	Yes — per-server preload toggle
Conversation history	Grows each turn	Managed by compaction
Tool call results	After each tool execution	Managed by offloading

Context management strategies

Agent Harness uses a layered set of strategies to keep context within model limits while maximizing the agent’s ability to reason. Each strategy targets a different source of context growth.

Large Tool Result Offloading

When a tool returns more data than fits comfortably in context, the harness writes the full output to a sandbox file and replaces it with a short reference. The agent can read back details on demand.

Subagents

Delegate focused subtasks to parallel subagents, each with its own clean context. Only the final result flows back to the root agent — intermediate tool calls never touch the main context.

Deferred Tool Loading

Instead of loading all tool definitions upfront, give the agent only MCP server names and descriptions. Tool schemas are loaded on demand when the agent actually needs them.

Code Mode

The agent calls MCP tools from Python scripts in the sandbox, processing and aggregating results in code. Only the printed summary enters context — not the raw tool output.

Context compaction (summarization)

When the context window approaches the model’s limit (e.g. 85% of max tokens), and there is no more content eligible for offloading, the harness triggers compaction:

An LLM generates a structured summary of the conversation including intent, artifacts created, and next steps.
The summary replaces the full conversation history in the agent’s working context.
Recent messages are preserved to maintain continuity.

Compaction is a lossy operation — fine-grained details from early messages may be reduced to summaries. For tasks that depend on exact earlier outputs, the agent can re-read offloaded files from the sandbox.

Skill preloading

Skills follow a progressive disclosure pattern by default: only name and description are loaded at startup, and the full SKILL.md body is loaded when the agent decides a skill is relevant. You can override this per-skill with the Preload toggle:

Off (default): Only metadata exposed upfront. Full body loaded on demand. Saves context.
On: Full SKILL.md preloaded at start. Uses more context but avoids an extra turn.

See Skills for configuration.

How the strategies work together

In practice, these strategies compose. A single agent turn might:

Defer tool loading so only a handful of schemas are in context at startup.
Delegate a research subtask to a subagent that makes dozens of tool calls in isolation.
Offload a large tool result within the subagent to a sandbox file.
Summarize the subagent’s work into a concise result returned to the root agent.
Compact the root agent’s history if context is still approaching limits.

Best practices

Start lean — Use deferred tool loading and progressive skill disclosure by default. Preload only short, always-relevant skills.
Delegate heavy work — Use subagents for multi-step tasks with large outputs to keep the main agent’s context clean.
Trust offloading — Large tool results are automatically offloaded to files. The agent can search or read back what it needs.
Let Code Mode handle aggregation — When the agent needs to process, filter, or aggregate tool output, Code Mode keeps raw data out of context entirely.
Monitor context usage — Traces show context size at each step, helping you identify where context is consumed.
Balance preloading — If the agent frequently needs a specific skill or tool set, preloading saves latency at the cost of context space.

Documentation Index

​What is context engineering?

​What fills the agent context

​Context management strategies

Large Tool Result Offloading

Subagents

Deferred Tool Loading

Code Mode

​Context compaction (summarization)

​Skill preloading

​How the strategies work together

​Best practices

What is context engineering?

What fills the agent context

Context management strategies

Context compaction (summarization)

Skill preloading

How the strategies work together

Best practices