MCP Apps and Tasks: Governing the New First-Class MCP Extensions

Auf Geschwindigkeit ausgelegt: ~ 10 ms Latenz, auch unter Last
Unglaublich schnelle Methode zum Erstellen, Verfolgen und Bereitstellen Ihrer Modelle!
- Verarbeitet mehr als 350 RPS auf nur 1 vCPU — kein Tuning erforderlich
- Produktionsbereit mit vollem Unternehmenssupport
The same MCP 2026-07-28 release that made the protocol stateless also promoted two extensions to first-class: MCP Apps, which let a tool return an interactive UI rendered in a sandboxed iframe, and Tasks, which model long-running work as a durable state machine the client drives with tasks/get, tasks/update, and tasks/cancel. They're genuinely useful — and they're net-new governance and security surfaces. This post is how each works, where the new risk lives, and why the gateway is the place to govern them.
Hana, a product engineer, shipped a "generate quarterly report" MCP tool and used both new extensions to make it nice. She returned an MCP App — an interactive preview the user could tweak in a sandboxed panel — and, because the generation took a couple of minutes, she modeled it as a Task so the call didn't block. The demo was great. The problems showed up the week after. The App's rendered preview included a vendor name pulled straight from an untrusted record, and a teammate pointed out that nothing was screening what got rendered into that surface — the same content the model would read back. And one report Task ran for eighteen minutes, quietly burning tokens, with no progress and no cost visible until it finally returned.
Neither was a bug in Hana's code. They were the new surfaces doing exactly what the spec allows, in a system that had governance for tool calls but not for rendered UIs or long-running tasks. The 2026-07-28 release didn't just simplify the transport; it added two places where value — and risk — now live. This post is how to build on them without inheriting Hana's week.
1. What MCP Apps and Tasks Are, and Why They're Now First-Class
The release reorganized extensions into a real framework: each extension gets a reverse-DNS identifier, its own repository and maintainers, and a version that moves independently of the core spec. Clients and servers negotiate support through an extensions map in their capabilities — a server advertises that it speaks MCP Apps or a given version of Tasks, the client advertises the same, and if both agree they use it; if not, the connection still works without it. MCP Apps and Tasks are the first two first-class examples.
Crucially, neither replaces the tool primitive. A tool still takes JSON arguments and returns JSON results. An App adds an optional rendered surface on top of a tool; a Task changes how a tool call's result is delivered when the work is long-running. That's why they're best understood as augmentations — and why the governance you already apply to tools needs to extend to cover them rather than being replaced.
2. MCP Apps: Server-Rendered UI as a New Surface
An MCP App lets a tool return an interactive UI resource that the host renders in a sandboxed iframe. The analogy from the spec discussion is apt: it augments a tool the way a hosted dashboard augments an API. The tool's JSON contract is unchanged; the App is the interactive shell a host can show the user. For Hana's report tool, the App is the preview panel.
The capability is negotiated, not assumed. A server declares the App augmentation in its capabilities, and a host that doesn't support Apps simply gets the JSON result and renders nothing extra. That graceful degradation is good design — but it also means the App surface is optional and easy to ship without thinking through what it introduces, which is the subject of the next section.

3. The Security Surface of Apps: Both the User and the Model Read It
The new risk in an App is that its rendered content has two audiences. The user sees it, so anything untrusted rendered into the panel is a client-side content concern — the sandboxed iframe limits the blast radius, which is exactly why the host renders it sandboxed, but "sandboxed" bounds what the markup can do to the host, not what the content can say. Depending on host behavior, the App’s content, backing data, or UI-initiated actions may feed back into model context, so untrusted text rendered into an App is another path for the indirect prompt injection covered in our prompt-injection post: instructions smuggled into a field that ends up in front of the model.
The practical stance is to treat App content as untrusted output that needs the same screening as any other model-adjacent content: scan and sanitize what gets rendered, never interpolate raw untrusted data into the surface without escaping, and keep the App's sandbox strict. Hana's vendor-name field is the canonical mistake — untrusted data interpolated into a rendered surface that both the user and the model consume, with no guardrail in between.
4. Tasks: Long-Running Work in a Now-Stateless Protocol
Tasks were experimental in the 2025-11-25 core; the 2026-07-28 release moves them into an official extension with a lifecycle designed for the stateless world. The model is requestor-driven: a server can answer a tools/call with a task handle instead of an immediate result, and the client then drives the work with tasks/get, tasks/update, and tasks/cancel. (The earlier 2025-11-25 experimental surface also included tasks/list and tasks/result; the RC reshapes the lifecycle around get/update/cancel and removes tasks/list, which can't be scoped safely once the protocol is stateless. Treat the exact method set as RC-specific and confirm result delivery against the final spec.) A Task is, in effect, a small durable state machine plus a pointer to its result.
Support is negotiated granularly — a peer doesn't just say "I support tasks," it says which requests may be task-augmented. The capabilities map is explicit about it:
Tasks negotiation in the 2026-07-28 RC — illustrative; confirm the exact extension ID and schema against the final spec
{
"capabilities": {
"extensions": {
"io.modelcontextprotocol.tasks": { // reverse-DNS extension identifier
"requests": {
"tools": { "call": {} } // only tools/call may be task-augmented here
}
}
}
}
}The shape above is illustrative of the RC's extensions framework, not a verbatim schema. The earlier capabilities.tasks form with tasks/list belonged to the 2025-11-25 experimental Tasks API; the RC moves negotiation under a reverse-DNS-identified extensions map. Confirm the final identifier and fields against the July 28 spec before implementing.
Anything not listed under requests stays synchronous; a client must not try to task-augment a request the server didn't advertise. This is the clean, MCP-native way to do concurrency without inventing side channels — and it composes with the stateless core because the task handle is an explicit, server-minted identifier the client threads back, exactly the explicit-state pattern the stateless model encourages.

5. Observing a Task: Lifecycle, Progress, and Cost
Hana's eighteen-minute blind spot is the operational heart of Tasks. A long-running task that returns nothing until it finishes is a black box: you can't show progress, you can't see accumulating cost, and you can't tell a stalled task from a slow one. The lifecycle gives you the hooks — the client polls state and can surface progress and intermediate updates — but someone has to record those transitions, sum the tokens the task consumes across its run, and expose it.
Driving a task to completion with visibility at each step (illustrative client loop)
resp = call_tool("generate_report", args, task_augmented=True)
task_id = resp.task_id # server-minted handle
while True:
state = tasks_get(task_id) # poll; gateway records each transition
emit_progress(task_id, state.status, state.progress, state.tokens_so_far)
if state.status == "input_required":
tasks_update(task_id, provide_input())
elif state.status in ("completed", "failed", "cancelled"):
break
sleep(backoff()) # respect the server's polling guidance
result = state.result if state.status == "completed" else None # final result shape is RC-specific
This is the same observability story as the rest of the series, applied to a new object. The tracing that captures a normal request should treat a Task as a span that stays open across its lifecycle, with token and cost accounting attributed to it. TrueFoundry's MCP Gateway already traces every MCP server call and attributes usage by user, tool, and team; a Task is that same telemetry extended over time rather than captured at a single instant — which is exactly the missing piece in Hana's report tool.
6. Governance: Who Can Launch Apps and Tasks, With What Budget
The protocol decides how Apps and Tasks work; it deliberately leaves who may use them, and under what limits, to you. That's a governance gap with concrete questions. Which agents or users are allowed to launch a long-running Task at all? What's the timeout and token budget before a Task is force-cancelled — so an eighteen-minute run is a policy decision, not an accident? Which tools may return an App, and what content is allowed to render into it?
Illustrative gateway policy for the new surfaces (schema is gateway-specific)
tasks:
allow_launch:
roles: ["agent:report-writer", "team:finance"] # default-deny otherwise
limits:
max_runtime_seconds: 300 # force-cancel past this
max_tokens: 200000 # budget ceiling per task
observability: trace_lifecycle # record every state transition
apps:
output_guardrails: [pii, injection, toxicity] # scan rendered content
sandbox: strict # host renders in a locked-down iframeNone of this is exotic; it's the same role-based access, budget, and guardrail machinery already applied to model calls and tool calls, extended to two new objects. The reason to centralize it is that Apps and Tasks will be used by many agents across many servers, and per-server, per-app reimplementation is how the eighteen-minute task and the unscreened panel slip through. A single policy point makes "who can launch a Task, for how long, and what may render in an App" one answer instead of many.
Guardrails are where this gets concrete, because the gateway already screens content on a documented set of hooks. TrueFoundry’s gateway exposes four — llm_input and llm_output on model traffic, and mcp_pre_tool and mcp_post_tool on tool traffic — and any number of validators (PII detection, secrets scanning, prompt-injection and toxicity classifiers, SQL sanitizers, policy checks via Cedar or OPA) attach to each. They fan out in parallel on the same request, so an input rail runs concurrently with the model call and can cancel it before tokens are billed if it blocks. The operational discipline that keeps this safe in production is a staged rollout — audit (log violations, let traffic through), then enforce (block on failure, fail open on provider errors), then strict — so a guardrail provider outage never takes the whole path down on day one.

7. Migration: From Experimental Tasks to the New Lifecycle
If you adopted the experimental Tasks API from the 2025-11-25 core, the move to the extension is a real migration, not a rename. The lifecycle and methods are reorganized around tasks/get, tasks/update, and tasks/cancel, and capability negotiation is now explicit and granular. The good news is the deprecation policy adopted in this release guarantees a minimum twelve-month overlap between deprecating and removing anything, so the old surface keeps working while you transition rather than breaking on a date.
The low-risk path is the one the stateless migration already encourages: write tools that mint and require explicit handles, negotiate the Tasks extension rather than assuming it, and treat the release candidate as provisional — it locked on May 21, 2026 and is expected to finalize July 28, with the specification text still able to change in the validation window. A gateway that can speak both the experimental and extension forms during the overlap is genuinely useful here, the same way it shields you across the stateless transport transition.
8. Where This Belongs: the Gateway as the Governance Point
Apps and Tasks reinforce a pattern this series keeps arriving at: the protocol defines mechanism, and the gateway supplies the governance the protocol leaves open. The MCP Gateway already sits in front of every registered MCP server as the point of discovery, authentication, RBAC, and request-level tracing — which is exactly the surface where App content can be screened and Task lifecycles can be observed and bounded.
Concretely, that means App-rendered content flows through the same output guardrails as any other model-adjacent output, and a Task is traced as a long-lived span with its tokens and cost attributed to the launching agent — using the discovery, RBAC, and observability the MCP Gateway already provides, rather than bolting a second governance system onto each server. The division of labor is the familiar one: the server implements the App or Task; the gateway governs who may use it, what it may render, how long it may run, and what it cost.
9. FAQs
Are MCP Apps just a way to return HTML?
More precisely, an App augments a tool with an interactive UI resource the host renders in a sandboxed iframe — the tool still takes and returns JSON, and the App is the optional shell on top. The reason it matters for governance is that the rendered content is generated, often from untrusted inputs, and may feed back into model context through host behavior, tool outputs, or UI-initiated actions, which puts it in scope for output guardrails rather than treating it as inert presentation.
Does the stateless protocol make Tasks unnecessary?
No — they solve different problems. Stateless transport means no request is pinned to an instance; Tasks model work that takes longer than a single request-response, regardless of which instance serves each poll. In fact the task handle is exactly the explicit-state pattern the stateless model encourages: a server-minted identifier the client threads back, so any instance can report on the task.
What's the biggest new risk to plan for?
Two. On Apps, untrusted content rendered into the surface — treat it as output to be screened for injection and PII, not as harmless markup. On Tasks, unbounded long-running work — set a timeout and token budget so a runaway task is force-cancelled by policy, and trace the lifecycle so it's never a black box. Both are governance you add around the spec, not behavior the spec provides.
Do I need to migrate now?
Plan now; the release candidate locked May 21, 2026 and is expected to finalize July 28, with breaking changes if you used the experimental Tasks API. The twelve-month deprecation overlap means nothing breaks on the date, but new work should target the extension lifecycle and negotiate capabilities explicitly. Treat the RC as provisional until ratification.
App or Task governance — gateway or application?
The gateway for the cross-cutting controls: which agents may launch Apps and Tasks, the timeout and budget ceilings, output guardrails on rendered content, and lifecycle tracing — applied uniformly across every MCP server. The application still owns the domain logic of what the App shows and what the Task does, because that needs knowledge the gateway doesn't have.
The 2026-07-28 release is remembered for going stateless, but Apps and Tasks are the part that changes what you build. They add a rendered surface and a long-running one, and the spec hands you the mechanism and leaves the governance open. Close that gap at the gateway — screen what renders, bound what runs, trace both — and Hana's good demo becomes a good production system.
If the gateway is where you govern these surfaces, it helps that the same control plane is also where the agents using them run. TrueFoundry's AI Gateway is the unified entry point for model, MCP, and tool traffic, and its Agent Harness is the runtime that orchestrates an agent's plan-act-observe loop, sandboxing, approvals, and tracing on top of it — so the App content an agent renders and the Tasks it launches are screened, bounded, and traced by the same plane that runs the agent, rather than by a governance layer bolted on after the fact.
References
- Model Context Protocol — The 2026-07-28 release candidate (Extensions, MCP Apps, Tasks)
- MCP 2025-11-25 — Experimental Tasks API (the prior surface; the 2026-07-28 RC reshapes it as an extension)
- TrueFoundry — MCP Gateway (discovery, RBAC, observability)
- Our Prompt Injection and OpenTelemetry for LLMs posts — output guardrails and tracing, applied to the new surfaces.
Northwind and Hana are illustrative. The MCP Apps and Tasks details are drawn from the official Model Context Protocol 2026-07-28 release-candidate materials and related write-ups as of late May 2026; the RC locked May 21 and is expected to ratify July 28, 2026, and the specification text can still change, so treat protocol specifics — method names, capability shapes, and lifecycle states — as provisional and confirm against the final spec. Code samples are illustrative of the documented patterns, not copied from a reference implementation. TrueFoundry capabilities are summarized from public product documentation and will evolve.
TrueFoundry AI Gateway bietet eine Latenz von ~3—4 ms, verarbeitet mehr als 350 RPS auf einer vCPU, skaliert problemlos horizontal und ist produktionsbereit, während LiteLM unter einer hohen Latenz leidet, mit moderaten RPS zu kämpfen hat, keine integrierte Skalierung hat und sich am besten für leichte Workloads oder Prototyp-Workloads eignet.
Der schnellste Weg, deine KI zu entwickeln, zu steuern und zu skalieren

















.webp)
.webp)
.webp)
.webp)
.webp)

.webp)

.webp)

.webp)





