MCP Apps and Tasks: Governing the New First-Class MCP Extensions

Published: June 20, 2026

Conçu pour la vitesse : latence d'environ 10 ms, même en cas de charge

Une méthode incroyablement rapide pour créer, suivre et déployer vos modèles !

Gère plus de 350 RPS sur un seul processeur virtuel, aucun réglage n'est nécessaire
Prêt pour la production avec un support complet pour les entreprises

Commencez à utiliser Truefoundry dès maintenant Parlez à l'expert

The same MCP 2026-07-28 release that made the protocol stateless also promoted two extensions to first-class: MCP Apps, which let a tool return an interactive UI rendered in a sandboxed iframe, and Tasks, which model long-running work as a durable state machine the client drives with tasks/get, tasks/update, and tasks/cancel. They're genuinely useful — and they're net-new governance and security surfaces. This post is how each works, where the new risk lives, and why the gateway is the place to govern them.

Key Takeaways

MCP Apps and Tasks are first-class extensions in the 2026-07-28 release candidate (locked May 21, 2026; final spec expected July 28), negotiated through a capabilities map of reverse-DNS-identified extensions — clients and servers advertise what they support, and use the extension only if both agree.
An MCP App augments a tool with a rendered UI surface (sandboxed iframe), the way a dashboard augments an API. The tool still takes and returns JSON; the App is an optional interactive shell on top — and a new content surface to govern.
The App surface is rendered for the user and can feed back into model context through tool outputs and UI-initiated actions, so its content is model-adjacent, not inert presentation: untrusted content rendered into it is a two-way risk — UI-injection and data-exfiltration, not just a styling concern.
The Tasks extension reshapes long-running work for the stateless core: tools/call can return a task handle, and the client drives the lifecycle (working → input_required → completed/failed/cancelled) with tasks/get, tasks/update, tasks/cancel.
Ironically, the protocol went stateless at the transport layer and made durable, long-running state first-class at the application layer — a Task needs its lifecycle traced, its progress observed, and its cost attributed, even though no session pins it to an instance.
Governance is the open question the spec leaves to you: which agents may launch Apps and Tasks, with what budget and timeout, and what content may render — none of which the protocol decides.
The gateway is the natural control point. TrueFoundry's MCP Gateway already centralizes discovery, RBAC, and request-level tracing across registered MCP servers — the same surface where App content and Task lifecycles can be governed and observed.

Hana, a product engineer, shipped a "generate quarterly report" MCP tool and used both new extensions to make it nice. She returned an MCP App — an interactive preview the user could tweak in a sandboxed panel — and, because the generation took a couple of minutes, she modeled it as a Task so the call didn't block. The demo was great. The problems showed up the week after. The App's rendered preview included a vendor name pulled straight from an untrusted record, and a teammate pointed out that nothing was screening what got rendered into that surface — the same content the model would read back. And one report Task ran for eighteen minutes, quietly burning tokens, with no progress and no cost visible until it finally returned.

Neither was a bug in Hana's code. They were the new surfaces doing exactly what the spec allows, in a system that had governance for tool calls but not for rendered UIs or long-running tasks. The 2026-07-28 release didn't just simplify the transport; it added two places where value — and risk — now live. This post is how to build on them without inheriting Hana's week.

1. What MCP Apps and Tasks Are, and Why They're Now First-Class

The release reorganized extensions into a real framework: each extension gets a reverse-DNS identifier, its own repository and maintainers, and a version that moves independently of the core spec. Clients and servers negotiate support through an extensions map in their capabilities — a server advertises that it speaks MCP Apps or a given version of Tasks, the client advertises the same, and if both agree they use it; if not, the connection still works without it. MCP Apps and Tasks are the first two first-class examples.

Crucially, neither replaces the tool primitive. A tool still takes JSON arguments and returns JSON results. An App adds an optional rendered surface on top of a tool; a Task changes how a tool call's result is delivered when the work is long-running. That's why they're best understood as augmentations — and why the governance you already apply to tools needs to extend to cover them rather than being replaced.

Spec-version note

This post is based on the MCP 2026-07-28 release candidate, locked May 21, 2026, with the final specification expected July 28, 2026. Tasks and MCP Apps ship as extensions in this RC; method names, extension identifiers, and capability shapes may still change during the validation window. Treat the syntax here as illustrative and confirm against the final spec before implementing.

2. MCP Apps: Server-Rendered UI as a New Surface

An MCP App lets a tool return an interactive UI resource that the host renders in a sandboxed iframe. The analogy from the spec discussion is apt: it augments a tool the way a hosted dashboard augments an API. The tool's JSON contract is unchanged; the App is the interactive shell a host can show the user. For Hana's report tool, the App is the preview panel.

The capability is negotiated, not assumed. A server declares the App augmentation in its capabilities, and a host that doesn't support Apps simply gets the JSON result and renders nothing extra. That graceful degradation is good design — but it also means the App surface is optional and easy to ship without thinking through what it introduces, which is the subject of the next section.

Why MCP Apps are a new surface: a normal tool returns inert JSON read only by the model; an App returns rendered HTML in a sandboxed iframe seen by the user and able to feed back into the run. — ***Figure 1:*** Why this is a new surface, not just nicer output: a tool returns inert JSON for the model; an App returns a rendered, interactive surface the user sees and whose content and UI actions can feed back into the run — which is why it falls inside the guardrail perimeter.

3. The Security Surface of Apps: Both the User and the Model Read It

The new risk in an App is that its rendered content has two audiences. The user sees it, so anything untrusted rendered into the panel is a client-side content concern — the sandboxed iframe limits the blast radius, which is exactly why the host renders it sandboxed, but "sandboxed" bounds what the markup can do to the host, not what the content can say. Depending on host behavior, the App’s content, backing data, or UI-initiated actions may feed back into model context, so untrusted text rendered into an App is another path for the indirect prompt injection covered in our prompt-injection post: instructions smuggled into a field that ends up in front of the model.

The practical stance is to treat App content as untrusted output that needs the same screening as any other model-adjacent content: scan and sanitize what gets rendered, never interpolate raw untrusted data into the surface without escaping, and keep the App's sandbox strict. Hana's vendor-name field is the canonical mistake — untrusted data interpolated into a rendered surface that both the user and the model consume, with no guardrail in between.

‍

TrueFoundry AI Gateway offre une latence d'environ 3 à 4 ms, gère plus de 350 RPS sur 1 processeur virtuel, évolue horizontalement facilement et est prête pour la production, tandis que LiteLM souffre d'une latence élevée, peine à dépasser un RPS modéré, ne dispose pas d'une mise à l'échelle intégrée et convient parfaitement aux charges de travail légères ou aux prototypes.

Conçu pour la vitesse : latence d'environ 10 ms, même en cas de charge

Planifiez votre démo dès maintenant