Governing MCP Apps and Tasks at the Gateway

Auf Geschwindigkeit ausgelegt: ~ 10 ms Latenz, auch unter Last

Unglaublich schnelle Methode zum Erstellen, Verfolgen und Bereitstellen Ihrer Modelle!

Verarbeitet mehr als 350 RPS auf nur 1 vCPU — kein Tuning erforderlich
Produktionsbereit mit vollem Unternehmenssupport

Beginnen Sie jetzt mit Truefoundry Sprechen Sie mit dem Experten

The same MCP 2026-07-28 release that made the protocol stateless also promoted two extensions to first-class: MCP Apps, which let a tool return an interactive UI rendered in a sandboxed iframe, and Tasks, which model long-running work as a durable state machine the client drives with tasks/get, tasks/update, and tasks/cancel. They're genuinely useful — and they're net-new governance and security surfaces. This post is how each works, where the new risk lives, and why the gateway is the place to govern them.

Key Takeaways

MCP Apps and Tasks are first-class extensions in the 2026-07-28 release candidate (locked May 21, 2026; final spec expected July 28), negotiated through a capabilities map of reverse-DNS-identified extensions — clients and servers advertise what they support, and use the extension only if both agree.
An MCP App augments a tool with a rendered UI surface (sandboxed iframe), the way a dashboard augments an API. The tool still takes and returns JSON; the App is an optional interactive shell on top — and a new content surface to govern.
The App surface is rendered for the user and can feed back into model context through tool outputs and UI-initiated actions, so its content is model-adjacent, not inert presentation: untrusted content rendered into it is a two-way risk — UI-injection and data-exfiltration, not just a styling concern.
The Tasks extension reshapes long-running work for the stateless core: tools/call can return a task handle, and the client drives the lifecycle (working → input_required → completed/failed/cancelled) with tasks/get, tasks/update, tasks/cancel.
Ironically, the protocol went stateless at the transport layer and made durable, long-running state first-class at the application layer — a Task needs its lifecycle traced, its progress observed, and its cost attributed, even though no session pins it to an instance.
Governance is the open question the spec leaves to you: which agents may launch Apps and Tasks, with what budget and timeout, and what content may render — none of which the protocol decides.
The gateway is the natural control point. TrueFoundry's MCP Gateway already centralizes discovery, RBAC, and request-level tracing across registered MCP servers — the same surface where App content and Task lifecycles can be governed and observed.

Hana, a product engineer, shipped a "generate quarterly report" MCP tool and used both new extensions to make it nice. She returned an MCP App — an interactive preview the user could tweak in a sandboxed panel — and, because the generation took a couple of minutes, she modeled it as a Task so the call didn't block. The demo was great. The problems showed up the week after. The App's rendered preview included a vendor name pulled straight from an untrusted record, and a teammate pointed out that nothing was screening what got rendered into that surface — the same content the model would read back. And one report Task ran for eighteen minutes, quietly burning tokens, with no progress and no cost visible until it finally returned.

Neither was a bug in Hana's code. They were the new surfaces doing exactly what the spec allows, in a system that had governance for tool calls but not for rendered UIs or long-running tasks. The 2026-07-28 release didn't just simplify the transport; it added two places where value — and risk — now live. This post is how to build on them without inheriting Hana's week.

1. What MCP Apps and Tasks Are, and Why They're Now First-Class

The release reorganized extensions into a real framework: each extension gets a reverse-DNS identifier, its own repository and maintainers, and a version that moves independently of the core spec. Clients and servers negotiate support through an extensions map in their capabilities — a server advertises that it speaks MCP Apps or a given version of Tasks, the client advertises the same, and if both agree they use it; if not, the connection still works without it. MCP Apps and Tasks are the first two first-class examples.

Crucially, neither replaces the tool primitive. A tool still takes JSON arguments and returns JSON results. An App adds an optional rendered surface on top of a tool; a Task changes how a tool call's result is delivered when the work is long-running. That's why they're best understood as augmentations — and why the governance you already apply to tools needs to extend to cover them rather than being replaced.

Spec-version note

This post is based on the MCP 2026-07-28 release candidate, locked May 21, 2026, with the final specification expected July 28, 2026. Tasks and MCP Apps ship as extensions in this RC; method names, extension identifiers, and capability shapes may still change during the validation window. Treat the syntax here as illustrative and confirm against the final spec before implementing.

2. MCP Apps: Server-Rendered UI as a New Surface

An MCP App lets a tool return an interactive UI resource that the host renders in a sandboxed iframe. The analogy from the spec discussion is apt: it augments a tool the way a hosted dashboard augments an API. The tool's JSON contract is unchanged; the App is the interactive shell a host can show the user. For Hana's report tool, the App is the preview panel.

The capability is negotiated, not assumed. A server declares the App augmentation in its capabilities, and a host that doesn't support Apps simply gets the JSON result and renders nothing extra. That graceful degradation is good design — but it also means the App surface is optional and easy to ship without thinking through what it introduces, which is the subject of the next section.

Why MCP Apps are a new surface: a normal tool returns inert JSON read only by the model; an App returns rendered HTML in a sandboxed iframe seen by the user and able to feed back into the run. — ***Figure 1:*** Why this is a new surface, not just nicer output: a tool returns inert JSON for the model; an App returns a rendered, interactive surface the user sees and whose content and UI actions can feed back into the run — which is why it falls inside the guardrail perimeter.

3. The Security Surface of Apps: Both the User and the Model Read It

The new risk in an App is that its rendered content has two audiences. The user sees it, so anything untrusted rendered into the panel is a client-side content concern — the sandboxed iframe limits the blast radius, which is exactly why the host renders it sandboxed, but "sandboxed" bounds what the markup can do to the host, not what the content can say. Depending on host behavior, the App’s content, backing data, or UI-initiated actions may feed back into model context, so untrusted text rendered into an App is another path for the indirect prompt injection covered in our prompt-injection post: instructions smuggled into a field that ends up in front of the model.

The practical stance is to treat App content as untrusted output that needs the same screening as any other model-adjacent content: scan and sanitize what gets rendered, never interpolate raw untrusted data into the surface without escaping, and keep the App's sandbox strict. Hana's vendor-name field is the canonical mistake — untrusted data interpolated into a rendered surface that both the user and the model consume, with no guardrail in between.

An App is output, and output needs a guardrail

The instinct is to treat a UI as presentation and therefore harmless. But an App's content is generated, often from untrusted inputs, and — depending on host behavior — its content or UI-initiated actions may feed back into model context. That puts it squarely in scope for output guardrails — toxicity, PII, and injection scanning on what renders — not outside the security perimeter because it happens to be markup.

4. Tasks: Long-Running Work in a Now-Stateless Protocol

Tasks were experimental in the 2025-11-25 core; the 2026-07-28 release moves them into an official extension with a lifecycle designed for the stateless world. The model is requestor-driven: a server can answer a tools/call with a task handle instead of an immediate result, and the client then drives the work with tasks/get, tasks/update, and tasks/cancel. (The earlier 2025-11-25 experimental surface also included tasks/list and tasks/result; the RC reshapes the lifecycle around get/update/cancel and removes tasks/list, which can't be scoped safely once the protocol is stateless. Treat the exact method set as RC-specific and confirm result delivery against the final spec.) A Task is, in effect, a small durable state machine plus a pointer to its result.

Support is negotiated granularly — a peer doesn't just say "I support tasks," it says which requests may be task-augmented. The capabilities map is explicit about it:

Tasks negotiation in the 2026-07-28 RC — illustrative; confirm the exact extension ID and schema against the final spec

{
  "capabilities": {
    "extensions": {
      "io.modelcontextprotocol.tasks": {   // reverse-DNS extension identifier
        "requests": {
          "tools": { "call": {} }          // only tools/call may be task-augmented here
        }
      }
    }
  }
}

The shape above is illustrative of the RC's extensions framework, not a verbatim schema. The earlier capabilities.tasks form with tasks/list belonged to the 2025-11-25 experimental Tasks API; the RC moves negotiation under a reverse-DNS-identified extensions map. Confirm the final identifier and fields against the July 28 spec before implementing.

Anything not listed under requests stays synchronous; a client must not try to task-augment a request the server didn't advertise. This is the clean, MCP-native way to do concurrency without inventing side channels — and it composes with the stateless core because the task handle is an explicit, server-minted identifier the client threads back, exactly the explicit-state pattern the stateless model encourages.

The Tasks lifecycle: a task-augmented tools/call returns a server-minted handle; the client polls tasks/get through working and any input_required pauses to a terminal state. — ***Figure 2:*** The Tasks lifecycle (illustrative): a task-augmented tools/call returns a server-minted handle; the client polls tasks/get through working (and any input_required pauses) to a terminal state, then retrieves the result (the 2025 experimental API used tasks/result; confirm the RC's result-delivery method against the final spec). The gateway sits under all of it as the point that observes and governs the lifecycle.

5. Observing a Task: Lifecycle, Progress, and Cost

Hana's eighteen-minute blind spot is the operational heart of Tasks. A long-running task that returns nothing until it finishes is a black box: you can't show progress, you can't see accumulating cost, and you can't tell a stalled task from a slow one. The lifecycle gives you the hooks — the client polls state and can surface progress and intermediate updates — but someone has to record those transitions, sum the tokens the task consumes across its run, and expose it.

Driving a task to completion with visibility at each step (illustrative client loop)

resp = call_tool("generate_report", args, task_augmented=True)
task_id = resp.task_id                       # server-minted handle

while True:
    state = tasks_get(task_id)                # poll; gateway records each transition
    emit_progress(task_id, state.status, state.progress, state.tokens_so_far)
    if state.status == "input_required":
        tasks_update(task_id, provide_input())
    elif state.status in ("completed", "failed", "cancelled"):
        break
    sleep(backoff())                          # respect the server's polling guidance

result = state.result if state.status == "completed" else None  # final result shape is RC-specific

This is the same observability story as the rest of the series, applied to a new object. The tracing that captures a normal request should treat a Task as a span that stays open across its lifecycle, with token and cost accounting attributed to it. TrueFoundry's MCP Gateway already traces every MCP server call and attributes usage by user, tool, and team; a Task is that same telemetry extended over time rather than captured at a single instant — which is exactly the missing piece in Hana's report tool.

6. Governance: Who Can Launch Apps and Tasks, With What Budget

The protocol decides how Apps and Tasks work; it deliberately leaves who may use them, and under what limits, to you. That's a governance gap with concrete questions. Which agents or users are allowed to launch a long-running Task at all? What's the timeout and token budget before a Task is force-cancelled — so an eighteen-minute run is a policy decision, not an accident? Which tools may return an App, and what content is allowed to render into it?

Illustrative gateway policy for the new surfaces (schema is gateway-specific)

tasks:
  allow_launch:
    roles: ["agent:report-writer", "team:finance"]   # default-deny otherwise
  limits:
    max_runtime_seconds: 300                           # force-cancel past this
    max_tokens: 200000                                 # budget ceiling per task
  observability: trace_lifecycle                       # record every state transition

apps:
  output_guardrails: [pii, injection, toxicity]        # scan rendered content
  sandbox: strict                                      # host renders in a locked-down iframe

None of this is exotic; it's the same role-based access, budget, and guardrail machinery already applied to model calls and tool calls, extended to two new objects. The reason to centralize it is that Apps and Tasks will be used by many agents across many servers, and per-server, per-app reimplementation is how the eighteen-minute task and the unscreened panel slip through. A single policy point makes "who can launch a Task, for how long, and what may render in an App" one answer instead of many.

Guardrails are where this gets concrete, because the gateway already screens content on a documented set of hooks. TrueFoundry’s gateway exposes four — llm_input and llm_output on model traffic, and mcp_pre_tool and mcp_post_tool on tool traffic — and any number of validators (PII detection, secrets scanning, prompt-injection and toxicity classifiers, SQL sanitizers, policy checks via Cedar or OPA) attach to each. They fan out in parallel on the same request, so an input rail runs concurrently with the model call and can cancel it before tokens are billed if it blocks. The operational discipline that keeps this safe in production is a staged rollout — audit (log violations, let traffic through), then enforce (block on failure, fail open on provider errors), then strict — so a guardrail provider outage never takes the whole path down on day one.

Four guardrail hooks on the request path: llm_input and llm_output on model traffic, mcp_pre_tool and mcp_post_tool on tool traffic, each carrying validators like PII, secrets, injection, and policy checks. — ***Figure 3:*** The four documented guardrail hooks. App content rides the same output and post-tool rails as everything else, which is what lets one policy cover model traffic, tool traffic, and rendered App surfaces. Original schematic compiled from TrueFoundry’s public documentation.

7. Migration: From Experimental Tasks to the New Lifecycle

If you adopted the experimental Tasks API from the 2025-11-25 core, the move to the extension is a real migration, not a rename. The lifecycle and methods are reorganized around tasks/get, tasks/update, and tasks/cancel, and capability negotiation is now explicit and granular. The good news is the deprecation policy adopted in this release guarantees a minimum twelve-month overlap between deprecating and removing anything, so the old surface keeps working while you transition rather than breaking on a date.

The low-risk path is the one the stateless migration already encourages: write tools that mint and require explicit handles, negotiate the Tasks extension rather than assuming it, and treat the release candidate as provisional — it locked on May 21, 2026 and is expected to finalize July 28, with the specification text still able to change in the validation window. A gateway that can speak both the experimental and extension forms during the overlap is genuinely useful here, the same way it shields you across the stateless transport transition.

8. Where This Belongs: the Gateway as the Governance Point

Apps and Tasks reinforce a pattern this series keeps arriving at: the protocol defines mechanism, and the gateway supplies the governance the protocol leaves open. The MCP Gateway already sits in front of every registered MCP server as the point of discovery, authentication, RBAC, and request-level tracing — which is exactly the surface where App content can be screened and Task lifecycles can be observed and bounded.

TrueFoundry MCP Gateway architecture: agents and applications connect through a single MCP Gateway that applies authentication, RBAC, discovery, and observability in front of registered internal and third-party MCP servers — ***Figure 4:*** TrueFoundry's MCP Gateway architecture — one control point mediating discovery, authentication, RBAC, and observability between agents and registered MCP servers. The same documented primitives — discovery, authentication, RBAC, request tracing, audit logging, and cost/usage observability — are the natural place to extend governance to App content and Task lifecycles. Source: *TrueFoundry MCP Gateway*.

Concretely, that means App-rendered content flows through the same output guardrails as any other model-adjacent output, and a Task is traced as a long-lived span with its tokens and cost attributed to the launching agent — using the discovery, RBAC, and observability the MCP Gateway already provides, rather than bolting a second governance system onto each server. The division of labor is the familiar one: the server implements the App or Task; the gateway governs who may use it, what it may render, how long it may run, and what it cost.

9. FAQs

Are MCP Apps just a way to return HTML?

More precisely, an App augments a tool with an interactive UI resource the host renders in a sandboxed iframe — the tool still takes and returns JSON, and the App is the optional shell on top. The reason it matters for governance is that the rendered content is generated, often from untrusted inputs, and may feed back into model context through host behavior, tool outputs, or UI-initiated actions, which puts it in scope for output guardrails rather than treating it as inert presentation.

Does the stateless protocol make Tasks unnecessary?

No — they solve different problems. Stateless transport means no request is pinned to an instance; Tasks model work that takes longer than a single request-response, regardless of which instance serves each poll. In fact the task handle is exactly the explicit-state pattern the stateless model encourages: a server-minted identifier the client threads back, so any instance can report on the task.

What's the biggest new risk to plan for?

Two. On Apps, untrusted content rendered into the surface — treat it as output to be screened for injection and PII, not as harmless markup. On Tasks, unbounded long-running work — set a timeout and token budget so a runaway task is force-cancelled by policy, and trace the lifecycle so it's never a black box. Both are governance you add around the spec, not behavior the spec provides.

Do I need to migrate now?

Plan now; the release candidate locked May 21, 2026 and is expected to finalize July 28, with breaking changes if you used the experimental Tasks API. The twelve-month deprecation overlap means nothing breaks on the date, but new work should target the extension lifecycle and negotiate capabilities explicitly. Treat the RC as provisional until ratification.

App or Task governance — gateway or application?

The gateway for the cross-cutting controls: which agents may launch Apps and Tasks, the timeout and budget ceilings, output guardrails on rendered content, and lifecycle tracing — applied uniformly across every MCP server. The application still owns the domain logic of what the App shows and what the Task does, because that needs knowledge the gateway doesn't have.

The 2026-07-28 release is remembered for going stateless, but Apps and Tasks are the part that changes what you build. They add a rendered surface and a long-running one, and the spec hands you the mechanism and leaves the governance open. Close that gap at the gateway — screen what renders, bound what runs, trace both — and Hana's good demo becomes a good production system.

If the gateway is where you govern these surfaces, it helps that the same control plane is also where the agents using them run. TrueFoundry's AI Gateway is the unified entry point for model, MCP, and tool traffic, and its Agent Harness is the runtime that orchestrates an agent's plan-act-observe loop, sandboxing, approvals, and tracing on top of it — so the App content an agent renders and the Tasks it launches are screened, bounded, and traced by the same plane that runs the agent, rather than by a governance layer bolted on after the fact.

References

Model Context Protocol — The 2026-07-28 release candidate (Extensions, MCP Apps, Tasks)
MCP 2025-11-25 — Experimental Tasks API (the prior surface; the 2026-07-28 RC reshapes it as an extension)
TrueFoundry — MCP Gateway (discovery, RBAC, observability)
Our Prompt Injection and OpenTelemetry for LLMs posts — output guardrails and tracing, applied to the new surfaces.

Northwind and Hana are illustrative. The MCP Apps and Tasks details are drawn from the official Model Context Protocol 2026-07-28 release-candidate materials and related write-ups as of late May 2026; the RC locked May 21 and is expected to ratify July 28, 2026, and the specification text can still change, so treat protocol specifics — method names, capability shapes, and lifecycle states — as provisional and confirm against the final spec. Code samples are illustrative of the documented patterns, not copied from a reference implementation. TrueFoundry capabilities are summarized from public product documentation and will evolve.

‍

TrueFoundry AI Gateway bietet eine Latenz von ~3—4 ms, verarbeitet mehr als 350 RPS auf einer vCPU, skaliert problemlos horizontal und ist produktionsbereit, während LiteLM unter einer hohen Latenz leidet, mit moderaten RPS zu kämpfen hat, keine integrierte Skalierung hat und sich am besten für leichte Workloads oder Prototyp-Workloads eignet.

Auf Geschwindigkeit ausgelegt: ~ 10 ms Latenz, auch unter Last

Vereinbaren Sie jetzt Ihre Demo