MCP Transport: Stdio vs Streamable HTTP — Architecture, Latency Benchmarks, and Enterprise Trade-offs

Stdio is fine for the developer laptop. Streamable HTTP is what enterprise deployments actually need. We walk through both transports — wire format, connection lifecycle, auth, audit, and benchmarks — and show what changes when an MCP estate scales past one user.

Key Takeaways

→ Stdio is JSON-RPC 2.0 over newline-delimited stdin/stdout. Streamable HTTP (MCP spec 2025-03-26) is JSON-RPC 2.0 over one HTTP endpoint that supports POST and GET, with optional Server-Sent Events for streaming.
→ Stdio is a process-per-user model. 50 developers across 8 servers means ~400 concurrent processes spread across 50 laptops — a deployment topology that becomes operationally difficult to manage centrally, regardless of raw resource headroom.
→ Stdio has no transport-layer auth (env vars only) and no natural centralized interception point for audit — audit and identity have to be reconstructed out-of-band, per host. Streamable HTTP puts every call through one ingress with an Authorization header a gateway can intercept structurally.
→ On the same machine, stdio shaves a few milliseconds off a single tool call and gives you fault isolation for free. What Streamable HTTP buys you, in exchange for the gateway you have to operate, is centralized control of the things enterprises typically care about at scale: identity, audit, rate-limiting, RBAC, and horizontal scale at the server tier.
→ Migration is a transport swap, not a rewrite. The MCP SDKs separate transport from tool logic; converting a stdio server typically takes a five-line patch.

A Friday afternoon at Northwind. Six months after rolling out Cargo Copilot, Northwind's security lead asks the engineering team a routine audit question: which developers called the internal customer-data MCP tool in the last 30 days, and against which customer IDs? The team has every JSON-RPC message that ever crossed those tools — inside the stderr logs of every developer's local Cursor process. Spread across fifty laptops. With no shared timestamp source, no schema, and no way to correlate. The question takes a week to answer, and the answer is partial. The cause is not negligence. It is the transport choice they made six months ago.

Northwind started where most teams start: stdio MCP servers, one per developer machine. That is the right default for local experimentation — and the wrong default for everything else. This post explains why, with the specifics of the wire formats, the deployment models, and the migration path.

1. Stdio Transport: How JSON-RPC 2.0 Works Over stdin/stdout

The MCP transport specification defines stdio in one paragraph: the client launches the server as a subprocess; the server reads JSON-RPC 2.0 messages from stdin and writes responses to stdout. Each message is one line of UTF-8 text terminated by a newline. The server may write logs to stderr; it MUST NOT write anything to stdout that is not a valid MCP message.

A single tool call from agent to server is one line of JSON:

Wire format — newline-delimited JSON-RPC over stdio
# stdin (client → server)
{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_issues","arguments":{"query":"is:open label:critical"}}}

# stdout (server → client)
{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"Found 3 issues..."}]}}

The framing rules are simple but unforgiving. The MCP specification requires messages to be on a single line, so compliant servers escape any internal newline characters as \n during JSON serialization. What actually breaks framing in production is non-JSON contamination of stdout: a stray print() statement, an uncaught exception traceback, a debug log accidentally routed to stdout instead of stderr, or a server that forgets to flush stdout after each message. In all of these cases the client either sees a malformed message or waits forever for a response that has technically been written. Every MCP SDK ships with a stdio transport implementation precisely to make these edge cases someone else's problem.

What stdio gives you in exchange for those constraints is process isolation. The agent owns the server's lifecycle: when the agent exits, the OS reclaims the process. There is no network, no auth handshake, no firewall question. For local development, this is exactly what you want.
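
The framing rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation any MCP SDK ships; the helper names `write_frame`, `read_frames`, and `roundtrip` are hypothetical, and an in-memory buffer stands in for the real pipes.

```python
import io
import json
import sys


def write_frame(stream, message: dict) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes embedded newlines as \n, so the frame stays on one line.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    # Without a flush, the client can wait forever on a message that was "written".
    stream.flush()


def read_frames(stream):
    """Yield one parsed message per non-empty line."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        # Stray prints or tracebacks on stdout fail here with json.JSONDecodeError.
        yield json.loads(line)


def log(text: str) -> None:
    """Diagnostics go to stderr so they never contaminate the message stream."""
    print(text, file=sys.stderr)


def roundtrip(message: dict) -> dict:
    """Push a message through an in-memory pipe (stand-in for stdin/stdout)."""
    pipe = io.StringIO()
    write_frame(pipe, message)
    pipe.seek(0)
    return next(read_frames(pipe))
```

A message whose text contains a line break survives `roundtrip` unchanged, because the break is escaped inside the JSON string and the only literal newline in the frame is the terminator.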

2. Streamable HTTP Transport: Request-Response and SSE Modes

Streamable HTTP, introduced in MCP spec 2025-03-26 and retained in the November 2025 revision, replaces the older HTTP+SSE transport with a single-endpoint design. The server exposes one URL (e.g. /mcp) that accepts both POST and GET. Clients POST JSON-RPC messages; the server either responds with a single JSON body or answers the same request with a Server-Sent Events stream for long-running calls. There is no separate "events" endpoint.

The client signals what it can accept; the server picks the response mode. Here is a tool call in HTTP form:

Wire format — Streamable HTTP, both response modes
POST /mcp HTTP/1.1
Content-Type: application/json
Accept: application/json, text/event-stream
Mcp-Session-Id: 1d3f...e7c2
Authorization: Bearer eyJhbGciOi...

{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_issues",...}}

# --- Server response: short call returns plain JSON ---
HTTP/1.1 200 OK
Content-Type: application/json

{"jsonrpc":"2.0","id":1,"result":{"content":[...]}}

# --- Server response: long call upgrades to SSE ---
HTTP/1.1 200 OK
Content-Type: text/event-stream

event: message
data: {"jsonrpc":"2.0","method":"notifications/progress","params":{...}}

event: message
data: {"jsonrpc":"2.0","id":1,"result":{"content":[...]}}
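
A client has to cope with both response modes shown above. The sketch below is one way to do that, assuming the whole response body has already been buffered into a string; a real client would read the SSE stream incrementally. The function names `iter_sse_messages` and `final_response` are hypothetical.

```python
import json


def iter_sse_messages(body: str):
    """Yield the JSON-RPC payload of each event in a fully buffered SSE body."""
    # Events are separated by a blank line.
    for block in body.replace("\r\n", "\n").split("\n\n"):
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The SSE format strips a single leading space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            yield json.loads("\n".join(data_lines))


def final_response(content_type: str, body: str, request_id):
    """Return the JSON-RPC response for request_id from either response mode."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type == "text/event-stream":
        for message in iter_sse_messages(body):
            # Progress notifications carry no "id"; the final response does.
            if message.get("id") == request_id and (
                "result" in message or "error" in message
            ):
                return message
        return None  # stream ended without a response for this request
    raise ValueError(f"unexpected Content-Type: {content_type}")
```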

Three details matter operationally. The Mcp-Session-Id header binds requests to a session and is assigned by the server at initialization — it persists across pod restarts only if the server externalizes session state. The Accept header is mandatory: per the spec, clients MUST list both application/json and text/event-stream, and a compliant server may reject a missing or incomplete Accept with HTTP 406 Not Acceptable (per HTTP semantics; 415 Unsupported Media Type applies to incompatible Content-Type, not Accept). And per the spec's security section, servers MUST validate the Origin header on every connection to prevent DNS rebinding attacks against locally bound servers — a normative requirement, not a recommendation, with HTTP 403 Forbidden as the prescribed response to an invalid Origin.
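
The Accept and Origin checks can be sketched as a small pre-flight function that runs before any JSON-RPC handling. This is a simplified illustration: `preflight_status` and the allow-list are hypothetical, wildcards and q-values in Accept are ignored, and treating a request without an Origin header as acceptable (as non-browser clients typically send none) is an assumption of this sketch rather than something the text above specifies.

```python
REQUIRED_ACCEPT = {"application/json", "text/event-stream"}


def preflight_status(headers: dict, allowed_origins: set) -> int:
    """Return the HTTP status a server could answer with before handling a POST."""
    normalized = {name.lower(): value for name, value in headers.items()}

    # Reject an Origin that is present but not on the allow-list (DNS rebinding guard).
    origin = normalized.get("origin")
    if origin is not None and origin not in allowed_origins:
        return 403

    # The client must list both media types; parameters such as q-values are dropped.
    accepted = {
        item.split(";")[0].strip().lower()
        for item in normalized.get("accept", "").split(",")
    }
    if not REQUIRED_ACCEPT <= accepted:
        return 406

    return 200
```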

3. Connection Lifecycle: Process-per-User vs Stateless HTTP

The two transports model connections completely differently, and this is where the operational gap opens.

Property	Stdio	Streamable HTTP
Process model	Typically one server process per client connection. The client owns the lifecycle. Some implementations multiplex tools or pool subprocesses, but the common pattern is one-per-client.	One server process serves many clients concurrently. Lifecycle decoupled from any specific client.
Cold start	Pays a process spawn + module-load cost on every fresh connection (Python/Node typically 200–400 ms).	Pays cold start once at deploy or autoscale. Subsequent requests reuse the running process.
Session state	Implicit — lives in the process. Crashes lose it.	Explicit — `Mcp-Session-Id` header. Server can externalize to Redis if it wants resilience.
Failure mode	Server crash kills one user's connection. Other users unaffected; their processes are unrelated.	Server crash affects all in-flight requests on that pod. Mitigated by replicas and graceful drains.
Network	None. Local pipes only.	TCP + TLS. Survives firewalls; can be load-balanced.

For a single developer working locally, stdio's process-per-connection model is a feature, not a bug — process isolation is free, and the cold start happens once when the IDE opens. The moment more than one user needs the server, that model becomes the constraint.

4. Multi-Tenancy: Why Stdio Hits a Wall at Scale

The stdio constraint that breaks at enterprise scale is more arithmetic than engineering: typical stdio MCP deployments run one process per (user, server) tuple, with no built-in sharing across users. Some implementations multiplex multiple tool definitions inside one subprocess, and a few pool subprocesses, but the common deployment pattern in the wild, and the one that ships in the official SDKs, is one process per user per server.

At Northwind, 50 developers each run an IDE with eight MCP servers attached. That is 400 stdio processes during peak hours, distributed across 50 laptops. Each process holds memory (a Python MCP server with a few dependencies sits around 60–120 MB resident; a Node server is similar), keeps file descriptors open, and maintains an active runtime blocked on stdin. The aggregate resource footprint is not catastrophic — 400 small processes is well within the budget of modern hardware — but the real cost is operational rather than computational: process count fragments the control plane.

The harder problem is shared-state servers. Imagine the internal Logistics API MCP server caches a 200 MB customer graph in memory at startup. Under stdio, every developer's machine loads its own copy: 50 copies, roughly 10 GB in aggregate. Under Streamable HTTP, two pod replicas hold the graph for the whole company, about 400 MB. Same data, roughly 25× less memory in aggregate, plus the cache is hot across users because it is shared.
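
The Northwind arithmetic is easy to reproduce. The sketch below uses only the figures quoted in this section (50 developers, 8 servers each, 60–120 MB per process, a 200 MB graph, 2 replicas); the variable names are illustrative and 1 GB is taken as 1000 MB.

```python
developers = 50
servers_per_developer = 8

# One stdio process per (user, server) pair.
stdio_processes = developers * servers_per_developer

# Resident memory across the fleet, using the 60-120 MB per-process range.
fleet_rss_gb_low = stdio_processes * 60 / 1000
fleet_rss_gb_high = stdio_processes * 120 / 1000

# Shared-state example: a 200 MB in-memory graph.
graph_mb = 200
http_replicas = 2
stdio_graph_mb = developers * graph_mb      # one copy per laptop
http_graph_mb = http_replicas * graph_mb    # one copy per pod replica
reduction = stdio_graph_mb / http_graph_mb

print(f"{stdio_processes} processes, {fleet_rss_gb_low:.0f}-{fleet_rss_gb_high:.0f} GB resident")
print(f"graph copies: {stdio_graph_mb} MB vs {http_graph_mb} MB ({reduction:.0f}x)")
```

Swapping in your own headcount, server count, and replica count gives a quick first estimate before any measurement.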

It is worth naming the other side of the trade. Stdio's decentralized model has real advantages a senior infrastructure team will rightly cite: strong fault isolation (one developer's crashed server affects no one else), no shared ingress dependency, no centralized auth outage to drag down the whole estate, and minimal infrastructure to operate. For small teams, highly trusted local workflows, or air-gapped environments, those properties can genuinely outweigh the operational benefits of a centralized HTTP tier. The argument in this post is not that stdio is bad; it is that the failure modes it pushes onto the organization — fragmented audit, distributed credentials, no central rate limiting — show up exactly when an estate crosses from "a few power users" to "shared infrastructure with compliance obligations."

Figure 1. Side-by-side architectural impact at 50 developers across 8 MCP servers. Stdio fans out to ~400 concurrent processes; Streamable HTTP collapses to a small replica set behind a single ingress.

This is the constraint the TrueFoundry MCP Gateway documentation makes explicit:

"Only MCP servers that use streamable-http transport (stateful/stateless) are supported for connection via Gateway proxy."

— TrueFoundry MCP Gateway SDK usage docs

A gateway needs an HTTP endpoint it can intercept. In practice, centralized gateway deployments require an HTTP-facing transport layer, which is why stdio servers are typically wrapped using mcp-proxy (more on that in §8). The architectural decision "which transport do we ship with" is therefore also the decision "can we put a gateway in front of this."

5. Auth Injection: Where Each Transport Falls Short

Stdio has no transport-layer auth. The MCP specification is explicit: stdio implementations should pull credentials from the environment, not from an OAuth flow. In practice, that means each developer's machine holds API keys in shell environment variables, in editor settings, or in a config file that gets shared on Slack when someone joins the team. The credentials live where the process runs.

Streamable HTTP has the Authorization header. A gateway can validate the inbound credential before the request reaches the server, swap it for a downstream credential per the configured outbound auth model, and reject calls that fail policy — all without touching application code. The header-based model is what makes centralized identity, RBAC, and OAuth-token brokering possible at all.

The practical impact is most visible during a credential rotation. With stdio, a leaked GitHub token requires finding every developer machine that has it cached — across editor settings, dotfiles, password managers, and the inevitable copies developers made to help each other. With HTTP, the same rotation is one update at the gateway, and every subsequent request uses the new credential. The transport choice is not the only reason centralized auth works, but it is the one that makes it possible.
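
To make the rotation argument concrete, here is a toy model of header-based credential brokering. It is not the TrueFoundry gateway's implementation; the class `CredentialBroker`, its in-memory token tables, and the token values are all hypothetical, and a real gateway would verify signed tokens against an identity provider instead of looking them up in a dict.

```python
class CredentialBroker:
    """Toy gateway step: validate the inbound token, swap in the downstream one."""

    def __init__(self, inbound_tokens: dict, downstream_credentials: dict):
        self.inbound_tokens = inbound_tokens                  # token -> subject
        self.downstream_credentials = downstream_credentials  # server -> secret

    def outbound_headers(self, headers: dict, server: str):
        """Return (status, subject, headers to forward to the MCP server)."""
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        subject = self.inbound_tokens.get(token) if scheme.lower() == "bearer" else None
        if subject is None:
            return 401, None, {}
        rewritten = dict(headers)
        # The caller's credential never reaches the server; the gateway's does.
        rewritten["Authorization"] = f"Bearer {self.downstream_credentials[server]}"
        return 200, subject, rewritten

    def rotate(self, server: str, new_secret: str) -> None:
        """Rotation is one update here; every later request picks it up."""
        self.downstream_credentials[server] = new_secret
```

Calling `rotate("github", ...)` once changes the credential on every subsequent forwarded request, with no change on any developer machine.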

6. Audit Trails: What the Gateway Can and Cannot See

Stdio lacks a natural centralized interception point. There is no socket to tap, no proxy to insert, no header to log — so structured audit has to be reconstructed out-of-band through local forwarders, stderr collectors, eBPF tracing, or process supervisors deployed on every host where a server runs. That is buildable, but it is an operational program in itself, distinct from the application. The only first-class record of what happened inside an stdio MCP session is whatever the server chose to write to stderr — unstructured, per-process, untrusted timestamps, no correlation ID, no caller identity. For a single developer debugging locally, that is enough. For a security team reconstructing an incident across fifty laptops, it is not.

Streamable HTTP exposes every tool call at the HTTP layer, where a gateway can intercept structurally. A minimal audit record from the TrueFoundry gateway looks roughly like this:

Audit log entry — illustrative gateway record for a single tool call
{
  "timestamp":   "2026-05-14T16:23:11.482Z",
  "request_id":  "req_8f3a...e91",
  "session_id":  "1d3f...e7c2",
  "caller": {
    "subject": "user:alice@northwind.com",
    "auth_method": "TrueFoundry API Key (PAT)",
    "team": "platform-engineering"
  },
  "server":      "backend-group/github",
  "tool":        "search_issues",
  "arguments":   {"query": "repo:northwind/logistics-core is:open"},
  "outcome":     "ok",
  "latency_ms":  187,
  "outbound_auth": "OAuth2 (Authorization Code)"
}

The audit record is not a log line the server chose to emit. It is metadata the gateway produces by structure. The same record exists for every tool call across every MCP server in the estate, with consistent schema, monotonic timestamps, and identity tied to the inbound credential. That is the property that lets Northwind's security team answer the Friday-afternoon question in one query instead of a week.
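
With records in that shape, the Friday-afternoon question reduces to a filter and a group-by. The sketch below assumes the records are available as a list of dicts following the illustrative schema above; the tool name `lookup_customer` and the `customer_id` argument are hypothetical, and a real deployment would more likely run the equivalent query in its log store.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def callers_by_customer(records, tool: str, now: datetime, days: int = 30) -> dict:
    """Map each caller to the customer IDs they queried with `tool` in the window."""
    cutoff = now - timedelta(days=days)
    seen = defaultdict(set)
    for record in records:
        # Normalize the trailing "Z" so fromisoformat accepts it on older Pythons.
        timestamp = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
        if record["tool"] != tool or timestamp < cutoff:
            continue
        customer_id = record["arguments"].get("customer_id")
        if customer_id is not None:
            seen[record["caller"]["subject"]].add(customer_id)
    return {subject: sorted(ids) for subject, ids in seen.items()}


def utc(year: int, month: int, day: int) -> datetime:
    """Small helper for building timezone-aware reference times."""
    return datetime(year, month, day, tzinfo=timezone.utc)
```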

7. Latency Benchmarks: Sequential vs Parallel Tool Calls on Both Transports

On the developer laptop, stdio is faster per call. Once calls cross a network and arrive in parallel, Streamable HTTP wins on the metrics that scale: concurrent throughput and horizontal capacity. Here is the decomposition.

A note on the numbers below

These are engineering estimates assembled from published benchmarks of each underlying component (process spawn, JSON-RPC parsing, HTTP and TLS overhead). They are not measured TrueFoundry production telemetry. Every team should instrument server-timing headers and replace any line item with measured numbers from their own deployment.

Single sequential tool call

Scenario	p50	p95	What contributes
Stdio (warm process, local)	~0.3–1 ms	~2–3 ms	Pipe write + JSON parse + tool logic. No network at all.
Stdio (cold start)	~250–400 ms	~600 ms+	Python/Node module load dominates per published runtime benchmarks; happens once per IDE session.
Streamable HTTP (same DC)	~5–10 ms	~15–25 ms	TLS reuse + JSON parse + server handling. TLS 1.3 keeps handshake to one round-trip on first connect, then reused.
Streamable HTTP (cross-region)	~50–120 ms	~200 ms+	Dominated by round-trip time. Co-locate the gateway with high-traffic MCP servers.

Ten parallel tool calls

Where the transports diverge most sharply:

Scenario	Wall-clock for 10 calls	Why
Stdio (single process)	Degrades sharply under parallel bursts	JSON-RPC IDs permit out-of-order responses at the protocol level, but many stdio MCP implementations share a single runtime event loop and a single stdout writer. Request handling, stdout writes, and runtime execution all cross one process boundary, which serializes throughput under high parallelism.
Stdio (one process per call)	10 cold-start hits in parallel	Spinning up 10 processes simultaneously creates a fork storm and module-load contention.
Streamable HTTP (one pod)	Bounded by pod's concurrency budget; 10 calls overlap	HTTP servers run concurrent request handlers natively; 10 calls overlap if the server is async.
Streamable HTTP (N pods behind a gateway)	Linear scaling up to gateway's rate limit	Load balancer distributes calls across pods; bounded only by horizontal capacity.

The short version: stdio wins on single-call latency in its best case (warm process, same machine) and keeps the fault domain small. What HTTP gives up in raw single-call latency, it buys back in the operational controls enterprises usually need at scale — replication, parallelism through concurrent handlers, centralized rate limiting, and observability at the gateway tier. And because the LLM call wrapping every tool invocation is typically hundreds of milliseconds, a 5–10 ms HTTP overhead is rarely the dominant cost against the work being done.
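
The effect of overlapping handlers can be seen with a small simulation. This is not a benchmark of either transport: `fake_tool_call` is a hypothetical stand-in that just sleeps, and the 10 ms per-call latency is an assumed figure. It only shows why ten calls that overlap finish in roughly the time of one, while ten calls pushed through a serialized path take roughly ten times as long.

```python
import asyncio
import time


async def fake_tool_call(latency_s: float) -> str:
    """Stand-in for one tool call; sleeps instead of doing real work."""
    await asyncio.sleep(latency_s)
    return "ok"


async def run_serialized(calls: int, latency_s: float) -> float:
    """Calls handled one after another, as on a fully serialized path."""
    start = time.perf_counter()
    for _ in range(calls):
        await fake_tool_call(latency_s)
    return time.perf_counter() - start


async def run_overlapped(calls: int, latency_s: float) -> float:
    """Calls handled concurrently, as by an async HTTP server."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_tool_call(latency_s) for _ in range(calls)))
    return time.perf_counter() - start


def compare(calls: int = 10, latency_s: float = 0.01):
    """Return (serialized_seconds, overlapped_seconds) for the same workload."""
    serialized = asyncio.run(run_serialized(calls, latency_s))
    overlapped = asyncio.run(run_overlapped(calls, latency_s))
    return serialized, overlapped
```

Replacing the sleep with a real client call against your own server turns the same harness into an actual measurement.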

8. Migration Guide: Converting a Stdio Server to Streamable HTTP

Migration is a transport-layer swap. The tools themselves — their input schemas, their handlers, their dependencies — are unchanged. There are two paths, depending on whether you control the server source.

Path A: change the transport in your own server

If you wrote the MCP server, the change is roughly five lines. With fastmcp, the difference between transport modes is a single run() argument:

Python — stdio → Streamable HTTP, single argument change
from fastmcp import FastMCP

mcp = FastMCP("logistics-api")

@mcp.tool
def lookup_shipment(shipment_id: str) -> dict:
    return {"id": shipment_id, "status": "in_transit"}

# --- BEFORE: stdio for local development ---
# mcp.run()   # default transport is stdio

# --- AFTER: streamable HTTP for the gateway ---
mcp.run(
    transport="http",
    host="0.0.0.0",
    port=8000,
    path="/mcp",
)
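
One way to keep both modes available from the same file is to choose the run() arguments from the environment. The helper below is a sketch: `run_kwargs` and the `MCP_TRANSPORT`, `MCP_HOST`, and `MCP_PORT` variable names are hypothetical conventions, not part of fastmcp. The HTTP arguments mirror the ones used above.

```python
import os


def run_kwargs(env=None) -> dict:
    """Pick run() arguments: stdio by default, HTTP when MCP_TRANSPORT=http."""
    env = os.environ if env is None else env
    if env.get("MCP_TRANSPORT", "stdio").lower() == "http":
        return {
            "transport": "http",
            "host": env.get("MCP_HOST", "0.0.0.0"),
            "port": int(env.get("MCP_PORT", "8000")),
            "path": "/mcp",
        }
    return {"transport": "stdio"}


# In the server module, the final call would then become:
#   mcp.run(**run_kwargs())
```

Developers keep launching the server from the IDE with no configuration, while the deployment manifest sets the environment variable and gets the HTTP endpoint.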

Path B: wrap a stdio server with mcp-proxy

Many open-source MCP servers ship stdio-only — the official GitHub, Slack, and filesystem servers, plus most community offerings. For these, TrueFoundry recommends wrapping them with mcp-proxy and deploying as a regular service. The wrapper terminates HTTP, spawns the stdio child, and shuttles JSON-RPC between them. From the gateway's perspective, the wrapped server is indistinguishable from a native HTTP server.

Shell — wrap a stdio server with mcp-proxy (verbatim from TrueFoundry docs)
# Wrap a stdio Python server with mcp-proxy and expose Streamable HTTP
mcp-proxy --port 8000 --host 0.0.0.0 --server stream python my_server.py

# Then register with the gateway as a regular HTTP MCP server:
#   url: http://my-server.northwind.internal:8000/mcp
#   transport: streamable-http

The exact flags depend on which mcp-proxy implementation is used — the TypeScript variant (punkpeye/mcp-proxy) takes --server stream as shown here; the Python variant (sparfenyuk/mcp-proxy) uses --transport streamablehttp with the wrapped command as a positional argument. Either way, verify against the current upstream README before shipping a runbook, since CLI flags can drift between releases.
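
Conceptually, the stdio half of such a wrapper is small. The sketch below is not mcp-proxy's code and leaves out the HTTP side entirely; it only shows the "spawn the child and shuttle JSON-RPC" step. `StdioBridge` and the inline echo child are hypothetical, and the bridge assumes exactly one response line per request, which ignores notifications and out-of-order replies.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a stdio MCP server: answers every request line.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"jsonrpc": "2.0", "id": request["id"], "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


class StdioBridge:
    """Owns a stdio child process and forwards one message at a time to it."""

    def __init__(self, command):
        self.process = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def forward(self, message: dict) -> dict:
        # One line in, one line out: the newline-delimited framing from section 1.
        self.process.stdin.write(json.dumps(message) + "\n")
        self.process.stdin.flush()
        return json.loads(self.process.stdout.readline())

    def close(self) -> None:
        # Closing stdin ends the child's read loop so it can exit cleanly.
        self.process.stdin.close()
        self.process.wait(timeout=5)
        self.process.stdout.close()


def demo() -> dict:
    """Spawn the echo child, forward one request, and shut the child down."""
    bridge = StdioBridge([sys.executable, "-c", CHILD_SOURCE])
    try:
        return bridge.forward({"jsonrpc": "2.0", "id": 7, "method": "tools/list"})
    finally:
        bridge.close()
```

An HTTP handler in front of `forward` would complete the picture, which is the part the proxy tools provide for you.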

Once wrapped, the server is registered with the gateway the same way any HTTP MCP server is; see our earlier post, "OAuth at the MCP Layer: How We Solved Enterprise Token Management for AI Agents", for the configuration model. Migration is rarely a code project; it is a deployment-and-registration project.

9. FAQs

Is stdio deprecated?

No. Stdio is the right transport for local development and for any setup where the client and server share a machine and a single user. The MCP spec defines both transports as first-class. What's deprecated is the older HTTP+SSE transport (separate endpoints for POST and GET-SSE), which Streamable HTTP replaced.

Can I run both transports on the same server?

Yes. Most MCP SDKs let a single server bind to multiple transports. A common pattern is stdio for local dev and Streamable HTTP for production, gated by an environment variable or command-line flag. Tool logic is shared; only transport initialization differs.

What about Server-Sent Events (SSE) as a standalone transport?

The older HTTP+SSE transport from spec 2024-11-05 used two endpoints — one for POST messages, one for the SSE stream. It is officially deprecated as of the 2025-03-26 spec, though servers can keep it running for backwards compatibility with older clients. New implementations should target Streamable HTTP.

Does the gateway add latency to every tool call?

Yes, but the cost is small. A healthy gateway adds a few milliseconds on the cache-hit path (token lookup + RBAC + JWT verify). Against the surrounding LLM call (typically hundreds of milliseconds to seconds) and the downstream MCP server, the overhead is rarely the dominant cost. See our earlier post on OAuth at the MCP layer for the full latency decomposition.

What about WebSocket?

Not part of the current MCP transport specification. Streamable HTTP with SSE covers the streaming use case without requiring WebSocket infrastructure, which is harder to load-balance and harder to secure than plain HTTP. The MCP authors chose HTTP semantics deliberately.

Where does TrueFoundry fit?

The TrueFoundry MCP Gateway is a Streamable HTTP-only ingress for MCP servers across your enterprise. Stdio servers reach it via mcp-proxy wrappers (see §8). Once registered, every server gets a uniform identity, RBAC, audit, and OAuth-broker layer at the gateway tier, regardless of how the upstream server was originally implemented.

Take the next step

If you run MCP at any non-trivial scale, the highest-leverage exercise is to list every MCP server in active use, mark each as stdio-only or HTTP-capable, and decide which ones move first. Running stdio-only servers behind mcp-proxy is a routine pattern; the migration usually fits in a single sprint.

Start here: TrueFoundry MCP Gateway SDK usage docs. Or book an enterprise architecture review with our team.
