Stdio vs Streamable HTTP for MCP: What changes when you move from local development to enterprise deployment

Stdio is fine for the developer laptop. Streamable HTTP is what enterprise deployments actually need. We walk through both transports — wire format, connection lifecycle, auth, audit, and benchmarks — and show what changes when an MCP estate scales past one user.
A Friday afternoon at Northwind. Six months after rolling out Cargo Copilot, Northwind's security lead asks the engineering team a routine audit question: which developers called the internal customer-data MCP tool in the last 30 days, and against which customer IDs? The team has every JSON-RPC message that ever crossed those tools — inside the stderr logs of every developer's local Cursor process. Spread across fifty laptops. With no shared timestamp source, no schema, and no way to correlate. The question takes a week to answer, and the answer is partial. The cause is not negligence. It is the transport choice they made six months ago.
Northwind started where most teams start: stdio MCP servers, one per developer machine. That is the right default for local experimentation — and the wrong default for everything else. This post explains why, with the specifics of the wire formats, the deployment models, and the migration path.
1. Stdio Transport: How JSON-RPC 2.0 Works Over stdin/stdout
The MCP transport specification defines stdio in one paragraph: the client launches the server as a subprocess; the server reads JSON-RPC 2.0 messages from stdin and writes responses to stdout. Each message is one line of UTF-8 text terminated by a newline. The server may write logs to stderr; it MUST NOT write anything to stdout that is not a valid MCP message.
A single tool call from agent to server is one line of JSON:
Wire format — newline-delimited JSON-RPC over stdio
# stdin (client → server)
{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_issues","arguments":{"query":"is:open label:critical"}}}
# stdout (server → client)
{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"Found 3 issues..."}]}}
The framing rules are simple but unforgiving. The MCP specification requires messages to be on a single line, so compliant servers escape any internal newline characters as \n during JSON serialization. What actually breaks framing in production is non-JSON contamination of stdout: a stray print() statement, an uncaught exception traceback, a debug log accidentally routed to stdout instead of stderr, or a server that forgets to flush stdout after each message. In all of these cases the client either sees a malformed message or waits forever for a response that has technically been written. Every MCP SDK ships with a stdio transport implementation precisely to make these edge cases someone else's problem.
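To make the framing rule concrete, here is a minimal sketch of newline-delimited JSON-RPC using only the Python standard library. The helper names (encode_message, read_messages) are hypothetical and are not taken from any MCP SDK; the point is only to show how an embedded newline survives serialization and how a stray debug line on stdout shows up as a framing error.

```python
import io
import json


def encode_message(message: dict) -> str:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes embedded newlines as \n, so the payload stays on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


def read_messages(stream):
    """Yield (message, error) pairs, one per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line), None
        except json.JSONDecodeError as exc:
            # Anything on stdout that is not JSON surfaces here as a framing error.
            yield None, f"not a JSON-RPC message: {line!r} ({exc.msg})"


response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "line one\nline two"}]},
}

# A well-behaved server writes only messages; a contaminated one leaks a log line.
clean_stdout = io.StringIO(encode_message(response))
dirty_stdout = io.StringIO("DEBUG connecting to API\n" + encode_message(response))

clean = list(read_messages(clean_stdout))
dirty = list(read_messages(dirty_stdout))
```

In this sketch the reader skips past the bad line and keeps going. A real client may instead treat the first malformed line as fatal, which is one reason to leave framing to the SDK's transport.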
What stdio gives you in exchange for those constraints is process isolation. The agent owns the server's lifecycle: when the agent exits, the OS reclaims the process. There is no network, no auth handshake, no firewall question. For local development, this is exactly what you want.
2. Streamable HTTP Transport: Request-Response and SSE Modes
Streamable HTTP, introduced in MCP spec 2025-03-26 and retained in the November 2025 revision, replaces the older HTTP+SSE transport with a single-endpoint design. The server exposes one URL (e.g. /mcp) that accepts both POST and GET. Clients POST JSON-RPC messages; servers respond with either a single JSON body or upgrade to a Server-Sent Events stream for long-running calls. There is no separate "events" endpoint.
The client signals what it can accept; the server picks the response mode. Here is a tool call in HTTP form:
Wire format — Streamable HTTP, both response modes
POST /mcp HTTP/1.1
Content-Type: application/json
Accept: application/json, text/event-stream
Mcp-Session-Id: 1d3f...e7c2
Authorization: Bearer eyJhbGciOi...

{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_issues",...}}

# --- Server response: short call returns plain JSON ---
HTTP/1.1 200 OK
Content-Type: application/json

{"jsonrpc":"2.0","id":1,"result":{"content":[...]}}

# --- Server response: long call upgrades to SSE ---
HTTP/1.1 200 OK
Content-Type: text/event-stream

event: message
data: {"jsonrpc":"2.0","method":"notifications/progress","params":{...}}

event: message
data: {"jsonrpc":"2.0","id":1,"result":{"content":[...]}}

Three details matter operationally. The Mcp-Session-Id header binds requests to a session and is assigned by the server at initialization; it persists across pod restarts only if the server externalizes session state. The Accept header is mandatory: per the spec, clients MUST list both application/json and text/event-stream, and a compliant server may reject a missing or incomplete Accept with HTTP 406 Not Acceptable (per HTTP semantics; 415 Unsupported Media Type applies to incompatible Content-Type, not Accept). And per the spec's security section, servers MUST validate the Origin header on every connection to prevent DNS rebinding attacks against locally bound servers. That is a normative requirement, not a recommendation, with HTTP 403 Forbidden as the prescribed response to an invalid Origin.
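The Accept and Origin checks can be sketched as a small pre-flight function. This is an illustration under stated assumptions, not code from any MCP SDK: the allowlist value is hypothetical, header names are assumed to be already normalized to the casing shown, and a request with no Origin header at all (typical for non-browser clients) is allowed through.

```python
REQUIRED_ACCEPT = {"application/json", "text/event-stream"}
ALLOWED_ORIGINS = {"https://ide.northwind.internal"}  # hypothetical allowlist


def accepted_types(accept_header: str) -> set:
    """Extract media types from an Accept header, ignoring parameters such as q=."""
    return {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    }


def precheck(headers: dict) -> int:
    """Return the HTTP status a server might send before handling a POST."""
    origin = headers.get("Origin")
    # Assumption of this sketch: only a present-but-unknown Origin is rejected.
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return 403
    # Both media types must be listed, in any order.
    if not REQUIRED_ACCEPT <= accepted_types(headers.get("Accept", "")):
        return 406
    return 200
```

Checking Origin before Accept is a design choice here: a request from an untrusted origin is refused before the server reveals anything else about what it would have accepted.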
3. Connection Lifecycle: Process-per-User vs Stateless HTTP
The two transports model connections completely differently, and this is where the operational gap opens.
For a single developer working locally, stdio's process-per-connection model is a feature, not a bug — process isolation is free, and the cold start happens once when the IDE opens. The moment more than one user needs the server, that model becomes the constraint.
4. Multi-Tenancy: Why Stdio Hits a Wall at Scale
The stdio constraint that breaks at enterprise is more arithmetic than engineering: typical stdio MCP deployments run one process per (user, server) tuple, with no built-in sharing across users. Some implementations multiplex multiple tool definitions inside one subprocess, and a few pool subprocesses, but the common deployment pattern in the wild — and the one that ships in the official SDKs — is one process per user per server.
At Northwind, 50 developers each run an IDE with eight MCP servers attached. That is 400 stdio processes during peak hours, distributed across 50 laptops. Each process holds memory (a Python MCP server with a few dependencies sits around 60–120 MB resident; a Node server is similar), keeps file descriptors open, and maintains an active runtime blocked on stdin. The aggregate resource footprint is not catastrophic — 400 small processes is well within the budget of modern hardware — but the real cost is operational rather than computational: process count fragments the control plane.
The harder problem is shared-state servers. Imagine the internal Logistics API MCP server caches a 200 MB customer-graph in memory at startup. Under stdio, every developer's machine loads its own copy: 50 copies, or about 10 GB across the company. Under Streamable HTTP, two pod replicas hold the graph for everyone, about 400 MB in total. Same data, roughly 25 times less memory in aggregate, and the cache stays hot across users because it is shared.
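The Northwind arithmetic is easy to check. The sketch below uses only the figures given in the scenario (50 developers, eight servers each, 60 to 120 MB resident per process, a 200 MB graph, two HTTP replicas) and assumes every developer runs the Logistics API server.

```python
DEVELOPERS = 50
SERVERS_PER_DEVELOPER = 8
RSS_MB_RANGE = (60, 120)   # resident memory per stdio process, low and high
GRAPH_MB = 200             # in-memory customer graph
HTTP_REPLICAS = 2

# One stdio process per (user, server) pair.
stdio_processes = DEVELOPERS * SERVERS_PER_DEVELOPER

# Aggregate resident memory across all laptops, in GB (low, high).
stdio_rss_gb = tuple(stdio_processes * mb / 1000 for mb in RSS_MB_RANGE)

# Copies of the customer graph under each transport.
stdio_graph_mb = DEVELOPERS * GRAPH_MB
http_graph_mb = HTTP_REPLICAS * GRAPH_MB
reduction = stdio_graph_mb / http_graph_mb

print(stdio_processes, stdio_rss_gb, stdio_graph_mb, http_graph_mb, reduction)
```

The cached graph accounts for 10,000 MB under stdio and 400 MB under HTTP, a factor of 25.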
It is worth naming the other side of the trade. Stdio's decentralized model has real advantages a senior infrastructure team will rightly cite: strong fault isolation (one developer's crashed server affects no one else), no shared ingress dependency, no centralized auth outage to drag down the whole estate, and minimal infrastructure to operate. For small teams, highly trusted local workflows, or air-gapped environments, those properties can genuinely outweigh the operational benefits of a centralized HTTP tier. The argument in this post is not that stdio is bad; it is that the failure modes it pushes onto the organization — fragmented audit, distributed credentials, no central rate limiting — show up exactly when an estate crosses from "a few power users" to "shared infrastructure with compliance obligations."

This is the constraint the TrueFoundry MCP Gateway documentation makes explicit:
A gateway needs an HTTP endpoint it can intercept. In practice, centralized gateway deployments require an HTTP-facing transport layer, which is why stdio servers are typically wrapped using mcp-proxy (more on that in §8). The architectural decision "which transport do we ship with" is therefore also the decision "can we put a gateway in front of this."
5. Auth Injection: Where Each Transport Falls Short
Stdio has no transport-layer auth. The MCP specification is explicit: stdio implementations should pull credentials from the environment, not from an OAuth flow. In practice, that means each developer's machine holds API keys in shell environment variables, in editor settings, or in a config file that gets shared on Slack when someone joins the team. The credentials live where the process runs.
Streamable HTTP has the Authorization header. A gateway can validate the inbound credential before the request reaches the server, swap it for a downstream credential per the configured outbound auth model, and reject calls that fail policy — all without touching application code. The header-based model is what makes centralized identity, RBAC, and OAuth-token brokering possible at all.
The practical impact is most visible during a credential rotation. With stdio, a leaked GitHub token requires finding every developer machine that has it cached — across editor settings, dotfiles, password managers, and the inevitable copies developers made to help each other. With HTTP, the same rotation is one update at the gateway, and every subsequent request uses the new credential. The transport choice is not the only reason centralized auth works, but it is the one that makes it possible.
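A minimal sketch of the credential swap shows why rotation collapses to one update. Everything here is hypothetical: GatewayAuth is not a TrueFoundry API, and the in-memory dictionaries stand in for whatever identity provider and secret store a real gateway would use.

```python
class GatewayAuth:
    """Toy model of inbound validation plus outbound credential injection."""

    def __init__(self, inbound_tokens: dict, downstream_tokens: dict):
        self.inbound_tokens = inbound_tokens        # inbound token -> caller subject
        self.downstream_tokens = downstream_tokens  # server name -> downstream credential

    def outbound_headers(self, headers: dict, server: str):
        """Validate the caller, then replace Authorization for the downstream hop."""
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        subject = self.inbound_tokens.get(token) if scheme == "Bearer" else None
        if subject is None:
            raise PermissionError("inbound credential rejected")
        out = dict(headers)
        # The caller's own token never reaches the MCP server.
        out["Authorization"] = f"Bearer {self.downstream_tokens[server]}"
        return subject, out

    def rotate(self, server: str, new_token: str) -> None:
        """Rotation is a single update; later requests pick it up automatically."""
        self.downstream_tokens[server] = new_token


gateway = GatewayAuth({"pat-alice": "user:alice@northwind.com"}, {"github": "gh-old"})
request = {"Authorization": "Bearer pat-alice"}

subject, before = gateway.outbound_headers(request, "github")
gateway.rotate("github", "gh-new")
_, after = gateway.outbound_headers(request, "github")
```

Alice's inbound token is the same before and after the rotation. Only the gateway's copy of the downstream credential changed, and no developer machine was touched.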
6. Audit Trails: What the Gateway Can and Cannot See
Stdio lacks a natural centralized interception point. There is no socket to tap, no proxy to insert, no header to log — so structured audit has to be reconstructed out-of-band through local forwarders, stderr collectors, eBPF tracing, or process supervisors deployed on every host where a server runs. That is buildable, but it is an operational program in itself, distinct from the application. The only first-class record of what happened inside an stdio MCP session is whatever the server chose to write to stderr — unstructured, per-process, untrusted timestamps, no correlation ID, no caller identity. For a single developer debugging locally, that is enough. For a security team reconstructing an incident across fifty laptops, it is not.
Streamable HTTP exposes every tool call at the HTTP layer, where a gateway can intercept structurally. A minimal audit record from the TrueFoundry gateway looks roughly like this:
Audit log entry — illustrative gateway record for a single tool call
{
  "timestamp": "2026-05-14T16:23:11.482Z",
  "request_id": "req_8f3a...e91",
  "session_id": "1d3f...e7c2",
  "caller": {
    "subject": "user:alice@northwind.com",
    "auth_method": "TrueFoundry API Key (PAT)",
    "team": "platform-engineering"
  },
  "server": "backend-group/github",
  "tool": "search_issues",
  "arguments": {"query": "repo:northwind/logistics-core is:open"},
  "outcome": "ok",
  "latency_ms": 187,
  "outbound_auth": "OAuth2 (Authorization Code)"
}
The audit record is not a log line the server chose to emit. It is metadata the gateway produces by structure. The same record exists for every tool call across every MCP server in the estate, with consistent schema, monotonic timestamps, and identity tied to the inbound credential. That is the property that lets Northwind's security team answer the Friday-afternoon question in one query instead of a week.
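To show what "one query" could look like, here is a sketch that answers the security lead's question over records shaped like the illustrative entry. The tool name lookup_customer, the customer_id argument, and the sample records are hypothetical. A real deployment would run the equivalent filter in whatever log store the gateway writes to.

```python
from datetime import datetime, timedelta, timezone


def callers_of(records, tool, now, days=30):
    """Map each caller to the customer IDs they passed to `tool` in the window."""
    cutoff = now - timedelta(days=days)
    seen = {}
    for rec in records:
        # Timestamps are ISO 8601 with a trailing Z, as in the illustrative record.
        ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
        if rec["tool"] == tool and ts >= cutoff:
            customer = rec["arguments"].get("customer_id")
            seen.setdefault(rec["caller"]["subject"], set()).add(customer)
    return seen


records = [
    {"timestamp": "2026-05-14T16:23:11.482Z", "tool": "lookup_customer",
     "caller": {"subject": "user:alice@northwind.com"},
     "arguments": {"customer_id": "C-1001"}},
    {"timestamp": "2026-03-01T09:00:00.000Z", "tool": "lookup_customer",
     "caller": {"subject": "user:bob@northwind.com"},
     "arguments": {"customer_id": "C-2002"}},   # outside the 30-day window
    {"timestamp": "2026-05-02T11:05:40.120Z", "tool": "search_issues",
     "caller": {"subject": "user:alice@northwind.com"},
     "arguments": {"query": "is:open"}},         # different tool
]

now = datetime(2026, 5, 14, 17, 0, tzinfo=timezone.utc)
result = callers_of(records, "lookup_customer", now)
```

The query is only this simple because every record carries the same fields: a caller subject, a tool name, structured arguments, and a timestamp from one clock.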
7. Latency Benchmarks: Sequential vs Parallel Tool Calls on Both Transports
On the developer laptop, stdio is faster per call. Over a network with realistic load, Streamable HTTP wins on the metrics that scale. Here is the decomposition.
The short version: stdio wins on single-call latency in its best case (warm process, same machine) and keeps the fault domain small. What HTTP gives up in raw single-call latency, it buys back in the operational controls enterprises usually need at scale — replication, parallelism through concurrent handlers, centralized rate limiting, and observability at the gateway tier. And because the LLM call wrapping every tool invocation is typically hundreds of milliseconds, a 5–10 ms HTTP overhead is rarely the dominant cost against the work being done.
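A back-of-the-envelope model makes the trade visible. The numbers below are illustrative assumptions chosen for the sketch (three tool calls of 120, 80, and 200 ms, a 10 ms HTTP overhead, a 500 ms LLM call), not measurements, and the parallel case assumes the server handles all three calls fully concurrently.

```python
def sequential_ms(tool_ms, overhead_ms):
    """Wall time when calls run one after another, each paying the overhead."""
    return sum(t + overhead_ms for t in tool_ms)


def parallel_ms(tool_ms, overhead_ms):
    """Wall time when calls run concurrently: the slowest call dominates."""
    return max(t + overhead_ms for t in tool_ms)


def overhead_share(overhead_ms, llm_ms, tool_ms):
    """Fraction of one LLM-plus-tool round trip spent on transport overhead."""
    return overhead_ms / (llm_ms + tool_ms + overhead_ms)


TOOL_MS = [120, 80, 200]   # assumed handler times
HTTP_OVERHEAD_MS = 10      # upper end of the 5-10 ms range
LLM_MS = 500               # assumed LLM call duration

seq = sequential_ms(TOOL_MS, HTTP_OVERHEAD_MS)
par = parallel_ms(TOOL_MS, HTTP_OVERHEAD_MS)
share = overhead_share(HTTP_OVERHEAD_MS, LLM_MS, TOOL_MS[0])
```

Under these assumptions the three calls take 430 ms sequentially and 210 ms in parallel, and the transport overhead is under 2% of a single round trip.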
8. Migration Guide: Converting a Stdio Server to Streamable HTTP
Migration is a transport-layer swap. The tools themselves — their input schemas, their handlers, their dependencies — are unchanged. There are two paths, depending on whether you control the server source.
Path A: change the transport in your own server
If you wrote the MCP server, the change is roughly five lines. With fastmcp, the difference between transport modes is a single run() argument:
Python — stdio → Streamable HTTP, single argument change
from fastmcp import FastMCP

mcp = FastMCP("logistics-api")

@mcp.tool
def lookup_shipment(shipment_id: str) -> dict:
    return {"id": shipment_id, "status": "in_transit"}

# --- BEFORE: stdio for local development ---
# mcp.run()  # default transport is stdio

# --- AFTER: streamable HTTP for the gateway ---
mcp.run(
    transport="http",
    host="0.0.0.0",
    port=8000,
    path="/mcp",
)
Path B: wrap a stdio server with mcp-proxy
Many open-source MCP servers ship stdio-only — the official GitHub, Slack, and filesystem servers, plus most community offerings. For these, TrueFoundry recommends wrapping them with mcp-proxy and deploying as a regular service. The wrapper terminates HTTP, spawns the stdio child, and shuttles JSON-RPC between them. From the gateway's perspective, the wrapped server is indistinguishable from a native HTTP server.
Shell — wrap a stdio server with mcp-proxy (verbatim from TrueFoundry docs)
# Wrap a stdio Python server with mcp-proxy and expose Streamable HTTP
mcp-proxy --port 8000 --host 0.0.0.0 --server stream python my_server.py
# Then register with the gateway as a regular HTTP MCP server:
# url: http://my-server.northwind.internal:8000/mcp
# transport: streamable-http
The exact flags depend on which mcp-proxy implementation is used: the TypeScript variant (punkpeye/mcp-proxy) takes --server stream as shown here; the Python variant (sparfenyuk/mcp-proxy) uses --transport streamablehttp with the wrapped command as a positional argument. Either way, verify against the current upstream README before shipping a runbook, since CLI flags can drift between releases.
Once wrapped, the server is registered with the gateway the same way any HTTP MCP server is — see our earlier post on OAuth at the MCP layer for the configuration model. Migration is rarely a code project; it is a deployment-and-registration project.
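For intuition about what a wrapper does on the stdio side, here is a sketch of the spawn-and-shuttle step using only the standard library. It is not mcp-proxy's implementation: the HTTP listener is omitted, the child is a toy echo server defined inline, and the names CHILD_SOURCE, forward, and demo are hypothetical.

```python
import json
import subprocess
import sys

# A stand-in stdio server: reads one JSON-RPC request per line, answers on stdout.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"echo": req["method"]}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


def forward(child: subprocess.Popen, message: dict) -> dict:
    """Write one request line to the child's stdin and read one response line."""
    child.stdin.write(json.dumps(message) + "\n")
    child.stdin.flush()  # without a flush the child may never see the request
    return json.loads(child.stdout.readline())


def demo() -> dict:
    # The wrapper owns the child's lifecycle, just as an IDE would under stdio.
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        return forward(child, {"jsonrpc": "2.0", "id": 7, "method": "tools/list"})


if __name__ == "__main__":
    print(demo())
```

A real wrapper would call something like forward from its HTTP handler and would also have to deal with concurrent requests, notifications that carry no id, and child restarts. Those parts are why using an existing proxy is usually preferable to writing one.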
9. FAQs
Is stdio deprecated?
No. Stdio is the right transport for local development and for any setup where the client and server share a machine and a single user. The MCP spec defines both transports as first-class. What's deprecated is the older HTTP+SSE transport (separate endpoints for POST and GET-SSE), which Streamable HTTP replaced.
Can I run both transports on the same server?
Yes. Most MCP SDKs let a single server bind to multiple transports. A common pattern is stdio for local dev and Streamable HTTP for production, gated by an environment variable or command-line flag. Tool logic is shared; only transport initialization differs.
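One way to gate the transport on an environment variable is sketched below. The variable names MCP_TRANSPORT, MCP_HOST, and MCP_PORT are hypothetical conventions, not something fastmcp reads on its own. The helper only builds keyword arguments, which are assumed to be passed to a fastmcp run() call like the one in §8.

```python
import os


def run_kwargs(env=None) -> dict:
    """Choose transport settings from the environment; stdio is the default."""
    env = os.environ if env is None else env
    if env.get("MCP_TRANSPORT", "stdio") == "http":
        return {
            "transport": "http",
            "host": env.get("MCP_HOST", "0.0.0.0"),
            "port": int(env.get("MCP_PORT", "8000")),
            "path": "/mcp",
        }
    return {"transport": "stdio"}


# In the server module, after the tools are defined:
# mcp.run(**run_kwargs())
```

Keeping the decision in one small function means the tool definitions never need to know which transport they are served over.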
What about Server-Sent Events (SSE) as a standalone transport?
The older HTTP+SSE transport from spec 2024-11-05 used two endpoints — one for POST messages, one for the SSE stream. It is officially deprecated as of the 2025-03-26 spec, though servers can keep it running for backwards compatibility with older clients. New implementations should target Streamable HTTP.
Does the gateway add latency to every tool call?
Yes, but the cost is small. A healthy gateway adds a few milliseconds on the cache-hit path (token lookup + RBAC + JWT verify). Against the surrounding LLM call (typically hundreds of milliseconds to seconds) and the downstream MCP server, the overhead is rarely the dominant cost. See our earlier post on OAuth at the MCP layer for the full latency decomposition.
What about WebSocket?
Not part of the current MCP transport specification. Streamable HTTP with SSE covers the streaming use case without requiring WebSocket infrastructure, which is harder to load-balance and harder to secure than plain HTTP. The MCP authors chose HTTP semantics deliberately.
Where does TrueFoundry fit?
The TrueFoundry MCP Gateway is a Streamable HTTP-only ingress for MCP servers across your enterprise. Stdio servers reach it via mcp-proxy wrappers (see §8). Once registered, every server gets a uniform identity, RBAC, audit, and OAuth-broker layer at the gateway tier, regardless of how the upstream server was originally implemented.
Take the next step
If you run any MCP at non-trivial scale, the highest-leverage exercise is to list every MCP server in active use, mark each as stdio-only or HTTP-capable, and decide which ones move first. Running stdio-only servers behind mcp-proxy is a routine pattern; the migration usually fits in a single sprint.
Start here: TrueFoundry MCP Gateway SDK usage docs. Or book an enterprise architecture review with our team.
Further reading
Citations are linked inline throughout. The list below collects all URLs for printability and link-rot insurance.
- MCP Transports specification (2025-11-25). https://modelcontextprotocol.io/specification/2025-11-25/basic/transports
- MCP Transports specification (2025-03-26, introduced Streamable HTTP). https://modelcontextprotocol.io/specification/2025-03-26/basic/transports
- TrueFoundry MCP Gateway overview. https://www.truefoundry.com/docs/ai-gateway/mcp/mcp-overview
- TrueFoundry MCP Gateway authentication and security. https://www.truefoundry.com/docs/ai-gateway/mcp/mcp-gateway-auth-security
- TrueFoundry MCP Gateway SDK usage (streamable-http requirement). https://docs.truefoundry.com/gateway/mcp-gateway-sdk-usage
- TrueFoundry: Inside MCP architecture (mcp-proxy wrapper guidance). https://www.truefoundry.com/blog/inside-the-model-context-protocol-mcp-architecture-motivation-internal-usage
- JSON-RPC 2.0 specification. https://www.jsonrpc.org/specification
- AWS Lambda cold start benchmarks (Python/Node 200–400 ms). https://edgedelta.com/company/knowledge-center/aws-lambda-cold-start-cost
- mcp-proxy: stdio ↔ HTTP bridge (open source). https://github.com/sparfenyuk/mcp-proxy
Note: Northwind Logistics is a fictional company used to ground the design in a concrete deployment. Latency numbers in §7 are engineering estimates from published component benchmarks, not measured TrueFoundry production telemetry.