Designing a Centralized MCP Registry: Architecture Decisions for Enterprise Scale

Data model, auth metadata, dynamic tool discovery, and multi-tenant isolation for the layer between your agents and your tools.
When twenty developers each maintain their own ~/.cursor/mcp.json, you have not built MCP infrastructure. You have built a distributed sticky-note system that fails closed every time a credential rotates.
02:47 UTC, Tuesday. Continental Aerospace Systems — a fictional but unfortunately plausible 240-engineer avionics platform vendor — rotates its GitHub App private key on schedule. By 09:00, #platform-help has 23 threads from squad leads, each a variant of "MCP is broken in Cursor." The fix is identical every time — paste the new token into ~/.cursor/mcp.json, restart the IDE — but it has to be done 187 times, and the platform team loses a half-day before they finish.
That morning is not an MCP problem. It's a registry problem. The protocol works; what's missing is the layer between the agent and the protocol that knows which servers exist, who can use which tools, and how to tell every IDE in the organization about a rotated credential without 187 git commits. This post is about that layer, using TrueFoundry's MCP Gateway as the reference implementation throughout.
1. The Configuration Drift Problem: What 20 Developers × 8 MCP Servers Actually Costs
The shape of the problem is multiplication. Continental's 240 engineers across 14 squads run Claude Code, Cursor, and VS Code against eight MCP servers: GitHub, Sentry, Atlassian, Linear, Slack, an internal Postgres-backed fleet-telemetry server, an airworthiness-kb server backed by their FAA directives, and Exa for search. At baseline, every engineer maintains an ~/.cursor/mcp.json like this.
~/.cursor/mcp.json — local MCP configuration, per developer
{
  "mcpServers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp",
      "headers": { "Authorization": "Bearer ghp_..." }
    },
    "sentry": {
      "url": "https://mcp.sentry.dev/mcp",
      "headers": { "Authorization": "Bearer sntrys_..." }
    },
    "linear": { "url": "https://mcp.linear.app/mcp" },
    "atlassian": { "url": "https://mcp.atlassian.com/v1/mcp" },
    "slack": { "url": "https://mcp.slack.com/mcp" },
    "fleet-telemetry": {
      "url": "https://fleet-mcp.internal.example/mcp",
      "headers": { "Authorization": "Bearer eyJ..." }
    },
    "airworthiness-kb": { "url": "https://kb-mcp.internal.example/mcp" },
    "exa": {
      "url": "https://mcp.exa.ai/mcp",
      "headers": { "Authorization": "Bearer exa_..." }
    }
  }
}
Eight servers per developer, 240 developers, gives the platform team about 1,920 implicit config entries to keep correct. The costs land in five places.
The MCP Gateway overview frames the pre-registry world as four overlapping pathologies — fragmented infrastructure, credential sprawl, zero visibility, no governance:
A registry collapses all five rows into one architectural primitive: one resource record per MCP server, owned by a control plane, that every IDE and every agent references by ID. The rest of this post is what that record contains and what the system around it has to do.
2. MCP Server Registry Data Model: What Every Entry Contains
Before discovery, auth, or isolation, we have to be precise about what a registry entry is. The shape we converged on:
MCP server registry entry — conceptual model (TypeScript interface)
// Conceptual model. The on-the-wire schema is exposed through
// the TrueFoundry UI and CLI, not as a public DDL.
interface McpServerRegistryEntry {
  server_id: string;                  // stable, tenant-scoped identifier
  display_name: string;
  transport_type: "streamable_http" | "sse" | "stdio";
  // Connectivity (one of, by transport)
  base_url?: string;                  // remote
  command?: string;                   // stdio: e.g. "npx"
  args?: string[];                    // stdio: argv beyond command
  auth_config: AuthConfig;            // §4
  collaborators: Collaborator[];      // §5
  tenant_id: string;                  // always present, always filtered on
  cached_tool_schema?: ToolSchema[];  // last successful tools/list
  cached_schema_at?: string;
  health_status: "healthy" | "degraded" | "unhealthy" | "circuit_open";
  created_at: string;
  updated_at: string;
}
Conceptual schema: not a deployable DDL
The TrueFoundry control plane exposes registry shape through the UI and the verified YAML manifests for stdio servers (see the stdio docs). The interface above names the responsibilities a registry entry must cover; storage layout is an implementation detail.
Four fields do the structural work. server_id + tenant_id is the primary key everything filters on. auth_config separates "where the server is" from "how to talk to it" — the move that makes credential rotation cheap (§4). collaborators is the access-policy attachment point. cached_tool_schema is what makes dynamic discovery fast enough to be usable.
How Continental's registry entries look
Continental's eight servers become eight entries. The internal fleet-telemetry-readonly is a streamable_http server using Token Passthrough: the on-call SRE's Okta JWT is forwarded to the upstream server, which validates the audience and applies row-level security by team claim. Linear is registered as a stdio server through mcp-remote, with a per_user auth model so every engineer authorizes their own Linear account, per the stdio-server docs. The full manifests live in the companion mcp-registry-manifests.yaml.
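To make the two shapes concrete, here is a hedged sketch of how the fleet-telemetry and Linear entries might look against a trimmed copy of the §2 interface. The AuthConfig variants, the collaborator principal format, the role names, and the mcp-remote argv are assumptions made for illustration; the verified shapes are the ones in the stdio docs and the companion manifests.

```typescript
// Trimmed copy of the conceptual interface from §2. AuthConfig and
// Collaborator are hypothetical shapes, not a TrueFoundry schema.
type AuthConfig = { model: "token_passthrough" } | { model: "per_user" };

interface Collaborator {
  principal: string; // hypothetical "team:<name>" format
  role: "viewer" | "manager"; // hypothetical role names
}

interface RegistryEntry {
  server_id: string;
  tenant_id: string;
  display_name: string;
  transport_type: "streamable_http" | "sse" | "stdio";
  base_url?: string;
  command?: string;
  args?: string[];
  auth_config: AuthConfig;
  collaborators: Collaborator[];
}

const fleetTelemetryEntry: RegistryEntry = {
  server_id: "fleet-telemetry-readonly",
  tenant_id: "continental-aerospace",
  display_name: "Fleet Telemetry (read-only)",
  transport_type: "streamable_http",
  base_url: "https://fleet-mcp.internal.example/mcp",
  auth_config: { model: "token_passthrough" }, // caller's IdP JWT goes upstream
  collaborators: [{ principal: "team:sre-oncall", role: "viewer" }],
};

const linearEntry: RegistryEntry = {
  server_id: "linear",
  tenant_id: "continental-aerospace",
  display_name: "Linear",
  transport_type: "stdio",
  command: "npx",
  args: ["mcp-remote", "https://mcp.linear.app/mcp"], // illustrative argv
  auth_config: { model: "per_user" }, // each engineer authorizes their own account
  collaborators: [{ principal: "team:all-engineering", role: "viewer" }],
};

// The composite key everything filters on.
function entryKey(e: RegistryEntry): string {
  return `${e.tenant_id}/${e.server_id}`;
}

// Remote transports need a base_url; stdio needs a command.
function hasConnectivity(e: RegistryEntry): boolean {
  return e.transport_type === "stdio" ? e.command !== undefined : e.base_url !== undefined;
}
```

The connectivity check is the kind of invariant worth enforcing at write time, so an entry can never claim stdio without a command to run.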
3. Dynamic Tool Discovery: How Agents Find Tools They Weren't Pre-Configured With
Once the registry is populated, the second job is making it queryable. The interesting case is not a developer in Cursor — Cursor has a config file — but an autonomous agent, running as a service, with no static tool list. The agent wakes up holding a bearer token; the gateway has to turn that into "here are the tools you, specifically, are allowed to use right now," without the agent knowing any of the servers in advance.
The flow uses the standard MCP tools/list method as its entry point, but the gateway intercepts and rewrites the response. The diagram below shows the three-plane architecture.

Walking the discovery path step by step:
- The agent sends tools/list to the gateway with its bearer token. It does not enumerate servers; it doesn't know about servers.
- The gateway runs inbound authentication: validate the token, resolve it to a user, team, or virtual account. The auth docs list four inbound methods: PAT, Virtual Account Token, IdP JWT, and TrueFoundry OAuth.
- The gateway queries the registry for all servers in the caller's tenant where the resolved identity is a collaborator. Indexed read.
- For each accessible server, the gateway returns its cached_tool_schema when one is current (the path that runs in steady state) and falls back to a fresh upstream tools/list only on cache miss or after a listChanged invalidation (§7). Synchronous fan-out to every upstream on every caller request would be operationally untenable.
- The aggregated list is filtered against tool-level RBAC. Continental's on-call agent is a collaborator on fleet-telemetry-readonly but is still denied write tools; those require a separate role.
- The gateway returns one consolidated tools/list. The agent's wiring doesn't have to track which upstream server each tool came from, though tool names, descriptions, and error messages can still telegraph origin in practice, which is why gateways often namespace tool names by server.
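The steps above can be sketched as one function. This is a simplified, synchronous illustration under stated assumptions, not the gateway's implementation: the type names, the "writer" role, the readOnly flag, and the "server.tool" namespacing separator are all hypothetical.

```typescript
interface DiscoveryTool { name: string; readOnly: boolean; }
interface CallerIdentity { subject: string; tenantId: string; roles: string[]; }
interface DiscoveryEntry {
  serverId: string;
  tenantId: string;
  collaborators: string[];        // subjects allowed to see this server
  cachedTools?: DiscoveryTool[];  // last successful tools/list, if any
}

function discoverTools(
  token: string,
  resolveIdentity: (token: string) => CallerIdentity | undefined,
  entries: DiscoveryEntry[],
  fetchUpstream: (serverId: string) => DiscoveryTool[],
): string[] {
  // Step 2: inbound authentication.
  const caller = resolveIdentity(token);
  if (!caller) throw new Error("unauthenticated");

  // Step 3: tenant filter first, then collaborator check.
  const visible = entries.filter(
    (e) => e.tenantId === caller.tenantId && e.collaborators.includes(caller.subject),
  );

  const result: string[] = [];
  for (const entry of visible) {
    // Step 4: cached schema in steady state, upstream only on a miss.
    const tools = entry.cachedTools ?? fetchUpstream(entry.serverId);
    for (const tool of tools) {
      // Step 5: tool-level RBAC. Write tools need the hypothetical "writer" role.
      if (tool.readOnly || caller.roles.includes("writer")) {
        // Step 6: one consolidated list, namespaced by server.
        result.push(`${entry.serverId}.${tool.name}`);
      }
    }
  }
  return result;
}
```

Note the ordering: the tenant comparison runs before any collaborator or role logic, so a bug in the later checks can only misbehave inside the caller's own tenant.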

Two consequences are worth naming. First, an agent can be deployed against a tenant with zero MCP-specific configuration — give it a Virtual Account Token, point it at the gateway URL, and it discovers what it's allowed to use. Continental's on-call incident-response agent ships as one container with one env var. Second, expanding the agent's tool surface is a registry change, not an agent redeploy: adding airworthiness-kb to the collaborators list makes its tools appear in the next tools/list response — same binary, same config.
4. Auth Metadata Storage: Decoupling Credentials from Configuration
The single most expensive operation in the pre-registry world is credential rotation. The single cheapest in the post-registry world is the same rotation. The reason is one architectural choice: auth metadata is stored on the registry entry, not on the client.
The auth docs frame this as a separation between inbound authentication (how the client proves who it is to the gateway) and outbound authentication (how the gateway proves who it is to each downstream server). Each entry's auth_config drives the outbound side:
Two properties of auth_config matter operationally: it references credentials, it doesn't store them; and it's owned by the platform team, not the application team.
The first property is what TrueFoundry's secret-manager integration exists to make tidy. Credentials in a registry entry are stored as references using tfy-secret://<tenant>:<secret-group>:<secret-key>. The actual material lives in the tenant's secret store — TrueFoundry-native, AWS SSM, GCP Secret Manager, HashiCorp Vault, or Azure Key Vault — and the gateway resolves the reference at runtime. For self-hosted control planes, the equivalent is tfy-k8s-secret://<KEY_NAME>, backed by a Kubernetes secret.
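A minimal sketch of what "references, not values" means in practice, assuming the three-part tfy-secret:// format described above. The parser, the in-memory store, and its key layout are hypothetical stand-ins for the gateway's resolver and a real secret manager.

```typescript
interface SecretRef { tenant: string; group: string; key: string; }

// Parse tfy-secret://<tenant>:<secret-group>:<secret-key>.
// This sketch rejects anything that does not split into exactly three parts.
function parseSecretRef(ref: string): SecretRef {
  const prefix = "tfy-secret://";
  if (!ref.startsWith(prefix)) throw new Error(`not a tfy-secret reference: ${ref}`);
  const parts = ref.slice(prefix.length).split(":");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`malformed tfy-secret reference: ${ref}`);
  }
  const [tenant, group, key] = parts as [string, string, string];
  return { tenant, group, key };
}

// Hypothetical in-memory stand-in for the tenant's secret store.
type SecretStore = Map<string, string>;

// Resolve at call time, so a rotated value is picked up on the next use.
function resolveSecret(ref: string, store: SecretStore): string {
  const { tenant, group, key } = parseSecretRef(ref);
  const value = store.get(`${tenant}/${group}/${key}`);
  if (value === undefined) throw new Error(`secret not found for ${ref}`);
  return value;
}
```

Because the registry entry only ever holds the reference string, rotating the key is a write to the store; nothing that points at the reference has to change.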
How Continental's rotation went after the registry
Same 02:47 UTC rotation. Vault lands the new key at tfy-secret://continental-aerospace:github-app:GITHUB_APP_PRIVATE_KEY. The next GitHub tool invocation dereferences the secret, gets the new value, and forwards the request. Zero developer config changes. Zero tickets. The platform team learns about the rotation from a dashboard the next morning.
5. Multi-Tenant Isolation: Namespace Architecture for Enterprise Registries
A registry that holds servers for one tenant is straightforward. A registry that holds servers for hundreds — or, on a self-hosted control plane, for several business units inside the same enterprise — has to make a stronger guarantee: Org A's servers must be invisible to Org B, not merely unreachable. "Unreachable" is a 403 the wrong agent might see. "Invisible" is a tools/list response that doesn't hint another tenant exists.
Tenant in the URL. The v0.130 URL change moved the gateway URL from /api/llm/mcp/<server>/server to /api/llm/<tenant>/mcp/<server>/server. The tenant is now part of the addressable path — what makes MCP OAuth chaining work, and what lets one physical gateway serve many tenants without ambient state.
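For teams rewriting old client configs, the path change is mechanical. The sketch below builds the tenant-scoped path and converts a legacy path to it; the function names are hypothetical, and it covers only the two path shapes quoted above.

```typescript
// Build the tenant-scoped path: /api/llm/<tenant>/mcp/<server>/server
function gatewayPath(tenant: string, server: string): string {
  return `/api/llm/${encodeURIComponent(tenant)}/mcp/${encodeURIComponent(server)}/server`;
}

// Convert a legacy /api/llm/mcp/<server>/server path to the tenant-scoped form.
function migrateLegacyPath(legacyPath: string, tenant: string): string {
  const match = /^\/api\/llm\/mcp\/([^/]+)\/server$/.exec(legacyPath);
  if (!match) throw new Error(`not a legacy MCP gateway path: ${legacyPath}`);
  return gatewayPath(tenant, decodeURIComponent(match[1] as string));
}
```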
Tenant in every query. Inside the registry, every read is parameterized by tenant_id. There is no "list all servers" path; only "list all servers in this tenant." Hardcoding the filter at the data-access layer doesn't eliminate cross-tenant leakage as a class of bug — caching, async context propagation, background workers, search indexes, and telemetry joins can all still leak — but it removes the largest and most likely vector, the one where a missing WHERE clause in an RBAC check returns the wrong tenant's rows.
Tenant in the identity. Collaborators are resolved through the control plane's identity model. Users belong to tenants; teams are scoped to tenants; virtual accounts are created within tenants. The RBAC engine's first check is that the caller's tenant matches the resource's tenant.
Conceptual query path — tenant filter is non-negotiable
// Tenant filter applied before the access-policy check.
// This shrinks the blast radius of an RBAC bug (the wrong tenant's
// rows aren't loaded), but caching, async context, and indexers
// still need their own tenant-scoping discipline.
function listAccessibleServers(caller: Identity): McpServerRegistryEntry[] {
  const rows = registry.query({
    tenant_id: caller.tenant_id,
    health_status: { $ne: "circuit_open" },
  });
  return rows.filter(server => rbac.canRead(caller, server));
}
Shared services across business units at Continental
Continental's avionics and ground-systems divisions run as separate tenants on the same control plane. Most of the time that's exactly right — the fleet-telemetry server is not relevant, and not legally appropriate, for ground-systems engineers to see. But one server, an enterprise documentation MCP, should be visible to both. The pattern is an explicit cross-tenant collaborator grant: the entry's tenant_id stays in the owning tenant, but the collaborators list includes a principal from the other. Shared visibility is a deliberate, audited grant — never an accident.
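One way to model that grant, as a sketch: each collaborator record carries the principal's own tenant, so a cross-tenant grant is explicit in the data and trivially auditable. The type names and the grant shape are assumptions for illustration, not the control plane's schema.

```typescript
interface CallerPrincipal { id: string; tenantId: string; }
interface AccessGrant { principalId: string; principalTenantId: string; }
interface SharedEntry {
  serverId: string;
  tenantId: string;       // owning tenant, never changes when sharing
  grants: AccessGrant[];  // collaborators, each tagged with its own tenant
}

// A server is visible only if a grant names this exact principal in this exact tenant.
function visibleServers(caller: CallerPrincipal, entries: SharedEntry[]): string[] {
  return entries
    .filter((e) =>
      e.grants.some((g) => g.principalId === caller.id && g.principalTenantId === caller.tenantId),
    )
    .map((e) => e.serverId);
}

// Audit helper: every grant whose principal lives outside the owning tenant.
function crossTenantGrants(entries: SharedEntry[]): { serverId: string; grant: AccessGrant }[] {
  const found: { serverId: string; grant: AccessGrant }[] = [];
  for (const e of entries) {
    for (const g of e.grants) {
      if (g.principalTenantId !== e.tenantId) found.push({ serverId: e.serverId, grant: g });
    }
  }
  return found;
}
```

Matching on both the principal ID and its tenant means a same-named principal in another tenant gets nothing, and the audit helper gives reviewers the complete list of deliberate exceptions.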
6. Public vs. Self-Hosted Registration Flows
Registering a public MCP server and registering a self-hosted one are the same conceptual operation — produce a registry entry — but the workflows look different, and the registry has to support both.
For public servers (GitHub, Linear, Sentry, Atlassian, Slack, Exa, Playwright MCP, DeepWiki, Context7), the operation reduces to a few clicks. The MCP Gateway's "Add MCP Server" flow exposes five registration paths: Connect Official Remote MCP Servers, Connect any Remote MCP Server, Create a Virtual MCP Server, Import from OpenAPI Spec, and Hosted Stdio-based MCP Server. The first picks from a curated catalog with auth metadata pre-filled; you supply the OAuth client ID/secret and the rest of the entry is generated.
Self-hosted servers — Continental's internal fleet-telemetry-readonly, for example — use Connect any Remote MCP Server (for HTTPS endpoints) or Hosted Stdio-based MCP Server (for CLI-style servers). The stdio path is interesting because the gateway becomes responsible for running the process, not just calling it. The stdio docs define the verified YAML manifest shape: command, args, and an auth_data block with either auth_level: per_user (each caller's secret substituted into {{API_KEY}}) or auth_level: global (one shared value across the tenant). The companion mcp-registry-manifests.yaml includes four worked examples covering both auth levels and the Virtual MCP Server pattern.
Regardless of path, the registry validates that a newly registered server is reachable before publishing it. For remote HTTPS servers, that's a probe tools/list. For stdio, the gateway boots a short-lived process, sends initialize, and ensures the server responds correctly. A server that can't be probed is rejected at registration — not added in a broken state that surfaces as 503s once developers try to use it.
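The probe-before-publish rule is simple enough to sketch. The probe here is an injected, synchronous function to keep the example short; a real probe would be an async tools/list call for remote servers or an initialize handshake for stdio. All names are hypothetical.

```typescript
interface Candidate {
  serverId: string;
  transport: "streamable_http" | "sse" | "stdio";
}

type ProbeOutcome = { ok: true } | { ok: false; reason: string };
type Probe = (candidate: Candidate) => ProbeOutcome;

// Publish only if the probe succeeds; a failed probe leaves the registry untouched.
function registerServer(
  published: Map<string, Candidate>,
  candidate: Candidate,
  probe: Probe,
): ProbeOutcome {
  const outcome = probe(candidate);
  if (!outcome.ok) return outcome; // rejected at registration time
  published.set(candidate.serverId, candidate);
  return outcome;
}
```

The point of the shape is that there is no code path that writes the entry first and validates later.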
7. Tool Schema Versioning and Cache Invalidation
Caching the tool schema is the difference between a usable registry and an unusable one. A naive implementation that calls tools/list against every upstream server on every agent request multiplies the gateway's latency by the number of upstreams and adds a failure mode per server. The cache solves the latency problem and creates a new one: schema drift.
The gateway maintains a cached_tool_schema per entry, refreshed two ways.
Event-driven refresh. The MCP spec defines a listChanged notification that servers declaring { "tools": { "listChanged": true } } may emit when their tool list changes. For servers that both declare the capability and reliably emit on their persistent connection, the gateway treats the notification as an invalidation signal and fetches fresh before the next caller sees stale data. This is the fast path when it's available — but it isn't universal: many MCP servers don't implement listChanged at all, and some declare the capability but emit inconsistently across transports.
Periodic refresh. For everything else, the gateway re-probes at a tunable cadence (low single-digit minutes for active servers, longer for known-quiet ones). The probe reuses the existing connection pool. In practice, most production deployments lean on periodic refresh as the primary mechanism and treat listChanged as opportunistic when it works.
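Both refresh paths can be sketched against one cache. This version is deliberately simplified and differs from the description above in two labeled ways: it refetches lazily on the next read rather than eagerly on the notification, and it models periodic refresh as a maximum age checked at read time instead of a background prober. Class and method names are hypothetical.

```typescript
interface SchemaCacheEntry { tools: string[]; fetchedAtMs: number; invalidated: boolean; }

class ToolSchemaCache {
  private readonly entries = new Map<string, SchemaCacheEntry>();
  private readonly maxAgeMs: number;
  private readonly fetchTools: (serverId: string) => string[];

  constructor(maxAgeMs: number, fetchTools: (serverId: string) => string[]) {
    this.maxAgeMs = maxAgeMs;
    this.fetchTools = fetchTools;
  }

  // Event-driven path: a listChanged notification marks the entry stale.
  onListChanged(serverId: string): void {
    const entry = this.entries.get(serverId);
    if (entry) entry.invalidated = true;
  }

  // Serve from cache while fresh; refetch on miss, invalidation, or age.
  get(serverId: string, nowMs: number): string[] {
    const entry = this.entries.get(serverId);
    if (entry && !entry.invalidated && nowMs - entry.fetchedAtMs < this.maxAgeMs) {
      return entry.tools;
    }
    const tools = this.fetchTools(serverId);
    this.entries.set(serverId, { tools, fetchedAtMs: nowMs, invalidated: false });
    return tools;
  }
}
```

Passing the clock in as nowMs keeps the staleness logic deterministic and easy to test; the age limit is the safety net for servers that never emit listChanged.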
8. Health Checking and Circuit Breaking for MCP Servers
The registry has to know what's alive. Otherwise it does the worst thing: advertises a tool whose server has been 503-ing for an hour, and every agent in the tenant times out trying to call it.
The health subsystem runs three loops.
Probe. A background worker calls tools/list against each server on a tunable cadence. Success promotes the server to healthy and refreshes the cached schema; failure (network error, 5xx, malformed payload) increments a counter.
Threshold. Once the counter crosses a threshold — tuned per server class, since public SaaS gets more slack than internal — the server transitions to unhealthy and its tools are flagged. After a second threshold, the breaker opens (circuit_open). Most deployments choose to suppress an open-circuit server from tools/list responses entirely so agents see a smaller tool list rather than one with broken entries; some keep the tools visible but mark them, on the theory that agent planners can adapt better to "unavailable" than to "missing." The right call depends on how your agents handle absence.
Recovery. After a cool-down the breaker goes half-open and lets one probe through. If the probe succeeds, the breaker closes; if it fails, the breaker reopens and the cool-down doubles. Standard exponential-backoff hygiene.
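The three loops reduce to a small state machine. The sketch below is one possible reading, with assumptions called out: thresholds are plain failure counts, half-open is modeled implicitly as "open, but the cool-down has elapsed", and a successful probe resets both the counter and the cool-down. All names and numbers are hypothetical.

```typescript
type HealthState = "healthy" | "unhealthy" | "circuit_open";

interface BreakerConfig { unhealthyAfter: number; openAfter: number; baseCoolDownMs: number; }
interface BreakerState { health: HealthState; failures: number; coolDownMs: number; openedAtMs: number; }

function newBreaker(cfg: BreakerConfig): BreakerState {
  return { health: "healthy", failures: 0, coolDownMs: cfg.baseCoolDownMs, openedAtMs: 0 };
}

// Probes always run while the breaker is closed; once open, only after the cool-down.
function canProbe(s: BreakerState, nowMs: number): boolean {
  return s.health !== "circuit_open" || nowMs - s.openedAtMs >= s.coolDownMs;
}

function recordProbe(s: BreakerState, cfg: BreakerConfig, success: boolean, nowMs: number): BreakerState {
  // Success closes the breaker (assumption: it also resets the cool-down).
  if (success) return newBreaker(cfg);

  // Failed half-open trial: reopen and double the cool-down.
  if (s.health === "circuit_open") {
    return { ...s, coolDownMs: s.coolDownMs * 2, openedAtMs: nowMs };
  }

  const failures = s.failures + 1;
  if (failures >= cfg.openAfter) {
    return { health: "circuit_open", failures, coolDownMs: s.coolDownMs, openedAtMs: nowMs };
  }
  if (failures >= cfg.unhealthyAfter) {
    return { ...s, health: "unhealthy", failures };
  }
  return { ...s, failures };
}
```

Keeping the state immutable and the clock explicit makes per-server-class tuning a matter of passing a different BreakerConfig.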
The reason this matters isn't latency — it's reliability composition. Without circuit breaking, an agent that depends on seven upstream servers fails when any one of them fails. With it, the agent loses one server's tools and continues with the other six. For Continental's on-call incident-response agent, that's the difference between "we keep triaging while Sentry is down" and "the AI assistant is itself down because Sentry is down."
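The composition argument is easy to put numbers on. A minimal sketch, assuming upstream failures are independent and using a hypothetical 99% availability per server; both are simplifications, not measurements from Continental or TrueFoundry.

```typescript
// Probability that all n upstreams are up at the same moment,
// assuming independent failures and identical per-server availability.
function allUpAvailability(perServer: number, n: number): number {
  return Math.pow(perServer, n);
}

// Seven hard dependencies at a hypothetical 99% each: roughly 93.2%.
const sevenServers = allUpAvailability(0.99, 7);
```

Without a breaker, that figure is the agent's own availability. With one, it only describes how often the agent has its full tool list; the rest of the time it keeps working with a shorter one.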
Distributed configuration vs. centralized registry
The differences land in six dimensions.
Continental, before and after
9. FAQs
Does the registry have to live in the same control plane as the rest of our AI infrastructure?
It doesn't have to, but co-locating with the model gateway pays compounding dividends — the same RBAC engine, secret-manager integration, audit pipeline, and identity model serve both model traffic and tool traffic. The TrueFoundry MCP Gateway is intentionally part of the same control plane as the AI Gateway for this reason.
What happens when a developer's local IDE config conflicts with the registry?
It doesn't, because the IDE config no longer carries credentials — only a gateway URL. After v0.130, the Cursor or VS Code config for an MCP server collapses to a tenant-scoped gateway URL plus whatever OAuth or IdP bootstrap your organization already requires — the gateway handles credential resolution on the other side. There's no per-server secret in the IDE config to drift against the registry.
How does the registry handle servers with custom or non-standard auth?
Through Token Forwarding. The client sets x-tfy-mcp-headers with whatever auth the upstream requires, and the gateway forwards it. This is the escape hatch for servers whose schemes don't match any built-in model — common for legacy internal services.
Can we expose only a subset of tools from a registered server?
Yes, through a Virtual MCP Server. The virtual-server feature assembles a curated bundle of tools drawn from one or more registered servers and grants access to that bundle independently. Continental's incident-response toolkit is a virtual server exposing read-only Sentry tools plus read-only fleet-telemetry tools to the on-call team — none of the destructive operations are reachable.
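Conceptually, a virtual server is an allowlist of (server, tool) pairs evaluated against what the underlying servers currently expose. The sketch below illustrates that idea only; the types and the tool names are hypothetical and do not reflect the actual Virtual MCP Server configuration format or the real Sentry tool names.

```typescript
interface ToolRef { serverId: string; tool: string; }
interface VirtualServer { virtualId: string; members: ToolRef[]; }

// Tools the virtual server exposes: allowlisted members that still exist upstream.
function exposedTools(v: VirtualServer, upstreamTools: Map<string, string[]>): ToolRef[] {
  return v.members.filter((m) => (upstreamTools.get(m.serverId) ?? []).includes(m.tool));
}

// A call is allowed only if the target is in the exposed set.
function isCallable(v: VirtualServer, upstreamTools: Map<string, string[]>, ref: ToolRef): boolean {
  return exposedTools(v, upstreamTools).some(
    (m) => m.serverId === ref.serverId && m.tool === ref.tool,
  );
}
```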
What if an upstream MCP server changes its transport from streamable HTTP to SSE?
The gateway follows the server. As of v0.130, it preserves the upstream's transport rather than normalizing to streamable HTTP. Clients that follow the spec's streamable-http-first-then-SSE-fallback pattern keep working with no config changes.
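For reference, the client-side fallback the MCP spec describes is: POST an initialize request to the server URL; on success, use streamable HTTP; on a 4xx such as 404 or 405, fall back to opening an SSE stream with GET. A minimal sketch of just the decision, with hypothetical names and no network code:

```typescript
type McpTransport = "streamable_http" | "sse";

// Decide the transport from the HTTP status of the initial initialize POST.
function chooseTransport(initializePostStatus: number): McpTransport {
  if (initializePostStatus >= 200 && initializePostStatus < 300) return "streamable_http";
  if (initializePostStatus >= 400 && initializePostStatus < 500) return "sse";
  throw new Error(`cannot determine transport from status ${initializePostStatus}`);
}
```

Real clients may want to treat auth failures such as 401 separately rather than as a fallback signal; this sketch follows the broad 4xx rule as written.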
How does the registry interact with autonomous agents that need to discover tools at runtime?
That's the point of §3. An agent authenticates once to the gateway, calls tools/list, and receives a curated, RBAC-filtered tool list. It can then call tools/call with anything from that list. No pre-baked inventory, no redeploys.
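On the wire, both calls are ordinary JSON-RPC 2.0 requests. The sketch below builds the two request bodies and adds a guard so the agent only calls tools that appeared in its last tools/list; the helper names and the guard are illustrative, not part of any SDK.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function toolsListRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

function toolsCallRequest(id: number, toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// Refuse to build a call for a tool that discovery did not return.
function callIfDiscovered(
  id: number,
  discovered: string[],
  toolName: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  if (!discovered.includes(toolName)) {
    throw new Error(`tool not in last tools/list response: ${toolName}`);
  }
  return toolsCallRequest(id, toolName, args);
}
```

The guard is a client-side courtesy only; the gateway still enforces RBAC on every tools/call.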
What's the simplest way to start?
Register one server you already use — Linear, GitHub, or Sentry are usual first picks because the auth flows are well-trodden — and point one squad's IDE config at the gateway URL. Don't try to migrate 240 developers and 8 servers in one shot; migrate one server and 10 developers, learn from the rollout, then expand.
Take the next step
If ~/.cursor/mcp.json drift has started costing your platform team time, that's the signal to centralize. The TrueFoundry MCP Gateway is the registry-and-policy layer we built for that transition; you can read the architecture docs or try it free.
Further reading
- TrueFoundry MCP Gateway — overview · centralized registry, before-and-after framing
- MCP Gateway — authentication and security · inbound vs outbound auth, all seven outbound models
- MCP Gateway — getting started · the five registration paths and collaborator/role model
- Hosted stdio-based MCP server · verified YAML manifest shape, env-based auth
- Virtual MCP Server · curated tool bundles without redeployment
- Use Secret Manager in Integrations · the tfy-secret:// reference format
- v0.130 URL and transport changes · tenant-in-URL, OAuth chaining, SSE fallback
- MCP specification — tools · tools/list, tools/call, listChanged notification
- Claude enterprise security — MCP Gateway with allowlisting · pattern for governed enterprise rollout