Building a Centralized MCP Registry: Architecture, Multi-Tenancy, and Dynamic Discovery

Data model, auth metadata, dynamic tool discovery, and multi-tenant isolation for the layer between your agents and your tools.

When twenty developers each maintain their own ~/.cursor/mcp.json, you have not built MCP infrastructure. You have built a distributed sticky-note system that fails closed every time a credential rotates.

Key Takeaways

→ The hidden cost of distributed MCP config is O(developers × servers), not O(servers). A 240-engineer org with 8 servers maintains 240 × 8 = 1,920 entries by hand.
→ A registry entry is a server plus its auth metadata, access policy, transport, and a cached tool schema — not just a URL.
→ Dynamic tool discovery turns tools/list into a per-caller, RBAC-filtered query against the registry.
→ Storing auth metadata separately from server URLs makes credential rotation a one-row update instead of a fleet-wide config push.
→ Multi-tenant isolation lives in the query path: every read carries a tenant context, and "shared services" is an explicit cross-tenant grant.

02:47 UTC, Tuesday. Continental Aerospace Systems — a fictional but unfortunately plausible 240-engineer avionics platform vendor — rotates its GitHub App private key on schedule. By 09:00, #platform-help has 23 threads from squad leads, each a variant of "MCP is broken in Cursor." The fix is identical every time — paste the new token into ~/.cursor/mcp.json, restart the IDE — but it has to be done 187 times, and the platform team loses a half-day before they finish.

That morning is not an MCP problem. It's a registry problem. The protocol works; what's missing is the layer between the agent and the protocol that knows which servers exist, who can use which tools, and how to tell every IDE in the organization about a rotated credential without 187 manual edits. This post is about that layer, using TrueFoundry's MCP Gateway as the reference implementation throughout.

1. The Configuration Drift Problem: What 20 Developers × 8 MCP Servers Actually Costs

The shape of the problem is multiplication. Continental's 240 engineers across 14 squads run Claude Code, Cursor, and VS Code against eight MCP servers: GitHub, Sentry, Atlassian, Linear, Slack, an internal Postgres-backed fleet-telemetry server, an airworthiness-kb server backed by their FAA directives, and Exa for search. At baseline, every engineer maintains an ~/.cursor/mcp.json like this.

~/.cursor/mcp.json — local MCP configuration, per developer

{
  "mcpServers": {
    "github": {
      "url": "https://api.githubcopilot.com/mcp",
      "headers": { "Authorization": "Bearer ghp_..." }
    },
    "sentry": {
      "url": "https://mcp.sentry.dev/mcp",
      "headers": { "Authorization": "Bearer sntrys_..." }
    },
    "linear":    { "url": "https://mcp.linear.app/mcp" },
    "atlassian": { "url": "https://mcp.atlassian.com/v1/mcp" },
    "slack":     { "url": "https://mcp.slack.com/mcp" },
    "fleet-telemetry": {
      "url": "https://fleet-mcp.internal.example/mcp",
      "headers": { "Authorization": "Bearer eyJ..." }
    },
    "airworthiness-kb": { "url": "https://kb-mcp.internal.example/mcp" },
    "exa": {
      "url": "https://mcp.exa.ai/mcp",
      "headers": { "Authorization": "Bearer exa_..." }
    }
  }
}

Eight servers per developer across 240 developers leaves the platform team with about 1,920 implicit config entries to keep correct. The costs land in five places.

Failure mode	What it costs Continental
Credential rotation	Each GitHub App key rotation breaks 187 IDEs at once. Half a person-day for support, every quarter.
URL changes	When a vendor moves from /v1/mcp to /v2/mcp, every config has to be rewritten by hand. It trickles for weeks and never finishes.
Server outages	An hour of Sentry MCP 503s yields 20 separate tickets. No shared circuit breaker, no shared dashboard.
Tool sprawl	A squad enables an unvetted public server. Six weeks later, half of engineering is using it. No one approved it; no one is auditing what it exposes.
Audit and compliance	Asked which AI tools accessed customer data last quarter, Continental has no single answer. Configs are on laptops; call logs are wherever each vendor decided to keep them.
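The multiplication behind the first row, and behind the 1,920 figure above, is easy to write down. A back-of-the-envelope sketch, assuming every developer configures every server and that a registry keeps exactly one row per server (both simplifications; the function names are hypothetical):

```typescript
// Back-of-the-envelope model (hypothetical): every developer configures every
// server, and a registry keeps exactly one row per server.
interface ConfigCost {
  entriesToMaintain: number;      // config entries someone must keep correct
  clientEditsPerRotation: number; // client-side edits after one credential rotates
}

function distributedCost(developers: number, servers: number, usersOfRotatedServer: number): ConfigCost {
  // O(developers × servers) entries; each affected developer edits their own file.
  return { entriesToMaintain: developers * servers, clientEditsPerRotation: usersOfRotatedServer };
}

function registryCost(servers: number): ConfigCost {
  // O(servers) rows; clients hold a gateway URL rather than a secret, so no client edits.
  return { entriesToMaintain: servers, clientEditsPerRotation: 0 };
}

// Continental's numbers from this section: 240 engineers, 8 servers, 187 GitHub users.
const continentalBefore = distributedCost(240, 8, 187);
const continentalAfter = registryCost(8);
console.log(continentalBefore, continentalAfter);
```

The point of the model is the shape of the growth: adding a ninth server costs 240 more entries in the distributed world and one more row in the registry.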

The MCP Gateway overview frames the pre-registry world as four overlapping pathologies — fragmented infrastructure, credential sprawl, zero visibility, no governance:

IT and security teams have no insight into which tools are being used, by whom, or how frequently. Without observability, you can't detect misuse, optimize costs, or meet compliance requirements.

— TrueFoundry MCP Gateway overview

A registry collapses all five rows into one architectural primitive: one resource record per MCP server, owned by a control plane, that every IDE and every agent references by ID. The rest of this post is what that record contains and what the system around it has to do.

2. MCP Server Registry Data Model: What Every Entry Contains

Before discovery, auth, or isolation, we have to be precise about what a registry entry is. The shape we converged on:

MCP server registry entry — conceptual model (TypeScript interface)

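// Conceptual model. The on-the-wire schema is exposed through
// the TrueFoundry UI and CLI, not as a public DDL.

// Placeholder types so the model type-checks on its own. Their real
// shapes are discussed in §4 (auth) and §5 (access policy); the fields
// below are illustrative, not TrueFoundry's schema.
type AuthConfig = Record<string, unknown>;
type Collaborator = { principal_id: string; role: string };
type ToolSchema = { name: string; description?: string; inputSchema: object };

interface McpServerRegistryEntry {
  server_id:          string;            // stable, tenant-scoped identifier
  display_name:       string;
  transport_type:     "streamable_http" | "sse" | "stdio";

  // Connectivity (one of, by transport)
  base_url?:          string;            // remote transports
  command?:           string;            // stdio: e.g. "npx"
  args?:              string[];          // stdio: argv beyond command

  auth_config:        AuthConfig;        // outbound auth (§4)
  collaborators:      Collaborator[];    // access policy (§5)
  tenant_id:          string;            // always present, always filtered on
  cached_tool_schema?: ToolSchema[];     // last successful tools/list
  cached_schema_at?:   string;           // ISO-8601 time of that tools/list
  health_status:      "healthy" | "degraded" | "unhealthy" | "circuit_open";

  created_at:         string;            // ISO-8601
  updated_at:         string;            // ISO-8601
}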

Conceptual schema, not a deployable DDL. The TrueFoundry control plane exposes the registry's shape through the UI and, for stdio servers, through verified YAML manifests (see the stdio docs). The interface above names the responsibilities a registry entry must cover; storage layout is an implementation detail.

Four parts of the entry do the structural work. The composite key server_id + tenant_id is what everything filters on. auth_config separates "where the server is" from "how to talk to it", the move that makes credential rotation cheap (§4). collaborators is the access-policy attachment point. cached_tool_schema is what makes dynamic discovery fast enough to be usable.
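The "one of, by transport" comment on the connectivity fields is an invariant the registry has to enforce when an entry is written. A minimal sketch of that check, assuming a trimmed copy of the conceptual interface (only the fields the check needs) and a hypothetical validateConnectivity helper; it illustrates the rule and is not TrueFoundry's validation code:

```typescript
// Self-contained copy of just the identity and connectivity fields
// from the conceptual McpServerRegistryEntry.
type TransportType = "streamable_http" | "sse" | "stdio";

interface EntryConnectivity {
  server_id: string;
  tenant_id: string;
  transport_type: TransportType;
  base_url?: string;
  command?: string;
  args?: string[];
}

// Hypothetical write-time check: returns problems, or [] if the entry is well-formed.
function validateConnectivity(e: EntryConnectivity): string[] {
  const problems: string[] = [];
  if (!e.server_id || !e.tenant_id) {
    problems.push("server_id and tenant_id are both required (they form the key)");
  }
  if (e.transport_type === "stdio") {
    if (!e.command) problems.push("stdio entries need a command");
    if (e.base_url) problems.push("stdio entries must not set base_url");
  } else {
    // streamable_http and sse are both remote transports.
    if (!e.base_url) problems.push(`${e.transport_type} entries need a base_url`);
    if (e.command || e.args) problems.push("remote entries must not set command or args");
  }
  return problems;
}

// Illustrative entries shaped like Continental's.
const linearEntry: EntryConnectivity = {
  server_id: "linear",
  tenant_id: "continental-aerospace",
  transport_type: "stdio",
  command: "npx",
  args: ["-y", "mcp-remote", "https://mcp.linear.app/mcp"],
};
const fleetEntry: EntryConnectivity = {
  server_id: "fleet-telemetry-readonly",
  tenant_id: "continental-aerospace",
  transport_type: "streamable_http",
  base_url: "https://fleet-mcp.internal.example/mcp",
};
const brokenEntry: EntryConnectivity = { server_id: "sentry", tenant_id: "continental-aerospace", transport_type: "sse" };

// returns ["sse entries need a base_url"]
console.log(validateConnectivity(brokenEntry));
```

Rejecting malformed entries at write time keeps every downstream reader (discovery, health probes, the stdio runner) free of defensive checks.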

How Continental's registry entries look

Continental's eight servers become eight entries. The internal fleet-telemetry server, registered as fleet-telemetry-readonly, is a streamable_http server using Token Passthrough: the on-call SRE's Okta JWT is forwarded upstream, and the server validates the audience and applies row-level security by team claim. Linear is registered as a stdio server through mcp-remote, with a per_user auth model so every engineer authorizes their own Linear account, per the stdio-server docs. The full manifests live in the companion mcp-registry-manifests.yaml.

3. Dynamic Tool Discovery: How Agents Find Tools They Weren't Pre-Configured With

Once the registry is populated, the second job is making it queryable. The interesting case is not a developer in Cursor — Cursor has a config file — but an autonomous agent, running as a service, with no static tool list. The agent wakes up holding a bearer token; the gateway has to turn that into "here are the tools you, specifically, are allowed to use right now," without the agent knowing any of the servers in advance.

The flow uses the standard MCP tools/list method as its entry point, but the gateway intercepts and rewrites the response. The diagram below shows the three-plane architecture.

**Figure 1.** Three-plane architecture. Every client speaks to the gateway; the gateway is the only thing that speaks to MCP servers. The registry, auth machinery, RBAC engine, and health/circuit-breaker subsystems all share state inside the control plane.

Walking the discovery path step by step:

The agent sends tools/list to the gateway with its bearer token. It does not enumerate servers, and it doesn't need to know which ones exist.
The gateway runs inbound authentication — validate the token, resolve it to a user, team, or virtual account. The auth docs list four inbound methods: PAT, Virtual Account Token, IdP JWT, and TrueFoundry OAuth.
The gateway queries the registry for all servers in the caller's tenant where the resolved identity is a collaborator. Indexed read.
For each accessible server, the gateway returns its cached_tool_schema when one is current — the path that runs in steady state — and falls back to a fresh upstream tools/list only on cache miss or after a listChanged invalidation (§7). Synchronous fan-out to every upstream on every caller request would be operationally untenable.
The aggregated list is filtered against tool-level RBAC. Continental's on-call agent is a collaborator on fleet-telemetry-readonly but is still denied write tools — those require a separate role.
The gateway returns one consolidated tools/list. The agent's wiring doesn't have to track which upstream server each tool came from — though tool names, descriptions, and error messages can still telegraph origin in practice, which is why gateways often namespace tool names by server.
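The six steps compose into one request handler. A minimal sketch, assuming in-memory stand-ins for the token resolver, the registry, and the tool-level RBAC policy, with the upstream fetch injected so the cache-miss path is visible. All names, including the `server__tool` namespacing separator, are hypothetical rather than TrueFoundry APIs:

```typescript
// Hypothetical in-memory stand-ins; none of these names are TrueFoundry APIs.
interface Identity { principal: string; tenant_id: string; roles: string[] }
interface ToolSchema { name: string; description?: string; inputSchema: object }
interface RegistryRow {
  server_id: string;
  tenant_id: string;
  collaborators: string[];           // principals allowed to use this server
  cached_tool_schema?: ToolSchema[]; // undefined = cache miss or invalidated
}
// Tool-level RBAC: qualified tool name -> roles allowed to see it.
// Tools with no entry are visible to every collaborator.
type ToolPolicy = Record<string, string[]>;

function discoverTools(
  token: string,
  resolveToken: (t: string) => Identity | undefined,
  registry: RegistryRow[],
  policy: ToolPolicy,
  fetchUpstream: (serverId: string) => ToolSchema[],
): ToolSchema[] {
  // Step 2: inbound authentication.
  const caller = resolveToken(token);
  if (!caller) throw new Error("unauthenticated");

  // Step 3: tenant-scoped, collaborator-filtered registry read.
  const servers = registry.filter(
    (s) => s.tenant_id === caller.tenant_id && s.collaborators.includes(caller.principal),
  );

  const consolidated: ToolSchema[] = [];
  for (const s of servers) {
    // Step 4: serve the cached schema; go upstream only on a miss.
    let tools = s.cached_tool_schema;
    if (!tools) {
      tools = fetchUpstream(s.server_id);
      s.cached_tool_schema = tools;
    }
    for (const tool of tools) {
      // Namespace by server so names cannot collide across upstreams.
      const qualified = `${s.server_id}__${tool.name}`;
      // Step 5: tool-level RBAC.
      const allowedRoles = policy[qualified];
      if (allowedRoles && !allowedRoles.some((r) => caller.roles.includes(r))) continue;
      consolidated.push({ ...tool, name: qualified });
    }
  }
  // Step 6: one consolidated tools/list result.
  return consolidated;
}

const rows: RegistryRow[] = [
  {
    server_id: "fleet-telemetry-readonly",
    tenant_id: "continental-aerospace",
    collaborators: ["oncall-agent"],
    cached_tool_schema: [
      { name: "get_flight_metrics", inputSchema: {} },
      { name: "write_annotation", inputSchema: {} },
    ],
  },
  { server_id: "sentry", tenant_id: "continental-aerospace", collaborators: ["oncall-agent"] },
  {
    server_id: "docs",
    tenant_id: "ground-systems",
    collaborators: ["oncall-agent"],
    cached_tool_schema: [{ name: "search", inputSchema: {} }],
  },
];

const visible = discoverTools(
  "vat-oncall",
  (t) => (t === "vat-oncall" ? { principal: "oncall-agent", tenant_id: "continental-aerospace", roles: ["oncall"] } : undefined),
  rows,
  { "fleet-telemetry-readonly__write_annotation": ["fleet-writer"] },
  () => [{ name: "list_issues", inputSchema: {} }],
);
// ["fleet-telemetry-readonly__get_flight_metrics", "sentry__list_issues"]
console.log(visible.map((t) => t.name));
```

In the usage, the write tool is dropped by tool-level RBAC even though the agent is a collaborator on the server, Sentry's schema is fetched once and then cached, and the ground-systems row never appears because the tenant filter runs before anything else.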

**Figure 2.** tools/list sequence for an autonomous agent. The dashed opt block fires only when fleet-telemetry is healthy; if the circuit breaker is open (§8), the gateway skips that fan-out entirely and the agent simply never sees those tools in the response.

Two consequences are worth naming. First, an agent can be deployed against a tenant with zero MCP-specific configuration — give it a Virtual Account Token, point it at the gateway URL, and it discovers what it's allowed to use. Continental's on-call incident-response agent ships as one container with one env var. Second, expanding the agent's tool surface is a registry change, not an agent redeploy: adding airworthiness-kb to the collaborators list makes its tools appear in the next tools/list response — same binary, same config.

4. Auth Metadata Storage: Decoupling Credentials from Configuration

The single most expensive operation in the pre-registry world is credential rotation. The single cheapest in the post-registry world is the same rotation. The reason is one architectural choice: auth metadata is stored on the registry entry, not on the client.

The auth docs frame this as a separation between inbound authentication (how the client proves who it is to the gateway) and outbound authentication (how the gateway proves who it is to each downstream server). Each entry's auth_config drives the outbound side:

Outbound model	When you use it
OAuth2 Authorization Code	Per-user access where each user authorizes their own account (GitHub, Slack, Atlassian). The gateway handles consent, storage, refresh.
OAuth2 Client Credentials	Server-to-server. The gateway holds the client ID and secret and refreshes automatically.
API Key — Shared	One key for everyone. Read-only APIs and shared knowledge bases.
API Key — Individual	Each user supplies their own key via Auth Overrides; the gateway substitutes API_KEY per caller.
No Auth	Public servers — Calculator, public DeepWiki. Not for production.
Token Passthrough	The user's inbound JWT is forwarded to the MCP server, which validates it directly.
Token Forwarding	Client supplies custom headers via x-tfy-mcp-headers when the upstream's auth scheme is non-standard.
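One way to model the table is as a discriminated union stored in auth_config, plus a function that turns it into outbound headers for a specific caller. A minimal sketch, assuming hypothetical field names, an injected secret lookup, and a Bearer scheme for every credential (real upstreams vary; some expect a custom API-key header). OAuth consent and refresh, and Token Forwarding's client-supplied headers, are deliberately out of scope:

```typescript
// Hypothetical encoding of the outbound models as a discriminated union.
// Token Forwarding (client-supplied x-tfy-mcp-headers) is left out.
type OutboundAuth =
  | { kind: "oauth2_authorization_code" }  // per-user token held by the gateway
  | { kind: "oauth2_client_credentials" }  // gateway-held client ID/secret
  | { kind: "api_key_shared"; keyRef: string }
  | { kind: "api_key_individual" }         // per-user key from Auth Overrides
  | { kind: "no_auth" }
  | { kind: "token_passthrough" };

interface CallerContext {
  inboundJwt: string;        // the validated inbound token
  userOAuthToken?: string;   // this user's stored OAuth access token, if any
  userApiKey?: string;       // this user's Auth Override, if any
}

// Assumes a Bearer scheme for every credential; real upstreams vary.
function outboundHeaders(
  auth: OutboundAuth,
  caller: CallerContext,
  lookupSecret: (ref: string) => string,
  clientCredentialsToken: () => string,
): Record<string, string> {
  switch (auth.kind) {
    case "oauth2_authorization_code":
      if (!caller.userOAuthToken) throw new Error("user has not authorized this server yet");
      return { Authorization: `Bearer ${caller.userOAuthToken}` };
    case "oauth2_client_credentials":
      return { Authorization: `Bearer ${clientCredentialsToken()}` };
    case "api_key_shared":
      return { Authorization: `Bearer ${lookupSecret(auth.keyRef)}` };
    case "api_key_individual":
      if (!caller.userApiKey) throw new Error("no Auth Override on file for this user");
      return { Authorization: `Bearer ${caller.userApiKey}` };
    case "no_auth":
      return {};
    case "token_passthrough":
      return { Authorization: `Bearer ${caller.inboundJwt}` };
    default: {
      // Compile-time exhaustiveness check: adding a model without a case fails here.
      const unreachable: never = auth;
      throw new Error(`unhandled auth model: ${JSON.stringify(unreachable)}`);
    }
  }
}

const sre: CallerContext = { inboundJwt: "okta-jwt" };
// { Authorization: "Bearer okta-jwt" }
console.log(outboundHeaders({ kind: "token_passthrough" }, sre, () => "", () => ""));
```

Note that the shared-key case carries only a reference (keyRef), never the key itself, which is the property the next paragraphs depend on.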

Two properties of auth_config matter operationally: it references credentials, it doesn't store them; and it's owned by the platform team, not the application team.

The first property is what TrueFoundry's secret-manager integration exists to make tidy. Credentials in a registry entry are stored as references using tfy-secret://<tenant>:<secret-group>:<secret-key>. The actual material lives in the tenant's secret store — TrueFoundry-native, AWS SSM, GCP Secret Manager, HashiCorp Vault, or Azure Key Vault — and the gateway resolves the reference at runtime. For self-hosted control planes, the equivalent is tfy-k8s-secret://<KEY_NAME>, backed by a Kubernetes secret.
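Resolving the reference at call time, rather than copying the value into config, is what makes rotation free. A minimal sketch of that step, assuming only the two reference formats quoted above, an in-memory map standing in for the tenant's secret backend, and a hypothetical rule that a tenant-managed secret may back only entries owned by that tenant. The parser and store are illustrative, not TrueFoundry's code:

```typescript
// The two reference formats quoted above; the parser itself is hypothetical.
type SecretRef =
  | { scheme: "tfy-secret"; tenant: string; group: string; key: string }
  | { scheme: "tfy-k8s-secret"; keyName: string };

function parseSecretRef(ref: string): SecretRef {
  const managed = /^tfy-secret:\/\/([^:]+):([^:]+):([^:]+)$/.exec(ref);
  if (managed) return { scheme: "tfy-secret", tenant: managed[1], group: managed[2], key: managed[3] };
  const k8s = /^tfy-k8s-secret:\/\/([^:]+)$/.exec(ref);
  if (k8s) return { scheme: "tfy-k8s-secret", keyName: k8s[1] };
  throw new Error(`not a secret reference: ${ref}`);
}

// Stand-in for the tenant's secret backend (Vault, SSM, ...), keyed by the raw reference.
class InMemorySecretStore {
  private readonly values = new Map<string, string>();
  put(ref: string, value: string): void { this.values.set(ref, value); }
  get(ref: string): string {
    const v = this.values.get(ref);
    if (v === undefined) throw new Error(`secret not found: ${ref}`);
    return v;
  }
}

// Runs on every tool invocation. The registry entry stores only `ref`.
function resolveCredential(ref: string, entryTenant: string, store: InMemorySecretStore): string {
  const parsed = parseSecretRef(ref);
  // Hypothetical rule: a tenant-managed secret may only back entries owned by that tenant.
  if (parsed.scheme === "tfy-secret" && parsed.tenant !== entryTenant) {
    throw new Error(`secret ${ref} is not owned by tenant ${entryTenant}`);
  }
  return store.get(ref);
}

const vault = new InMemorySecretStore();
const githubKeyRef = "tfy-secret://continental-aerospace:github-app:GITHUB_APP_PRIVATE_KEY";
vault.put(githubKeyRef, "old-private-key");
console.log(resolveCredential(githubKeyRef, "continental-aerospace", vault)); // old-private-key
vault.put(githubKeyRef, "new-private-key"); // the rotation: one write, no client edits
console.log(resolveCredential(githubKeyRef, "continental-aerospace", vault)); // new-private-key
```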

How Continental's rotation went after the registry

Same 02:47 UTC rotation. Vault lands the new key at tfy-secret://continental-aerospace:github-app:GITHUB_APP_PRIVATE_KEY. The next GitHub tool invocation dereferences the secret, gets the new value, and forwards the request. Zero developer config changes. Zero tickets. The platform team learns about the rotation from a dashboard the next morning.

5. Multi-Tenant Isolation: Namespace Architecture for Enterprise Registries

A registry that holds servers for one tenant is straightforward. A registry that holds servers for hundreds — or, on a self-hosted control plane, for several business units inside the same enterprise — has to make a stronger guarantee: Org A's servers must be invisible to Org B, not merely unreachable. "Unreachable" is a 403 the wrong agent might see. "Invisible" is a tools/list response that doesn't hint another tenant exists.

Tenant in the URL. The v0.130 URL change moved the gateway URL from /api/llm/mcp/<server>/server to /api/llm/<tenant>/mcp/<server>/server. The tenant is now part of the addressable path, which is what makes MCP OAuth chaining work and what lets one physical gateway serve many tenants without ambient state.
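Because the tenant sits in the path, the gateway can derive tenant context from the request line alone, before any session state exists. A minimal sketch, assuming only the post-v0.130 path shape quoted above; the helper names and the example host are hypothetical:

```typescript
// Builds the post-v0.130 gateway URL: <base>/api/llm/<tenant>/mcp/<server>/server
function mcpServerUrl(gatewayBase: string, tenant: string, server: string): string {
  const base = gatewayBase.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/api/llm/${encodeURIComponent(tenant)}/mcp/${encodeURIComponent(server)}/server`;
}

// Extracts tenant and server from an incoming request path, or null when the
// path lacks the tenant segment (for example, the pre-v0.130 form).
function parseMcpPath(pathname: string): { tenant: string; server: string } | null {
  const m = /^\/api\/llm\/([^/]+)\/mcp\/([^/]+)\/server$/.exec(pathname);
  return m ? { tenant: decodeURIComponent(m[1]), server: decodeURIComponent(m[2]) } : null;
}

// https://gateway.example.com/api/llm/continental-aerospace/mcp/github/server
console.log(mcpServerUrl("https://gateway.example.com", "continental-aerospace", "github"));
// { tenant: "continental-aerospace", server: "github" }
console.log(parseMcpPath("/api/llm/continental-aerospace/mcp/github/server"));
// null: the old path carries no tenant
console.log(parseMcpPath("/api/llm/mcp/github/server"));
```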

Tenant in every query. Inside the registry, every read is parameterized by tenant_id. There is no "list all servers" path; only "list all servers in this tenant." Hardcoding the filter at the data-access layer doesn't eliminate cross-tenant leakage as a class of bug — caching, async context propagation, background workers, search indexes, and telemetry joins can all still leak — but it removes the largest and most likely vector, the one where a missing WHERE clause in an RBAC check returns the wrong tenant's rows.

Tenant in the identity. Collaborators are resolved through the control plane's identity model. Users belong to tenants; teams are scoped to tenants; virtual accounts are created within tenants. The RBAC engine's first check is that the caller's tenant matches the resource's tenant.

Conceptual query path — tenant filter is non-negotiable

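// Tenant filter applied before the access-policy check. This shrinks the
// blast radius of an RBAC bug (the wrong tenant's rows are never loaded),
// but caching, async context, and indexers still need their own scoping.
// Assumed shapes, for illustration: query() makes tenant_id mandatory.
interface Identity { principal_id: string; tenant_id: string }
declare const registry: {
  query(filter: {
    tenant_id: string;
    health_status?: { $ne: McpServerRegistryEntry["health_status"] };
  }): McpServerRegistryEntry[];
};
declare const rbac: { canRead(caller: Identity, server: McpServerRegistryEntry): boolean };

function listAccessibleServers(caller: Identity): McpServerRegistryEntry[] {
  const rows = registry.query({
    tenant_id: caller.tenant_id,             // required: there is no tenant-less read
    health_status: { $ne: "circuit_open" },  // hide open breakers (§8)
  });
  return rows.filter((server) => rbac.canRead(caller, server));
}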

Shared services across business units at Continental

Continental's avionics and ground-systems divisions run as separate tenants on the same control plane. Most of the time that's exactly right — the fleet-telemetry server is not relevant, and not legally appropriate, for ground-systems engineers to see. But one server, an enterprise documentation MCP, should be visible to both. The pattern is an explicit cross-tenant collaborator grant: the entry's tenant_id stays in the owning tenant, but the collaborators list includes a principal from the other. Shared visibility is a deliberate, audited grant — never an accident.
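The ordering described in this section (collaborator grant first, same-tenant access as the normal case, cross-tenant visibility only through an explicit grant that leaves an audit trail) fits in a few lines. A minimal sketch, assuming each collaborator grant records the tenant of the principal it names and who created it; the types, tenant names, and audit hook are hypothetical:

```typescript
// Hypothetical shapes: a collaborator grant names a principal *and* its tenant.
interface Principal { id: string; tenant_id: string }
interface Grant { principal_id: string; principal_tenant_id: string; granted_by: string }
interface ServerRecord { server_id: string; tenant_id: string; collaborators: Grant[] }

function canRead(caller: Principal, server: ServerRecord, audit: (line: string) => void): boolean {
  const grant = server.collaborators.find(
    (g) => g.principal_id === caller.id && g.principal_tenant_id === caller.tenant_id,
  );
  if (!grant) return false;                               // no grant: invisible, not merely forbidden
  if (server.tenant_id === caller.tenant_id) return true; // ordinary same-tenant access
  // Cross-tenant visibility exists only because someone granted it; record who.
  audit(`cross-tenant read: ${caller.tenant_id}/${caller.id} -> ${server.tenant_id}/${server.server_id} (granted by ${grant.granted_by})`);
  return true;
}

const auditLog: string[] = [];
const docs: ServerRecord = {
  server_id: "enterprise-docs",
  tenant_id: "avionics",
  collaborators: [
    { principal_id: "avionics-all", principal_tenant_id: "avionics", granted_by: "platform-team" },
    { principal_id: "ground-all", principal_tenant_id: "ground-systems", granted_by: "platform-team" },
  ],
};
const fleet: ServerRecord = {
  server_id: "fleet-telemetry-readonly",
  tenant_id: "avionics",
  collaborators: [{ principal_id: "avionics-all", principal_tenant_id: "avionics", granted_by: "platform-team" }],
};
const groundEngineer: Principal = { id: "ground-all", tenant_id: "ground-systems" };
console.log(canRead(groundEngineer, docs, (l) => auditLog.push(l)));  // true
console.log(canRead(groundEngineer, fleet, (l) => auditLog.push(l))); // false
```

Matching on both principal and tenant means a same-named principal in another tenant cannot inherit a grant by accident.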

6. Public vs. Self-Hosted Registration Flows

Registering a public MCP server and registering a self-hosted one are the same conceptual operation — produce a registry entry — but the workflows look different, and the registry has to support both.

For public servers (GitHub, Linear, Sentry, Atlassian, Slack, Exa, Playwright MCP, DeepWiki, Context7), the operation reduces to a few clicks. The MCP Gateway's "Add MCP Server" flow exposes five registration paths: Connect Official Remote MCP Servers, Connect any Remote MCP Server, Create a Virtual MCP Server, Import from OpenAPI Spec, and Hosted Stdio-based MCP Server. The first picks from a curated catalog with auth metadata pre-filled; you supply the OAuth client ID/secret and the rest of the entry is generated.

Self-hosted servers — Continental's internal fleet-telemetry-readonly, for example — use Connect any Remote MCP Server (for HTTPS endpoints) or Hosted Stdio-based MCP Server (for CLI-style servers). The stdio path is interesting because the gateway becomes responsible for running the process, not just calling it. The stdio docs define the verified YAML manifest shape: command, args, and an auth_data block with either auth_level: per_user (each caller's secret substituted into {{API_KEY}}) or auth_level: global (one shared value across the tenant). The companion mcp-registry-manifests.yaml includes four worked examples covering both auth levels and the Virtual MCP Server pattern.
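At spawn time, the two auth levels differ only in where the value for {{API_KEY}} comes from. A minimal sketch of that substitution, assuming a manifest already parsed into command, args, an optional env map, and auth_level. The {{API_KEY}} placeholder and the two levels come from the stdio docs quoted above; the env field, the types, the lookup callbacks, and the example package name are hypothetical:

```typescript
// Hypothetical parsed form of a stdio manifest; the env field is an assumption.
type AuthLevel = "per_user" | "global";

interface StdioManifest {
  command: string;
  args: string[];
  env?: Record<string, string>;
  auth_level: AuthLevel;
}

interface SpawnSpec { command: string; args: string[]; env: Record<string, string> }

const API_KEY_PLACEHOLDER = /\{\{API_KEY\}\}/g;

function buildSpawnSpec(
  manifest: StdioManifest,
  callerId: string,
  userSecret: (callerId: string) => string | undefined, // per_user: this caller's own key
  globalSecret: () => string,                            // global: one tenant-wide value
): SpawnSpec {
  const key = manifest.auth_level === "per_user" ? userSecret(callerId) : globalSecret();
  if (key === undefined) throw new Error(`caller ${callerId} has not supplied an API key`);
  const apiKey: string = key;
  // A replacer function stops String.replace from interpreting "$&"-style patterns in the key.
  const fill = (s: string) => s.replace(API_KEY_PLACEHOLDER, () => apiKey);
  const env: Record<string, string> = {};
  for (const [k, v] of Object.entries(manifest.env ?? {})) env[k] = fill(v);
  return { command: manifest.command, args: manifest.args.map(fill), env };
}

const manifest: StdioManifest = {
  command: "npx",
  args: ["-y", "example-mcp-server", "--api-key", "{{API_KEY}}"], // hypothetical package
  auth_level: "per_user",
};
const userKeys = new Map([["alice", "alice-key"]]);
const spec = buildSpawnSpec(manifest, "alice", (id) => userKeys.get(id), () => "unused");
console.log(spec.args); // ["-y", "example-mcp-server", "--api-key", "alice-key"]
```

With per_user, a caller who has not supplied a key fails at spawn time instead of silently falling back to someone else's credential.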

Regardless of path, the registry validates that a newly registered server is reachable before publishing it. For remote HTTPS servers, that's a probe tools/list. For stdio, the gateway boots a short-lived process, sends initialize, and checks for a well-formed response. A server that can't be probed is rejected at registration, not added in a broken state that surfaces as 503s once developers try to use it.

7. Tool Schema Versioning and Cache Invalidation

Caching the tool schema is the difference between a usable registry and an unusable one. A naive implementation that calls tools/list against every upstream server on every agent request ties each request's latency to the slowest upstream and adds a failure mode per server. The cache solves the latency problem and creates a new one: schema drift.

The gateway maintains a cached_tool_schema per entry, refreshed two ways.

Event-driven refresh. The MCP spec defines a listChanged notification that servers declaring { "tools": { "listChanged": true } } may emit when their tool list changes. For servers that both declare the capability and reliably emit on their persistent connection, the gateway treats the notification as an invalidation signal and fetches fresh before the next caller sees stale data. This is the fast path when it's available — but it isn't universal: many MCP servers don't implement listChanged at all, and some declare the capability but emit inconsistently across transports.

Periodic refresh. For everything else, the gateway re-probes at a tunable cadence (low single-digit minutes for active servers, longer for known-quiet ones). The probe reuses the existing connection pool. In practice, most production deployments lean on periodic refresh as the primary mechanism and treat listChanged as opportunistic when it works.
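Both refresh paths reduce to one staleness rule: an entry is stale if a listChanged notification has invalidated it, or if its cached_schema_at is older than that server's cadence. A minimal sketch, assuming a hypothetical in-memory cache keyed by server_id, a background sweeper that refreshes whatever staleServers returns, and millisecond timestamps so the logic is easy to test. The notification name, notifications/tools/list_changed, is the one the MCP spec defines:

```typescript
// Hypothetical in-memory schema cache; timestamps are milliseconds.
interface CachedSchema {
  tools: string[];       // tool names only, for brevity
  cachedAt: number;      // plays the role of cached_schema_at
  invalidated: boolean;  // set when notifications/tools/list_changed arrives
}

class ToolSchemaCache {
  private readonly entries = new Map<string, CachedSchema>();
  private readonly maxAgeMs: Map<string, number>;
  private readonly defaultMaxAgeMs: number;

  constructor(maxAgeMs: Map<string, number>, defaultMaxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;              // per-server cadence overrides
    this.defaultMaxAgeMs = defaultMaxAgeMs;
  }

  store(serverId: string, tools: string[], now: number): void {
    this.entries.set(serverId, { tools, cachedAt: now, invalidated: false });
  }

  // Event-driven path, for servers that declared tools.listChanged and actually emit it.
  onListChanged(serverId: string): void {
    const e = this.entries.get(serverId);
    if (e) e.invalidated = true;
  }

  // Periodic path: the background sweeper refreshes whatever this returns.
  staleServers(serverIds: string[], now: number): string[] {
    return serverIds.filter((id) => {
      const e = this.entries.get(id);
      if (!e || e.invalidated) return true;
      return now - e.cachedAt >= (this.maxAgeMs.get(id) ?? this.defaultMaxAgeMs);
    });
  }

  // Request path: read-only, never waits on an upstream.
  get(serverId: string): string[] | undefined {
    return this.entries.get(serverId)?.tools;
  }
}

const minute = 60000;
const schemaCache = new ToolSchemaCache(new Map([["airworthiness-kb", 30 * minute]]), 3 * minute);
schemaCache.store("github", ["create_issue"], 0);
schemaCache.store("airworthiness-kb", ["search_directives"], 0);
console.log(schemaCache.staleServers(["github", "airworthiness-kb", "sentry"], minute)); // ["sentry"]
schemaCache.onListChanged("github");
console.log(schemaCache.staleServers(["github", "airworthiness-kb"], minute)); // ["github"]
console.log(schemaCache.staleServers(["airworthiness-kb"], 31 * minute)); // ["airworthiness-kb"]
```

The per-server override is where "longer for known-quiet ones" lives; the request path only ever calls get, so a slow upstream never blocks an agent.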

Schema drift is the real failure mode

An agent holding an old schema may call a tool with parameters the upstream no longer accepts, and the upstream returns an error the agent can't recover from. Mitigation: move schema refresh out of the request path into a background sweeper, and surface cached_schema_at in the audit log so an SRE can correlate a wave of tool-call errors with a recent schema change.

8. Health Checking and Circuit Breaking for MCP Servers

The registry has to know what's alive. Otherwise it does the worst thing: advertises a tool whose server has been 503-ing for an hour, and every agent in the tenant times out trying to call it.

The health subsystem runs three loops.

Probe. A background worker calls tools/list against each server on a tunable cadence. Success promotes the server to healthy and refreshes the cached schema; failure (network error, 5xx, malformed payload) increments a counter.

Threshold. Once the counter crosses a threshold — tuned per server class, since public SaaS gets more slack than internal — the server transitions to unhealthy and its tools are flagged. After a second threshold, the breaker opens (circuit_open). Most deployments choose to suppress an open-circuit server from tools/list responses entirely so agents see a smaller tool list rather than one with broken entries; some keep the tools visible but mark them, on the theory that agent planners can adapt better to "unavailable" than to "missing." The right call depends on how your agents handle absence.

Recovery. The breaker is half-open after a cool-down: one probe; on success, closed; on failure, the cool-down doubles. Standard exponential-backoff hygiene.
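The three loops amount to a small state machine per server. A minimal sketch, assuming hypothetical thresholds and a millisecond clock; it uses the healthy, unhealthy, and circuit_open values from the registry model (leaving degraded unused), treats the first probe after the cool-down as the half-open probe, and takes the common choice of hiding open-circuit servers from tools/list. It is not the gateway's actual implementation:

```typescript
type Health = "healthy" | "unhealthy" | "circuit_open";

interface BreakerConfig {
  unhealthyAfter: number;  // consecutive failures before "unhealthy"
  openAfter: number;       // consecutive failures before the breaker opens
  baseCooldownMs: number;  // first cool-down; doubles on each failed half-open probe
}

class ServerBreaker {
  status: Health = "healthy";
  private failures = 0;
  private openedAt = 0;
  private cooldownMs: number;
  private readonly cfg: BreakerConfig;

  constructor(cfg: BreakerConfig) {
    this.cfg = cfg;
    this.cooldownMs = cfg.baseCooldownMs;
  }

  // Probe loop: while open, only probe once the cool-down has elapsed (half-open).
  shouldProbe(now: number): boolean {
    return this.status !== "circuit_open" || now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.status = "healthy";
    this.failures = 0;
    this.cooldownMs = this.cfg.baseCooldownMs; // closed again: reset the backoff
  }

  recordFailure(now: number): void {
    if (this.status === "circuit_open") {
      // Half-open probe failed: stay open and double the cool-down.
      this.cooldownMs *= 2;
      this.openedAt = now;
      return;
    }
    this.failures += 1;
    if (this.failures >= this.cfg.openAfter) {
      this.status = "circuit_open";
      this.openedAt = now;
    } else if (this.failures >= this.cfg.unhealthyAfter) {
      this.status = "unhealthy";
    }
  }

  // Discovery path: suppress open-circuit servers from tools/list.
  advertised(): boolean {
    return this.status !== "circuit_open";
  }
}

const breaker = new ServerBreaker({ unhealthyAfter: 2, openAfter: 4, baseCooldownMs: 1000 });
for (const t of [0, 10, 20, 30]) breaker.recordFailure(t);
console.log(breaker.status, breaker.advertised());             // circuit_open false
console.log(breaker.shouldProbe(500), breaker.shouldProbe(1030)); // false true
breaker.recordFailure(1030); // half-open probe fails: cool-down doubles to 2000 ms
console.log(breaker.shouldProbe(2500), breaker.shouldProbe(3030)); // false true
breaker.recordSuccess();
console.log(breaker.status); // healthy
```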

The reason this matters isn't latency — it's reliability composition. Without circuit breaking, an agent that depends on seven upstream servers fails when any one of them fails. With it, the agent loses one server's tools and continues with the other six. For Continental's on-call incident-response agent, that's the difference between "we keep triaging while Sentry is down" and "the AI assistant is itself down because Sentry is down."

Distributed configuration vs. centralized registry

The differences land in six dimensions.

Dimension	Distributed (per-developer config)	Centralized registry
Operational overhead	O(devs × servers) entries; rotations are fleet-wide edits	O(servers) entries; rotations are single-row updates
Consistency	Drifts within hours of any change	One source of truth; every IDE and agent sees the same state on the next request
Discoverability	Tribal knowledge in Slack	One catalog; one tools/list call from any agent
Audit	Vendor-specific logs scattered across providers	One audit trail, OpenTelemetry-exportable, with user attribution
Access control	Trust-based; whoever has the IDE has the key	RBAC at the user, team, virtual-account, and tool level
Incident response speed	Minutes-to-hours to identify a broken server; per-IDE remediation	Seconds — circuit breaker auto-removes; zero developer action

Continental, before and after

Metric	Before (distributed)	After (registry)
Config entries maintained	~1,920 across 240 laptops	~8 in the control plane
Time per credential rotation	Half a platform-team day + 187 individual edits	Single Vault sync; no developer action
Tickets per server outage	~20 in #platform-help	1 dashboard alert; breaker auto-isolates
Agent onboarding	Ship a tool list; redeploy on each change	Issue a Virtual Account Token; tools discovered at runtime
Audit response time	Days of log gathering across vendors	One query against the gateway's audit store

9. FAQs

Does the registry have to live in the same control plane as the rest of our AI infrastructure?

It doesn't have to, but co-locating with the model gateway pays compounding dividends — the same RBAC engine, secret-manager integration, audit pipeline, and identity model serve both model traffic and tool traffic. The TrueFoundry MCP Gateway is intentionally part of the same control plane as the AI Gateway for this reason.

What happens when a developer's local IDE config conflicts with the registry?

It doesn't, because the IDE config no longer carries credentials — only a gateway URL. After v0.130, the Cursor or VS Code config for an MCP server collapses to a tenant-scoped gateway URL plus whatever OAuth or IdP bootstrap your organization already requires — the gateway handles credential resolution on the other side. There's no per-server secret in the IDE config to drift against the registry.

How does the registry handle servers with custom or non-standard auth?

Through Token Forwarding. The client sets x-tfy-mcp-headers with whatever auth the upstream requires, and the gateway forwards it. This is the escape hatch for servers whose schemes don't match any built-in model — common for legacy internal services.

Can we expose only a subset of tools from a registered server?

Yes, through a Virtual MCP Server. The virtual-server feature assembles a curated bundle of tools drawn from one or more registered servers and grants access to that bundle independently. Continental's incident-response toolkit is a virtual server exposing read-only Sentry tools plus read-only fleet-telemetry tools to the on-call team — none of the destructive operations are reachable.
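Conceptually, a virtual server is an allowlist over tools that already exist in the registry, granted as its own resource. A minimal sketch, assuming tools are addressed as a server_id plus tool name and that the allowlist is checked on tools/call as well as tools/list, so excluded tools are unreachable rather than merely unlisted. The types and the Sentry and fleet-telemetry tool names are hypothetical:

```typescript
// Hypothetical model: a virtual server is a named allowlist of (server, tool)
// pairs with its own collaborators; tool names here are illustrative.
interface ToolRef { server_id: string; tool: string }
interface VirtualServer { id: string; collaborators: string[]; include: ToolRef[] }

// Checked on tools/call as well as tools/list, so excluded tools are unreachable.
function isCallable(vs: VirtualServer, caller: string, ref: ToolRef): boolean {
  return (
    vs.collaborators.includes(caller) &&
    vs.include.some((i) => i.server_id === ref.server_id && i.tool === ref.tool)
  );
}

function listVirtualTools(vs: VirtualServer, caller: string, upstream: ToolRef[]): ToolRef[] {
  // Intersect the bundle with what the upstream servers currently expose.
  return upstream.filter((t) => isCallable(vs, caller, t));
}

const incidentToolkit: VirtualServer = {
  id: "incident-response",
  collaborators: ["oncall-team"],
  include: [
    { server_id: "sentry", tool: "list_issues" },
    { server_id: "sentry", tool: "get_issue" },
    { server_id: "fleet-telemetry-readonly", tool: "get_flight_metrics" },
  ],
};
const upstreamTools: ToolRef[] = [
  { server_id: "sentry", tool: "list_issues" },
  { server_id: "sentry", tool: "get_issue" },
  { server_id: "sentry", tool: "update_issue" },
  { server_id: "fleet-telemetry-readonly", tool: "get_flight_metrics" },
  { server_id: "fleet-telemetry-readonly", tool: "write_annotation" },
];
console.log(listVirtualTools(incidentToolkit, "oncall-team", upstreamTools).length); // 3
console.log(isCallable(incidentToolkit, "oncall-team", { server_id: "sentry", tool: "update_issue" })); // false
```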

What if an upstream MCP server changes its transport from streamable HTTP to SSE?

The gateway follows the server. As of v0.130, it preserves the upstream's transport rather than normalizing to streamable HTTP. Clients that follow the spec's streamable-http-first-then-SSE-fallback pattern keep working with no config changes.
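The fallback pattern comes from the MCP specification's backwards-compatibility guidance for the Streamable HTTP transport: the client first POSTs an InitializeRequest, and only if that fails with a 4xx does it try the older HTTP+SSE transport with a GET. A minimal sketch of that decision, assuming the status of the initial POST is already known; the function and type names are hypothetical:

```typescript
// Hypothetical helper encoding the spec's client-side fallback rule.
type TransportDecision = "streamable_http" | "try_legacy_sse" | "fail";

// postStatus: HTTP status returned when the client POSTs an InitializeRequest
// to the server URL.
function chooseTransport(postStatus: number): TransportDecision {
  if (postStatus >= 200 && postStatus < 300) return "streamable_http";
  // 4xx (for example 405 Method Not Allowed or 404 Not Found): the server may
  // speak the older HTTP+SSE transport, so the client next issues a GET and
  // expects an SSE stream whose first event is "endpoint".
  if (postStatus >= 400 && postStatus < 500) return "try_legacy_sse";
  // Anything else (for example a 5xx) is a server error, not a transport mismatch.
  return "fail";
}

for (const code of [200, 405, 404, 503]) console.log(code, chooseTransport(code));
```

Because the gateway preserves the upstream's transport, a client that implements this rule against the gateway URL ends up on whichever transport the server actually speaks.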

How does the registry interact with autonomous agents that need to discover tools at runtime?

That's the point of §3. An agent authenticates once to the gateway, calls tools/list, and receives a curated, RBAC-filtered tool list. It can then call tools/call with anything from that list. No pre-baked inventory, no redeploys.

What's the simplest way to start?

Register one server you already use — Linear, GitHub, or Sentry are usual first picks because the auth flows are well-trodden — and point one squad's IDE config at the gateway URL. Don't try to migrate 240 developers and 8 servers in one shot; migrate one server and 10 developers, learn from the rollout, then expand.

Take the next step

If ~/.cursor/mcp.json drift has started costing your platform team time, that's the signal to centralize. The TrueFoundry MCP Gateway is the registry-and-policy layer we built for that transition; you can read the architecture docs or try it free.