MCP Registry & AI Gateway: Architecture and Enterprise Use Cases

Built for Speed: ~10ms Latency, Even Under Load
Blazingly fast way to build, track and deploy your models!
- Handles 350+ RPS on just 1 vCPU â no tuning needed
- Production-ready with full enterprise support
Between 2020 and 2023, foundation models like GPT-3 and GPT-4 proved that large language models can generate human-like text, write code, summarize documents, and answer complex questions, But these models were stateless and sandboxed, they couldnât access internal systems, databases, or apps â and had no way to actually take real-world actions.
You could ask a model:
âWrite a MongoDB query to list all collections in the compliance database.â
It would generate output that looked like a valid MongoDB query â but:
- It had no idea whether the database existed
- It didnât know what collections were present in the DB
- It couldnât tell whether the query would even run
- There was no way to inspect schemas, verify results, or trigger follow-up actions like summarizing or notifying another system
It was all guesswork â with no feedback loop.
To fix this, developers began building layers around LLMs:
- Prompt chaining
- Custom wrappers in Python
- REST API call injection via string templates
- Manual context stuffing
(âusing the following schema, write the queryâ)
These were clever fixes but harder to maintain.
Frameworks like LangChain, LlamaIndex, and Semantic Kernel emerged to organize these workflows. These tools helped organize things, but the problems didnât go away. Models were still hallucinating fields, falling for prompt injection, skipping validation, and had no standard way to run actual functions defined. Every new use case still felt like reinventing the wheel.
The real turning point came when developers realized:
âWe donât need models to guess commands. We need them to call real functions â the same way frontend apps call backend APIs.â
Around mid-2023, OpenAI introduced function calling, enabling models to return structured JSON output that directly mapped to real function calls.
It redefined what was possible with model integration
- LLMs could now interface with tools using clearly defined inputs and outputs
- Models behaved like API clients, rather than guess-based generators
- Tasks could now be automated with multiple steps and system interactions
With function calling came the rise of tools and agents â models that could chain actions, follow workflows, and interact with real systems
ButâĻ
1. Every implementation was vendor-specific.
2. There wasnât any shared standard format  for how tools were described or invoked.
This is where MCP (Model Context Protocol) enters â proposed as a general, open protocol for structured communication between models and external tools. Instead of hardcoding tool APIs into each LLM app, MCP offers a universal charger for AI-tool connections  â like OpenAPI, Anthropic etc but for LLMs calling tools. Itâs based on JSON-RPC 2.0, a widely used specification that supports remote procedure calls (RPCs) using JSON.
What MCP Defines
- How a model can call a tool method (e.g., runAggregation)
- How to pass parameters and get structured responses
- How tools describe their available methods (via schemas or manifests)
- A lightweight standard for tool integration across any backend (DB, CLI, cloud API)
Think of MCP as the API contract between a model and a tool. Without a standard protocol, every integration required custom engineering â costly, error-prone, and repetitive. MCP solved this by creating one universal language, simplifying tool integrations once and for all. But what exactly is MCP, and how does it work practically? Letâs dive deeper.
Model Context Protocol
âMCP (Model Context Protocol) is a lightweight protocol purpose-built for structured communication between AI models and external tools.
At its core, MCP uses JSON-RPC 2.0, a battle-tested protocol for calling remote procedures with structured inputs and outputs â perfect for turning LLM output into real-world tool invocations.
So why invent another protocol?
Existing options like REST or GraphQL are either too generic, too verbose, or just too brittle for AI-first workflows. MCP bridges this gap by providing clear structure, minimal overhead, and an explicit focus on AI-centric workflows. Itâs not meant to replace your APIs â itâs meant to let models use them safely, repeatably, and predictably.
- Host App: Orchestrates tool calls, handles LLM output, chains responses â e.g., a GRC Agent that queries Mongo, summarizes results, and sends alerts.
- MCP Server: Implements domain-specific logic like listCollections, runAggregation, or sendMessageToSlack.
- MCP Client: A minimal library that handles request formatting, authentication, retries, and connects the Host to the appropriate MCP Server.
Servers typically expose:
- Tools: Executable commands like listCollections, runAggregation, or sendSlackMessage.
- Resources: Descriptive data like schema definitions, collection metadata, or database structure.
Optionally, servers can also expose:
- Prompts: Standardized agent prompts/templates
- Client-side primitives: Caching or batching hints
- Notifications: Real-time event streams (via SSE)
Transport Options: STDIO vs HTTP/SSE
MCP is transport-agnostic and supports two modes:

Both transports follow the same JSON-RPC format, so you can switch transports without rewriting logic.
So thatâs the big idea behind MCP â a minimal, clean way for models to call tools without fragile prompt glue. No hallucinated commands. No manual context stuffing. Just clear inputs and outputs. But how does that actually work in a real app? Letâs walk through an example with a MongoDB-powered compliance assistant.
Automating a GRC Workflow Using Mongo-MCP
Imagine youâre building a GRC (Governance, Risk, and Compliance) assistant.
This assistant needs to:
- Fetch collections and audit logs from a MongoDB database
- Summarize findings for a compliance officer
- Notify relevant teams on Slack
- Optionally, file a follow-up issue on GitHub
In a traditional setup, youâd hardwire this logic using REST calls or Python scripts, stuffing schemas and credentials into prompt templates. Every integration would be custom â and fragile.
With MCP, each of these tools â MongoDB, Slack, GitHub â becomes a first-class function provider, exposing clearly defined methods like:
- listCollections
- runAggregation
- sendMessage
- createIssue

The GRC Agent (our Host App) simply calls these tools using MCPâs JSON-RPC schema
Scenario: Detect and Report Policy Violations User instruction:Â âCheck the compliance database for any policy violations today and notify the team on Slack.â
1. User Input â Host App â LLM The GRC Agent (Host App) sends the user message to the model. The model is tool-aware and responds with:
{
"tool_calls": [
{
"name": "listCollections",
"arguments": {
"database": "compliance"
}
}
]
}
2. Host App Invokes Mongo-MCP via MCP Client This tool call is converted to a JSON-RPC request:
{
"jsonrpc": "2.0",
"method": "listCollections",
"params": {
"database": "compliance"
},
"id": "req-001"
}
3. Mongo-MCP Executes the Function Mongo-MCP maps this call to:
def listCollections(database: str) -> List[str]:
return mongo_client[database].list_collection_names()
It runs the function, gets the result, and responds:
{
"jsonrpc": "2.0",
"result": [
"audit_logs",
"policy_violations",
"user_sessions"
],
"id": "req-001"
}
4. Agent Chains the Next Call: runAggregation The model now generates a follow-up call based on the available collections:
{
"tool_calls": [
{
"name": "runAggregation",
"arguments": {
"database": "compliance",
"collection": "policy_violations",
"pipeline": [
{ "$match": { "timestamp": { "$gte": "2025-08-05" } } },
{ "$group": { "_id": "$severity", "count": { "$sum": 1 } } }
]
}
}
]
}
This results in another JSON-RPC call to Mongo-MCP, and the server returns:
{
"jsonrpc": "2.0",
"result": [
{ "_id": "high", "count": 5 },
{ "_id": "medium", "count": 12 }
],
"id": "req-002"
}
The agent passes this result back to the model with a prompt like:
âSummarize this policy violation data in plain English.â
The model replies:
âToday, there were 5 high-severity and 12 medium-severity policy violations in the compliance database.â
5. Model Calls Slack-MCP to Notify the Team Now the agent issues a final structured tool call:
{
"tool_calls": [
{
"name": "sendMessage",
"arguments": {
"channel": "#compliance-alerts",
"message": "5 high and 12 medium policy violations detected today. Please review."
}
}
]
}
The Slack-MCP server sends the message, and the workflow is complete. All this happened through structured JSON calls, not string manipulation or prompt engineering.
Why do you need a MCP Server Registry in MCP Gateway?
The Mongo-MCP demo looks clean. The model made structured tool calls. Each function worked as expected. No hallucinations. No brittle string templates. But thatâs a happy path â and real systems arenât just about workingâĻ theyâre about working safely, reliably, and observably at scale.
In production, raw MCP falls short in a few key areas:
1. No Access Control (RBAC) Raw MCP has no built-in way to restrict who can call what.
- What if you want to allow run Aggregation but block deleteCollection?
- What if a model should only query certain datasets (e.g. finance canât access HR)?
In real orgs, RBAC (role-based access control) is non-negotiable â especially when models are wired into sensitive tools.
2. No Authentication or API Keys Raw MCP doesnât handle
- Validating which agent or model is making the call
- Scoping credentials per team, environment, or project
- Token expiry or revocation
This means anyone with access to the mcp server can call any tool â and thereâs no audit trail.
3. No Observability You canât fix what you canât see.
- Whatâs the latency of each tool call?
- Whatâs the failure rate or retry count?
- Which tools are being overused or timing out?
With raw MCP, you donât have dashboards, logs, or traces. Youâre flying blind.
4. No Guardrails LLMs are creative â sometimes too creative.
Raw MCP has:
- No token limits (e.g. prevent massive aggregations)
- No result-size bounds (e.g. returning 5MB from Mongo)
- No circuit breakers or pause prompts (e.g. âAre you sure you want to send this Slack alert?â)
Without guardrails, one prompt bug can lead to thousands of Slack messages or accidental data wipes.
5. No Retry, Throttling, or Quotas In production
In production, tools donât always behave perfectly â they can fail, time out, or respond slowly. Without safeguards, even well-behaved models can:
- Hit rate limits
- Hammer services with retries
- Expose sensitive tools to misuse
The raw MCP protocol assumes everything just works â a âhappy pathâ world. But real-world infrastructure is messy. You need smart retry logic, caching, rate limiting, and access control to stay sane at scale.
- Authentication + token-based access
- RBAC per model, user, org, and tool
- Observability and tracing
- Quotas, limits, and retry strategies
- Approval workflows for sensitive actions
This is exactly what the TrueFoundry Gateway provides.
From Single-Server Demos to Enterprise-Grade Agent Workflows
In the first half of this article we learned what the Model-Context-Protocol is and used a Mongo mcp server to automate a legacy GRC platform. That toy example is great for a hack-day, but it quickly runs into real-world friction:
TrueFoundryâs AI Gateway packages the missing plumbingâan MCP registry, central auth, RBAC, guard-rails, and rich observabilityâso teams can move from âhello-world agentâ to production safely and repeatedly.
What the Gateway Adds on Top of Raw MCP
TrueFoundry positions one of the best MCP gateway as a control-plane that sits between your agents (or Chat UI) and every registered MCP server and LLM provider.
Key capabilities include:
- Central MCP Registry â add public or self-hosted servers once; discover them everywhereÂ
- Unified credentials â generate a single Personal-Access-Token (PAT) or machine-scoped Virtual-Account-Token (VAT) that is automatically exchanged for per-server OAuth / header tokens behind the scenesÂ
- Fine-grained RBAC â restrict which users, apps, or environments can see or execute which toolsÂ
- Agent Playground & built-in MCP client â rapid prototyping without writing a line of codeÂ
- Observability & guard-rails â latency, cost, traces, approval flows, rate-limits, and caching baked inÂ
Think of it as the API gateway + service mesh + secret store for the emerging MCP environment. âAs noted in the MCP Server Authentication guide, the Gateway handles credential storage and token lifecycle automatically.â
Hands-On: Registering Your First MCP Server
The very first step in the UI is to create a groupâe.g. dev-mcps or prod-mcps. Groups let you attach different RBAC rules and approval flows to different environments.
âYou can follow the TrueFoundry MCP Server Getting Started guide for detailed steps.â
AI Gateway â MCP Servers â âAdd New MCP Server Group
name: prod-mcps
access control:
- Manage: SRE-Admins
- User : Prod-Runtime-Service-AccountsInside the group choose Add/Edit MCP Server and fill three things :Â
You can just as easily add:
- No-Auth demo servers (e.g., Calculator)
- Header-Auth servers that accept a static API key (e.g., Hugging Face)Â
- Any number of future servers (Atlassian, Datadog, internal micro-services)
Behind the scenes the Gateway stores credentials in its secret store and handles token refresh.

Authentication & RBAC
TrueFoundry supports three auth schemes per MCP server :
âThese modes are described in more detail in the MCP Server Authentication documentation.â
Once a server is registered you donât hand raw tokens to every developer. Instead they authenticate once to the Gateway and receive:
- PAT â user-scoped, human-friendly, good for CLI / experiments
- VAT â service-account scoped, locked to selected servers, perfect for production appsÂ
The Gateway checks the calling token against:
- Group-level permissions (can this user reach any server in prod-mcps?)
- Server-level permissions (is slack-mcp whitelisted?)
- Tool-level permissions (can they call sendMessageToChannel?)
If any check fails the request is rejected before it hits your Slack workspace.

From Playground to Code: The Agent API
After experimenting in the UI you can click âAPI Code Snippetâ to generate working Python, JS, or cURL examples .Below is a trimmed JSON body that wires three servers together (GitHub, Slack, Calculator):
POST/api/llm/agent/responses
{
"model": "gpt-4o",
"stream": true,
"iteration_limit": 5,
"messages": [
{
"role": "user",
"content": "Summarize open PRs on repo X and DM me the top blockers."
}
],
"mcp_servers": [
{
"integration_fqn": "truefoundry:prod-mcps:github-mcp",
"tools": [ {"name": "listPullRequests"}, {"name": "createComment"} ]
},
{
"integration_fqn": "truefoundry:prod-mcps:slack-mcp",
"tools": [ {"name": "sendMessageToUser"} ]
},
{
"integration_fqn": "truefoundry:common:calculator-mcp",
"tools": [ {"name": "add"} ]
}
]
}
âYou can find a similar example in the Use MCP Server in Code Agent guide, which also includes complete Python and JS snippets.â
The streaming response interleaves:
- assistant tokens (LLM reasoning)
- tool-call chunks (function name + incremental arguments)
- tool-result events (JSON output)Â
This lets you build reactive UIs that show each step of the agentic loop in real time.
Observability, Guardrails & Policy
Even a âhello worldâ agent can cost real money and do real damage. TrueFoundry ships first-class observability:
- Latency / error dashboards â TTFT, task latency, HTTP errors, tool retriesÂ
- Token & cost tracking â attribute spend per model, per team, per feature flagÂ
- OpenTelemetry traces â hop-by-hop spans across agent, MCP proxy, and LLMÂ
- Rate-limits & caching â prevent runaway loops and reuse identical web-search results
- Guard-rail hooks â enforce PII-scrubbing, NSFW filters, or âhuman-in-the-loop approvalâ on any destructive toolÂ
All metrics come out of the box; no side-car agents or custom exporters required.

Supporting All MCP Transports (HTTP & STDIO)
Many open-source servers still speak stdio (stdin/stdout) instead of HTTP. TrueFoundry recommends wrapping them with mcp-proxy and deploying as a regular serviceÂ
# wrap a Python stdio server
mcp-proxy --port 8000 --host 0.0.0.0 --server stream python my_server.py
Ready-made templates exist for Notion and Perplexity servers, plus K8s manifests for Node or Python images. Once proxied, registration is identical to any other HTTP MCP endpoint.
âTrueFoundryâs MCP Server STDIO guide covers this proxying approach and provides deployment templates.â
Example Walk-Through: Enterprise Compliance Bot
Letâs return to our legacy GRC scenario but crank the ambition up:
âKeep our compliance evidence up-to-date. If a policy file changes in GitHub, store the diff in MongodB, create a Jira ticket, and post a summary in Slack.â
Servers Involved
Flow
- GitHub webhook hits a cloud function.
- The function calls the Agent API (VAT token) with the four servers enabled.
- LLM reasons â calls github.getFileDiff â mongo.insertDocument â jira.createIssue â slack.sendMessage.
- Each tool call and result is streamed back; observability captures latency of every hop.
- If the diff is > x (certain number of lines of code, as threshold), a guard-rail inserts ârequires manual approvalâ and pauses execution; a security lead can approve via the Gateway UI.
Governance
- The VAT attached to the function only sees the four listed serversâleast privilege.
- Separate dev and prod groups let you test against a sandbox Jira and staging Slack.
- Auditors can replay any incident: traces + full JSON payloads are retained 30 days by default.
Extending the Ecosystem: Building Your Own MCP Server
Because MCP is just JSON-RPC over HTTP or stdio, any internal service can expose tools, here is a small example :Â
- Drop this container into your Kubernetes cluster.
- Register it in the dev group with No-Auth while developing.
- Change to Header-Auth or OAuth2 before moving it to prod-mcps.
From that moment onwards, every agent in your company can reason over compliance controls with exactly the same ergonomics as Slack or GitHub.
Best Practices Cheat-Sheet
âThis sheet has been adapted from the TrueFoundry MCP Overview, these guidelines help ensure secure and reliable deployments.â
ConclusionÂ
MCP lets large language models (LLMs) speak the same language as tools â but it doesnât handle things like security, discovery, or governance at an enterprise level. Thatâs where TrueFoundryâs AI Gateway steps in: it adds everything teams need out of the box â like a full MCP registry, built-in authentication, role-based access control (RBAC), deep observability, and a powerful Agent API to tie it all together.
Frequently Asked Questions
What is an MCP registry in AI systems?
An MCP registry is a centralized catalog that stores and manages available Model Context Protocol (MCP) servers and their specific capabilities. It acts as a searchable directory for AI agents, allowing them to dynamically discover tools and resources. This registry ensures models can identify the correct services needed to execute complex, multi-step tasks.
What is the difference between an MCP registry and an AI gateway?
An MCP registry is a passive directory of tools, while an AI gateway is the active enforcement layer that manages traffic. The registry tells the system what resources exist, whereas the gateway controls access, handles authentication, and applies security policies like PII masking or rate limiting during the actual execution of model requests.
Can an AI gateway function without an MCP registry?
Yes, an AI gateway can function without an MCP registry using static configurations, but it loses the flexibility of dynamic tool discovery. Without a registry, developers must manually hardcode every server connection, making the system difficult to scale and maintain. Integrating a registry allows the gateway to automatically adapt as new tools are added to the ecosystem.
How does an AI gateway enforce policies from an MCP registry?
The AI gateway retrieves metadata and governance rules from the MCP registry to validate incoming requests in real time. It cross-references user permissions and security requirements defined in the registry before routing traffic to the appropriate server. This ensures that every interaction remains compliant with organizational standards and data residency protocols.
TrueFoundry AI Gateway delivers ~3â4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.
The fastest way to build, govern and scale your AI















.png)
.webp)
.webp)
.webp)


.webp)
.webp)
.webp)
.png)








