Case Study

Summary

Aviva Credito is a Mexico-based lender focused on expanding access to credit. To reach customers that traditional banks and fully online fintechs struggle to serve, Aviva operates small physical kiosks supported by an automated, tablet-first onboarding experience, building trust while reducing fraud risk.

As Aviva’s AI initiatives grew from computer vision models to production-grade chatbots and document verification workflows, the team faced two recurring challenges: (1) deploying and operating LLM services without requiring deep Kubernetes expertise, and (2) managing multiple LLM providers with consistent observability, cost control, and agility.

By using TrueFoundry’s Deployment and AI Gateway, Aviva empowered every ML/AI engineer to ship production services independently, gained unified observability across Azure and GCP model providers, and created a scalable foundation for safety and agentic workflows.

Aviva Credito’s Mission

Aviva’s mission is to increase access to credit for underserved communities in Mexico. Its model combines a physical presence, small kiosks staffed by a single employee, with a fully automated, tablet-driven process, delivering the best of both worlds: the high trust and lower fraud risk of in-person service, with the speed of automation.

From Customer Conversations to Document Verification

Aviva’s AI team builds and operates production systems across:
  • Chatbots: Multiple production assistants backed by self-hosted and public models, evolving toward squad-based orchestration (and standard agent-to-agent patterns over time).
  • Document AI: OCR + LLM parsing for visual documents, plus validation flows for proof of address, proof of identification, proof of bank accounts, and location checks (a parsing sketch follows this list).
  • Interaction intelligence: Extracting structured signals from interview scripts, feedback messages, and transcribed voice conversations.
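
To illustrate the Document AI flow, the sketch below runs OCR on a document image and asks an LLM to return structured fields. This is not Aviva’s actual pipeline: the pytesseract OCR step, the model name, and the JSON field list are all assumptions for illustration.

```python
# Illustrative OCR + LLM parsing step (not Aviva's actual pipeline).
# Assumes pytesseract/Pillow for OCR and the OpenAI Python SDK for the LLM call.
import json

import pytesseract
from PIL import Image
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY; could point at a gateway base_url instead

def parse_proof_of_address(image_path: str) -> dict:
    """OCR a document image, then ask an LLM to extract structured fields."""
    raw_text = pytesseract.image_to_string(Image.open(image_path), lang="spa")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract fields from a Mexican proof-of-address document. "
                    "Return JSON with keys: name, street, city, state, "
                    "postal_code, issue_date. Use null for missing fields."
                ),
            },
            {"role": "user", "content": raw_text},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

A downstream validation flow can then check the extracted fields against application data (for example, name match and address freshness) before accepting the document.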

Aviva’s first major inflection point came from a practical need: deploying a model to recognize Mexico’s INE identity cards. The ML team could fine-tune and build the model, but shipping it reliably required an operational path they didn’t yet have. Early attempts ranged from manual VM-based deploys (slow and error-prone) to managed services that either lacked GPU support or failed to deliver quickly.

TrueFoundry’s deployment experience changed that: clear logs and observability sidecars surfaced the root cause behind a failing container, allowing the team to fix the image and deploy successfully in under an hour.

Platformizing AI for Speed, Reliability, and Governance

Once the first deployments landed, Aviva adopted a platform mindset: make every AI service repeatable to deploy, easy to monitor, and simple to hand off between engineers. TrueFoundry became the operating layer that removed infrastructure friction while enforcing best practices.

1. From Manual Deployments to Self-Serve Production

  • Self-serve deployments for AI/ML engineers: engineers can deploy and update services directly, without relying on platform specialists (a deployment sketch follows this list).
  • Fast onboarding: new engineers are expected to push an update or deploy a model within their first week, preserving a tight feedback loop between code and production behavior.
  • Operational safety rails: platform warnings and recommendations (e.g., availability-zone resilience and resource sizing) guide teams toward Kubernetes best practices.
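
To make the self-serve flow concrete, here is a minimal sketch of a service deployment using TrueFoundry’s Python SDK (servicefoundry). The service name, host, workspace FQN, and resource sizes are placeholders, and exact SDK names may vary by version.

```python
# Illustrative self-serve deployment via TrueFoundry's Python SDK
# (servicefoundry). All names, hosts, and resource sizes are placeholders.
from servicefoundry import Build, Port, PythonBuild, Resources, Service

service = Service(
    name="ine-card-recognizer",  # hypothetical service name
    image=Build(
        build_spec=PythonBuild(
            command="uvicorn app:app --host 0.0.0.0 --port 8000",
            requirements_path="requirements.txt",
        )
    ),
    ports=[Port(port=8000, host="ine-recognizer.internal.example.com")],
    resources=Resources(
        cpu_request=1.0,
        cpu_limit=2.0,
        memory_request=2000,  # MiB
        memory_limit=4000,    # MiB
    ),
)
service.deploy(workspace_fqn="my-cluster:aviva-ml")  # placeholder workspace
```

Because the deployment spec travels with the code, any engineer can redeploy or resize a service without touching cluster internals.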

2. The AI Gateway: One Interface Across Model Providers

As Aviva adopted multiple foundation models across Azure and Google Cloud (choosing models based on task-level quality), operational complexity grew quickly: secrets sprawl, inconsistent SDK integrations, and fragmented observability. TrueFoundry’s AI Gateway provided a unified control plane.
  • Provider independence: applications call a consistent gateway interface while Aviva can switch providers, models, and versions without rewriting integration code (see the sketch after this list).
  • Centralized observability: a single place to monitor request volume, latency, failure modes, and costs across environments.
  • Cost and usage control: usage spikes can be traced back to the originating service via gateway logs, enabling rapid remediation.
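
As a minimal sketch of what this looks like from the application side, assuming the gateway exposes an OpenAI-compatible endpoint: the base URL, token, and model identifiers below are placeholders, but the point stands that switching providers becomes a model-string change rather than an integration rewrite.

```python
# One client, many providers behind the gateway. Base URL, API key, and
# model identifiers are placeholders (assumes an OpenAI-compatible endpoint).
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/api/llm",  # placeholder gateway URL
    api_key="<gateway-token>",                       # gateway-issued credential
)

# Swapping providers is a model-string change, not a code rewrite:
for model in ("azure-openai/gpt-4o", "google/gemini-1-5-pro"):  # placeholder names
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Classify this customer message: ..."}],
    )
    print(model, "->", reply.choices[0].message.content)
```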

3. Resilience and Developer Experience: Fallbacks + MCP Servers

Two day-to-day realities shaped Aviva’s gateway adoption: latency variability across providers, and tooling ergonomics for developers.
  • Latency-aware fallbacks: when p99 latency increased on a primary provider, Aviva introduced an automatic fallback model to keep customer-facing experiences stable (a client-side sketch follows this list).
  • Persistent MCP connections: by hosting the Atlassian MCP server on TrueFoundry, Aviva avoided repeated reconnections in Cursor and made knowledge tools easier to use day-to-day.
  • Proactive roadmap: Aviva plans to expand guardrails and safety controls as agentic workflows become more central.
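
Fallbacks like this are typically configured in the gateway itself; purely as an illustration of the idea, here is a client-side sketch that gives the primary model a tight deadline and retries against a secondary model when it is slow or erroring. The timeout budget, base URL, and model names are assumptions.

```python
# Client-side illustration of latency-aware fallback (a gateway can handle
# this server-side). Timeout, base URL, and model names are placeholders.
from openai import APIError, OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/api/llm",  # placeholder gateway URL
    api_key="<gateway-token>",
)

PRIMARY = "azure-openai/gpt-4o"      # placeholder primary model
FALLBACK = "google/gemini-1-5-pro"   # placeholder fallback model

def complete_with_fallback(prompt: str, deadline_s: float = 2.0) -> str:
    """Try the primary model under a tight deadline; fall back if it misbehaves."""
    messages = [{"role": "user", "content": prompt}]
    try:
        reply = client.chat.completions.create(
            model=PRIMARY,
            messages=messages,
            timeout=deadline_s,  # per-request deadline approximating a p99 budget
        )
    except APIError:  # includes timeouts and provider-side errors
        reply = client.chat.completions.create(model=FALLBACK, messages=messages)
    return reply.choices[0].message.content
```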

Impact

By centralizing all LLM traffic through TrueFoundry’s AI Gateway, Aviva gained end-to-end visibility and control across a rapidly scaling, multi-cloud AI stack. Over a 90-day period, the team managed the production volumes listed below with predictable cost, measurable reliability, and significantly improved engineering velocity. The Gateway enabled rapid detection of cost and latency anomalies, model-level routing and failover without application changes, and a shared abstraction that allowed engineers to deploy, upgrade, and operate LLM-powered services independently.

Key Results in 90 days

  • 10M+ production LLM requests routed through a single control plane
  • 5B+ input tokens, 210M+ output tokens tracked centrally across Azure and GCP
  • <1% effective failure rate, with granular breakdown by error type and provider
  • P99 latency issues detected and mitigated in minutes via automated model fallback
  • 7+ production services upgraded in <20 minutes, without infra dependencies
  • Faster onboarding: engineers use LLMs immediately via a shared gateway abstraction

Customer Quotes

TrueFoundry’s AI Gateway gave us a single place to manage how we use LLMs across Azure and GCP. We can detect cost and latency issues quickly, trace them to specific services, and switch models without touching application code.

Matt, Aviva

It’s a powerful abstraction. It saves time for everyone and significantly lowers the knowledge barrier to start using LLMs in production.

Enrique, Aviva
