Supply Chain Attacks in AI: What Recent Incidents Reveal About AI Infrastructure Security
On March 24, 2026, a researcher at FutureSearch sat down to a routine task: installing a Python package for a project. Within minutes, the workstation began behaving erratically.
Memory usage spiked. Unrecognized processes appeared. The system crashed. The researcher had unknowingly triggered one of the most sophisticated software supply chain attacks ever launched against the AI ecosystem — an attack that had been active for months, infecting thousands of developer environments before anyone publicly recognized it.
This article examines what happened, why AI tooling is especially vulnerable to this kind of attack, and what engineering teams can do to protect themselves.
What Happened: A Supply Chain Attack Three Steps Deep
The threat actor group tracked as "TeamPCP," active since at least December 2025 and followed by researchers at Wiz across nine phases of an ongoing campaign, did not go after its ultimate target directly. Instead, it carried out a careful, multi-stage supply chain attack.
Step 1: On March 19, TeamPCP compromised Trivy, a widely used open source security scanner integrated into the CI/CD pipelines of many projects.
Step 2: Using that foothold, they obtained the PyPI publishing credentials of a maintainer of a popular LLM proxy library downloaded roughly 3.4 million times per day.
Step 3: On March 24, they uploaded two malicious versions of the package to PyPI. The compromised versions were live for approximately three hours before PyPI quarantined them.
This is the defining characteristic of modern supply chain attacks: the attacker never has to breach you directly. They breach someone your tools trust, and let that trust carry them the rest of the way.

Inside the Payload: Three Stages of Compromise
The malicious package executed a sophisticated three-stage payload that security researchers at Snyk have documented in detail.
Stage 1 — Information collection. The payload silently harvested everything it could find: SSH private keys, AWS and GCP access credentials, Azure service principals, Kubernetes configurations, .env files, git credentials, database passwords, shell history, CI/CD secrets, and cryptocurrency wallet seed phrases.
Stage 2 — Encryption and exfiltration. Collected data was encrypted and transmitted to an attacker-controlled server. Temporary files (session.key, payload.enc, tpcp.tar.gz) were created in the system temp directory during this process.
Stage 3 — Persistence and lateral movement. The payload installed a backdoor Python script disguised as a systemd service called "System Telemetry Service". This persistence script polled an attacker-controlled URL every five minutes for new commands — and was capable of spreading across your entire Kubernetes cluster by deploying privileged pods to every node in kube-system.
What made this especially dangerous was the delivery mechanism: a .pth file. Python path configuration files execute automatically whenever any Python process starts — including pip itself, including CI/CD build steps, including test runners. You didn't have to run your application. You just had to have installed the package, and the payload ran.
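To see why this mechanism is so potent, here is a harmless sketch: lines in a .pth file that begin with `import` are executed by Python's site module on every interpreter start. (The virtualenv path and marker file below are illustrative, not artifacts from the attack.)

```shell
# Create a throwaway virtualenv and drop a .pth into its site-packages.
python3 -m venv /tmp/pth-demo
SITE=$(/tmp/pth-demo/bin/python -c "import site; print(site.getsitepackages()[0])")

# A .pth line starting with "import" is executed, not treated as a path.
# Here the side effect is just a marker file; the real payload ran arbitrary code.
echo 'import os; os.system("touch /tmp/pth-demo-ran")' > "$SITE/demo.pth"

# Start any Python process at all -- no application code needed:
/tmp/pth-demo/bin/python -c "pass"
ls /tmp/pth-demo-ran
```

Every `python` invocation in that environment, including `pip` itself, now triggers the .pth side effect.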
Why AI Infrastructure Is a Special Target
The affected package was not chosen at random. LLM proxies and AI gateways occupy one of the most privileged positions in the modern software stack. When you deploy an LLM gateway, you place it directly between your applications and the AI service providers they call. It holds OpenAI API keys, Anthropic credentials, Azure and Google Cloud Platform service accounts. It reads environment variables and, in many deployments, the secrets manager. This access is by design: the gateway needs it to route requests, enforce rate limits, and log usage.
It also means that if the gateway, or any of its dependencies, is compromised, the attacker gains full visibility into your AI infrastructure. As Sonatype researchers wrote in their report on this incident: “Because LiteLLM typically sits directly between applications and multiple AI service providers, it often has access to API keys, environment variables, and other sensitive configuration data. Compromising a package in this position allows attackers to intercept and exfiltrate valuable secrets without needing to directly breach upstream systems.”
Why Compliance Frameworks Didn't Catch It
The package involved held SOC 2 Type 1 and ISO 27001 certifications. This is worth examining not to criticise those frameworks — they matter, and teams that pursue them are doing the right thing — but because it illustrates a structural gap.
Compliance frameworks audit what you're doing against a checklist. They cover access controls, data handling, and incident response. They don't typically examine whether the security scanner in your CI/CD pipeline has been compromised by a threat actor who then pivoted to steal your PyPI publishing credentials.
Even standard pip hash verification wouldn't have caught this attack. The malicious .pth file in the compromised version was correctly declared in the package's RECORD file with a matching hash. The package passed every integrity check PyPI provides. It was a valid package that happened to be weaponised.
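The limit of hash checking is easy to demonstrate: a hash proves you received exactly the bytes that were published, not that those bytes are benign. A sketch (the file below stands in for any artifact):

```shell
# Any file -- benign or weaponised -- gets a perfectly valid pin.
echo 'pretend-this-is-a-compromised-wheel' > /tmp/demo-artifact

# pip computes a --hash line you could add to requirements.txt; downstream
# integrity checks would then pass for the malicious bytes just the same.
python3 -m pip hash /tmp/demo-artifact
```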
This is the supply chain security gap that the AI ecosystem specifically needs to close: the question isn't just "are our systems secure?" It's "are the tools we use to build and secure our systems secure?"
How TrueFoundry Thinks About This Problem
At TrueFoundry, supply chain security is a first-class concern in how we design our MLOps platform — not an afterthought.
When enterprises deploy models and LLM gateways through TrueFoundry, we ask a different kind of question: not just "is this endpoint secure?" but "what is the blast radius if any component in this stack is compromised?"
A few principles shape how we build:
Infrastructure runs in your VPC. When your AI infrastructure runs inside your own cloud boundary, secrets never travel to external systems. Even if a dependency somewhere in the ecosystem were compromised, the exfiltration endpoint wouldn't be reachable from within your network perimeter.
Dependencies are pinned and audited. Rather than silently pulling latest on every build, TrueFoundry maintains pinned, reviewed dependencies across the platform stack. This eliminates an entire class of supply chain vector.
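In the Python ecosystem, "pinned" typically means a hash-pinned lockfile enforced with pip's `--require-hashes` mode. A minimal sketch, where the package name, version, and hash are placeholders for your own reviewed dependencies:

```shell
# A lockfile entry that records the sha256 of the reviewed artifact.
cat > /tmp/requirements-pinned.txt <<'EOF'
example-package==1.2.3 \
    --hash=sha256:<sha256-of-the-reviewed-wheel>
EOF

# Enforced install: pip aborts if any requirement is unpinned or if the
# downloaded bytes don't match the recorded hash.
#   pip install --require-hashes -r /tmp/requirements-pinned.txt
cat /tmp/requirements-pinned.txt
```

Note that, as the incident above shows, hash pinning stops silent substitution of new bytes but cannot vouch for the release you originally reviewed.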
Component isolation limits blast radius. A compromise in one layer of the stack doesn't automatically grant access to another. Principle of least privilege, enforced at the infrastructure level.
None of this is exotic engineering. It's discipline applied to a threat model that AI tooling, as an industry, hasn't taken seriously enough: the threat isn't coming through your firewall. It's coming through your requirements.txt.
What Your Team Should Do Right Now
If you use Python-based AI tooling, and nearly every team building on LLMs does, the following actions are worth prioritising immediately.
1. Check your installed package versions. Run pip show litellm | grep Version. If your output shows 1.82.7 or 1.82.8, treat the system as compromised and do not simply upgrade in place. The payload may have already run. Rebuild from a clean state on a known-clean machine.
2. Audit .pth files across machines and CI runners.
find $(python3 -c "import site; print(' '.join(site.getsitepackages()))") \
-name "*.pth" -exec grep -l "base64\|subprocess\|exec" {} \;
3. Check for persistence artifacts.
ls -la ~/.config/sysmon/sysmon.py 2>/dev/null && echo "BACKDOOR FOUND"
systemctl --user status sysmon.service 2>/dev/null
ls /tmp/tpcp.tar.gz /tmp/session.key /tmp/payload.enc 2>/dev/null
4. Rotate credentials aggressively. If any affected machine had access to cloud credentials, SSH keys, API keys, Kubernetes service account tokens, or database passwords, rotate all of them now. Don't assess — rotate. The payload specifically targeted AWS Secrets Manager, SSM Parameter Store, and Kubernetes cluster secrets across all namespaces.
5. Check Kubernetes for malicious pods.
kubectl get pods -A | grep "node-setup-"
Pods named node-setup-{node_name} in the kube-system namespace are a known indicator of compromise from this campaign.
6. Move toward a private package registry. PyPI is not the only option for dependency resolution. A private package mirror with pinned hashes and an approval workflow eliminates this entire class of attack vector. Tools like Artifactory, AWS CodeArtifact, or Google Artifact Registry can serve as intermediaries.
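A minimal sketch of the pip side of this, assuming a placeholder internal index URL:

```shell
# Point pip at the internal mirror instead of public PyPI, and require
# hash pins on install. (The URL is a placeholder for your own endpoint.)
export PIP_CONFIG_FILE=/tmp/pip-internal.conf
cat > "$PIP_CONFIG_FILE" <<'EOF'
[global]
index-url = https://pypi.internal.example.com/simple/

[install]
require-hashes = true
EOF

# Every pip invocation in this shell now resolves only against the
# approved mirror, with hash enforcement on.
cat "$PIP_CONFIG_FILE"
```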
7. Treat your CI/CD supply chain as attack surface. The initial compromise in the TeamPCP campaign wasn't of the target library — it was of the security scanner used in that library's CI pipeline. Your build infrastructure, your GitHub Actions, and your third-party integrations are all part of your attack surface. Audit them accordingly.
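One concrete audit you can run today is to flag any GitHub Action pinned to a mutable tag rather than a full commit SHA, since whoever controls the tag controls your build. A sketch over an illustrative workflow file (the action names and SHA are placeholders):

```shell
# Illustrative workflow with one mutable-tag pin and one SHA pin.
mkdir -p /tmp/repo/.github/workflows
cat > /tmp/repo/.github/workflows/ci.yml <<'EOF'
jobs:
  build:
    steps:
      - uses: some-org/some-action@v3
      - uses: other-org/other-action@0123456789abcdef0123456789abcdef01234567
EOF

# Any "uses:" reference not pinned to a 40-character commit SHA is a
# candidate for review:
grep -rn "uses:" /tmp/repo/.github/workflows | grep -Ev "@[0-9a-f]{40}"
```

The same idea applies to container base images (pin by digest) and any other build-time dependency resolved from a mutable reference.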
The Broader Pattern: This Is Not Isolated
What makes this incident particularly notable is that it was not opportunistic. TeamPCP has been executing this deliberate campaign since at least December 2025. Before attacking LiteLLM, the group compromised Aqua's Trivy security scanner (March 19), Checkmarx's VS Code extensions and GitHub Actions (March 23), and multiple npm packages carrying a self-propagating worm.
The same RSA key pair appears across all of these attacks, as do the tpcp.tar.gz bundle name and the tpcp-docs-prefixed GitHub repositories. This points to a professional threat actor methodically executing a single campaign.
The implication is that the teams compromised by this campaign were not negligent. They were following best practice: building on popular, well-reviewed open-source software.
That is what makes the campaign significant. TeamPCP did not find a weakness in any one organization's defenses. It found a weakness in how the wider AI ecosystem approaches trust within its dependency chain.
The Trust Problem AI Infrastructure Needs to Solve
The AI ecosystem has built something extraordinary in a very short time. The speed and openness that made that possible — the culture of pip install and share, of building on each other's work — is genuinely valuable and worth preserving.
But speed without supply chain security creates debt. The attack surface of a modern AI stack is no longer just the endpoints you expose. It's every package in your dependency tree, every tool in your CI/CD pipeline, every open source component that has access to your secrets at build or runtime.
Closing this gap requires more than better incident response from affected teams. It requires the whole AI infrastructure ecosystem — maintainers, platform vendors, enterprises, and security teams — to treat supply chain provenance as a first-class engineering concern.
The researcher whose machine crashed probably didn't think he was about to expose a months-long campaign targeting AI infrastructure. Neither did any of the developers who ran a routine pip install. That's the nature of software supply chain attacks. By the time you see them, the damage is often already done.
The AI industry can build better. It needs to.

Frequently Asked Questions
What is a software supply chain attack?
A software supply chain attack occurs when a threat actor compromises a trusted component upstream of the final target — such as a developer tool, open source library, or CI/CD pipeline — and uses it to distribute malicious code to every downstream user of that component. Rather than attacking an organisation directly, attackers exploit the implicit trust that developers place in widely used packages and tooling.
How can AI and ML teams protect their infrastructure from supply chain attacks?
Protecting AI and ML infrastructure from supply chain attacks requires several complementary measures. Teams should use a private package registry (such as AWS CodeArtifact, Google Artifact Registry, or Artifactory) with pinned dependency hashes rather than pulling directly from public PyPI. Regularly auditing .pth files in Python site-packages directories can surface malicious additions early. Running AI infrastructure — including LLM gateways and model serving components — within a private VPC limits an attacker's ability to exfiltrate credentials to external servers even if a dependency is compromised. Maintaining a Software Bill of Materials (SBOM) for your ML stack enables faster identification of exposure when a new incident is disclosed. Finally, CI/CD pipelines themselves should be treated as attack surface: the tools used to build and secure software — including security scanners and GitHub Actions — can be and have been compromised as part of broader supply chain campaigns.
What should teams look for in an LLM gateway to mitigate supply chain risk?
The gateway should run inside the organization's own VPC, so that even if a dependency is compromised there is no route to exfiltrate data. Its dependencies should be pinned and audited rather than resolved from public registries at install time. Credentials should be managed through the cloud provider's native secrets manager rather than environment variables. Finally, all model invocations, key usage, and configuration changes should be audit-logged, so that abnormal behavior is easy to spot. TrueFoundry ships with these configurations by default, reducing the attack surface compared with self-managed open-source tooling.


