AWS Bedrock Pricing Explained: Everything You Need To Know
AWS Bedrock has emerged as the default "easy button" for teams looking to access foundation models directly within the AWS ecosystem. Its pay-as-you-go pricing and managed infrastructure make it incredibly appealing for fast experimentation -- you get access to Claude 3.5 Sonnet, Llama 3, and Amazon Titan without managing a single GPU.
However, AWS Bedrock pricing is not a single flat rate. It is a complex menu where costs vary wildly based on model choice, token usage, traffic patterns, and the supporting AWS services you inevitably turn on.
This blog takes a practical, fact-based approach to explaining how AWS Bedrock pricing works, where costs typically spike at scale, and why enterprises often layer platforms like TrueFoundry on top for better cost predictability and infrastructure control.
How AWS Bedrock Is Priced: An Overview
Before examining detailed costs, it’s important to understand the overall pricing philosophy. AWS Bedrock follows a pure usage-based pricing approach with no upfront platform or subscription fees. You don't pay to turn the service on. Instead, you are billed primarily for model inference, measured through tokens or generated outputs. However, the price per unit varies significantly depending on the foundation model provider; an Anthropic model can cost several times more than a Meta or Mistral model, even if the token count is identical.
Understanding AWS Bedrock Pricing Units
AWS Bedrock pricing is driven by how models consume resources during inference. While most text models bill per token, multimodal models differ.
For text models, you deal with Input Tokens and Output Tokens. Input tokens encompass the text you send to the model, including your system prompt, the user's question, and any RAG context you inject. Output tokens are the text the model generates. It is critical to note that output tokens are usually significantly more expensive -- often 3x to 5x higher -- than input tokens because generating text is computationally heavier than reading it. For image models like Amazon Titan Image Generator or Stable Diffusion, billing switches to a per-image basis, calculated on resolution and step count.
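To see how this plays out, here is a minimal cost sketch in Python. The per-1,000-token rates below are placeholder assumptions for illustration, not current AWS prices; always check the Bedrock pricing page for your model and region.

```python
# A minimal sketch of per-request cost estimation. The rates below are
# placeholders, NOT current AWS prices -- check the Bedrock pricing page.

# Hypothetical per-1,000-token rates (USD) for illustration only.
INPUT_RATE_PER_1K = 0.003   # assumed input rate for a Claude-class model
OUTPUT_RATE_PER_1K = 0.015  # output is often 3x-5x the input rate

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single inference call."""
    input_cost = (input_tokens / 1000) * INPUT_RATE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_RATE_PER_1K
    return input_cost + output_cost

# A RAG-style prompt with heavy context: 8,000 tokens in, 500 tokens out.
print(f"${estimate_request_cost(8000, 500):.4f} per request")
# At 100,000 requests/month, fractions of a cent add up quickly.
print(f"${estimate_request_cost(8000, 500) * 100_000:,.2f} per month")
```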
Fig 1: The Cost Multiplier Effect

The Two Core Pricing Models in AWS Bedrock
AWS Bedrock pricing isn't just "pay per token." Teams must choose between two pricing models that trade flexibility for guaranteed capacity.
On-Demand Pricing (Pay-As-You-Go)
On-Demand pricing is the default option for most AWS Bedrock users. It offers flexibility but comes with operational risks. You are charged strictly per 1,000 input and output tokens processed, which is perfect for variable traffic patterns or early experimentation where usage is "bursty." However, the downside is reliability. AWS enforces throttling limits, meaning that during peak demand, your requests might fail with a ThrottlingException without warning because you are sharing capacity with other AWS customers.
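In practice, teams absorb those throttles in code with retries. Below is a hedged sketch using boto3's Converse API with exponential backoff; the model ID and prompt are illustrative, so substitute your own.

```python
import time
import boto3

# Sketch: handling On-Demand throttling with exponential backoff.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

def converse_with_backoff(messages,
                          model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
                          max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.converse(modelId=model_id, messages=messages)
        except client.exceptions.ThrottlingException:
            # Shared On-Demand capacity: back off and retry instead of failing.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still throttled; consider Provisioned Throughput.")

response = converse_with_backoff(
    [{"role": "user", "content": [{"text": "Summarize our Q3 report."}]}]
)
print(response["output"]["message"]["content"][0]["text"])
```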
Provisioned Throughput Pricing (Committed Capacity)
Provisioned Throughput is designed for teams that need guaranteed model availability at scale. It introduces predictability but requires a financial commitment. Instead of paying per token, you purchase dedicated "Model Units" to reserve inference capacity, guaranteeing a specific throughput (e.g., processing 20k tokens per minute). The catch is that you are charged a fixed hourly fee regardless of whether you send zero requests or max out the unit. This model acts like a reserved instance; it is typically sold in no-commitment (hourly), 1-month, or 6-month terms, with discounts for longer commitments, and locking in reduces your ability to switch models quickly if a better one is released next week.
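A quick back-of-the-envelope calculation shows when that fixed fee pays off. Both unit prices below are assumptions for illustration; actual Model Unit rates are quoted per model and region in the AWS console.

```python
# A break-even sketch. Both figures below are assumed placeholders --
# real Model Unit pricing is quoted per model in the AWS console.

HOURLY_MODEL_UNIT_FEE = 20.0         # assumed $/hour for one Model Unit
ON_DEMAND_COST_PER_1K_TOKENS = 0.01  # assumed blended input/output rate

hours_per_month = 730
provisioned_monthly = HOURLY_MODEL_UNIT_FEE * hours_per_month  # billed even at zero traffic

# Token volume you must actually process for Provisioned Throughput to win:
break_even_tokens = provisioned_monthly / ON_DEMAND_COST_PER_1K_TOKENS * 1000
print(f"Provisioned floor: ${provisioned_monthly:,.0f}/month")
print(f"Break-even volume: {break_even_tokens / 1e6:,.0f}M tokens/month")
```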
AWS Bedrock Pricing by Model Provider
Unlike OpenAI, where you pay one vendor, Bedrock is a marketplace. Pricing strategies differ by vendor.
Amazon Titan Models
Amazon Titan models are AWS-native foundation models designed for general-purpose AI workloads. Because they are first-party models, they are typically priced lower than third-party alternatives. This makes them suitable for cost-sensitive production use cases like embedding generation or simple classification, where "good enough" performance at a low price point is the goal.
Third-Party Models (Anthropic, Meta, Others)
AWS Bedrock also provides access to models from external providers like Anthropic, Cohere, and AI21 Labs. Pricing here is generally higher due to the advanced capabilities and external licensing involved. Be aware of the "Output Tax" on high-reasoning models like Claude 3.5 Sonnet, where output tokens are significantly more expensive. Costs here can rise quickly for chat-heavy applications where the model "thinks" or generates long, verbose responses.
How Usage Patterns Affect AWS Bedrock Cost
Your application design impacts your bill just as much as the model price tag. If you have a RAG application that stuffs 10k tokens of context into every prompt, your input costs will dominate your bill. Similarly, Agentic Workflows act as cost multipliers; a single user query might trigger an agent to make five internal calls (search, summarize, plan, critique, finalize), and you pay for every step in that chain. Finally, verbose responses drain budgets -- if you don't limit max_tokens, a model might ramble, and paying for 500 tokens of "fluff" adds up over thousands of users.
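The sketch below illustrates that agentic multiplier effect. The step names come from the example above, but the token counts and rates are hypothetical assumptions, not measurements.

```python
# Illustrative only: assumed $/token rates and per-step token counts.
INPUT_RATE = 0.003 / 1000   # assumed $/token, input
OUTPUT_RATE = 0.015 / 1000  # assumed $/token, output

# One user query fans out into five internal LLM calls:
agent_steps = {
    "search":    (2_000, 300),   # (input_tokens, output_tokens)
    "summarize": (3_500, 600),
    "plan":      (1_500, 400),
    "critique":  (2_500, 350),
    "finalize":  (4_000, 800),
}

total = sum(i * INPUT_RATE + o * OUTPUT_RATE for i, o in agent_steps.values())
single_call = 1_000 * INPUT_RATE + 500 * OUTPUT_RATE  # the cost users "see"
print(f"Agent chain: ${total:.4f} vs single call: ${single_call:.4f} "
      f"({total / single_call:.1f}x multiplier)")
```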
The Hidden Costs of the Bedrock Ecosystem
The base token price is often just the tip of the iceberg. Real-world AI applications use "Knowledge Bases" and "Agents," which carry their own separate meters.
Fig 2: The Hidden Cost Stack

Knowledge Bases (Vector Search)
Bedrock Knowledge Bases aren't free magic. Under the hood, they spin up an OpenSearch Serverless vector store. The surprise here is that OpenSearch Serverless has a minimum monthly cost -- often around $700/month for the lowest capacity setting -- even if you have zero traffic. This creates massive "sticker shock" for smaller teams or simple POCs who expected a purely serverless bill.
Agents & Recursive Calls
"Agents" perform multiple steps to answer one user query (Thinking → Searching → Summarizing). The impact is that a single user question might trigger 10x the tokens you expected due to this internal looping and reasoning trace. You pay for the input and output of every intermediate thought, which creates a multiplier effect on your invoice.
CloudWatch Log Costs
To audit your AI, you enable detailed logging. Bedrock sends full prompt/response payloads to AWS CloudWatch Logs. CloudWatch charges high fees for ingestion and storage compared to simple S3 storage. Storing gigabytes of text logs can quietly add hundreds of dollars to your monthly bill, effectively taxing your observability.
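A rough comparison makes the gap concrete. The unit prices below are assumptions based on typical us-east-1 list prices; verify them against current AWS pricing before relying on the numbers.

```python
# Assumed unit prices for illustration -- verify against current AWS pricing.
CLOUDWATCH_INGEST_PER_GB = 0.50   # assumed $/GB ingested
CLOUDWATCH_STORE_PER_GB  = 0.03   # assumed $/GB-month stored
S3_STANDARD_PER_GB       = 0.023  # assumed $/GB-month stored

gb_per_month = 200  # full prompt/response payloads add up fast

cloudwatch = gb_per_month * (CLOUDWATCH_INGEST_PER_GB + CLOUDWATCH_STORE_PER_GB)
s3 = gb_per_month * S3_STANDARD_PER_GB
print(f"CloudWatch: ${cloudwatch:,.2f}/month  vs  S3: ${s3:,.2f}/month")
```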
Why AWS Bedrock Costs Are Hard to Predict
Many teams underestimate AWS Bedrock pricing during early experimentation because the variables are hard to isolate. Token usage varies widely based on user behavior; one user might ask a simple question while another pastes a 50-page PDF for summarization. Additionally, developers love to experiment, and switching from Claude Haiku to Claude Sonnet changes your pricing tier instantly, often without Finance realizing until the end of the month. Finally, AWS budgets operate at the account level, making it very difficult to see "How much did the Marketing Bot spend vs. the Engineering Bot?" without complex tagging strategies.
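One common workaround is to attribute spend yourself from exported invocation logs. The sketch below assumes a hypothetical log schema with an "app" tag and token counts; adapt the field names to whatever your logging pipeline actually emits.

```python
import json
from collections import defaultdict

# Hedged sketch of per-application cost attribution. The record fields
# ("app", "input_tokens", "output_tokens") are hypothetical -- map them
# to your actual log schema and tagging strategy.

RATE_PER_1K = 0.01  # assumed blended $/1K tokens

def spend_by_app(log_lines):
    totals = defaultdict(float)
    for line in log_lines:
        record = json.loads(line)
        tokens = record["input_tokens"] + record["output_tokens"]
        totals[record["app"]] += tokens / 1000 * RATE_PER_1K
    return dict(totals)

logs = [
    '{"app": "marketing-bot", "input_tokens": 9000, "output_tokens": 1200}',
    '{"app": "engineering-bot", "input_tokens": 2000, "output_tokens": 400}',
]
print(spend_by_app(logs))
```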
When AWS Bedrock Pricing Makes Sense
Despite its complexity, AWS Bedrock pricing works well for certain scenarios. If you are already standardized on AWS, the "native integration" tax is worth it for the security and compliance benefits like IAM and PrivateLink. For applications with spiky traffic that are used infrequently or unpredictably, the On-Demand model is perfect because you pay nothing when the app sits idle. It allows early-stage projects to launch quickly without managing GPUs or Kubernetes clusters, validating product-market fit before optimizing costs.
Where AWS Bedrock Pricing Starts Creating Challenges
As AI workloads mature, teams often encounter structural limitations. At high volumes (millions of requests), the per-token markup becomes expensive compared to owning the compute. Fine-tuning on Bedrock often requires purchasing Provisioned Throughput, which dramatically raises the barrier to entry compared to fine-tuning Llama 3 on your own GPU. Furthermore, multi-team environments lack fine-grained budget enforcement at the application level, leading to "tragedy of the commons" usage where one team drains the shared budget.
How TrueFoundry Changes the Cost Equation
TrueFoundry offers a different approach for teams that need stronger cost control and infrastructure ownership.
Instead of renting an API, TrueFoundry allows you to deploy the same open-source models (Llama 3, Mistral, Qwen) directly onto your own AWS EC2 or EKS clusters. This unlocks Spot Instance pricing. By orchestrating models on Spot Instances (spare AWS capacity), you can run inference for 60-70% less than On-Demand prices.
Crucially, TrueFoundry handles reliability automatically: if a Spot Instance is reclaimed, it falls back to On-Demand instantly to prevent downtime. And unlike Bedrock Provisioned Throughput, there are no month-long commitments; you can scale your fine-tuned models to zero at night and pay nothing.
AWS Bedrock vs TrueFoundry: Detailed Comparison
This is a factual comparison focusing on the economics of serving models:
- Pricing model: Bedrock bills per token (On-Demand) or per hourly Model Unit (Provisioned Throughput); TrueFoundry deploys open-source models on your own EC2/EKS compute, including Spot Instances at 60-70% below On-Demand compute prices.
- Commitments: Provisioned Throughput discounts require 1- or 6-month terms; TrueFoundry requires no commitment, so you can switch models freely.
- Idle cost: Bedrock On-Demand costs nothing when idle, but Provisioned Throughput bills every hour regardless of traffic; TrueFoundry can scale fine-tuned models to zero.
- Reliability: Bedrock On-Demand can throw a ThrottlingException under shared-capacity contention; TrueFoundry falls back from Spot to On-Demand instances automatically.
- Cost visibility: Bedrock budgets operate at the account level; TrueFoundry offers granular per-model and per-application visibility.
Ready to Build AI with Predictable Costs?
As AI adoption grows, pricing clarity becomes critical for sustainable scaling. You shouldn't have to choose between innovation and bankruptcy.
TrueFoundry gives you the power to control your AI spend, offering granular visibility into every model and the flexibility to run workloads on the most cost-effective infrastructure available -- whether that's Bedrock API or a Spot Instance in your VPC.
FAQs
Is there a free tier for AWS Bedrock?
No, AWS Bedrock does not have a specific free tier. New AWS accounts may carry general Free Tier credits that offset some underlying resources, but Bedrock usage itself is billed from the first token.
What are the cost-driving factors of AWS Bedrock?
The main drivers are Token Volume (input/output), Model Selection (Claude is pricier than Titan), Provisioned Throughput commitments, and auxiliary costs like CloudWatch Logs and Knowledge Base vector storage (OpenSearch).
How is TrueFoundry more cost-effective than AWS Bedrock?
TrueFoundry reduces costs by enabling you to run open-source models on AWS Spot Instances, which are significantly cheaper than Bedrock's token rates. It also provides granular caching and routing to prevent redundant expensive calls.