

AWS Bedrock Pricing 2026: On-Demand, Throughput, and Hidden Costs

January 21, 2026 | 9:30 min read

Introduction

AWS Bedrock has emerged as a compelling option for teams that want access to leading foundation models without leaving the AWS ecosystem. By offering fully managed model access from providers like Anthropic, Meta, and Amazon, Bedrock removes the operational overhead of model hosting while preserving tight integration with existing AWS services.

For early experimentation and pilot use cases, AWS Bedrock’s pay-as-you-go pricing and managed infrastructure are attractive. Teams can invoke models through simple APIs, scale traffic on demand, and rely on AWS-native security and compliance controls. This makes Bedrock a natural starting point for organizations already invested in AWS.

However, AWS Bedrock pricing is not a single flat rate. Costs vary significantly based on model selection, input and output token volume, request concurrency, and surrounding infrastructure such as networking, storage, and orchestration services. As usage grows from prototypes to production-grade AI systems, especially those involving RAG pipelines, agentic workflows, or real-time streaming, costs become harder to predict and optimize.

This blog takes a practical, fact-based approach to explaining how AWS Bedrock pricing works in real-world deployments, where expenses typically escalate at scale, and why many enterprises eventually evaluate platforms like TrueFoundry to gain better cost transparency, workload control, and architectural flexibility for AI systems.

How Is AWS Bedrock Priced?

Before diving into detailed numbers, it’s important to understand the pricing philosophy behind AWS Bedrock.

AWS Bedrock follows a pure usage-based pricing model. There are no platform subscription fees, no minimum commitments, and no upfront infrastructure costs to get started. You pay only when you invoke a model and only for the work that model actually performs.

At a high level:

  • You are billed per model inference, not per deployment or environment
  • Costs are driven by how much data the model processes and generates
  • Pricing differs significantly based on the model provider and model size

For example, invoking a smaller Amazon Titan or Meta Llama model may cost a fraction of invoking a large Anthropic Claude model with long context windows. This flexibility allows teams to choose the “right-sized” model for each workload, but it also introduces cost variability as usage grows.

This model works well for experimentation and early production use. However, because pricing is tied directly to inference volume and complexity, costs can scale rapidly when AI features move from internal demos to customer-facing systems.

Understanding AWS Bedrock Pricing Units

AWS Bedrock pricing is fundamentally tied to how models consume resources during inference. To estimate and control costs, teams must understand the billing units involved.

Token-Based Pricing (Most Text Models)

Most large language models on Bedrock use token-based billing, split into two components:

  • Input tokens
    These represent the text (prompt, instructions, conversation history, retrieved context) sent to the model for processing.
  • Output tokens
    These represent the text generated by the model in response.

Both input and output tokens are billed separately, often at different rates.

Example: Token-Based Cost in Practice

Consider a customer support chatbot built on AWS Bedrock:

  • User question + system prompt + conversation history: 2,000 input tokens
  • Model generates a detailed response: 500 output tokens

If the selected model charges:

  • $X per 1,000 input tokens
  • $Y per 1,000 output tokens

Then a single request is billed as:

  • (2 × X) for input
  • (0.5 × Y) for output

Now multiply that by thousands of daily conversations, add longer chat histories, and include RAG context pulled from documents, and costs can scale quickly without careful prompt and context management.
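
A minimal sketch of that arithmetic, using placeholder rates (the actual $X and $Y depend on the model and region you select, and the daily volume here is assumed):

```python
# Hypothetical per-1,000-token rates; substitute the published rates
# for the model and region you actually use.
INPUT_RATE_PER_1K = 0.003   # "$X" per 1,000 input tokens (assumed)
OUTPUT_RATE_PER_1K = 0.015  # "$Y" per 1,000 output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single model invocation under token-based billing."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + (output_tokens / 1000) * OUTPUT_RATE_PER_1K

# The chatbot example above: 2,000 input tokens, 500 output tokens.
per_request = request_cost(2_000, 500)   # (2 x X) + (0.5 x Y)
daily = per_request * 10_000             # assumed 10,000 conversations per day
print(f"per request: ${per_request:.4f} | per day: ${daily:.2f} | per month: ${daily * 30:,.2f}")
```

With these assumed rates, each request costs about $0.0135 and 10,000 conversations a day works out to roughly $135 per day, before any extra RAG context is added, which is why prompt and context management matters.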

Request-Based or Image-Based Pricing (Select Models)

Not all Bedrock models use token-based pricing.

  • Image generation models are often billed per image generated, sometimes varying by resolution or quality
  • Embedding models may charge per request or per batch size
  • Some specialized models use flat per-invocation pricing rather than token counts

This means teams running multi-modal pipelines (text + image + embeddings) must track multiple pricing dimensions simultaneously.
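
One way to keep a multi-modal pipeline honest about spend is to track each billing dimension explicitly. A rough sketch with placeholder unit prices (none of these are published AWS rates):

```python
# Placeholder unit prices for a hypothetical text + image + embeddings pipeline.
PRICING = {
    "text_input_per_1k_tokens": 0.003,
    "text_output_per_1k_tokens": 0.015,
    "embedding_per_1k_tokens": 0.0001,
    "image_per_generation": 0.04,
}

def pipeline_cost(text_in: int, text_out: int, embed_tokens: int, images: int) -> float:
    """Sum cost across token-based, embedding, and per-image billing units."""
    return (
        text_in / 1000 * PRICING["text_input_per_1k_tokens"]
        + text_out / 1000 * PRICING["text_output_per_1k_tokens"]
        + embed_tokens / 1000 * PRICING["embedding_per_1k_tokens"]
        + images * PRICING["image_per_generation"]
    )

# One run: summarize a document, embed it for retrieval, and generate one illustration.
print(f"${pipeline_cost(text_in=3_000, text_out=800, embed_tokens=3_000, images=1):.4f}")
```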

Why Pricing Units Matter at Scale

The key takeaway is that AWS Bedrock pricing is granular and flexible but not inherently predictable.

  • Long prompts, large documents, and RAG pipelines increase input tokens
  • Streaming or verbose responses increase output tokens
  • Higher traffic multiplies costs linearly
  • Different models introduce different pricing curves

Without guardrails, it’s easy for inference costs to grow faster than expected, especially once AI becomes part of a core user workflow.

The Two Core Pricing Models in AWS Bedrock

AWS Bedrock pricing is not limited to simple per-token billing. Teams must also choose how inference capacity is allocated, which directly impacts cost predictability, reliability, and scalability.

At a high level, AWS Bedrock offers two distinct pricing models:

  • On-Demand (pay-as-you-go) for maximum flexibility
  • Provisioned Throughput (committed capacity) for guaranteed availability

Each model represents a trade-off between cost efficiency, reliability, and financial commitment.

On-Demand Pricing (Pay-As-You-Go)

On-Demand pricing is the default option for most teams getting started with AWS Bedrock.

Under this model:

  • You are billed per 1,000 input tokens and per 1,000 output tokens
  • Pricing varies by model provider, model size, and region
  • There are no upfront commitments or reservations

This makes On-Demand pricing attractive for:

  • Early experimentation and proofs of concept
  • Chatbots and AI features with unpredictable or bursty traffic
  • Teams that want to avoid long-term commitments

However, this flexibility comes with important operational limitations.

AWS enforces soft and hard throttling limits on Bedrock’s On-Demand usage, especially during periods of high demand. If the underlying model capacity is constrained, requests may be delayed or rejected, even if you are willing to pay for them. These limits are not always predictable and may change based on regional demand.

For production systems, this introduces risk:

  • AI features may degrade or fail during traffic spikes
  • Latency can increase without warning
  • Teams may need to request quota increases well in advance

In practice, many teams discover that On-Demand pricing is ideal for development and early rollout but insufficient for reliability-sensitive production workloads unless combined with careful capacity planning.
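
Because On-Demand capacity can be throttled, production callers typically wrap invocations with retries and backoff. A minimal sketch using boto3; the model ID, region, request shape, and retry budget are illustrative assumptions, not prescribed values:

```python
import json
import time

import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # region assumed

def invoke_with_backoff(body: dict, model_id: str, max_retries: int = 5) -> dict:
    """Invoke a Bedrock model, backing off when On-Demand capacity is throttled."""
    for attempt in range(max_retries):
        try:
            response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
            return json.loads(response["body"].read())
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError("Request kept throttling; consider quota increases or a fallback path")
```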

Provisioned Throughput Pricing (Committed Capacity)

Provisioned Throughput is designed for teams that need guaranteed, always-available inference capacity.

Instead of paying per token, you:

  • Purchase dedicated Model Units for a specific foundation model
  • Receive reserved inference capacity with no throttling risk
  • Are charged a fixed hourly rate, regardless of actual usage

This model shifts Bedrock pricing from variable consumption to capacity-based billing.

Key characteristics include:

  • Costs typically range from tens to hundreds of dollars per hour, depending on model size and region
  • Charges apply 24/7, even during idle periods
  • Commitment periods are usually one month or six months

Provisioned Throughput is well-suited for:

  • High-traffic, customer-facing AI applications
  • Latency-sensitive workloads where throttling is unacceptable
  • Enterprises with predictable inference demand

However, it introduces new trade-offs. If your workload fluctuates or remains underutilized, you may end up paying for unused capacity. This makes Provisioned Throughput less flexible and potentially inefficient for teams whose AI usage is still evolving.
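
One way to frame the trade-off is a simple break-even check: compare what your monthly token volume would cost On-Demand against the fixed hourly charge for a model unit. All figures below are placeholders, since actual rates vary by model and region:

```python
# Placeholder figures; substitute real rates for your model and region.
ON_DEMAND_COST_PER_1K_TOKENS = 0.01   # blended input/output rate (assumed)
PROVISIONED_RATE_PER_HOUR = 40.0      # one model unit, hourly (assumed)
HOURS_PER_MONTH = 730

def monthly_on_demand(tokens_per_month: int) -> float:
    """Monthly cost if the same traffic were billed per token."""
    return tokens_per_month / 1000 * ON_DEMAND_COST_PER_1K_TOKENS

provisioned_monthly = PROVISIONED_RATE_PER_HOUR * HOURS_PER_MONTH  # charged 24/7, even when idle

# Below this token volume, On-Demand is cheaper; above it, committed capacity wins
# (assuming a single model unit can actually absorb the traffic).
break_even_tokens = provisioned_monthly / ON_DEMAND_COST_PER_1K_TOKENS * 1000
print(f"Provisioned: ${provisioned_monthly:,.0f}/month, break-even at {break_even_tokens:,.0f} tokens/month")
```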

Choosing Between Flexibility and Predictability

The choice between On-Demand and Provisioned Throughput is not purely financial—it’s architectural.

  • On-Demand prioritizes flexibility but sacrifices reliability under load
  • Provisioned Throughput guarantees availability but requires capacity planning and long-term commitment

Many teams start with On-Demand pricing, then move to Provisioned Throughput once AI becomes mission-critical. At that point, however, Bedrock begins to resemble traditional infrastructure reservation models, often prompting teams to reassess whether managed inference is still the most cost-effective approach at scale.

AWS Bedrock Pricing by Model Provider

One of the most important and often underestimated factors in AWS Bedrock pricing is model provider selection.

Unlike platforms that apply a uniform pricing layer, AWS Bedrock exposes the native cost structures of each foundation model vendor. This means that two applications with identical traffic patterns can have dramatically different monthly costs depending solely on the model chosen.

Amazon Titan Models

Amazon Titan models are AWS-native foundation models built and operated directly by Amazon.

Key characteristics include:

  • Lower per-token pricing compared to most third-party models
  • Tight integration with AWS IAM, logging, and monitoring services
  • Designed for scalability, reliability, and predictable performance

Because Amazon controls the full stack, from infrastructure to model serving, Titan models are typically the most cost-efficient option on Bedrock.

They are commonly used for:

  • Internal enterprise tools and copilots
  • Document summarization and classification
  • Search, embeddings, and retrieval-heavy workloads
  • Early-stage production systems where cost control is critical

For teams optimizing VPC-level security, IAM governance, and predictable billing, Titan models often provide the best balance between capability and cost. As a result, many enterprises standardize on Titan for baseline workloads and selectively use premium models only where needed.

Third-Party Models (Anthropic, Meta, Others)

AWS Bedrock also offers access to foundation models from external providers such as Anthropic, Meta, and other ecosystem partners.

These models are often chosen for their:

  • Advanced reasoning and conversational quality
  • Larger context windows and stronger instruction-following
  • Superior performance on complex or agentic tasks

However, these benefits come with higher and more variable costs.

Common pricing characteristics include:

  • Higher per-token rates compared to Amazon Titan
  • Output tokens priced significantly higher than input tokens
  • Steeper cost curves for chat-heavy and multi-turn conversations

For example, conversational agents that maintain long histories or generate verbose responses can quickly accumulate output token charges. In multi-step reasoning or agent workflows, where a single user request may trigger several model calls, costs can multiply unexpectedly.

As a result, third-party models are often reserved for:

  • High-value customer-facing experiences
  • Complex reasoning, planning, or analysis tasks
  • Scenarios where model quality directly impacts business outcomes

Why Provider Choice Matters at Scale

In production environments, model choice becomes a financial decision as much as a technical one.

  • Titan models offer cost predictability and operational simplicity
  • Third-party models deliver capability at a premium
  • Mixing models strategically is often necessary to balance quality and cost

Without careful routing, teams may default to premium models everywhere, only to discover that AWS Bedrock costs scale faster than expected as traffic grows.
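
In practice, “careful routing” often means classifying requests and sending only the ones that need premium quality to the more expensive model. A simplified sketch; the model IDs and the heuristic are illustrative, not a recommendation of specific models:

```python
# Illustrative model IDs; check the Bedrock console for the IDs available in your region.
CHEAP_MODEL = "amazon.titan-text-express-v1"
PREMIUM_MODEL = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def pick_model(prompt: str, requires_reasoning: bool) -> str:
    """Route short, simple requests to the cheaper model and reserve the
    premium model for long or reasoning-heavy work."""
    if requires_reasoning or len(prompt) > 4_000:
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(pick_model("Summarize this ticket in one sentence.", requires_reasoning=False))
```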

How Usage Patterns Affect AWS Bedrock Cost

AWS Bedrock pricing is extremely sensitive to how AI applications are designed and used in production. Small architectural decisions at the prompt or workflow level can materially impact monthly spend.

Key usage-driven cost factors include:

  • Long prompts and verbose responses
    Every additional instruction, system prompt, conversation history, or retrieved document increases input tokens. Similarly, detailed or streaming responses inflate output tokens—often priced higher than input tokens. Over time, these “small” additions compound into significant inference costs.

  • Agentic workflows multiply inference usage
    Agent-based systems rarely make a single model call. A typical agent may reason, retrieve data, re-rank results, summarize, and respond, each step triggering a separate inference request. What appears to be one user interaction can result in 5–10 model calls, multiplying token consumption and cost.

  • RAG pipelines add hidden layers of spend
    Retrieval-augmented generation introduces embedding creation, vector search, and context injection before text generation even begins. These steps add both embedding inference costs and larger input prompts, increasing downstream generation expenses.

In practice, Bedrock costs tend to grow non-linearly as applications evolve from simple prompts to multi-step AI systems.
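
A back-of-the-envelope way to see this non-linear growth is to estimate cost per user interaction rather than per model call. The step counts and rates below are assumptions chosen for illustration:

```python
# Assumed per-1,000-token rates and a typical agentic request shape.
INPUT_RATE = 0.003
OUTPUT_RATE = 0.015
EMBED_RATE = 0.0001

def interaction_cost(agent_steps: int, tokens_in_per_step: int,
                     tokens_out_per_step: int, rag_context_tokens: int) -> float:
    """Estimate the cost of one user interaction that fans out into several model calls."""
    generation = agent_steps * (
        (tokens_in_per_step + rag_context_tokens) / 1000 * INPUT_RATE
        + tokens_out_per_step / 1000 * OUTPUT_RATE
    )
    embedding = rag_context_tokens / 1000 * EMBED_RATE  # embedding the retrieved context
    return generation + embedding

# One question handled by a 6-step agent with 3,000 tokens of retrieved context per step.
print(f"${interaction_cost(6, 1_500, 400, 3_000):.4f} per interaction")
```

Under these assumptions a single “question” costs roughly ten times what a lone model call would, which is exactly the pattern teams see when agents and RAG enter the picture.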

The Hidden Costs of the Bedrock Ecosystem

For many teams, base model pricing is only the starting point. Real-world Bedrock applications rely on additional managed components, each with its own billing model.

Knowledge Bases (Vector Search)

AWS Bedrock Knowledge Bases are not free.

While the Bedrock API abstracts retrieval logic, the underlying vector store is typically powered by Amazon OpenSearch Serverless, which has its own cost structure.

The surprise for many teams:

  • OpenSearch Serverless has a minimum monthly cost, often around $600–$700/month, even with little or no query traffic.
  • This baseline charge applies regardless of how frequently the knowledge base is used.

For small teams or early-stage products, this fixed cost can outweigh model inference spend entirely.

Agents and Recursive Calls

Bedrock Agents simplify orchestration, but they hide cost complexity.

An agent answering a single user question may internally:

  1. Analyze the request
  2. Query a knowledge base
  3. Call a model to summarize results
  4. Refine or re-check the answer

Each step consumes tokens. As a result, a single user query can trigger multiple inference cycles, often consuming 5–10× more tokens than expected.

CloudWatch Logging Costs

For compliance and debugging, teams often enable detailed logging.

  • Bedrock logs are sent to AWS CloudWatch
  • CloudWatch charges for log ingestion, indexing, and retention
  • At scale, these fees are significantly higher than storing logs in S3

In regulated environments, logging costs can quietly become a meaningful part of total spend.
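
Bedrock’s model invocation logging can target S3 instead of (or alongside) CloudWatch, which is usually cheaper for high-volume retention. A hedged sketch using boto3; the bucket name and prefix are placeholders, and you should confirm the exact parameters and required IAM permissions against current AWS documentation:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # control-plane client; region assumed

# Deliver model invocation logs to S3 rather than CloudWatch to keep
# high-volume log storage costs down (bucket and prefix are placeholders).
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "s3Config": {
            "bucketName": "my-bedrock-invocation-logs",
            "keyPrefix": "bedrock/",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)
```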

Why AWS Bedrock Costs Are Hard to Predict

Many teams underestimate AWS Bedrock pricing during early experimentation. The difficulty lies not in the pricing itself but in forecasting how usage will evolve.

Key challenges include:

  • Highly variable token usage
    User behavior, prompt design, response verbosity, and document size all influence token counts. Two identical users can generate very different costs.
  • Model-level pricing fragmentation
    Each model provider has distinct pricing for input, output, embeddings, and images. Experimentation across models quickly becomes expensive without strict controls.
  • Limited per-application visibility
    AWS budgets and alerts operate primarily at the account or service level. In multi-team environments, attributing Bedrock costs to individual applications or features is difficult.

As a result, finance and platform teams often struggle to explain why costs increased, only that they did.

When AWS Bedrock Pricing Makes Sense

Despite its complexity, AWS Bedrock remains a strong choice in several scenarios.

It works well for:

  • Teams already standardized on AWS
    Bedrock integrates seamlessly with IAM, VPCs, KMS, and AWS compliance tooling.
  • Early-stage AI initiatives
    Teams can launch quickly without managing inference infrastructure, scaling, or model serving.
  • Regulated industries
    AWS certifications and security controls help meet baseline regulatory requirements without custom setups.

For experimentation, pilots, and moderate-scale production use, Bedrock offers convenience and speed.

Where AWS Bedrock Pricing Starts Creating Challenges

As AI workloads mature, structural limitations in Bedrock’s pricing model become more visible.

Common friction points include:

  • Unpredictable monthly spend
    Token-based billing scales linearly with usage, but usage rarely grows linearly in real products.
  • Limited infrastructure-level optimization
    Teams cannot control instance types, spot pricing, or autoscaling strategies for inference.
  • Weak cost isolation in multi-team environments
    Multiple applications sharing the same AWS account struggle with cost attribution and enforcement.

At this stage, teams begin evaluating alternatives, not to replace Bedrock entirely, but to regain control.

How TrueFoundry Changes the Cost Equation

TrueFoundry takes a fundamentally different approach.

Instead of abstracting infrastructure behind token pricing, TrueFoundry lets teams deploy the same open models (Llama, Mistral, fine-tuned variants) directly on their own AWS EC2 or EKS clusters.

Key cost advantages include:

  • Spot Instance–backed clusters that reduce inference costs by 60–70% compared to on-demand pricing
  • Automatic fallback to on-demand instances to prevent downtime
  • No long-term commitments: models can scale to zero during off-hours, incurring zero cost

This shifts AI spend from opaque usage meters to controllable infrastructure economics.

AWS Bedrock vs TrueFoundry: Cost and Control

In practice, enterprises find TrueFoundry more cost-effective for heavy or customized workloads. Because TrueFoundry supports any open-source model and fine-tuning in your environment, you avoid per-token fees on third-party endpoints. By contrast, Bedrock charges for every model call and includes AWS’s margins.

Feature | AWS Bedrock | TrueFoundry
Pricing Model | Pay-per-use (token/hourly). No free tier (new accounts may use AWS credits). On-demand rates vary by model/provider. Provisioned Throughput billed hourly per unit with 1- or 6-month commitments. | Platform subscription + your own compute. No token fees. You provision any cloud or cluster as required.
Cost Control | AWS-managed endpoints with fixed per-token pricing. Limited optimization levers (batching, smaller models, caching). Usage spikes directly increase spend. | Full control over instance size, autoscaling, and spot usage. Fine-grained cost allocation and usage reporting. Teams often reuse idle capacity across workloads.
Model Flexibility | Curated catalog (Titan, Claude, Llama, etc.). No direct open fine-tuning endpoints; must use Bedrock-managed workflows with token-based costs. | Any open-source or custom model supported. Add models easily via UI or API. Native support for HuggingFace models and custom pipelines.
Fine-Tuning | Supported via AWS-managed supervised or reinforcement fine-tuning. Billed by tokens and storage. Serving custom models requires Provisioned Throughput. | Fully supported on your infrastructure. Distributed training via TrueFoundry UI/API. More cost-efficient: no token markup, only compute cost.
Infrastructure | Fully AWS-owned and managed. Built on AWS services like Lambda, ECS, and OpenSearch. Limits and scaling policies controlled by AWS. | Customer-owned infrastructure. Deploy in your VPC or on-prem data center. Full visibility and control for compliance and sovereignty needs.
Data Privacy | Data remains within AWS. Prompts and responses are not used for model training by default. | Data stays entirely within your environment. Full control over retention, isolation, and governance.

FAQ

Is there a free tier for AWS Bedrock?

Bedrock is a paid service. It isn’t covered by AWS’s “always free” tier, so you’ll incur charges per usage. (However, new AWS accounts do get temporary credits – e.g. AWS now offers $200 in free credits to spend on services including Bedrock.)

What are the cost-driving factors of AWS Bedrock?

The main drivers are (1) compute (model selection and instance capacity); (2) model pricing (which foundation model or provider you use); (3) storage (e.g. fine-tuned model hosting, vector DB size); and (4) data transfer. In practice, token usage (prompt+response length), choice of model (Llama vs. Titan vs. Claude), batch vs. on-demand, and additional services (Guardrails filters, agent orchestration, logging) all compound costs.

How is TrueFoundry more cost-effective than AWS Bedrock?

TrueFoundry lets you run open-source models on your own infrastructure, eliminating pay-per-token fees. You pay for the TrueFoundry software (seat/subscription) plus your own compute; heavy usage can use spot instances or existing GPUs. Customers report TrueFoundry cutting cloud AI spend roughly in half. In contrast, AWS Bedrock’s all-inclusive model has no hard cap – your bill rises with usage. For bursty or large-scale workloads where you can optimize capacity, TrueFoundry often yields lower total cost and higher control over resources.
