Deterministic vs Agentic Workflows: Lessons from Building a Shopping Assistant
Lessons in Deterministic Workflows, Agentic Reasoning, and State Management
Building AI assistants often starts with a deceptively simple problem. A user asks a question, the system retrieves information, and an LLM generates a response. The first version usually works surprisingly well.
The real challenge emerges as capabilities expand. As new tools and workflows are introduced, users begin asking questions that span multiple domains. Conversations become longer and more contextual, transitioning the assistant from a simple QA bot into a tool that helps users complete complex tasks.
At TrueFoundry, we encountered this exact challenge while building a conversational shopping assistant. The assistant began as a Product Detail Page (PDP) companion capable of answering questions about a single product. Over time, it evolved into a catalog-wide shopping assistant capable of handling product discovery, retrieval, review summarization, coupon processing, store discovery, inventory checks, fulfillment selection, and cart operations.
While the external capabilities expanded significantly, the most critical engineering challenges emerged in two areas:
- Deciding when to use deterministic workflows versus agentic reasoning.
- Evolving state management from a single-product conversation model into a multi-product shopping experience.
1. System Architecture Overview
The shopping assistant is built on four major architectural layers designed to balance performance, cost, and reliability.
A. Data Layer
The system maintains strict separation between structured and unstructured storage systems:
- Product Catalog: Stored in PostgreSQL (Cloud SQL), containing product metadata, pricing, variants, category information, and fulfillment metadata.
- Reviews: Customer reviews are stored in Qdrant. This allows semantic retrieval of review content rather than relying solely on keyword matching. When users ask semantic questions like "What do customers say about battery life?", the system retrieves relevant reviews and generates an evidence-based summary.
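To make the idea of semantic retrieval concrete, here is a minimal sketch that ranks reviews by cosine similarity between a query vector and review vectors. It is a toy illustration only: the three-dimensional vectors, the `REVIEWS` list, and the `top_k_reviews` helper are assumptions made for this example, and a vector database such as Qdrant performs this kind of nearest-neighbor search at scale with real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_reviews(query_vec, reviews, k=2):
    """Return the k review texts whose vectors are closest to the query."""
    ranked = sorted(reviews, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical toy "embeddings"; real vectors would come from an embedding model.
REVIEWS = [
    ("Battery easily lasts two days.", [0.9, 0.1, 0.0]),
    ("The screen is bright and sharp.", [0.1, 0.9, 0.0]),
    ("Charges fast, battery is solid.", [0.8, 0.2, 0.1]),
]

# Stand-in for the embedding of "What do customers say about battery life?"
battery_query = [1.0, 0.0, 0.0]
print(top_k_reviews(battery_query, REVIEWS))
```

The retrieved texts would then be passed to the LLM as evidence for the summary, so the answer is grounded in what customers actually wrote.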
B. Data Ingestion Pipeline
Catalog and review data arrive continuously through Google Cloud Storage (GCS) and are processed via Google Dataflow pipelines:
- Catalog Pipeline: GCS → Dataflow → Cloud SQL (Processed daily).
- Review Pipeline: GCS → Dataflow → Embedding Generation → Qdrant (Processed on both daily and hourly schedules for fresh customer feedback).
C. Model Layer
All model interactions are routed through the TrueFoundry AI Gateway. This provides centralized model access, observability, governance, cost tracking, rate limiting, and model abstraction. The assistant utilizes Gemini 2.5 Flash and Qwen models to balance latency, quality, and operational cost.
D. Orchestration Layer
The conversational workflow is orchestrated using LangGraph, which enables explicit state transitions and workflow management while supporting both deterministic execution paths and agentic reasoning.
2. Challenge #1: Deterministic vs. Agentic Workflows
A common trend in AI systems is to make everything completely agentic under the assumption that if an LLM can reason, it should handle everything. In production, a pure-agent approach quickly becomes expensive, slow, and difficult to control.
We split shopping interactions into two execution paradigms, deterministic workflows and ReAct-style agentic workflows, based on clear trade-offs in performance and intent.
Under the Hood: The Mechanics of Execution
The architectural decision to separate these flows is driven heavily by the internal loops, total LLM call counts, and token processing overhead.
[ReAct Loop] --> (1. Complex Tool-Selection Prompt) --> [Tool Call] --> (2. Synthesis Prompt)
[Deterministic Flow] --> [Tool Call] --> (1. Light Formatting Prompt)
The ReAct Agent Lifecycle (3-Step Loop)
For complex workflows, a dedicated ReAct agent handles execution dynamically through a three-step cycle:
- Step 1: The Planning Prompt (LLM Call): The system passes the user query along with system prompts containing the definitions of all available tools. The LLM must reason about what the user said, identify which tool to use, and extract the exact parameters required for that tool.
- Step 2: The Tool Execution: The system executes the actual tool or API call using the extracted parameters.
- Step 3: The Observation & Synthesis Prompt (LLM Call): The raw tool response is fed back into the LLM. The model checks the response against the user's original question and rewrites it as a clean, contextual answer. If the result is insufficient, the agent can start another cycle.
The Deterministic Lifecycle (2-Step Shortcut)
Capabilities like product specifications, reviews, and coupons do not need to reason about tool choices or execution ordering. To optimize performance, we bypass the planning phase entirely:
- Step 1: Direct Tool Execution: Because intent classification maps directly to a known tool, the planning LLM call is skipped entirely. The system executes the tool immediately, removing a full inference cycle and reducing time to first byte.
- Step 2: Response Synthesis (LLM Call): The system passes the raw tool data and the user query directly to a highly focused prompt to format the final answer.
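The difference in LLM call counts between the two lifecycles can be sketched as follows. This is a simplified illustration, not the production router: `StubLLM`, the `TOOLS` table, the `DETERMINISTIC_INTENTS` mapping, and the `handle` function are all hypothetical names, the model is replaced by a stub that only records which prompts were sent, and the ReAct path is limited to a single cycle.

```python
from dataclasses import dataclass, field

@dataclass
class StubLLM:
    """Stand-in for a real model client; records the kind of each call."""
    calls: list = field(default_factory=list)

    def complete(self, kind: str, prompt: str) -> str:
        self.calls.append(kind)
        if kind == "planning":
            return "get_store_inventory"  # pretend the model chose this tool
        return f"Answer based on: {prompt}"

# Hypothetical tools returning canned data.
TOOLS = {
    "get_product_specs": lambda query: {"battery": "5000 mAh"},
    "get_store_inventory": lambda query: {"in_stock": True},
}

# Intents that map one-to-one onto a known tool (assumed mapping).
DETERMINISTIC_INTENTS = {"product_specs": "get_product_specs"}

def handle(intent: str, query: str, llm: StubLLM) -> str:
    tool_name = DETERMINISTIC_INTENTS.get(intent)
    if tool_name is None:
        # ReAct path: a planning call with all tool definitions picks the tool.
        tool_name = llm.complete("planning", f"Tools: {sorted(TOOLS)}. Query: {query}")
    result = TOOLS[tool_name](query)
    # Both paths end with one synthesis call over the raw tool output.
    return llm.complete("synthesis", f"Q: {query} | tool output: {result}")

deterministic_llm = StubLLM()
handle("product_specs", "What is the battery size?", deterministic_llm)

react_llm = StubLLM()
handle("availability", "Can I get this today?", react_llm)

print(deterministic_llm.calls, react_llm.calls)
```

In this sketch the deterministic request triggers only the synthesis call, while the agentic request pays for a planning call first.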
The Latency and Prompt Complexity Penalty
Call count is only part of the cost. Prompt complexity also has a significant effect on response time:
- ReAct Prompt Burden: In Step 1 of a ReAct agent, the prompt is long and complex. It carries the definitions of every available tool, and the LLM must compare tools, reason about execution order, and keep track of context. The larger input and the reasoning output both increase token processing time and generation latency.
- Deterministic Prompt Efficiency: In a deterministic flow, the final prompt is straightforward: "Here is the question, and here is the raw tool response. Give the answer." Because the model does not need to evaluate paths or track tools, the prompt is short and generation is fast.
Core Lesson: Forcing a ReAct reasoning loop onto basic data queries adds unnecessary latency, cost, and operational complexity without improving answer quality. If an interaction can be handled deterministically, it should be.
Guardrailing the ReAct Agents
Where deterministic paths break down (e.g., "Can I get this today?", which involves location resolution, store discovery, and fulfillment checks), ReAct agents are vital. However, fully autonomous agents often produce inconsistent customer experiences. To maintain control, we implemented structured workflow stages inside the dynamic agents:
- Inventory Agent Stages: Location Resolution → Store Selection → Inventory Check → Result Explanation.
- Purchase Agent Stages: Product Validation → Inventory Verification → Fulfillment Selection → Cart Operation → Confirmation.
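One way to enforce stages like these is a small guard that only lets an agent advance through them in order. The sketch below is an assumption about how such a guardrail could look, not the actual implementation; `StagedWorkflow` and the snake_case stage names are hypothetical.

```python
# Stage names mirror the Inventory Agent stages listed above.
INVENTORY_STAGES = [
    "location_resolution",
    "store_selection",
    "inventory_check",
    "result_explanation",
]

class StagedWorkflow:
    """Allows an agent to move through a fixed list of stages in order only."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.index = 0

    @property
    def current(self):
        """The stage the agent must complete next, or None when finished."""
        return self.stages[self.index] if self.index < len(self.stages) else None

    def advance(self, completed_stage):
        """Mark the current stage as done; reject any out-of-order stage."""
        if completed_stage != self.current:
            raise ValueError(
                f"expected stage {self.current!r}, got {completed_stage!r}"
            )
        self.index += 1
        return self.current

workflow = StagedWorkflow(INVENTORY_STAGES)
workflow.advance("location_resolution")  # now at "store_selection"
```

The agent still reasons freely within a stage, for example about which store to pick, but it cannot jump to an inventory check before a location has been resolved.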
3. Challenge #2: State Management Evolution
Evolving from a single-product page assistant to a catalog-wide shopping companion is fundamentally a state management problem.
- Phase 1 (Single Product State): Product context was implicitly derived from the active page. The state model was a simple, flat conversation history thread.
- Phase 2 (Multi-Product Conversations): Catalog search lets users introduce, compare, and switch between multiple products within a single conversation.
To scale without introducing system fragility, we rebuilt the state architecture around four key design principles:
1. Separation of State Scopes
Mixing user data with transient product data creates fragile systems. State was refactored into distinct boundaries:
- User-Level State: Persistent across the entire shopping session (e.g., favorite store, location preferences, fulfillment choices).
- Product-Level State: Tied to specific items (e.g., catalog metadata, active coupons, reviews).
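A minimal sketch of this separation, using plain dataclasses. The class and field names (`UserState`, `ProductState`, `SessionState`) are illustrative assumptions based on the examples above rather than the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UserState:
    """Persists for the whole shopping session."""
    favorite_store: Optional[str] = None
    location: Optional[str] = None
    fulfillment_choice: Optional[str] = None

@dataclass
class ProductState:
    """Tied to one product; safe to discard when the product is dropped."""
    product_id: str
    metadata: dict = field(default_factory=dict)
    active_coupons: list = field(default_factory=list)
    review_summary: Optional[str] = None

@dataclass
class SessionState:
    user: UserState = field(default_factory=UserState)
    products: dict = field(default_factory=dict)  # product_id -> ProductState

    def product(self, product_id: str) -> ProductState:
        """Fetch the state for a product, creating it on first access."""
        return self.products.setdefault(product_id, ProductState(product_id))

session = SessionState()
session.user.location = "Austin"
session.product("p1").active_coupons.append("SAVE10")
session.products.pop("p1")  # dropping a product leaves user state untouched
```

Because the two scopes live in separate objects, clearing or replacing product data can never accidentally wipe a user's store or fulfillment preferences.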
2. Explicit Context Tracking (current_product_id)
We shifted product context from implicit to explicit by introducing current_product_id as a first-class state variable. When a user shifts focus to a new item, updating this single ID automatically flushes and refreshes downstream catalog data, reviews, coupons, and inventory variables.
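The flush-on-change behavior can be sketched with a property setter. This is an assumed shape, not the production code: `ShoppingContext`, its `cache` dictionary, and `DOWNSTREAM_KEYS` are hypothetical, and the sketch only invalidates cached data, leaving the refresh to happen lazily on the next read.

```python
class ShoppingContext:
    """Holds the explicit product focus plus data derived from it."""

    # Cached values that are only valid for the current product.
    DOWNSTREAM_KEYS = ("catalog", "reviews", "coupons", "inventory")

    def __init__(self):
        self._current_product_id = None
        self.cache = {}

    @property
    def current_product_id(self):
        return self._current_product_id

    @current_product_id.setter
    def current_product_id(self, product_id):
        # Only flush when the focus actually changes.
        if product_id != self._current_product_id:
            self._current_product_id = product_id
            for key in self.DOWNSTREAM_KEYS:
                self.cache.pop(key, None)

ctx = ShoppingContext()
ctx.current_product_id = "p1"
ctx.cache["reviews"] = "Mostly positive about battery life."
ctx.current_product_id = "p2"  # switching focus drops the stale review data
```

Routing every focus change through one setter means no downstream component can keep serving data for the previous product.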
3. Agent-Specific Thread Memory
Maintaining a single, flat conversation_history = [] array introduced significant token noise and degraded model performance. Search conversations need very different context from final checkout operations. We scoped memory into isolated agent threads (Search, Inventory, Purchase, Product) to minimize cross-context pollution.
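As a rough illustration of thread-scoped memory, the sketch below keeps one message list per agent and returns only that agent's recent turns when building a prompt. `ThreadMemory`, its method names, and the `last_n` window are assumptions made for this example.

```python
from collections import defaultdict

class ThreadMemory:
    """Keeps a separate message history per agent thread."""

    def __init__(self):
        self._threads = defaultdict(list)

    def append(self, agent: str, role: str, content: str) -> None:
        """Record a message in the given agent's thread only."""
        self._threads[agent].append({"role": role, "content": content})

    def context_for(self, agent: str, last_n: int = 10) -> list:
        """Return the most recent messages for one agent, and nothing else."""
        return list(self._threads[agent][-last_n:])

memory = ThreadMemory()
memory.append("search", "user", "Show me wireless headphones under $100")
memory.append("purchase", "user", "Add the second one to my cart")

# The purchase agent's prompt contains no search chatter.
purchase_context = memory.context_for("purchase")
```

Each agent's context window then grows only with turns relevant to its own workflow, instead of with the whole conversation.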
4. Product Reference Resolution
Intent classification alone is insufficient when dealing with a global catalog. When a user asks "Is this available near me?", the system relies on a newly introduced Product Reference Resolution layer to determine which item "this" refers to, evaluate whether the query is ambiguous, and decide whether a clarification prompt is required.
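The decision logic can be sketched with a few simple rules. This is a deliberately naive illustration: the real layer presumably uses richer signals, and the `Resolution` type, the `resolve_reference` function, and the pronoun list are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Resolution:
    product_id: Optional[str]
    needs_clarification: bool

# Words treated as a reference to a single product (assumed list).
SINGULAR_REFERENCES = {"this", "that", "it"}

def resolve_reference(
    query: str,
    current_product_id: Optional[str],
    recent_product_ids: List[str],
) -> Resolution:
    words = {w.strip("?.,!") for w in query.lower().split()}
    if not words & SINGULAR_REFERENCES:
        # No pronoun to resolve; leave the query to search or intent handling.
        return Resolution(None, False)
    if current_product_id is not None:
        # An explicit focus exists, so the reference is unambiguous.
        return Resolution(current_product_id, False)
    if len(recent_product_ids) == 1:
        # Only one product has been discussed; it must be the referent.
        return Resolution(recent_product_ids[0], False)
    # Zero or several candidates and no focus: ask the user.
    return Resolution(None, True)

print(resolve_reference("Is this available near me?", "p42", ["p1", "p42"]))
print(resolve_reference("Is this available near me?", None, ["p1", "p2"]))
```

The first call resolves to the focused product, while the second has several candidates and no focus, so it asks for clarification instead of guessing.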
Key Architecture Takeaway: State should represent workflows (Search, Inventory, Purchase) rather than data domains (Reviews, Coupons, Products). Designing state around user intent and workflow progression significantly simplifies orchestration and reduces multi-agent tracking complexity.
Frequently Asked Questions
When should I use a deterministic workflow instead of a ReAct agent?
Use deterministic workflows when the intent is clear, the tool is known, and the output is predictable, for example, fetching product specs, retrieving coupons, or summarizing reviews. Reserve ReAct agents for multi-step goals where the execution path depends on intermediate results.
How many LLM calls does a ReAct agent use compared to a deterministic flow?
A ReAct loop requires two LLM calls per cycle, one for planning and tool selection, one for synthesis, plus additional cycles if the agent needs to re-evaluate. A deterministic flow skips the planning call entirely, reducing it to a single lightweight synthesis prompt.
How does the system know which product a user is referring to in a multi-product conversation?
A Product Reference Resolution layer evaluates the current state, conversation context, and the current_product_id variable to deduce what item the user means. If the reference is ambiguous, the system triggers a clarification prompt rather than guessing.
Why is agent-specific thread memory important for performance?
A single flat conversation history accumulates irrelevant context across unrelated workflows, increasing token load and degrading model focus. Scoping memory to individual agent threads (Search, Inventory, Purchase) keeps each context window tight and relevant, improving both speed and answer quality.




















