Building Resilient Web Automation: From Web Scraping to Semantic Web Operating



The API Gap

It is a familiar 10 AM scenario for operations teams: a critical workflow needs to be automated (verifying supplier inventory, running competitive pricing analysis, or securing reservations), but the target platform offers no programmatic access.

While we live in an era of connectivity, many high-value platforms lock their data behind "digital moats." They expose no public API, so developers fall back on web scraping. Traditional scraping, however, is notoriously fragile. It relies on "brittle selectors": hardcoded CSS selectors or XPath expressions (e.g., div.btn-primary) that break the moment a frontend developer renames the class to btn-submit.

To address this, we built the Restaurant Booking Automation Accelerator. It is a reference implementation for a new class of automation: resilient agents that do not just "scrape" the web, but operate it.

The Shift: From Selectors to Semantic Intent

The core innovation in this accelerator is moving away from selectors over the Document Object Model (DOM) and toward the browser's accessibility tree, referred to here as the Accessibility Object Model (AOM).

In a traditional script, a selector that encodes page structure fails as soon as a button moves from a sidebar to a header. In this agentic system, we provide the inference engine with a snapshot of the Accessibility Tree. This is a semantic representation of the page designed for screen readers: it omits purely presentational wrappers such as styling divs and exposes the role and name of each control, which is the core utility of the interface.

This allows the system to reason based on intent rather than coordinates: "I see a calendar widget; I will click the date '15th' because that matches the user's request." If the site undergoes a redesign but the semantic role of the button remains "Confirm Booking," the agent self-heals and the workflow succeeds.
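To make the idea concrete, here is a minimal sketch of a lookup by semantic role and accessible name. It assumes the snapshot arrives as a nested dict with `role`, `name`, and `children` keys; that shape, the helper names, and the two sample layouts are illustrative assumptions, not the accelerator's actual data model.

```python
from typing import Iterator, Optional


def walk(node: dict) -> Iterator[dict]:
    """Yield every node in the tree, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_by_role_and_name(tree: dict, role: str, name: str) -> Optional[dict]:
    """Return the first node whose role and accessible name both match."""
    wanted = name.strip().lower()
    for node in walk(tree):
        if node.get("role") == role and node.get("name", "").strip().lower() == wanted:
            return node
    return None


# Hypothetical snapshots of the same page before and after a redesign:
# the button moves from a sidebar into the header.
old_layout = {
    "role": "WebArea", "name": "Reserve a table",
    "children": [
        {"role": "complementary", "name": "Sidebar", "children": [
            {"role": "button", "name": "Confirm Booking"},
        ]},
    ],
}
new_layout = {
    "role": "WebArea", "name": "Reserve a table",
    "children": [
        {"role": "banner", "name": "", "children": [
            {"role": "button", "name": "Confirm Booking"},
        ]},
        {"role": "complementary", "name": "Sidebar", "children": []},
    ],
}

for layout in (old_layout, new_layout):
    match = find_by_role_and_name(layout, "button", "Confirm Booking")
    print(match is not None)  # the lookup succeeds in both layouts
```

The lookup never mentions where the button sits in the page, which is why a layout change alone does not break it. A renamed button would still need the model's judgment to resolve.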

Architecture: The Controller-Worker Pattern

We structured the application using a specialized Controller/Worker pattern. Rather than a monolithic script, we have distinct agents utilizing Playwright for execution and LLMs for decision-making.

Figure 1: High-Level Architecture

As shown in the architecture diagram, the Workflow Controller manages the state, delegating tasks to two specialized components:

The Search Agent (Discovery): This agent manages the non-linear "shopping" phase.
- Dynamic URL Construction: Instead of clicking through five landing pages, it constructs query parameters (e.g., ?cuisine=italian&party_size=4) to navigate directly to results.
- Contextual Extraction: It identifies "cards" in the UI to extract ratings, prices, and time slots without needing specific HTML tags.
- Adaptive Navigation: It treats pop-ups and cookie banners as "obstacles" to be dismissed rather than errors that crash the script.

The Booking Agent (Transaction): Once a target is selected, this agent handles the stateful, high-precision interaction.
- Semantic Form Mapping: It maps user data to input fields based on labels (First Name) rather than arbitrary IDs (input#user_fname).
- Temporal Reasoning: It navigates time-pickers and handles "sold out" states, for example by falling back to a 7:15 PM slot if the requested 7:00 PM is unavailable.
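Two of the behaviors above, dynamic URL construction and the fallback to a nearby time slot, can be sketched in plain Python. The base URL, function names, and the 30-minute tolerance are hypothetical choices made for illustration; in the accelerator these decisions are made by the agents rather than by fixed rules.

```python
from datetime import datetime, timedelta
from typing import List, Optional
from urllib.parse import urlencode

BASE_URL = "https://example.com/search"  # hypothetical results endpoint
TIME_FORMAT = "%I:%M %p"  # e.g. "7:15 PM"


def build_search_url(cuisine: str, party_size: int, base_url: str = BASE_URL) -> str:
    """Build a results URL directly instead of clicking through landing pages."""
    query = urlencode({"cuisine": cuisine.lower(), "party_size": party_size})
    return f"{base_url}?{query}"


def pick_slot(requested: str, available: List[str],
              tolerance_minutes: int = 30) -> Optional[str]:
    """Return the available slot closest to the request, or None if none is close enough."""
    target = datetime.strptime(requested, TIME_FORMAT)
    tolerance = timedelta(minutes=tolerance_minutes)
    best_gap, best_slot = None, None
    for slot in available:
        gap = abs(datetime.strptime(slot, TIME_FORMAT) - target)
        if gap <= tolerance and (best_gap is None or gap < best_gap):
            best_gap, best_slot = gap, slot
    return best_slot


print(build_search_url("Italian", 4))
print(pick_slot("7:00 PM", ["6:00 PM", "7:15 PM", "8:30 PM"]))
```

Returning `None` when nothing falls inside the tolerance leaves the decision with the controller, which can then ask the user instead of silently booking a distant time.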

Infrastructure: TrueFoundry & The Model Context Protocol (MCP)

Running these agents in production requires a robust control plane. We utilize the TrueFoundry Platform to manage the infrastructure and the Model Context Protocol (MCP) to standardize the browser integration.

Figure 2: How TrueFoundry supports the application lifecycle

TrueFoundry AI Gateway: This provides the necessary unified management and observability. We can centrally monitor every "thought" the agent has, logging AOM snapshots and decision trees. Crucially, it enforces rate limiting, ensuring our agents act as good citizens and do not overwhelm target servers.
MCP & Isolation: MCP abstracts the browser capabilities into standardized tools. The platform ensures that every user session runs in an isolated container. This means User A's session cookies and local storage are kept separate from User B's at the container level, which guards against data cross-contamination between sessions.
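As an illustration of what rate limiting toward a target site means in practice, the following is a generic token-bucket sketch. It is not TrueFoundry's implementation or API; the class name, parameters, and injectable clock are assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be checked without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may be sent now, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

An agent would call `allow()` before each page navigation and wait or back off when it returns `False`.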

User Experience: Supervised Autonomy

For transactional workflows, we implement a "Verify-then-Execute" pattern. The agent performs the heavy lifting of discovery but requires human confirmation before final execution.

Step 1: Intent & Discovery

The system accepts natural language inputs and normalizes them into structured JSON (Location, Time, Party Size) for the Search Agent.
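The structured output of this step might look like the sketch below. The field names, the 24-hour time format, and the validation rules are assumptions; the source only states that Location, Time, and Party Size are captured. The raw dict stands in for whatever the language model extracts from the user's message.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchIntent:
    location: str
    time: str        # normalized to 24-hour "HH:MM"
    party_size: int


def parse_intent(raw: dict) -> SearchIntent:
    """Validate a loosely typed dict and return a normalized SearchIntent."""
    missing = [key for key in ("location", "time", "party_size") if key not in raw]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    party_size = int(raw["party_size"])
    if party_size < 1:
        raise ValueError("party_size must be at least 1")
    # strptime raises ValueError if the time is not in HH:MM form.
    parsed_time = datetime.strptime(str(raw["time"]).strip(), "%H:%M")
    return SearchIntent(
        location=str(raw["location"]).strip(),
        time=parsed_time.strftime("%H:%M"),
        party_size=party_size,
    )


intent = parse_intent({"location": " Austin ", "time": "19:00", "party_size": "4"})
print(json.dumps(asdict(intent)))
```

Rejecting incomplete input at this boundary means the Search Agent never starts a browser session on a request it cannot fulfil.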

Step 2: The Confirmation Gate

Upon finding a slot, the Booking Agent pauses. It presents the details to the user and enters a WAIT state, proceeding only after receiving a clear signal.
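A minimal sketch of such a gate follows. The state names and the accepted confirmation and cancellation phrases are assumptions; the point is that anything other than an explicit signal leaves the agent in the WAIT state.

```python
from enum import Enum


class GateState(Enum):
    WAIT = "wait"
    EXECUTE = "execute"
    CANCELLED = "cancelled"


# Hypothetical phrase lists; a real system may use the model to classify replies.
CONFIRM_WORDS = {"yes", "confirm", "book it"}
CANCEL_WORDS = {"no", "cancel", "stop"}


def gate_transition(reply: str) -> GateState:
    """Map a user reply to the next gate state; ambiguous replies keep waiting."""
    normalized = reply.strip().lower().rstrip(".!")
    if normalized in CONFIRM_WORDS:
        return GateState.EXECUTE
    if normalized in CANCEL_WORDS:
        return GateState.CANCELLED
    return GateState.WAIT
```

Defaulting to WAIT is the conservative choice for a transactional workflow: a vague reply such as "maybe later" never triggers a booking.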

Engineering for Edge Cases: The WAF Problem

The most critical test of a web agent is its ability to handle "Human-in-the-Loop" (HITL) scenarios. Modern sites often use Web Application Firewalls (WAFs) that trigger CAPTCHAs or email verification codes when they detect automation.

A standard script fails here. Our system uses a Pause-and-Resume State Machine.

Figure 3: Exception Handling State Logic

As detailed in the diagram above (Steps 7-11), when the agent detects a challenge prompt:

- It halts execution and notifies the user via the chat interface.
- The browser session remains alive (maintained within the container's TTL).
- Once the user provides the code, the agent resumes the session seamlessly to complete the booking.
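The pause-and-resume behavior can be sketched as a small state machine. The state names, the class, and the choice to measure the TTL from the moment of the pause are assumptions made for illustration; the real system keeps a live browser session inside the container, which this sketch does not model.

```python
import time
from enum import Enum
from typing import Callable, Optional


class SessionState(Enum):
    RUNNING = "running"
    AWAITING_USER = "awaiting_user"
    EXPIRED = "expired"


class BookingSession:
    """Tracks whether a booking run is active, paused for the user, or expired."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be checked without sleeping
        self.state = SessionState.RUNNING
        self.paused_at: Optional[float] = None
        self.submitted_code: Optional[str] = None

    def pause_for_challenge(self, challenge: str) -> str:
        """Halt the run and return the message to show in the chat interface."""
        if self.state is not SessionState.RUNNING:
            raise RuntimeError(f"cannot pause from state {self.state.value}")
        self.state = SessionState.AWAITING_USER
        self.paused_at = self.clock()
        return f"The site asked for {challenge}. Please reply with the code to continue."

    def resume(self, code: str) -> SessionState:
        """Accept the user's code if the session is still within its TTL."""
        if self.state is not SessionState.AWAITING_USER:
            raise RuntimeError(f"cannot resume from state {self.state.value}")
        if self.clock() - self.paused_at > self.ttl_seconds:
            self.state = SessionState.EXPIRED
        elif code.strip():
            # A real agent would type the code into the still-open page here.
            self.submitted_code = code.strip()
            self.state = SessionState.RUNNING
        return self.state
```

An empty reply leaves the session in `AWAITING_USER`, and a reply that arrives after the TTL moves it to `EXPIRED` so the controller can restart the workflow rather than act on a dead browser session.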

Conclusion: Web Operating

We are moving from "Web Scraping" to "Web Operating." By leveraging Playwright for the "hands" and semantic inference for the "eyes," we can treat the human-facing web as a programmatic interface.

This accelerator demonstrates that with the right architecture—semantic interpretation, stateful orchestration, and secure infrastructure like TrueFoundry—we can build resilient automations that bridge the API gap.

