Loop Engineering at Enterprise Grade: From Laptop Loops to a Governed Runtime

By Boyu Wang

Published: July 4, 2026

In the first week of June 2026, the AI-coding conversation reorganized itself around one idea. Peter Steinberger — creator of the OpenClaw agent project — posted that the skill is no longer prompting coding agents but designing the loops that prompt them; Boris Cherny, who heads Claude Code at Anthropic, put his own job the same way: "My job is to write loops." Addy Osmani's widely shared essay gave the practice a name — loop engineering — and an anatomy: automations, worktrees, skills, connectors, sub-agents, and external state. The idea is real and worth taking seriously. It's also, as practiced, an individual-developer pattern running on individual laptops — and the gap between a loop in your terminal and a loop an enterprise can run unattended at 3 a.m. is exactly the gap this post is about.

Key Takeaways

  • Loop engineering is designing the system that prompts your agents: it finds the work, hands it out, checks the result, records what's done, and decides the next thing — design once instead of prompting per turn.
  • The anatomy is converging across tools (per Osmani's breakdown of the Codex app and Claude Code): scheduled automations, isolated workspaces, skills, MCP connectors, maker/checker sub-agents, and state outside the context window.
  • Loops are not free wins: practitioners' own caveats say they pay off when the task repeats, verification is automated, and the token budget can absorb waste — an unattended loop also makes mistakes and spends money unattended.
  • The enterprise gap is the runtime: laptop loops die with the laptop, hold credentials in cron jobs, can't pause days for an approval, and multiply into automation sprawl nobody can inventory.
  • Every loop primitive has an enterprise-grade equivalent on a managed runtime: skills in a versioned registry, connectors behind an MCP gateway, sub-agents with their own identity and scope, durable state, and API-triggered runs.
  • Governance adds precisely what unattended operation requires: human-in-the-loop gates on destructive actions, per-agent budgets and rate limits, guardrails on every call, and a per-step trace of every run.
  • TrueFoundry's Agent Harness — on the AI Gateway and MCP Gateway — is that runtime: the loop primitives as governed platform features, deployable SaaS, self-hosted, or on-prem.

Noor, a staff engineer, built the loop everyone now describes: a morning automation that reads overnight CI failures and open issues, drafts fixes in isolated checkouts, has a second agent review them, opens PRs, and leaves a tidy state file saying what's done and what's next. For three weeks it was the best tool on the team — and then it became the subject of three uncomfortable meetings. Finance asked why one weekend's token bill spiked: the loop had spent two days enthusiastically retrying against a broken test environment with nobody watching. Security asked where the loop's credentials lived: a config file on Noor's laptop, holding her personal tokens, running under her identity at 6 a.m. And her director asked the question with no answer at all: four other engineers had copied the pattern, each differently — so how many loops did the org have, what could each touch, and who approved any of it? The loop worked. The runtime was a laptop, and everything wrong traced back to that.

This post takes loop engineering seriously on its own terms — what it is, why it's genuinely the next layer of the discipline, where its sharp edges are — and then does the mapping Noor's director was implicitly asking for: each primitive of the loop, translated from laptop convention to governed infrastructure. The vocabulary follows Osmani's essay and the surrounding discussion; the enterprise half follows TrueFoundry's Agent Harness documentation. The honest summary of the gap is one sentence: loops are a runtime problem wearing a workflow costume.

1. What Loop Engineering Actually Is

Strip the discourse and the definition is compact. For two years, working with a coding agent meant holding it: you prompt, you read, you prompt again — the human is the loop. Loop engineering replaces that with a designed system: something finds the work (a schedule scanning CI, issues, error feeds), hands it to an agent, checks the result with a verifier that isn't the maker, records what happened in state that outlives the conversation, and decides what's next. You build it once; it prompts the agents from then on. Osmani places it "one floor above" harness engineering — the harness runs one agent through one task; the loop decides which tasks, when, and what counts as done. Some writers describe it as two nested loops: an inner loop doing work against a spec, an outer loop watching the world and writing the next spec.
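The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any product's API: `find_work`, `make`, `check`, and the in-memory `state` dict are hypothetical stand-ins for a CI scanner, a maker agent, a separate verifier, and durable external state.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    task: str
    output: str
    accepted: bool


def run_loop(
    find_work: Callable[[dict], list[str]],
    make: Callable[[str], str],
    check: Callable[[str, str], bool],
    state: dict,
    max_tasks: int = 10,
) -> list[Result]:
    """One pass of the outer loop: find, hand out, check, record, decide."""
    done = state.setdefault("done", [])
    retry: list[str] = []
    results: list[Result] = []
    for task in find_work(state)[:max_tasks]:   # find the work (bounded)
        output = make(task)                     # hand it to the maker
        accepted = check(task, output)          # the verifier is not the maker
        (done if accepted else retry).append(task)  # record the outcome
        results.append(Result(task, output, accepted))
    state["next"] = retry                       # decide what the next run picks up
    return results


# Hypothetical stand-ins so the sketch runs end to end.
def find_work(state: dict) -> list[str]:
    backlog = ["fix-flaky-test", "bump-dependency", "update-docs"]
    return [t for t in backlog if t not in state.get("done", [])]


def make(task: str) -> str:
    return f"patch for {task}"


def check(task: str, output: str) -> bool:
    return task != "bump-dependency"   # pretend this one fails review


state: dict = {}
results = run_loop(find_work, make, check, state)
print(state)
```

The point of the shape is that the human appears nowhere inside `run_loop`: the design decisions live in the four callables and in what gets written to `state`.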

If that sounds familiar from this series, it should — the loop is the plan→act→observe pattern promoted one level up, with the same anatomy our harness deep dive took apart: stop conditions (here, "run until the tests pass," judged by a separate model), error handling, and state. Simon Willison was writing about designing agentic loops back in September 2025; what changed in mid-2026, as Osmani notes, is that the primitives stopped being a pile of personal bash and started shipping inside the products — which is exactly the moment a practice stops being a hack and starts being infrastructure. Infrastructure, as we'll see, with infrastructure's requirements.

2. The Anatomy: Six Primitives

Osmani's breakdown — which maps near-identically onto both the Codex app and Claude Code, the convergence being his most interesting observation — gives the loop six parts. Automations are the heartbeat: prompts that fire on a schedule, do discovery and triage, and surface findings to an inbox; plus run-until-done primitives where a goal executes until a verifiable condition holds. Isolated workspaces (git worktrees, in the coding case) keep parallel agents from colliding on the same files. Skills — the SKILL.md pattern — codify project knowledge once, so the loop stops re-deriving your conventions every cycle. Connectors, built on MCP, plug the loop into real systems — the difference between an agent that says "here's the fix" and a loop that opens the PR and updates the ticket. Sub-agents split the maker from the checker, because the model that wrote the code grades its own homework too kindly. And external state holds what's done and what's next, because the model forgets everything between runs and the memory has to live outside the context.
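As a concrete picture of the last primitive, external state can be as small as a JSON file that each run loads and rewrites. The sketch below is an assumption about how a laptop loop might do this (the file name and the `done`/`next` keys are hypothetical); it writes atomically so a crash mid-write does not corrupt what tomorrow's run reads.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "next": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)   # atomic when source and target share a filesystem


with tempfile.TemporaryDirectory() as d:
    state_path = Path(d) / "loop_state.json"
    state = load_state(state_path)          # first run: empty state
    state["done"].append("fix-flaky-test")
    state["next"].append("bump-dependency")
    save_state(state_path, state)
    reloaded = load_state(state_path)       # what the next run would see

print(reloaded)
```

A file like this survives the conversation, but it still lives on one machine, which is the limitation the later sections return to.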

Fig 1: The loop's cycle (after Osmani's anatomy) and the control layer enterprise operation adds beneath it. A laptop loop has the top half; a governed runtime has both.

Two things stand out before we translate this list. First, almost none of it is novel machinery — schedules, isolation, packaged knowledge, integrations, delegation, durable state are the oldest ideas in operations, newly arranged around a model. Second, and more importantly: every primitive is a capability grant. An automation is standing permission to act on a schedule; a connector is standing access to a real system; a sub-agent is a delegated identity. On a laptop those grants are invisible. At an organization's scale, they're the whole question.

3. An Honest Account of When Loops Pay

The discourse is days old and already has a healthy skeptical wing, which deserves airtime before the enterprise translation — translating a bad idea into governed infrastructure just yields governed waste. The practitioners' own conditions for loops paying off, as early commentary converged on them: the task repeats (a loop amortizes design cost across runs; for a one-off, a good prompt is faster); verification is automatable ("all tests pass and lint is clean" is a real stop condition, "looks good" is not); the token budget can absorb the waste (Osmani's own essay leads with the cost warning, and Noor's weekend bill is the canonical incident); and the agent already has the tools the task needs. Miss one and the loop costs more than it returns.

There's also a deeper caution the brute-force-loop enthusiasm tends to skip: failure rates stack. A loop chaining five steps, each 95% reliable, completes cleanly only about 77% of the time if the steps fail independently (0.95^5 ≈ 0.774), and an unattended loop's mistakes compound into state, where tomorrow's run builds on them. This is why the maker/checker split is load-bearing rather than decorative, and why Willison's old joke, an agent as "an LLM wrecking its environment in a loop," is the right thing to keep taped above the automation tab. None of this argues against loops; it argues that a loop is a production system, and the rest of this post treats it as one.
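The compounding is easy to check. The sketch below assumes every step succeeds or fails independently with the same reliability, which is a simplification (real steps are often correlated), and the function name is illustrative.

```python
def clean_run_probability(step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return step_reliability ** steps


for steps in (1, 5, 10, 20):
    p = clean_run_probability(0.95, steps)
    print(f"{steps:>2} steps at 95% each: {p:.1%} clean runs")
# Prints 95.0%, 77.4%, 59.9%, and 35.8% respectively.
```

Doubling the chain from five steps to ten drops clean completion from roughly three in four runs to three in five, which is why long unattended chains need a checker and bounded retries rather than optimism.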

4. The Enterprise Gap: the Laptop Is the Wrong Runtime

Now run Noor's three meetings as an engineering review. Lifecycle: her loop's runtime was her machine — it dies with the laptop lid, doesn't survive a crash mid-run, and can't pause for a three-day approval, because nothing durable holds the run. One early playbook drew exactly this line: the moment a loop must run at 3 a.m. with no terminal open, survive crashes, and wait indefinitely on a human, you've left tool territory for runtime territory. Credentials: the loop ran as Noor — her tokens, in plaintext config, exercising her permissions on a schedule, indistinguishable in any audit log from Noor herself; a standing automation holding a person's keys is the non-human-identity problem in miniature. Cost: nothing stood between the loop and the provider's meter — no budget, no rate limit, no per-run attribution; the spend was discovered on the invoice. Inventory: five engineers, five divergent copies, zero registry — our coding-agent governance and agent-sprawl story arriving through the side door of productivity tooling.

None of these are flaws in loop engineering; they're properties of the laptop as a runtime, invisible right up until the loop is good enough to matter. The fix isn't banning the pattern — bans produce shadow versions of it — but giving the same six primitives a home built for unattended, multi-tenant, audited operation. That's the mapping the rest of this post does.

5. The Primitives, Translated to a Governed Runtime

Here is the translation, primitive by primitive, with TrueFoundry's Agent Harness as the worked example. Skills translate almost verbatim: the same SKILL.md artifact, but published to the Skills Registry with versions, provenance, and RBAC, and mounted on demand — codified knowledge as a governed catalog instead of a folder on five laptops. Connectors translate to the MCP Gateway: the same MCP servers, reached through a registry with central auth, per-tool RBAC, guardrails, and per-user OAuth delegation — scoped, rotatable credentials instead of whatever tokens were in Noor's config file. Sub-agents translate to harness sub-agents: each checker or explorer with its own isolated context, scoped tool access, and trace — the maker/checker split with an identity boundary, not just a prompt boundary. Isolated workspaces translate to the harness Sandbox: a secure execution environment per run — the worktree idea generalized into "every agent gets its own machine." State becomes durable, platform-held run state that survives crashes and laptop lids. And automations become triggered runs: the same agent definition invoked on schedule or event via REST API or SDK, results landing in traces and notifications rather than a terminal scrollback.

TrueFoundry Agent Harness builder: model selection, MCP servers, skills, instructions, and a playground test button
Fig 2: The loop's parts as platform objects: model, MCP servers, skills, instructions — one governed definition. Source: TrueFoundry Agent Harness docs.

The structural difference the translation buys is the one the laptop can't: no keys anywhere in the loop. Agent definitions reference models, MCP servers, and skills by name; provider credentials live in the AI Gateway, tool auth in the MCP Gateway, configured once by the platform team. Noor's loop, rebuilt this way, runs as itself — a registered agent with its own identity and grants — and "what can this loop touch?" turns into a configuration lookup, not an archaeology project.

Noor's morning loop as a governed agent definition (illustrative)

agent:
  name: morning-triage-loop
  model: claude-sonnet-4-6            # by name — keys live at the gateway
  instructions: ./triage.md@v4         # versioned scaffold
  skills: [ci-triage-runbook@v2]       # from the Skills Registry, RBAC'd
  mcp_servers: [github, ci, linear]    # via MCP Gateway: scoped, rotatable auth
  subagents:
    - { name: fix-drafter,  mcp_servers: [github] }
    - { name: fix-reviewer, mcp_servers: [] }      # checker ≠ maker, read-only
  harness:
    max_steps: 60
    max_tokens_per_run: 1_500_000      # the weekend incident, capped
  approvals: { merge_pr: pause_for_human }          # waits days if it must
trigger: { schedule: "0 6 * * 1-5" }   # the heartbeat — no laptop required
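
Once loops are declared in a shape like the definition above, "how many loops do we have, and what can each touch?" becomes a query over definitions. The sketch below is illustrative only: it mirrors the simplified definition as a plain Python dict rather than calling any real registry API, and the `inventory` helper and its output fields are hypothetical.

```python
# Definitions as they might be read from a registry (hand-written here).
definitions = [
    {
        "name": "morning-triage-loop",
        "mcp_servers": ["github", "ci", "linear"],
        "subagents": [
            {"name": "fix-drafter", "mcp_servers": ["github"]},
            {"name": "fix-reviewer", "mcp_servers": []},
        ],
        "approvals": {"merge_pr": "pause_for_human"},
    },
]


def inventory(defs: list) -> list:
    """Summarize, per loop, which systems it can reach and which actions are gated."""
    rows = []
    for d in defs:
        rows.append({
            "loop": d["name"],
            "can_touch": sorted(d.get("mcp_servers", [])),
            "subagent_scopes": {
                s["name"]: sorted(s.get("mcp_servers", []))
                for s in d.get("subagents", [])
            },
            "gated_actions": sorted(d.get("approvals", {})),
        })
    return rows


for row in inventory(definitions):
    print(row)
```

The same question asked of five laptops requires reading five config files and trusting that none of them changed since last week.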

6. Unattended by Design: Approvals, Guardrails, and Stop Conditions

The defining property of a loop is that nobody is watching, so the runtime's job is to make "nobody is watching" safe. Three mechanisms carry that load. Human-in-the-loop gates: the harness pauses a sensitive tool call — merging a PR, mutating a production system — and holds the run's state durably until someone approves; the loop doesn't die because a human took the weekend. Critically, destructive tools are flagged once at the MCP Gateway, org-wide — the policy applies to every loop automatically, instead of depending on each of five engineers remembering to configure it. Guardrails: every model call and tool call passes the gateway's pre- and post-call checks — PII, content policies, custom rules — so a loop processing real tickets at 6 a.m. is screened exactly like an interactive session at 2 p.m. Stop conditions: run-until-done is only as safe as its bounds, so step ceilings, token budgets, and stall detection are harness configuration, not hope — the difference between "the loop converged" and "the loop stopped when its budget said so," with the trace telling you which.
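The three bounds named above (step ceiling, token budget, stall detection) can be sketched as a small run-until-done wrapper. This is an assumption about how such bounds might compose, not the harness's implementation: the `step` callable, the stall rule (the same failing-test count several times in a row), and the reason strings are hypothetical, and the default numbers are borrowed from the illustrative definition earlier in the post.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Bounds:
    max_steps: int = 60
    max_tokens: int = 1_500_000
    stall_window: int = 3   # identical progress readings in a row => stalled


def run_until_done(step: Callable[[], tuple[int, int]], bounds: Bounds) -> str:
    """Call `step` until tests pass or a bound trips; return the stop reason.

    `step` returns (tokens_used, failing_tests) for one iteration.
    """
    tokens = 0
    recent: list[int] = []
    for _ in range(bounds.max_steps):
        used, failing = step()
        tokens += used
        if failing == 0:
            return "converged"                  # the verifiable condition holds
        if tokens >= bounds.max_tokens:
            return "token_budget_exhausted"
        recent = (recent + [failing])[-bounds.stall_window:]
        if len(recent) == bounds.stall_window and len(set(recent)) == 1:
            return "stalled"                    # no progress across the window
    return "step_ceiling_reached"


def scripted(readings: list) -> Callable[[], tuple[int, int]]:
    """Test helper: replay a fixed list of (tokens, failing_tests) readings."""
    it = iter(readings)
    return lambda: next(it)


print(run_until_done(scripted([(1000, 3), (1000, 1), (1000, 0)]), Bounds()))
print(run_until_done(scripted([(1000, 2)] * 10), Bounds()))
print(run_until_done(scripted([(900_000, 5), (900_000, 4)]), Bounds()))
```

Returning the reason as data is the useful part: a trace can then distinguish a loop that converged from one that merely ran out of budget.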

Fig 3: The runtime under the loop: model, governed tools, sandbox, and approval gates, with guardrails enforced and every step recorded. Source: TrueFoundry docs.

This is also where the maker/checker idea completes itself. Osmani's verifier sub-agent checks the work; the runtime's gates check the actions. A loop can have an excellent reviewer agent and still need a human between it and an irreversible merge — and the harness is where that judgment becomes enforcement.

7. Metering the Cost: A Loop Is a Cost Pattern Before It Is a Productivity Pattern

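Osmani's essay opens with a caution that the invoice later confirms: token costs vary widely, and an unattended loop has no human in the path to manage spend. TrueFoundry's cost-management stack turns that financial surprise into an operating parameter with three layers. First, hard limits per principal: budgets, quotas, and rate limits apply to the loop's own agent identity. They are declared at the gateway as GitOps YAML and are enforceable per agent, team, user, and model. Replayed against a per-agent budget, Noor's weekend incident ends as a capped run and a Saturday-night alert rather than a Monday invoice. Second, per-run attribution: the harness traces every run with tokens and cost for each model call, tool call, and sandbox execution. That makes the checker's cost visible separately from the maker's, and it lets the "repeating versus one-off" judgment from section 3 rest on numbers rather than feel. Third, rollups: the gateway's cost analytics aggregate spend per loop, per team, and per model over time, so "is this loop paying for itself?" stops being a debate and becomes a dashboard view.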
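The first two layers, a hard cap per agent identity and cost attributed per run and role, can be illustrated with a small in-memory ledger. This is a sketch under stated assumptions rather than the gateway's mechanism: `CostLedger`, `BudgetExceeded`, and the dollar figures are hypothetical, and a real system would persist the ledger and use exact decimal arithmetic instead of floats.

```python
from __future__ import annotations

from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a charge would push an agent past its budget."""


class CostLedger:
    """Tracks spend per agent and per (run, role); enforces a per-agent cap."""

    def __init__(self, budgets_usd: dict[str, float]):
        self.budgets = budgets_usd
        self.spent = defaultdict(float)         # agent -> USD spent so far
        self.by_run_role = defaultdict(float)   # (run_id, role) -> USD

    def charge(self, agent: str, run_id: str, role: str, cost_usd: float) -> None:
        # Refuse the call up front instead of discovering overspend on an invoice.
        # An agent with no declared budget gets a cap of zero.
        if self.spent[agent] + cost_usd > self.budgets.get(agent, 0.0):
            raise BudgetExceeded(f"{agent} would exceed its budget")
        self.spent[agent] += cost_usd
        self.by_run_role[(run_id, role)] += cost_usd


ledger = CostLedger({"morning-triage-loop": 5.00})
ledger.charge("morning-triage-loop", "run-1", "maker", 3.00)
ledger.charge("morning-triage-loop", "run-1", "checker", 1.50)

try:
    ledger.charge("morning-triage-loop", "run-2", "maker", 1.00)  # 5.50 > 5.00
    blocked = False
except BudgetExceeded:
    blocked = True   # a real system would alert here as well

print(blocked, dict(ledger.by_run_role))
```

Keying the ledger by the loop's own identity, rather than by the engineer who wrote it, is what lets the cap and the attribution line up with the registry.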

TrueFoundry cost management dashboard: AI spend broken down per team, user, model, and agent over time
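Fig 4: The rollup layer: spend attributed per agent, team, and model. Each loop's economics is tracked as a first-class line item rather than a residual on the provider invoice. Source: TrueFoundry.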
End-to-end agent run trace with per-step model calls and tool calls, each carrying cost and latency
Fig 5: Every loop run is recorded as a per-step trace with cost attached, so the loop's economics and its 3 a.m. behavior can be read rather than guessed. Source: TrueFoundry.

The trace earns its keep beyond cost control. When a loop that ran unattended produces a strange PR, that is a forensic question, and the complete step-by-step record is what answers it.

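8. What the Runtime Doesn't Do: You Are Still the Engineer

Osmani closes his essay with a warning that should survive the enterprise translation intact: the loop changes the work, but it does not remove you from it. Verification is still your responsibility: the checker sub-agent is what makes the loop's "done" mean something, but done is a claim, not a proof. Comprehension debt grows faster as loops improve: the gap between what exists and what you understand widens with every unread PR. And the comfortable posture is the dangerous one: the same loop is powerful leverage for an engineer who uses it on work they understand, and an accelerant of atrophy for one who uses it to avoid understanding. A governed runtime doesn't change that ledger. It changes what the organization can see of it: review rates, approval patterns, and per-loop outcomes. That visibility is a prerequisite for managing, not a substitute for judgment.

What the runtime does change is who gets to use the pattern. Loop engineering today selects for engineers willing to maintain a personal automation rig. On a managed harness, anyone can author a definition, the same artifact whether it comes from the no-code builder or the SDK, with the dangerous parts handled by construction. Noor's fifth meeting, in this version, is short: eleven registered loops, what each one touches, last week's cost and outcomes per loop, and a new loop shipping Tuesday on the same rails. Not a loop carrying a compliance burden, but a loop with the runtime it was implicitly assuming all along.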

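9. Frequently Asked Questions

Does one engineer's morning loop need an enterprise runtime? For a single loop against low-risk targets, with its author watching the spend, probably not, and claiming otherwise would be vendor posturing. The runtime becomes necessary when the loops multiply and the stakes rise: multiple loops, shared systems, real credentials, unattended hours. Noor's story is the typical trajectory: the pattern spreads because it works, and the governance gap widens just as fast.

Can a loop pause for human approval without dying? On a laptop, effectively no: the run's state lives in a process that won't survive until Monday. On the harness, yes: a human-in-the-loop gate holds the run durably until someone decides, which makes "the loop opens the PR, a human merges it" an architecture rather than an aspiration.

How do you prevent loop sprawl without banning loops? Make the governed path the easiest path. The registry answers "how many are there, and what can they touch"; central credentials remove the reason to paste tokens into config files; and a builder that ships a traced, budgeted loop in an afternoon beats a pile of personal bash scripts on convenience. Bans produce shadow loops; better defaults produce registered ones.

Does the maker/checker split really help, or does it just double the cost? It costs real tokens and earns them back selectively, so use the second opinion where mistakes are expensive. Give the checker different instructions (ideally a different model) and keep it read-only; a checker that can edit is just a second maker. The per-step trace shows, loop by loop, whether the split is paying for itself.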

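The loop-engineering moment is real: the leverage point has moved from the prompt to the system that generates prompts. But a loop becomes a production system the moment it runs without you, and production systems need identity, boundaries, gates, traces, and a runtime that doesn't exit when the laptop closes. As Osmani says, design your loops intending to remain an engineer. We would add: run them on something built to be run on.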

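References

  • Addy Osmani, "Loop Engineering" (June 2026): the anatomy this post relies on, paraphrased with attribution. See also his essays on harness engineering and long-running agents.
  • Peter Steinberger and Boris Cherny, June 2026 commentary on designing loops rather than prompting agents, via Osmani's essay and contemporaneous coverage.
  • Simon Willison, "Designing agentic loops" (September 2025): an early articulation of loop design as a key skill.
  • TrueFoundry Agent Harness: the managed runtime (definitions, skills, sandbox, approvals, traces), deployable as SaaS, self-hosted, or on-prem.
  • TrueFoundry MCP Gateway and AI FinOps at the Gateway: the connector-governance and cost-enforcement layers beneath the loop.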

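Northwind and Noor are illustrative composites, not a specific organization or incident. The loop-engineering framing draws on Addy Osmani's June 2026 essay and the surrounding community discussion (Steinberger, Cherny, and early analyses). It is paraphrased with attribution; quoted phrases are under 15 words and attributed. Tool capabilities of the Codex app and Claude Code are described from public discussion at the time of writing and are evolving quickly. The configuration snippet is simplified for readability and is not a literal product schema. The reliability arithmetic in section 3 is illustrative. TrueFoundry capabilities reflect public documentation at the time of writing; check the current documentation.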
