Smart Agents Need Smart Context: The Four Motions of a Context Layer
By David Tuite • May 1st, 2026
At BackstageCon Europe on March 23, 2026, Roadie's Head of Product Sam Nixon shared a number that got people's attention: the agent-to-human interaction ratio on Roadie's platform has reached 100:1 on certain days. One hundred automated actions for every human decision. In Roadie's own usage, 80% of support requests and on-call alerts are now handled without engineer involvement - Sam was candid that the figure is skewed by Roadie both building and operating their own system end-to-end.
I've spent a lot of time this year talking to engineering leaders trying to figure out why their AI deployments aren't producing results like that. They've got capable models. They've written careful prompts. They've shipped workflows that run perfectly in demos. And then in production - on a real incident at 2am on a Monday - the agent gives them an answer that's technically coherent and completely wrong. The gap is the context, not the model.
What agents are actually reasoning from
When an agent fails in an engineering workflow, the first instinct is to diagnose the model. Swap to a smarter one, improve the prompt, adjust parameters. Sometimes that helps. More often the failure is upstream: the agent was reasoning from thin, stale, or structurally ambiguous input.
A context window full of files is not the same as a context window full of facts.
Most of what I see teams doing with context right now just doesn't work. And it fails in a specific way: they've connected their tools to the agent but haven't built the layer between them. The agent has access to information. It doesn't have authoritative, structured context.
In March 2026, Andy Chen - an engineer at Abnormal Security - published a detailed account of building an enterprise context layer from scratch. The piece is worth reading because it makes a distinction that most vendor messaging elides: retrieval and synthesis are different problems. A retrieval system finds the best-matching document. Synthesis produces the judgment call - which source to trust when three docs contradict each other, whether this service is safe to deploy right now, when to escalate to a human. Current tool stacks conflate the two. They give agents access to documents and hope reasoning handles the gap.
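To make the distinction concrete, here's a minimal sketch - the names Doc, retrieve, and synthesize are hypothetical, not any vendor's API. Retrieval stops at "here are candidate documents"; synthesis owns the judgment call, including the decision to escalate:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    source: str    # which system asserted this
    claim: str     # e.g. "payments-api is safe to deploy"
    trust: float   # how much we weight this source, 0..1

def retrieve(docs: list[Doc], query: str) -> list[Doc]:
    # Retrieval: find the best-matching documents. Substring match is a
    # stand-in for embedding similarity; the output is candidates, not answers.
    return [d for d in docs if query.lower() in d.claim.lower()]

def synthesize(candidates: list[Doc]) -> str:
    # Synthesis: a judgment call over possibly-contradictory sources,
    # escalating to a human rather than guessing when they disagree.
    if not candidates:
        return "ESCALATE: no sources found"
    claims = sorted({d.claim for d in candidates})
    if len(claims) > 1:
        return "ESCALATE: sources conflict: " + "; ".join(claims)
    best = max(candidates, key=lambda d: d.trust)
    return f"{best.claim} (per {best.source})"
```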
The token budget compounds this. Apideck published benchmarks showing that connecting three standard developer tool servers - GitHub, Slack, and Sentry - consumes 143,000 tokens of Claude's context window before the agent has processed a single message. Assuming you're on the 1M-token version of Opus, that's 14% of the budget gone on tool definitions alone. Think about that: you haven't asked a question yet, and you've already spent a seventh of your reasoning budget on overhead. Teams running at 100:1 aren't working around this - they've built a different architecture.
The four motions
On Roadie's platform, the context layer is four operations that work in sequence - what we call the four motions. Each one solves a distinct part of the problem, and skipping any of them shows up in production.
Pull in data
The first motion is integration: repos, deployments, incidents, ownership records, documentation, infrastructure state. Most teams start here and assume the hard work is done. They've connected the sources. The agent has access.
Connecting sources is the easy part. The question is whether the data is fresh enough and trustworthy enough to reason from. An agent querying a context store with deployment data from three weeks ago, or ownership records that haven't been updated since the last reorg, will produce results that look authoritative and are wrong. The context layer needs to know the provenance of each fact: where it came from, when it was last verified, and how to handle it when it conflicts with a different source.
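As a sketch of what provenance-aware storage implies, here's a fact record that carries its own source and freshness - field names are illustrative, not Roadie's schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Fact:
    subject: str           # e.g. "payments-api"
    predicate: str         # e.g. "owned_by"
    value: str             # e.g. "team-checkout"
    source: str            # which system asserted this fact
    verified_at: datetime  # when it was last confirmed against that system

    def is_fresh(self, max_age: timedelta) -> bool:
        # Staleness tolerance belongs to the fact type: ownership can be
        # a week old; deployment state from three weeks ago cannot.
        return datetime.now(timezone.utc) - self.verified_at <= max_age
```

An agent - or the bundle assembler described below - can then refuse to reason from anything that fails the freshness check.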
Andy Chen's piece describes this as a source-reconciliation problem. His agent swarm surfaced five principles on its own: architecture claims and status claims belong in different places; there's no universal source of truth; documentation describes the ideal state, not the current state; facts that appear in three independent sources can be trusted; and conflicting information should be documented as a conflict rather than resolved arbitrarily. Those principles hold for any context layer meant to be the substrate agents reason from, regardless of implementation.
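Two of those principles translate almost directly into code. A sketch, assuming claims arrive as (subject, predicate, value, source) tuples like the Fact record above - the reconcile function is hypothetical:

```python
from collections import defaultdict

def reconcile(claims: list[tuple[str, str, str, str]]) -> dict:
    sources_for = defaultdict(set)  # (subject, predicate, value) -> sources
    values_for = defaultdict(set)   # (subject, predicate) -> distinct values
    for subject, predicate, value, source in claims:
        sources_for[(subject, predicate, value)].add(source)
        values_for[(subject, predicate)].add(value)

    # Principle: a claim corroborated by three independent sources is trusted.
    trusted = [claim for claim, srcs in sources_for.items() if len(srcs) >= 3]
    # Principle: conflicting values are documented as a conflict - a fact in
    # themselves - rather than silently collapsed to one winner.
    conflicts = [{"about": key, "values": sorted(vals)}
                 for key, vals in values_for.items() if len(vals) > 1]
    return {"trusted": trusted, "conflicts": conflicts}
```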
Build relationships
This motion is what separates a context layer from a document index. You can have accurate data in separate systems - a service catalog with team ownership, an incident tracker with affected services, a deployment log with what changed - and still be unable to answer the questions that matter under pressure.
At 2am you need to know which team owns the failing service, what changed in the past 24 hours across its dependencies, and who is on call for that component.
Those answers live at the junctions between datasets. Getting there requires a graph, not a catalogue of documents. The relationships - service to team, API to consumer, runbook to incident type, deployment to downstream dependency - have to be explicit, typed, and traversable.
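A toy version makes the point. With typed edges, the 2am questions become traversals rather than inferences - node and edge names here are made up for illustration:

```python
# A tiny typed graph: (source node, edge type, destination node).
edges = [
    ("payments-api", "owned_by", "team-checkout"),
    ("payments-api", "depends_on", "ledger-db"),
    ("ledger-db", "changed_by", "deploy-4812"),  # within the past 24h
    ("team-checkout", "on_call", "alice"),
]

def neighbors(node: str, edge_type: str) -> list[str]:
    return [dst for src, typ, dst in edges if src == node and typ == edge_type]

failing = "payments-api"
owner = neighbors(failing, "owned_by")
recent_changes = [c for dep in neighbors(failing, "depends_on")
                  for c in neighbors(dep, "changed_by")]
on_call = [p for team in owner for p in neighbors(team, "on_call")]
# owner == ["team-checkout"], recent_changes == ["deploy-4812"],
# on_call == ["alice"]: the answers live at the junctions.
```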
This is the part most early context layer attempts skip. They pull in data correctly and then assume the model can infer relationships from raw text. It can, sometimes. Under time pressure, with contradictory signals, inference is the weakest link. If the relationship isn't in the graph, you're relying on the model to guess - and guesses that present as confident answers are the most expensive kind.
Assemble bundles
This is where the actual engineering happens. An agent doesn't need your entire service graph for every query. It needs the right slice: the topology of affected services, the current ownership chain, the deployment history for the past few hours, the runbooks tagged to this incident type.
Assembling that slice on demand - scoped to the question, disclosed progressively - is what keeps the token budget sane and the answer accurate. The Apideck benchmarks are a symptom of context that hasn't been assembled. When you surface the full tool manifest upfront, you pay for definitions you won't use. Tiered access - categories first, detail on request - gets you the same information at a fraction of the cost. Apideck's own analysis puts the gap at $3.20 per month for a well-scoped CLI workflow versus $55.20 for naive MCP integration.
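Here's what tiered access looks like in miniature - catalog contents and function names are invented for illustration, not MCP's actual wire format:

```python
TOOL_CATALOG = {
    "github": {"summary": "repos, pull requests, commits",
               "tools": {"list_prs": {"params": {"repo": "string"}}}},
    "sentry": {"summary": "errors and alerts",
               "tools": {"recent_issues": {"params": {"project": "string"}}}},
}

def initial_manifest() -> dict:
    # Tier 1: a few tokens per category instead of the full schema dump.
    return {name: cat["summary"] for name, cat in TOOL_CATALOG.items()}

def expand(category: str) -> dict:
    # Tier 2: full definitions, paid for only when a query needs them.
    return TOOL_CATALOG[category]["tools"]
```

The agent pays a few tokens per category upfront, and the full schema cost only for the category the query actually touches.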
Bundle assembly is also where governance lives. Not every agent should have access to every slice of context. Security posture data, compliance records, and personnel information need different access controls than service topology. This is an architecture decision you make up front, not a compliance checkbox you add later - retrofitting access controls once agents are in production is far more expensive than building them in from the start.
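A minimal sketch of that decision, assuming sensitivity labels on slices and per-agent clearances (all names invented):

```python
SLICES = {
    "service_topology":  {"label": "internal"},
    "security_posture":  {"label": "restricted"},
    "personnel_records": {"label": "restricted"},
}

AGENT_CLEARANCE = {
    "incident-responder": {"internal"},
    "compliance-auditor": {"internal", "restricted"},
}

def assemble_bundle(agent: str, requested: list[str]) -> list[str]:
    cleared = AGENT_CLEARANCE.get(agent, set())
    # Slices the agent isn't cleared for are dropped, not errored:
    # the bundle degrades rather than leaking.
    return [s for s in requested if SLICES[s]["label"] in cleared]

assemble_bundle("incident-responder",
                ["service_topology", "security_posture"])
# -> ["service_topology"]
```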
Agents consume and contribute
This is the motion most deployments haven't reached yet. It's also where the compounding value shows up.
An agent that successfully runs a runbook, investigates an alert, or assesses a deployment has produced new context: decisions made, state at the time, actions taken, what worked. The trail is evidence. If the context layer captures what agents do, the next agent in the workflow starts from a richer position. If it doesn't, every invocation starts cold.
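In miniature, the write-back motion might look like this - the shape of the record is illustrative, not a real schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentRun:
    agent: str
    subject: str       # e.g. the service investigated
    actions: list[str]
    outcome: str       # "resolved", "escalated", ...
    finished_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

RUN_HISTORY: list[AgentRun] = []

def record_run(run: AgentRun) -> None:
    RUN_HISTORY.append(run)  # in practice: a write back into the graph

def prior_context(subject: str) -> list[AgentRun]:
    # The next invocation starts warm: what was tried, what worked.
    return [r for r in RUN_HISTORY if r.subject == subject]
```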
The temptation is to add that feedback loop once the basic flows are working. That's reasonable. But the teams at 100:1 got there partly because they built it early. The context layer improves with every agent run. The graph gets richer. The bundles get more accurate. Agents that contributed to the graph last week make agents this week faster and more reliable.
Sam Nixon laid out at BackstageCon what an agent-ready context layer actually requires: a comprehensive, fresh graph of your software topology; that graph enhanced with relationships and additional context outside the catalog, in a format agents can consume; and the actual tools to act on that information. The four motions are the operational shape of those three requirements. By the time agents are contributing back to the graph, all three are in play - and the system compounds with every run.
This is a platform engineering problem
The phrase "context engineering" has arrived as a job title. There's real work here - the kind that doesn't happen without someone owning it.
But the teams positioned to do this well aren't starting from scratch. The platform engineering team that built the service catalog, scored services for compliance, made deployments observable, and kept ownership records current already owns most of this substrate. They know which sources to trust, which relationships exist between systems, and what "current state" means in their environment. The hard part of the context layer is the organisational knowledge that feeds it, not the technology.
The homegrown version - a thousand lines of Python, a GitHub monorepo of markdown, an agent swarm crawling internal sources - can get you to a proof of concept quickly. Chen's piece is a genuinely useful account of how far that approach can go. But the version that survives contact with compliance requirements, multi-tenant access controls, and production scale looks different. Access control, auditability, multi-tenancy, reading from production systems without causing incidents: these are what make a context layer something an organisation can actually operate. They're also exactly what gets hand-waved in practitioner write-ups and bites you at scale.
Andy Chen's framing from his ECL piece applies here: the enterprise context layer is "closer to DevOps than to Salesforce." A practice, not a purchase. You build the discipline - the ingestion pipelines, the relationship mappings, the bundle definitions, the access controls - and then you maintain it. The four motions are the shape of that maintenance.
Roadie's platform implements this architecture. The context store holds your service graph, your operational data, and the relationships between them. The MCP interface handles progressive disclosure so agents get scoped context rather than a full dump. Agent contributions feed back into the graph. Access controls are first-class from day one, not a retrofit.
The 100:1 ratio comes from better context infrastructure, not better models. At a hundred agent actions for every human decision, the quality of those actions is determined almost entirely by what's in the context window when the agent starts reasoning.
The teams still debugging agent failures are usually debugging the wrong thing. A context layer that provides authoritative, structured knowledge - ownership, relationships, provenance, agent history - is what separates the teams running at 100:1 from the teams still tuning prompts. Power your agents with facts, not guesses. If you want to see how Roadie builds this for your engineering team, request a demo or start a free trial.