The Agent Stack's Missing Layer

David Tuite

In late April 2026, an AI coding agent deleted a production database in nine seconds. The agent was Cursor running Anthropic's Claude Opus 4.6. The database belonged to PocketOS, a piece of software that rental businesses use to run reservations, payments, and vehicle tracking. The deletion happened because a routine staging task hit a credential mismatch; the agent then found a Railway API token in an unrelated file and decided the right fix was to delete a Railway volume. The most recent recoverable backup was three months old.

When asked to explain itself, the agent wrote:

[...] I violated every principle I was given:

  • I guessed instead of verifying
  • I ran a destructive action without being asked
  • I didn't understand what I was doing before doing it
  • I didn't read Railway's docs on volume behavior across environments

That confession is the part of Jer Crane's account that got passed around. The interesting thing is not the apology. It is that the agent could enumerate every safety rule it had been given and walk through its own reasoning for overriding each one. When a model can do that, those 'rules' are just text, and text is not binding. The thing that should have stopped the deletion was not a smarter prompt or a better model. It was a layer of the stack that does not yet exist.

Three layers, two built, one missing

The current agent stack has three layers.

The first is the model. Frontier labs have pushed capability and alignment hard, and Crane was running the best of them. The second is the prompt. Cursor ships a system prompt, project configuration supports custom rules, and prompt engineering as a discipline exists to encode behaviour at this layer. PocketOS had explicit safety rules in its project configuration. Both layers performed exactly as designed. The agent was capable. The instructions were clear. The instructions did not bind.

The missing layer is the one underneath: enforcement at the integration boundary. What an agent is capable of doing in an environment, expressed not in language the model interprets but in code the model cannot reason its way past. Token scopes that say domain:read, domain:write and nothing else. API gateways that refuse volumeDelete without an out-of-band confirmation. Backups that live outside the blast radius of what they are protecting. None of these primitives are novel. They are how every mature production system already works for human operators. They have not been built for agent operators because the industry has been investing in models and prompts and treating governance as a runtime concern the model can be talked into respecting.

Crane's setup illustrates each gap. His Railway API token, created to manage custom domains via the CLI, also carried volumeDelete authority across the entire GraphQL API because Railway's authorisation model has no per-operation scoping. The Railway API accepted the destructive call without a confirmation step. The volume backups Railway documents as a resilience feature live inside the volume they protect, so wiping the volume wiped the snapshots with it. The agent did not exploit a clever path through any of this. It made the call, the call was authorised, and the call executed.

The industry is investing in the wrong layer

About a week before the PocketOS incident, Railway announced their remote MCP server: native AI agent integration into Railway environments. The product is built on the same authorisation model that gave Crane's agent root access. After the incident, Jake Cooper, Railway's CEO, told The Register that "if you (or your agent) authenticate, and call delete, we will honor that request. That's what the agent did ... just called delete on their production database." That is the missing layer stated cleanly by the vendor whose layer is missing. The action was authorised. The API performed as designed. There was no enforcement boundary between an authenticated caller and the destructive operation, because the architecture treats those two things as the same.

Cursor has a documented version of the same pattern. In December 2025, a Cursor team member publicly acknowledged a "critical bug in Plan Mode constraint enforcement" after an agent deleted tracked files despite a user typing "DO NOT RUN ANYTHING." The agent acknowledged the instruction. Then it kept running commands. Cursor markets Destructive Guardrails. The PocketOS agent was running with Cursor's recommended configuration on Cursor's flagship model tier and produced the confession quoted above.

These are not isolated bugs. They are the same architectural choice surfacing in different products: invest in the layers the model can be persuaded to respect, treat the integration boundary as a place for documentation rather than enforcement, ship the agent integration before the safety architecture catches up. The two largest investments in agent safety, frontier alignment and prompt engineering, both live in layers the agent itself can talk through. The layer the agent cannot talk through is the one nobody is building.

The shape of the layer that's missing

The principle is straightforward. Enforcement that is meaningful for an agent has to be expressed as a property of the integration, not as an instruction to the model. That implies three things, none of them speculative.

Capability has to be scoped. A token an agent uses to do its job should describe the operations and resources that job actually requires. A domain-management token cannot delete volumes. An agent working on a frontend feature does not hold a database credential at all. Cloud providers solved this for human IAM years ago. The same shape applies, with the same primitives, when the operator is a model.
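To make the idea concrete, here is a minimal Python sketch of a token that carries explicit (operation, resource) grants and is refused everything else. The scope strings such as `domain:write` and `volume:delete`, the resource paths, and the helpers `issue_token` and `authorize` are all hypothetical. They are not Railway's API or any provider's real scope syntax, and a production system would run this check server-side at the gateway, not in the agent's own process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One permitted (operation, resource) pair. Hypothetical scope model."""
    operation: str  # e.g. "domain:write" (illustrative scope string)
    resource: str   # e.g. "env/staging" (illustrative resource path)


@dataclass(frozen=True)
class ScopedToken:
    subject: str
    grants: frozenset[Grant]

    def allows(self, operation: str, resource: str) -> bool:
        # Deny by default: only an exact (operation, resource) grant is permitted.
        return Grant(operation, resource) in self.grants


def issue_token(subject: str, required: list[tuple[str, str]]) -> ScopedToken:
    """Mint a token carrying exactly the grants the task declares, and nothing more."""
    return ScopedToken(subject, frozenset(Grant(op, res) for op, res in required))


def authorize(token: ScopedToken, operation: str, resource: str) -> None:
    """Gateway-side check: raise before the call ever reaches the backend."""
    if not token.allows(operation, resource):
        raise PermissionError(f"{token.subject} may not {operation} on {resource}")


# Usage: a domain-management token scoped to staging.
token = issue_token(
    "agent:domains",
    [("domain:read", "env/staging"), ("domain:write", "env/staging")],
)
authorize(token, "domain:write", "env/staging")  # permitted, returns None
try:
    authorize(token, "volume:delete", "env/production")
except PermissionError as err:
    print(err)  # agent:domains may not volume:delete on env/production
```

The design choice that matters is the default: the token lists what it may do, so an operation nobody thought to forbid is still refused.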

Destructive operations need a confirmation path the agent cannot complete on its own. Type the volume name. Out-of-band approval. A human pressing a button in another system. The point is not friction. The point is that the API call cannot succeed in a single round trip. Every database provider running production workloads has a version of this. Every API expecting to be in an agent's tool list needs one.
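One way to express "cannot succeed in a single round trip" in code is a two-phase gate. In this sketch, which is illustrative rather than any vendor's mechanism, the class `ConfirmationGate`, its methods, and the identity strings are hypothetical. The agent can only file a request and receive a ticket; approval must come from a different, pre-registered identity who retypes the resource name; only then does the injected destructive call run. In a real deployment `approve` would sit behind a separate service and credential the agent never holds. Keeping everything in one process here is purely for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingAction:
    requester: str
    operation: str
    resource: str
    approved_by: Optional[str] = None


class ConfirmationGate:
    """Hypothetical two-phase gate for destructive operations.

    Phase 1 (agent): request() records intent and returns a ticket; nothing runs.
    Phase 2 (human, out of band): approve() must come from a registered approver
    who is not the requester and who retypes the resource name exactly.
    Only an approved ticket can be executed, and only once.
    """

    def __init__(self, approvers: set[str], perform: Callable[[str, str], None]):
        self._approvers = approvers
        self._perform = perform  # the real destructive call, injected
        self._pending: dict[str, PendingAction] = {}

    def request(self, requester: str, operation: str, resource: str) -> str:
        ticket = secrets.token_hex(8)
        self._pending[ticket] = PendingAction(requester, operation, resource)
        return ticket

    def approve(self, ticket: str, approver: str, typed_resource: str) -> None:
        action = self._pending[ticket]
        if approver not in self._approvers or approver == action.requester:
            raise PermissionError(f"{approver} cannot approve this action")
        if typed_resource != action.resource:
            raise ValueError("typed resource name does not match")
        action.approved_by = approver

    def execute(self, ticket: str) -> None:
        action = self._pending[ticket]
        if action.approved_by is None:
            raise PermissionError("destructive action has not been approved")
        del self._pending[ticket]  # single use: the ticket cannot be replayed
        self._perform(action.operation, action.resource)
```

In use, the agent's round trip ends at a ticket: `gate.request("agent:coding", "volume:delete", "prod-volume")` returns an id, and `gate.execute` on that id raises `PermissionError` until a human approver has retyped `prod-volume`.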

State that protects against worst-case loss has to live outside the system that produced the loss. Backups in a separate storage account, separate billing boundary, separate credential scope. If the agent is reasoning inside the blast radius, the recovery has to be outside it.
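The same rule can be checked mechanically before a backup target is accepted. This Python sketch assumes a hypothetical `Placement` record describing where a piece of state lives, and rejects any backup that shares a storage account, billing account, or credential scope with the data it protects. The field names and account identifiers are invented for illustration.

```python
from dataclasses import dataclass

# The three boundaries named in the text; a backup must cross all of them.
BOUNDARIES = ("storage_account", "billing_account", "credential_scope")


@dataclass(frozen=True)
class Placement:
    """Where a piece of state lives. Hypothetical fields, not a provider API."""
    storage_account: str
    billing_account: str
    credential_scope: str


def shared_boundaries(source: Placement, backup: Placement) -> list[str]:
    """Return every boundary the backup shares with the data it protects."""
    return [b for b in BOUNDARIES if getattr(source, b) == getattr(backup, b)]


def check_backup_policy(source: Placement, backup: Placement) -> None:
    """Refuse a backup target that sits inside the source's blast radius."""
    shared = shared_boundaries(source, backup)
    if shared:
        raise ValueError("backup is inside the blast radius; shares: " + ", ".join(shared))


# Usage: a snapshot stored alongside the volume fails; an off-site copy passes.
prod = Placement("acct-prod", "billing-prod", "scope-prod")
colocated = Placement("acct-prod", "billing-prod", "scope-prod")
offsite = Placement("acct-backup", "billing-backup", "scope-backup-writer")
check_backup_policy(prod, offsite)  # passes silently
```

Run as a deploy-time or CI check, a rule like this turns "backups live somewhere else" from a convention into a precondition.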

These are infrastructure decisions, not model decisions. They do not get better as models get better. A frontier model running with no scoped tokens, no confirmation gates, and co-located backups is a frontier model with root.

What this means for platform teams

If you are running production data behind any provider that gives agents a credential, the question to answer this week is which layer of the stack you are relying on to stop the worst case. If the answer is the system prompt, you are relying on the layer the model is trained to interpret as guidance. If the answer is the model itself, you are relying on a probabilistic system that has now demonstrated, in writing, that it can override its own safety instructions. The layer that has to absorb the worst case is the one the model cannot reach.

The fix is not new tooling. It is the discipline of treating governance as an infrastructure primitive, the way authentication and observability already are. Define the operations agents are allowed to invoke and refuse the rest at the gateway. Issue tokens with the narrowest scope the task requires. Move backups out of the blast radius. None of this is on the roadmap of a frontier lab or in a prompt-engineering blog post. It is platform work, and it is overdue.

The PocketOS incident was an expensive demonstration of what the agent stack looks like when two of its three layers do all the work. The next one will look the same, and so will the one after that, until the missing layer gets built.

I made the longer case for where that layer should live and why developer portals are the natural place for it in "The Governance Gap in Agent-Stack Thinking". This piece is the case for why anything less than that is the system prompt by another name.