The Word 'Context' Has Stopped Meaning Anything

Roadie works in this space - we're building context infrastructure for AI agents. I want to say that upfront, because this piece is about the word "context" and we have a commercial stake in what it means. If you want definitions before the argument, our working glossary of context terms is the right starting point. What follows is about what happens to a precise technical term when the market gets hold of it.

The term earned its name by pointing at something real

On June 19, 2025, Tobi Lütke posted on X: "I really like the term 'context engineering' over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM." Six days later, Andrej Karpathy amplified it: "context engineering is the delicate art and science of filling the context window" in "every industrial-strength LLM app." These posts did something specific: they named an architectural discipline, not a copywriting skill. The work is deciding what data the agent retrieves, in what order, structured how, from what source.

That framing - system design, not wordsmithing - was a genuine upgrade. The real distinction between prompt engineering and context engineering runs deeper than vocabulary: it is the difference between adjusting text and redesigning information systems.

The defenders of the term have a substantive case. Phil Schmid's practitioner definition (June 30, 2025) arrives at the same point: most agent failures are context failures, not model failures. DBReunig noted in July 2025 that within a month of the Lütke and Karpathy posts, "context engineering" had reached a quarter of the search volume of "prompt engineering" - and a marketing buzzword tends to spike and fade rather than hold its level. A Stanford and SambaNova study published in October 2025 showed that incremental, structured context updates reduced adaptation latency by up to 86% compared to static or regenerated prompts. Sourcegraph's May 2026 heuristic makes the distinction operational: the clearest tell of genuine context engineering is whether your improvements come from rewiring what data the agent retrieves, not from rewording the prompt.

The discipline is real.

Then the market got hold of it

By 2026 the label had been stretched to cover almost anything sitting between data and a model - a retrieval pipeline, a session-memory store, a rebadged data catalogue. Practitioners noticed the drift. A thread on Reddit's r/AI_Agents was titled, flatly, "The word 'context' has stopped meaning anything in enterprise AI", and the same complaint turned up on Mastodon and on Bluesky. When a term is made to cover everything, it stops marking anything in particular.

Where the definition still holds

Anthropic published their technical definition in September 2025: context engineering is "the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference." That definition is narrow by design. It covers the architectural discipline Lütke and Karpathy named. It does not extend to every product that sits between data and a model.

Context engineering - when the term is used precisely - describes the work of determining what information an AI model needs, in what format, at what time, and from what source. A system that does this well has four properties that distinguish it from retrieval infrastructure.

The data is structured and typed. A typed entity graph delivers deterministic query results. A semantic search index delivers probabilistic relevance rankings. These belong to different reliability classes when agents are making operational decisions.

The system is queryable in the technical sense: it returns a typed, traversable object. When an agent asks who owns fraud-detection-service, the answer should be a structured record with an owner ID, a team ID, and SLO data - not a paragraph mentioning ownership.
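As a rough illustration of what "a typed, traversable object" could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `OwnershipRecord` and `Slo` types, the in-memory `CATALOGUE`, the `who_owns` function, and every ID and SLO value are hypothetical, not Roadie's schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slo:
    availability_target: float  # fraction of successful requests, e.g. 0.999
    latency_p99_ms: int


@dataclass(frozen=True)
class OwnershipRecord:
    service: str
    owner_id: str
    team_id: str
    slo: Slo


# Hypothetical in-memory catalogue; all values are invented for illustration.
CATALOGUE = {
    "fraud-detection-service": OwnershipRecord(
        service="fraud-detection-service",
        owner_id="user:jsmith",
        team_id="team:payments-risk",
        slo=Slo(availability_target=0.999, latency_p99_ms=250),
    ),
}


def who_owns(service: str) -> Optional[OwnershipRecord]:
    """Exact-key lookup: returns a typed record, or None if the service is unknown."""
    return CATALOGUE.get(service)


record = who_owns("fraud-detection-service")
if record is not None:
    # The agent reads fields directly instead of parsing a paragraph.
    print(record.team_id, record.slo.latency_p99_ms)
```

The point of the sketch is the return type: an agent can branch on `record.team_id` or on a `None` result, neither of which is possible with a paragraph that merely mentions ownership.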

Entity relationships are first-class data, represented explicitly in the graph rather than guessed ad hoc from document proximity. The edge may be declared by a team, ingested from a deploy system, or inferred from infrastructure code - but once promoted into the graph, it is queryable relationship data. Proximity in a document and a graph edge are different things.
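One way to picture relationships as first-class data is an edge type that records where the relationship came from. The sketch below is an assumption-laden toy, not a real graph store: `EdgeOrigin`, `Edge`, `EntityGraph`, and the entity names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeOrigin(Enum):
    DECLARED = "declared"  # stated by a team
    INGESTED = "ingested"  # pulled from a deploy system
    INFERRED = "inferred"  # derived from infrastructure code


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    origin: EdgeOrigin


class EntityGraph:
    """Toy graph: edges are stored explicitly and queried by exact match."""

    def __init__(self) -> None:
        self._edges: list[Edge] = []

    def add(self, edge: Edge) -> None:
        self._edges.append(edge)

    def neighbours(self, source: str, relation: str) -> list[Edge]:
        return [e for e in self._edges if e.source == source and e.relation == relation]


graph = EntityGraph()
graph.add(Edge("fraud-detection-service", "owned_by", "team:payments-risk", EdgeOrigin.DECLARED))
graph.add(Edge("fraud-detection-service", "depends_on", "ledger-db", EdgeOrigin.INFERRED))
graph.add(Edge("fraud-detection-service", "depends_on", "feature-store", EdgeOrigin.INGESTED))

deps = graph.neighbours("fraud-detection-service", "depends_on")
```

However an edge was produced, once it is in the graph the query is the same exact-match traversal, and the `origin` field stays available to any caller that wants to treat inferred edges more cautiously than declared ones.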

The context is authoritative enough to drive decisions without human review. If someone needs to verify the output before the agent acts on it, the context is advisory. A context layer for operational systems has to be right every time it is queried.

Sourcegraph put the test well in May 2026: "The clearest tell that you've crossed from one discipline into the other is whether your improvements come from rewording or from rewiring. If you're swapping nouns and adjectives, you're still doing prompt engineering." The old-document problem runs the same test in practice: can your system distinguish a canonical architectural decision record from a three-year-old wiki page that contradicts it? Purely semantic retrieval surfaces both. A system with typed provenance and declared authority can tell them apart.
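The old-document problem can be sketched in a few lines. This is a hypothetical model, not a description of any real retrieval system: the `Document` fields, the two-level `Authority` scale, the similarity scores, and the dates are all invented to show the contrast.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Authority(IntEnum):
    UNREVIEWED = 0
    CANONICAL = 1


@dataclass(frozen=True)
class Document:
    title: str
    kind: str              # typed provenance, e.g. "adr" or "wiki"
    authority: Authority   # declared, not guessed from the text
    last_reviewed: date
    similarity: float      # score a semantic index might assign to the query


def pick_authoritative(candidates: list[Document]) -> Document:
    """Prefer declared authority, then recency; similarity only breaks ties."""
    return max(candidates, key=lambda d: (d.authority, d.last_reviewed, d.similarity))


adr = Document("ADR: ledger storage decision", "adr", Authority.CANONICAL, date(2025, 11, 3), 0.81)
wiki = Document("Ledger architecture (wiki)", "wiki", Authority.UNREVIEWED, date(2022, 6, 14), 0.88)

# Purely semantic ranking picks whichever page scores higher, here the stale wiki page.
by_similarity = max([adr, wiki], key=lambda d: d.similarity)

# Ranking on declared authority picks the decision record.
by_authority = pick_authoritative([adr, wiki])
```

In this toy the stale wiki page wins on similarity alone, which is exactly the failure the test is meant to expose; the ordering inside `pick_authoritative` is one plausible policy among several.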

A working test

When a vendor says they do context engineering, four questions settle it:

Where does the context come from - what is the authoritative source?
How stale can it get before producing errors?
Is the data structured or unstructured?
Is it retrieved fresh each turn or maintained across turns?

A context layer is not a feature. It is an architectural commitment. If a vendor cannot answer all four, they have a retrieval system with better marketing.
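For readers who prefer the four questions as a checklist, here is one hedged way to encode them. The `ContextSourceSpec` type, its field names, and the example vendor are hypothetical; the only thing taken from the text is the four questions themselves.

```python
from dataclasses import dataclass, fields
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ContextSourceSpec:
    """One field per question; None means the vendor had no answer."""

    authoritative_source: Optional[str]     # where does the context come from?
    max_staleness: Optional[timedelta]      # how stale before it produces errors?
    structured: Optional[bool]              # structured or unstructured?
    refresh: Optional[str]                  # "per_turn" or "maintained"

    def unanswered(self) -> list[str]:
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical vendor that can name a source and a data shape, but nothing else.
vendor = ContextSourceSpec(
    authoritative_source="service catalogue",
    max_staleness=None,
    structured=True,
    refresh=None,
)

print(vendor.unanswered())
```

Note that `structured=False` counts as an answer here; the check is only whether each question has been answered at all, not whether the answer is a good one.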

Now you're probably thinking: you work in this space, of course you're defending the term. Fair. Roadie is building the infrastructure that context engineering runs on - it's the same space this piece is about, and getting the definition right is part of the work we're doing. From that position: most of what is being called context engineering in enterprise pitches in 2026 is either RAG with better marketing or a service catalogue with a new name.

The distinction matters most where agents stop answering questions and start taking operational decisions - routing production incidents, gating deployments, provisioning infrastructure. When agents answer questions, fuzzy context produces a worse answer. When they take those decisions, fuzzy context produces an outage. What a context layer actually does in practice, and why an engineering graph is the highest-signal context source, is a different argument.

But it starts from this one: the word has to mean something first.
