Context Engineering: The Missing Discipline in AI-Assisted Development
By David Tuite • March 12th, 2026
A typical AI coding assistant is trained on publicly available GitHub repositories, RFCs, and Stack Overflow answers, which makes it perfectly capable of difficult implementation work. What it lacks is context. The assistant doesn't know who owns the payment-service or which S3 bucket naming convention your security team mandated last year.
The industry spent the last two years obsessed with prompt engineering. Teams refined instructions, added chain-of-thought reasoning, and built elaborate system prompts. None of that addresses the real issue: your AI doesn't know your company. It knows the world, but it doesn't know your world. Thankfully, context engineering can help, and it's going to define the next wave of platform engineering work.
What Is Context Engineering?
The working definition of context engineering is the systematic practice of curating, structuring, and retrieving information to ground AI models in specific domain knowledge.
The roots of context engineering go back to context-aware computing from the early 1990s, when researchers like Bill Schilit started building systems that could adapt their behavior based on location, user, and environment. The insight was the same then as now: the value of any intelligent system scales with the quality of context it can access. RAG architectures, popularized by the 2020 Lewis et al. paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," brought this concept into the LLM era. But RAG is just the retrieval mechanism. Context engineering is the discipline of deciding what to retrieve, how to structure it, and which sources to trust.
This difference matters because the industry is moving from model-centric thinking ("we need a better model") to context-centric thinking ("we need better data retrieval"). GPT-4 Turbo and Claude 3.5 Sonnet are good enough for most coding tasks. The bottleneck has shifted from model intelligence to grounding.
Context Engineering vs. Prompt Engineering
For the past two years, teams have focused on prompt engineering, refining instructions so models behave correctly. That works when the problem is ambiguity or reasoning quality. It fails when the model simply lacks the right information.
No prompt can reliably answer:
- Who owns checkout-service?
- What lifecycle state is legacy-auth-api in?
- What security constraints apply to payment-service?
If the data isn't accessible through a retrieval layer, the model will infer, and sometimes hallucinate.
Prompt engineering optimizes how you ask. Context engineering optimizes what the model knows.
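The distinction is easy to see in code. Here's a minimal sketch, with an entirely hypothetical in-memory catalog standing in for a real retrieval layer: prompt engineering refines the instruction string, while context engineering injects retrieved facts the model could never infer on its own.

```python
# Minimal sketch; the catalog and all names are hypothetical.

CATALOG = {
    "checkout-service": {"owner": "group:payments", "lifecycle": "production"},
    "legacy-auth-api": {"owner": "group:identity", "lifecycle": "deprecated"},
}

def prompt_engineered(question: str) -> str:
    # Prompt engineering: refine *how* we ask. The model still
    # has no access to internal facts like service ownership.
    return f"You are a careful senior engineer. Think step by step.\n{question}"

def context_engineered(question: str, service: str) -> str:
    # Context engineering: retrieve trusted metadata first and
    # ground the model in *what it needs to know*.
    entry = CATALOG[service]
    facts = f"{service}: owner={entry['owner']}, lifecycle={entry['lifecycle']}"
    return f"Known facts from the service catalog:\n{facts}\n\n{question}"

print(context_engineered("What lifecycle state is legacy-auth-api in?",
                         "legacy-auth-api"))
```

The second prompt answers itself: the fact arrives alongside the question, so the model reports rather than guesses.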
The Three Layers of Context
When your developers use GitHub Copilot or Cursor, those tools are solving a layered context problem, and they're only solving part of it.
Layer 1 is local context: the file you have open, the function you're editing, the variables in scope. This is what language servers (LSP) have always done, and what Copilot does well. The model sees your current cursor position and the surrounding tokens. For greenfield code written against public libraries, this is often enough.
Layer 2 is repository context: the patterns, structures, and dependencies in the current codebase. Cursor's codebase indexing handles much of this, using vector embeddings to make local code semantically searchable. You can ask "how does the authentication middleware work?" and get a coherent answer drawn from files across the repo.
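Layer 2 retrieval can be sketched in a few lines. Real indexers use learned vector embeddings; in this illustrative toy, a bag-of-words cosine similarity stands in for the embedding model, and the file paths and chunks are invented.

```python
# Toy sketch of repository-level semantic retrieval (Layer 2).
# A bag-of-words cosine similarity stands in for real embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": token counts from a lowercase split.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical indexed code chunks from across the repo.
chunks = {
    "middleware/auth.go": "authentication middleware validates the JWT token on every request",
    "handlers/orders.go": "order handler writes the order to the database",
}

query = "how does the authentication middleware work"
best = max(chunks, key=lambda path: cosine(embed(query), embed(chunks[path])))
print(best)  # the auth middleware chunk scores highest
```

The point of the sketch: Layer 2 works because the answer exists somewhere in the repo and only needs to be found. Layer 3 fails precisely because there is nothing in the repo to find.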
Layer 3 is organizational context, and this is where every current AI coding tool struggles. Organizational context is the knowledge that lives outside any single repository:
- Who owns payment-service, and what's its current lifecycle status?
- What's the approved pattern for creating a new S3 bucket with FIPS-compliant encryption?
- Where's the API spec for the internal event bus, and what events does order-service emit?
- What's the SLA tier for fraud-detection-service, and what are its production constraints?
No amount of clever prompting retrieves this information. It doesn't exist in any repo. It lives in your organization's institutional knowledge, and without a structured home, it's functionally invisible to any AI system.
The most natural system to solve Layer 3 is the Internal Developer Portal (IDP).

Why the IDP Is the Natural Context Engine
Your IDP already contains the two things an AI needs most: metadata and semantics.
Metadata lives in catalog-info.yaml, the structured backbone of every Backstage service entry. Each record carries owner, tier, lifecycle, dependencies, and tags in a machine-readable, version-controlled format. That's exactly the information an AI needs to answer "what are the production constraints on payment-service?" without guessing. The answer isn't buried in a README or a Slack thread. It's in a schema-enforced file your platform team already maintains as part of normal operations.
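For reference, a representative entry looks like this. The field layout follows the standard Backstage descriptor format; the specific service, tags, and dependencies are illustrative.

```yaml
# Illustrative catalog-info.yaml entry; names are hypothetical.
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles card payments and refunds.
  tags:
    - tier-1
    - pci
spec:
  type: service
  lifecycle: production
  owner: group:payments-team
  dependsOn:
    - component:fraud-detection-service
    - component:event-bus
```

Every field here is machine-readable, so an AI querying the catalog gets an unambiguous answer about ownership, lifecycle, and dependencies instead of a guess.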
Semantics live in TechDocs: your architectural decision records, how-to guides, runbooks, and onboarding documentation. When a developer asks Cursor, "What's the right approach for adding distributed tracing to a new Go service?", the correct answer exists in your TechDocs, not on Stack Overflow. TechDocs is already your single source of truth for how things work here. It needs to be your AI's source of truth too.
Spotify proved this architecture works at scale. Their internal "AiKA" (AI Knowledge Assistant) and "Honk" background coding agents both integrate deeply with Backstage's catalog and metadata graph. Spotify didn't need to build a separate AI knowledge layer; the IDP was already taking care of that. The IDP's role as a context engine follows directly from its architecture: it's the only system in your organization that maintains structured, governed, and continuously updated metadata about every service you run.
Because Roadie is managed Backstage, this architectural pattern is available to you without the overhead of building and maintaining the context infrastructure yourself. The Spotify engineering team spent significant effort wiring Backstage into AiKA and Honk. Roadie ships that foundation.
Connecting AI Tools via the Model Context Protocol
Knowing that the IDP holds the right data is one thing; getting AI tools to query it reliably is another. That's where the Model Context Protocol (MCP) comes in.
MCP is USB-C for AI. Before USB-C, every device had its own connector. You needed a drawer full of adapters for every combination. Before MCP, every AI tool had its own proprietary method for connecting to external data sources, if it connected at all. MCP is the open standard that lets any compliant AI client (Cursor, Claude Desktop, a custom agentic pipeline) connect to any compliant MCP Server and query its data through a consistent interface.
Roadie acts as an MCP Server, exposing your Backstage catalog and TechDocs as queryable endpoints. Here's what that looks like in practice:
- A developer opens Cursor and asks: "Generate a Terraform config for payment-service that meets its production SLA requirements."
- Cursor's MCP client queries Roadie's MCP server for the payment-service catalog entry.
- Roadie returns structured metadata: tier-1 service, multi-region deployment, PCI-compliant, owner: group:payments-team, depends on fraud-detection-service and event-bus.
- Cursor incorporates this context into the generation, auto-applying the correct instance types, encryption configuration, and cross-region failover settings.
Without MCP and a populated catalog, the developer gets a generic Terraform template that needs manual adjustment and a second round-trip to whoever actually knows the service requirements. With it, the AI generates something correct for this service, at this organization. That's the difference between an AI assistant and an AI that actually knows where it works.
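The round-trip above can be sketched as follows. This is not the MCP wire protocol or Roadie's actual API; the transport is reduced to a plain function call and the entity data is hypothetical, purely to show the shape of the exchange.

```python
# Hedged sketch of the MCP round-trip; plumbing reduced to a
# function call, all data hypothetical.

CATALOG = {
    "payment-service": {
        "tier": "tier-1",
        "deployment": "multi-region",
        "compliance": ["pci"],
        "owner": "group:payments-team",
        "dependsOn": ["fraud-detection-service", "event-bus"],
    }
}

def get_catalog_entity(name: str) -> dict:
    # What an MCP server tool exposes: a structured,
    # machine-readable answer instead of a guess.
    return CATALOG[name]

def grounded_generation_context(name: str) -> str:
    # The MCP client folds the returned metadata into the
    # model's context before it generates any Terraform.
    e = get_catalog_entity(name)
    return (
        f"{name} is a {e['tier']} {e['deployment']} service "
        f"({', '.join(e['compliance'])}-compliant), owned by {e['owner']}, "
        f"depending on {', '.join(e['dependsOn'])}."
    )

print(grounded_generation_context("payment-service"))
```

Everything the generator needs to pick instance types, encryption settings, and failover topology arrives as context before generation starts, rather than being reverse-engineered afterwards.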
The Problem with "Just Dump It in a Vector DB"
Here's an approach I've seen teams try when they get serious about AI grounding: export everything (Confluence pages, Jira tickets, Slack threads, internal wikis), embed it all in Pinecone or Weaviate, and call it a "context lake."
This doesn't work, and not because vector databases are bad tools. The issue is that unstructured, uncurated data produces what I'd call "context poisoning."
Your Confluence instance has 2,000 pages. Maybe 300 are accurate. Another 400 are outdated, written before the infrastructure migration, the reorg, or the service's deprecation. The remaining 1,300 are drafts, duplicates, or meeting notes that were never meant to be authoritative. When your AI retrieves from this pool, it has no way to distinguish the canonical architectural decision record from the three-year-old wiki page that contradicts it. The model doesn't know that legacy-payment-gateway-setup.md was superseded by a new doc eighteen months ago. So it pulls from both, fuses them, and produces output that's partly right and partly dangerously wrong.
This is the risk with some proprietary "context lake" approaches: you can't engineer good context from bad source data, no matter how sophisticated your retrieval layer is. And Cortex's scorecard-based "AI readiness checks" are a step forward, enforcing that services have owners and documentation, but scorecards tell you whether context exists, not whether it's accurate or semantically coherent.
The catalog-info.yaml schema enforces structure at the source. Every service entry has a defined owner (not "the payments team" but group:payments), a machine-readable lifecycle status (production, deprecated, experimental), and explicit dependency links. A deprecated service can't masquerade as current because its lifecycle field says otherwise. An unmaintained entry gets surfaced by catalog health checks before it becomes a source of bad context. The data is searchable and governed.
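The guarantees described above amount to mechanical checks that can run before an entry ever reaches an AI. A sketch, using the lifecycle values and owner convention mentioned in this article (the validation function itself is hypothetical, not part of Backstage):

```python
# Sketch of schema-style checks on catalog entries; the checks
# mirror the guarantees described above, the function is illustrative.

ALLOWED_LIFECYCLES = {"production", "deprecated", "experimental"}

def validate_entry(entry: dict) -> list:
    errors = []
    owner = entry.get("owner", "")
    # Owners must be machine-readable group references, not free text.
    if not owner.startswith("group:"):
        errors.append(f"owner {owner!r} is not a group: reference")
    # Lifecycle must be one of the defined states, so a deprecated
    # service can't masquerade as current.
    if entry.get("lifecycle") not in ALLOWED_LIFECYCLES:
        errors.append(f"unknown lifecycle {entry.get('lifecycle')!r}")
    return errors

good = {"owner": "group:payments", "lifecycle": "production"}
bad = {"owner": "the payments team", "lifecycle": "current"}
print(validate_entry(good))  # []
print(validate_entry(bad))   # two errors
```

A "context lake" of embedded wiki pages has no equivalent of this gate; a stale or contradictory page is indistinguishable from a canonical one at retrieval time.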
That said, this does require your catalog to be accurate. A Backstage instance with stale catalog-info.yaml files is its own form of context poisoning. The discipline of context engineering starts with the discipline of catalog maintenance.

Final Thoughts: Building the Map for Your Digital Workforce
The agentic AI era is arriving faster than most platform teams are ready for. Chatbot-style AI assistants are forgiving. A slightly wrong answer gets ignored and corrected. Agents that take action aren't forgiving. An agent that auto-generates a pull request, provisions cloud infrastructure, or routes a production incident based on service ownership data needs that data to be correct. The cost of a context error scales directly with the autonomy of the system making decisions from it.
The teams preparing well are focusing on reliable organizational context, not model sophistication or GPU budgets. The catalog entries you keep accurate, the TechDocs you maintain, and the ownership mappings you get right form your AI workforce's institutional memory. That's what your AI tools actually run on.
Context engineering is platform engineering's next frontier. The good news is that if you've been running Backstage, you're further along than you think. You have the schema, the governance model, and the data. You just need to connect it.
Your AI needs a brain. Start building your organization's memory today with Roadie's managed Backstage platform. Book a demo to see how your Service Catalog can become your context engine.