The Context Engineering Glossary for Platform Engineers
By David Tuite • March 5th, 2026
Your team just wired an LLM into your Internal Developer Portal. The architecture review kicks off and someone asks whether you're doing RAG or agentic retrieval. Someone else flags context drift as a risk. A third person raises privilege leakage in the system prompt. You nod along, but the vocabulary is moving faster than the documentation.
This glossary defines every key term in the context engineering stack through the specific lens of platform engineering — Service Catalogs, TechDocs, golden paths, and on-call data — not abstract data science. Bookmark it, share it with your team, and use it as a reference before your next architecture decision.
This glossary focuses on context supply, not model training, fine-tuning, or prompt copywriting. Context engineering does not make models smarter — it determines what the model is allowed to know, when, and why. If an answer is wrong, the first place to look is rarely the model; it’s the data pipeline feeding it.
Section 1: Context Fundamentals
What Is Context Engineering?
Context engineering is the practice of curating, structuring, and retrieving the right infrastructure data so that an LLM can answer domain-specific questions accurately. The word choice matters: it's engineering, not prompting. Where prompt engineering focuses on the wording of individual queries, context engineering focuses on the entire information supply chain that feeds the model before it generates a word.
For platform teams, context engineering means deciding which fields from your catalog-info.yaml get indexed, how your TechDocs chunks get sized and tagged, and what real-time operational signals get injected at query time. A developer asking "Is the payments service production-ready?" gets a useful answer only if the lifecycle field from the Service Catalog was curated, indexed, and retrieved correctly. The LLM itself contributes maybe 20% of that answer's quality; context engineering accounts for the rest.
An LLM is a powerful reasoning engine with no institutional memory. Context engineering is how you give it one. In other words, context engineering is not about improving reasoning quality — it’s about constraining the information surface the model can reason over.
What Is a Context Window?
The context window is the total number of tokens an LLM can process in a single request, covering the system prompt, retrieved documents, conversation history, and the generated response combined. GPT-4o supports up to 128,000 tokens, and Gemini 3.5 Pro pushes to 1 million tokens. These numbers sound large until you picture a Service Catalog with 800 registered components, each with full metadata and linked TechDocs pages.
Stuffing everything into the context window is not a strategy. Irrelevant data degrades output quality, increases cost per query (models like Claude charge per input token), and slows response time. The engineering discipline is in selecting the right 2,000 tokens out of 2,000,000 available, pulling only the service metadata relevant to the specific query, not the entire catalog.
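That selection discipline can be sketched as a simple token budget: each source type gets a fixed allowance, and rank-ordered chunks are admitted until the allowance runs out. This is a minimal illustration, not a real API; the 4-characters-per-token heuristic, the budget numbers, and the function names are all assumptions (production systems use a real tokenizer).

```python
# A minimal sketch of context budgeting: rank-ordered chunks are admitted
# per source type until a token budget runs out. The 4-characters-per-token
# estimate, budget numbers, and function names are all illustrative.

def estimate_tokens(text):
    """Crude estimate: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fit_to_budget(ranked_chunks, budget_tokens):
    """Keep chunks in ranked order until the budget is exhausted."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected

# Decide up front how much of the window each source type may consume.
BUDGETS = {"catalog": 800, "techdocs": 1000, "operational": 200}

ranked = ["relevant chunk " * 40, "nearby chunk " * 40, "marginal chunk " * 400]
kept = fit_to_budget(ranked, BUDGETS["techdocs"])
```

The point is the shape, not the numbers: the budget is decided before retrieval, so an over-eager retriever cannot flood the prompt.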
Efficient context selection is where retrieval architecture pays for itself.
What Is Grounding in LLMs?
Grounding anchors an LLM's response in verified, authoritative data sources rather than the model's pre-trained weights. Without grounding, a model answering "Who is the on-call engineer for the checkout service?" will either hallucinate a plausible name or admit it doesn't know. With grounding, the response comes from the live PagerDuty schedule injected at query time.
In a platform engineering context, your Service Catalog is the primary grounding layer. When every answer the AI gives traces back to a specific entity in the catalog, with a citable source, you've achieved grounded output. Ungrounded AI assistants erode trust fast: one invented service name or wrong runbook link and developers stop using the tool. RAG is an architectural mechanism; grounding is the result. You can implement RAG without achieving grounding if the retrieved data isn’t authoritative or current.
Section 2: Architecture and Retrieval Terms
RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) is the architectural pattern where a system retrieves relevant documents from an external knowledge base before passing them to an LLM for response generation. The model doesn't rely on what it learned during training; it reads what you give it at runtime.
The flow for an IDP-backed assistant looks like this: a developer asks, "How do I rotate credentials for the auth service?" The system encodes that query, searches TechDocs for credential rotation guides tagged to the auth service, pulls the service owner from the catalog, and injects both into the prompt. The LLM generates a specific, sourced answer, not a generic "here's how credential rotation works" response scraped from its training data.
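That flow can be sketched end to end. Every function below is a hypothetical stand-in: in a real assistant, `embed` would call an embedding model, `search_techdocs` a vector index, `catalog_lookup` the Backstage catalog API, and `call_llm` an LLM client.

```python
# Stub stand-ins so the sketch runs end to end; every name is hypothetical.

def embed(text):
    """Stand-in embedding: a real system calls an embedding model here."""
    return [float(len(text))]

def search_techdocs(query_vec, k):
    """Stand-in semantic search over indexed TechDocs chunks."""
    docs = ["Rotate credentials with the auth team's runbook: revoke, reissue, redeploy."]
    return docs[:k]

def catalog_lookup(entity, field):
    """Stand-in Service Catalog lookup."""
    return {"auth-service": {"owner": "team-identity"}}[entity][field]

def call_llm(prompt):
    """Stand-in LLM call: echoes the prompt so the flow is inspectable."""
    return prompt

def answer(query):
    query_vec = embed(query)                         # 1. encode the query
    chunks = search_techdocs(query_vec, k=3)         # 2. retrieve relevant TechDocs
    owner = catalog_lookup("auth-service", "owner")  # 3. add a structured catalog fact
    prompt = (
        "Answer using only the context below and cite sources.\n\n"
        f"Service owner: {owner}\n"
        "Docs:\n" + "\n---\n".join(chunks) +
        f"\n\nQuestion: {query}"
    )
    return call_llm(prompt)                          # 4. generate a sourced answer

result = answer("How do I rotate credentials for the auth service?")
```

The model never answers from memory; everything it sees is assembled at runtime from the catalog and the docs index.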
RAG is the foundational pattern for any AI assistant built on top of an IDP. Every other term in this glossary relates to how well your RAG implementation performs.
What Are Vector Embeddings?
A vector embedding is a numerical representation of text, typically a list of 768 to 3,072 floating-point numbers, that captures semantic meaning rather than just the words themselves. Two sentences that mean the same thing will have similar embeddings even if they share no words. "Service is deprecated" and "component has reached end-of-life" end up close together in embedding space; "deploy to production" and "YAML syntax error" end up far apart.
To build RAG for your IDP, every TechDocs page, every catalog entity description, and every relevant metadata field needs to be converted into an embedding and stored. When a developer submits a query, the query also gets embedded, and the system retrieves the stored documents whose embeddings are most similar. That's semantic search.
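The "close together in embedding space" idea reduces to cosine similarity between vectors. The three-dimensional vectors below are hand-made stand-ins for illustration; real embeddings come from a model such as text-embedding-3-large and have hundreds to thousands of dimensions.

```python
# Toy illustration of semantic similarity in embedding space. The vectors
# are hand-made stand-ins, not real model outputs.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings: the first two phrases mean roughly the same thing.
vectors = {
    "service is deprecated":             [0.90, 0.10, 0.00],
    "component has reached end-of-life": [0.85, 0.15, 0.05],
    "deploy to production":              [0.00, 0.20, 0.95],
}

same = cosine_similarity(vectors["service is deprecated"],
                         vectors["component has reached end-of-life"])
different = cosine_similarity(vectors["service is deprecated"],
                              vectors["deploy to production"])
# same is close to 1.0; different is close to 0.0
```

Semantic search is just this comparison at scale: embed the query, then return the stored chunks whose vectors score highest.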
Generating and managing these embeddings is non-trivial. You need to pick an embedding model (OpenAI's text-embedding-3-large or a self-hosted Sentence Transformers variant), decide chunk sizes, handle incremental updates when docs change, and keep embeddings in sync with the underlying catalog. Roadie handles this entire pipeline automatically for TechDocs on your managed Backstage instance. You don't maintain a separate embedding job or manage model versions.
What Is a Vector Database?
A vector database is a storage engine purpose-built for indexing and querying high-dimensional embedding vectors. It provides Approximate Nearest Neighbor (ANN) search at scale, which means it can find the 10 most semantically similar chunks from a corpus of 500,000 embeddings in under 100 milliseconds. Standard relational databases like PostgreSQL can store vectors (via pgvector), but dedicated systems like Pinecone, Weaviate, and Qdrant are optimized for this workload.
For platform teams evaluating AI tooling, the vector database is an infrastructure dependency that often gets underestimated. It requires provisioning, access control, index tuning, and synchronization with your source catalog. When Roadie embeds your TechDocs, the vector storage layer is managed within the platform. You're not standing up a Qdrant cluster alongside your Backstage deployment.
What Is Semantic Search?
Semantic search finds content based on meaning and intent, not keyword overlap. In an IDP context, it's the difference between a developer searching for "payment processor" and finding the checkout-service, billing-api, and stripe-gateway components — even though none of them are literally named "payment processor" — versus a keyword search that returns zero results because the exact string doesn't match any component name.
This matters especially for large catalogs and for developers who are new to the codebase. They don't know the internal naming conventions. They describe what they're looking for in plain English. Semantic search over vector embeddings bridges the gap between how developers think and how services are named.
On its own, semantic search is insufficient for an AI assistant — it retrieves candidates, but the Service Catalog determines which of those candidates are valid, owned, and safe to surface.
Section 3: Platform Data Types (The Context Sources)
Service Catalog Context
Service Catalog context is the structured metadata that lives in your catalog-info.yaml files and gets surfaced through the Backstage catalog API. Fields like owner, lifecycle, tier, tags, system, and dependsOn are machine-readable facts that give an LLM the authority to answer structural questions.
"Who owns the recommendations engine?" gets answered from the owner field. "Is this service production-ready?" gets answered from the lifecycle: production tag. "What services would be affected if the user-profile API went down?" gets answered from dependency relationships in the catalog graph. This data is already structured, already maintained (or should be), and it's the highest-signal context source you have. Poor catalog hygiene directly degrades AI output quality.
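Each of those structural questions reduces to a simple query over parsed catalog metadata. A minimal sketch, with illustrative entity names and an in-memory stand-in for the catalog API:

```python
# Sketch of answering structural questions from catalog metadata, assuming
# entities already parsed out of catalog-info.yaml. All names are illustrative.

catalog = {
    "user-profile":    {"owner": "team-identity", "lifecycle": "production",   "dependsOn": []},
    "checkout":        {"owner": "team-payments", "lifecycle": "production",   "dependsOn": ["user-profile"]},
    "recommendations": {"owner": "team-growth",   "lifecycle": "experimental", "dependsOn": ["user-profile"]},
}

def owner_of(name):
    """'Who owns X?' is a field lookup, not a reasoning problem."""
    return catalog[name]["owner"]

def impacted_by(name):
    """'What breaks if X goes down?' walks dependency edges in reverse."""
    return sorted(e for e, meta in catalog.items() if name in meta["dependsOn"])
```

The answers are deterministic because the facts are structured; the LLM's job is only to phrase them, which is exactly why catalog hygiene dominates output quality here.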
TechDocs Context
TechDocs context is unstructured markdown documentation that lives alongside your service code and gets rendered in Backstage TechDocs. It answers the "how" questions that structured catalog metadata can't: how to run the service locally, how to interpret a specific error code, how to onboard to the payments team's workflow.
When ingested into a RAG system, TechDocs pages get chunked (typically into 512-token segments with overlap), embedded, and indexed against their source entity. A developer asking "What does a 503 from the auth service usually mean?" should retrieve the relevant troubleshooting section from the auth service's TechDocs, not a generic HTTP guide. The specificity of the retrieval depends entirely on how well TechDocs are written and tagged. Vague documentation produces vague answers.
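The chunking step above can be sketched as a sliding window. This version approximates tokens by whitespace-separated words to stay self-contained; production pipelines chunk by real tokenizer tokens and tag each chunk with its source entity.

```python
# A minimal chunker matching the 512-token-with-overlap pattern described
# above, using words as a stand-in for tokens. Illustrative only.

def chunk(text, size=512, overlap=64):
    """Split text into fixed-size word windows that overlap at the seams."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already covers the tail
    return chunks

pieces = chunk(" ".join(f"w{i}" for i in range(1000)))
# 1000 words with size=512 and overlap=64 yields three overlapping chunks
```

The overlap exists so that a sentence split at a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that straddle a seam.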
Operational Context
Operational context is real-time data injected at query time rather than pre-indexed into a vector database. It includes current PagerDuty on-call schedules, Kubernetes pod health and restart counts, recent Argo CD deployment status, open Jira incidents, and GitHub Actions build logs.
This data changes too fast for batch indexing to keep up. Instead, you pull it live via API calls triggered by the query itself. A developer asking "Why is checkout slow right now?" needs the current K8s resource utilization for the checkout pods, not the documentation about checkout's architecture. Mixing pre-indexed catalog and TechDocs context with real-time operational context is what separates a genuinely useful AI assistant from a documentation search engine.
Operational context informs decisions; it does not imply automated remediation unless explicitly authorized. Observing live state and acting on it are separate trust boundaries.
Golden Path Context
Golden path context comes from your Backstage Scaffolder templates, the opinionated, pre-approved patterns your platform team maintains for creating new services, adding CI/CD pipelines, or spinning up databases. This context feeds the AI's code generation and workflow guidance capabilities.
When a developer asks "How do I create a new Python microservice that follows our standards?" the answer shouldn't come from a generic tutorial. It should come from your actual Scaffolder template, including your team's specific conventions around naming, logging configuration, health check endpoints, and observability setup. Golden path context ensures that AI-assisted code generation produces output that passes your internal review standards on the first attempt.
Section 4: Agentic Capabilities
What Is Agentic Context Injection?
Agentic context injection is the dynamic process by which an AI system decides which data sources to query based on the intent of the user's question, rather than fetching a fixed set of context for every request. It's the difference between a system that always retrieves the top-10 catalog entries regardless of the question, and a system that recognizes "my build is failing" as a signal to pull CI/CD logs, not architecture documentation.
A well-designed agentic system routes queries through an intent classifier first. Questions about ownership route to the catalog API. Questions about procedures route to TechDocs embeddings. Questions about current system state trigger live operational data calls. This routing logic is itself a form of engineering. It determines response latency, token cost, and answer relevance simultaneously.
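A deterministic, auditable version of that router can be as plain as keyword matching; the route names and keyword lists below are illustrative stand-ins for a real intent classifier.

```python
# Minimal keyword-based intent router: a deterministic, inspectable stand-in
# for the classifier described above. Routes and keywords are illustrative.

ROUTES = {
    "catalog":     ["owner", "owns", "depends", "tier", "lifecycle"],
    "techdocs":    ["how do i", "runbook", "procedure", "error code"],
    "operational": ["right now", "failing", "slow", "on-call", "incident"],
}

def route(query):
    """Map a query to the context source that should serve it."""
    q = query.lower()
    for intent, keywords in ROUTES.items():
        if any(k in q for k in keywords):
            return intent
    return "techdocs"  # safe default: documentation search
```

A rule table like this is easy to audit and to permission-check per route, which is precisely the property the next paragraph argues production systems need; a learned classifier trades some of that inspectability for coverage.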
Without strict boundaries, agentic retrieval increases blast radius: every additional tool or data source expands what the system can surface or misuse. Intent routing must be auditable, deterministic, and permission-aware to be safe in production.
Tool Use and Function Calling
Function calling is the capability that allows an LLM to request the execution of a predefined function, a structured API call, rather than generating a text answer directly. The model outputs a JSON object specifying which function to call and with which parameters; your application executes the call and feeds the result back to the model.
For IDP AI assistants, function calling turns the LLM into an active participant in your platform's API surface. Instead of the model trying to recall what it knows about a service's on-call engineer, it calls get_oncall_for_service(service_id="checkout"), gets a live response from PagerDuty, and incorporates that response into its answer. Functions you'd expose typically include catalog entity lookup, TechDocs page retrieval, incident history queries, and deployment status checks. The LLM becomes a reasoning layer over your actual infrastructure data.
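The loop looks like this in miniature: the model emits a JSON tool call instead of prose, the application dispatches it, and the result goes back into the conversation. The JSON shape mirrors common LLM tool-calling APIs but is simplified; the PagerDuty lookup is a hypothetical stub.

```python
# Sketch of the function-calling loop. The model output shape is a
# simplified stand-in for real LLM tool-calling APIs; the on-call lookup
# is a stub for a live PagerDuty query.
import json

def get_oncall_for_service(service_id):
    """Stand-in for a live PagerDuty schedule lookup."""
    return {"checkout": "alice@example.com"}.get(service_id, "unknown")

FUNCTIONS = {"get_oncall_for_service": get_oncall_for_service}

# What the model might emit instead of a text answer:
model_output = '{"name": "get_oncall_for_service", "arguments": {"service_id": "checkout"}}'

call = json.loads(model_output)
result = FUNCTIONS[call["name"]](**call["arguments"])
# result is fed back to the model, which phrases the final answer
```

Note that the application, not the model, executes the call: the dispatch table is the enforcement point for which functions exist and who may invoke them.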
What Is a System Prompt?
The system prompt is the foundational instruction block prepended to every conversation with the AI assistant. It defines the model's persona (a senior platform engineer, not a general assistant), its constraints ("only answer questions about services in this catalog"), its output format preferences, and its access permissions.
For a platform assistant, the system prompt is effectively a policy document. It specifies that the model should cite its sources, decline to speculate about services not in the catalog, and escalate ambiguous ownership questions to a human. A weak system prompt produces an assistant that will confidently make things up. A well-engineered system prompt is a first line of defense against the risks described in the next section. In practice, the system prompt is inseparable from access control. It should reflect the same RBAC assumptions as the IDP itself — otherwise the model’s behavior will drift from the platform’s security model.
Section 5: Quality and Risk Definitions
What Is LLM Hallucination?
Hallucination is when an LLM generates information that is factually incorrect but presented with full confidence. In a platform engineering context, hallucinations take a specific and damaging form: the model invents service names, fabricates runbook steps, cites non-existent on-call rotations, or describes API contracts that don't match the actual implementation.
The primary defense against hallucination is grounding (see above), combined with explicit system prompt instructions to cite sources. If the model's answer can't be traced to a specific catalog entity or TechDocs page, it shouldn't be trusted. Measuring hallucination rate by sampling model responses against the catalog is a useful quality metric for AI-enabled IDP rollouts.
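One cheap version of that sampling metric is a lexical grounding check: scan sampled answers for service-like identifiers and flag any that don't exist in the catalog. The known-entity set and the kebab-case naming convention below are assumptions for illustration; a real check would use the catalog API and your own naming rules.

```python
# A rough grounding check: flag kebab-case identifiers in an answer that
# aren't registered in the catalog. Entity names and the naming convention
# are illustrative assumptions.
import re

KNOWN_ENTITIES = {"checkout-service", "billing-api", "auth-service"}

def uncited_entities(answer):
    """Return service-like names mentioned in the answer but absent from the catalog."""
    mentioned = set(re.findall(r"\b[a-z]+(?:-[a-z]+)+\b", answer))
    return mentioned - KNOWN_ENTITIES

flags = uncited_entities(
    "Restart checkout-service and page the owner of payments-gateway."
)
# payments-gateway is not in the catalog, so it gets flagged
```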
What Is Context Drift?
Context drift is the discrepancy between the data the AI has indexed and the actual current state of your infrastructure. A TechDocs page describing the old three-tier deployment model that your team migrated away from six months ago is a context drift problem. A catalog entry with a stale owner field pointing to a team that was reorganized is another.
Context drift is not a one-time fix. It's an ongoing operational concern. The mitigation is a combination of automated re-indexing (triggering embedding updates when catalog-info.yaml files change) and documentation standards that treat TechDocs as a first-class engineering artifact. An AI assistant is only as current as the data it reads. If your catalog hygiene is poor, context drift will silently produce incorrect answers with no obvious signal that something is wrong.
What Is Context Poisoning?
Context poisoning occurs when low-quality, contradictory, or maliciously crafted documentation gets retrieved and influences the model's output. Two TechDocs pages for the same service that give conflicting deployment instructions will cause the model to blend them into a response that's confidently wrong. A poorly maintained runbook that describes a procedure deprecated two years ago is a context poisoning vector.
The solution is content governance: ownership requirements for every TechDocs page, last-reviewed timestamps surfaced in the catalog, and automated quality checks that flag documentation not updated in over 90 days. The AI doesn't discriminate between trusted and untrusted docs. The retrieval system surfaces whatever scores highest semantically. You own the quality of what gets indexed.
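The 90-day check is straightforward to automate, assuming each indexed page carries a last-reviewed timestamp. The field names and dates below are illustrative.

```python
# Sketch of the automated staleness check described above. Field names
# and dates are illustrative; real timestamps would come from the docs
# pipeline or git history.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)

def stale_pages(pages, now=None):
    """Return paths of pages whose last review is older than the cutoff."""
    now = now or datetime.now(timezone.utc)
    return [p["path"] for p in pages if now - p["last_reviewed"] > MAX_AGE]

pages = [
    {"path": "auth/runbook.md", "last_reviewed": datetime(2026, 2, 1, tzinfo=timezone.utc)},
    {"path": "auth/deploy.md",  "last_reviewed": datetime(2025, 6, 1, tzinfo=timezone.utc)},
]
flagged = stale_pages(pages, now=datetime(2026, 3, 5, tzinfo=timezone.utc))
```

Flagged pages can then be excluded from the retrieval index or surfaced to their owners, turning a silent poisoning vector into a visible backlog item.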
What Is Context Overreach?
Context overreach happens when you inject too much data into the prompt, including irrelevant retrieved chunks that dilute the signal and confuse the model. A developer asking about the auth service's rate limits doesn't need context from the billing service's TechDocs, even if billing is a downstream dependency. Retrieving ten chunks when three would suffice increases token cost, slows the response, and statistically introduces off-topic content that nudges the model toward a less precise answer.
The fix is tighter retrieval: stricter similarity thresholds, metadata filtering (retrieve only docs tagged to the queried service), and re-ranker models that score retrieved chunks for relevance before they enter the prompt. Context budgeting, deciding in advance how many tokens each source type is allowed to consume, is a practical starting point.
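Two of those fixes, similarity thresholds and metadata filtering, compose into a few lines. The candidate tuples and threshold value below are illustrative; a re-ranker model would replace the simple score sort.

```python
# Sketch of tightening retrieval before the prompt: drop chunks below a
# similarity threshold and keep only those tagged to the queried service.
# Candidate data and the threshold are illustrative.

def filter_candidates(candidates, service, threshold=0.75, k=3):
    """Apply a score floor and a service-tag filter, then keep the top k."""
    kept = [
        (score, tag, text)
        for score, tag, text in candidates
        if score >= threshold and tag == service
    ]
    kept.sort(reverse=True)  # highest similarity first
    return [text for _, _, text in kept[:k]]

candidates = [
    (0.91, "auth-service",    "Rate limits: 100 req/s per client."),
    (0.88, "billing-service", "Billing retries use exponential backoff."),
    (0.60, "auth-service",    "Auth service history and naming."),
]
selected = filter_candidates(candidates, "auth-service")
```

Note that the billing chunk is dropped despite scoring well: semantic similarity alone would have let it dilute the prompt, which is exactly the overreach failure described above.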
What Is Privilege Leakage in AI Systems?
Privilege leakage occurs when the AI assistant returns information about services, infrastructure, or documentation that the querying user shouldn't have access to, because the retrieval layer doesn't enforce the same Role-Based Access Controls (RBAC) as the IDP itself. A junior engineer asking a general question about "our database infrastructure" shouldn't receive details about the security team's secrets management service, even if that service's TechDocs scored highly in the semantic search results.
Preventing privilege leakage requires that your retrieval pipeline filters indexed documents by the user's Backstage permissions before returning results. It's not enough to apply RBAC at the catalog UI layer; the vector search results that feed the LLM must respect the same access policies. This is one of the most commonly overlooked security requirements in IDP AI implementations.
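The shape of that filter is simple even though the policy behind it is not: every vector search hit is checked against the user's entitlements before any text reaches the prompt. The permission model below is deliberately simplified and every name is illustrative; a real implementation would call the Backstage permission framework.

```python
# Sketch of permission-aware retrieval: search hits are filtered by the
# user's entitlements before entering the prompt. The permission model
# and all names are illustrative.

USER_PERMISSIONS = {
    "junior-eng":   {"public", "engineering"},
    "security-eng": {"public", "engineering", "security"},
}

def authorized_hits(hits, user):
    """Drop any retrieved document the querying user may not see."""
    allowed = USER_PERMISSIONS.get(user, {"public"})
    return [h for h in hits if h["visibility"] in allowed]

hits = [
    {"doc": "postgres-standards.md", "visibility": "engineering"},
    {"doc": "secrets-manager.md",    "visibility": "security"},
]
visible = authorized_hits(hits, "junior-eng")
```

The critical design point is where the filter sits: after retrieval but before prompt assembly, so restricted text never exists anywhere the model can read it.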
What Are Implicit Trust Chains?
An implicit trust chain forms when a document retrieved as context itself references other documents — runbooks, architecture decision records, external wikis — that are outdated, incorrect, or not indexed by the retrieval system. The model reads the retrieved doc, which cites "the standard deployment procedure in the ops runbook," but the ops runbook lives in Confluence and isn't indexed. The model either ignores the reference, invents what it thinks the runbook says, or generates an incomplete answer.
Auditing your documentation for external references and either bringing those references into your indexed corpus or explicitly removing the links is a necessary part of context engineering. Every document in your retrieval index is implicitly vouching for everything it cites.
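A first pass at that audit can be mechanical: extract link targets from each indexed document and report any that aren't in the index themselves. The index contents and link convention below are illustrative assumptions.

```python
# Sketch of the reference audit described above: find link targets in an
# indexed doc that fall outside the retrieval index. Index contents and
# the markdown-link convention are illustrative.
import re

INDEXED = {"techdocs/auth/runbook.md", "techdocs/auth/deploy.md"}

def unindexed_references(doc_text):
    """Extract markdown link targets and report those outside the index."""
    targets = set(re.findall(r"\]\(([^)]+)\)", doc_text))
    return {t for t in targets if t not in INDEXED}

doc = (
    "See the [deploy guide](techdocs/auth/deploy.md) and the "
    "[ops runbook](https://confluence.example.com/ops-runbook)."
)
dangling = unindexed_references(doc)
# the Confluence link is an implicit trust chain: cited but never indexed
```

Each dangling reference is a decision point: index the target, inline the relevant content, or remove the link.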
The pattern running through every definition in this glossary is simple: context engineering is now a core platform responsibility, not an AI feature bolted on at the edge. LLMs are capable of sophisticated reasoning, but they reason over whatever you give them. Platform teams that invest in clean catalogs, maintained TechDocs, and well-governed golden paths aren't just doing good hygiene. They're building the infrastructure that makes AI actually work.