Your IDP Is an AI Goldmine: How Internal Developer Platforms Enable Context Engineering
By David Tuite • March 19th, 2026
An on-call engineer gets paged at 3am. Checkout is degraded. They open their AI assistant and ask: "What services does checkout-api depend on, and who's on call for them?" The agent either hallucinates a plausible-sounding list of services that don't exist, or shrugs and admits it doesn't know.
The LLM can reason about dependency graphs in the abstract, but it can't tell you anything about a dependency graph it has never seen. The gap between what a modern LLM can do and what it actually knows about your specific systems is what kills AI adoption for anything beyond code generation.
The Two Categories of AI Tasks in Engineering Orgs
AI tasks that show up in engineering organizations usually split into two categories: generic and org-specific.
Generic tasks such as writing a unit test, suggesting a refactor, or explaining a regex work well out of the box because the underlying knowledge is universal. The model has seen thousands of similar examples in its training data.
Org-specific tasks are different. "Which team owns auth-gateway?" "Did payments-service deploy anything in the last four hours?" "What's the runbook for checkout-api queue consumer failures?" These questions require private, structural knowledge about your organization that no pre-trained model has and can't hallucinate accurately. The knowledge is specific, relational, and continuously changing.
Most teams try to close this gap by dumping documentation into a vector store and calling it RAG. Confluence pages, GitHub READMEs, and runbook docs get chunked, embedded, and retrieved at query time. This might work until the docs go stale (immediately), ownership information gets siloed in a wiki nobody maintains (also immediately), or the agent retrieves a plausible document that describes the system as it existed eighteen months ago. Unstructured documentation is a poor substrate for org-specific AI tasks. It has no canonical entity IDs, no typed relations, and no consistent update cadence. You end up with confident wrong answers, which are worse than no answers.
The most structured, continuously updated source of engineering context in your org is your Internal Developer Platform. Context engineering, the practice of deciding what data populates a model's context window at inference time, reframes the IDP as context infrastructure rather than a simple developer portal. The underlying knowledge is already there; exposing it is a matter of wiring it into the model's context.
What Your IDP Actually Knows (and Why That Data Is Rare)
A mature Backstage IDP maintains a layered graph of operational facts about every registered service. Each context layer maps to a different type of data:
1. Service catalog: Component, API, System, and Resource entity kinds carry spec.type, spec.lifecycle, and metadata.tags for tech stack metadata. spec.owner links every component to the team or group accountable for it. This alone answers a class of AI queries ("who owns this service?", "which services are in production lifecycle?") that most agents can't currently handle.
2. Ownership graph: The traversal spec.owner > Group entity > spec.members gives you a directed chain from any service name to a list of actual humans or an on-call rotation. When a PagerDuty plugin is attached, the group entity can resolve directly to an active incident responder, not just a team name.
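As a sketch, assuming the Group entities have already been fetched (for example via GET /api/catalog/entities?filter=kind=group), the traversal reduces to a small lookup. The function and argument names here are illustrative, not part of any Backstage SDK:

```python
def resolve_members(component: dict, groups: dict) -> list[str]:
    """Walk spec.owner -> Group entity -> spec.members.

    `groups` maps group names to Group entity dicts, pre-fetched
    from the Catalog API.
    """
    owner_ref = component.get("spec", {}).get("owner", "")
    # Owner refs look like "group:default/team-payments" or just "team-payments"
    group_name = owner_ref.split("/")[-1]
    group = groups.get(group_name, {})
    return group.get("spec", {}).get("members", [])
```

With a PagerDuty plugin attached, the same chain can terminate at the active responder instead of the static member list.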
3. Dependency map: spec.dependsOn, spec.providesApis, and spec.consumesApis form a queryable directed graph of service-to-service relationships. This is the data an AI agent needs to answer "what else does this change affect?" during change-impact analysis or incident scope assessment.
4. Deployment history: GitHub Actions, Tekton, and ArgoCD Backstage plugins surface deploy metadata as catalog annotations (backstage.io/last-deploy-timestamp, commit SHA, deploying user). An agent with access to this data can answer "did anything deploy to checkout-api in the past 6 hours?" without needing to query GitHub directly.
5. Incident data: PagerDuty and Opsgenie plugins embed open incident counts, on-call rotation names, and service health thresholds as entity annotations. This is the difference between an agent that helps triage and one that produces noise.
All of this data is machine-readable by design. Every piece of it exists as structured YAML at the source, delivered as typed JSON through the Backstage Catalog API. Contrast that with a Confluence page about checkout-api, which might contain some of this information, written in prose, last updated whenever someone remembered to do it. The IDP version is authoritative, entity-keyed, and alive.
Proprietary portals that don't expose catalog entities through a consistent data structure lack the structural backbone that makes this data tractable as an AI context source.
Context Engineering: What Goes Into the Window and Why It Matters
Unlike prompt engineering, which focuses on how you phrase a request, context engineering is the set of decisions that determine what the model sees before it generates anything at all: what data to retrieve, how to structure it, when to inject it, and how much to trim.
For an AI agent operating in a production engineering org, IDP data maps to four distinct context types, each relevant to a different query class:
- Factual context (ownership, lifecycle, tech stack) answers "who owns this" and "what kind of thing is this"
- Relational context (dependency maps) answers "what else is affected" and "what does this call"
- Historical context (deployment events, incident records) answers "what changed recently" and "has this broken before"
- Procedural context (runbooks, ADRs, TechRadar entries linked to catalog entities) answers "how do we handle this"
The architectural advantage of IDP data over unstructured docs is precision. Catalog entities have canonical identifiers and typed relations, which means retrieval can combine semantic search with structured filtering. A vector similarity search can surface the relevant entity description, while the entity name, namespace, and relations ensure the agent retrieves the correct checkout-api component, not just a document that happens to mention checkout in passing. Semantically similar isn't good enough; the context needs to resolve to the exact service entity.
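A toy sketch of that hybrid retrieval, using term overlap as a stand-in for embedding similarity (the chunk schema with an entity_ref key is a design assumption for this example, not a Backstage construct):

```python
def retrieve(query_entity: str, query_terms: set[str],
             chunks: list[dict], k: int = 3) -> list[dict]:
    """Hybrid retrieval: hard-filter on the canonical entity ref,
    then rank survivors by a toy similarity score."""
    # Structured filter first: only chunks keyed to the exact entity survive,
    # so a doc that merely mentions "checkout" never wins.
    candidates = [c for c in chunks if c["entity_ref"] == query_entity]

    # Then rank semantically; naive term overlap stands in for
    # cosine similarity over embeddings.
    def score(chunk: dict) -> int:
        return len(query_terms & set(chunk["text"].lower().split()))

    return sorted(candidates, key=score, reverse=True)[:k]
```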
Three Patterns for Wiring Your IDP to an AI Agent
The following three patterns describe different ways of consuming the Backstage Catalog API as an underlying data source. Each uses a different retrieval mechanism and requires distinct infrastructure.
Pattern A: RAG Over Catalog Entities
Embed catalog entity descriptors as structured text chunks and store them in a vector index, such as pgvector or a hosted vector service (use what you already have). Retrieve the relevant chunks at query time and inject them into the system prompt. LangChain and LlamaIndex both have straightforward document loader patterns for this.
The chunking strategy matters. Avoid embedding an entire entity as a single chunk. Instead, split entities by context type: a facts chunk (name, owner, lifecycle, description), a dependencies chunk (dependsOn, providesApis, consumesApis), and an incident/deployment chunk (annotations). This produces three embeddings per entity, all keyed to the same entity name, and enables more precise retrieval when a query targets dependencies rather than ownership.
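One possible implementation of this split, with illustrative chunk text formats (the exact serialization is a design choice, not a Backstage convention):

```python
def entity_to_chunks(entity: dict) -> list[dict]:
    """Split one catalog entity into three retrieval chunks,
    all keyed to the same canonical entity name."""
    meta, spec = entity.get("metadata", {}), entity.get("spec", {})
    name = meta.get("name", "unknown")

    facts = (f"{name}: {meta.get('description', '')} "
             f"owner={spec.get('owner')} lifecycle={spec.get('lifecycle')}")
    deps = (f"{name} dependsOn={spec.get('dependsOn', [])} "
            f"provides={spec.get('providesApis', [])} "
            f"consumes={spec.get('consumesApis', [])}")
    ops = f"{name} annotations={meta.get('annotations', {})}"

    return [
        {"entity": name, "kind": "facts", "text": facts},
        {"entity": name, "kind": "dependencies", "text": deps},
        {"entity": name, "kind": "operational", "text": ops},
    ]
```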
Best for: Read-only Q&A at scale ("list all services owned by team-payments", "which services are in experimental lifecycle?").
Trade-off: index freshness. If you don't wire catalog change events to index updates, your RAG pipeline will drift from the live catalog.
Pattern B: MCP Server Wrapping the Backstage Catalog API
Run a Model Context Protocol (MCP) server that wraps the Backstage Catalog API and exposes catalog operations as agent-callable tools. MCP, the open protocol Anthropic released in November 2024, defines a standard for exposing external systems in exactly this way, allowing agents to fetch fresh catalog data during inference.
The MCP server translates existing Catalog API endpoints into tool definitions. For example:
| MCP Tool | Backstage Endpoint |
|---|---|
| get_component_by_name | GET /api/catalog/entities/by-name/component/{namespace}/{name} |
| list_entities_by_owner | GET /api/catalog/entities?filter=spec.owner={team} |
| get_entity_relations | GET /api/catalog/entities/by-name/{kind}/{namespace}/{name} |
When the agent receives a query, it can call these tools dynamically and retrieve fresh catalog data during reasoning.
For example, answering a question like "Which services owned by team-payments have dependencies on checkout-api?" might involve multiple tool calls:
- Query services owned by the team
- Retrieve each entity's dependency relations
- Filter those that reference checkout-api
An MCP server orchestrates those API calls while exposing them to the agent as simple tools.
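The traversal logic itself is simple once the entities are in hand. Here is an illustrative pure-Python version of the three steps above, operating on entity JSON shaped like the Catalog API's responses (the function name is made up for this sketch):

```python
def services_depending_on(entities: list[dict], owner: str, target: str) -> list[str]:
    """Multi-hop sketch: filter entities by owner, then keep those
    whose spec.dependsOn references the target component."""
    result = []
    for e in entities:
        spec = e.get("spec", {})
        # Hop 1: restrict to the owning team.
        if spec.get("owner") != owner:
            continue
        # Hop 2: normalize refs like "component:default/checkout-api".
        deps = [d.split("/")[-1] for d in spec.get("dependsOn", [])]
        # Hop 3: keep only entities that reference the target.
        if target in deps:
            result.append(e["metadata"]["name"])
    return result
```

In the MCP setup, each hop would be a tool call against the live catalog rather than a loop over a pre-fetched list.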
Best for: Multi-step agentic workflows that need to traverse the service graph dynamically.
Trade-off: Every tool call translates into a Catalog API request, so complex queries can introduce additional latency compared to pre-indexed retrieval.
Pattern C: Direct Function-Tool Definitions
Define Catalog API endpoints as function tools directly in your OpenAI or Claude API call. No additional infrastructure. The agent calls the tool during inference, fetches entity data, and incorporates it into its response.
Here's a minimal tool definition for get_component_by_name:
```json
{
  "name": "get_component_by_name",
  "description": "Retrieve a Backstage catalog entity for a named service component. Returns ownership, dependencies, lifecycle status, runbook URL, and deployment metadata.",
  "parameters": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "The component name as registered in the Backstage catalog (e.g., 'checkout-api', 'payments-service')"
      },
      "namespace": {
        "type": "string",
        "description": "The Backstage namespace, defaults to 'default'",
        "default": "default"
      }
    },
    "required": ["name"]
  }
}
```
The implementation calls GET /api/catalog/entities/by-name/component/{namespace}/{name} against your Backstage instance.
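A minimal stdlib-only sketch of that implementation; the base URL and token handling are placeholders for your environment:

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: your Backstage instance URL.
BACKSTAGE_URL = "https://your-backstage.example.com"


def entity_url(name: str, namespace: str = "default") -> str:
    """Build the by-name Catalog API URL for a component."""
    return (f"{BACKSTAGE_URL}/api/catalog/entities/by-name/"
            f"component/{quote(namespace)}/{quote(name)}")


def get_component_by_name(name: str, namespace: str = "default",
                          token: str = "") -> dict:
    """Fetch the entity JSON; this is the body of the function tool."""
    req = urllib.request.Request(
        entity_url(name, namespace),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The agent framework dispatches the tool call to get_component_by_name and feeds the returned JSON back into the conversation.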
Best for: Shipping a proof of concept today. No new infrastructure, real-time data, works with any LLM that supports function calling.
Trade-off: You're making an API call per query. At high volume, Pattern A's pre-indexed retrieval will be faster.
Choosing the right pattern
Start with Pattern C today. Graduate to Pattern B as your agentic workflows get more complex, particularly when they require multi‑hop traversal. Pattern A is the right call if you're building read-heavy Q&A at scale and want to minimize per-query API latency.
The table below summarizes the key trade‑offs between the three patterns.
| Pattern | Freshness | Latency | Infra Complexity | Best For |
|---|---|---|---|---|
| A: RAG over catalog entities | Index refresh cadence | Low (pre-indexed) | Medium (embedding pipeline + vector store) | Read-only Q&A at scale |
| B: MCP server | Real-time | Higher (per-hop API call) | High (MCP server) | Multi-step agentic workflows |
| C: Direct function tools | Real-time | Medium (per-query API) | Low (none) | Zero-infra proof of concept |

Building the Pipeline: From catalog-info.yaml to Context String
In the pipeline from IDP to AI agent, Backstage acts as the system of record, where services are described in catalog-info.yaml with structured relationships and metadata. The Catalog API provides a queryable interface over that data, returning entity definitions as JSON. From there, the pipeline converts those entities into smaller, purpose-built context representations — either as embeddings for retrieval or as structured responses returned through tool calls.
A typical catalog-info.yaml in a production Backstage instance can define metadata such as its name, description, owner, dependencies, and the APIs it provides.
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-api
  description: "Handles payment checkout flows and queue consumer processing"
  annotations:
    pagerduty.com/service-id: "P1234XY"
    backstage.io/last-deploy-timestamp: "2025-01-14T22:31:00Z"
    runbook-url: https://runbooks.internal/checkout-api
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: payments-platform
  dependsOn:
    - component:default/payments-service
    - component:default/inventory-api
    - resource:default/checkout-queue
  providesApis:
    - checkout-api-v2
  consumesApis:
    - payments-processing-api
```
The Catalog API surfaces the YAML as a JSON entity at GET /api/catalog/entities/by-name/component/default/checkout-api. The fields your agent needs are in metadata, spec, and metadata.annotations. Keep in mind that while catalog-info.yaml is the most common way to declare entities, the catalog can also be populated from other data sources, such as AWS, and enriching an entity from multiple sources often gives the AI agent richer context.
Here's a Python function that takes the entity JSON and returns a structured context string ready to inject into a system prompt or retrieve from a vector index:
```python
def parse_entity_ref(ref: str) -> str:
    """
    Convert Backstage entity refs like:
        'component:default/payments-service'
    into just:
        'payments-service'
    """
    try:
        return ref.split("/")[-1]
    except Exception:
        return ref


def entity_to_context_string(entity: dict) -> str:
    metadata = entity.get("metadata", {})
    spec = entity.get("spec", {})
    annotations = metadata.get("annotations", {})

    name = metadata.get("name", "unknown")
    description = metadata.get("description", "No description provided")
    owner = spec.get("owner", "UNKNOWN — ownership gap")
    system = spec.get("system", "UNKNOWN — system unassigned")
    lifecycle = spec.get("lifecycle", "unknown")

    depends_on = [parse_entity_ref(r) for r in spec.get("dependsOn", [])]
    provides_apis = [parse_entity_ref(r) for r in spec.get("providesApis", [])]
    consumes_apis = [parse_entity_ref(r) for r in spec.get("consumesApis", [])]

    runbook = annotations.get("runbook-url", "No runbook linked")
    last_deploy = annotations.get("backstage.io/last-deploy-timestamp", "No deploy data")
    # Extracted for incident-routing tools; intentionally not part of the
    # context string below.
    pagerduty_id = annotations.get("pagerduty.com/service-id", "No PagerDuty link")

    return f"""Service: {name}
Description: {description}
Owner: {owner}
System: {system}
Lifecycle: {lifecycle}
Dependencies: {', '.join(depends_on) if depends_on else 'None recorded'}
Provides APIs: {', '.join(provides_apis) if provides_apis else 'None recorded'}
Consumes APIs: {', '.join(consumes_apis) if consumes_apis else 'None recorded'}
Last Deploy: {last_deploy}
Runbook: {runbook}
"""
```
Example: What the Model Actually Sees
When the agent retrieves context for a service like checkout-api, the information injected into the model’s context window is a structured block derived from the catalog entity, not the raw YAML or JSON.
A typical context injection might look like this:

```
Service: checkout-api
Description: Handles payment checkout flows and queue consumer processing
Owner: team-payments
System: payments-platform
Lifecycle: production
Dependencies: payments-service, inventory-api, checkout-queue
Provides APIs: checkout-api-v2
Consumes APIs: payments-processing-api
Last Deploy: 2025-01-14T22:31:00Z
Runbook: https://runbooks.internal/checkout-api
```
This block is small enough to fit comfortably inside an LLM context window while still giving the model the critical operational facts it needs to answer questions like:
"Who owns checkout-api?"
"What services might be affected if checkout-api fails?"
"Did anything deploy recently that could explain this incident?"
For Pattern A (RAG), different slices of this information are typically embedded separately: one embedding for service facts, one for dependency relations, and one for operational history. The retrieval layer can then return only the context relevant to the user's question.
The model reasons over authoritative service metadata pulled directly from the IDP catalog, rather than relying on inference or approximation.
If you're running on Roadie, the Catalog API is already available at a stable, authenticated endpoint. Roadie also ships the AI Assistant RAG plugin, which implements the embedding, indexing, and retrieval layer of Pattern A out of the box. If your org is already on Roadie, the pipeline in this section is largely already running; you only need to connect your LLM endpoint to it instead of building the chunking infrastructure from scratch.
Operational Hygiene: Your Context Is Only as Good as Your Catalog
Incomplete catalog data creates predictable failure patterns when used as AI context. Here’s a typical example: an AI agent pages the wrong team during an incident because spec.owner was missing from a catalog entity, and the agent fell back to a default or hallucinated a plausible owner name.
To avoid this class of failure, a catalog needs to meet three completeness requirements before it’s safe to use as an AI context source:
spec.owner must be populated on every component: Unenforced ownership means the agent has no escalation path. An agent that can't answer "who owns this service?" is useless for incident triage and on-call routing, which are the two highest-value use cases for real-time IDP context.
metadata.description must be non-empty: Empty descriptions degrade embedding quality and cause false-positive retrievals in Pattern A. A query for "checkout flow services" can return inventory-api simply because its description is empty, effectively turning it into a wildcard candidate during retrieval.
System relations must be defined: Without spec.system, the dependency graph is a set of disconnected nodes. An agent trying to answer "what other services are in the same system as checkout-api?" can't traverse a graph that doesn't have system edges. This matters for change-impact analysis, which needs to understand blast radius within a system boundary.
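For illustration, the system-boundary traversal an agent needs here can be sketched as follows (entity shapes follow the Catalog API JSON; the function name is made up):

```python
def system_peers(entities: list[dict], name: str) -> list[str]:
    """Return other components in the same spec.system as `name`.
    Returns [] when the system edge is missing, which is exactly
    the failure mode described above."""
    by_name = {e["metadata"]["name"]: e for e in entities}
    system = by_name.get(name, {}).get("spec", {}).get("system")
    if not system:
        return []  # no system edge: the graph cannot be traversed
    return [n for n, e in by_name.items()
            if n != name and e.get("spec", {}).get("system") == system]
```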
Beyond completeness, two operational concerns apply regardless of which pattern you use:
Access control: AI agents query the catalog on behalf of users. Backstage's permission framework must be enforced at the API layer so agents can't surface catalog data that the requesting user isn't authorized to see. Don't skip this step just because the agent interface feels informal.
Catalog freshness: Catalog auto-sync must be wired to source change events such as SCM push hooks and CI completion events, not nightly batch jobs. Deployment history and incident annotations are time-sensitive. A last-deploy timestamp from eight hours ago is misleading context during an active incident. Every hour of staleness widens the hallucination window on operational queries.
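A sketch of the event-driven half, assuming a webhook receiver that maps repository names to entity refs and posts the result to Backstage's catalog refresh endpoint (the repo-to-entity mapping is an assumption about your setup, and you should verify the refresh endpoint against your Backstage version):

```python
from typing import Optional

def refresh_payload(repo_full_name: str,
                    repo_to_entity: dict) -> Optional[dict]:
    """Map an SCM push event to a catalog refresh request body.

    `repo_to_entity` is an assumed lookup from repository name to
    catalog entity ref, maintained by your platform team. The caller
    POSTs the returned body to {BACKSTAGE_URL}/api/catalog/refresh.
    """
    entity_ref = repo_to_entity.get(repo_full_name)
    if entity_ref is None:
        return None  # unknown repo: nothing to refresh
    return {"entityRef": entity_ref}
```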
That said, wiring SCM-triggered catalog refresh reliably is harder in practice than it sounds. Sync failures, webhook misconfigurations, and integration drift are recurring operational costs of self-hosted Backstage. On Roadie's SaaS platform, SCM-triggered catalog refresh and entity validation are managed infrastructure, which removes the cost of chasing stale-catalog problems caused by sync failures. This is a concrete build-vs-buy consideration for teams deciding where to invest engineering time.
Start Here: Audit Your Catalog for AI Context Readiness Today
To get started auditing your catalog for AI context readiness, here are three steps you can complete before end of day. No new AI infrastructure required.
Step 1: Run a completeness audit. Hit GET /api/catalog/entities?filter=kind=component against your Backstage instance and pipe the response through this script:
```python
import requests
from collections import defaultdict

BACKSTAGE_URL = "https://your-backstage.example.com"
TOKEN = "your-backstage-token"


def audit_catalog_completeness():
    url = f"{BACKSTAGE_URL}/api/catalog/entities?filter=kind=component"
    headers = {"Authorization": f"Bearer {TOKEN}"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    entities = response.json()
    if not entities:
        print("No components found in the catalog.")
        return

    gaps = defaultdict(list)
    for entity in entities:
        name = entity["metadata"]["name"]
        spec = entity.get("spec", {})
        metadata = entity.get("metadata", {})
        if not spec.get("owner"):
            gaps["missing_owner"].append(name)
        if not spec.get("system"):
            gaps["missing_system"].append(name)
        if not metadata.get("description"):
            gaps["missing_description"].append(name)

    total = len(entities)
    print(f"\nCatalog Completeness Audit — {total} components\n")
    for gap_type, names in gaps.items():
        pct = len(names) / total * 100
        print(f"{gap_type}: {len(names)} entities ({pct:.1f}%)")
        for name in names[:5]:
            print(f"  - {name}")
        if len(names) > 5:
            print(f"  ... and {len(names) - 5} more")


audit_catalog_completeness()
```
This tells you exactly how much context debt you're sitting on. If 30% of your components are missing spec.owner, that's 30% of the queries an AI agent handles about ownership that will produce wrong or empty answers.
Step 2: Fix the gaps on your top 10 most critical services first. Define "most critical" as the highest deploy frequency, most upstream dependents, or most incident-prone, whichever your team can quantify. These are the services an AI agent will be asked about most often. A complete catalog entry for checkout-api is worth more than partial entries for 50 internal tools nobody queries.
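If you want to make that prioritization mechanical, here is a naive scoring sketch. The field names and weights are illustrative placeholders for whichever signals your team actually tracks:

```python
def rank_critical(services: list[dict], top_n: int = 10) -> list[str]:
    """Rank services by a naive criticality score combining deploy
    frequency, upstream dependents, and recent incident count."""
    def score(s: dict) -> float:
        # Weights are illustrative, not a recommendation; tune to taste.
        return (s.get("deploys_per_week", 0) * 1.0
                + s.get("upstream_dependents", 0) * 2.0
                + s.get("incidents_90d", 0) * 3.0)
    ranked = sorted(services, key=score, reverse=True)
    return [s["name"] for s in ranked[:top_n]]
```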
Step 3: Write and test one function-tool definition. Take the JSON tool definition from Pattern C above, attach it to a Claude or OpenAI Playground session, and ask a real question such as "Who is on call for payments-service?" or "What does checkout-api depend on?" If the catalog entry is complete, the answer will be correct. If the answer is wrong or empty, the output will point to the gap in the catalog data, such as a missing spec.owner, an empty spec.dependsOn, or a PagerDuty annotation that hasn’t been set.
The catalog completeness audit is the forcing function here. A RAG pipeline built on top of an incomplete catalog cannot reliably produce accurate answers. Getting the data right comes first. The context pipeline is the easy part.
Frequently Asked Questions
What is context engineering for AI agents?
Context engineering is the discipline of deciding what data populates a model's context window at inference time: what to retrieve, how to structure it, when to inject it, and how much to trim. Unlike prompt engineering, which focuses on how you phrase a request, context engineering controls what the model sees before generating a response. This means wiring structured IDP data, including ownership graphs, dependency maps, and deployment history, directly into agent queries.
Why is an IDP better than Confluence for AI context?
An Internal Developer Platform like Backstage stores engineering knowledge as structured, entity-keyed, machine-readable YAML. Every component has a canonical ID, typed relations (spec.dependsOn, spec.owner), and is updated continuously via SCM and CI integrations. Confluence pages go stale immediately, have no canonical entity IDs, and contain no typed relations. Structured IDP data produces grounded, accurate answers.
Which pattern should I start with: RAG, MCP, or function tools?
Pattern C (direct function-tool definitions) is usually the best place to start. It requires no new infrastructure, delivers real-time Catalog API data, and works with any LLM that supports function calling. You can ship a working proof of concept today. Graduate to Pattern B (MCP server) when your agentic workflows need multi-hop catalog traversal. Pattern A (RAG over catalog entities) is a better fit when you need low-latency, read-only Q&A at scale.
Final Thoughts
Every engineering org with a functioning IDP already has the context to make org-specific AI tasks tractable. Ownership graphs, dependency maps, deployment history, and incident annotations are already present as structured, entity-keyed data that updates continuously. Bridging the gap between "we have an IDP" and "our AI agent knows our system" is largely a matter of wiring.
The three patterns above provide a concrete path from the Backstage Catalog API to a grounded AI agent, whether you want to ship something quickly (Pattern C), build a scalable retrieval pipeline (Pattern A), or support multi-hop agentic workflows (Pattern B). The catalog completeness requirements define the data quality bar that makes any of these patterns reliable in production.
Your IDP's catalog is already the most accurate, continuously-updated map of your engineering org, making it a practical foundation for AI agents’ context infrastructure.
If you're running Backstage and want the Catalog API, entity sync, and AI Assistant RAG pipeline without the self-hosted maintenance overhead, Roadie's managed platform ships all three. See how Roadie turns your IDP into a context engine.