You've watched an AI assistant confidently name the wrong team as the owner of a service that just fired an alert. You've seen it suggest an architectural change that ignores a hard dependency your team has known about for two years. The Stack Overflow Developer Survey 2024 found 76% of developers are already using or planning to use AI tools, and McKinsey research found that developers can complete some coding tasks up to twice as fast with generative AI. That adoption pressure means teams are buying AI platforms before they've developed the evaluation criteria to distinguish genuinely useful tooling from a well-marketed wrapper around a generic LLM.
If you want to determine whether an AI platform will be useful in production, evaluate what context it can access when it generates output. Code generation quality, IDE integrations, and supported languages are all downstream of that. A platform with no access to your service catalog, team ownership model, and deployment history produces answers whose accuracy ceiling is generic internet knowledge. These answers sound authoritative but reflect a system no one runs. Context engineering is the architecture layer that sets the ceiling on what any amount of prompt optimization can achieve.
This checklist runs across five categories and 15 questions. A "yes" to each means the platform treats context as infrastructure. Bring it to your next vendor call, and you'll immediately see the difference between a context-rich engineering platform and an AI chat interface bolted onto a proprietary portal.
Category 1: Service Catalog Completeness
Q1: Does the platform index your full service catalog, including component kind, spec.type, system membership, and API definitions, or does it index source code repositories only?
Most AI coding assistants operate within repository context alone. If your catalog entity knows that payment-api is a spec.type: service within the checkout system and exposes a specific OpenAPI definition, an AI query can contextualize recommendations at the system level. A platform limited to repository indexing can only draw on codebase content, and questions about service topology (which team owns a downstream dependency, what SLA that service exposes) require catalog entity data that lives outside any repository.
Q2: Can AI queries traverse the entity graph (for example, owner > system > component > API > dependency), or is catalog access a flat key-value lookup?
Graph traversal is the capability that separates structural answers from lookups. A flat lookup can tell you who owns payment-api. Graph traversal can tell you which team owns the service that payment-api depends on, what that service's SLA is, and where its TechDocs live, all in a single query. Roadie's Catalog Graph plugin models these relationships explicitly: entities link via YAML-defined relations (ownedBy, dependsOn, consumesApi, providesApi), and the CatalogGraphPage lets you filter by kind and relation type to any configured depth.
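To make the distinction concrete, here is a minimal Python sketch of the two query styles over a tiny in-memory relation table. The entity names (ledger-service, finance-platform) and the follow helper are hypothetical and exist only for illustration. A real catalog would serve these relations from its API, not from a dict.

```python
# Hypothetical relation table: entity ref -> list of (relation type, target ref).
# Relation names mirror the ones named above; the entities are made up.
RELATIONS = {
    "component:default/payment-api": [
        ("ownedBy", "group:default/payments"),
        ("dependsOn", "component:default/ledger-service"),
    ],
    "component:default/ledger-service": [
        ("ownedBy", "group:default/finance-platform"),
    ],
}


def follow(start, path):
    """Follow a sequence of relation types from `start`; return the entities reached."""
    frontier = [start]
    for relation in path:
        frontier = [
            target
            for ref in frontier
            for rel, target in RELATIONS.get(ref, [])
            if rel == relation
        ]
    return frontier


# Flat lookup: who owns payment-api?
direct_owner = follow("component:default/payment-api", ["ownedBy"])

# Traversal: who owns the services payment-api depends on?
downstream_owner = follow("component:default/payment-api", ["dependsOn", "ownedBy"])
```

The flat lookup is a one-hop path. The downstream-owner question is the same function with a longer path, which is why a platform that only exposes single-hop lookups cannot answer it in one query.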
Q3: Is the catalog schema extensible, with support for custom entity kinds and metadata fields that AI can subsequently query against?
Your schema will evolve. Teams add compliance metadata, cost center annotations, SLO targets, and custom entity types that reflect their actual domain. If the schema is rigid, any organizational knowledge that doesn't fit the platform's data model becomes invisible to AI queries, no matter how carefully your teams encoded it. Roadie's catalog schema supports custom entity kinds and additional metadata fields, so domain-specific context stays queryable as your catalog grows.
Answering "which team owns the downstream service my payment API depends on?" requires traversing component and group entities linked by explicit relation types, which is why catalog depth and graph traversal set the quality ceiling for any RAG implementation you deploy.
Category 2: Ownership and Team Metadata
Q4: Is service ownership a first-class field in the data model (for example, spec.owner), or is it applied as a tag or annotation without structural enforcement?
Tags and annotations are human-readable strings, and AI agents require typed, relational fields to traverse ownership data programmatically. When spec.owner is a typed, validated field pointing to a Group or User entity, an AI agent resolves it to an actual node in the graph. An annotation like team: payments is a string that requires pattern matching and is frequently stale, inconsistently formatted, or simply absent. The Backstage entity YAML schema enforces spec.owner as a structured reference, and Roadie's catalog preserves that structural integrity across all indexed entities.
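A rough illustration of why the typed field matters, assuming the Backstage entity reference shape [kind:][namespace/]name and a default kind of group for owners. The CATALOG dict, the sample entities, and both helper functions are hypothetical stand-ins for a real catalog lookup.

```python
# Hypothetical catalog keyed by (kind, namespace, name).
CATALOG = {
    ("group", "default", "payments"): {
        "kind": "Group",
        "metadata": {"name": "payments"},
    },
}


def parse_owner_ref(ref, default_kind="group", default_namespace="default"):
    """Split '[kind:][namespace/]name' into a (kind, namespace, name) key."""
    kind, _, rest = ref.rpartition(":")
    namespace, _, name = rest.rpartition("/")
    return ((kind or default_kind).lower(), namespace or default_namespace, name)


def resolve_owner(entity):
    """Return the owning entity if spec.owner points at a real catalog node."""
    owner = (entity.get("spec") or {}).get("owner")
    return CATALOG.get(parse_owner_ref(owner)) if owner else None


# A full reference and a shorthand reference both resolve to the same Group node.
typed = {"kind": "Component", "spec": {"owner": "group:default/payments"}}
short = {"kind": "Component", "spec": {"owner": "payments"}}

# An annotation is only a string; nothing in the graph is attached to it.
tagged = {"kind": "Component", "metadata": {"annotations": {"team": "Payments "}}}
```

Here resolve_owner(typed) and resolve_owner(short) return the same Group entity, while resolve_owner(tagged) returns None: the annotation carries a team name, but there is no node to traverse to.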
Q5: When the AI returns a recommendation or surfaces an incident, can it identify and surface the owning group, their on-call contact, and their TechDocs in a single query traversal?
The practical value of ownership data depends on how many query hops it takes to go from "here's the alert" to "here's the person and the runbook." If the AI returns a partial answer and you have to manually look up PagerDuty separately, the system delivers no speed advantage over your current workflow. A well-structured entity graph maps component to group to user to documentation, and a properly configured RAG retrieval pipeline returns all of those nodes together as a single grounded response.
Q6: Does ownership data stay synchronized with your actual org structure, including LDAP, GitHub teams, and PagerDuty schedules, or is synchronization manual?
Ownership data that reflects headcount six months ago misdirects incident response by pointing to the wrong person. The catalog must ingest org changes automatically. If the vendor's answer to "how is ownership kept current" is "engineers update their catalog entity," that's a maintenance process that degrades under the pressure of every other thing those engineers are doing. Ask specifically how ownership synchronizes with your authoritative sources (LDAP, GitHub teams, PagerDuty) and what the propagation lag is.
Category 3: Historical and Operational Context
Q7: Does the platform ingest deployment history as structured data, or does it display recent deploys in a UI panel without making that data queryable?
There's a meaningful difference between displaying deployment history and indexing it as structured context. A UI panel showing your last ten deploys is useful to a human reading a dashboard. Structured deployment data indexed as catalog context means an AI assistant can correlate "the service started returning 500s" with "a deploy touched this component's dependency 14 minutes ago" without requiring a human to manually cross-reference three tools.
Q8: Can the AI assistant cross-reference a live service alert with the most recent deployments that touched the affected component's dependencies?
This query pattern is what most teams actually need during an incident. It depends on two conditions: alert events must be mapped to catalog entities, and deployment events must be indexed against those same entities. If either is missing, the correlation query cannot work. You can confirm this during a vendor evaluation in about 10 minutes by asking the vendor to cross-reference a live alert against a recent deploy, using your own service topology.
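As a sketch of what that correlation looks like once both kinds of event are keyed to the same entities, consider the Python below. The component names, commit SHAs, timestamps, and the 30-minute window are all illustrative assumptions, and suspect_deploys is a hypothetical helper, not any vendor's API.

```python
from datetime import datetime, timedelta

# Hypothetical dependency edges and deploy events, both keyed by component name.
DEPENDS_ON = {"checkout-web": ["payment-api", "cart-service"]}

DEPLOYS = [
    {"component": "payment-api", "at": datetime(2024, 5, 1, 0, 46), "sha": "a1b2c3d"},
    {"component": "cart-service", "at": datetime(2024, 4, 30, 9, 0), "sha": "d4e5f6a"},
    {"component": "search-api", "at": datetime(2024, 5, 1, 0, 50), "sha": "0f9e8d7"},
]


def suspect_deploys(component, alert_at, window=timedelta(minutes=30)):
    """Deploys to the component or its direct dependencies shortly before the alert."""
    scope = {component, *DEPENDS_ON.get(component, [])}
    hits = [
        d
        for d in DEPLOYS
        if d["component"] in scope
        and timedelta(0) <= alert_at - d["at"] <= window
    ]
    # Most recent deploy first.
    return sorted(hits, key=lambda d: d["at"], reverse=True)


alert_at = datetime(2024, 5, 1, 1, 0)
suspects = suspect_deploys("checkout-web", alert_at)
```

With this sample data, the only suspect is the payment-api deploy that landed 14 minutes before the alert. The search-api deploy is more recent but is outside the alerting component's dependency scope, which is exactly the filtering a catalog-backed query makes possible.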
Q9: Is embedding generation event-driven, triggered by catalog changes, deploy events, or CI runs, or is it only periodic, and is the scheduling configurable?
A context store that refreshes every 24 hours will be stale during an incident at 2am when the deploy that caused the problem shipped at 1am. Roadie's RAG AI Plugin exposes an endpoint for configuring both periodic and event-based embedding generation. Event-based generation lets you trigger re-indexing on a catalog mutation, a deployment webhook, or a CI pipeline completion, so the AI's knowledge of your system reflects the current state at the moment of each query. For environments where deploys happen multiple times per day, event-based generation is the only operationally sound configuration.
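The 2am scenario can be worked through as simple date arithmetic. This sketch is an illustration only: the function names are hypothetical, the schedule is assumed to run at midnight, and the event-driven branch optimistically assumes re-indexing completes as soon as the event arrives, which real pipelines will not quite achieve.

```python
from datetime import datetime, timedelta


def last_periodic_run(query_at, first_run, interval):
    """Most recent scheduled run at or before `query_at`."""
    elapsed_runs = (query_at - first_run) // interval
    return first_run + elapsed_runs * interval


def index_sees_change(change_at, query_at, *, first_run, interval, event_driven):
    """True if the index already reflects a change made at `change_at`."""
    if event_driven:
        # Assumes the triggering event causes prompt re-indexing (no processing lag).
        return change_at <= query_at
    return last_periodic_run(query_at, first_run, interval) >= change_at


schedule = {"first_run": datetime(2024, 4, 1, 0, 0), "interval": timedelta(hours=24)}
deploy_at = datetime(2024, 5, 1, 1, 0)   # deploy ships at 1am
query_at = datetime(2024, 5, 1, 2, 0)    # incident query at 2am

periodic = index_sees_change(deploy_at, query_at, event_driven=False, **schedule)
event_based = index_sees_change(deploy_at, query_at, event_driven=True, **schedule)
```

Under these assumptions the periodic index last refreshed at midnight, so it misses the 1am deploy, while the event-driven index includes it.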
Category 4: AI Architecture and Reliability
Q10: Is the AI layer implemented as RAG against your live catalog data, or does it depend on fine-tuning, a generic model, or a static snapshot?
Fine-tuning encodes patterns at training time, producing a model whose knowledge is anchored to the state of your systems when the training data was assembled. Any change after that point (a new catalog entity, a team restructure, or a deprecated API) requires a new training cycle before the model reflects it. RAG retrieves from the live index at query time, so the model's answers reflect the current organizational state. The Roadie AI Assistant uses RAG across indexed catalog entities, TechDocs, OpenAPI specs, and Tech Insights scorecard data, making current entity state the ground truth for every response.
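A toy Python sketch shows the query-time property. It uses naive keyword overlap in place of real embeddings, and every name in it (INDEX, upsert, retrieve, build_prompt, the sample entities) is hypothetical. It is not a description of how the Roadie AI Assistant is implemented.

```python
# Toy "live index": entity ref -> text. Keyword overlap stands in for vector search.
INDEX = {}


def upsert(ref, text):
    """Write or overwrite the indexed text for an entity."""
    INDEX[ref] = text


def retrieve(query, k=2):
    """Return up to k (ref, text) pairs sharing at least one word with the query."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), ref, text)
        for ref, text in INDEX.items()
    ]
    top = sorted(scored, reverse=True)[:k]
    return [(ref, text) for score, ref, text in top if score > 0]


def build_prompt(query):
    """Assemble the prompt from whatever the index holds right now."""
    context = "\n".join(f"[{ref}] {text}" for ref, text in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


upsert("component:default/payment-api", "payment-api is owned by group payments")
upsert("component:default/ledger-service", "ledger-service is owned by group finance-platform")
prompt_before = build_prompt("who owns payment-api")

# A team restructure changes the catalog; no retraining step is involved.
upsert("component:default/payment-api", "payment-api is owned by group checkout-core")
prompt_after = build_prompt("who owns payment-api")
```

The second prompt carries the new owner because the context is read at query time. A fine-tuned model would keep answering with the old owner until it was retrained.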
Q11: What is the documented hallucination mitigation strategy, and does it address your specific data sources or rely on a claim that the model is generally accurate?
Effective hallucination mitigation names the specific mechanism and the data sources that ground each query type. For Roadie's AI Assistant, that mechanism is RAG: every response is grounded in retrieved catalog entities and TechDocs content, so the model's outputs for team names or API endpoints are bounded by what exists in your indexed data. Ask vendors to explain what happens when retrieval returns no matching context, because that edge case is where wrong answers appear.
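One way to picture the no-matching-context edge case is a retrieval step that refuses to answer when nothing scores above a relevance threshold. The sketch below is a generic pattern under stated assumptions: the three-dimensional vectors, the 0.75 threshold, and the function names are all made up for illustration, and the model call is left as a placeholder string.

```python
import math

# Hypothetical pre-computed embeddings for two indexed chunks.
CHUNKS = [
    ("component:default/payment-api", [0.9, 0.1, 0.0]),
    ("group:default/payments", [0.7, 0.3, 0.1]),
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def grounded_context(query_vec, min_score=0.75):
    """Refs of chunks similar enough to the query, best match first."""
    scored = [(cosine(query_vec, vec), ref) for ref, vec in CHUNKS]
    return [ref for score, ref in sorted(scored, reverse=True) if score >= min_score]


def answer(query_vec):
    """Only call the model when retrieval produced supporting context."""
    refs = grounded_context(query_vec)
    if not refs:
        return {"grounded": False, "answer": None, "sources": []}
    # Placeholder: a real system would pass the retrieved chunks to the LLM here.
    return {"grounded": True, "answer": "<model output>", "sources": refs}


on_topic = answer([1.0, 0.0, 0.0])
off_topic = answer([0.0, 0.0, 1.0])
```

The on-topic query returns sources you can inspect. The off-topic query returns an explicit "not grounded" result instead of a fluent guess, which is the behavior worth asking a vendor to demonstrate.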
Q12: Can you swap LLM providers without re-engineering the retrieval pipeline or the vector store?
Provider portability matters because the model that performs best for your queries today may not be the right choice in 12 months, and organizational security policy sometimes dictates which providers are permissible. Roadie's RAG AI plugin supports both AWS Bedrock and OpenAI for embedding generation and response synthesis. The vector storage layer runs on PostgreSQL with the pgvector extension, which most engineering teams already operate as part of their standard database infrastructure, so the vector store adds no new operational dependency to your stack.
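Architecturally, portability usually comes down to the retrieval pipeline depending on an interface and not on a vendor SDK. The Python sketch below shows that shape with two fake providers. EmbeddingProvider, Indexer, and both fake classes are hypothetical names, and nothing here reflects the actual Bedrock or OpenAI client APIs.

```python
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """The only contract the indexing pipeline relies on."""

    def embed(self, texts: Sequence[str]) -> list:
        ...


class FakeProviderA:
    """Stand-in for one vendor; returns 2-dimensional toy vectors."""

    def embed(self, texts):
        return [[float(len(t)), 1.0] for t in texts]


class FakeProviderB:
    """Stand-in for another vendor; returns 3-dimensional toy vectors."""

    def embed(self, texts):
        return [[float(t.count(" ")), 2.0, 0.0] for t in texts]


class Indexer:
    """Pipeline code that never names a concrete provider."""

    def __init__(self, provider: EmbeddingProvider):
        self.provider = provider
        self.rows = {}  # entity ref -> vector (a real system would write to a vector store)

    def index(self, docs):
        refs = list(docs)
        vectors = self.provider.embed([docs[ref] for ref in refs])
        self.rows.update(zip(refs, vectors))


docs = {"component:default/payment-api": "payment-api handles card payments"}

indexer_a = Indexer(FakeProviderA())
indexer_a.index(docs)

indexer_b = Indexer(FakeProviderB())
indexer_b.index(docs)
```

Swapping the provider changes no pipeline code, but note that the stored vectors differ, here even in dimension. Embeddings from different models are generally not comparable, so a provider switch typically means regenerating the index. That makes re-embedding cost a reasonable follow-up question for the vendor.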
Category 5: Extensibility, Governance, and Lock-in
Q13: Is the platform's underlying data model built on an open standard, or does adopting it mean migrating your catalog data into a proprietary entity schema?
A proprietary entity schema creates lock-in at the data model layer. Your service catalog represents organizational knowledge accumulated over years of engineering work: ownership records, dependency mappings, API contracts, and team structures. If that data lives in a schema owned by a vendor, migration means rebuilding your catalog from scratch. Roadie's catalog is built on the Backstage entity YAML schema, an open specification that defines component, system, API, group, and user entity types. That data is yours to take, extend, or migrate, and the tooling ecosystem built around that specification is available regardless of which managed platform you use.
Q14: Can AI agents be cataloged as first-class entities, with ownership, dependencies, and provenance tracked, or does the platform's data model need to be extended to accommodate them?
As teams move from AI assistants to AI agents that take actions in production systems, the governance questions that apply to services apply equally to agents. Which team owns this agent? What APIs does it call? What data does it access? Cataloging agents as entities with spec.owner and dependency relations ensures that the same governance infrastructure tracking your services can track your agents, and that audit trails for agent-initiated writes are as traceable as any other production action. Ask whether the platform's entity schema can accommodate an Agent kind with the same first-class treatment it gives Component or API.
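For a sense of what first-class treatment could look like, here is a hypothetical agent descriptor expressed as a Python dict, with a small check for the governance fields discussed above. Agent is treated as a custom kind, and the apiVersion, field names, and REQUIRED list are illustrative assumptions, not a published schema.

```python
# Hypothetical descriptor for a custom "Agent" kind. The apiVersion and the
# spec field names are assumptions made for this example.
AGENT = {
    "apiVersion": "example.com/v1alpha1",
    "kind": "Agent",
    "metadata": {"name": "deploy-rollback-agent"},
    "spec": {
        "owner": "group:default/platform",
        "consumesApis": ["deploy-api"],
        "dependsOn": ["component:default/deploy-service"],
    },
}

# Fields this sketch treats as mandatory for governance purposes.
REQUIRED = ["metadata.name", "spec.owner", "spec.consumesApis"]


def get_path(entity, dotted):
    """Read a nested value such as 'spec.owner'; None if any key is missing."""
    node = entity
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def governance_gaps(entity):
    """List required fields that are missing or empty on the entity."""
    return [path for path in REQUIRED if not get_path(entity, path)]


unowned_agent = {
    "kind": "Agent",
    "metadata": {"name": "shadow-agent"},
    "spec": {"consumesApis": ["deploy-api"]},
}
```

governance_gaps(AGENT) comes back empty, while the unowned agent is flagged for a missing spec.owner. The same kind of check that keeps services accountable then applies to agents without special handling.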
Q15: What is the realistic operational burden of keeping the platform current, and who owns upgrades, plugin compatibility, and security patches?
Proprietary developer portals typically push upgrade complexity to the customer. Each major version requires testing plugin compatibility, migrating configuration, and potentially rebuilding custom integrations. Over a 3-year horizon across 50+ services, that cost compounds into a recurring engineering tax. SaaS platforms built on open standards can absorb the core upgrade burden centrally while preserving the extensibility that makes the catalog useful. Get a specific, documented answer for who owns breaking changes before you commit to a data model.
Run This Audit Before Your Next Vendor Call
Before you spend time in a demo, run this against your current setup. It takes under 60 minutes and will tell you exactly where your context infrastructure has debt.
- First, query your catalog API and count what percentage of your services have a populated spec.owner field. A catalog where 40% of components have no ownership data is already showing you where AI will fail.
- Second, verify whether that ownership data is traversable via API. Pull the entity for any component, follow its ownedBy relation, and confirm you can resolve through to a user and their contact information programmatically.
- Third, check whether your CI/CD pipeline or deployment tooling can emit webhook events that a context platform could consume for event-based re-indexing. If your deploy process has no webhook output, periodic indexing is your only option, and that freshness ceiling is a concrete operational risk in production.
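The first two audit steps can be scripted in a few lines. The Python sketch below works on a list of entity dicts that you have already fetched. In a Backstage-based catalog that list would typically come from the catalog's entities endpoint filtered to kind=component, but the fetch is left out because the URL and authentication depend on your setup. The sample entities and both function names are hypothetical.

```python
def ownership_coverage(entities):
    """Percentage of Component entities with a non-empty spec.owner."""
    components = [e for e in entities if e.get("kind", "").lower() == "component"]
    if not components:
        return 0.0
    owned = [e for e in components if (e.get("spec") or {}).get("owner")]
    return 100.0 * len(owned) / len(components)


def owner_refs(entity):
    """Targets of the entity's ownedBy relations, if the catalog emitted any."""
    return [
        rel["targetRef"]
        for rel in entity.get("relations", [])
        if rel.get("type") == "ownedBy"
    ]


# Hypothetical sample: five components, two of them with no owner.
SAMPLE = [
    {
        "kind": "Component",
        "metadata": {"name": "payment-api"},
        "spec": {"owner": "payments"},
        "relations": [{"type": "ownedBy", "targetRef": "group:default/payments"}],
    },
    {"kind": "Component", "metadata": {"name": "cart-service"}, "spec": {"owner": "cart"}},
    {"kind": "Component", "metadata": {"name": "search-api"}, "spec": {"owner": "search"}},
    {"kind": "Component", "metadata": {"name": "legacy-batch"}, "spec": {}},
    {"kind": "Component", "metadata": {"name": "old-gateway"}, "spec": {"owner": ""}},
    {"kind": "Group", "metadata": {"name": "payments"}},
]

coverage = ownership_coverage(SAMPLE)
unresolved = [
    e["metadata"]["name"]
    for e in SAMPLE
    if e["kind"] == "Component" and not owner_refs(e)
]
```

On this sample, coverage is 60%, which matches the 40%-unowned situation described above. The unresolved list also includes components that declare an owner but have no ownedBy relation to follow, which is the gap the second audit step is meant to expose.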
This audit helps you identify exactly where an AI platform will produce confident, wrong answers during your next production incident. Any platform worth evaluating should have a direct, documented answer for every item on this list.
See how Roadie provides structured engineering context for your team and AI agents. Request a demo.
