What Your Engineering Organisation Doesn't Know About Itself

Jian Reis

When a coding agent returns wrong answers about your services, the default reaction in most engineering teams is to reach for model explanations: context window too small, retrieval quality poor, the model hallucinated. All of these can be true. But a piece at dekodiert.de names a different failure mode with more precision than most of the AI-in-engineering discourse manages: organisations lack honest self-description of their own decision and business logic.

The article describes a workshop with a client that surfaced 47 distinct knowledge assets the organisation relied on for operational decisions. Of those 47, 21 existed nowhere as documents. Seven couldn't be explained clearly by the people who held them. Four had documented versions that contradicted actual practice. That's 32 of 47 knowledge assets - more than two-thirds - that an AI agent cannot use reliably even with perfect retrieval, because the knowledge either isn't recorded, can't be stated, or is actively misrepresented in the official record.
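As a quick check of that arithmetic, here is a minimal sketch. The counts are the ones reported above; the category labels are shorthand invented for this example, not the article's wording.

```python
# Workshop counts as reported in the dekodiert.de piece.
TOTAL_ASSETS = 47

# Labels are shorthand for this sketch only.
unusable = {
    "undocumented": 21,   # existed nowhere as documents
    "inarticulable": 7,   # holders couldn't explain them clearly
    "contradicted": 4,    # documented version contradicted practice
}

unusable_total = sum(unusable.values())
unusable_share = unusable_total / TOTAL_ASSETS

print(f"{unusable_total} of {TOTAL_ASSETS} = {unusable_share:.1%}")
```

The share comes out just above two-thirds, which is where the "more than two-thirds" figure comes from.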

The correct diagnosis is organisational opacity. Before asking which AI model to deploy against your engineering systems, the more useful question is: how legible is your engineering organisation to a system that can only act on what's been written down?

The three categories of organisational opacity

The taxonomy maps naturally onto an engineering organisation, though its examples come from a sales and operations context. The three categories behave differently and require different responses.

The first category - undocumented but knowable - is the largest, and it's the one that engineering teams most consistently underestimate. It covers service ownership, dependency relationships, deployment state, SLO definitions, on-call assignments, and cost attribution. An engineering team knows who owns the payments service. They know which downstream services depend on it. They know the SLO is 99.9% even if nobody has written it down. They know which team gets paged when it falls over. None of this is tacit in the philosophically interesting sense - it's not knowledge that in any way resists articulation. It just hasn't been written anywhere that a machine can read.

Chris Argyris distinguished between an organisation's espoused theory - what it claims about its own behaviour - and its theory-in-use, the patterns it actually follows. The piece draws on this framework explicitly. An organisation's espoused theory says "service ownership is tracked in our CODEOWNERS file." The theory-in-use says ownership questions get resolved by asking someone who has worked on that part of the stack for four years and knows how things actually got structured. An AI agent operates only on the espoused theory. It reads the CODEOWNERS file. If the file is wrong, out of date, or simply absent, the agent gives a canonically correct but functionally wrong answer - and Argyris would note that the people in the organisation can work around this gap intuitively, while the agent cannot.

The second category - tacit knowledge that genuinely resists articulation - is real but smaller: seven of the 47 assets in that workshop. In an engineering context, this is the architectural instinct that comes from having lived with a system through multiple major incidents. A senior engineer can reconstruct the original decision using an architectural decision record, or describe the failure modes in a post-mortem. But the reasoning that connects the original decision to all its downstream consequences lives somewhere between the design document and four years of accumulated fire-drill memory. Architectural decision records and TechDocs help - they capture what can be captured. They can't capture what wasn't fully articulable even at the time.

The third category - politically concealed - is smallest numerically, and the most consequential operationally. Four of the 47 assets had documented versions that contradicted actual practice. In an engineering org, this looks like: the official on-call rotation document says team A owns service X. In practice, team B handles escalations for service X because of a handover that was never formally completed after a reorg. An agent following the official record routes an incident to the wrong team. Discovering this at 2am during a major outage is how teams learn that their espoused theory and their theory-in-use have diverged in a real and critical way.

Politically concealed information is a social and political problem. Documentation tooling can describe what teams claim, but it can't change what they actually do. No catalog field closes a social contract that hasn't been renegotiated by the people responsible for it.
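Tooling can't renegotiate ownership, but it can sometimes flag where the record and the practice appear to have diverged, so that the conversation has a starting point. The sketch below assumes you can export who actually handled recent escalations; the service names, team names, log format, and the 50% threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical inputs: the official record, and which team actually
# handled each recent escalation.
documented_owner = {"service-x": "team-a", "payments": "team-payments"}
escalation_log = [
    ("service-x", "team-b"),
    ("service-x", "team-b"),
    ("service-x", "team-a"),
    ("payments", "team-payments"),
]


def ownership_drift(documented, log, threshold=0.5):
    """Flag services where a team other than the documented owner
    handled more than `threshold` of the logged escalations."""
    handled = {}
    for service, team in log:
        handled.setdefault(service, Counter())[team] += 1

    drift = {}
    for service, counts in handled.items():
        team, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        if team != documented.get(service) and share > threshold:
            # (documented owner, team doing the work, share of escalations)
            drift[service] = (documented.get(service), team, round(share, 2))
    return drift


print(ownership_drift(documented_owner, escalation_log))
```

A flagged service is a prompt for the people involved to talk, not a correction to apply automatically.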

What a service catalog actually covers

The useful thing about the taxonomy is that it lets you scope the remediation precisely. A service catalog's job is the first category: making the undocumented-but-knowable layer legible. That's 21 of those 47 knowledge assets. More than most teams expect before they start, and less than some catalog advocates claim.

What this looks like in practice: a service catalog entry that captures service ownership, lifecycle, system membership, and dependency relationships gives an AI agent a machine-readable map of who is responsible for what. Add annotations for on-call routing, cost attribution, and documentation pointers, and you have covered most of what makes a coding agent's answers wrong today. Most of that completeness comes through ingestion pipelines and policy validation rather than manual entry - the catalog provides the structure, and the platform fills it. Service catalog context is the highest-signal context source you have for engineering queries precisely because it's typed, graph-structured, and maintained by the people who understand the domain - the information has been through a human editorial process rather than being solely extracted from unstructured sources.
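To make "machine-readable map" concrete, here is a minimal sketch with two entities written as Python dicts that mirror the shape of a catalog-info.yaml file. The spec fields (owner, lifecycle, system, dependsOn) follow the Backstage entity format; the entity names and every "example.com/..." annotation key are hypothetical placeholders, not keys any plugin defines.

```python
# Entities written as dicts that mirror the shape of a catalog-info.yaml file.
# Entity names and every "example.com/..." annotation key are hypothetical.
CATALOG = [
    {
        "kind": "Component",
        "metadata": {
            "name": "payments",
            "annotations": {
                "example.com/oncall-rotation": "payments-primary",
                "example.com/cost-centre": "cc-1234",
                "example.com/docs-url": "https://docs.example.com/payments",
            },
        },
        "spec": {
            "type": "service",
            "owner": "group:default/team-payments",
            "lifecycle": "production",
            "system": "checkout",
            "dependsOn": ["component:default/ledger"],
        },
    },
    {
        "kind": "Component",
        "metadata": {"name": "storefront", "annotations": {}},
        "spec": {
            "type": "website",
            "owner": "group:default/team-web",
            "lifecycle": "production",
            "system": "checkout",
            "dependsOn": ["component:default/payments"],
        },
    },
]


def describe(name, catalog=CATALOG):
    """Answer 'who owns this, who is paged, and what touches it?' for one service."""
    entity = next(e for e in catalog if e["metadata"]["name"] == name)
    ref = f"component:default/{name}"
    return {
        "owner": entity["spec"]["owner"],
        "oncall": entity["metadata"]["annotations"].get("example.com/oncall-rotation"),
        "depends_on": entity["spec"].get("dependsOn", []),
        # Reverse lookup: which other entities declare a dependency on this one.
        "dependents": sorted(
            e["metadata"]["name"]
            for e in catalog
            if ref in e["spec"].get("dependsOn", [])
        ),
    }


print(describe("payments"))
```

The reverse lookup is the part unstructured retrieval struggles with: "what depends on payments" is never written in the payments entry itself, only implied by the graph.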

In practice, field completion doesn't happen uniformly. Ownership and system membership fill in quickly. Engineers feel the absence of that data immediately - it affects on-call rotations and incident routing in ways that are personally uncomfortable. The feedback loop from "this field is empty" to "the wrong person got paged at 3am" is short enough that teams self-motivate to close it.

Cost attribution and SLO definitions consistently lag. Filling in a cost centre field requires a cross-team agreement about how infrastructure costs get allocated - and those agreements move at the speed of the organisation, not the speed of the platform team. SLO definitions face the same friction. Teams know roughly what their SLOs are. Getting those SLOs into a machine-readable format that can be referenced by an agent requires someone to commit to a specific number, write it down, and own the consequence of having written it down. That friction is real, and it explains why cost attribution and SLO coverage consistently lag even in organisations with strong catalog adoption.

The catalog works as a forcing function rather than a documentation tool. The organisational pressure that makes documentation happen is external to the catalog. The catalog provides the structure that the pressure can act on - and that distinction matters for adoption strategy.

In catalogs at the scale Roadie operates - north of 200,000 entities - the completion-rate signal becomes one of the most useful views in the catalog. Services with no owner defined, no runbook link, no SLO entry: those gaps show you exactly which parts of your engineering organisation are opaque to an AI agent before you deploy one.
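A gap view of that kind can be approximated in a few lines. This is a sketch under assumptions, not how Roadie computes it: the entities are invented, and the "example.com/slo" and "example.com/runbook-url" annotation keys stand in for whatever keys your catalog actually uses.

```python
# Hypothetical entities; the "example.com/..." annotation keys are placeholders.
ENTITIES = [
    {
        "metadata": {
            "name": "payments",
            "annotations": {
                "example.com/slo": "99.9",
                "example.com/runbook-url": "https://runbooks.example.com/payments",
            },
        },
        "spec": {"owner": "group:default/team-payments"},
    },
    {"metadata": {"name": "legacy-batch", "annotations": {}}, "spec": {}},
    {
        "metadata": {
            "name": "storefront",
            "annotations": {
                "example.com/runbook-url": "https://runbooks.example.com/storefront",
            },
        },
        "spec": {"owner": "group:default/team-web"},
    },
]

# Each check reads one field and is treated as failed when the value is empty.
CHECKS = {
    "owner": lambda e: e.get("spec", {}).get("owner"),
    "slo": lambda e: e["metadata"].get("annotations", {}).get("example.com/slo"),
    "runbook": lambda e: e["metadata"].get("annotations", {}).get("example.com/runbook-url"),
}


def missing_fields(entity):
    """Names of the checks this entity fails."""
    return [name for name, read in CHECKS.items() if not read(entity)]


def opaque_services(entities):
    """Map each service with at least one gap to the list of its gaps."""
    report = {e["metadata"]["name"]: missing_fields(e) for e in entities}
    return {name: gaps for name, gaps in report.items() if gaps}


print(opaque_services(ENTITIES))
```

The output is effectively a worklist: each entry names a service an agent cannot answer basic questions about, and which questions those are.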

Scoring legibility before you deploy

The piece notes that organisational legibility is hard to assess. Tech Insights is the direct counter to that claim: you can score it.

Tech Insights rules let you evaluate catalog entities against any coverage dimension you define:

  • No owner defined: a legibility failure.
  • No SLO entry: a legibility failure.
  • No runbook annotation: a legibility failure.

Run those rules across 200 services and you get a legibility scorecard for your engineering organisation - specific, actionable, and updated automatically every time the catalog changes. Coverage percentages across ownership, SLO definition, runbook linkage, and cost attribution tell you, domain by domain, where the agent's answers are likely to go wrong.
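The aggregation behind such a scorecard is simple once each rule has been evaluated to a pass or fail per service. The sketch below assumes those results are already available as rows; it does not use the Tech Insights API, and the services, systems, and results are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rule results: one row per service, one boolean per check.
RESULTS = [
    {"service": "payments", "system": "checkout",
     "owner": True, "slo": True, "runbook": True, "cost": False},
    {"service": "storefront", "system": "checkout",
     "owner": True, "slo": False, "runbook": True, "cost": False},
    {"service": "legacy-batch", "system": "billing",
     "owner": False, "slo": False, "runbook": False, "cost": False},
    {"service": "invoicing", "system": "billing",
     "owner": True, "slo": False, "runbook": True, "cost": True},
]
CHECKS = ("owner", "slo", "runbook", "cost")


def scorecard(results):
    """Coverage percentage per check, grouped by system."""
    by_system = defaultdict(list)
    for row in results:
        by_system[row["system"]].append(row)
    return {
        system: {
            check: round(100 * sum(r[check] for r in rows) / len(rows))
            for check in CHECKS
        }
        for system, rows in by_system.items()
    }


for system, coverage in scorecard(RESULTS).items():
    print(system, coverage)
```

Grouping by system rather than reporting one global number is what makes the result actionable: it shows which domain's questions an agent is likely to get wrong.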

The scorecard serves a second purpose. Before deploying an agent with significant scope - writing runbooks, triaging incidents, suggesting architectural changes - you can use completion rates as a readiness gate rather than a retrospective diagnostic. Ownership coverage at 70% means 30% of your services have no agent-readable owner - incident ownership queries for those services will return wrong answers regardless of model quality. That's a knowable condition before you deploy. No model, however capable, can fill a data coverage gap with anything better than a guess.
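A readiness gate can be as small as a comparison between measured coverage and the minimums you decide on. In this sketch the 70% ownership figure is the one from the paragraph above; the other coverage numbers and all the thresholds are hypothetical and would be a policy decision for your organisation.

```python
def readiness_gate(coverage, minimums):
    """Return (ready, shortfalls), where shortfalls maps each failing
    dimension to its (actual, required) coverage."""
    shortfalls = {
        dim: (coverage.get(dim, 0.0), required)
        for dim, required in minimums.items()
        if coverage.get(dim, 0.0) < required
    }
    return (not shortfalls, shortfalls)


# Fraction of services passing each check (slo and runbook values are hypothetical).
coverage = {"owner": 0.70, "slo": 0.40, "runbook": 0.85}

# Hypothetical minimums for an agent that triages incidents.
minimums = {"owner": 0.95, "runbook": 0.80}

ready, shortfalls = readiness_gate(coverage, minimums)
print("ready:", ready)
print("shortfalls:", shortfalls)
```

Only the dimensions the agent's scope depends on need a minimum. An incident-triage agent might gate on ownership and runbooks while leaving cost attribution out of the gate entirely.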

If you want to give AI agents catalog context today, exposing your Backstage catalog via an MCP server is the practical starting point. For the structural question of what makes engineering graph context different from generic retrieval - why entity relationships matter as much as entity attributes, and what that means for retrieval architecture - Context Engineering for Platform Engineers covers the analysis.

What the catalog cannot fix

The tacit knowledge category - those seven assets from the workshop - is partially addressable and partially not. A coding agent writing a runbook for an unfamiliar service can use catalog metadata to answer who owns it, what its SLOs are, and how it connects to other services in the graph. It can't answer why the retry logic is implemented the way it is, or why the service doesn't use the standard circuit breaker pattern that everything else in the system uses. Those answers live in the memory of engineers who were present for the original design conversations and the three incidents that shaped the current implementation. Capturing the decisions via ADRs is worth doing - it makes the tacit layer thinner. It doesn't make it zero.

The politically concealed category is different in kind. If an organisation's official ownership model doesn't match its actual operational model, adding catalog fields creates a cleaner-looking version of the wrong answer. An agent that reads a well-formatted catalog-info.yaml pointing at the wrong team will route incidents with high confidence to the wrong place. The right response to concealed information is the human conversation that hasn't happened yet - about who actually owns what after the reorg, about which cost centre is genuinely accountable for which services. The conversation has to happen somewhere external to the tooling, and the catalog's job is to reflect the outcome once it does.

The honest scope: a service catalog significantly improves an agent's ability to reason about the undocumented-but-knowable layer. A reasonably maintained catalog can substantially close that layer - 21 of the 47 knowledge assets from the workshop, covering service ownership, dependency maps, deployment state, SLO definitions, on-call routing, and cost attribution. But the catalog doesn't close the whole gap: the other 11 assets, those that were tacit or contradicted by practice, stay out of its reach, and teams that expect otherwise will still find their agents producing confidently wrong answers about decisions that were never honestly documented.

The question before the deployment question

Before asking which AI model to deploy against your engineering systems, check your catalog's field completion rate across ownership, SLOs, runbooks, and cost attribution. Not as a proxy for model readiness - as a direct measure of organisational legibility.

The gaps you find are documentation gaps that predate your AI deployment by years. What changes when you put an agent on top of a catalog is the rate at which those gaps produce wrong answers. A human who meets an empty field works around it by asking someone with context. An agent meets the same gap continuously, at machine speed, on every query that depends on information that has never been recorded.

Teams that close the undocumented-but-knowable gap before deploying agents get compounding returns. The catalog work that makes agent answers reliable is the same work that makes on-call rotations correct, incident routing accurate, and cost attribution meaningful. Those weren't AI problems before you added an agent. They just weren't generating failures at machine speed. The agent made the problem visible; it didn't create it.

The tacit knowledge layer and the politically concealed layer remain. With eyes open, the scope is 21 of 47 - more than most teams expect before they start, and enough to make the difference between an AI agent that confuses your engineers and one that actually helps them reason about the systems they've built.
