<?xml version="1.0" encoding="utf-8"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Roadie Blog</title><link>https://roadie.io/blog/</link><description>
  Technical how-tos, feature announcements, recaps of community sessions, and general engineering effectiveness content.
</description><lastBuildDate>Sun, 17 May 2026 00:26:25 GMT</lastBuildDate><docs>https://validator.w3.org/feed/docs/rss2.html</docs><generator>https://github.com/jpmonette/feed</generator><language>en</language><copyright>All rights reserved 2026, Roadie</copyright><item><title><![CDATA[The Governance Gap in Agent-Stack Thinking]]></title><link>https://roadie.io/blog/governance-gap-agent-stack/</link><guid isPermaLink="false">https://roadie.io/blog/governance-gap-agent-stack/</guid><pubDate>Wed, 13 May 2026 09:00:00 GMT</pubDate><description><![CDATA[The first wave of agent building left a governance gap. Here's why runtime governance - policy enforcement, context quality, and human oversight - is the fifth infrastructure bet.]]></description><content:encoded><![CDATA[<h1>The Governance Gap in Agent-Stack Thinking</h1><p>Addy Osmani published <a href="https://addyo.substack.com/p/the-agent-stack-bet">The Agent Stack Bet</a> a little while ago and it's getting the attention it deserves. He names four infrastructure bets that teams building production agents need to place: dedicated agent identity, universal context integration, persistent durable execution, and purpose-built platform primitives over DIY plumbing. The framing is right, and his list is almost complete.</p><p>Almost. What Osmani describes is the infrastructure that lets agents operate. He's largely silent on the infrastructure that makes operating them safe. The gap shows in month three, not the first sprint - when someone has to account for what the agent did, not just whether it's running.</p><p>The fifth bet is runtime governance. The Cloud Security Alliance found that only 16% of enterprises currently govern AI agent access to core business systems effectively.</p><h2>What the four bets actually buy you</h2><p>The four bets are real and the industry is under-invested in all of them. 
Agent identity matters because agents operating on shared credentials are impossible to audit and trivially compromised. Context integration matters because an agent reasoning from thin or stale information is worse than useless - it's confidently wrong. Persistent durable execution matters because multi-step workflows that can't survive a restart or a credential rotation can't do real work. Building on platform primitives rather than hand-rolling infrastructure is sound engineering at any scale.</p><p>But notice what those four bets describe: the agent as a machine. A machine with an identity, access to data, an ability to run for a long time, and a well-built chassis. They don't describe who operates that machine, what it's allowed to do under different conditions, how you inspect what it did, or who is responsible when it acts outside its intended scope.</p><p>Osmani's piece is about infrastructure for building agents. When teams move from "this works in staging" to "this is running in production on 40 workflows", they discover that infrastructure is necessary but not sufficient. The gap is operational.</p><h2>What governance debt actually looks like</h2><p>Osmani calls this "governance debt" - his phrase for the silent accumulation of security and audit risk that eventually forces a full rewrite, usually right after the first incident that reaches the CISO. The frame is right.</p><p>An agent with the four bets in place can run cleanly for weeks across dozens of production workflows. Then it takes an action that shouldn't have happened. Maybe it escalated a ticket to an external partner using a template that was out of date. Maybe it triggered a deployment to a production environment during a freeze window because it didn't have visibility into the freeze state. Maybe it queried a data source that had recently been reclassified as sensitive.</p><p>The incident review happens. The question is simple: why did it do that?</p><p>Agents do produce decision traces. 
The model usually surfaces what it reasoned about, what it called, and what it tried. The problem at production scale isn't the absence of traces - it's that raw model traces aren't structured for accountability. Without an audit trail that captures what context the agent saw at runtime, what policy it operated under, and what decision pathway led to that specific action, you can't answer the question that actually matters: why was it allowed to do that?</p><p>That's governance debt coming due. It's a showstopper. Engineering leadership, legal, compliance - they don't care how impressive the efficiency ratio is. They care whether you can account for what the system did. When it arrives, the failure tends to follow a recognisable shape: the agent ran cleanly for weeks, then took one action nobody had authorised, and the question of who was responsible stalled the rollout regardless of how the infrastructure had performed.</p><h2>The three things governance actually is</h2><p>Runtime governance covers three distinct functions, and they have to work together.</p><h3>Policy enforcement</h3><p>An agent with a valid identity and access to correct context can still take actions outside its intended scope. The distinction matters: identity establishes who the agent is, policy establishes what it can do right now. Those are different questions with different infrastructure answers. Osmani correctly argues that policy should be enforced at the platform level, not in application middleware. But that principle needs to be cashed out operationally. Runtime governance means a policy layer that evaluates each agent action against current rules before executing it, not after. Not a system prompt saying "don't touch production". An infrastructure-level enforcement point that determines what the agent can do before it does it.</p><p>The policy needs to be dynamic, too. 
A deployment agent that has write access during normal operating hours should not have the same access during an active incident, or during a code freeze, or when the target service is in a degraded state. Static permission grants don't handle this. Runtime policy enforcement does.</p><h3>Context quality standards</h3><p>This is the one that surprises most teams. You've built the context layer. You've integrated your sources. The agent has what it needs.</p><p>Context has a quality dimension that's separate from its existence. A deployment record from three weeks ago tells you less than one from three hours ago. An ownership record created before a reorg may point to a team that no longer owns the service. A runbook never validated against current infrastructure may be accurate, or may be subtly wrong in ways that only show up in edge cases.</p><p>Without provenance tracking - where did this fact come from, when was it last verified, how should conflicts between sources be handled - the agent consumes data of unknown reliability. At small scale that's manageable. At production scale, with agents acting on context across hundreds of services simultaneously, an untracked staleness problem propagates into dozens of decisions before anyone notices. Governance includes the standards that keep context trustworthy, not just the pipeline for ingesting it.</p><p>The governance question here is accountability: not just who built the pipeline, but who signs off that the context an agent is about to act on is trustworthy enough for the action it's about to take. That accountability has to be explicit. If it isn't, it defaults to nobody, which means the agent is operating without a quality floor.</p><h3>Designed human oversight</h3><p>Osmani mentions human-in-the-loop approval gates as part of his persistent execution bet. Right call, but the framing can be tightened. 
Human-in-the-loop should be a governance design pattern, not a recovery mechanism you activate when something goes wrong.</p><p>The difference is architecture. Recovery-mode oversight says: pause the agent when it's about to do something catastrophic. For that to work, you need to know in advance what "catastrophic" looks like, and you need to have defined the triggers correctly. In production, you won't always know. The novel failure modes - the ones that damage trust - are usually the ones nobody anticipated.</p><p>Designed oversight says: at these specific points in the workflow, a human reviews the agent's proposed action before it runs. Not because you expect failure, but because the workflow has high enough stakes that human judgment belongs in the loop by design.</p><p>When the ratio of agent actions to human decisions reaches production scale, the humans aren't reviewing everything - and they shouldn't be. The whole point is to get humans out of the routine path. Governance determines what the humans do review: the decision points where errors compound, where actions are irreversible, where the agent is operating at the edge of its validated context. You have to design those checkpoints in advance, not discover the need for them afterwards.</p><h2>Why the IDP is the natural governance layer</h2><p>The hard parts of governance infrastructure are largely already built - for humans.</p><p>A mature internal developer portal already governs what developers can do. It controls which scaffolding templates are available. It enforces which deployment targets a team can push to. It gates access to production systems. It tracks ownership, so every service has a named team accountable for it. It records the relationships between teams, services, APIs, and dependencies.</p><p>Extending that governance to agents is not starting from scratch. The portal already knows the ownership graph. It already has the policy model for what different teams can access and change. 
It already maintains the service topology that tells you what an agent is allowed to touch on behalf of which team.</p><p>The audit trail question - who ran this, from what state, and when? - is the same question the portal already answers for human actions. The infrastructure for answering it is the same infrastructure runtime governance for agents needs.</p><p>I've argued before that the biggest mistake platform teams make is treating agent deployment as a technical problem when it's an organisational one. You can't measure deployment frequency across your organisation until you agree on what a deployment is. No tool can solve that alignment problem for you. The same logic applies to governance. You can't enforce what agents are allowed to do until you've agreed on what they should be allowed to do - and that agreement has to exist at the team level, the service level, and the environment level simultaneously. The portal is where those agreements already live, because it's where platform teams have spent years capturing them.</p><p>Platform teams are positioned to own the governance layer because they already own the hard parts. They understand what "context quality" means in their environment because they've spent years keeping the catalogue accurate for humans. The policy model already exists because they've spent years managing what developers are allowed to do. Runtime governance for agents extends that practice.</p><h2>The fifth bet</h2><p>Osmani asks what happens to teams that don't place the four bets. They stay trapped at the demo stage - agents that impress in staging and fail in production.</p><p>The governance gap creates a different but equally costly trap. Teams place the four bets correctly. They build something that genuinely works. They scale it to production. 
Then they get shut down after the first serious accountability failure - not because the infrastructure was wrong, but because there was no governance layer to make it auditable, policy-constrained, and safe to operate at the scale they'd reached. Gartner projects that over 40% of agentic AI projects will be cancelled by 2027 due to inadequate risk controls.</p><p>Build it early. Policy enforcement, context quality standards, and designed human oversight are much cheaper to add before an agent is running 40 production workflows than after you're trying to reconstruct why one of them did something wrong.</p><p>The fifth bet doesn't generate the conference talks. It generates the confidence to keep the programme running past month three.</p>
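<p>To make the policy enforcement point concrete, here is a minimal sketch of a gate that evaluates each proposed action against current runtime state before it runs and records the decision for later audit. Every name in it (<code>PolicyGate</code>, <code>RuntimeState</code>, the three outcomes, the freeze and incident rules) is a hypothetical illustration, not an API from Roadie, Osmani's piece, or any specific product.</p>

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Action:
    agent_id: str
    kind: str          # e.g. "deploy"
    target: str        # e.g. "payments-api"
    environment: str   # e.g. "production"


@dataclass(frozen=True)
class RuntimeState:
    # Hypothetical signals; a real system would read these from live sources.
    code_freeze: bool = False
    active_incident: bool = False


@dataclass(frozen=True)
class Decision:
    outcome: str       # "allow", "deny", or "needs_human"
    reason: str


@dataclass
class PolicyGate:
    audit_log: list = field(default_factory=list)

    def evaluate(self, action: Action, state: RuntimeState) -> Decision:
        # Rules depend on current state, not only on a static permission grant.
        if action.environment != "production":
            return Decision("allow", "non-production target")
        if action.kind == "deploy" and state.code_freeze:
            return Decision("deny", "code freeze in effect")
        if action.kind == "deploy" and state.active_incident:
            return Decision("needs_human", "active incident")
        return Decision("allow", "within normal operating policy")

    def run(self, action: Action, state: RuntimeState,
            execute: Callable[[Action], None]) -> Decision:
        decision = self.evaluate(action, state)
        # Record the request, the state it was judged under, and the outcome.
        self.audit_log.append(
            {"action": action, "state": state, "decision": decision}
        )
        if decision.outcome == "allow":
            execute(action)  # only reached after the policy check passes
        return decision


gate = PolicyGate()
executed = []
deploy = Action("deploy-agent-1", "deploy", "payments-api", "production")

normal = gate.run(deploy, RuntimeState(), executed.append)
frozen = gate.run(deploy, RuntimeState(code_freeze=True), executed.append)
incident = gate.run(deploy, RuntimeState(active_incident=True), executed.append)
```

<p>In this sketch the same agent, with the same identity, gets three different answers depending on runtime state, and only the first call reaches <code>execute</code>. The audit log keeps the state alongside each decision, which is the part that lets you answer "why was it allowed to do that?" after the fact.</p>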
]]></content:encoded></item><item><title><![CDATA[Context, Agents, MCP: A Working Glossary for Platform Teams]]></title><link>https://roadie.io/blog/context-agents-mcp-glossary/</link><guid isPermaLink="false">https://roadie.io/blog/context-agents-mcp-glossary/</guid><pubDate>Mon, 11 May 2026 00:01:00 GMT</pubDate><description><![CDATA[Shared definitions for platform engineering teams working with AI agents: agent, harness, context layer, context lake, business context layer, minimum viable context, MCP, MCP Server, MCP Gateway, and tools.]]></description><content:encoded><![CDATA[<h1>Context, Agents, MCP: A Working Glossary for Platform Teams</h1><p>The vocabulary around AI agents is a mess right now. "Context" means different things depending on which tool or blog post you're reading. "MCP" gets used for the protocol and the server in the same sentence. "Agent" covers everything from a one-shot LLM call to a fully autonomous system that writes code and merges PRs.</p><p>If you're building on a platform team and trying to hold a coherent conversation about how AI fits into your stack, shared definitions help. These are the terms we use at Roadie. They cluster into three groups: agents, context, and MCP. For each term below: a definition, a concrete example from platform engineering, and a note on how it connects to the others. If you want the full argument for why context architecture matters before diving into definitions, <a href="/blog/smart-agents-smart-context/">Smart Agents Need Smart Context</a> covers that ground.</p><h2>Agents</h2><p>An agent is an LLM-driven system that can plan and execute multi-step work by combining reasoning with tool use. 
In practice, an agent wraps a model with a goal or task specification, access to tools and data sources, memory or state, and guardrails covering permissions, policies, and evaluation hooks.</p><p>What separates an agent from a plain prompt-and-response is that the model decides what steps to take, uses tools to get information or take actions, and checks its work against the original goal. An agent investigating a production incident queries logs, checks recent deployments, correlates alerts, and produces a structured finding. It can open a ticket or page a team.</p><p>Agents need context to do useful work. That context comes from the context layer and is accessed via tools. MCP is the protocol that makes tool access standardised across different systems.</p><h3>Harness</h3><p>A harness is the surrounding runtime and scaffolding that makes an agent reliable and testable.</p><p>If the agent is the reasoning engine, the harness is everything around it: the system instructions that set the agent's behaviour, the adapters that connect it to tools and handle retries, the state management that tracks intermediate work across a multi-step task, and the logging and evaluation hooks that let you see what the agent did and whether it did it correctly.</p><p>When a team says "we've built an agent for X", they usually mean they've built a harness around a model for X. The model is often off-the-shelf. The harness is the engineering work - and it's where most of the investment sits. Swapping the underlying model in a well-built harness takes days. Rebuilding the harness from scratch takes months.</p><p>In the context of this glossary, the harness is what connects an agent to its context sources and MCP tools. 
It handles auth, retries, and observability so the agent logic doesn't have to.</p><h3>Skill</h3><p>A skill is a reusable, named capability that packages decision logic, procedural knowledge, and optionally tool use into a unit an agent can invoke for a specific class of task. Where a tool is a discrete function - fetching data or taking an action - a skill is the logic that defines how to approach a task: which steps to take, which tools to call, what conventions to apply, and how to structure the result. Some skills are purely instructional, encoding domain knowledge or output standards with no tool calls. Others orchestrate sequences of tool calls. Most non-trivial ones combine both.</p><p>An agent handling an on-call alert might invoke a "triage-deployment-failure" skill, which encodes the steps to follow, calls the deployment history tool, queries active alerts, checks the owning team's runbook, and returns a structured finding. The calling agent gets a result without reconstructing that logic each time.</p><p>For platform teams, skills are the unit of reuse in a multi-agent system. A blast-radius-assessment skill, once defined, can be called by any agent in the stack - an incident-responder, a change-management agent, a developer-facing assistant. They're also a natural testing boundary: you can verify that a skill produces the right output for a given set of tool responses without testing the full agent reasoning loop.</p><p>Skills are distinct from a Harness, which wraps a specific agent and manages its runtime. A skill is portable logic that any harness exposing the right tools can invoke.</p><h2>Context</h2><p>Context is the information an agent needs to do useful work for a specific situation.</p><p>This is distinct from the model's general training. A model is trained on broad data. Context is what you inject at runtime to make that general capability specific to the task at hand. 
When an agent reviews a pull request, the relevant context isn't everything the model knows about code review - it's this PR, this repo's conventions, this team's recent deployment history, and the services this change touches.</p><p>Good context is relevant to the task, scoped so the model isn't processing noise, and trustworthy enough to act on. Bad context - stale, incomplete, or inaccurate - doesn't just fail to help. It causes the agent to act on wrong information, which is often worse than no information at all. The sub-concepts below describe different ways of structuring and thinking about context when you build infrastructure to support it.</p><h3>Context Plane / Context Layer</h3><p>The context plane - also called the context layer - is what an agent has access to at runtime: entities, relationships, temporal state, rules, provenance, and standard operating procedures compiled into a queryable store. Usually this is a graph.</p><p>In a platform engineering context, services are nodes, ownership and dependency relationships are edges, and metadata attaches to each node. An agent assessing the blast radius of a proposed change queries this graph and gets a typed, traversable result - not a text description. The context plane is what turns a service catalog from a documentation tool into a data source agents can actually use at runtime.</p><p>The quality of the context plane depends on the quality and freshness of its underlying sources - catalog data, deployment records, observability outputs, runbooks. A graph built on stale or partial data produces stale or partial answers.</p><h3>Context Lake</h3><p>A context lake is the broader collection of raw context sources available to a platform: service catalog data, repository metadata, observability signals, incident history, ticketing systems, runbooks, deployment records, and anything else an agent might eventually draw from.</p><p>The context lake is everything that could feed the context layer. 
It's distinct from the context layer in that not all of it is structured, current, or relevant to any given task. An agent investigating an incident needs recent deployment history and active alerts, not every runbook in the organisation. The context layer selects and structures what's relevant for each task; the context lake is where those raw inputs live.</p><p>The term is in wider market use. What Roadie means by it specifically: the full set of raw data sources that platform teams manage and that the context layer compiles from - not the compiled, queryable result, but the underlying inputs.</p><h3>Business Context Layer</h3><p>The business context layer is the set of organisational and operational facts that sit above raw technical data: team ownership, cost attribution, service criticality, SLOs, compliance requirements, and incident accountability.</p><p>Technical context tells an agent what a service does and how it's connected. The business context layer tells it who owns it, what it costs to run, and what the consequences of a failure are. An agent that can traverse a dependency graph but doesn't know which team owns a downstream service - or whether that service is customer-facing - doesn't have enough context to make reliable decisions about escalation or rollback.</p><p>For most platform teams, the business context layer is the hardest part of the context problem. Not because the information doesn't exist, but because it's distributed across ticketing systems, cost dashboards, spreadsheets, and institutional memory.</p><h3>Minimum Viable Context</h3><p>Minimum Viable Context (MVC) is the smallest set of context that enables an agent to complete a specific task reliably. More context is not always better. A larger context window costs more, takes longer to process, and can distract the model from what it actually needs to do. Too little context produces hallucinations and errors. 
Minimum Viable Context is the engineering discipline of finding the right set - the context that's necessary, and no more.</p><p>Determining the MVC for a task means asking: what does this agent actually need to do this job without making critical errors? For an agent investigating a deployment failure, the MVC might be the deployment record, the last three alerts, and the owning team's runbook. It probably doesn't need the full incident history for every service in the organisation.</p><p>Getting MVC right is one of the main levers for improving agent reliability and cost. It's also a harness design problem: a well-built harness assembles only the context the task requires, rather than passing everything available into the model's context window.</p><h2>MCP</h2><p>MCP, or Model Context Protocol, is a lightweight protocol that standardises how an application or agent provides tools and context to an LLM. MCP defines a common interface for discovering tools, calling them with structured inputs, and returning structured outputs - so models can interact with external systems without custom integration code for each one.</p><p>The practical benefit is that MCP solves an integration problem at the protocol level. Before MCP, connecting a model to an external tool meant writing bespoke code for that tool, and repeating that work for every new tool. MCP replaces that with a shared specification that tools and models both implement once.</p><h3>MCP Server</h3><p>An MCP Server is a service that implements the MCP specification and exposes capabilities to models. It publishes a catalog of available tools, validates inputs, enforces auth and permissions, executes tool calls or forwards them to the underlying service, and returns results in a consistent, machine-readable format.</p><p>In a platform engineering context, a service catalog backed by an MCP Server means any agent can call "which teams own services in the payments domain?" 
and get a typed, structured answer - not a blob of markdown from a docs search. The MCP Server handles the translation between the model's tool call and the catalog's underlying data model.</p><p>A catalog-backed MCP Server tends to be the highest-value starting point for platform teams building agent tooling, because service ownership and dependency data is already maintained and is high signal for most agent tasks.</p><h3>MCP Gateway</h3><p>An MCP Gateway is a routing and policy layer that sits in front of one or more MCP Servers.</p><p>As you add more MCP Servers - one for the catalog, one for your observability platform, one for your ticketing system - you end up with scattered auth configurations, inconsistent rate limits, and no single point for setting policies about what agents can do. The gateway centralises all of that: authentication and tenant isolation, tool allowlists and safety policies, request routing and load balancing, observability across all tool calls, and version compatibility between clients and servers.</p><p>Most platform teams don't need a gateway on day one. In our experience, it becomes worth introducing somewhere between three and ten MCP Servers. Before that point, direct connections work fine. After it, the overhead of managing each server's auth and policies separately makes a gateway the cheaper option overall.</p><h3>Tools</h3><p>Tools are the discrete, callable functions exposed via MCP that let an LLM take actions or fetch data. Each tool has three parts: a name and description so the model knows the tool exists and when to use it, a strict input schema so calls are valid and typed, and a structured output so results are usable without parsing. Examples: "get service entity", "query deployment history", "search runbooks", "create incident".</p><p>Tools are what make agents useful rather than just knowledgeable. A model without tools can reason about a problem. A model with the right tools can act on it. 
In a platform engineering context, the quality of the tools - specifically how precisely their schemas match real workflows - determines most of the difference between agents that work and agents that look good in demos.</p>
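<p>As a rough illustration of the three parts of a tool described above, the sketch below models a tool as a name and description, a strict input schema, and a handler that returns structured output. It uses plain Python rather than any MCP SDK; the <code>Tool</code> class, the type-based schema, and the in-memory <code>CATALOG</code> are assumptions made for the example, not part of the MCP specification.</p>

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A callable function described well enough for a model to use it."""
    name: str
    description: str
    input_schema: dict[str, type]   # field name to required Python type
    handler: Callable[..., dict[str, Any]]

    def call(self, **kwargs: Any) -> dict[str, Any]:
        # Strict validation: reject unknown, missing, or mistyped fields
        # before the handler ever runs.
        unknown = set(kwargs) - set(self.input_schema)
        missing = set(self.input_schema) - set(kwargs)
        if unknown or missing:
            raise ValueError(
                f"unknown={sorted(unknown)} missing={sorted(missing)}"
            )
        for name, expected in self.input_schema.items():
            if not isinstance(kwargs[name], expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return self.handler(**kwargs)


# Hypothetical in-memory data standing in for a real service catalog.
CATALOG = {"payments-api": {"owner": "team-payments", "tier": "critical"}}


def get_service_entity(service: str) -> dict[str, Any]:
    # Structured output: the caller gets fields, not prose to parse.
    entity = CATALOG.get(service)
    if entity is None:
        return {"found": False, "service": service}
    return {"found": True, "service": service, **entity}


get_service_entity_tool = Tool(
    name="get_service_entity",
    description="Look up a service's owner and criticality tier by name.",
    input_schema={"service": str},
    handler=get_service_entity,
)

result = get_service_entity_tool.call(service="payments-api")
```

<p>A real MCP Server would describe inputs with JSON Schema rather than Python types, but the shape is the same: the description tells the model when to use the tool, the schema rejects malformed calls early, and the result comes back as fields the agent can act on directly.</p>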
]]></content:encoded></item><item><title><![CDATA[Smart Agents Need Smart Context: The Four Motions of a Context Layer]]></title><link>https://roadie.io/blog/smart-agents-smart-context/</link><guid isPermaLink="false">https://roadie.io/blog/smart-agents-smart-context/</guid><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><description><![CDATA[Most enterprise AI deployments fail in production because the gap is context, not the model. The four motions of a context layer - ingest, organise, retrieve, refresh - explained for platform teams.]]></description><content:encoded><![CDATA[<h1>Smart Agents Need Smart Context: The Four Motions of a Context Layer</h1><p>At <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/co-located-events/backstagecon/">BackstageCon Europe</a> on March 23, 2026, Roadie's Head of Product Sam Nixon <a href="https://www.youtube.com/watch?v=8FXaQiiE9bg">shared</a> a number that got people's attention: the agent-to-human interaction ratio on Roadie's platform has reached 100:1 on certain days. One hundred automated actions for every human decision. In Roadie's own usage, the bulk of support requests and on-call alerts are now handled without engineer involvement - though Sam was candid that this is partly a function of Roadie operating its own system end-to-end.</p><p>Most enterprise AI deployments aren't producing results like that. Teams have capable models. They've written careful prompts. They've shipped workflows that run perfectly in demos. And then in production - on a real incident at 2am on a Monday - the agent gives them an answer that's technically coherent and completely wrong. The gap is the context, not the model.</p><h2>What agents are actually reasoning from</h2><p>When an agent fails in an engineering workflow, the first instinct is to diagnose the model. Swap to a smarter one, improve the prompt, adjust parameters. Sometimes that helps. 
More often the failure is upstream: the agent was reasoning from thin, stale, or structurally ambiguous input.</p><p>A context window full of files is not the same as a context window full of facts.</p><p>Most of what I see teams doing with context right now just doesn't work. And it fails in a specific way: they've connected their tools to the agent but haven't built the layer between them. The agent has access to information. It doesn't have authoritative, structured context.</p><p>In March 2026, <a href="https://andychen32.substack.com/p/the-enterprise-context-layer">Andy Chen</a> - an engineer at Abnormal Security - published a detailed account of building an enterprise context layer from scratch. The piece is worth reading because it makes a distinction that most vendor messaging elides: retrieval and synthesis are different problems. A retrieval system finds the best-matching document. Synthesis produces the judgment call - which source to trust when three docs contradict each other, whether this service is safe to deploy right now, when to escalate to a human. Current tool stacks conflate the two. They give agents access to documents and hope reasoning handles the gap.</p><p>The token budget compounds this. <a href="https://www.apideck.com/blog/mcp-server-eating-context-window-cli-alternative">Apideck published benchmarks</a> showing that connecting three standard developer tool servers - GitHub, Slack, and Sentry - consumes 143,000 tokens of Claude's context window before the agent has processed a single message. About 14% of the budget, gone, on tool definitions, assuming you're using the 1M-token version of Opus. Think about that: you haven't asked a question yet, and you've already spent roughly a seventh of your reasoning budget on overhead. Teams running at 100:1 aren't working around this - they've built a different architecture.</p><h2>The four motions</h2><p>On Roadie's platform, the context layer is four operations that work in sequence - what we call the four motions. 
Each one solves a distinct part of the problem, and skipping any of them shows up in production.</p><h3>Pull in data</h3><p>The first motion is integration: repos, deployments, incidents, ownership records, documentation, infrastructure state. Most teams start here and assume the hard work is done. They've connected the sources. The agent has access.</p><p>Connecting sources is the easy part. The question is whether the data is fresh enough and trustworthy enough to reason from. An agent querying a context store with deployment data from three weeks ago, or ownership records that haven't been updated since the last reorg, will produce results that look authoritative and are wrong. The context layer needs to know the provenance of each fact: where it came from, when it was last verified, and how to handle it when it conflicts with a different source.</p><p><a href="https://andychen32.substack.com/p/the-enterprise-context-layer">Andy Chen's piece</a> describes this as a source-reconciliation problem. His agent swarm surfaced five principles on its own: architecture claims and status claims belong in different places; there's no universal source of truth; documentation describes the ideal state, not the current state; facts that appear in three independent sources can be trusted; and conflicting information should be documented as a conflict rather than resolved arbitrarily. Those principles hold for any context layer meant to be the substrate agents reason from, regardless of implementation.</p><h3>Build relationships</h3><p>This motion is what separates a context layer from a document index. 
You can have accurate data in separate systems - a service catalog with team ownership, an incident tracker with affected services, a deployment log with what changed - and still be unable to answer the questions that matter under pressure.</p><p>At 2am you need to know which team owns the failing service, what changed in the past 24 hours across its dependencies, and who is on call for that component.</p><p>Those answers live at the junctions between datasets. Getting there requires a graph, not a catalogue of documents. The relationships - service to team, API to consumer, runbook to incident type, deployment to downstream dependency - have to be explicit, typed, and traversable.</p><p>This is the part most early context layer attempts skip. They pull in data correctly and then assume the model can infer relationships from raw text. It can, sometimes. Under time pressure, with contradictory signals, inference is the weakest link. If the relationship isn't in the graph, you're relying on the model to guess - and guesses that present as confident answers are the most expensive kind.</p><h3>Assemble bundles</h3><p>This is where the actual engineering happens. An agent doesn't need your entire service graph for every query. It needs the right slice: the topology of affected services, the current ownership chain, the deployment history for the past few hours, the runbooks tagged to this incident type.</p><p>Assembling that slice on demand - scoped to the question, progressive in disclosure - is what keeps the token budget sane and the answer accurate. The Apideck benchmarks are a symptom of context that hasn't been assembled. When you surface the full tool manifest upfront, you pay for definitions you won't use. Tiered access - categories first, detail on request - gets you the same information at a fraction of the cost. 
Apideck's own analysis puts the gap at $3.20 per month for a well-scoped CLI workflow versus $55.20 for naive MCP integration.</p><p>Bundle assembly is also where governance lives. Not every agent should have access to every slice of context. Security posture data, compliance records, and personnel information need different access controls than service topology. This is an architecture decision you have to make up front, not a compliance checkbox you add later. Build access controls in from the start, or you'll retrofit them when it's much more expensive to do so.</p><h3>Agents consume and contribute</h3><p>This is the motion most deployments haven't reached yet. It's also where the compounding value shows up.</p><p>An agent that successfully runs a runbook, investigates an alert, or assesses a deployment has produced new context: decisions made, state at the time, actions taken, what worked. The trail is evidence. If the context layer captures what agents do, the next agent in the workflow starts from a richer position. If it doesn't, every invocation starts cold.</p><p>The temptation is to add that feedback loop once the basic flows are working. That's reasonable. But the teams at 100:1 got there partly because they built it early. The context layer improves with every agent run. The graph gets richer. The bundles get more accurate. Agents that contributed to the graph last week make agents this week faster and more reliable.</p><p>Sam Nixon laid out at BackstageCon what an agent-ready context layer actually requires: a comprehensive, fresh graph of your software topology; that graph enhanced with relationships and additional context outside the catalog, in a format agents can consume; and the actual tools to act on that information. The four motions are the operational shape of those three requirements. 
By the time agents are contributing back to the graph, all three are in play - and the system compounds with every run.</p><h2>This is a platform engineering problem</h2><p>The phrase "context engineering" has arrived as a job title. There's real work here - the kind that doesn't happen without someone owning it.</p><p>But the teams positioned to do this well aren't starting from scratch. The platform engineering team that built the service catalog, scored compliance, made deployments observable, and kept ownership records current already owns most of this substrate. They know which sources to trust, which relationships exist between systems, and what "current state" means in their environment. The hard part of the context layer is the organisational knowledge that feeds it, not the technology.</p><p>The homegrown version - a thousand lines of Python, a GitHub monorepo of markdown, an agent swarm crawling internal sources - can get you to a proof of concept quickly. Chen's piece is a genuinely useful account of how far that approach can go. But the version that survives contact with compliance requirements, multi-tenant access controls, and production scale looks different. Access control, auditability, multi-tenancy, reading from production systems without causing incidents: these are what make a context layer something an organisation can actually operate. They're also exactly what gets hand-waved in practitioner write-ups and bites you at scale.</p><p><a href="https://andychen32.substack.com/p/the-enterprise-context-layer">Andy Chen's framing from his ECL piece</a> applies here: the enterprise context layer is "closer to DevOps than to Salesforce." A practice, not a purchase. You build the discipline - the ingestion pipelines, the relationship mappings, the bundle definitions, the access controls - and then you maintain it. The four motions are the shape of that maintenance.</p><p>Roadie's platform implements this architecture. 
The context store holds your service graph, your operational data, and the relationships between them. The MCP interface handles progressive disclosure so agents get scoped context rather than a full dump. Agent contributions feed back into the graph. Access controls are first-class from day one, not a retrofit.</p><p>The 100:1 ratio comes from better context infrastructure, not better models. At a hundred agent actions for every human decision, the quality of those actions is determined almost entirely by what's in the context window when the agent starts reasoning.</p><p>The teams still debugging agent failures are usually debugging the wrong thing. A context layer that provides authoritative, structured knowledge - ownership, relationships, provenance, agent history - is what separates teams at 100:1 from teams still tuning prompts. Power your agents with facts, not guesses. If you want to see how Roadie builds this for your engineering team, <a href="https://roadie.io/request-demo/">request a demo or start a free trial</a>.</p>
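<p>For readers who want something concrete, here is a minimal sketch of the "build relationships" and "assemble bundles" motions described above. It is not Roadie's implementation: the edge list, the slice tags, the scope names and the <code>assemble_bundle</code> helper are all hypothetical, chosen only to show typed, traversable edges and a bundle that is filtered by what the calling agent is allowed to see.</p><pre><code class="language-python"># Hypothetical typed edges: (source, relation, target). All names are illustrative.
EDGES = [
    ("checkout-api", "owned_by", "team-payments"),
    ("checkout-api", "depends_on", "ledger-service"),
    ("ledger-service", "owned_by", "team-finance"),
    ("checkout-api", "has_runbook", "runbook-latency"),
]

# Hypothetical context slices, each tagged with the scope needed to read it.
SLICES = {
    "checkout-api": [
        {"kind": "topology", "scope": "ops", "data": "3 replicas, eu-west"},
        {"kind": "compliance", "scope": "security", "data": "PCI audit pending"},
    ],
}


def neighbours(node, relation):
    """Follow one explicit, typed edge from a node (no inference from raw text)."""
    return [dst for src, rel, dst in EDGES if src == node and rel == relation]


def assemble_bundle(service, agent_scopes):
    """Return only the slice of the graph this agent is allowed to see."""
    deps = neighbours(service, "depends_on")
    return {
        "service": service,
        "owners": neighbours(service, "owned_by"),
        "dependencies": {d: neighbours(d, "owned_by") for d in deps},
        "runbooks": neighbours(service, "has_runbook"),
        "slices": [s for s in SLICES.get(service, []) if s["scope"] in agent_scopes],
    }


bundle = assemble_bundle("checkout-api", agent_scopes={"ops"})
print(bundle["owners"], [s["kind"] for s in bundle["slices"]])
</code></pre><p>An agent holding only the <code>ops</code> scope gets topology but never sees the compliance slice, which is the point of deciding access at assembly time rather than after the fact.</p>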
]]></content:encoded></item><item><title><![CDATA[AI Coding Assistants Can Read Your Code. They Can't See Your Platform.]]></title><link>https://roadie.io/blog/ai-coding-assistants-missing-platform-context/</link><guid isPermaLink="false">https://roadie.io/blog/ai-coding-assistants-missing-platform-context/</guid><pubDate>Thu, 30 Apr 2026 14:00:00 GMT</pubDate><description><![CDATA[AI coding tools fail in production because they don’t have access to the context they need. This article explains the operational context layer your AI assistant is missing.]]></description><content:encoded><![CDATA[<p>Your AI coding assistant receives the file currently open, whatever retrieval mechanism (keyword, embedding, or graph-based) pulls from surrounding files, and possibly a repo-level index if you've configured one. That is the full extent of its world, with no organizational structure, no ownership, no reliability targets, and no deployment history.</p><p>Consider a practical scenario: an engineer asks their assistant to refactor a synchronous downstream call in <code>payments-service</code> into retry logic with exponential backoff. The suggestion looks clean. The tests pass. The PR gets approved. But the assistant couldn't see that <code>payments-service</code> is already running at 99.91% against a 99.9% SLO, its p99 latency sits at 450ms against a 500ms budget, and the Fintech squad placed a code freeze on it 48 hours ago ahead of a critical release. The retry logic hammers a downstream service that's already degraded, pushes latency past the budget, and burns the remaining error budget in under an hour.</p><p>The model produced syntactically correct, architecturally reasonable code. Every piece of information that would have changed its output lived in the service catalog, the observability platform, and the on-call tooling, none of which it could reach. 
The problem is a missing operational data layer between your engineering metadata and the AI's context window.</p><h2>The Four Categories of Missing Platform Context</h2><p>The industry conversation about context for AI coding assistants has focused almost entirely on codebase coverage: how many files can the retriever pull, how good is the semantic index, and does it understand cross-repo dependencies? <a href="https://www.augmentcode.com/tools/the-context-gap-why-some-ai-coding-tools-break">Augment Code's treatment of the context delta</a> and <a href="https://sourcegraph.com/blog/lessons-from-building-ai-coding-assistants-context-retrieval-and-evaluation">Sourcegraph's breakdown of keyword, embedding, and graph-based retrievers</a> describe this codebase-level layer well. While this is important, it's actually the easier half of the problem. The missing context that actually costs engineering teams time and trust falls into four categories.</p><h3>Service ownership</h3><p><a href="https://blog.thepete.net/blog/2025/05/22/why-your-ai-coding-assistant-keeps-doing-it-wrong-and-how-to-fix-it/">Pete Hodgson's Constraint-Context matrix analysis</a> highlights that AI assistants carry no awareness of the constraints that only humans in your org hold. If <code>payments-service</code> is owned by the Fintech squad, and they have a code freeze in effect, any PR against it requires their explicit review and sign-off. None of this exists in the repository. It lives in <code>spec.owner</code> in your service catalog, team topology tooling, or PR process documentation. The assistant treats the code as ownerless.</p><h3>SLOs and reliability targets</h3><p>Adding a synchronous downstream call to a service running a 99.95% availability SLO introduces a tail-latency risk that the model has zero signal for. It can read your retry configuration, but it can't read the Datadog SLO tracking data that tells you you've consumed 78% of this month's error budget. 
That signal lives in your observability platform and is only surfaced as structured metadata if you've built the bridge from observability to catalog.</p><h3>Deployment history and cadence</h3><p>A major refactor on a service deployed three times a day carries entirely different risk from the same refactor on a quarterly-release service. The <code>EntityArgoCDHistoryCard</code> exposes per-entity deployment history (such as revisions, timestamps, commit info, and sync status) at the catalog level. An AI assistant working from the file buffer has no access to whether this service last deployed four hours ago or four months ago, and that cadence signal is material to any reasonable risk assessment.</p><h3>Incident history</h3><p>If <code>payments-service</code> generated three P1 incidents in the last 60 days because of connection pool exhaustion under load, and the assistant suggests a pattern that increases connection pressure (spawning additional database clients per request, for instance), that's a regression. Post-mortem context is organizational memory. The model can't infer it from code because it lives in incident reports and runbooks.</p><h2>Why Individual Workarounds Don't Scale</h2><p>Teams are already responding to this missing context with rational, locally effective solutions. Google Cloud's <a href="https://cloud.google.com/blog/topics/developers-practitioners/five-best-practices-for-using-ai-coding-assistants">AI coding assistant best practices guide</a> recommends GEMINI.md files that give the assistant project-specific context, including what framework you're using, what conventions to follow, and what patterns to avoid. That's a sensible individual workaround for a team of five.</p><p>The problem is the maintenance surface. A GEMINI.md file written this morning doesn't reflect the deployment that happened this afternoon or the SLO breach that opened an hour ago. It's a static document maintained by humans in a system that changes continuously. 
If you have 50 engineers, 200 services, and 10 teams, accurate context file maintenance becomes a full-time job, and there’s little incentive to keep any individual file current because the cost of staleness is invisible until something breaks.</p><p>The queryability problem is separate. An AI retrieval pipeline can't ask a GEMINI.md "which services owned by the Payments team have open P1s?" and get a structured answer it can reason over. The file format is designed for human reading, so structured cross-service queries require a different architecture entirely. Because each file lives with its service, the assistant working on Service A has no access to Service B's SLO status even when Service A calls Service B synchronously.</p><p>With one hundred engineers and five hundred services, these files are perpetually stale and inconsistently structured, giving teams a false sense of coverage. The context that the AI receives reflects the state of a service as understood by whoever last edited the file, which is rarely the person who owns the service today.</p><h2>Platform Context Engineering: The Architectural Shift</h2><p>Engineering organizations need to identify the authoritative, structured, machine-readable data layer that represents operational reality, and confirm if it has an API surface that AI retrieval pipelines can query. That reframe shifts the problem from prompt engineering (improving how the AI is instructed) to context infrastructure (improving what structured data the AI can retrieve).</p><p>Platform-level context engineering requires entities (services, teams, APIs, infrastructure components), relationships (ownership, dependency, responsibility), and operational signals (SLO status, deployment cadence, incident frequency). All of this information needs to be kept current through automation and exposed through a queryable API. Machine-readable, automatically updated data is the baseline requirement. 
The goal is a structured entity graph that works for both human engineers browsing the catalog and AI agents querying it programmatically.</p><p>This distinction matters for AI retrieval specifically because a pipeline can only return what's structured and indexed. An LLM can ingest an unstructured text file, but it cannot reliably extract "which of the services that I depend on are currently breaching their error budget" from a collection of Confluence pages. Structured entity data with relationships and operational signals enables precise retrieval: the pipeline can return specific details like "Payments team, code freeze active, SLO at 99.91%, last deployed 4 hours ago" directly, as a structured response.</p><h2>Closing the Delta: From Entity Graph to AI Context</h2><p>Roadie's engineering context platform implements this architecture as a production-ready retrieval layer. The catalog reflects the actual ownership state of the repository without manual maintenance, because each entity carries structured ownership via <code>spec.owner</code>, which is populated automatically through the CODEOWNERS integration. Tech Insights surfaces SLO data as structured scorecard metadata: add a <code>datadoghq.com/slo_tag</code> annotation to your <code>catalog-info.yaml</code>, point the built-in Datadog Data Source at your Datadog instance, and SLO compliance data becomes a queryable fact on the entity, accessible from the same pipeline that serves the AI. Each of these steps still asks service owners to maintain one annotation. That's unavoidable for now, but it's a big improvement over GEMINI.md files. 
The right long-term approach is to ingest this association directly from the source systems, so the annotation burden eventually disappears.</p><p>The data flow from operational signals to the AI context runs through this architecture:</p><p><img src="https://mermaid.ink/img/pako:eAEB2AEn_nsiY29kZSI6ImdyYXBoIFREXG4gICAgQVtFbnRpdHkgQ2F0YWxvZzxici8-c3BlYy5vd25lciArIENPREVPV05FUlNdIC0tPiBFW1JBRyBBSSBQbHVnaW48YnIvPkluZGV4aW5nIFBpcGVsaW5lXVxuICAgIEJbVGVjaCBJbnNpZ2h0czxici8-U0xPIFNjb3JlY2FyZHMgdmlhIERhdGFkb2ddIC0tPiBFXG4gICAgQ1tBcmdvIENEIFBsdWdpbjxici8-RW50aXR5QXJnb0NESGlzdG9yeUNhcmRdIC0tPiBFXG4gICAgRFtUZWNoRG9jcyArIE9wZW5BUEkgU3BlY3NdIC0tPiBFXG4gICAgRSAtLT4gRltwZ1ZlY3RvciBTdG9yZTxici8-UG9zdGdyZVNRTF1cbiAgICBGIC0tPiBHW0xMTSBCYWNrZW5kPGJyLz5BV1MgQmVkcm9jayBvciBPcGVuQUldXG4gICAgRyAtLT4gSFtBSSBDb250ZXh0IFdpbmRvdzxici8-T3duZXJzaGlwICsgU0xPICsgRGVwbG95bWVudCArIERvY3NdIiwibWVybWFpZCI6IntcInRoZW1lXCI6XCJkZWZhdWx0XCJ9In29sJX3" alt="Diagram"></p><p>Roadie's RAG AI Plugin makes this entity graph AI-consumable. It indexes catalog entities, TechDocs, OpenAPI specs, and Tech Insights scorecard data as embeddings, stored in PostgreSQL with the pgvector extension (enabled via <code>CREATE EXTENSION IF NOT EXISTS vector</code>). The LLM backend supports AWS Bedrock and OpenAI. The retrieval pipeline is explicitly extensible: teams can add incident post-mortem data, on-call schedules, or architecture decision records as additional indexed sources.</p><p>An assistant pulling from this pipeline can receive: "payments-service, owner: fintech-squad, SLO target: 99.9%, SLO status: 99.91% this month (90% error budget consumed), last deployed: 4 hours ago via Argo CD revision abc123f, P1 incidents last 60 days: 3 (connection pool exhaustion)." The assistant now has access to the same judgment inputs that a careful engineer would consult.</p><p>For Roadie's cloud-hosted platform, the AI Assistant is configured via Administration > Settings > AI Assistant. 
For self-hosted Backstage instances, the plugin is available as the <code>@roadiehq/rag-ai</code> frontend package with the corresponding backend plugin.</p><h2>Three steps to better retrieval context</h2><p>You don’t need a complete overhaul to improve your retrieval context. There are three steps you can take this week to give your AI retrieval pipeline ownership, reliability, and deployment recency signals.</p><p><strong>First</strong>, audit which services in your catalog have <code>spec.owner</code> set. Entities missing that field are invisible to any ownership-aware retrieval. Use CODEOWNERS auto-assignment to populate ownership systematically. The catalog can derive it from your existing file ownership structure, so you're not asking teams to maintain a new artifact.</p><p><strong>Second</strong>, if you're tracking SLOs in Datadog, enable the Tech Insights Datadog Data Source. Navigate to the Tech Insights section in Roadie, add the built-in Datadog Data Source, and annotate your <code>catalog-info.yaml</code> files with <code>datadoghq.com/slo_tag</code>. Your SLO compliance data becomes a structured, queryable fact on each entity, updated automatically as Datadog updates.</p><p><strong>Third</strong>, if you're using Argo CD, install the Argo CD plugin and add <code>EntityArgoCDHistoryCard</code> to your entity page for each service. Link components to their Argo CD applications via the <code>argocd/app-name</code> annotation. You get per-entity deployment history, sync status, and revision data, all queryable by any retrieval pipeline you build on top.</p><p>All three steps change what AI agents receive without changing how your engineers work with those agents.</p><p><a href="https://roadie.io/demo">See what structured engineering context looks like for your team and your AI agents. Explore Roadie</a></p>
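<p>A footnote on the SLO figures used in this post: the error-budget arithmetic behind them is simple enough to sketch. The function below is an illustration only, not part of Tech Insights or Datadog. The name <code>error_budget_consumed</code> is hypothetical, and it assumes availability and the SLO target are both expressed as percentages over the same window.</p><pre><code class="language-python">def error_budget_consumed(availability_pct, slo_target_pct):
    """Fraction of the error budget used so far in the SLO window."""
    allowed_failure = 100.0 - slo_target_pct      # 0.1 points for a 99.9% SLO
    observed_failure = 100.0 - availability_pct   # 0.09 points at 99.91%
    if not allowed_failure > 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return observed_failure / allowed_failure


# Figures from the payments-service scenario: 99.91% measured against a 99.9% SLO.
consumed = error_budget_consumed(99.91, 99.9)
print(f"{consumed:.0%} of the error budget consumed")
</code></pre><p>A result above 1.0 means the budget for the window is already spent. That is the kind of signal an assistant can only act on if it arrives as a structured fact rather than prose.</p>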
]]></content:encoded></item><item><title><![CDATA[Context Engineering for Developers: The Infrastructure Layer That Makes AI Actually Useful]]></title><link>https://roadie.io/blog/context-engineering-for-developers-ai-infrastructure/</link><guid isPermaLink="false">https://roadie.io/blog/context-engineering-for-developers-ai-infrastructure/</guid><pubDate>Thu, 23 Apr 2026 15:00:00 GMT</pubDate><description><![CDATA[AI coding tools fail in production engineering orgs because they lack structured system-level context. Here's how to build the metadata infrastructure that produces reliable AI suggestions.]]></description><content:encoded><![CDATA[<p>An engineer debugging elevated p99 latency in a payment processing service asks Cursor to trace the issue and propose a fix. The suggestion Cursor returns compiles cleanly, follows the team's Go conventions based on the open files, and appears logically sound. It adds a direct database query to bypass what looks like a slow intermediate abstraction layer.</p><p>The suggestion is wrong in three distinct ways. The intermediate layer being bypassed is the team's PCI-scope data access abstraction, required by a compliance contract with the platform team. The downstream service the new code queries belongs to a squad that accepts calls exclusively through an async queue, documented in an API contract that lives in Confluence. The logging that Cursor added writes plain strings into a service that has enforced structured JSON logging since 2023, a convention codified in an architectural decision record written by an engineer who left the company 18 months ago.</p><p>Cursor processed the open files, a handful of related functions, and the natural language problem description. The compliance boundary, the ownership edge, the API contract, and the logging convention don't live in source files.</p><p>This is the core failure mode of AI coding assistance in production engineering orgs. 
The outputs are locally correct and globally invalid because the model operated without system-level context. Expanding the token window doesn't address this, and feeding the AI more source code doesn't address it either. What's absent is structured, queryable metadata about how services relate, who owns what, and what constraints govern each component.</p><h2>Context Engineering Is Not Prompt Engineering</h2><p>Prompt engineering occupies the interaction layer of a larger system. It addresses how you word an instruction to improve a model response within a single exchange. Context engineering addresses the architecture of the full information environment that the model operates in, covering retrieval mechanisms, memory systems, structured metadata, entity relationship models, state management, and output formatting constraints.</p><p>The practical distinction shows up where the work happens. Improving your prompt is a single-file edit. Building context infrastructure requires decisions about data schemas, entity relationship models, API surface areas, retrieval strategies, and update mechanisms. Prompt engineering is one layer of a three-layer stack, and context engineering designs the entire stack:</p><ol><li>Structured data layer: service metadata, ownership graphs, dependency declarations, API contracts, SLO definitions, incident histories, and architectural decision records.</li><li>Retrieval layer: how agents and developer tools access that data at query time, via catalog API queries, entity graph traversals, or vector store lookups.</li><li>Interaction layer: how retrieved context gets injected into the model window at inference time, via prompt templates, system messages, few-shot examples, and output schema constraints.</li></ol><p>Most discourse on context engineering addresses layer three. 
The architectural decisions that determine whether AI tools actually perform in production engineering orgs happen at layer one.</p><p><img src="https://mermaid.ink/img/pako:eAEBewKE_XsiY29kZSI6ImZsb3djaGFydCBURFxuICAgIHN1YmdyYXBoIEwxW1wiTGF5ZXIgMTogU3RydWN0dXJlZCBEYXRhXCJdXG4gICAgICAgIEExW1NlcnZpY2UgTWV0YWRhdGFdXG4gICAgICAgIEEyW093bmVyc2hpcCBHcmFwaF1cbiAgICAgICAgQTNbRGVwZW5kZW5jeSBEZWNsYXJhdGlvbnNdXG4gICAgICAgIEE0W1NMTyBEZWZpbml0aW9uc11cbiAgICAgICAgQTVbQURScyBhbmQgSW5jaWRlbnRzXVxuICAgIGVuZFxuICAgIHN1YmdyYXBoIEwyW1wiTGF5ZXIgMjogUmV0cmlldmFsXCJdXG4gICAgICAgIEIxW0NhdGFsb2cgQVBJIFF1ZXJpZXNdXG4gICAgICAgIEIyW0VudGl0eSBHcmFwaCBUcmF2ZXJzYWxdXG4gICAgICAgIEIzW1JBRyBhbmQgVmVjdG9yIFNlYXJjaF1cbiAgICBlbmRcbiAgICBzdWJncmFwaCBMM1tcIkxheWVyIDM6IEludGVyYWN0aW9uXCJdXG4gICAgICAgIEMxW1Byb21wdCBBc3NlbWJseV1cbiAgICAgICAgQzJbU3lzdGVtIE1lc3NhZ2VzXVxuICAgICAgICBDM1tPdXRwdXQgU2NoZW1hIENvbnN0cmFpbnRzXVxuICAgIGVuZFxuICAgIEwxIC0tPiBMMlxuICAgIEwyIC0tPiBMM1xuICAgIEwzIC0tPiBEW01vZGVsIFJlc3BvbnNlXSIsIm1lcm1haWQiOiJ7XCJ0aGVtZVwiOlwiZGVmYXVsdFwifSJ9n_fCgA==" alt="Diagram"></p><p>A RAG pipeline over Confluence documentation retrieves text chunks ranked by semantic similarity, and its ceiling is determined entirely by the quality, structure, and currency of the underlying documents. A typed entity graph with declared schemas delivers deterministic query results. When an AI agent asks which team owns <code>payment-processor</code> and what its declared dependencies are, a typed API response gives the agent something it can act on without hedging. Semantic search over unstructured documentation works well for discovery tasks. For AI agents making operational decisions at runtime (routing escalations, scoping incident blast radius, and verifying SLO compliance before a deployment), the two retrieval mechanisms belong to categorically different reliability classes. 
Build your data layer to support both, with explicit clarity about which class of problem each handles.</p><h2>What Structured Engineering Context Actually Looks Like</h2><p>The entities that constitute useful engineering context at the operational layer are services and their declared owners, dependencies with SLO targets, deployment environment definitions, API contracts, past incidents and their resolutions, and architectural decision records. That information exists in most engineering orgs today, but it’s distributed across YAML configs, Confluence pages, Google Docs, Slack channels, and tribal knowledge.</p><p>The <a href="https://backstage.io/docs/features/software-catalog/descriptor-format">Backstage entity descriptor format</a> provides a practical starting schema for the structured data layer. A well-formed Component entity gives an AI agent something it can query deterministically:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-processor
  description: Handles charge authorization and settlement for the checkout flow
  annotations:
    roadie.io/oncall-runbook: "https://wiki.internal/runbooks/payment-processor"
    roadie.io/slo-target: "p99 &#x3C; 200ms, availability > 99.95%"
    pagerduty.com/integration-key: "abc123xyz"
spec:
  type: service
  lifecycle: production
  owner: group:payments-team
  system: checkout
  dependsOn:
    - component:ledger-service
    - component:fraud-detection
    - resource:payments-db
  providesApis:
    - payment-processor-api
</code></pre><p>An AI agent triaging a latency incident on <code>payment-processor</code> can resolve the owner to <code>payments-team</code> in one API call, pull the SLO target to assess whether the current p99 violates it, identify <code>ledger-service</code> and <code>fraud-detection</code> as declared dependencies to check, and surface the runbook URL. A RAG pipeline over documentation pages might surface similar information after several retrieval rounds, but a typed entity query retrieves it deterministically in milliseconds, with no ambiguity about which document version applies.</p><p>The architectural significance extends beyond any single entity. When <code>ledger-service</code> also has a well-formed descriptor, you can traverse the dependency graph to find which teams own the services that <code>payment-processor</code> depends on, which of those are currently below their SLO targets, and which dependencies changed in the last 24 hours. Those traversals require typed entity relationships with declared schemas, where each edge carries semantic meaning, and each node exposes a queryable API.</p><h2>How Context Infrastructure Reduces Context Switching in Practice</h2><p>Incidents are where the cost of manual context assembly becomes most visible. The engineer receiving a PagerDuty alert at 2am has no buffer for multi-tool lookups.</p><p>Without context infrastructure, the engineer opens the alerting service in their IDE, switches to Confluence for the runbook, searches Slack for recent incident discussions, and then checks GitHub blame for the last deployer. At that point, they ask Copilot for help. Copilot has access to the open files and the natural language question. It answers without any of the operational context that the engineer spent 25 minutes assembling.</p><p>With a queryable context layer, the same engineer issues one API call against a structured entity graph. 
The response returns the service owner, the last deployment timestamp, the linked runbook, the current SLO status, and the full dependency list. An AI agent receives that same response and uses it to constrain its output. It can flag that a proposed change affects <code>fraud-detection</code>, which is owned by a separate team and is currently operating at 99.96% availability against a 99.95% target. The suggestion is system-aware because the agent's context is system-aware.
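</p><p>A minimal sketch of that check, under stated assumptions: the response shape, the field names, the <code>risk-team</code> owner, and the <code>min_headroom</code> threshold are all hypothetical, and are not a Roadie or Backstage schema. The sketch only shows how an agent or tool could turn a structured response into explicit review flags.</p><pre><code class="language-python"># Hypothetical response from a context-layer query; field names are illustrative.
context = {
    "service": "payment-processor",
    "owner": "payments-team",
    "dependencies": [
        {"name": "ledger-service", "owner": "payments-team",
         "availability": 99.99, "slo_target": 99.95},
        {"name": "fraud-detection", "owner": "risk-team",
         "availability": 99.96, "slo_target": 99.95},
    ],
}


def review_flags(context, touched, min_headroom=0.02):
    """List warnings for touched dependencies that need extra care."""
    flags = []
    for dep in context["dependencies"]:
        if dep["name"] not in touched:
            continue
        # A different owner means the change needs that team's review.
        if dep["owner"] != context["owner"]:
            flags.append(f"{dep['name']}: owned by {dep['owner']}, request their review")
        # Headroom is measured in percentage points above the SLO target.
        headroom = round(dep["availability"] - dep["slo_target"], 4)
        if min_headroom > headroom:
            flags.append(f"{dep['name']}: only {headroom} points above its SLO target")
    return flags


for flag in review_flags(context, touched={"fraud-detection"}):
    print(flag)
</code></pre><p>The same flags can be shown to the engineer and passed to the model, so both are working from one explicit view of the system.</p><p>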
<img src="https://mermaid.ink/img/pako:eAEB8gIN_XsiY29kZSI6ImZsb3djaGFydCBURFxuICAgIEFbUGFnZXJEdXR5IEFsZXJ0IEZpcmVzXSAtLT4gQltXaXRob3V0IENvbnRleHQgSW5mcmFzdHJ1Y3R1cmVdXG4gICAgQSAtLT4gQ1tXaXRoIENvbnRleHQgSW5mcmFzdHJ1Y3R1cmVdXG5cbiAgICBCIC0tPiBEW09wZW4gc2VydmljZSBpbiBJREVdXG4gICAgRCAtLT4gRVtTZWFyY2ggQ29uZmx1ZW5jZSBmb3IgcnVuYm9va11cbiAgICBFIC0tPiBGW1NlYXJjaCBTbGFjayBmb3IgaW5jaWRlbnQgaGlzdG9yeV1cbiAgICBGIC0tPiBHW0NoZWNrIEdpdEh1YiBmb3IgbGFzdCBkZXBsb3llcl1cbiAgICBHIC0tPiBIW0FzayBDb3BpbG90IGZvciBoZWxwXVxuICAgIEggLS0-IElbQ29waWxvdCByZXR1cm5zIGdlbmVyaWMgc3VnZ2VzdGlvbl1cbiAgICBJIC0tPiBKW01hbnVhbCB2ZXJpZmljYXRpb24gbG9vcCBiZWdpbnNdXG5cbiAgICBDIC0tPiBLW1F1ZXJ5IGNvbnRleHQgaW5mcmFzdHJ1Y3R1cmVdXG4gICAgSyAtLT4gTFtSZWNlaXZlOiBvd25lciwgcnVuYm9vaywgU0xPIHN0YXR1cywgZGVwZW5kZW5jaWVzXVxuICAgIEwgLS0-IE1bQUkgYWdlbnQgcHJvY2Vzc2VzIHNhbWUgc3RydWN0dXJlZCBjb250ZXh0XVxuICAgIE0gLS0-IE5bU3lzdGVtLWF3YXJlIHN1Z2dlc3Rpb24gcmV0dXJuZWRdXG4gICAgTiAtLT4gT1tFbmdpbmVlciB2YWxpZGF0ZXMgYWdhaW5zdCBleHBsaWNpdCBzdHJ1Y3R1cmVkIGNvbnRleHRdIiwibWVybWFpZCI6IntcInRoZW1lXCI6XCJkZWZhdWx0XCJ9In3Ohfzt" alt="Diagram"></p><p>The design decision embedded here matters. If the AI agent fetches the context and routes it silently into the model window while the developer sees only the final suggestion, you've solved the AI's information problem but preserved the developer's cognitive debt problem. The right implementation surfaces the resolved metadata as part of the engineer's workflow, so the structured context is visible to both the person and the agent drawing on it. Engineers who can see what their AI agent knows maintain comprehension, and that understanding deepens over time.</p><p>Context infrastructure also eliminates the manual assembly step that causes context switching in the first place. 
The developer stops acting as the integration layer between their tools, stitching together information from runbooks, Slack, and GitHub before they can form a useful question.</p><h2>Building Context Infrastructure: What Engineering Teams Should Prioritize</h2><p>The sequence is important because each layer depends on the one below it. Teams that ship AI features on top of incomplete metadata, then wonder why the productivity gains don't materialize, are building in the wrong order.</p><p>Ship in this order, and don't advance to the next step until the current one is complete:</p><ol><li><p>Declare entity ownership across your entire service catalog. Every service, library, data pipeline, and API must have a declared <code>owner</code> field pointing to a team or group entity. This is the minimum viable signal for routing, escalation, and impact scoping. AI agents querying a catalog with no ownership data will produce unreliable escalation paths and incorrect blast radius assessments.</p></li><li><p>Attach structured operational metadata to those entities. SLO targets, runbook URLs, deployment environment declarations, and dependency lists all belong in schema-defined fields. Free-text description fields are retrieval fodder for RAG pipelines. Structured annotation fields are what AI agents query programmatically. You need both, and they serve different purposes.</p></li><li><p>Expose your catalog via an API that agents can call. A service catalog that only renders in a browser UI is inaccessible to AI agents and automation tooling. A REST or GraphQL catalog API is the connective tissue between your structured metadata layer and any AI capability you want to build on top of it. Without programmatic access, you have a portal, and portals don't compose with agent workflows.</p></li><li><p>Run a coverage audit before adding AI features. 
If more than 30% of your catalog entities are missing owner, system, dependsOn, or a custom operational annotation (runbook URL or SLO target), context-aware AI features will underdeliver.</p></li></ol><p><a href="https://roadie.io">Roadie's engineering context platform</a> implements this architecture as a SaaS product built on open standards. Its entity graph exposes typed, relationship-aware service metadata via REST API, queryable by both human engineers and AI agents against the same underlying data layer. Catalog v2 is explicitly designed for this pattern, providing the coverage and programmatic access that make AI-layer features reliable at scale. Teams running custom Backstage installations get the UI, while teams running Roadie get the queryable context infrastructure that those AI features actually require.</p><h2>Start Here: Run a Metadata Completeness Audit on Your Service Catalog Today</h2><p>The single most useful action you can take before evaluating any AI developer tooling is a metadata completeness audit on your existing service catalog. Pull every Component entity from your catalog and check four fields: <code>owner</code>, <code>system</code>, <code>dependsOn</code>, and at least one custom operational annotation (runbook URL or SLO target).</p><p>For teams running a Backstage-compatible catalog API, the query is straightforward:</p><pre><code>GET /api/catalog/entities?filter=kind=Component
</code></pre><p>Parse the response for entities where <code>spec.owner</code> is empty, <code>spec.dependsOn</code> is an empty array or absent, <code>spec.system</code> is unset, or <code>metadata.annotations</code> contains none of your defined operational fields. Flag any entity missing two or more of those fields as a coverage priority.</p><p>If more than 30% of your catalog entities fail that check, fixing metadata coverage will deliver more AI productivity gain than any retrieval or prompt optimization effort. A missing <code>owner</code> field alone means AI agents have no reliable signal for routing, escalation, or impact scoping. A service with no <code>dependsOn</code> declarations means dependency traversal returns an empty graph, making blast radius assessment impossible. If your catalog has above 70% coverage on all four fields, you have the foundation to build context-aware AI features that will actually perform.</p><p>Metadata coverage is an engineering problem with an engineering solution: schema enforcement, catalog-as-code with required fields, and ownership reviews in your service onboarding checklist. Ship those fixes, then revisit your AI tooling configuration with a complete data layer underneath it.</p><hr><p>Give your engineering team and AI agents the structured context they need to actually ship faster. <a href="https://roadie.io/request-demo">See how Roadie's engineering context platform works and request a demo.</a></p>
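<p>The completeness audit described above can be sketched in a few lines of Python. This is a minimal illustration, not Roadie's implementation: it assumes the entities have already been fetched from the catalog API and parsed into dictionaries shaped like Backstage entities, and the two annotation keys are hypothetical placeholders for whatever operational annotations your catalog defines. It reports both the share of entities missing any audited field and the entities missing two or more, since either reading of the 30% threshold may be useful.</p>

```python
from typing import Any

# Hypothetical annotation keys; replace with the operational annotations your catalog defines.
OPERATIONAL_ANNOTATIONS = ("example.com/runbook-url", "example.com/slo-target")


def missing_fields(entity: dict[str, Any]) -> list[str]:
    """Return which of the four audited fields are absent or empty on one entity."""
    spec = entity.get("spec") or {}
    annotations = (entity.get("metadata") or {}).get("annotations") or {}
    missing = []
    if not spec.get("owner"):
        missing.append("owner")
    if not spec.get("system"):
        missing.append("system")
    if not spec.get("dependsOn"):  # absent or an empty list both count as missing
        missing.append("dependsOn")
    if not any(annotations.get(key) for key in OPERATIONAL_ANNOTATIONS):
        missing.append("operational annotation")
    return missing


def audit(entities: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize coverage gaps across a list of Component entities."""
    gaps = {}
    for entity in entities:
        name = (entity.get("metadata") or {}).get("name", "unnamed")
        missing = missing_fields(entity)
        if missing:
            gaps[name] = missing
    share = len(gaps) / len(entities) if entities else 0.0
    # Entities missing two or more fields are flagged as coverage priorities.
    priorities = sorted(name for name, missing in gaps.items() if len(missing) >= 2)
    return {"incomplete_share": share, "priorities": priorities, "gaps": gaps}


# Illustrative sample data, not taken from a real catalog.
SAMPLE_ENTITIES = [
    {
        "metadata": {"name": "payment-service",
                     "annotations": {"example.com/runbook-url": "https://example.com/runbooks/payment"}},
        "spec": {"owner": "team-payments-infra", "system": "checkout", "dependsOn": ["component:auth-service"]},
    },
    {
        "metadata": {"name": "auth-service",
                     "annotations": {"example.com/slo-target": "99.9"}},
        "spec": {"system": "identity", "dependsOn": ["resource:postgres-primary"]},
    },
    {
        "metadata": {"name": "legacy-batch"},
        "spec": {"system": "reporting"},
    },
]

if __name__ == "__main__":
    report = audit(SAMPLE_ENTITIES)
    print(f"{report['incomplete_share']:.0%} of entities are missing at least one audited field")
    print("Coverage priorities:", report["priorities"])
```

<p>Keeping <code>missing_fields</code> separate from the summary makes it easy to reuse the same check in a CI step that rejects new catalog entries with gaps, which is one way to apply the schema enforcement described above.</p>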
]]></content:encoded></item><item><title><![CDATA[Why Conflating RAG with Context Engineering Costs You in Production]]></title><link>https://roadie.io/blog/rag-vs-context-engineering-production/</link><guid isPermaLink="false">https://roadie.io/blog/rag-vs-context-engineering-production/</guid><pubDate>Thu, 16 Apr 2026 15:00:00 GMT</pubDate><description><![CDATA[RAG is not context engineering. Learn why production AI systems fail when all context is routed through vector retrieval and how to design context properly.]]></description><content:encoded><![CDATA[<p>Your AI ops agent just escalated an incident to the wrong team. The runbook it retrieved was accurate: the mitigation steps for <code>payment-service</code> were current, the alert thresholds matched production, and the linked dashboards loaded correctly. But the ownership record the agent surfaced named Team A, who had transferred the service to Team B six weeks ago. The service catalog updated that record the same afternoon. The agent's context assembly pipeline routes every question through a vector store, while ownership data lives as a structured entity record in a service catalog.</p><p><a href="https://atlan.com/know/what-is-context-engineering/">Atlan's 2026 analysis of enterprise AI failures</a> found that context failures (missing, stale, or conflicting information) are one of the leading causes of production AI breakdowns. Treating retrieval augmented generation as a complete context engineering system is the architectural decision that produces these failures.</p><p>Context engineering involves intentionally designing every slot in an LLM's context window. RAG is one retrieval primitive within that system, well-suited to semantic document lookup. An agent that draws its entire context from a vector store has wired up one slot out of six, and the remaining five require source systems with entirely different query mechanisms.</p><h2>What is RAG?</h2><p>RAG has a precise mechanical definition. 
At query time, you embed the user's input, run an approximate nearest-neighbor search over a vector index (<a href="https://www.pinecone.io">Pinecone</a>, <a href="https://weaviate.io">Weaviate</a>, or <a href="https://github.com/pgvector/pgvector">pgvector</a>), retrieve the top-k chunks, and inject them into the prompt. The model generates a response grounded in those retrieved passages.</p><p>The pattern works well for unstructured document retrieval at scale. If an engineer asks "what does our authentication service's token expiry policy say?", an embedding over documentation chunks surfaces the relevant passage reliably. Production RAG implementations often add considerable sophistication on top of this core pattern. <a href="https://en.wikipedia.org/wiki/Okapi_BM25">BM25 sparse retrieval</a> runs alongside dense vectors, reciprocal rank fusion merges ranked result sets, and multi-stage rerankers score passages for relevance before final injection. These techniques improve retrieval precision, but retrieval precision only matters if retrieval is the right tool.</p><p>This retrieval primitive isn’t suitable for ownership data. "Which team owns payment-service?" is a lookup against a structured entity record that links a service node to a team node in a typed graph. Embedding the query and running cosine similarity over chunked service documentation may surface a passage that mentions the team, if someone wrote that team name into a document at some point. A structured entity graph returns the authoritative, current record by traversing typed relationships directly.</p><h2>What is Context Engineering?</h2><p>Treating context engineering as a synonym for RAG collapses a complex architecture into a single retrieval pattern. Context engineering is the discipline of intentionally deciding what occupies every slot in an LLM's context window: what enters, in what format, from what source, updated on what cadence, and subject to what size constraints. 
A production context payload is assembled from six distinct source categories, each with its own query mechanism and staleness profile:</p><ol><li>System prompt: behavioral constraints, output format requirements, tool definitions, and role instructions. Static for a given deployment, updated by engineers.</li><li>Structured entity metadata: service ownership, dependency graphs, API configurations, on-call schedules, and tier assignments. Authoritative and current, queried from an entity graph.</li><li>Conversation history: prior turns in the current session, typically compressed with selective summarization to fit within token budgets.</li><li>Retrieved documents via RAG: runbooks, architecture docs, policy references, and knowledge base content. Semantically retrieved from a vector index.</li><li>Tool call outputs: live data fetched at inference time, including API responses, database query results, alert feeds, and CI/CD records.</li><li>Agent working memory: accumulated reasoning state across turns, including tested hypotheses, ruled-out causes, and timeline observations.</li></ol><p>Each source category produces outputs with different reliability characteristics. A system prompt updated by a senior engineer carries different authority than a documentation chunk that may reflect last year's architecture. An entity graph record updated when a team transfers service ownership is more authoritative for ownership questions than any passage in a runbook.</p><p>The architectural decision in context engineering is matching each question type to the right source, then assembling the payload efficiently within the model's token budget. 
<a href="https://www.singlestore.com/blog/context-engineering-a-definitive-guide/">SingleStore's context engineering analysis</a> describes this as the difference between a system that retrieves documents and one that curates, structures, and evolves the full information payload.</p><h2>The Taxonomy: Where RAG Sits in the Context Engineering Stack</h2><p>In a production context payload, RAG occupies a single slot among six, each with a different retrieval mechanism. For example, an incident triage agent assembles its payload from <code>entity_metadata</code> queried from an entity graph, <code>retrieved_docs</code> fetched from a Pinecone index over the runbook repository, <code>tool_outputs</code> pulled from live alert APIs, and <code>working_memory</code> maintained across turns in <a href="https://www.langchain.com">LangChain</a> state. RAG accounts for the <code>retrieved_docs</code> field:</p><pre><code class="language-json">{
  "system_prompt": "You are an incident triage assistant. Treat entity_metadata as authoritative for ownership and dependency data...",
  "entity_metadata": {
    "service": "payment-service",
    "owner": "team-payments-infra",
    "tier": 1,
    "dependencies": ["auth-service", "postgres-primary", "redis-cache"],
    "on_call": "alice@company.com",
    "last_deploy": "2025-01-14T09:22:00Z"
  },
  "conversation_history": [
    {"role": "user", "content": "payment-service is throwing 500s on checkout"},
    {"role": "assistant", "content": "Alert confirms elevated error rate since 09:41 UTC..."}
  ],
  "retrieved_docs": [
    {"source": "runbook-payment-service-v3.md", "chunk": "For elevated 5xx rates, check connection pool exhaustion first..."}
  ],
  "tool_outputs": {
    "current_alerts": {"metric": "error_rate", "value": 0.34, "threshold": 0.05},
    "recent_deploys": [{"sha": "a3f2c1", "deployed_at": "2025-01-14T09:22:00Z"}]
  },
  "working_memory": {
    "hypotheses_tested": ["connection_pool", "auth_service_upstream"],
    "timeline": "Error rate spiked at 09:41, 19 minutes after the a3f2c1 deploy"
  }
}
</code></pre><p><a href="https://arxiv.org/abs/2409.14924">A 2025 survey on LLM-based agents from researchers at NUS, Renmin University, and Fudan University</a> formally positions RAG as one component within the broader agent memory and context engineering landscape, intersecting with agent memory, LLM internal memory, and the full context engineering system as a whole.</p><h2>What Breaks When You Treat RAG as Context Engineering</h2><p>When a context pipeline routes all queries through a vector store, four failure patterns appear consistently in production.</p><h3>Stale structured data</h3><p>A RAG pipeline retrieves documentation that reflects what was true when it was written, not what is operationally true now. Reranking operates on retrieval precision over the indexed content, and stale source data produces wrong answers regardless of ranking quality. Ownership data requires a queryable source that updates when ownership changes, and documentation repositories are generally not kept in sync at that granularity.</p><h3>Entity relationship traversal</h3><p>“Which services depend on this database?” routed through a vector store returns passages where an engineer happened to write about the database's consumers. That result set is incomplete if documentation coverage is patchy, outdated if it wasn't maintained, and missing entirely if the dependency was never documented. This question requires a traversal over structured dependency metadata, which a typed entity layer can answer deterministically by following dependency edges directly.</p><h3>Cross-turn reasoning state</h3><p>In multi‑step workflows like incident response, re‑retrieving documents on each turn causes previously processed context to crowd out the agent’s evolving incident timeline. 
Hypotheses, checks, and temporal observations must be maintained in a persistent working memory layer, managed by the agent’s state system (for example, <a href="https://langchain-doc.readthedocs.io/en/latest/modules/memory/types/summary_buffer.html">LangChain's <code>ConversationSummaryBufferMemory</code></a> or an equivalent custom store).</p><h3>Context window inflation</h3><p>A naive strategy that injects the top‑20 retrieved passages can consume tens of thousands of tokens on documentation that’s irrelevant to the current reasoning step. Because cost per agent invocation scales with context size, bloated payloads add latency and expense without improving output reliability.</p><h2>What a Production Context Engineering Stack Actually Looks Like</h2><p>An AI agent handling incident triage puts all six context slots into production simultaneously, each drawing from a distinct source system matched to its query requirements.</p><p><img src="https://mermaid.ink/img/pako:eNpdU21vokAQ_iub_dQmvhQUa8mliQeNbaKRqleSQz-sMtWNsEuWRUub_vcbFvTOI4Hs7Mwz8zwzwxfdyhioS98TedrumdJk6a8Ewec1Gu1AoF0oQbrkVw6KvBagyjVptx-JN4o8KTR8aDLKc0g3SUkmrAS1Xok6gTcygYsgWpS5hpQESqaZ_rFR3cefsGdHLhVLyFaKXCvGhc6NayllQmJ454Jrjr71VbanafQkNNclmYJmMdPMgGYnASrf86yF0AxEDGLLoU44E-0tS5IW0dyw-yeb91yJOCKUVbXIM8-1VKWBLYo0ZYp_QkwyhVSJxkb8x2buR3PQisMRo3y5rQvOR2MiMSlRhdhIeSAc-XxcI5ezyAidFTorGuUTfgTCEsAhMBFXQhJZkkrjNTacRqFUBy522IT0zNdTMs_bFUmigOVSVP5cMw2XiTxN6x6Oo5umiWPFsn1NWrKYw21Tae6byLdFdPMGW-wJWeAHTGTABeDQAJciBJwiVjjDljMDGwUv0Y1Rg6daW8B2oPwCS3aJj4piuatZv3Q9__bCcBHUcwkvyxVi8-TJxFr28EC0JNYUvwe4DKOR5YVNl56vzEbK2Wwons3wL7ZBh-ZiMplG-Jq642DZ7lf_wUYiey9hRQykV7sgxT0lVseptntNW3SneEzdd5bk0KIpqJRVNv2qsq-o3kMKK-riEVecFYle0ZX4RlzGxG8pU-pqVSBSyWK3v-QpMtwC8DnbKZZeblW158qThdDUtQYmB3W_6Ad1bee-4_Tu7Z7V7zs9y3ZatKRu_6HfsQYPtnPXHw7tIV5_t-inqXrXcYb3A8vu2Y7j9B1rYH3_ARhnSA8?type=png" alt=""></p><p>The system prompt carries behavioral constraints, tool definitions, and output format requirements. 
These load at agent initialization in LangChain or <a href="https://www.llamaindex.ai">LlamaIndex</a> and remain static across turns.</p><p>The structured entity metadata slot is populated by querying an entity graph directly. For the incident agent, that query specifies <code>service=payment-service</code> and returns owner, tier, current on-call, dependencies, and last deploy time. Because this data comes from the same graph that engineers use operationally, the agent inherits its authority and update semantics. <a href="https://roadie.io">Roadie's context engineering platform</a> provides that graph: a queryable record of services, ownership, dependency relationships, and operational metadata that engineers and agents query from the same source. The graph updates when teams update their service records, so the agent works from the same current data that engineers see. In practice the graph is fed from the operational systems themselves - Git, cloud accounts, PagerDuty, Kubernetes - so ownership and dependency changes surface without a developer remembering to touch a metadata file.</p><p>The conversation history slot carries a rolling summary of the current incident session, managed by the agent's memory layer. The <code>retrieved_docs</code> slot runs RAG against a hybrid Pinecone or Weaviate index built over the runbook repository, using dense embedding similarity for semantic matching and BM25 for exact term hits, with the two result lists merged via reciprocal rank fusion. The <code>tool_outputs</code> slot carries live data fetched at inference time from <a href="https://www.pagerduty.com">PagerDuty</a>, <a href="https://www.datadoghq.com">Datadog</a>, and the CI/CD system.</p><h2>Audit Your Agent's Context Sources Today</h2><p><a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-02-gartner-predicts-by-2028-80-percent-of-genai-business-apps-will-be-developed-on-existing-data-management-platforms">Gartner predicts context engineering will appear in 80% of enterprise AI tools by 2028</a>. 
To prepare, audit your agent’s context pipeline. For each context slot populated on a turn, answer four questions: what is its authoritative source, such as documentation, an entity graph, a live API, or in memory state? How stale can it be before causing errors? Is the data structured or unstructured? Is it retrieved fresh each turn or carried across turns?</p><p>Agents that rely solely on vector databases can accurately answer “what does the documentation say,” but fail on questions about current service ownership, dependencies, or multi-step reasoning state. Those require explicit wiring of additional slots, including entity metadata, tool outputs, and working memory, sourced from systems designed for authority and freshness.</p><p>The highest value addition is usually structured entity metadata, including service ownership, dependency graphs, on-call assignments, and API configurations. These change faster than documentation can keep up. <a href="https://roadie.io">Roadie</a> exposes this metadata via a queryable API, making it straightforward to integrate into LangChain or LlamaIndex agents as an additional tool.</p><p>Production-grade agents treat context assembly as infrastructure rather than retrieval, matching each question type to a source with the right authority, freshness, and query semantics. Retrieval precision alone is not enough.</p><p><a href="https://roadie.io/request-demo/">Explore Roadie's context engineering platform</a>.</p>
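<p>The reciprocal rank fusion step mentioned for the <code>retrieved_docs</code> slot can be sketched as follows. This is a generic illustration of the technique rather than the API of any particular vector store: the chunk identifiers are made up, and <code>k=60</code> is the smoothing constant commonly used with this method, not a value taken from this article.</p>

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1; documents are returned by descending total score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical result lists from a dense (embedding) search and a sparse (BM25) search.
    dense_hits = ["runbook-pool-exhaustion", "runbook-5xx-triage", "arch-overview"]
    sparse_hits = ["runbook-5xx-triage", "postmortem-checkout", "runbook-pool-exhaustion"]
    print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

<p>Because the fusion works on ranks rather than raw scores, it does not need the dense and sparse retrievers to produce comparable similarity values, which is the usual reason for choosing it in hybrid setups.</p>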
]]></content:encoded></item><item><title><![CDATA[Context Engineering Is the Prerequisite Your Enterprise AI Deployment Is Missing]]></title><link>https://roadie.io/blog/context-engineering-enterprise-ai-prerequisite/</link><guid isPermaLink="false">https://roadie.io/blog/context-engineering-enterprise-ai-prerequisite/</guid><pubDate>Thu, 02 Apr 2026 09:00:00 GMT</pubDate><description><![CDATA[Context engineering is the architectural discipline that helps prevent enterprise LLM hallucinations. Here's what it covers, why most teams skip it, and how to audit your readiness before shipping.]]></description><content:encoded><![CDATA[<p>An engineering team wires an LLM assistant into their internal developer portal. Rollout takes two sprints. The demo works. Two weeks after shipping, the assistant is returning incident summaries that belong to a different service entirely, citing API endpoints deprecated 18 months ago, and naming engineers who left the company last year as service owners. The developers who built it start tweaking the system prompt. The outputs don't improve. Someone suggests switching models. The outputs still don't improve. Six weeks later, the initiative is shelved.</p><p>The model is operating without grounding in the organization’s actual engineering reality. It doesn’t know which team owns which service, which APIs are live, or which runbooks are current. In that environment, outputs default to general training patterns rather than verified organizational state, and the most likely outcome is a hallucination. You can address this using context engineering, which many enterprise AI initiatives skip entirely.</p><h2>What Context Engineering Actually Is</h2><p>Context engineering is the practice of curating, structuring, governing, and delivering domain-specific information to an LLM across the full request lifecycle. 
It covers data collection and normalization, semantic modeling and entity resolution, retrieval strategy design, freshness guarantees, and output validation. Each of those layers determines whether the model has accurate, current, and scoped information to work with at inference time.</p><p>Context engineering is closely related to, but distinct from, the two other disciplines that shape LLM behavior. Prompt engineering manages the instructions and format directives inside a request. Fine-tuning adjusts model weights using training examples to shift a model's behavioral defaults. Context engineering constructs and delivers the factual content that fills the context window. All three operate at different points in the system, and <a href="https://platform.openai.com/docs/guides/prompt-engineering">OpenAI's prompt engineering guidance</a> makes clear that in-context instruction can only do so much when the information being processed is itself inaccurate or absent. A production LLM feature requires all three to be addressed as separate, independent problems.</p><p>In practice, teams will often try fine-tuning to resolve an issue when context engineering is the right tool, and the substitution carries real operational cost. Fine-tuning encodes knowledge into weights at training time, so retraining is required every time your organizational data changes, which could be as often as daily. 
Context engineering delivers that information on demand, moving through four architectural stages, from raw data to validated output:</p><ol><li>Data collection and normalization across source systems</li><li>Semantic modeling and entity resolution (resolving "payments-api", "payments_api", and "Payments API" to a single canonical entity)</li><li>Retrieval and delivery strategy (hybrid search, re-ranking, query rewriting, and knowledge graph traversal)</li><li>Context validation and eval loop (scoring outputs against known-good answers before and after any retrieval change)</li></ol><h2>The Context Debt You're Already Accumulating</h2><p>A <a href="https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf">2015 NeurIPS paper</a> found that in production ML systems, the non-model work (e.g., data pipelines, feature engineering, and monitoring) tends to dominate total engineering cost. Context engineering handles the equivalent complexity for AI applications. Leave it unaddressed, and you accrue context debt.</p><p>Three failure modes tend to compound as LLM features evolve. Context is often embedded directly into system prompts, where it is rarely versioned or auditable. Retrieval introduces another source of error when vector search returns content that is semantically similar to a query while being incorrect for the specific service, team, or incident involved, since similarity scoring and factual correctness are independent properties and optimization for one does not guarantee the other. Outdated grounding data adds further divergence, as documentation that reflects earlier architectures continues to be retrieved and treated as current.</p><p>Each LLM feature built on an unstructured context layer adds another load-bearing workaround to an increasingly brittle foundation. Over time, unresolved context issues increase the cost of every change to retrieval, evaluation, or schema design. 
Ad‑hoc fixes create undocumented dependencies between prompts, data sources, and retrievers, and resolving conflicts between them often requires replacing the context layer across multiple features at once.</p><h2>Why Enterprise Data Is the Hardest Part of This Problem</h2><p>Every <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">retrieval-augmented generation (RAG)</a> tutorial assumes a clean, queryable dataset, but enterprise data rarely is. A mid-sized engineering organization typically has service metadata distributed across GitHub (repository structure, CI config), Jira (ownership assigned inconsistently by team), PagerDuty (on-call rotation with no canonical service identifier), Confluence (runbooks at varying levels of staleness), and one or more custom CMDBs that were accurate two platform migrations ago. These systems are unlikely to have a consistent schema. Ownership modeling defaults to ad hoc assignments with no cross-tool canonical identity. Lineage tracking (who last verified this record and when) exists in almost no tooling by default.</p><p>The gap between "how do I build a retriever?" and "do I have anything worth retrieving?" is where most enterprise AI initiatives stall. You’ll answer the first question during the first sprint, but the second question won’t surface until the first production failure.</p><p>The retrieval problem gets more complicated when you factor in data type variance. Service ownership and dependency metadata are structured, entity-resolved, and queryable if your catalog enforces a schema. Incident postmortems are semi-structured, time-sensitive, and specific to a point-in-time system state that may no longer apply. Runbooks are unstructured prose, version-critical, and frequently orphaned when the system they document changes. 
A single RAG pipeline optimized for one of these types will return misleading results for the other two.</p><h2>RAG Is One Layer in the Context Engineering Stack</h2><p>RAG handles retrieval, and context engineering determines what it has to work with and how trustworthy that material is. The full context engineering stack for an engineering AI tool has four layers. Schema-enforced software catalogs provide a structured, canonical, machine-queryable foundation, with service records that carry verified owners, API contracts, dependency relationships, and linked runbooks. Knowledge graphs (<a href="https://neo4j.com/">Neo4j</a>, <a href="https://jena.apache.org/">Apache Jena</a>) extend that foundation with relationship-aware traversal, enabling multi-hop reasoning across entities that flat vector search cannot replicate. Agentic context gathering adds dynamic retrieval triggered by user intent rather than static query patterns. Interaction history provides user-scoped persistence for stateful context across sessions.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6qh04x8XSEmVFgLnlKXTod/65210051d2fb3a0d4c436f23b6915935/mermaid-diagram-2026-04-02T10-16-29.png" alt="Context Engineering Architecture for Enterprise AI"></p><p>Engineers who've moved to models with 1M-token context windows sometimes assume retrieval precision becomes a solved problem at that scale. Research has shown that models consistently miss or deprioritize information buried in the middle of a large input, a phenomenon known as the <a href="https://arxiv.org/abs/2307.03172">lost in the middle problem</a>. Flooding a context window with loosely relevant chunks degrades output quality regardless of window size, so retrieval precision matters at any scale.</p><h2>Context Infrastructure Is a Prerequisite</h2><p>Context infrastructure requires design decisions that feed directly into your data model, your retrieval architecture, your eval criteria, and your schema enforcement strategy. 
Those decisions can't be made as patches after an LLM feature is already running in production. The context layer has to be designed, built, and validated before the first LLM call reaches a user. Retrofitting a context layer after an LLM feature is live requires changing data models, retrieval behavior, and evaluation logic while users are already depending on the system’s outputs.</p><p>A well-maintained <a href="https://backstage.io/docs/features/software-catalog/">software catalog</a> that enforces ownership metadata, keeps documentation versioned and linked to specific services, and tracks dependency relationships and API contracts already forms the foundation of the context infrastructure that engineering AI tools need. When the catalog is built to the right standard, the catalog and the AI context infrastructure are the same artifact.</p><p><a href="https://roadie.io">Roadie</a> provides structured, production‑grade engineering context used by both human engineers and AI agents. Service records capture verified ownership, runbooks, API specifications, and dependency relationships in a schema‑enforced and machine‑queryable form. This context is often surfaced through a Backstage catalog, though the same pattern applies to any structured engineering metadata source. Queries against structured systems return resolved entities with verified ownership, while queries against unstructured wikis return the highest‑scoring text fragment with no guarantees of accuracy or freshness.</p><h2>Run a Context Readiness Audit Before Your Next AI Feature Ships</h2><p>When context infrastructure is weak, failures tend to show up after deployment. A readiness audit is how you surface those risks before they become production incidents. Before writing a single line of LLM integration code for the next internal AI feature, ask the following questions about every data source it will consume. 
If the answer to any of these is no, you’re not ready to start development.</p><ol><li><p>Can every entity the LLM might reference (service, team, API endpoint, incident) be resolved to a single canonical, schema-enforced record? If the same service has three names across four systems, the model will invent a fifth.</p></li><li><p>Does every entity record carry ownership metadata with a verified-by date? If no one is accountable for keeping a record accurate, treat the record as unverified until proven otherwise.</p></li><li><p>Has the retrieval mechanism been tested against adversarial and edge-case queries, specifically queries where semantic similarity would surface the wrong result? Happy-path retrieval tests do not validate production behavior.</p></li><li><p>Is there an eval loop that scores LLM outputs against a set of known-good answers before and after any context change? <a href="https://www.langchain.com/langsmith">LangSmith</a> provides the tracing and evaluation infrastructure for this kind of continuous context validation. Shipping a change to your retrieval pipeline without an eval loop leaves you with no signal on whether the change improved or degraded output quality.</p></li><li><p>Is the context layer versioned and observable? Can you trace exactly which context payload was passed for any given model output? <a href="https://opentelemetry.io/">OpenTelemetry</a> instrumentation on your context assembly pipeline gives you the observability data you need to debug a hallucination with precision rather than guesswork.</p></li></ol><p>Five yes answers mean your context layer is ready to support an LLM feature. Fewer than five means you have unresolved architectural work to complete first. The audit takes an afternoon. 
The alternative is weeks of post-deployment firefighting over outputs that were structurally guaranteed to be wrong before the first user ever touched the feature.</p><hr><p>See how Roadie's context infrastructure makes AI features in engineering workflows actually work. <a href="https://roadie.io/request-demo">Book a demo</a></p>
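<p>The first audit question, and the entity resolution stage described earlier, can be made concrete with a small sketch. It only handles formatting variance such as "payments-api", "payments_api", and "Payments API"; real entity resolution usually also needs alias tables or shared identifiers across tools. The function names and the catalog contents here are hypothetical.</p>

```python
import re
from typing import Iterable, Optional


def canonical_key(name: str) -> str:
    """Normalize a service name: lowercase, with runs of non-alphanumerics collapsed to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")


def build_index(catalog_names: Iterable[str]) -> dict[str, str]:
    """Map each normalized key to the canonical name recorded in the catalog."""
    return {canonical_key(name): name for name in catalog_names}


def resolve(mention: str, index: dict[str, str]) -> Optional[str]:
    """Return the canonical catalog name for a mention, or None when nothing matches.

    Returning None (rather than guessing) lets the caller treat the mention as
    unresolved instead of passing an invented entity into the context payload.
    """
    return index.get(canonical_key(mention))


if __name__ == "__main__":
    index = build_index(["payments-api", "auth-service"])  # hypothetical catalog contents
    for mention in ["payments_api", "Payments API", "billing-gateway"]:
        print(mention, "->", resolve(mention, index))
```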
]]></content:encoded></item><item><title><![CDATA[Prompt Engineering vs Context Engineering: What's the Actual Difference and Why It Matters for Production AI]]></title><link>https://roadie.io/blog/prompt-engineering-vs-context-engineering/</link><guid isPermaLink="false">https://roadie.io/blog/prompt-engineering-vs-context-engineering/</guid><pubDate>Thu, 26 Mar 2026 15:00:00 GMT</pubDate><description><![CDATA[Learn the difference between prompt engineering and context engineering, why context pipelines are the bottleneck in production AI systems, and how to wire your service catalog into a working retrieval pipeline.]]></description><content:encoded><![CDATA[<p>Many AI projects fail because of poor or irrelevant context inputs rather than model capability limitations. As <a href="https://www.youtube.com/watch?v=LCEmiRjPEtQ&#x26;t=620s">Andrej Karpathy puts it</a>, the LLM is the CPU and the context window is RAM. Everything the model can reason about during inference must fit into that buffer before the first token is generated. Your documentation site, service catalog, and runbooks are invisible to the model unless they’re explicitly loaded in.</p><p>This constraint defines where prompt engineering ends and context engineering begins. Prompts control how a model behaves, and context engineering determines what information actually reaches the context window when the model is called.</p><h2>What is Prompt Engineering?</h2><p>Prompt engineering covers everything within the instruction itself, such as the role specification, task description, output format, and behavioral constraints. 
It uses several well-understood techniques, including few-shot examples to calibrate output style, <a href="https://arxiv.org/abs/2201.11903">chain-of-thought decomposition</a> to externalize reasoning, JSON mode to constrain structure, and negative instructions to prevent specific failure patterns.</p><p>If you provide a stable, well-scoped problem, prompt engineering can produce precise and consistent outputs for tasks where the input is predictable, the task is bounded, and the instruction is the primary variable. Some examples include classifying a support ticket, extracting fields from a contract, or generating a unit test from a function signature. These techniques assume the required information is already present in the context window.</p><p>However, the limitations of this approach become clear quickly in practice. A service dependency lookup agent can have a carefully crafted system prompt that is role‑specified, chain‑of‑thought enabled, and output schema constrained, but still return incorrect results when it runs without the relevant entity metadata:</p><pre><code>User: Which services does payments-service depend on?
Agent: The payments-service typically depends on fraud-detection,
       transaction-gateway, and ledger-store. [hallucinated]
</code></pre><p>If you inject the actual entity metadata from your catalog, an identical prompt with the relevant context produces a correct, grounded answer instead of a hallucinated one:</p><pre><code>User: Which services does payments-service depend on?
Agent: Based on the catalog data, payments-service depends on
       fraud-detection-service (v2.1), ledger-api (v1.4).
       [correct, sourced from injected context]
</code></pre><p>Nothing about the prompt changed. The improvement came entirely from what was loaded into the context window, which is exactly what context engineering controls.</p><h2>What is Context Engineering?</h2><p>Context engineering is about designing, populating, and optimizing everything that enters the context window before each LLM call. It’s a runtime system that assembles context dynamically for each invocation.</p><p><a href="https://blog.langchain.com/context-engineering-for-agents/">LangChain frames the scope</a> across four operations.</p><ol><li>“Write” defines what state to persist across turns, such as conversation summaries, tool outputs, and agent memory.</li><li>“Select” determines what to retrieve for a given call, pulling only the relevant subset of the knowledge base.</li><li>“Compress” reduces token volume without discarding signal, using techniques like summarization, LLMLingua-style compression, and chunk deduplication.</li><li>“Isolate” decides what to offload to sub-agents rather than placing everything in a single context window, which matters for multi-step tasks that would otherwise exhaust the budget.</li></ol><p>Scope is where prompt engineering and context engineering diverge most clearly. A prompt is a static artifact that a developer writes, reviews, and commits. A context pipeline is infrastructure made up of retrieval chains, reranking models, summarization steps, schema injection, token budget management, and assembly logic. These components execute at runtime before each model call. Prompt engineering is typically owned by the person who writes the system prompt. Context engineering is owned by platform teams and lives alongside API servers, data pipelines, and observability tooling.</p><p>Diagnosing a context pipeline that misbehaves in production involves a different debugging surface. Rewriting the system prompt won’t resolve the issue. 
Instead, you need to trace retrieval results, audit the assembled context, and inspect token utilization.</p><h2>Four Ways Prompt-Only Approaches Break in Production</h2><h3>1. Positional Attention Degrades Recall on Long Contexts</h3><p><a href="https://arxiv.org/abs/2307.03172">Stanford's "Lost in the Middle" research</a> (Nelson F. Liu et al., 2023) showed that LLM recall performance degrades significantly when critical information appears in the middle of a long context. Models attend most reliably to content at the start and end of the context window. Put 30 retrieved documents into the context in arbitrary order, and the model will effectively ignore the middle 20. This is a positional attention pattern baked into how transformers process long sequences, which you can’t instruct your way around. The solution is to fix how you assemble and order the context.</p><h3>2. Context Rot Hits Before You Hit Token Limits</h3><p>Many teams observe accuracy degradation as context grows, even before hitting token limits. This is often due to reduced signal density and attention dilution rather than hard limits. <a href="https://research.trychroma.com/context-rot">Research on context rot</a> shows that even on simple tasks, performance drops as total token count grows, well before you approach 128K or 200K limits. Switching to a larger context window, for example, from GPT‑4o to Claude’s 200K window, can delay the failure, but it does not address the underlying issue. A smaller context with tightly selected, high‑signal content consistently outperforms a much larger context filled with irrelevant material.</p><h3>3. Static Prompts Can't Track Dynamic Systems</h3><p>A static prompt encodes knowledge at write time. It doesn't know that <code>payments-service</code> changed its primary upstream dependency last Tuesday, that the on-call rotation shifted, or that the runbook was updated after last quarter's incident. 
The model answers from training data or from whatever static text you last injected, and both are stale by definition in a dynamic engineering environment.</p><h3>4. Non-Determinism From Uncontrolled Context Is Impossible to Debug</h3><p>Pipelines that depend on parseable, stable responses, such as incident triage agents, onboarding bots, and automated code review tools, are hard to debug when the injected context changes from one call to the next. You can't reproduce these failures by running the prompt again, because the context that produced them is no longer there. You need observability into what was actually in the context window for the failing call.</p><h2>Anatomy of a Production Context Pipeline</h2><p>A typical production context pipeline runs as a sequence: user query, intent classification, parallel retrieval (vector search plus catalog API lookup), reranking, compression, context assembly, token budget audit, and LLM call.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/3YLsEjbebRFNLz9WUHWN8q/b657cf6888485234250c991dce871706/mermaid-diagram-2026-03-26T15-36-01.png" alt="Anatomy of a Production Context Pipeline"></p><p>A Python implementation of the core assembly step can use <a href="https://python.langchain.com/docs/introduction/">LangChain</a> with a <a href="https://backstage.io">Backstage</a> catalog API and a <a href="https://github.com/pgvector/pgvector">pgvector</a> store:</p><pre><code class="language-python">import tiktoken
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import PGVector
import os
import requests

ENCODING = tiktoken.encoding_for_model("gpt-4o")
TOKEN_BUDGET = 128_000
SLOT_WEIGHTS = {
    "system_instructions": 0.10,  # ~12,800 tokens
    "retrieved_context":   0.55,  # ~70,400 tokens
    "tool_schemas":        0.15,  # ~19,200 tokens
    "conversation_history": 0.20, # ~25,600 tokens
}

def fetch_catalog_entities(service_name: str, backstage_url: str) -> list[dict]:
    """Query Backstage catalog API for a component and its dependency metadata."""
    # Read the API token from the environment instead of sending a literal placeholder
    token = os.environ["BACKSTAGE_TOKEN"]
    resp = requests.get(
        f"{backstage_url}/api/catalog/entities",
        params={"filter": f"kind=Component,metadata.name={service_name}"},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,  # fail fast rather than stalling the whole pipeline
    )
    resp.raise_for_status()
    entities = resp.json()
    return [
        {
            "name":        e["metadata"]["name"],
            "owner":       e["spec"].get("owner", "unknown"),
            "depends_on":  e["spec"].get("dependsOn", []),
            "description": e["metadata"].get("description", ""),
            "annotations": e["metadata"].get("annotations", {}),
        }
        for e in entities
    ]

def assemble_context(
    query: str,
    system_instructions: str,
    tool_schemas: list[dict],
    conversation_history: list[dict],
    backstage_url: str,
    vector_store: PGVector,
    service_name: str,
) -> dict:
    # Fetch typed entity metadata from the catalog
    catalog_entities = fetch_catalog_entities(service_name, backstage_url)

    # Vector similarity search over embedded TechDocs chunks
    docs = vector_store.similarity_search(query, k=8)

    # Assemble named slots with explicit budget allocation
    context = {
        "system_instructions": system_instructions,
        "retrieved_context": {
            "catalog_entities": catalog_entities,
            "techdocs": [d.page_content for d in docs],
        },
        "tool_schemas": tool_schemas,
        "conversation_history": conversation_history,
        "user_message": query,
    }

    # Enforce token budget before the LLM call.
    # This is an approximation: actual token usage depends on how the context
    # is serialized into the model's message format.
    total_tokens = len(ENCODING.encode(str(context)))
    # Raise explicitly; a bare assert is stripped when Python runs with -O
    if total_tokens >= TOKEN_BUDGET:
        raise ValueError(
            f"Context exceeds budget: {total_tokens} / {TOKEN_BUDGET} tokens"
        )
    return context
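

# --- Illustrative addition (not part of the original pipeline) ---------------
# SLOT_WEIGHTS is declared above but never applied. A minimal sketch of turning
# the weights into per-slot token ceilings and reporting which slots overflow.
# `slot_token_budgets` and `over_budget_slots` are hypothetical helper names, and
# counting tokens on str(value) is the same approximation assemble_context uses.
def slot_token_budgets(total_budget: int, weights: dict[str, float]) -> dict[str, int]:
    """Convert fractional slot weights into whole-token ceilings."""
    return {slot: round(total_budget * weight) for slot, weight in weights.items()}


def over_budget_slots(context: dict, budgets: dict[str, int], count_tokens) -> dict[str, int]:
    """Return {slot: tokens_over} for every budgeted slot that exceeds its ceiling."""
    overflow = {}
    for slot, ceiling in budgets.items():
        used = count_tokens(str(context.get(slot, "")))
        if used > ceiling:
            overflow[slot] = used - ceiling
    return overflow


# Possible wiring with the names defined above (an assumption about call order):
#   budgets = slot_token_budgets(TOKEN_BUDGET, SLOT_WEIGHTS)
#   overflow = over_budget_slots(context, budgets, lambda s: len(ENCODING.encode(s)))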
</code></pre><p>The slot allocation (10% system instructions, 55% retrieved context, 15% tool schemas, 20% conversation history) is a starting point, not a fixed rule. Incident triage agents typically allocate more budget to retrieved context because the reasoning depends heavily on documents. Coding assistants often shift more budget toward conversation history to preserve code state across turns. You can encode these allocations directly in code so you can see exactly how a context was constructed when a call fails.</p><p>In production, each stage of the pipeline needs to be inspectable. That usually includes the retrieved documents, reranked results, final assembled context, and token counts. Without this visibility, debugging becomes guesswork. You can address this by capturing full context snapshots per request using tools like LangSmith or custom logging pipelines.</p><p>When a task no longer fits cleanly within a single context window, you can decompose it into sub‑agents. The sub‑agent boundary determines what information each agent needs access to and what it can safely ignore.</p><h2>Prompt Engineering vs Context Engineering: Scope, Ownership, and Failure Mode</h2><p>In production systems, prompt engineering functions as one component within a larger context assembly pipeline, with different ownership, tooling, and failure modes.</p><p>| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Scope | Instruction design for a single call | Full context window lifecycle across all calls |
| Owner | Prompt author / ML engineer | Platform engineer / infrastructure team |
| Primary artifact | System prompt string | Retrieval and assembly pipeline |
| Repeatability guarantee | Varies with context state | High when context assembly is deterministic |
| Failure mode | Non-determinism from uncontrolled inputs | Context rot or retrieval miss |
| Debugging approach | Rewrite instructions and re-test | Trace retrieval results, audit token budgets, inspect assembled context |</p><h2>The Developer Portal as Context Infrastructure</h2><p>For AI workflows inside an engineering organization, document retrieval on its own is rarely enough. When teams ask questions about a service, they usually need concrete facts like who owns <code>payments-service</code>, what it depends on upstream, what its SLO targets are, where the runbook lives, and who is on call. That information exists as structured data, not as natural language text, and treating it as just another document tends to produce unreliable results.</p><p>The <a href="https://backstage.io">Backstage</a> catalog API exposes this data directly. A <code>GET /api/catalog/entities?filter=kind=Component</code> call returns JSON with fields like <code>kind</code>, <code>spec.owner</code>, <code>spec.dependsOn</code>, <code>metadata.annotations</code>, and <code>metadata.description</code>. These map directly to an LLM context schema with minimal transformation. The more structured the source data, the smaller the hallucination surface area. The model doesn't need to infer ownership or dependency relationships from text because the typed fields make both explicit.</p><p>The difference becomes most apparent during incident triage workflows. In practice, this usually involves an internal tool being queried as the first step of triage. For example, if an agent receives a PagerDuty alert for <code>payments-service</code> without catalog context, it falls back to generic advice like checking database connections, CPU utilization, or recent deployments. When catalog data is injected at call time, including the owning team, upstream dependencies, and the linked runbook, the response references the actual dependency chain and points to the correct escalation path.</p><p><a href="https://roadie.io">Roadie</a> provides this data layer for both human engineers and AI agents. 
The catalog API, TechDocs indexing, and entity relationship graph are maintained out of the box. Teams don't absorb the operational cost of self-hosting Backstage, including upgrade cycles timed with every Backstage release, plugin compatibility work, and infrastructure ownership, while also building the AI pipelines on top of it. Roadie handles the context source, while your team focuses on the retrieval and assembly pipeline.</p><h2>Start Here: Wire Your Catalog Into a Context Pipeline Today</h2><p>The service metadata in your catalog is already structured, authoritative, and queryable. You can wire that data into a retrieval pipeline that your agents can consume in three steps:</p><p><strong>1. Export catalog entities:</strong> Call <code>GET /api/catalog/entities?filter=kind=Component</code> against your instance. The response gives you all service entities as JSON. Extract <code>metadata.name</code>, <code>spec.owner</code>, <code>spec.dependsOn</code>, and <code>metadata.description</code> for embedding.</p><p><strong>2. Chunk and embed:</strong> Concatenate the relevant fields into a single text block per entity and generate embeddings.</p><pre><code class="language-python">import openai

entity_text = (
    f"{entity['name']}: {entity['description']}. "
    f"Owner: {entity['owner']}. "
    f"Depends on: {', '.join(entity['depends_on'])}"
)
response = openai.embeddings.create(
    model="text-embedding-3-small",
    input=entity_text,
)
vector = response.data[0].embedding
# Store vector alongside raw entity JSON in pgvector
</code></pre><p>Store vectors alongside the raw JSON entity payload in <a href="https://github.com/pgvector/pgvector">pgvector</a>. For higher query volume or managed infrastructure, <a href="https://www.pinecone.io">Pinecone</a> offers the same cosine similarity search without the Postgres operational overhead.</p><p><strong>3. Build the context assembly function:</strong> On each agent query, run similarity search against the vector store, retrieve the top‑k entity payloads, and inject the serialized JSON into the model’s context before the LLM call.</p><p>Any team with a populated service catalog can implement this with minimal overhead and typically see an immediate reduction in hallucinated service names and incorrect ownership references. Once real service metadata occupies the context window, the model stops inventing relationships from training data.</p><p>If your team is already running Backstage or Roadie, your context source exists. The next step is building the retrieval pipeline on top of it.  <a href="https://roadie.io/demo">Book a demo</a> to see how Roadie structures catalog data for AI-ready context retrieval.</p>
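<p>Step 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Roadie's implementation: an in-memory list of (vector, payload) pairs stands in for pgvector, the query is assumed to be embedded already with the same model as the entities, and <code>cosine_similarity</code>, <code>top_k_entities</code>, and <code>build_messages</code> are hypothetical helper names.</p><pre><code class="language-python">import json
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_entities(query_vector, index, k=3):
    """index: list of (vector, entity_payload) pairs, a stand-in for pgvector rows."""
    ranked = sorted(
        index,
        key=lambda row: cosine_similarity(query_vector, row[0]),
        reverse=True,
    )
    return [payload for _, payload in ranked[:k]]


def build_messages(question, query_vector, index, k=3):
    """Inject the top-k entity payloads as serialized JSON ahead of the user question."""
    entities = top_k_entities(query_vector, index, k)
    context_block = json.dumps(entities, indent=2, sort_keys=True)
    return [
        {
            "role": "system",
            "content": "Answer only from the catalog entities below.\n" + context_block,
        },
        {"role": "user", "content": question},
    ]
</code></pre><p>In a real pipeline the ranking would normally be done by the vector store's own similarity operator rather than in Python. The part worth keeping is that the payload injected into the prompt is the raw entity JSON stored next to the vector, not the text that was embedded.</p>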
]]></content:encoded></item><item><title><![CDATA[Your IDP Is an AI Goldmine: How Internal Developer Platforms Enable Context Engineering]]></title><link>https://roadie.io/blog/idp-ai-goldmine-context-engineering/</link><guid isPermaLink="false">https://roadie.io/blog/idp-ai-goldmine-context-engineering/</guid><pubDate>Thu, 19 Mar 2026 10:00:00 GMT</pubDate><description><![CDATA[Learn how your Internal Developer Platform (IDP), specifically Backstage's Catalog API, can become the context infrastructure that makes org-specific AI agents accurate, grounded, and production-ready. Includes three wiring patterns with code examples.]]></description><content:encoded><![CDATA[<p>An on-call engineer gets paged at 3am. Checkout is degraded. They open their AI assistant and ask: "What services does <code>checkout-api</code> depend on, and who's on call for them?" The agent either hallucinates a plausible-sounding list of services that don't exist, or shrugs and admits it doesn't know.</p><p>The LLM has the capability to reason about dependency graphs in generic contexts, but it can’t tell you anything about a dependency graph it doesn’t have access to. The gap between what a modern LLM can do and what it actually knows about your specific systems is what kills AI adoption for anything beyond code generation.</p><h2>The Two Categories of AI Tasks in Engineering Orgs</h2><p>AI tasks that show up in engineering organizations can usually be split between generic and org-specific tasks.</p><p>Generic tasks such as writing a unit test, suggesting a refactor, or explaining a regex work well out of the box because the underlying knowledge is universal. The model has seen thousands of similar examples in its training data.</p><p>Org-specific tasks are different. "Which team owns <code>auth-gateway</code>?" "Did <code>payments-service</code> deploy anything in the last four hours?" "What's the runbook for <code>checkout-api</code> queue consumer failures?" 
These questions require private, structural knowledge about your organization that no pre-trained model has and can't hallucinate accurately. The knowledge is specific, relational, and continuously changing.</p><p>Most teams try to close this gap by dumping documentation into a vector store and calling it RAG. Confluence pages, GitHub READMEs, and runbook docs get chunked, embedded, and retrieved at query time. This might work until the docs go stale (immediately), ownership information gets siloed in a wiki nobody maintains (also immediately), or the agent retrieves a plausible document that describes the system as it existed eighteen months ago. Unstructured documentation is a poor substrate for org-specific AI tasks. It has no canonical entity IDs, no typed relations, and no consistent update cadence. You end up with confident wrong answers, which are worse than no answers.</p><p>The most structured, continuously updated source of engineering context in your org is your Internal Developer Platform. Context engineering, or deciding what data populates a model’s context window at inference time, treats the IDP as context infrastructure rather than a simple developer portal. The underlying knowledge is already there, and exposing that data involves wiring it into the model’s context.</p><h2>What Your IDP Actually Knows (and Why That Data Is Rare)</h2><p>A mature <a href="https://backstage.io/">Backstage</a> IDP maintains a layered graph of operational facts about every registered service. Each context layer maps to a different type of data:</p><p><strong>1. Service catalog:</strong> Component, API, System, and Resource entity kinds carry <code>spec.type</code>, <code>spec.lifecycle</code>, and <code>metadata.tags</code> for tech stack metadata. <code>spec.owner</code> links every component to the team or group accountable for it. 
This alone answers a class of AI queries ("who owns this service?", "which services are in <code>production</code> lifecycle?") that most agents can't currently handle.</p><p><strong>2. Ownership graph:</strong> The traversal <code>spec.owner</code> → <code>Group</code> entity → <code>spec.members</code> gives you a directed chain from any service name to a list of actual humans or an on-call rotation. When a <a href="https://www.pagerduty.com/">PagerDuty</a> plugin is attached, the group entity can resolve directly to an active incident responder, not just a team name.</p><p><strong>3. Dependency map:</strong> <code>spec.dependsOn</code>, <code>spec.providesApis</code>, and <code>spec.consumesApis</code> form a queryable directed graph of service-to-service relationships. This is the data an AI agent needs to answer "what else does this change affect?" during change-impact analysis or incident scope assessment.</p><p><strong>4. Deployment history:</strong> <a href="https://docs.github.com/en/actions">GitHub Actions</a>, Tekton, and <a href="https://argo-cd.readthedocs.io/en/stable/">ArgoCD</a> Backstage plugins surface deploy metadata as catalog annotations (<code>backstage.io/last-deploy-timestamp</code>, commit SHA, deploying user). An agent with access to this data can answer "did anything deploy to <code>checkout-api</code> in the past 6 hours?" without needing to query GitHub directly.</p><p><strong>5. Incident data:</strong> PagerDuty and Opsgenie plugins embed open incident counts, on-call rotation names, and service health thresholds as entity annotations. This is the difference between an agent that helps triage and one that produces noise.</p><p>All of this data is machine-readable by design. Every piece of it exists as structured YAML at the source, delivered as typed JSON through the <a href="https://backstage.io/docs/features/software-catalog/software-catalog-api/">Backstage Catalog API</a>. 
Contrast that with a Confluence page about <code>checkout-api</code>, which might contain some of this information, written in prose, last updated whenever someone remembered to do it. The IDP version is authoritative, entity-keyed, and alive.</p><p>Proprietary portals that don't expose catalog entities through a consistent data structure lack the structural backbone that makes this data tractable as an AI context source.</p><h2>Context Engineering: What Goes Into the Window and Why It Matters</h2><p>Unlike prompt engineering, which focuses on how you phrase a request, context engineering is the set of decisions that determine what the model sees before it generates anything at all. These decisions include what data to retrieve, how to structure it, when to inject it, and how much to trim.</p><p>For an AI agent operating in a production engineering org, IDP data maps to four distinct context types, each relevant to a different query class:</p><ul><li><strong>Factual context</strong> (ownership, lifecycle, tech stack) answers "who owns this" and "what kind of thing is this"</li><li><strong>Relational context</strong> (dependency maps) answers "what else is affected" and "what does this call"</li><li><strong>Historical context</strong> (deployment events, incident records) answers "what changed recently" and "has this broken before"</li><li><strong>Procedural context</strong> (runbooks, ADRs, TechRadar entries linked to catalog entities) answers "how do we handle this"</li></ul><p>The architectural advantage of IDP data over unstructured docs is precision. Catalog entities have canonical identifiers and typed relations, which means retrieval can combine semantic search with structured filtering. 
A vector similarity search can surface the relevant entity description, while the entity name, namespace, and relations ensure the agent retrieves the correct <code>checkout-api</code> component, not just a document that happens to mention checkout in passing. Semantically similar isn't good enough; the context needs to resolve to the exact service entity.</p><h2>Three Patterns for Wiring Your IDP to an AI Agent</h2><p>The following three patterns describe different ways of consuming the Backstage Catalog API as an underlying data source. Each uses a different retrieval mechanism and requires distinct infrastructure.</p><h3>Pattern A: RAG Over Catalog Entities</h3><p>Embed catalog entity descriptors as structured text chunks and store them in a vector index, such as <a href="https://github.com/pgvector/pgvector">pgvector</a> or a hosted vector service (use what you already have). Retrieve the relevant chunks at query time and inject them into the system prompt. <a href="https://www.langchain.com/">LangChain</a> and <a href="https://www.llamaindex.ai/">LlamaIndex</a> both have straightforward document loader patterns for this.</p><p>The chunking strategy matters. Avoid embedding an entire entity as a single chunk. Instead, split entities by context type: a facts chunk (name, owner, lifecycle, description), a dependencies chunk (dependsOn, providesApis, consumesApis), and an incident/deployment chunk (annotations). This produces three embeddings per entity, all keyed to the same entity name, and enables more precise retrieval when a query targets dependencies rather than ownership.</p><p><strong>Best for:</strong> Read-only Q&#x26;A at scale ("list all services owned by team-payments", "which services are in <code>experimental</code> lifecycle?").</p><p><strong>Trade-off:</strong> Can impact index freshness. 
If you don't wire catalog change events to index updates, your RAG pipeline will drift from the live catalog.</p><h3>Pattern B: MCP Server Wrapping the Backstage Catalog API</h3><p>Run a Model Context Protocol (MCP) server that wraps the Backstage Catalog API and exposes catalog operations as agent-callable tools. MCP, an open protocol released by Anthropic in November 2024, defines a standard for exposing external systems in this way, allowing agents to fetch fresh catalog data during inference.</p><p>The MCP server translates existing Catalog API endpoints into tool definitions. For example:</p><p>| MCP Tool | Backstage Endpoint |
|---|---|
| <code>get_component_by_name</code> | <code>GET /api/catalog/entities/by-name/component/{namespace}/{name}</code> |
| <code>list_entities_by_owner</code> | <code>GET /api/catalog/entities?filter=spec.owner={team}</code> |
| <code>get_entity_relations</code> | <code>GET /api/catalog/entities/by-name/{kind}/{namespace}/{name}</code> |</p><p>When the agent receives a query, it can call these tools dynamically and retrieve fresh catalog data during reasoning.</p><p>For example, answering a question like:</p><blockquote><p>Which services owned by <code>team-payments</code> have dependencies on <code>checkout-api</code>?</p></blockquote><p>might involve multiple tool calls:</p><ol><li>Query services owned by the team</li><li>Retrieve each entity's dependency relations</li><li>Filter those that reference <code>checkout-api</code></li></ol><p>An MCP server orchestrates those API calls while exposing them to the agent as simple tools.</p><p><strong>Best for:</strong> Multi-step agentic workflows that need to traverse the service graph dynamically.</p><p><strong>Trade-off:</strong> Every tool call translates into a Catalog API request, so complex queries can introduce additional latency compared to pre-indexed retrieval.</p><h3>Pattern C: Direct Function-Tool Definitions</h3><p>Define Catalog API endpoints as function tools directly in your <a href="https://platform.openai.com/docs/guides/function-calling">OpenAI</a> or <a href="https://docs.anthropic.com/en/docs/build-with-claude/tool-use">Claude API</a> call. No additional infrastructure. The agent calls the tool during inference, fetches entity data, and incorporates it into its response.</p><p>Here's a minimal tool definition for <code>get_component_by_name</code>:</p><pre><code class="language-json">{
  "name": "get_component_by_name",
  "description": "Retrieve a Backstage catalog entity for a named service component. Returns ownership, dependencies, lifecycle status, runbook URL, and deployment metadata.",
  "parameters": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "The component name as registered in the Backstage catalog (e.g., 'checkout-api', 'payments-service')"
      },
      "namespace": {
        "type": "string",
        "description": "The Backstage namespace, defaults to 'default'",
        "default": "default"
      }
    },
    "required": ["name"]
  }
}
</code></pre><p>The implementation calls <code>GET /api/catalog/entities/by-name/component/{namespace}/{name}</code> against your Backstage instance.</p><p><strong>Best for:</strong> Shipping a proof of concept today. No new infrastructure, real-time data, works with any LLM that supports function calling.</p><p><strong>Trade-off:</strong> You're making an API call per query. At high volume, Pattern A's pre-indexed retrieval will be faster.</p><h3>Choosing the right pattern</h3><p>Start with Pattern C today. Graduate to Pattern B as your agentic workflows get more complex, particularly when they require multi‑hop traversal. Pattern A is the right call if you're building read-heavy Q&#x26;A at scale and want to minimize per-query API latency.</p><p>The table below summarizes the key trade‑offs between the three patterns.</p><p>| Pattern | Freshness | Latency | Infra Complexity | Best For |
|---|---|---|---|---|
| A: RAG over catalog entities | Index refresh cadence | Low (pre-indexed) | Medium (embedding pipeline + vector store) | Read-only Q&#x26;A at scale |
| B: MCP server | Real-time | Higher (per-hop API call) | High (MCP server) | Multi-step agentic workflows |
| C: Direct function tools | Real-time | Medium (per-query API) | Low (none) | Zero-infra proof of concept |</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3LFi2UoN9nZXrd8j0Mkeze/b64885b638cae8ce88e9d55bc85ffbc5/mermaid-diagram-2026-03-19T12-58-37.png" alt="Three Patterns for Wiring Your IDP to an AI Agent"></p><h2>Building the Pipeline: From catalog-info.yaml to Context String</h2><p>In the pipeline from IDP to AI agent, Backstage acts as the system of record, where services are described in <code>catalog-info.yaml</code> with structured relationships and metadata. The Catalog API provides a queryable interface over that data, returning entity definitions as JSON. From there, the pipeline converts those entities into smaller, purpose-built context representations — either as embeddings for retrieval or as structured responses returned through tool calls.</p><p>A typical <code>catalog-info.yaml</code> in a production Backstage instance can define metadata such as its name, description, owner, dependencies, and the APIs it provides.</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-api
  description: "Handles payment checkout flows and queue consumer processing"
  annotations:
    pagerduty.com/service-id: "P1234XY"
    backstage.io/last-deploy-timestamp: "2025-01-14T22:31:00Z"
    runbook-url: https://runbooks.internal/checkout-api
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: payments-platform
  dependsOn:
    - component:default/payments-service
    - component:default/inventory-api
    - resource:default/checkout-queue
  providesApis:
    - checkout-api-v2
  consumesApis:
    - payments-processing-api
</code></pre><p>The Catalog API surfaces the YAML as a JSON entity at <code>GET /api/catalog/entities/by-name/component/default/checkout-api</code>. The fields your agent needs are in <code>metadata</code>, <code>spec</code>, and <code>metadata.annotations</code>. An important thing to keep in mind is that while the <code>catalog-info.yaml</code> file is the most common method to declare entities, the catalog can also be populated from multiple data sources, such as AWS, and in many scenarios, enriching an entity from multiple sources can provide a richer context to the AI agent.</p><p>Here's a Python function that takes the entity JSON and returns a structured context string ready to inject into a system prompt or retrieve from a vector index:</p><pre><code class="language-python">def parse_entity_ref(ref: str) -> str:
    """
    Convert Backstage entity refs like:
    'component:default/payments-service'
    into just:
    'payments-service'
    """
    try:
        return ref.split("/")[-1]
    except Exception:
        return ref
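

# --- Illustrative addition (not part of the original functions) --------------
# Pattern A suggests embedding each entity as separate chunks by context type.
# A minimal sketch of that split; `entity_to_chunks` and the chunk-type labels
# ("facts", "dependencies", "operations") are hypothetical names.
def entity_to_chunks(entity: dict) -> list[dict]:
    """Return one text chunk per context type, each keyed to the same entity name."""
    metadata = entity.get("metadata", {})
    spec = entity.get("spec", {})
    annotations = metadata.get("annotations", {})
    name = metadata.get("name", "unknown")

    def join(key: str) -> str:
        # Entity refs are kept in full here so chunks stay unambiguous across namespaces
        return ", ".join(spec.get(key, [])) or "None recorded"

    facts = (
        f"Service: {name}\n"
        f"Description: {metadata.get('description', 'No description provided')}\n"
        f"Owner: {spec.get('owner', 'UNKNOWN')}\n"
        f"Lifecycle: {spec.get('lifecycle', 'unknown')}"
    )
    dependencies = (
        f"Service: {name}\n"
        f"Depends on: {join('dependsOn')}\n"
        f"Provides APIs: {join('providesApis')}\n"
        f"Consumes APIs: {join('consumesApis')}"
    )
    # Annotations carry the deploy, runbook, and incident links for the entity
    operations = f"Service: {name}\n" + "\n".join(
        f"{key}: {value}" for key, value in sorted(annotations.items())
    )
    return [
        {"entity": name, "chunk_type": "facts", "text": facts},
        {"entity": name, "chunk_type": "dependencies", "text": dependencies},
        {"entity": name, "chunk_type": "operations", "text": operations},
    ]

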
def entity_to_context_string(entity: dict) -> str:
    metadata = entity.get("metadata", {})
    spec = entity.get("spec", {})
    annotations = metadata.get("annotations", {})

    name = metadata.get("name", "unknown")
    description = metadata.get("description", "No description provided")
    owner = spec.get("owner", "UNKNOWN — ownership gap")
    system = spec.get("system", "UNKNOWN — system unassigned")
    lifecycle = spec.get("lifecycle", "unknown")

    depends_on = [parse_entity_ref(r) for r in spec.get("dependsOn", [])]
    provides_apis = [parse_entity_ref(r) for r in spec.get("providesApis", [])]
    consumes_apis = [parse_entity_ref(r) for r in spec.get("consumesApis", [])]

    runbook = annotations.get("runbook-url", "No runbook linked")
    last_deploy = annotations.get("backstage.io/last-deploy-timestamp", "No deploy data")
    pagerduty_id = annotations.get("pagerduty.com/service-id", "No PagerDuty link")

    return f"""Service: {name}
Description: {description}
Owner: {owner}
System: {system}
Lifecycle: {lifecycle}

Dependencies: {', '.join(depends_on) if depends_on else 'None recorded'}

Provides APIs: {', '.join(provides_apis) if provides_apis else 'None recorded'}
Consumes APIs: {', '.join(consumes_apis) if consumes_apis else 'None recorded'}

Last Deploy: {last_deploy}
Runbook: {runbook}
"""
</code></pre><h3>Example: What the Model Actually Sees</h3><p>When the agent retrieves context for a service like checkout-api, the information injected into the model’s context window is a structured block derived from the catalog entity, not the raw YAML or JSON.</p><p>A typical context injection might look like this:</p><pre><code>Service: checkout-api
Description: Handles payment checkout flows and queue consumer processing
Owner: team-payments
System: payments-platform
Lifecycle: production

Dependencies: payments-service, inventory-api, checkout-queue

Provides APIs: checkout-api-v2
Consumes APIs: payments-processing-api

Last Deploy: 2025-01-14T22:31:00Z
Runbook: https://runbooks.internal/checkout-api
</code></pre><p>This block is small enough to fit comfortably inside an LLM context window while still giving the model the critical operational facts it needs to answer questions like:</p><p>"Who owns checkout-api?"</p><p>"What services might be affected if checkout-api fails?"</p><p>"Did anything deploy recently that could explain this incident?"</p><p>For Pattern A (RAG), different slices of this information are typically embedded separately: for example, one embedding for service facts, one for dependency relations, and one for operational history. That way, the retrieval layer can return only the context relevant to the user’s question.</p><p>The model reasons over authoritative service metadata pulled directly from the IDP catalog, rather than relying on inference or approximation.</p><p>If you're running on Roadie, the Catalog API is already available at a stable, authenticated endpoint. Roadie also ships the <a href="https://roadie.io">AI Assistant RAG plugin</a>, which implements the embedding, indexing, and retrieval layer of Pattern A out of the box. If your org is already on Roadie, the pipeline in this section is largely already running. You would only need to connect your LLM endpoint to it, instead of building the chunking infrastructure from scratch.</p><h2>Operational Hygiene: Your Context Is Only as Good as Your Catalog</h2><p>Incomplete catalog data creates predictable failure patterns when used as AI context. Here’s a typical example: an AI agent pages the wrong team during an incident because <code>spec.owner</code> was missing from a catalog entity, and the agent fell back to a default or hallucinated a plausible owner name.</p><p>To avoid this class of failure, a catalog needs to meet three completeness requirements before it’s safe to use as an AI context source:</p><p><strong><code>spec.owner</code> must be populated on every component:</strong> Unenforced ownership means the agent has no escalation path. 
An agent that can't answer "who owns this service?" is useless for incident triage and on-call routing, which are the two highest-value use cases for real-time IDP context.</p><p><strong><code>metadata.description</code> must be non-empty:</strong> Empty descriptions degrade embedding quality and cause false-positive retrievals in Pattern A. A query for "checkout flow services" can return <code>inventory-api</code> simply because its description is empty, effectively turning it into a wildcard candidate during retrieval.</p><p><strong>System relations must be defined:</strong> Without <code>spec.system</code>, the dependency graph is a set of disconnected nodes. An agent trying to answer "what other services are in the same system as <code>checkout-api</code>?" can't traverse a graph that doesn't have system edges. This matters for change-impact analysis, which needs to understand blast radius within a system boundary.</p><p>Beyond completeness, two operational concerns apply regardless of which pattern you use:</p><p><strong>Access control:</strong> AI agents query the catalog on behalf of users. Backstage's <a href="https://backstage.io/docs/permissions/overview">permission framework</a> must be enforced at the API layer so agents can't surface catalog data that the requesting user isn't authorized to see. Don't skip this step just because the agent interface feels informal.</p><p><strong>Catalog freshness:</strong> Catalog auto-sync must be wired to source change events such as SCM push hooks and CI completion events, not nightly batch jobs. Deployment history and incident annotations are time-sensitive. A last-deploy timestamp from eight hours ago is misleading context during an active incident. Every hour of staleness widens the hallucination window on operational queries.</p><p>That said, wiring SCM-triggered catalog refresh reliably is harder in practice than it sounds. 
Sync failures, webhook misconfigurations, and integration drift are recurring operational costs of self-hosted Backstage. On Roadie's SaaS platform, SCM-triggered catalog refresh and entity validation are managed infrastructure, which removes the self-hosted cost of chasing stale-catalog problems caused by sync failures. This is a concrete build-vs-buy consideration for teams deciding where to invest engineering time.</p><h2>Start Here: Audit Your Catalog for AI Context Readiness Today</h2><p>To start auditing your catalog for AI context readiness, here are three steps you can complete before the end of the day. No new dependencies required.</p><p><strong>Step 1: Run a completeness audit.</strong> The script below calls <code>GET /api/catalog/entities?filter=kind=component</code> against your Backstage instance and reports which components are missing an owner, a system, or a description:</p><pre><code class="language-python">import requests
from collections import defaultdict

BACKSTAGE_URL = "https://your-backstage.example.com"
TOKEN = "your-backstage-token"

def audit_catalog_completeness():
    url = f"{BACKSTAGE_URL}/api/catalog/entities?filter=kind=component"
    headers = {"Authorization": f"Bearer {TOKEN}"}

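    response = requests.get(url, headers=headers, timeout=30)
    # Fail loudly on 401/403/5xx instead of trying to parse an error body.
    response.raise_for_status()
    entities = response.json()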

    gaps = defaultdict(list)

    for entity in entities:
        name = entity["metadata"]["name"]
        spec = entity.get("spec", {})
        metadata = entity.get("metadata", {})

        if not spec.get("owner"):
            gaps["missing_owner"].append(name)
        if not spec.get("system"):
            gaps["missing_system"].append(name)
        if not metadata.get("description"):
            gaps["missing_description"].append(name)

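    total = len(entities)
    if total == 0:
        # Avoid a ZeroDivisionError in the percentage calculation below.
        print("No components returned; check the filter and token.")
        return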
    print(f"\nCatalog Completeness Audit — {total} components\n")

    for gap_type, names in gaps.items():
        pct = len(names) / total * 100
        print(f"{gap_type}: {len(names)} entities ({pct:.1f}%)")
        for name in names[:5]:
            print(f"  - {name}")
        if len(names) > 5:
            print(f"  ... and {len(names) - 5} more")

audit_catalog_completeness()
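

# --- Optional CI gate (sketch, not part of the original script) -------------
# Assumption: you want a pipeline to fail when too many components lack a
# field. gap_percentages() and exceeds_threshold() are hypothetical helpers,
# and the 10% default is an arbitrary placeholder. Both expect a
# {gap_type: [names]} dict shaped like the `gaps` mapping built above, so
# audit_catalog_completeness() would need to return `gaps` and `total`
# before these could be wired in.
def gap_percentages(gaps: dict, total: int) -> dict:
    """Return {gap_type: percent of components affected}; empty if total is 0."""
    if total == 0:
        return {}
    return {gap_type: len(names) / total * 100 for gap_type, names in gaps.items()}


def exceeds_threshold(gaps: dict, total: int, max_pct: float = 10.0) -> list:
    """List, in sorted order, the gap types whose share is above max_pct."""
    return sorted(
        gap_type
        for gap_type, pct in gap_percentages(gaps, total).items()
        if pct > max_pct
    )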
</code></pre><p>This tells you exactly how much context debt you're sitting on. If 30% of your components are missing <code>spec.owner</code>, then ownership questions about those components, roughly 30% of your catalog, will produce wrong or empty answers.</p><p><strong>Step 2: Fix the gaps on your top 10 most critical services first.</strong> Define "most critical" as the highest deploy frequency, most upstream dependents, or most incident-prone, whichever your team can quantify. These are the services an AI agent will be asked about most often. A complete catalog entry for <code>checkout-api</code> is worth more than partial entries for 50 internal tools nobody queries.</p><p><strong>Step 3: Write and test one function-tool definition.</strong> Take the JSON tool definition from Pattern C above, attach it to a <a href="https://console.anthropic.com/">Claude</a> or <a href="https://platform.openai.com/playground">OpenAI Playground</a> session, and ask a real question such as "Who is on call for <code>payments-service</code>?" or "What does <code>checkout-api</code> depend on?" If the catalog entry is complete, the answer should be correct. If the answer is wrong or empty, the output will point to the gap in the catalog data, such as a missing <code>spec.owner</code>, an empty <code>spec.dependsOn</code>, or a PagerDuty annotation that hasn’t been set.</p><p>The catalog completeness audit is the forcing function here. A RAG pipeline built on top of an incomplete catalog cannot reliably produce accurate answers. Getting the data right comes first. The context pipeline is the easy part.</p><h2>Frequently Asked Questions</h2><h3>What is context engineering for AI agents?</h3><p>Context engineering is the discipline of deciding what data populates a model's context window at inference time: what to retrieve, how to structure it, when to inject it, and how much to trim. 
Unlike prompt engineering, which focuses on how you phrase a request, context engineering controls what the model sees before generating a response. This means wiring structured IDP data, including ownership graphs, dependency maps, and deployment history, directly into agent queries.</p><h3>Why is an IDP better than Confluence for AI context?</h3><p>An Internal Developer Platform like Backstage stores engineering knowledge as structured, entity-keyed, machine-readable YAML. Every component has a canonical ID, typed relations (<code>spec.dependsOn</code>, <code>spec.owner</code>), and is updated continuously via SCM and CI integrations. Confluence pages go stale immediately, have no canonical entity IDs, and contain no typed relations. Structured IDP data produces grounded, accurate answers.</p><h3>Which pattern should I start with: RAG, MCP, or function tools?</h3><p>Pattern C (direct function-tool definitions) is usually the best place to start. It requires no new infrastructure, delivers real-time Catalog API data, and works with any LLM that supports function calling. You can ship a working proof of concept today. Graduate to Pattern B (MCP server) when your agentic workflows need multi-hop catalog traversal. Pattern A (RAG over catalog entities) is a better fit when you need low-latency, read-only Q&#x26;A at scale.</p><h2>Final Thoughts</h2><p>Every engineering org with a functioning IDP already has the context to make org-specific AI tasks tractable. Ownership graphs, dependency maps, deployment history, and incident annotations are already present as structured, entity-keyed data that updates continuously. 
Bridging the gap between "we have an IDP" and "our AI agent knows our system" is largely a matter of wiring.</p><p>The three patterns above provide a concrete path from the Backstage Catalog API to a grounded AI agent, whether you want to ship something quickly (Pattern C), build a scalable retrieval pipeline (Pattern A), or support multi-hop agentic workflows (Pattern B). The catalog completeness requirements define the data quality bar that makes any of these patterns reliable in production.</p><p>Your IDP's catalog is already the most accurate, continuously updated map of your engineering org, making it a practical foundation for AI agent context infrastructure.</p><p>If you're running Backstage and want the Catalog API, entity sync, and AI Assistant RAG pipeline without the self-hosted maintenance overhead, Roadie's managed platform ships all three. <a href="https://roadie.io/request-demo">See how Roadie turns your IDP into a context engine</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Backstage and its Place Among Developer Portals: A Technical Architecture Guide]]></title><link>https://roadie.io/blog/backstage-and-its-place-among-developer-portals/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-and-its-place-among-developer-portals/</guid><pubDate>Tue, 17 Mar 2026 06:00:00 GMT</pubDate><description><![CDATA[Explore Backstage architecture, plugins, and the software catalog. Compare self-hosted Backstage, proprietary portals, and SaaS portals built on open standards.]]></description><content:encoded><![CDATA[<p>The Internal Developer Portal market has matured fast. But despite all the new vendors, most teams evaluating portals are still making the same underlying decision:
What architecture do we want to commit to for the next 5-10 years of developer experience?</p><p>Today, there are three distinct paths:</p><ol><li>Self-hosted Backstage</li><li>Proprietary portals</li><li>SaaS portals built on an open standard</li></ol><p>Backstage became the Kubernetes of developer portals, not because it's easy to run, but because it's the open standard that solves the portability problem. When you build on Backstage, you're choosing an ecosystem where your data model isn't locked to a single vendor. But this flexibility comes with real operational costs that many teams underestimate.</p><p>This guide examines Backstage's technical architecture, specifically its plugin system, entity model, and extension points, and explains where it fits in the modern platform engineering stack. You'll understand the trade-offs between framework flexibility and product convenience, and why a SaaS portal built on an open standard represents an architectural evolution rather than just another hosting option.</p><h2>Deconstructing Backstage: How it Actually Works</h2><p>Backstage isn't a product you install. It's a collection of TypeScript libraries that you assemble into your own developer portal. This distinction matters because it changes how you think about customization, upgrades, and ownership.</p><h3>Assembly Required: Backstage as a Framework</h3><p>When you start with Backstage, you're creating a new Node.js application. You import Backstage packages as dependencies, configure them through code, and deploy the resulting application to your infrastructure. This is fundamentally different from signing into a SaaS portal where the vendor controls the codebase. 
Backstage is an extensible application framework that you build and tailor to your organization’s needs.</p><p>The core Backstage repository provides the foundational packages: <code>@backstage/core-components</code> for React UI elements, <code>@backstage/backend-defaults</code> for the new backend system, and <code>@backstage/catalog-model</code> for the type system. You compose these into your instance by modifying the <code>packages/app</code> and <code>packages/backend</code> directories in your Backstage monorepo.</p><p>This architecture lets you treat your portal as an internal platform product, an application you continuously evolve to match your organization. If you want to adjust navigation patterns or change how the sidebar renders, you can modify the React components directly. If your authentication model is complex, you can implement it in your backend package. If your organization has domain concepts that don’t fit predefined schemas, you can extend the entity model to reflect them.</p><p>In proprietary portals, customization typically happens through predefined settings and extension points, which is faster to adopt but structurally more constrained.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/js75xxIIvg0yYNv7yMqlP/b9228c9007b58af1fdc7ba7c4c218270/mermaid-diagram-2026-03-09T14-51-40.png" alt="Your Backstage Instance"></p><h3>The Plugin Architecture: Backstage’s Extensibility Model</h3><p>In Backstage, even core capabilities like the catalog are implemented as plugins. This isn't just organizational; it's the technical mechanism that makes Backstage an ecosystem rather than a single product.</p><p><a href="https://backstage.io/docs/architecture-decisions/adrs-adr011/">Plugins are NPM packages</a> that implement specific interfaces. Frontend plugins use <code>createFrontendPlugin</code> from <code>@backstage/frontend-plugin-api</code> and contribute UI through extension points such as pages and entity components (for example via blueprints like PageBlueprint or EntityCardBlueprint). Backend plugins use <code>createBackendPlugin</code> from <code>@backstage/backend-plugin-api</code> with a microservice architecture where plugins operate in complete isolation, communicating only through network calls.</p><p>The plugin system is implemented through Backstage's extension API. Installing a plugin is a build-time composition step: you import the NPM package, register it in the app, and Backstage wires it into the portal through defined extension points. Backstage core provides shared platform services like routing, authentication context, configuration, and consistent UI primitives, while plugins focus on domain-specific integration logic.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/1Rg5Oc6vR8qanty9cKwhj2/12aee4b3d624026aed2e12d1df073d36/mermaid-diagram-2026-03-09T14-53-43.png" alt="The Plugin Architecture"></p><p>For example, the <a href="https://roadie.io/backstage/plugins/kubernetes/">Kubernetes plugin</a> provides a frontend component that displays cluster resources and a backend integration that queries Kubernetes APIs. The plugin reuses shared concerns like authentication context, configured proxy settings, and secrets handling rather than re-implementing them.</p><p>This plugin-first architecture is what enables the Backstage ecosystem to scale, with over <a href="https://roadie.io/backstage/plugins/">250 community plugins</a>. Instead of requiring changes to core, integrations can be delivered as modular plugins - connecting to systems like PagerDuty, Datadog, or your CI/CD system - while remaining aligned to a shared extension model.</p><h3>The Entity Model: The System of Record</h3><p>Backstage's other foundational concept is the <a href="https://backstage.io/docs/features/software-catalog/">Software Catalog</a>. At its core, this catalog is a graph of entities that represent your software ecosystem. Entities are commonly defined in <code>catalog-info.yaml</code> files that live alongside your source code.</p><p><a href="https://backstage.io/docs/features/software-catalog/descriptor-format/">An entity can represent</a>: a service, a library, a team, an API, or a custom resource type you define. Each entity has a kind (Component, API, Resource, System, Domain, Group, User, Template, or Location), metadata (name, description, annotations), and relationships to other entities.</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles payment processing
  annotations:
    github.com/project-slug: company/payment-service
spec:
  type: service
  lifecycle: production
  owner: payments-team
  system: checkout
  providesApis:
    - payment-api
  consumesApis:
    - fraud-detection-api
</code></pre><p>The model is declarative and versioned. Your entity definitions live in git, versioned alongside your code. Over time, the catalog becomes a structured system of record for: ownership, lifecycle state, dependencies, API relationships and the context required to power templates, scorecards, and automation.</p><p>The catalog backend continuously processes these YAML files through entity providers, plugins that discover and ingest entities from different sources. For example, the GitHub provider scans your repositories for <code>catalog-info.yaml</code> files and the Kubernetes provider can generate entities from cluster resources. This extensibility means you control what gets cataloged and how.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/3GTICDRuiCpT8XXOyEgBPI/6cca977e9dd007fc46622bc4a8fe1835/mermaid-diagram-2026-03-09T15-02-00.png" alt="The Entity Model"></p><p>The result is that Backstage is more than a developer portal UI. It is a structured model of your engineering organization, which the portal then uses as context for workflows, integrations, and governance.</p><h2>The Landscape: Three Ways to Adopt a Developer Portal</h2><p>Once you understand Backstage’s architecture, the market becomes easier to reason about.</p><p>Organizations evaluating developer portals are typically choosing between three distinct models:</p><ol><li>Self-hosted Backstage (framework ownership)</li><li>Proprietary SaaS portals (closed product ownership)</li><li>SaaS portals built on the Backstage standard (productized standard)</li></ol><p>The differences are not primarily about features. They’re about ownership, extensibility, and where your system of record lives.</p><h3>Self-Hosted Backstage: You Own the Framework</h3><p>In this model, you adopt the Backstage standard directly and operate it yourself. You control the React frontend, the Node backend, the plugin surface area, the entity model and schema evolution, as well as the upgrade and migration lifecycle.</p><p>This maximizes architectural control, but it also comes with responsibility for maintaining a rapidly evolving framework. The portal becomes an internally operated software product. For organizations with strong platform engineering capacity, this can be a deliberate and strategic choice.</p><h3>Proprietary Portals: The Vendor Owns the Model</h3><p>In a closed SaaS model, the vendor owns the application, the schema, and the extension model.</p><p>These tools are typically optimized for rapid setup. You connect your source control and your CI system, and a catalog is generated. You configure scorecards and workflows through predefined interfaces.
This speed comes from removing choices. However, the catalog and workflow layer usually live inside a vendor-defined schema. Even when APIs are available, the data model is product-defined. Over time, your service metadata and automation logic become coupled to that model.</p><p>Extensibility typically happens through predefined extension points: custom fields, webhooks, workflow builders, or API integrations. That can be sufficient for many teams. But if your organization’s domain model diverges significantly from the product’s assumptions, deeper customization often requires vendor support or roadmap alignment.</p><p>The trade-off is clear: faster time-to-value, but less control over how the system evolves.</p><h3>SaaS portal built on an open standard: Productizing the Architecture</h3><p>In this hybrid approach, the vendor owns the application, operational lifecycle (upgrades, migrations, infrastructure), and schema evolution. However, the architecture remains aligned to Backstage principles and ecosystem patterns.</p><p>The key distinction is extensibility. Rather than limiting integrations to a fixed set of vendor-defined connectors, the system is built around a plugin-oriented model. Integrations are modular, scoped, and composable. Teams can extend the portal to integrate internal systems, domain-specific tooling, or custom workflows without modifying core application code.</p><p>The difference from closed SaaS portals is architectural. Instead of limiting extensibility to configuration, webhooks, or predefined workflow builders, this model treats extensibility as a first-class concern. 
New capabilities can be introduced as modular integrations, rather than requiring changes to the core product.</p><p>This approach exists because many organizations want the architectural advantages of the Backstage standard like modular integrations, catalog-driven context, and workflow extensibility, without taking on the responsibility of operating and upgrading a framework deployment themselves.</p><h3>The Data Model Trade-off</h3><p>Over time, a developer portal becomes less about UI and more about context: ownership, dependencies, lifecycle state, maturity standards, and the metadata needed to power workflows.
Where that context lives, and who controls how it evolves, have long-term implications.</p><p>In a self-hosted Backstage deployment, the context model is defined in code and declarative descriptors (often versioned in Git). The organization controls how the model evolves and how relationships are represented.</p><p>In a closed SaaS portal, the context model typically lives inside a vendor-managed database with a product-defined schema. Even when APIs exist, the shape of the model and its evolution are governed by the vendor.</p><p>In a SaaS portal built on an open standard, the vendor owns the application and the evolution of the model, but the portal is still built around catalog-driven context and modular extensibility rather than a fixed set of vendor-only connectors.</p><p>The trade-off isn’t theoretical. Once ownership, automation, and governance are encoded into a portal, migrating between models is rarely just a data export. It involves re-mapping relationships, workflows, and assumptions embedded in how that context is represented.</p><h2>The Operational Layer: Running Backstage Is Real Engineering Work</h2><p>Backstage’s power comes from the fact that it’s an open source framework, but that also means you are running a real software system.
A self-hosted Backstage deployment is not a static tool. It’s:</p><ul><li>A Node.js backend with multiple plugins and integrations</li><li>A React frontend composed from independently versioned packages</li><li>A PostgreSQL database backing the catalog and metadata</li><li>Authentication, authorization, and secrets management</li><li>The lifecycle of ongoing dependency and framework upgrades</li></ul><p>This is infrastructure-level responsibility.</p><p><a href="https://backstage.io/docs/overview/versioning-policy/">Backstage releases happen monthly</a>, on the Tuesday before the third Wednesday of each month. Individual packages follow semantic versioning, but Backstage releases explicitly don't adhere to semver. Breaking changes are documented with a <code>**BREAKING**:</code> prefix in changelogs - but the practical reality is that staying current requires consistent engineering attention.</p><p>The TypeScript tax is real. Plugins evolve quickly, and compatibility issues do happen: plugin updates can introduce breaking changes, and deep customizations can conflict with upstream changes. Running Backstage well requires engineers who understand the Backstage framework architecture, can debug the plugin composition model, and can keep upgrades from turning into periodic migrations.</p><p>Security is your responsibility. You configure authentication, implement <a href="https://roadie.io/product/access-control/">authorization rules</a>, and manage secrets. <a href="https://backstage.io/docs/permissions/overview/">RBAC in Backstage</a> requires implementing permission policies as code through the permission framework, which means you need someone who actually understands the permission model to secure your instance correctly as the portal grows.</p><p>Operational complexity compounds at scale. You need to monitor the backend, tune database queries, and keep frontend bundle sizes from drifting. When the portal is down, you handle the incident response. 
And because the portal becomes a daily surface for developer workflows, reliability expectations tend to climb quickly.</p><p>Many teams understaff Backstage operations. They assume that because it's "just a developer portal," it doesn't need dedicated ownership. Then they struggle with upgrades, accumulate technical debt, and lose confidence in the platform.</p><p>The complexity isn’t a flaw; it’s the cost of adopting a highly extensible standard. When ownership is vague, upgrades get deferred, incompatibilities pile up, and confidence erodes. When it’s staffed like a core platform service, it behaves like one.
That pattern shows up in <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">Backstage’s own community reporting too</a>: organizations with larger, dedicated portal teams consistently report modestly higher satisfaction.</p><h2>Roadie: SaaS Delivery Without Closed-System Tradeoffs</h2><p>Backstage established the architectural standard for developer portals: a plugin-first model, a structured software catalog, and an ecosystem of integrations. But consuming a standard doesn’t require operating the framework directly.</p><p><a href="https://roadie.io/">Roadie is a SaaS developer portal</a> built on the open standard and inspired by its ecosystem. It offers the architectural advantages that made Backstage the default framework, such as a plugin-based architecture, catalog-driven context, and an ecosystem integration model, then elevates them into a product designed for long-term context stability, governance, and scale.</p><p>This matters because most organizations don’t want to become framework maintainers, but they also don’t want their portal to become a closed system with a vendor-owned schema. Roadie offers this alternative: the architectural upside of the standard, with SaaS product delivery.</p><h3>Architecture: SaaS Built on the Open Standard (Not a Hosted Distribution)</h3><p>Roadie is a SaaS developer portal built on the open standard, but it isn’t simply a hosted version of Backstage.</p><p>Roadie delivers a SaaS portal experience with hosting, databases, monitoring, security controls, and upgrades, while retaining the Backstage ecosystem approach that made extensibility and integration portability possible in the first place. 
Roadie can absorb ecosystem change, improve workflow UX, and provide enterprise operational guarantees, <a href="https://roadie.io/blog/our-first-12-month-soc2-type-2-report/">including SOC 2 Type 2</a>, without requiring customers to become framework maintainers.</p><h3>What Roadie Optimizes (and Why It’s a Distinct Third Model)</h3><p>Roadie is designed for teams who want the architectural upside of the Backstage standard without turning the portal itself into an internally operated framework.</p><p>From the Backstage ecosystem, Roadie inherits the most valuable properties: an extensible integration model and a large body of <a href="https://roadie.io/backstage/plugins/">community and partner plugins</a>. Teams can adopt integrations incrementally and extend the portal over time, instead of being limited to a fixed set of vendor-owned connectors.</p><p>Importantly, extensibility isn’t restricted to pre-approved marketplace integrations. Teams can <a href="https://roadie.io/docs/custom-plugins/overview/">build their own plugins and integrations</a> using the same plugin-oriented model to connect internal systems, domain-specific tooling, and organization-specific workflows. Extensibility also applies to automation: platform teams can extend self-service workflows through Scaffolder templates, including custom actions that integrate directly with internal tooling and automation systems. This means the portal can evolve alongside your architecture and operating model, rather than being constrained by vendor-defined capabilities.</p><p>Roadie productizes areas where a framework deployment typically requires code-first implementation. For example, access control and governance can be configured through product workflows rather than requiring teams to implement RBAC policy logic directly in code. 
Secrets and integration configuration can be managed through dedicated interfaces instead of being pushed into environment variables and bespoke deployment pipelines.</p><p>From an operational perspective, Roadie owns the lifecycle that comes with self-hosting Backstage: framework upgrades, compatibility testing, security patching, infrastructure operations, and production monitoring. In a self-hosted model, each of these becomes ongoing internal engineering work - work that competes directly with building new developer workflows, improving golden paths, or investing in higher-leverage platform initiatives.</p><p>This goes beyond “hosting”. Roadie tracks changes across the Backstage ecosystem, including core framework releases and plugin-level updates, which are evaluated and incorporated into the Roadie product lifecycle. Platform teams don’t need to plan and execute backend migrations, reconcile plugin API changes, or coordinate dependency upgrades across their portal deployment as the ecosystem evolves.</p><p>In practical terms: self-hosted Backstage makes your team responsible for maintaining the framework itself. Roadie shifts that responsibility boundary: your team focuses on extending and using the portal, not operating it.</p><p>The result is a SaaS-delivered portal that remains aligned with Backstage’s extensibility model and ecosystem approach, without requiring customers to maintain the framework itself.</p><h3>Technical Detail: Addressing Backend System Migration and Upgrade Burden</h3><p>Backstage's architecture <a href="https://backstage.io/docs/backend-system/building-backends/migrating/">evolved significantly with the introduction of the new backend system</a>, which reached stable 1.0 in 2024. For self-hosted teams, this wasn’t a minor version bump. 
It was a migration-class change that touched plugin wiring, backend dependencies, and integration compatibility.</p><p>Plugin registration changed from imperative setup to declarative <code>backend.add()</code> calls. The legacy <code>@backstage/backend-common</code> package was deprecated in favor of <code>@backstage/backend-defaults</code>. Service-to-service communication patterns shifted to dependency injection. Configuration structure changed. In practice, this meant reviewing custom plugins, updating imports, reconciling breaking changes, and validating that integrations continued to function correctly.</p><p>For teams running Backstage themselves, migrations like this become internal engineering projects. They require planning, testing, staged rollouts, and often temporary freezes on other platform work. This aligns with findings from the <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">2025 State of Backstage report</a>: "Teams running Backstage themselves describe a very different day to day reality compared to those using managed platforms, especially around stability and upgrades."</p><p>Roadie evaluates changes in the core Backstage framework and plugin ecosystem, tests them against supported integrations and configurations, and implements the necessary adjustments within the platform itself.</p><p>In this case, Roadie absorbed the migration into the product lifecycle: the compatibility work was tested against supported integrations and real customer plugin combinations and implemented within the platform itself. Customers did not need to refactor backend code, reconcile plugin API changes, or coordinate dependency upgrades across their portal deployment. Upgraded instances were delivered without requiring each team to manage the migration work.</p><p>This approach scales. 
As Backstage evolves, SaaS providers can absorb the upgrade complexity without customers modifying their configurations or code.</p><h2>Conclusion: Choosing Your Architecture</h2><p>The choice between frameworks and products isn't about features. It's about ownership and long-term flexibility.</p><p>Proprietary portals offer speed: you get a working portal quickly and avoid operational complexity. But that speed usually comes from a vendor-owned model: the schema, workflow layer, and extensibility boundaries are defined by the product. If your organization fits those assumptions, this can be a great trade.</p><p>Self-hosted Backstage offers portability and control: your data model isn't locked to a vendor and you can integrate with any tool through the plugin system. But you also take on the full cost of ownership: upgrades, migrations, security, scaling, plugin compatibility, and the ongoing work of maintaining a rapidly changing framework. The operational burden is real. For many teams, it becomes the limiting factor, not because Backstage is flawed, but because running it well requires sustained platform engineering investment.</p><p>Roadie represents a third model: a SaaS portal experience aligned with Backstage’s architectural principles, without requiring customers to operate the framework.
Roadie carries forward the architectural lineage of Backstage, including plugin-based extensibility, ecosystem familiarity, a catalog-driven foundation, and workflow flexibility, while taking responsibility for upgrades, migrations, infrastructure, and long-term lifecycle management. It’s a different responsibility boundary: teams keep the architectural upside of the standard, while offloading the operational and upgrade burden that makes self-hosting expensive over time.</p><p>The portal you choose becomes part of your platform architecture, and the decisions you make about extensibility, portability, and ownership will shape your strategy for years. Choose a model aligned with how you want your engineering organization to operate.</p>
]]></content:encoded></item><item><title><![CDATA[Context Engineering: The Missing Discipline in AI-Assisted Development]]></title><link>https://roadie.io/blog/context-engineering-ai-development/</link><guid isPermaLink="false">https://roadie.io/blog/context-engineering-ai-development/</guid><pubDate>Thu, 12 Mar 2026 10:00:00 GMT</pubDate><description><![CDATA[What is context engineering? Learn how this emerging discipline improves AI-assisted development by grounding models in company-specific context and data.]]></description><content:encoded><![CDATA[<p>A typical AI coding assistant is trained on publicly available GitHub repositories, RFCs, and Stack Overflow answers, making it perfectly capable of handling difficult coding implementations. However, what it lacks is context. The assistant doesn't know who owns the <code>payment-service</code> or which S3 bucket naming convention your security team mandated last year.</p><p>The industry spent the last two years obsessed with prompt engineering. Teams refined instructions, added chain-of-thought reasoning, and built elaborate system prompts. None of that addresses the real issue: your AI doesn't know your company. It knows the world, but it doesn't know <em>your</em> world. Thankfully, context engineering can help, and it's going to define the next wave of platform engineering work.</p><h2>What Is Context Engineering</h2><p>The working definition of context engineering is the systematic practice of curating, structuring, and retrieving information to ground AI models in specific domain knowledge.</p><p>The roots of context engineering go back to context-aware computing from the early 1990s, when researchers like Bill Schilit started building systems that could adapt their behavior based on location, user, and environment. The insight was the same then as now: the value of any intelligent system scales with the quality of context it can access. RAG architectures, popularized by the 2020 Lewis et al. 
paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," brought this concept into the LLM era. But RAG is just the retrieval mechanism. Context engineering is the discipline of deciding what to retrieve, how to structure it, and which sources to trust.</p><p>This difference matters because the industry is moving from model-centric thinking ("we need a better model") to context-centric thinking ("we need better data retrieval"). GPT-4 Turbo and Claude 3.5 Sonnet are good enough for most coding tasks. The bottleneck has shifted from model intelligence to grounding.</p><h2>Context Engineering vs. Prompt Engineering</h2><p>For the past two years, teams have focused on prompt engineering, refining instructions so models behave correctly. That works when the problem is ambiguity or reasoning quality. It fails when the model simply lacks the right information.</p><p>No prompt can reliably answer:</p><p>Who owns <code>checkout-service</code>?</p><p>What lifecycle state is <code>legacy-auth-api</code> in?</p><p>What security constraints apply to <code>payment-service</code>?</p><p>If the data isn't accessible through a retrieval layer, the model will infer, and sometimes hallucinate.</p><p>Prompt engineering optimizes how you ask. Context engineering optimizes what the model knows.</p><h2>The Three Layers of Context</h2><p>When your developers use GitHub Copilot or Cursor, those tools are solving a layered context problem, and they're only solving part of it.</p><p><strong>Layer 1 is local context:</strong> the file you have open, the function you're editing, the variables in scope. This is what language servers (LSP) have always done, and what Copilot does well. The model sees your current cursor position and the surrounding tokens. For greenfield code written against public libraries, this is often enough.</p><p><strong>Layer 2 is repository context:</strong> the patterns, structures, and dependencies in the current codebase. 
Cursor's codebase indexing handles much of this, using vector embeddings to make local code semantically searchable. You can ask "how does the authentication middleware work?" and get a coherent answer drawn from files across the repo.</p><p><strong>Layer 3 is organizational context,</strong> and this is where every current AI coding tool struggles. Organizational context is the knowledge that lives outside any single repository:</p><ul><li>Who owns <code>payment-service</code> and what's its current lifecycle status?</li><li>What's the approved pattern for creating a new S3 bucket with FIPS-compliant encryption?</li><li>Where's the API spec for the internal event bus, and what events does <code>order-service</code> emit?</li><li>What's the SLA tier for <code>fraud-detection-service</code> and what are its production constraints?</li></ul><p>No amount of clever prompting retrieves this information. It doesn't exist in any repo. It lives in your organization's institutional knowledge, and without a structured home, it's functionally invisible to any AI system.</p><p>The most natural system to solve Layer 3 is the Internal Developer Portal (IDP).
<img src="//images.ctfassets.net/hcqpbvoqhwhm/5mmneVEKk7YyDTCVl26zBJ/12523a9d7a173e350cc87cbacf6f3ef5/mermaid-diagram-2026-03-12T12-35-16.png" alt="Internal Developer Portal (IDP)"></p><h2>Why the IDP Is the Natural Context Engine</h2><p>Your IDP already contains the two things an AI needs most: metadata and semantics.</p><p>Metadata lives in <code>catalog-info.yaml</code>, the structured backbone of every Backstage service entry. Each record carries owner, tier, lifecycle, dependencies, and tags in a machine-readable, version-controlled format. That's exactly the information an AI needs to answer "what are the production constraints on <code>payment-service</code>?" without guessing. The answer isn't buried in a README or a Slack thread. It's in a schema-enforced file your platform team already maintains as part of normal operations.</p><p>Semantics live in TechDocs: your architectural decision records, how-to guides, runbooks, and onboarding documentation. When a developer asks Cursor, "What's the right approach for adding distributed tracing to a new Go service?", the correct answer exists in your TechDocs, not on Stack Overflow. TechDocs is already your single source of truth for <em>how things work here</em>. It needs to be your AI's source of truth too.</p><p>Spotify proved this architecture works at scale. Their internal "AiKA" (AI Knowledge Assistant) and "Honk" background coding agents both integrate deeply with Backstage's catalog and metadata graph. Spotify didn't need to build a separate AI knowledge layer; the IDP was already taking care of that. The IDP's role as a context engine follows directly from its architecture: it's the only system in your organization that maintains structured, governed, and continuously updated metadata about every service you run.</p><p>Because Roadie is managed Backstage, this architectural pattern is available to you without the overhead of building and maintaining the context infrastructure yourself. 
The Spotify engineering team spent significant effort wiring Backstage into AiKA and Honk. Roadie ships that foundation.</p><h2>Connecting AI Tools via the Model Context Protocol</h2><p>Knowing that the IDP holds the right data is one thing; getting AI tools to query it reliably is another. That's where the Model Context Protocol (MCP) comes in.</p><p>MCP is USB-C for AI. Before USB-C, every device had its own connector. You needed a drawer full of adapters for every combination. Before MCP, every AI tool had its own proprietary method for connecting to external data sources, if it connected at all. MCP is the open standard that lets any compliant AI client (Cursor, Claude Desktop, a custom agentic pipeline) connect to any compliant MCP Server and query its data through a consistent interface.</p><p>Roadie acts as an MCP Server, exposing your Backstage catalog and TechDocs as queryable endpoints. Here's what that looks like in practice:</p><ol><li>A developer opens Cursor and asks: "Generate a Terraform config for <code>payment-service</code> that meets its production SLA requirements."</li><li>Cursor's MCP client queries Roadie's MCP server for the <code>payment-service</code> catalog entry.</li><li>Roadie returns structured metadata: tier-1 service, multi-region deployment, PCI-compliant, owner: <code>group:payments-team</code>, depends on <code>fraud-detection-service</code> and <code>event-bus</code>.</li><li>Cursor incorporates this context into the generation, auto-applying the correct instance types, encryption configuration, and cross-region failover settings.</li></ol><p>Without MCP and a populated catalog, the developer gets a generic Terraform template that needs manual adjustment and a second round-trip to whoever actually knows the service requirements. With it, the AI generates something correct for <em>this</em> service, at <em>this</em> organization. 
That's the difference between an AI assistant and an AI that actually knows where it works.</p><h2>The Problem with "Just Dump It in a Vector DB"</h2><p>Here's an approach I've seen teams try when they get serious about AI grounding: export everything (Confluence pages, Jira tickets, Slack threads, internal wikis), embed it all in Pinecone or Weaviate, and call it a "context lake."</p><p>This doesn't work. Vector databases are great tools. The issue is that unstructured, uncurated data produces what I'd call "context poisoning".</p><p>Your Confluence instance has 2,000 pages. Maybe 300 are accurate. Another 400 are outdated, written before the infrastructure migration, the reorg, or the service's deprecation. The remaining 1,300 are drafts, duplicates, or meeting notes that were never meant to be authoritative. When your AI retrieves from this pool, it has no way to distinguish the canonical architectural decision record from the three-year-old wiki page that contradicts it. The model doesn't know that <code>legacy-payment-gateway-setup.md</code> was superseded by a new doc eighteen months ago. So it pulls from both, fuses them, and produces output that's partly right and partly dangerously wrong.</p><p>This is the risk with some proprietary "context lake" approaches: you can't engineer good context from bad source data, no matter how sophisticated your retrieval layer is. And Cortex's scorecard-based "AI readiness checks" are a step forward, enforcing that services have owners and documentation, but scorecards tell you <em>whether</em> context exists, not whether it's accurate or semantically coherent.</p><p>The <code>catalog-info.yaml</code> schema enforces structure at the source. Every service entry has a defined owner (not "the payments team" but <code>group:payments</code>), a machine-readable lifecycle status (<code>production</code>, <code>deprecated</code>, <code>experimental</code>), and explicit dependency links. 
A deprecated service can't masquerade as current because its lifecycle field says otherwise. An unmaintained entry gets surfaced by catalog health checks before it becomes a source of bad context. The data is searchable and <em>governed</em>.</p><p>That said, this does require your catalog to be accurate. A Backstage instance with stale <code>catalog-info.yaml</code> files is its own form of context poisoning. The discipline of context engineering starts with the discipline of catalog maintenance.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/2XGFeMvHKUhWPXUNaOmk0U/14849540d2b6d3a20b66dabe74a0835d/mermaid-diagram-2026-03-12T12-37-06.png" alt="Structured vs. Unstructured Context in AI Systems"></p><h2>Final Thoughts: Building the Map for Your Digital Workforce</h2><p>The agentic AI era is arriving faster than most platform teams are ready for. Chatbot-style AI assistants are forgiving. A slightly wrong answer gets ignored and corrected. Agents that take action aren't forgiving. An agent that auto-generates a pull request, provisions cloud infrastructure, or routes a production incident based on service ownership data needs that data to be correct. The cost of a context error scales directly with the autonomy of the system making decisions from it.</p><p>The teams preparing well are focusing on reliable organizational context, not model sophistication or GPU budgets. The catalog entries you keep accurate, the TechDocs you maintain, and the ownership mappings you get right form your AI workforce's institutional memory. That's what your AI tools actually run on.</p><p>Context engineering is platform engineering's next frontier. The good news is that if you've been running Backstage, you're further along than you think. You have the schema, the governance model, and the data. You just need to connect it.</p><p>Your AI needs a brain. Start building your organization's memory today with Roadie's managed Backstage platform. <a href="https://roadie.io/request-demo/">Book a demo</a> to see how your Service Catalog can become your context engine.</p>
]]></content:encoded></item><item><title><![CDATA[The Context Engineering Glossary for Platform Engineers]]></title><link>https://roadie.io/blog/context-engineering-glossary/</link><guid isPermaLink="false">https://roadie.io/blog/context-engineering-glossary/</guid><pubDate>Thu, 05 Mar 2026 10:00:00 GMT</pubDate><description><![CDATA[A practical glossary of context engineering terms for platform engineers, explaining RAG, context drift, system prompts, and more using real platform examples.]]></description><content:encoded><![CDATA[<p>Your team just wired an LLM into your Internal Developer Portal. The architecture review kicks off and someone asks whether you're doing RAG or agentic retrieval. Someone else flags context drift as a risk. A third person raises privilege leakage in the system prompt. You nod along, but the vocabulary is moving faster than the documentation.</p><p>This glossary defines every key term in the context engineering stack through the specific lens of platform engineering — Service Catalogs, TechDocs, golden paths, and on-call data — not abstract data science. Bookmark it, share it with your team, and use it as a reference before your next architecture decision.</p><p>This glossary focuses on context supply, not model training, fine-tuning, or prompt copywriting. Context engineering does not make models smarter — it determines what the model is allowed to know, when, and why. If an answer is wrong, the first place to look is rarely the model; it’s the data pipeline feeding it.</p><hr><h2>Section 1: Context Fundamentals</h2><h3>What Is Context Engineering?</h3><p>Context engineering is the practice of curating, structuring, and retrieving the right infrastructure data so that an LLM can answer domain-specific questions accurately. The word choice matters: it's <em>engineering</em>, not prompting. 
Where prompt engineering focuses on the wording of individual queries, context engineering focuses on the entire information supply chain that feeds the model before it generates a word.</p><p>For platform teams, context engineering means deciding which fields from your <code>catalog-info.yaml</code> get indexed, how your TechDocs chunks get sized and tagged, and what real-time operational signals get injected at query time. A developer asking "Is the payments service production-ready?" gets a useful answer only if the lifecycle field from the Service Catalog was curated, indexed, and retrieved correctly. The LLM itself contributes maybe 20% of that answer's quality; context engineering accounts for the rest.</p><p>An LLM is a powerful reasoning engine with no institutional memory. Context engineering is how you give it one. In other words, context engineering is not about improving reasoning quality — it’s about constraining the information surface the model can reason over.</p><h3>What Is a Context Window?</h3><p>The context window is the total number of tokens an LLM can process in a single request, covering the system prompt, retrieved documents, conversation history, and the generated response combined. <a href="https://platform.openai.com/docs/models/gpt-4o">GPT-4o supports up to 128,000 tokens</a>, and <a href="https://deepmind.google/technologies/gemini/pro/">Gemini 3.5 Pro pushes to 1 million tokens</a>. These numbers sound large until you picture a Service Catalog with 800 registered components, each with full metadata and linked TechDocs pages.</p><p>Stuffing everything into the context window is not a strategy. Irrelevant data degrades output quality, increases cost per query (models like <a href="https://www.anthropic.com/pricing">Claude charge per input token</a>), and slows response time. 
The engineering discipline is in <em>selecting</em> the right 2,000 tokens out of 2,000,000 available, pulling only the service metadata relevant to the specific query, not the entire catalog.</p><p>Efficient context selection is where retrieval architecture pays for itself.</p><h3>What Is Grounding in LLMs?</h3><p>Grounding anchors an LLM's response in verified, authoritative data sources rather than the model's pre-trained weights. Without grounding, a model answering "Who is the on-call engineer for the checkout service?" will either hallucinate a plausible name or admit it doesn't know. With grounding, the response comes from the live <a href="https://www.pagerduty.com/">PagerDuty</a> schedule injected at query time.</p><p>In a platform engineering context, your Service Catalog is the primary grounding layer. When every answer the AI gives traces back to a specific entity in the catalog, with a citable source, you've achieved grounded output. Ungrounded AI assistants erode trust fast: one invented service name or wrong runbook link and developers stop using the tool. RAG is an architectural mechanism; grounding is the result. You can implement RAG without achieving grounding if the retrieved data isn’t authoritative or current.</p><hr><h2>Section 2: Architecture and Retrieval Terms</h2><h3>RAG (Retrieval-Augmented Generation)</h3><p><a href="https://arxiv.org/abs/2005.11401">Retrieval-Augmented Generation (RAG)</a> is the architectural pattern where a system retrieves relevant documents from an external knowledge base before passing them to an LLM for response generation. The model doesn't rely on what it learned during training; it reads what you give it at runtime.</p><p>The flow for an IDP-backed assistant looks like this:
<img src="//images.ctfassets.net/hcqpbvoqhwhm/4BJtdLuaiZZKhUfUqZW3rQ/0643d73e9093e315afcd977405a45ff2/mermaid-diagram-2026-03-05T11-32-41.png" alt="IDP-backed assistant"></p><p>A developer asks: "How do I rotate credentials for the auth service?" The system encodes that query, searches TechDocs for credential rotation guides tagged to the auth service, pulls the service owner from the catalog, and injects both into the prompt. The LLM generates a specific, sourced answer, not a generic "here's how credential rotation works" response scraped from its training data.</p><p>RAG is the foundational pattern for any AI assistant built on top of an IDP. Every other term in this glossary relates to how well your RAG implementation performs.</p><h3>What Are Vector Embeddings?</h3><p>A vector embedding is a numerical representation of text, typically a list of 768 to 3,072 floating-point numbers, that captures semantic meaning rather than just the words themselves. Two sentences that mean the same thing will have similar embeddings even if they share no words. "Service is deprecated" and "component has reached end-of-life" end up close together in embedding space; "deploy to production" and "YAML syntax error" end up far apart.</p><p>To build RAG for your IDP, every TechDocs page, every catalog entity description, and every relevant metadata field needs to be converted into an embedding and stored. When a developer submits a query, the query also gets embedded, and the system retrieves the stored documents whose embeddings are most similar. That's semantic search.</p><p>Generating and managing these embeddings is non-trivial. You need to pick an embedding model (<a href="https://platform.openai.com/docs/guides/embeddings">OpenAI's <code>text-embedding-3-large</code></a> or a self-hosted <a href="https://www.sbert.net/">Sentence Transformers</a> variant), decide chunk sizes, handle incremental updates when docs change, and keep embeddings in sync with the underlying catalog. 
Roadie handles this entire pipeline automatically for TechDocs on your managed Backstage instance. You don't maintain a separate embedding job or manage model versions.</p><h3>What Is a Vector Database?</h3><p>A vector database is a storage engine purpose-built for indexing and querying high-dimensional embedding vectors. It provides <a href="https://en.wikipedia.org/wiki/Nearest_neighbor_search">Approximate Nearest Neighbor (ANN) search</a> at scale, which means it can find the 10 most semantically similar chunks from a corpus of 500,000 embeddings in under 100 milliseconds. Standard relational databases like PostgreSQL can store vectors (via <a href="https://github.com/pgvector/pgvector"><code>pgvector</code></a>), but dedicated systems like <a href="https://www.pinecone.io/">Pinecone</a>, <a href="https://weaviate.io/">Weaviate</a>, and <a href="https://qdrant.tech/">Qdrant</a> are optimized for this workload.</p><p>For platform teams evaluating AI tooling, the vector database is an infrastructure dependency that often gets underestimated. It requires provisioning, access control, index tuning, and synchronization with your source catalog. When Roadie embeds your TechDocs, the vector storage layer is managed within the platform. You're not standing up a Qdrant cluster alongside your Backstage deployment.</p><h3>What Is Semantic Search?</h3><p>Semantic search finds content based on meaning and intent, not keyword overlap. In an IDP context, it's the difference between a developer searching for "payment processor" and finding the <code>checkout-service</code>, <code>billing-api</code>, and <code>stripe-gateway</code> components — even though none of them are literally named "payment processor" — versus a keyword search that returns zero results because the exact string doesn't match any component name.</p><p>This matters especially for large catalogs and for developers who are new to the codebase. They don't know the internal naming conventions. 
They describe what they're looking for in plain English. Semantic search over vector embeddings bridges the gap between how developers think and how services are named.</p><p>On its own, semantic search is insufficient for an AI assistant: it retrieves candidates, but the Service Catalog determines which of those candidates are valid, owned, and safe to surface.</p><hr><h2>Section 3: Platform Data Types (The Context Sources)</h2><h3>Service Catalog Context</h3><p>Service Catalog context is the structured metadata that lives in your <code>catalog-info.yaml</code> files and gets surfaced through the <a href="https://backstage.io/docs/features/software-catalog/">Backstage catalog API</a>. Fields like <code>owner</code>, <code>lifecycle</code>, <code>tier</code>, <code>tags</code>, <code>system</code>, and <code>dependsOn</code> are machine-readable facts that give an LLM the authority to answer structural questions.</p><p>"Who owns the recommendations engine?" gets answered from the <code>owner</code> field. "Is this service production-ready?" gets answered from the <code>lifecycle: production</code> value. "What services would be affected if the user-profile API went down?" gets answered from dependency relationships in the catalog graph. This data is already structured, already maintained (or should be), and it's the highest-signal context source you have. Poor catalog hygiene directly degrades AI output quality.</p><h3>TechDocs Context</h3><p>TechDocs context is unstructured Markdown documentation that lives alongside your service code and gets rendered in <a href="https://backstage.io/docs/features/techdocs/">Backstage TechDocs</a>. 
It answers the "how" questions that structured catalog metadata can't: how to run the service locally, how to interpret a specific error code, how to onboard to the payments team's workflow.</p><p>When ingested into a RAG system, TechDocs pages get chunked (typically into 512-token segments with overlap), embedded, and indexed against their source entity. A developer asking "What does a 503 from the auth service usually mean?" should retrieve the relevant troubleshooting section from the auth service's TechDocs, not a generic HTTP guide. The specificity of the retrieval depends entirely on how well TechDocs are written and tagged. Vague documentation produces vague answers.</p><h3>Operational Context</h3><p>Operational context is real-time data injected at query time rather than pre-indexed into a vector database. It includes current PagerDuty on-call schedules, <a href="https://kubernetes.io/">Kubernetes</a> pod health and restart counts, recent <a href="https://argo-cd.readthedocs.io/en/stable/">Argo CD</a> deployment status, open Jira incidents, and GitHub Actions build logs.</p><p>This data changes too fast for batch indexing to keep up. Instead, you pull it live via API calls triggered by the query itself. A developer asking "Why is checkout slow right now?" needs the current K8s resource utilization for the checkout pods, not the documentation about checkout's architecture. Mixing pre-indexed catalog and TechDocs context with real-time operational context is what separates a genuinely useful AI assistant from a documentation search engine.</p><p>Operational context informs decisions; it does not imply automated remediation unless explicitly authorized. 
Observing live state and acting on it are separate trust boundaries.</p><h3>Golden Path Context</h3><p>Golden path context comes from your <a href="https://backstage.io/docs/features/software-templates/">Backstage Scaffolder</a> templates, the opinionated, pre-approved patterns your platform team maintains for creating new services, adding CI/CD pipelines, or spinning up databases. This context feeds the AI's code generation and workflow guidance capabilities.</p><p>When a developer asks "How do I create a new Python microservice that follows our standards?" the answer shouldn't come from a generic tutorial. It should come from your actual Scaffolder template, including your team's specific conventions around naming, logging configuration, health check endpoints, and observability setup. Golden path context ensures that AI-assisted code generation produces output that passes your internal review standards on the first attempt.</p><hr><h2>Section 4: Agentic Capabilities</h2><h3>What Is Agentic Context Injection?</h3><p>Agentic context injection is the dynamic process by which an AI system decides <em>which</em> data sources to query based on the intent of the user's question, rather than fetching a fixed set of context for every request. It's the difference between a system that always retrieves the top-10 catalog entries regardless of the question, and a system that recognizes "my build is failing" as a signal to pull CI/CD logs, not architecture documentation.</p><p>A well-designed agentic system routes queries through an intent classifier first. Questions about ownership route to the catalog API. Questions about procedures route to TechDocs embeddings. Questions about current system state trigger live operational data calls. This routing logic is itself a form of engineering. 
It determines response latency, token cost, and answer relevance simultaneously.</p><p>Without strict boundaries, agentic retrieval increases blast radius: every additional tool or data source expands what the system can surface or misuse. Intent routing must be auditable, deterministic, and permission-aware to be safe in production.</p><h3>Tool Use and Function Calling</h3><p><a href="https://platform.openai.com/docs/guides/function-calling">Function calling</a> is the capability that allows an LLM to request the execution of a predefined function, a structured API call, rather than generating a text answer directly. The model outputs a JSON object specifying which function to call and with which parameters; your application executes the call and feeds the result back to the model.</p><p>For IDP AI assistants, function calling turns the LLM into an active participant in your platform's API surface. Instead of the model trying to recall what it knows about a service's on-call engineer, it calls <code>get_oncall_for_service(service_id="checkout")</code>, gets a live response from PagerDuty, and incorporates that response into its answer. Functions you'd expose typically include catalog entity lookup, TechDocs page retrieval, incident history queries, and deployment status checks. The LLM becomes a reasoning layer over your actual infrastructure data.</p><h3>What Is a System Prompt?</h3><p>The system prompt is the foundational instruction block prepended to every conversation with the AI assistant. It defines the model's persona (a senior platform engineer, not a general assistant), its constraints ("only answer questions about services in this catalog"), its output format preferences, and its access permissions.</p><p>For a platform assistant, the system prompt is effectively a policy document. It specifies that the model should cite its sources, decline to speculate about services not in the catalog, and escalate ambiguous ownership questions to a human. 
A weak system prompt produces an assistant that will confidently make things up. A well-engineered system prompt is a first line of defense against the risks described in the next section. In practice, the system prompt is inseparable from access control. It should reflect the same RBAC assumptions as the IDP itself — otherwise the model’s behavior will drift from the platform’s security model.</p><hr><h2>Section 5: Quality and Risk Definitions</h2><h3>What Is LLM Hallucination?</h3><p>Hallucination is when an LLM generates information that is factually incorrect but presented with full confidence. In a platform engineering context, hallucinations take a specific and damaging form: the model invents service names, fabricates runbook steps, cites non-existent on-call rotations, or describes API contracts that don't match the actual implementation.</p><p>The primary defense against hallucination is grounding (see above), combined with explicit system prompt instructions to cite sources. If the model's answer can't be traced to a specific catalog entity or TechDocs page, it shouldn't be trusted. Measuring hallucination rate by sampling model responses against the catalog is a useful quality metric for AI-enabled IDP rollouts.</p><h3>What Is Context Drift?</h3><p>Context drift is the discrepancy between the data the AI has indexed and the actual current state of your infrastructure. A TechDocs page describing the old three-tier deployment model that your team migrated away from six months ago is a context drift problem. A catalog entry with a stale owner field pointing to a team that was reorganized is another.</p><p>Context drift is not a one-time fix. It's an ongoing operational concern. The mitigation is a combination of automated re-indexing (triggering embedding updates when <code>catalog-info.yaml</code> files change) and documentation standards that treat TechDocs as a first-class engineering artifact. An AI assistant is only as current as the data it reads. 
If your catalog hygiene is poor, context drift will silently produce incorrect answers with no obvious signal that something is wrong.</p><h3>What Is Context Poisoning?</h3><p>Context poisoning occurs when low-quality, contradictory, or maliciously crafted documentation gets retrieved and influences the model's output. Two TechDocs pages for the same service that give conflicting deployment instructions will cause the model to blend them into a response that's confidently wrong. A poorly maintained runbook that describes a procedure deprecated two years ago is a context poisoning vector.</p><p>The solution is content governance: ownership requirements for every TechDocs page, last-reviewed timestamps surfaced in the catalog, and automated quality checks that flag documentation not updated in over 90 days. The AI doesn't discriminate between trusted and untrusted docs. The retrieval system surfaces whatever scores highest semantically. You own the quality of what gets indexed.</p><h3>What Is Context Overreach?</h3><p>Context overreach happens when you inject too much data into the prompt, including irrelevant retrieved chunks that dilute the signal and confuse the model. A developer asking about the auth service's rate limits doesn't need context from the billing service's TechDocs, even if billing is a downstream dependency. Retrieving ten chunks when three would suffice increases token cost, slows the response, and statistically introduces off-topic content that nudges the model toward a less precise answer.</p><p>The fix is tighter retrieval: stricter similarity thresholds, metadata filtering (retrieve only docs tagged to the queried service), and re-ranker models that score retrieved chunks for relevance before they enter the prompt. 
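</p><p>As a rough illustration, those three controls might be expressed in a retrieval configuration like the one below. The schema, key names, and values are assumptions made for this sketch rather than settings from a specific product:</p><pre><code class="language-yaml"># Hypothetical retrieval settings; every key and value is illustrative
retrieval:
  top_k: 3                  # fetch three chunks rather than ten
  min_similarity: 0.8       # stricter similarity threshold
  metadata_filter:
    service: auth-service   # only docs tagged to the queried service
  rerank:
    enabled: true           # re-score chunks before they enter the prompt
</code></pre><p>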
Context budgeting, deciding in advance how many tokens each source type is allowed to consume, is a practical starting point.</p><h3>What Is Privilege Leakage in AI Systems?</h3><p>Privilege leakage occurs when the AI assistant returns information about services, infrastructure, or documentation that the querying user shouldn't have access to, because the retrieval layer doesn't enforce the same <a href="https://backstage.io/docs/permissions/overview">Role-Based Access Controls (RBAC)</a> as the IDP itself. A junior engineer asking a general question about "our database infrastructure" shouldn't receive details about the security team's secrets management service, even if that service's TechDocs scored highly in the semantic search results.</p><p>Preventing privilege leakage requires that your retrieval pipeline filters indexed documents by the user's Backstage permissions before returning results. It's not enough to apply RBAC at the catalog UI layer; the vector search results that feed the LLM must respect the same access policies. This is one of the most commonly overlooked security requirements in IDP AI implementations.</p><h3>What Are Implicit Trust Chains?</h3><p>An implicit trust chain forms when a document retrieved as context itself references other documents — runbooks, architecture decision records, external wikis — that are outdated, incorrect, or not indexed by the retrieval system. The model reads the retrieved doc, which cites "the standard deployment procedure in the ops runbook," but the ops runbook lives in Confluence and isn't indexed. The model either ignores the reference, invents what it thinks the runbook says, or generates an incomplete answer.</p><p>Auditing your documentation for external references and either bringing those references into your indexed corpus or explicitly removing the links is a necessary part of context engineering. 
Every document in your retrieval index is implicitly vouching for everything it cites.</p><hr><p>The pattern running through every definition in this glossary is simple: context engineering is now a core platform responsibility, not an AI feature bolted on at the edge. LLMs are capable of sophisticated reasoning, but they reason over whatever you give them. Platform teams that invest in clean catalogs, maintained TechDocs, and well-governed golden paths aren't just doing good hygiene. They're building the infrastructure that makes AI actually work.</p>
]]></content:encoded></item><item><title><![CDATA[Splitting TechDocs Out of Our Monolithic Backstage Deployment]]></title><link>https://roadie.io/blog/splitting-techdocs-out-of-our-monolithic-backstage-deployment/</link><guid isPermaLink="false">https://roadie.io/blog/splitting-techdocs-out-of-our-monolithic-backstage-deployment/</guid><pubDate>Thu, 26 Feb 2026 17:00:00 GMT</pubDate><description><![CDATA[How Roadie scaled Backstage by separating TechDocs from a monolithic backend, improving reliability, performance, and operational efficiency.]]></description><content:encoded><![CDATA[<p>At <a href="https://roadie.io/">Roadie</a>, we operate Backstage at a significant scale. Each customer receives a fully isolated, single-tenant Backstage deployment running in its own Kubernetes namespace. This architecture gives customers strong security boundaries, predictable isolation, and the freedom to customize their instance without affecting others.</p><p>But, this model also introduces some operational complexity. Architectural decisions that are harmless at a small scale can cause issues when every tenant is running their own Backstage stack. One such decision was how we deployed <a href="https://backstage.io/docs/techdocs/generated-index/">TechDocs</a> as part of the same backend service as everything else.</p><p>In this article, we'll talk about why our original approach stopped scaling, how we redesigned it, and what improved when we split TechDocs out of the monolithic backend.</p><h2>Our Original Architecture</h2><p>In our original setup, each Roadie tenant ran a complete Backstage application composed of all frontend and backend plugins bundled together. 
On the backend side, everything was executed within a single Node.js process.</p><p>This meant that for each customer, a dedicated Kubernetes namespace was created to ensure isolation, a single Backstage backend pod was deployed into that namespace, and all backend plugins like Catalog, Scaffolder, TechDocs, and Auth were loaded into the same backend service.</p><p>From an architectural standpoint, this resulted in a classic monolith. Every backend plugin shared the same runtime, memory space, CPU limits, and lifecycle. This design was simple to operate and reason about early on, and it served us well for a long time. But, as customer usage patterns evolved, issues began to appear.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1ppREmPFLqvt3iEZpShTB1/dcd6bacf67e59941596a6d324dc6fd89/mermaid-diagram-2026-02-26T16-58-05.png" alt="Original Monolithic Backstage Deployment Architecture"></p><h2>The Problem: Resource Contention</h2><p>As more customers adopted TechDocs and began publishing larger documentation sites, we started receiving alerts that were difficult to explain at first glance.</p><p>These incidents typically involved brief periods during which the Backstage backend became unavailable, CPU usage exceeded configured limits, and Kubernetes restarted backend pods from resource exhaustion. What made this particularly challenging was that the failures were intermittent and tenant-specific. Many tenants were completely unaffected, while a handful experienced repeated disruptions, which made it harder to pinpoint the issue.</p><p>After analyzing metrics, logs, and pod-level behavior, a pattern emerged. In every affected case, the TechDocs backend plugin was consuming a disproportionate amount of CPU and memory.</p><h3>Why TechDocs Was the Culprit</h3><p>We noticed that issues only occurred for tenants with particularly heavy TechDocs usage. 
This included large documentation sites with many pages and assets, repositories containing multiple documentation sets, and frequent rebuilds triggered by ongoing documentation updates.</p><p>This behavior is expected when you look at what TechDocs does under the hood. The backend is responsible for fetching documentation files, rendering Markdown content, and running documentation generators like MkDocs. These tasks are inherently resource-intensive, especially during large or frequent builds.</p><p>When TechDocs runs in the same process as the rest of the Backstage backend, resource spikes are not contained. CPU saturation or memory pressure caused by documentation builds directly impacts unrelated functionality, including catalog ingestion, scaffolder workflows, and authentication. Running TechDocs inside the monolith had become a liability.</p><h2>The Decision: Split TechDocs Out</h2><p>To restore stability and regain operational control, we decided to extract TechDocs from the monolithic backend and deploy it as a separate Backstage backend application. The main reason for this was isolation. We wanted TechDocs to operate independently so that its workload characteristics would not interfere with the rest of the system. 
At the same time, we wanted to avoid breaking existing APIs or introducing fragile custom integrations.</p><p>Our TechDocs requirements were simple:</p><ul><li>It needed to be independently deployable so it could evolve on its own schedule.</li><li>It needed to be discoverable by the core backend without hardcoded configuration.</li><li>It needed to scale independently, based on documentation workload rather than overall backend traffic.</li></ul><h2>The New Architecture</h2><p>In the new design, each tenant runs two distinct Backstage backend services instead of one.</p><p>The first is the <strong>core Backstage backend.</strong> This service is responsible for handling catalog ingestion, Tech Insights, authentication, and other core APIs.</p><p>The second is a <strong>dedicated TechDocs backend.</strong> This service runs only the TechDocs-related functionality and handles documentation builds and rendering.</p><p>The two services communicate using Backstage’s built-in discovery mechanism.
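</p><p>As a rough illustration of how that wiring can look in a self-hosted Backstage instance, the sketch below uses the <code>discovery.endpoints</code> setting in <code>app-config.yaml</code> to route only the <code>techdocs</code> plugin to a separate service. The hostname and port are placeholders, and this is not necessarily how Roadie's own deployment is configured:</p><pre><code class="language-yaml">discovery:
  endpoints:
    # Hypothetical in-cluster address of the dedicated TechDocs backend
    - target: http://techdocs-backend:7007/api/{{pluginId}}
      plugins: [techdocs]
</code></pre><p>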
<img src="//images.ctfassets.net/hcqpbvoqhwhm/3JnTCsweZXsxixCFyDhTwI/a3fd56dfdcadf9afd9b2e93badfca56f/mermaid-diagram-2026-02-26T17-01-07.png" alt="Improved Backstage Deployment After Splitting Out TechDocs"></p><h2>Results</h2><h3>Cleaner Cluster Allocations</h3><p>By running TechDocs in its own pod, we gained fine-grained control over its resource profile. CPU and memory limits are now explicitly tuned for documentation workloads, and scaling rules can be applied only where documentation usage justifies it.</p><p>This prevents overprovisioning the core backend while still allowing TechDocs to scale aggressively when needed.</p><h3>Improved Stability</h3><p>Isolating TechDocs eliminated an entire class of failures. Documentation builds no longer put pressure on unrelated backend functionality. Catalog ingestion, scaffolder executions, authentication flows, and core API availability remain stable even during peak documentation activity.</p><p>For customers, this translates directly into fewer outages and a more predictable Backstage experience.</p><h3>Easier Debugging and Operations</h3><p>From an operational perspective, separating TechDocs clarified boundaries. Resource spikes are now immediately attributable to the correct service. Logs are easier to interpret, and incidents can be diagnosed and mitigated quickly.</p><p>This separation also simplifies future tuning and capacity planning. With TechDocs isolated, we can reason about its resource usage independently from the rest of the backend and make decisions based on real workload characteristics rather than worst-case assumptions. CPU and memory requests can be adjusted specifically for documentation builds, and autoscaling policies can be tuned around build frequency, repository size, and peak documentation activity. This also makes forecasting easier: growth in documentation usage no longer forces us to overprovision the core backend. 
Instead, we can scale and optimize each service independently, reducing wasted capacity while maintaining predictable performance.</p><h2>What We Gained Overall</h2><p>Splitting TechDocs out of the monolith forced us to formalize a clear boundary between core platform responsibilities and workload-specific plugins, which in turn improved how we think about backend composition overall. What started as a targeted stability fix turned out to have broader implications for how we structure the backend.</p><p>As a result, we now have a repeatable pattern for extracting heavy backend plugins into independent services when their resource profiles or failure modes warrant it. Backend deployments are slimmer, responsibilities are better defined, and each service can be sized and scaled according to the work it actually performs rather than the worst-case behavior of a single plugin. This makes the system easier to reason about both during normal operation and when something goes wrong.</p><h2>Takeaways</h2><p>TechDocs was the most obvious candidate for separation due to its workload profile, but this architecture opens the door to further modularization where it makes sense. Backstage provides the primitives needed to support this kind of design. At scale, using them becomes less of an optimization and more of a requirement.</p><p>If you're running Backstage in a multi-tenant or high-scale environment and are seeing similar symptoms, it's worth examining your heaviest backend plugins and questioning whether they belong in the same process as the rest of Backstage.</p>
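<p>As a closing sketch of the independent scaling described in the Results section, a Kubernetes <code>HorizontalPodAutoscaler</code> can target the TechDocs deployment on its own. The resource names, replica counts, and CPU threshold below are assumptions for illustration, not Roadie's production values:</p><pre><code class="language-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: techdocs-backend          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: techdocs-backend        # hypothetical Deployment running only TechDocs
  minReplicas: 1
  maxReplicas: 4                  # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU exceeds 70%
</code></pre>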
]]></content:encoded></item><item><title><![CDATA[Supercharge your GitLab setup with Roadie's Internal Developer Portal]]></title><link>https://roadie.io/blog/supercharge-your-gitlab-setup/</link><guid isPermaLink="false">https://roadie.io/blog/supercharge-your-gitlab-setup/</guid><pubDate>Thu, 19 Feb 2026 10:00:00 GMT</pubDate><description><![CDATA[Supercharge GitLab with Roadie’s managed Backstage platform. Eliminate repo sprawl, centralize ownership and governance, visualize CI/CD, and launch a production-ready Internal Developer Portal in hours, not months.]]></description><content:encoded><![CDATA[<p><strong><a href="/">Roadie's SaaS Backstage platform</a> offers the deepest GitLab integration of any Internal Developer Portal</strong>, solving "GitLab Sprawl," the challenge of navigating hundreds of repositories without a unified <a href="/product/catalog/">software catalog</a>, ownership model, or governance layer. GitLab itself acknowledges the gap: its own IDP category is officially "planned but not funded," with implementation pushed beyond 2025. For GitLab-centric organizations, Roadie provides a production-ready portal in hours rather than the 6 to 12 months required to self-host Backstage.</p><p>This guide covers every integration surface between Roadie and GitLab, from auto-discovery of catalog entities across hundreds of repos to CI/CD visualization, scaffolding new services, enforcing governance via scorecards, and securely connecting self-managed GitLab instances through the Roadie Broker.</p><h2>Set Up Roadie with GitLab: A Step-by-Step Guide</h2><p>Connecting GitLab to Roadie takes about 30 minutes. 
You'll create a token, configure auto-discovery to populate your <a href="/product/catalog/">Software Catalog</a>, enable CI/CD visibility, and build your first <a href="/product/scaffolder/">scaffolder template</a>.</p><h2>Prerequisites</h2><p>Before you start, make sure you have:</p><ul><li>Admin access to your Roadie instance (URL format: <code>https://&#x3C;your-tenant>.roadie.so</code>)</li><li>A GitLab account with permissions to create Personal Access Tokens</li><li>Access to at least one GitLab group or project containing repositories.</li></ul><h2>Step 1: Create and Configure Your GitLab Token</h2><p>GitLab uses Personal Access Tokens to authenticate API requests. Roadie needs this token to read repositories, discover catalog entities, and interact with CI/CD pipelines.</p><p>Backstage needs a <a href="https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html">GitLab API token</a> for discovery and plugin data. Three token types exist, each with different trade-offs:</p><p><strong>Group Access Tokens</strong> are the recommended choice for Backstage integrations. They create a bot user scoped to a specific group and all its projects, don't consume a GitLab license seat, and aren't tied to a human user (avoiding breakage when someone leaves the organization). They require GitLab Premium or Ultimate on SaaS. <strong>Personal Access Tokens (PATs)</strong> provide broader instance-wide access but inherit the creating user's permissions and break if that user is deactivated. <strong>Project Access Tokens</strong> are too narrow for discovery since you'd need one per repository.</p><p>The required scopes depend on the use case. For <strong>read-only catalog discovery and CI/CD visualization</strong>, <code>read_api</code> is sufficient. 
For <strong><a href="https://roadie.io/product/scaffolder/">scaffolder</a> operations</strong> that create repositories and merge requests, the full set of <code>api</code>, <code>read_repository</code>, and <code>write_repository</code> is needed. Roadie stores tokens securely in its secrets management UI at <code>https://&#x3C;tenant>.roadie.so/administration/secrets</code>, backed by AWS Parameter Store with per-tenant KMS encryption, rather than requiring them in plaintext config files.</p><p>Once you have your token, you can enter it into <a href="/docs/details/setting-secrets/">Roadie</a>.</p><ol><li><p>Log into your Roadie instance at <code>https://&#x3C;your-tenant>.roadie.so</code></p></li><li><p>Navigate to <strong>Administration > Secrets</strong> (direct URL: <code>https://&#x3C;your-tenant>.roadie.so/administration/secrets</code>)</p></li><li><p>Find the secret named <code>GITLAB_TOKEN</code></p></li><li><p>Click <strong>Edit</strong> and paste your GitLab token</p></li><li><p>Click <strong>Save</strong></p></li></ol><p>The secret update takes 2-3 minutes to propagate. You'll see the status indicator change from "Updating" to "Available" when it's ready.</p><h2>Step 2: Configure Auto-Discovery to Eliminate Manual Catalog Registration at Scale</h2><p>The core of any Internal Developer Portal is the <a href="https://roadie.io/product/catalog/">software catalog</a>. Without auto-discovery, teams must manually register every service, an approach that collapses at hundreds of repositories. <a href="https://backstage.io">Backstage's</a> <code>GitlabDiscoveryEntityProvider</code>, packaged in <code>@backstage/plugin-catalog-backend-module-gitlab</code>, crawls a GitLab instance, finds repositories containing a <code>catalog-info.yaml</code> file, and registers them automatically.</p><p>In self-hosted Backstage, this requires editing <code>app-config.yaml</code> directly:</p><pre><code class="language-yaml">catalog:
  providers:
    gitlab:
      production:
        host: gitlab.com
        branch: main
        fallbackBranch: master
        skipForkedRepos: true
        includeArchivedRepos: false
        group: my-org                     # Scope to a specific group
        groupPattern:
          - '^platform-.*$'               # Regex: only groups starting with "platform-"
          - 'services'
        projectPattern: '[\s\S]*'         # Regex: match all projects (default)
        entityFilename: catalog-info.yaml
        excludeRepos:
          - my-org/deprecated-service
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
</code></pre><p><strong>Roadie replaces this YAML editing with a UI-based configuration</strong> at <code>https://&#x3C;tenant>.roadie.so/administration/settings/integrations/gitlab</code>. Admins add their GitLab instance URL, configure provider rules pointing to specific groups, and entities appear in the catalog within minutes.</p><p>The discovery provider supports powerful filtering through regex patterns. The <code>groupPattern</code> field accepts a list of regular expressions OR'd together to select which groups to scan. The <code>projectPattern</code> field applies a regex against each project's <code>path_with_namespace</code>. A legacy discovery processor also exists using wildcard URLs (<code>https://gitlab.com/group/subgroup/blob/*/catalog-info.yaml</code>, where <code>*</code> resolves to each repo's default branch), but the entity provider approach is the current recommended pattern.</p><p>For near-real-time updates, the provider supports <strong>webhook-driven ingestion</strong>: configure GitLab <code>push</code> webhooks to trigger incremental catalog refreshes instead of waiting for scheduled polls. Events can be received via HTTP endpoints, AWS SQS, Google Pub/Sub, or Kafka through <code>@backstage/plugin-events-backend-module-gitlab</code>.</p><p>To set up auto-discovery:</p><ol><li><p>Navigate to <strong>Administration > Integrations &#x26; Plugins > GitLab</strong> (direct URL: <code>https://&#x3C;your-tenant>.roadie.so/administration/settings/integrations/gitlab</code>)</p></li><li><p>In the <strong>Host</strong> field, enter your GitLab instance URL:</p><ul><li>For GitLab.com: <code>gitlab.com</code></li><li>For self-hosted: Your full domain (e.g., <code>gitlab.company.com</code>)</li></ul></li><li><p>Leave <strong>API Base URL</strong> at its default value unless you're using a custom API endpoint</p></li></ol><p>Now, you’ll need to add a discovery provider in Roadie. 
The discovery provider tells Roadie which GitLab groups or projects to scan for catalog files.</p><ol><li><p>In the same GitLab integration page, scroll to <strong>Configure GitLab Discovery</strong></p></li><li><p>Click <strong>Add Provider Configuration</strong></p></li><li><p>Configure the provider. At the very least, you’ll need to enter your group name. You can also add filters to refine discovery. For example, you can add a <a href="/docs/catalog/location-management/#gitlab-autodiscovery">project pattern</a>.</p></li><li><p>Click <strong>Save</strong></p></li></ol><p>The discovery process runs every hour by default. To trigger an immediate scan, you can refresh the catalog or wait for the next scheduled run.</p><p>For Roadie to link catalog entities to GitLab data, you need to add GitLab-specific annotations to your <code>catalog-info.yaml</code> files.</p><p>Add one of these annotations to your entity metadata:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
  annotations:
    # Option 1: Use project ID (found in Settings > General in GitLab)
    gitlab.com/project-id: '12345'

    # Option 2: Use project slug (group/project format)
    # gitlab.com/project-slug: 'acme-corp/my-service'

    # Option 3: For self-hosted GitLab instances
    # gitlab.com/instance: 'gitlab.company.com'
</code></pre><p>The project ID is the most reliable option. You can find it in your GitLab project under <strong>Settings > General</strong> at the top of the page.</p><p>Now, auto-discovery should be enabled, and your catalog will be populated with projects from GitLab.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/45Moo60vy3OyikGC5wXrro/87a3dfb7206bb2629594e4b260664b3c/image4.png" alt="Catalog with projects from GitLab"></p><h2>Step 3: Enable CI/CD Visibility and Contextual Linking to GitLab</h2><p>Once entities are cataloged, the <code>@immobiliarelabs/backstage-plugin-gitlab</code> plugin provides rich GitLab data directly on each component's page in Backstage. This plugin reads the <code>gitlab.com/project-slug</code> or <code>gitlab.com/project-id</code> annotation and calls the GitLab API via a backend proxy to surface:</p><ul><li><strong>Pipeline status table</strong> showing the last 20 builds with status (success/failed/running/pending), direct links to the GitLab pipeline URL, and execution time</li><li><strong>Merge requests table</strong> with open and recently merged MRs, linking directly to GitLab</li><li><strong>MR statistics</strong> for review velocity insights</li><li><strong>Contributors/people card</strong>, language breakdown, releases, code coverage, and rendered README</li></ul><p>The plugin supports both cloud-hosted <code>gitlab.com</code> and self-managed GitLab instances (configured via the <code>gitlab.com/instance</code> annotation). 
Multiple GitLab instances can be configured simultaneously in Roadie, useful for organizations running both cloud and on-premise deployments or undergoing migrations between the two.</p><p>The GitLab plugin should already be installed in your Roadie instance, but you need to add it to your component layouts.</p><ol><li><p>Navigate to any component in your catalog</p></li><li><p>Click the <strong>gear icon</strong> (⚙️) in the top right corner</p></li><li><p>Click the <strong>plus icon</strong> (+) to add a new card</p></li><li><p>Select <strong>GitLab</strong> from the plugin list:</p><ul><li><strong>EntityGitlabPeopleCard</strong>: Shows contributors and languages</li><li><strong>EntityGitlabPipelinesTable</strong>: Shows recent pipeline runs</li><li><strong>EntityGitlabMergeRequestStatsCard</strong>: Shows MR stats</li><li><strong>EntityGitlabMergeRequestsTable</strong>: Shows MRs</li><li><strong>EntityGitlabReadmeCard</strong>: Shows Readme</li><li><strong>EntityGitlabLanguageCard</strong>: Shows repository languages</li><li><strong>EntityGitlabReleasesCard</strong>: Shows recent releases</li></ul></li><li><p>Drag the cards to arrange them in your preferred order</p></li><li><p>Click <strong>Save</strong> to apply the layout.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/6cY278NTHX3s07zojEBqM8/d83a362d5bcdc3fecfd5a620b7bb44ae/image2.png" alt="CI/CD visibility and contextual linking to GitLab"></p></li></ol><p>If you want consolidated GitLab data:</p><ol><li><p>Click the <strong>plus icon</strong> (+) next to the existing tabs</p></li><li><p>Select <strong>EntityGitlabContent</strong></p></li><li><p>Name the tab (e.g., "GitLab")</p></li><li><p>Click <strong>Save</strong></p></li></ol><p>This creates a comprehensive GitLab view with all available cards on one page.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/3BxItasUlfg4LsnHEqB2Rl/4070c28da9f7b9792744a19cdb172bfc/image3.png" alt="Consolidated GitLab Data"></p><h2>Step 4: Create Your First Scaffolder Template</h2><p><a href="https://roadie.io/product/scaffolder/">Software Templates</a> in Backstage, powered by the Scaffolder plugin, let developers create new projects through forms that execute predefined actions.</p><p>Create a new file in one of your GitLab repositories at <code>templates/basic-service.yaml</code>:</p><pre><code class="language-yaml">apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: gitlab-basic-service
  title: Create a New Service (GitLab)
  description: Creates a new service repository in GitLab with a standard structure
spec:
  owner: user:default/&#x3C;your-user>
  type: service

  parameters:
    - title: Service Information
      required:
        - name
        - owner
        - description
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name for your service (lowercase letters, digits, and hyphens only)
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
          description: What does this service do?
        owner:
          title: Owner
          type: string
          description: User or group that owns this service
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: [Group, User]

    - title: GitLab Configuration
      required:
        - repoUrl
      properties:
        repoUrl:
          title: Repository Location
          type: string
          ui:field: RepoUrlPicker
          ui:options:
            allowedHosts:
              - gitlab.com
            allowedOwners:
              - &#x3C;your-group>

  steps:
    - id: fetch-base
      name: Fetch Base Template
      action: fetch:template
      input:
        url: https://gitlab.com/&#x3C;your-group>/&#x3C;your-repo>/-/tree/main/skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          repoUrl: ${{ parameters.repoUrl }}

    - id: publish
      name: Publish to GitLab
      action: publish:gitlab
      input:
        repoUrl: ${{ parameters.repoUrl }}
        defaultBranch: main

    - id: register
      name: Register Component
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in Catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
</code></pre><h3>Create the Template Skeleton</h3><p>In the same repository, create a <code>skeleton</code> directory with your template files:</p><p><strong>skeleton/catalog-info.yaml:</strong></p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    gitlab.com/project-slug: ${{ values.repoUrl | parseRepoUrl | pick('owner') }}/${{ values.repoUrl | parseRepoUrl | pick('repo') }}
spec:
  type: service
  lifecycle: experimental
  owner: ${{ values.owner }}
</code></pre><p><strong>skeleton/README.md:</strong></p><pre><code class="language-markdown"># ${{ values.name }}

${{ values.description }}

## Getting Started

[Add your setup instructions here]

## Owner

Maintained by: ${{ values.owner }}
</code></pre><p><strong>skeleton/.gitignore:</strong></p><pre><code>node_modules/
.env
dist/
</code></pre><h3>Register the Template</h3><ol><li>Create a <code>catalog-info.yaml</code> in your templates repository:</li></ol><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: templates
  description: Scaffolder templates for our organization
spec:
  type: url
  targets:
    - https://gitlab.com/&#x3C;your-group>/&#x3C;your-repo>/-/blob/main/templates/basic-service.yaml
</code></pre><ol start="2"><li>Register this location in Roadie:
<ul><li>Navigate to <strong>Catalog > Import</strong></li><li>Paste the URL to your <code>catalog-info.yaml</code></li><li>Click <strong>Analyze</strong>, then <strong>Import</strong></li></ul></li></ol><p>If auto-discovery is configured for your templates repository, Roadie finds and registers them automatically.</p><h3>Test Your Template</h3><ol><li><p>Navigate to <strong>Templates</strong> in your Roadie instance</p></li><li><p>Find your template: "Create a New Service (GitLab)"</p></li><li><p>Click <strong>Run</strong> and fill out the form:</p><ul><li>Service Name: <code>test-service</code></li><li>Description: <code>A test service to verify template functionality</code></li><li>Owner: Select an owner from the dropdown</li><li>Repository Location: Provide a name for the repository</li></ul></li><li><p>Click <strong>Review</strong> to see a summary of what will be created</p></li><li><p>Click <strong>Create</strong> to execute the template</p></li></ol><p>The scaffolder will:</p><ul><li>Generate files from your skeleton directory</li><li>Create a new GitLab repository</li><li>Push the generated code</li><li>Register the component in your Roadie catalog</li></ul><p>You'll see a link to the new repository and the catalog entry when complete.</p><h3>Add GitLab-Specific Actions</h3><p>Roadie supports <a href="https://roadie.io/backstage/scaffolder-actions/">several GitLab-specific scaffolder actions</a> beyond basic repository creation:</p><p><strong>Create a merge request:</strong></p><pre><code class="language-yaml">- id: create-mr
  name: Create Merge Request
  action: publish:gitlab:merge-request
  input:
    repoUrl: gitlab.com?owner=acme-corp&#x26;repo=my-service
    title: 'feat: Initial service setup'
    description: 'Automated setup via scaffolder'
    branchName: feature/initial-setup
    targetBranchName: main
</code></pre><p><strong>Add project variables:</strong></p><pre><code class="language-yaml">- id: add-variables
  name: Add CI/CD Variables
  action: gitlab:projectVariable:create
  input:
    repoUrl: gitlab.com?owner=acme-corp&#x26;repo=my-service
    key: API_KEY
    value: ${{ parameters.apiKey }}
    masked: true
    variableType: env_var
</code></pre><p><strong>Trigger a pipeline:</strong></p><pre><code class="language-yaml">- id: trigger-pipeline
  name: Trigger Initial Pipeline
  action: gitlab:pipeline:trigger
  input:
    repoUrl: gitlab.com?owner=acme-corp&#x26;repo=my-service
    branch: main
</code></pre><h2>Tech Insights enables governance at scale through GitLab API data</h2><p><a href="https://roadie.io/product/tech-insights/">Tech Insights</a> transforms ad-hoc governance (manual audits, compliance spreadsheets) into continuous, automated checks surfaced directly in the developer portal. The system works through three layers: <strong>data sources</strong> (fact retrievers that call APIs), <strong>checks</strong> (boolean logic evaluating facts), and <strong>scorecards</strong> (collections of checks targeting entity subsets).</p><p>For GitLab integrations, common governance questions answered through the API include:</p><table><thead><tr><th>Governance Check</th><th>GitLab API Endpoint</th><th>Data Type</th></tr></thead><tbody>
<tr><td>Repository has CODEOWNERS</td><td><code>GET /projects/:id/repository/files/CODEOWNERS</code></td><td><code>boolean</code></td></tr>
<tr><td>Default branch protected</td><td><code>GET /projects/:id/protected_branches</code></td><td><code>boolean</code></td></tr>
<tr><td>CI configuration exists</td><td><code>GET /projects/:id/repository/files/.gitlab-ci.yml</code></td><td><code>boolean</code></td></tr>
<tr><td>MR approval rules configured</td><td><code>GET /projects/:id/approval_rules</code></td><td><code>boolean</code></td></tr>
<tr><td>Project description set</td><td><code>GET /projects/:id</code> → <code>description</code> field</td><td><code>boolean</code></td></tr>
<tr><td>Open vulnerability count</td><td><code>GET /projects/:id/vulnerability_findings</code></td><td><code>number</code></td></tr>
</tbody></table><p>Roadie's managed <a href="https://roadie.io/docs/tech-insights/introduction/">Tech Insights</a> provides <strong>100+ built-in data source types</strong> and a UI for <a href="https://roadie.io/docs/tech-insights/add-check/">defining checks and scorecards</a> without writing code. Scorecards can target entity subsets (e.g., "all production services must have CODEOWNERS and branch protection") and surface results as team-level rollups, historical trends, and operational review dashboards. The <code>read_api</code> scope is sufficient for all fact collection.</p><h2>Connecting Self-Managed GitLab (On-Prem)</h2><p>Organizations running <strong>self-managed GitLab behind a firewall</strong> face a connectivity challenge: how does a SaaS portal reach an internal GitLab instance without exposing it to the internet? <a href="https://roadie.io/docs/integrations/broker/">Roadie's Broker</a>, based on Snyk's open-source broker, solves this with <strong>outbound-only connections</strong>.</p><p>The architecture consists of two components. The <strong>Broker Client</strong> is a Node.js application deployed inside the customer's infrastructure (via Docker, Helm chart, or npm CLI). It initiates an outbound WebSocket connection to the <strong>Broker Server</strong>, a tenant-specific endpoint hosted in Roadie's infrastructure. All traffic flows through this tunnel: Roadie never initiates inbound connections, and <strong>no firewall ports need to be opened</strong>.</p><p>Security is enforced through multiple layers. An <code>accept.json</code> configuration file on the client side provides <strong>allowlist-based access control</strong>: by default, Roadie has zero access to any internal APIs. Only explicitly permitted URL patterns and HTTP methods are proxied. 
Authentication tokens for internal GitLab remain in the customer's infrastructure and are never transmitted to Roadie. Additionally, the Broker Server restricts connections to customer-specified <strong>IP CIDR ranges</strong>, and all access is logged for audit.</p><p>A minimal Broker Client deployment for self-managed GitLab:</p><pre><code class="language-bash">docker run \
  --env BROKER_TOKEN=gitlab-integration \
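  --env BROKER_SERVER_URL=https://&#x3C;tenant>.broker.roadie.so \
  --env GITLAB_INTERNAL_URL=https://&#x3C;your-gitlab-host> \
  --env GITLAB_INTERNAL_TOKEN="$GITLAB_INTERNAL_TOKEN" \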
  -v $(pwd)/accept.json:/service/accept.json \
  roadiehq/broker
</code></pre><p>The corresponding <code>accept.json</code> restricts access to only the GitLab API paths Backstage needs:</p><pre><code class="language-json">{
  "private": [
    {
      "//": "GitLab API access for catalog discovery and plugins",
      "method": "GET",
      "path": "/api/v4/*",
      "origin": "${GITLAB_INTERNAL_URL}",
      "auth": {
        "scheme": "token",
        "token": "${GITLAB_INTERNAL_TOKEN}"
      }
    }
  ],
  "public": [
    { "method": "GET", "path": "/healthcheck" }
  ]
}
</code></pre><p>Broker configuration is managed through Roadie's admin UI at <code>https://&#x3C;tenant>.roadie.so/administration/settings/integrations/broker</code>, where admins set CIDR ranges and broker tokens.</p><h2>How Roadie compares to the alternatives for GitLab shops</h2><p><strong>GitLab's native capabilities leave a significant gap.</strong> GitLab offers CI/CD pipelines, project templates, and compliance frameworks, but no unified service catalog, no scorecard-based governance, no <a href="https://roadie.io/product/documentation/">docs-as-code portal</a>, and no cross-tool aggregation surface. GitLab's Service Desk is an ITSM ticketing system for external users, not a developer portal. GitLab's own <a href="https://about.gitlab.com/direction/">direction page</a> explicitly identifies Backstage and Port as competitive solutions, tacitly acknowledging the gap it cannot fill until at least 2026.</p><p><strong><a href="https://roadie.io/blog/the-true-cost-of-self-hosting-backstage/">Self-hosted Backstage</a></strong> provides the same plugin ecosystem but demands a dedicated team for maintenance, upgrades, infrastructure management, and plugin curation. Real-world experience confirms that organizations routinely spend months reaching production and still struggle with adoption at scale. Roadie eliminates this burden: automatic monthly upgrades, pre-configured plugins (75+), no-code UI customization, and enterprise features like <a href="https://roadie.io/product/access-control/">RBAC</a>, usage analytics, and OpenSearch-powered catalog search that don't exist in open-source Backstage.</p><p><strong>Proprietary portals</strong> (Cortex at ~$65/user/month, Port with a free tier up to 15 users then $40/seat/month) offer polished UIs and built-in engineering intelligence, but carry significant lock-in risk with proprietary data models. Cortex provides strong DORA metrics and maturity scorecards but has no community plugin ecosystem and no TechDocs equivalent. 
Port offers maximum flexibility through custom "Blueprints" but lacks native documentation features and treats GitLab primarily as an action backend rather than a deep data source. Neither matches the depth of Backstage's GitLab plugin ecosystem, which includes discovery, org sync, CI/CD visualization, scaffolding, pipeline triggers, code owners display, MR statistics, and coverage reporting across <strong>six actively maintained packages</strong>.</p><p>Roadie sits at the intersection: the open-source extensibility and <a href="https://roadie.io/docs/integrations/gitlab/">GitLab integration</a> depth of Backstage, with the operational simplicity of a managed SaaS, at roughly one-third the per-user cost of Cortex.</p><h2>Common pitfalls and production-hardening advice</h2><p>Several failure modes recur in GitLab + Backstage deployments at scale. <strong>Token expiration</strong> is the most common: GitLab tokens expire silently (default max 365 days, extendable to 400 in GitLab 17.6+), and catalog updates then stop without clear errors. Automate rotation via the GitLab API's <code>POST /groups/:id/access_tokens/:token_id/rotate</code> endpoint. <strong>Entity naming collisions</strong> break ingestion when two repos use identical <code>metadata.name</code> values; standardize naming conventions early and enforce them through scaffolder templates. <strong>Unquoted numeric IDs</strong> trip up YAML parsing: the value in <code>gitlab.com/project-id: '4521'</code> must be quoted, or YAML interprets it as a number and the annotation match fails.</p><p>For discovery at scale (500+ repos), <strong>scope discovery to specific groups</strong> rather than scanning the entire instance to reduce API calls, and <strong>deploy webhook-driven updates</strong> to eliminate polling overhead. 
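</p><p>As a rough illustration of group-scoped discovery, here is a minimal sketch of the GitLab provider block from an open-source Backstage <code>app-config.yaml</code>. Roadie manages discovery through its own settings, so treat this as an illustration of the idea rather than Roadie's configuration. The provider ID <code>platformGroup</code>, the group path <code>acme-corp/platform</code>, the project pattern, and the 30-minute schedule are all assumptions chosen for the example.</p><pre><code class="language-yaml">catalog:
  providers:
    gitlab:
      # Hypothetical provider ID
      platformGroup:
        host: gitlab.com
        branch: main
        # Only scan this group instead of the whole instance
        group: acme-corp/platform
        entityFilename: catalog-info.yaml
        # Optional regex to narrow the matched projects further
        projectPattern: '^acme-corp/platform/.*-service$'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
</code></pre><p>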
Use <a href="https://roadie.io/product/scaffolder/">Software Templates</a> to generate correct <code>catalog-info.yaml</code> files for new projects from day one, preventing the catalog drift that occurs when teams must retroactively add metadata to existing repos. Treat <code>catalog-info.yaml</code> as code: reviewed via merge requests, enforced by CI validation, and owned by the service team.</p><h2>Conclusion</h2><p>The combination of Roadie and GitLab addresses a gap that GitLab itself won't fill until 2026 at the earliest. <strong>Auto-discovery with regex-based group and project filtering</strong> eliminates the manual registration burden across hundreds of repos. <strong><a href="https://roadie.io/backstage/scaffolder-actions/">Scaffolder actions</a></strong> specific to GitLab automate project creation with built-in guardrails: CI configuration, branch protection, and catalog registration happen in a single self-service workflow. <strong><a href="https://roadie.io/product/tech-insights/">Tech Insights scorecards</a></strong> transform governance from periodic audits into continuous, visible compliance tracking. And the <strong><a href="https://roadie.io/docs/integrations/broker/">Broker architecture</a></strong> extends all of this to self-managed GitLab instances without any firewall changes, using outbound-only connections with allowlist-based access control.</p><p>The key technical insight is that <a href="https://backstage.io">Backstage's</a> GitLab integration ecosystem is uniquely deep: six actively maintained packages cover discovery, org sync, CI/CD visualization, pipeline triggers, and scaffolding. 
<a href="https://roadie.io">Roadie</a> wraps this ecosystem in a managed layer that eliminates the 3-12 engineer operational burden of self-hosting, adds enterprise features absent from open-source Backstage (<a href="https://roadie.io/product/access-control/">RBAC</a>, scorecards, no-code layout editing, AI capabilities), and avoids the proprietary lock-in of alternatives like Cortex and Port. For GitLab-centric organizations, it represents the fastest path from repository sprawl to a governed, self-service developer platform.</p><h2>Next Steps</h2><p>Now that you understand how Roadie integrates with GitLab, here are practical next steps to get started:</p><ul><li><strong><a href="https://roadie.io/free-trial/">Start a free trial</a></strong> to connect your GitLab instance and see auto-discovery in action within minutes</li><li><strong><a href="https://roadie.io/docs/integrations/gitlab/">Explore the GitLab integration documentation</a></strong> for detailed configuration instructions and advanced patterns</li><li><strong><a href="https://roadie.io/docs/catalog/modeling-entities/">Learn how to model your software catalog</a></strong> to represent your system architecture properly with components, APIs, and dependencies</li><li><strong><a href="https://roadie.io/docs/scaffolder/writing-templates/">Create your first Software Template</a></strong> to standardize how teams create new GitLab projects with pre-configured CI/CD and governance</li><li><strong><a href="https://roadie.io/docs/tech-insights/add-check/">Set up Tech Insights checks</a></strong> to monitor engineering standards like CODEOWNERS files and branch protection across your GitLab repositories</li><li><strong><a href="https://roadie.io/case-studies/">Review customer success stories</a></strong> from other organizations using Roadie with GitLab to accelerate their platform engineering initiatives</li></ul>
]]></content:encoded></item><item><title><![CDATA[Backstage Microservices Strategies: Taming Sprawl with a Service Catalog]]></title><link>https://roadie.io/blog/backstage-microservices-strategies/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-microservices-strategies/</guid><pubDate>Thu, 12 Feb 2026 10:00:00 GMT</pubDate><description><![CDATA[Learn how Backstage and a service catalog tame microservices sprawl, clarify ownership, reduce MTTR, and accelerate onboarding with Golden Paths and dependency visibility.]]></description><content:encoded><![CDATA[<p>When a 3 AM incident cascades through your 400-microservice architecture, the critical question isn't what's broken, it's who owns it. Without a centralized system of record, organizations inevitably accumulate "zombie services" - undocumented, unmaintained code that nobody claims until it fails catastrophically. <a href="https://roadie.io/backstage-spotify/">Backstage</a>, the CNCF-incubated developer portal created by Spotify, has emerged as the definitive solution for taming this complexity, streamlining incident response and drastically cutting the time required to onboard new engineers.</p><p>The stakes are substantial: the friction caused by tool sprawl and context switching acts as a massive tax on engineering velocity. For organizations with 50+ engineers already committed to microservices, the question isn't whether to implement a service catalog, it's how to do it effectively before complexity overwhelms capacity.</p><h2>The hidden cost of microservices at scale</h2><p>Modern engineering organizations force developers to juggle a dizzying array of monitoring, CI/CD, and cloud infrastructure tools. This fragmentation forces constant context switching, breaking flow state and burning valuable engineering hours every week. 
When Expedia Group surveyed their <a href="https://roadie.io/case-studies/expedia-group-backstage-mvp/">5,000+ developers managing 20,000 microservices</a>, documentation discoverability emerged as their primary pain point - engineers were spending more time finding information than building features.</p><p>The ownership problem compounds over time. Without a system of record, teams accumulate what practitioners call "microservice graveyards": entire clusters of services where the original owners have departed and no one wants responsibility. At Spotify before Backstage, engineers described their workflow as <a href="https://qeunit.com/blog/quality-engineering-productivity-at-spotify/">"rumor-driven development"</a>: the only way to discover how something worked was to ask colleagues who might remember.</p><p>Incident response suffers most acutely. FireHydrant's analysis of 50,000+ incidents found that when services have clear ownership attached, <a href="https://firehydrant.com/blog/learn-from-50-000-incidents-with-the-first-incident-benchmark-report/">mean time to resolution drops by 36%</a>. Motability cut the time to create a new service <a href="https://roadie.io/case-studies/motability-operations-case-study-a-modern-idp/">from 2-3 days to minutes</a> after implementing service catalog tooling that eliminated the "who owns this?" question during outages. The pattern is consistent: visibility into ownership and dependencies transforms incident response from frantic Slack archaeology into systematic problem-solving.</p><h2>Backstage as your microservices operating system</h2><p>Backstage functions as an <a href="https://roadie.io/backstage-spotify/">internal developer portal</a>, a unified interface that aggregates service metadata, documentation, and operational tooling into a single searchable surface. 
Created by Spotify in 2016 and open-sourced in 2020, it now manages their <strong><a href="https://backstage.io/blog/2020/03/16/announcing-backstage/">2,000+ backend services and 4,000+ data pipelines</a></strong> with contributions from <strong>over 60 internal teams</strong>. The CNCF <strong><a href="https://www.cncf.io/blog/2022/03/15/backstage-project-joins-the-cncf-incubator/">accepted it as an Incubating project</a></strong> in March 2022, signaling enterprise-grade maturity.</p><p>The <a href="https://roadie.io/product/catalog/">Software Catalog</a> forms the foundation. Every service, API, library, and infrastructure resource gets registered with a <code>catalog-info.yaml</code> file that lives alongside the code:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-service
  description: Handles payment processing for all checkout flows
  annotations:
    # Links this service to its PagerDuty on-call schedule
    pagerduty.com/service-id: P123ABC
    # Connects to the GitHub repository
    github.com/project-slug: acme/payments-service
spec:
  type: service
  lifecycle: production
  # Defines team ownership - answers "who owns this?"
  owner: payments-team
  # Groups service into larger business domain
  system: checkout
  # Declares what APIs this service provides
  providesApis:
    - payments-api
  # Declares dependencies on other resources
  dependsOn:
    - resource:default/payments-db
</code></pre><p>This declarative approach ensures metadata lives with code and flows through standard git workflows. The <strong>owner</strong> field answers the 3 AM question definitively. The <strong>dependsOn</strong> and <strong>providesApis</strong> fields create a navigable dependency graph. Annotations connect the service to operational tooling (PagerDuty, CI/CD pipelines, monitoring dashboards), creating what Backstage calls a "single pane of glass."</p><p>The <a href="https://roadie.io/docs/catalog/modeling-entities/">System Model</a> introduces organizational hierarchy: <strong>Domains</strong> (business areas like Payments or Search) contain <strong>Systems</strong> (collections of components that form a product capability), which contain <strong>Components</strong> (individual services) and <strong>APIs</strong> (interface boundaries). This taxonomy maps directly to how engineering organizations structure teams and ownership, making catalog navigation intuitive rather than arbitrary.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/13x7xPBbnhdFipKk5v8drw/831e59e90e09af23d834923d2e624907/mermaid-diagram-2026-02-11T12-44-15.png" alt="The System Model"></p><h2>Golden Paths eliminate the copy-paste tax</h2><p>Before Backstage, creating a new service at Spotify took <a href="https://engineering.atspotify.com/2020/08/how-we-improved-developer-productivity-for-our-devops-teams">14 days of configuration, pipeline setup, and documentation. Afterward: less than 5 minutes</a>. The difference is the <strong><a href="https://roadie.io/product/scaffolder/">Scaffolder</a></strong>, Backstage's templating system that implements what Spotify calls <a href="https://engineering.atspotify.com/2020/08/how-we-use-golden-paths-to-solve-fragmentation-in-our-software-ecosystem">"Golden Paths"</a>.</p><p>A Golden Path is an opinionated, supported path to building something: a backend service, a data pipeline, or a React application. 
Rather than starting from a copy-pasted template that's already drifted from current standards, engineers use Software Templates that generate services with current CI/CD configuration, security scanning, logging, and observability pre-wired:</p><pre><code class="language-yaml">apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: spring-boot-service
  title: Spring Boot Microservice
  description: Create a production-ready Spring Boot service with CI/CD
spec:
  owner: platform-team
  type: service
  # Parameters define the form users fill out
  parameters:
    - title: Service Details
      required:
        - name
        - owner
      properties:
        name:
          type: string
          title: Service Name
          description: Lowercase with hyphens (e.g., user-auth-service)
        owner:
          type: string
          title: Owner
          description: Team that will own this service
          ui:field: OwnerPicker
        description:
          type: string
          title: Description
          description: Short summary of what the service does
  # Steps define what actions to execute
  steps:
    # Step 1: Fetch the template skeleton from a repository
    - id: template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}

    # Step 2: Publish to GitHub and create repository
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&#x26;owner=acme
        description: ${{ parameters.description }}

    # Step 3: Register in Backstage catalog automatically
    - id: register
      name: Register Component
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
</code></pre><p>The philosophy matters: Golden Paths are <strong>recommended, not mandated</strong>. Engineers can deviate, but they lose platform team support. This balances standardization with autonomy: the platform provides the easy path but doesn't constrain innovation. Spotify maintains <a href="https://engineering.atspotify.com/2020/08/how-we-use-golden-paths-to-solve-fragmentation-in-our-software-ecosystem">six Golden Paths</a> spanning backend, frontend, data engineering, machine learning, and web development, each optimized for its discipline.</p><p>The productivity impact extends beyond service creation. Spotify measured new engineers' <a href="https://engineering.atspotify.com/2021/09/how-backstage-made-our-developers-more-effective-and-how-it-can-help-yours-too">time to 10th pull request dropping from 60+ days to 20 days</a> after Backstage deployment. When every service follows consistent patterns, understanding one means understanding all.</p><h2>Dependency visualization reveals blast radius</h2><p>The <a href="https://roadie.io/backstage/plugins/catalog-graph/">Catalog Graph plugin</a> transforms the static catalog into an interactive dependency map. When planning an API deprecation or infrastructure migration, engineers can trace exactly which services consume an endpoint and who owns them. During incidents, the graph shows upstream dependencies that might be causing failures and downstream services that might be affected.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/42KtKZ9WOuBfHiBl3074Dd/6a663f35451889709239d51f5373833e/mermaid-diagram-2026-02-11T12-46-45.png" alt="The Catalog Graph plugin"></p><p><em>Example dependency graph showing blast radius when payments-service changes</em></p><p>Relationships are defined explicitly in <a href="https://roadie.io/docs/catalog/modeling-entities/">catalog metadata</a> using <code>dependsOn</code>, <code>providesApis</code>, and <code>consumesApis</code> fields. 
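</p><p>A minimal sketch of those fields on a consuming service, assuming a hypothetical <code>checkout-frontend</code> component that calls the <code>payments-api</code> and depends on the <code>payments-service</code> from the earlier example (the component name, owning team, and type are illustrative only):</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-frontend
spec:
  type: website
  lifecycle: production
  owner: checkout-team
  system: checkout
  # APIs this component calls
  consumesApis:
    - payments-api
  # Other catalog entities this component needs in order to run
  dependsOn:
    - component:default/payments-service
</code></pre><p>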
Backstage automatically generates inverse relationships: if Service A declares it consumes API B, API B's page shows Service A as a consumer. This bidirectional visibility makes deprecation planning systematic: filter by API, identify all consumers, contact those teams, and track migration progress.</p><p>The blast radius analysis capability transforms change management. Before deploying infrastructure changes, engineers visualize what breaks if a database becomes unavailable or an API endpoint goes down. Migration wave planning becomes data-driven: identify server clusters where dependency chains can be broken cleanly, then sequence the migration accordingly.</p><h2>Tech Insights gamifies production readiness</h2><p>Catalog completeness means nothing if the metadata is wrong. <strong><a href="https://roadie.io/product/tech-insights/">Tech Insights</a></strong> (called Scorecards in some implementations) provides automated fact-checking that validates services against production readiness standards. The system operates on two concepts: <strong>Facts</strong> (data points collected from various sources) and <strong>Checks</strong> (rules that evaluate facts).</p><p>Common checks include <a href="https://roadie.io/backstage/plugins/pagerduty/">PagerDuty integration</a> verification (ensuring on-call is configured), deprecated library detection, Node.js version compliance, and documentation completeness. Each check runs on a configurable schedule and produces a compliance score visible on service pages:</p><pre><code class="language-yaml"># Example Tech Insights check configuration
techInsights:
  factChecker:
    checks:
      productionReadiness:
        type: json-rules-engine
        name: Production Readiness
        description: Ensures services meet production standards
        # Define which facts to collect
        factIds:
          - entityOwnershipFactRetriever
          - techdocsFactRetriever
          - pagerdutyFactRetriever
        # Define the rule logic
        rule:
          conditions:
            all:
              # Check 1: Must have a group owner (not individual)
              - fact: hasGroupOwner
                operator: equal
                value: true
              # Check 2: Must have TechDocs documentation
              - fact: hasTechDocs
                operator: equal
                value: true
              # Check 3: Must have PagerDuty integration
              - fact: hasPagerDuty
                operator: equal
                value: true
</code></pre><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/33O9cQMN3fBVkmQPILWSZj/396ebe198b055987a7771e48116f1dff/mermaid-diagram-2026-02-11T12-48-58.png" alt="Catalog completeness"></p><p><a href="https://roadie.io/case-studies/dexcom-automating-catalog-completeness-backstage/">Dexcom used automated checks</a> to drive catalog completeness from 60% to over 95%. <a href="https://roadie.io/blog/backstage-gets-quality-and-compliance-scorecards-with-roadie/">Baillie Gifford</a>, operating in the regulated financial services sector, uses scorecards to track security tool adoption across 250 developers, generating compliance reports that previously required days of manual assembly.</p><p>The gamification effect drives adoption organically. When teams see leaderboards showing their scorecard performance relative to peers, competitive dynamics motivate improvement without mandates. Engineering leaders report this <a href="https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/">"soft governance" approach</a> achieves better compliance than top-down enforcement while preserving team autonomy.</p><h2>Integration creates the unified dashboard</h2><p>Backstage's plugin architecture enables what engineers call the <strong>"single pane of glass"</strong>, consolidated visibility across the entire operational stack. Over <a href="https://roadie.io/backstage/plugins/">200 plugins</a> provide native integrations with common tooling.</p><p>The <strong><a href="https://roadie.io/backstage/plugins/kubernetes/">Kubernetes plugin</a></strong> displays deployment status, pod health, and resource metrics directly on service pages. Engineers see crash logs aggregated from all pods, health indicators, and links to deeper investigation tools, without leaving Backstage or requiring kubectl access. 
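</p><p>As a sketch of how an entity opts in to this view, assuming the open-source Kubernetes plugin's <code>backstage.io/kubernetes-id</code> annotation and a workload labelled with the same hypothetical value, the relevant part of <code>catalog-info.yaml</code> might look like this:</p><pre><code class="language-yaml">metadata:
  name: payments-service
  annotations:
    # Matched against the backstage.io/kubernetes-id label on the
    # Kubernetes resources (the value here is a placeholder)
    backstage.io/kubernetes-id: payments-service
</code></pre><p>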
The <strong><a href="https://roadie.io/backstage/plugins/pagerduty/">PagerDuty plugin</a></strong> shows active incidents, on-call schedules, and allows triggering new incidents from service context. <strong><a href="https://roadie.io/backstage/plugins/github-actions/">GitHub Actions</a></strong>, <strong><a href="https://roadie.io/backstage/plugins/circle-ci/">CircleCI</a></strong>, and <strong><a href="https://roadie.io/backstage/plugins/jenkins/">Jenkins</a></strong> plugins display build status, deployment history, and failure details.</p><p>API management uses the same catalog model. OpenAPI, AsyncAPI, and GraphQL specifications register as API entities with full interactive documentation, consumer/provider relationships, and lifecycle management. When API version 2 launches, teams identify v1 consumers directly from the catalog and coordinate deprecation timelines.</p><p>The integration pattern is consistent: annotate catalog entities with tool-specific identifiers, and plugins fetch relevant data on page load. A properly configured service page shows ownership, dependencies, <a href="https://roadie.io/product/documentation/">documentation</a>, build status, deployment health, active incidents, and on-call, everything needed to understand and operate the service from a single URL.</p><h2>Choosing between self-hosted and managed options</h2><p>Self-hosted Backstage offers unlimited customization but demands significant investment: typically <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">2-3 dedicated FTEs</a> with TypeScript/React expertise for initial buildout and ongoing maintenance. 
Organizations like <a href="https://roadie.io/case-studies/from-self-hosted-backstage-to-roadie/">Paddle</a> ran self-hosted Backstage for four years before migrating to managed alternatives when the maintenance burden conflicted with driving adoption.</p><p><strong><a href="https://roadie.io">Roadie</a></strong> provides managed Backstage at approximately $20/user/month with same-day setup, <a href="https://roadie.io/backstage/plugins/">200+ pre-configured plugins</a>, and enterprise features like <a href="https://roadie.io/product/access-control/">RBAC</a> included. The tradeoff is reduced customization compared to self-hosted, though standard catalog formats mean organizations can migrate later if needs evolve.</p><p>Proprietary alternatives like Cortex ($65-69/user/month), OpsLevel, and Port offer differentiated approaches. Cortex emphasizes AI-powered service discovery and executive reporting. OpsLevel prioritizes fast deployment, 30-45 days typical, with automated catalog maintenance. Port offers maximum customization through a no-code builder but requires significant configuration investment.</p><p>For organizations with 50-100 engineers, managed solutions typically deliver faster time-to-value. Above 500 engineers with dedicated platform teams, self-hosted Backstage becomes economically viable if TypeScript expertise exists. Regulated industries should evaluate on-premises options alongside <a href="https://roadie.io/blog/roadie-local-self-hosted-backstage-ready-in-minutes/">Roadie's self-hosted offering</a>.</p><h2>Starting your service catalog journey</h2><p>Successful implementations follow a consistent pattern: start with the <strong><a href="https://roadie.io/product/catalog/">software catalog</a></strong> before adding complexity. Import users and teams first so ownership fields work immediately. Choose early-adopter teams willing to contribute catalog metadata, then expand systematically. 
Platform teams at Expedia <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/#:~:text=Expedia%20Group%20put%20850%2B%20engineers%20through%20their%20Backstage%20based%20bootcamp%20in%202022.">put 850+ engineers through a Backstage-based bootcamp</a> in their first year, treating adoption as a change management initiative rather than a technology deployment.</p><p>Catalog completeness matters more than feature breadth initially. Contentful <a href="https://roadie.io/case-studies/maintaining-velocity-through-hypergrowth-contentful/">achieved 90% metadata coverage within one year</a> by making Scaffolder the default service creation path: new services entered the catalog automatically, while existing services received incremental metadata through team contributions.</p><p>Measure what matters: <strong>time to 10th PR</strong> for onboarding velocity, <strong>MTTR</strong> for incident response improvement, and <strong>catalog completeness</strong> for adoption tracking. <a href="https://engineering.atspotify.com/2021/09/how-backstage-made-our-developers-more-effective-and-how-it-can-help-yours-too">Spotify's Pia Nilsson captured the business case succinctly</a>: "If you have numbers like that in your organization, it's easy to get buy-in for investments in developer experience."</p><p>The microservices complexity that created the 3 AM ownership problem also created the opportunity for systematic improvement. Backstage provides the framework; your implementation provides the value. Organizations that treat their <a href="https://roadie.io/docs/catalog/modeling-entities/">service catalog</a> as a product, with dedicated ownership, user feedback loops, and continuous improvement, consistently report the productivity gains that justify investment. 
Those that deploy and forget find another unused tool in an already crowded landscape.</p><p>The choice isn't whether complexity will be managed, it's whether you'll manage it systematically before it manages you.</p><h2>Next Steps</h2><p>Ready to implement Backstage in your organization? Here are resources to help you get started:</p><ul><li><p><strong><a href="https://roadie.io/product/catalog/">Explore Roadie's Catalog</a></strong> - See how Roadie's managed Backstage platform can help you organize your microservices architecture with automated discoverability and ownership tracking.</p></li><li><p><strong><a href="https://roadie.io/product/scaffolder/">Learn About the Scaffolder</a></strong> - Discover how Software Templates and Golden Paths can standardize service creation and reduce onboarding time from weeks to minutes.</p></li><li><p><strong><a href="https://roadie.io/case-studies/">Read Implementation Case Studies</a></strong> - Learn from companies like Expedia Group, Dexcom, and Contentful who have successfully deployed Backstage at scale.</p></li><li><p><strong><a href="https://roadie.io/blog/the-true-cost-of-self-hosting-backstage/">Compare Deployment Options</a></strong> - Download the whitepaper comparing managed versus self-hosted Backstage to determine the best approach for your organization.</p></li><li><p><strong><a href="https://roadie.io/request-demo/">Book a Demo</a></strong> - See Roadie in action. Request a personalized demo to discover how managed Backstage can tame your microservices sprawl with same-day setup and enterprise-grade security.</p></li></ul>
]]></content:encoded></item><item><title><![CDATA[Creating Backstage EntityProviders at Runtime  ]]></title><link>https://roadie.io/blog/creating-backstage-entityproviders-at-runtime/</link><guid isPermaLink="false">https://roadie.io/blog/creating-backstage-entityproviders-at-runtime/</guid><pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate><description><![CDATA[Learn how to dynamically manage catalog data sources in Backstage without redeploying. This post introduces a provider pooling pattern that pre-registers EntityProviders at startup and assigns them to consumers at runtime—enabling multi-tenancy, user-defined integrations, and self-service onboarding scenarios.]]></description><content:encoded><![CDATA[<p>Backstage's catalog is the heart of your developer portal. EntityProviders are the mechanism by which data flows into it—they connect to external systems, fetch entity data, and push it to the catalog.</p><p>Typically, EntityProviders are registered at application startup via the <code>catalogProcessingExtensionPoint</code>. Once the backend initializes, the set of providers is fixed. But what happens when you need to dynamically create new sources of catalog data without redeploying?</p><p>Consider these scenarios:</p><ul><li>A multi-tenant platform where each tenant needs isolated entity management</li><li>User-defined integrations that pull data from custom sources</li><li>Dynamic data pipelines that generate catalog entities on-demand</li><li>Self-service onboarding where teams register their own data sources</li></ul><p>Backstage doesn't natively support registering EntityProviders after startup. This post explains how to solve this with a provider pooling pattern.</p><h2>The Challenge</h2><p>The <code>EntityProviderConnection</code> that allows emitting entities is established once at startup when providers are registered. 
After initialization, you cannot add new providers—any attempt to call <code>addEntityProvider</code> after the catalog has started will fail.</p><h2>The Solution: Provider Pooling</h2><p>Instead of fighting Backstage's architecture, work with it. The key insight: register a pool of providers at startup, then dynamically assign them to consumers at runtime.</p><pre><code>┌─────────────────────────────────────────────────────────────┐
│                    Backend Startup                          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  1. Create pool of N idle EntityProviders            │   │
│  │  2. Register all with catalogProcessingExtensionPoint│   │
│  │  3. Restore any persisted assignments from database  │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                  ProviderRegistryService                    │
│  ┌────────────────────────────────────────────────────┐     │
│  │  getProviderFor(id) → assigns idle provider        │     │
│  │  releaseProvider(id) → clears entities, returns    │     │
│  └────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
    Consumer A            Consumer B            Consumer C
    (provider-0)          (provider-1)          (provider-2)
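</code></pre><h2>The Pooled EntityProvider</h2><p>Extend the standard <code>EntityProvider</code> with assignment tracking and the ability to clear entities:</p><pre><code class="language-typescript">import { Entity } from '@backstage/catalog-model';
import { EntityProvider } from '@backstage/plugin-catalog-node';

interface PooledEntityProvider extends EntityProvider {
  assignTo(id: string): void; // bind this pool slot to a consumer id
  clearAssignment(): void; // mark the slot as idle again
  clearEntities(): Promise&#x3C;void>; // remove everything this provider emitted
  updateEntities(entities: Entity[]): Promise&#x3C;void>; // replace the emitted entity set
}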
</code></pre><p>The key method is <code>clearEntities</code>—emitting an empty full mutation removes all entities that provider previously managed:</p><pre><code class="language-typescript">async clearEntities(): Promise&#x3C;void> {
  await this.connection?.applyMutation({
    type: 'full',
    entities: [],
  });
}
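
// --- Illustrative sketch, not from the original post ---
// How the pieces above might fit together in one class. Assumptions:
// PooledProvider, EntityLike and ConnectionLike are hypothetical names.
// The two types are trimmed stand-ins for Entity (@backstage/catalog-model)
// and EntityProviderConnection (@backstage/plugin-catalog-node), so the
// sketch runs without Backstage installed. Using the provider name as the
// locationKey is also an assumption, not something the post specifies.
type EntityLike = { kind: string; metadata: { name: string } };
type ConnectionLike = {
  applyMutation(mutation: {
    type: 'full';
    entities: { entity: EntityLike; locationKey?: string }[];
  }): unknown; // the real connection returns a promise here
};

class PooledProvider {
  private readonly name: string;
  private connection?: ConnectionLike;
  private assignedTo?: string;

  constructor(slot: number) {
    // A fixed, slot-based name keeps the provider identity stable across restarts.
    this.name = `pooled-provider-${slot}`;
  }

  getProviderName() {
    return this.name;
  }

  // Called once by the catalog at startup; keep the connection for later use.
  async connect(connection: ConnectionLike) {
    this.connection = connection;
  }

  assignTo(id: string) {
    this.assignedTo = id;
  }

  clearAssignment() {
    this.assignedTo = undefined;
  }

  // A full mutation replaces whatever this provider emitted before.
  async updateEntities(entities: EntityLike[]) {
    if (!this.connection) throw new Error(`${this.name} is not connected yet`);
    if (!this.assignedTo) throw new Error(`${this.name} has no consumer assigned`);
    await this.connection.applyMutation({
      type: 'full',
      entities: entities.map(entity => ({ entity, locationKey: this.name })),
    });
  }

  // Same technique as clearEntities above: an empty full mutation.
  async clearEntities() {
    await this.connection?.applyMutation({ type: 'full', entities: [] });
  }
}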
</code></pre><h2>The Registry Service</h2><p>The registry manages the pool, handling assignment and release:</p><pre><code class="language-typescript">interface ProviderRegistryService {
  getProviderFor(id: string): Promise&#x3C;PooledEntityProvider>;
  releaseProvider(id: string): Promise&#x3C;void>;
  getAllProviders(): PooledEntityProvider[];
}
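
// --- Illustrative sketch, not from the Backstage codebase ---
// One way such a registry could work, kept in memory for brevity.
// Assumptions: PoolMember and InMemoryProviderRegistry are hypothetical
// names. PoolMember lists only the methods the registry calls, so any
// PooledEntityProvider satisfies it structurally. Persisting and restoring
// assignments is left out of this sketch.
type PoolMember = {
  assignTo(id: string): void;
  clearAssignment(): void;
  clearEntities(): unknown; // a promise in PooledEntityProvider
};

class InMemoryProviderRegistry {
  private readonly pool: PoolMember[];
  private readonly idle: PoolMember[];
  // Prototype-free object so ids such as "constructor" cannot collide.
  private readonly assigned: { [id: string]: PoolMember } = Object.create(null);

  constructor(pool: PoolMember[]) {
    this.pool = pool;
    this.idle = [...pool];
  }

  async getProviderFor(id: string) {
    const existing = this.assigned[id];
    if (existing) return existing; // same consumer keeps the same provider

    const next = this.idle.shift();
    if (!next) {
      // Fixed capacity: fail loudly instead of silently queueing.
      throw new Error(`Provider pool exhausted (size ${this.pool.length})`);
    }
    next.assignTo(id);
    this.assigned[id] = next; // a real registry would also persist this
    return next;
  }

  async releaseProvider(id: string) {
    const provider = this.assigned[id];
    if (!provider) return; // nothing assigned, nothing to release

    await provider.clearEntities(); // drop the consumer's entities first
    provider.clearAssignment();
    delete this.assigned[id];
    this.idle.push(provider); // back into the pool
  }

  getAllProviders() {
    return this.pool;
  }
}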
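</code></pre><p>When a consumer requests a provider, the registry either returns an existing assignment or finds an available provider from the pool and persists the new assignment.</p><p>When a provider is released, the registry clears its entities, removes the assignment, and returns it to the pool.</p><h2>Catalog Registration</h2><p>Register all pooled providers at startup via a backend module:</p><pre><code class="language-typescript">import { createBackendModule } from '@backstage/backend-plugin-api';
// Import path is an assumption: the extension point has been exported from
// the /alpha entry point; adjust for your Backstage version.
import { catalogProcessingExtensionPoint } from '@backstage/plugin-catalog-node/alpha';
// providerRegistryServiceRef is this pattern's own service ref, defined elsewhere.

const providerRegistryCatalogModule = createBackendModule({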
  pluginId: 'catalog',
  moduleId: 'provider-registry',
  register(module) {
    module.registerInit({
      deps: {
        catalog: catalogProcessingExtensionPoint,
        registry: providerRegistryServiceRef,
      },
      async init({ catalog, registry }) {
        for (const provider of registry.getAllProviders()) {
          catalog.addEntityProvider(provider);
        }
      },
    });
  },
});
</code></pre><h2>Persistence</h2><p>Store assignments in a database table so they survive restarts. This ensures the same consumer gets the same provider ID, maintaining entity ownership continuity.</p><h2>Usage</h2><pre><code class="language-typescript">// Acquire a provider
const provider = await registry.getProviderFor('my-integration-123');

// Emit entities
await provider.updateEntities(entities);

// When done, release (clears entities and returns provider to pool)
await registry.releaseProvider('my-integration-123');
</code></pre><h2>Trade-offs</h2><ul><li><strong>Memory overhead</strong> — Pre-allocated providers consume memory, though instances are lightweight</li><li><strong>Fixed capacity</strong> — Pool exhaustion causes failures; size appropriately for your workload</li><li><strong>Provider ID stability</strong> — Assignments must persist to maintain entity ownership across restarts</li></ul><h2>Conclusion</h2><p>By pre-registering a pool of EntityProviders and dynamically assigning them at runtime, you can achieve flexible, dynamic entity management without modifying Backstage's core architecture. This pattern works for multi-tenancy, user-defined integrations, or any scenario requiring runtime control over catalog data sources.</p>
]]></content:encoded></item><item><title><![CDATA[The True Cost of Self-Hosting Backstage: A Build vs. Buy Analysis for Engineering Leaders]]></title><link>https://roadie.io/blog/the-true-cost-of-self-hosting-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/the-true-cost-of-self-hosting-backstage/</guid><pubDate>Wed, 28 Jan 2026 17:00:00 GMT</pubDate><description><![CDATA[A build vs. buy analysis of self-hosting Backstage—real costs, hidden tradeoffs, and what engineering leaders should consider before committing platform time.
]]></description><content:encoded><![CDATA[<p>Your platform team is fielding 30 Slack messages a day. "Who owns the payment service?" "How do I get an S3 bucket?" "Does virus scanning exist in our environment?"</p><p>Someone suggests <a href="https://roadie.io/backstage-spotify/">Backstage</a>. You look at the README, see it's open source, and think you'll just spin it up.</p><p>Six months later, you've burned through your platform team's roadmap, your <a href="https://roadie.io/product/catalog/">catalog</a> is half-populated, and developers are still opening JIRA tickets for everything.</p><p>At my previous company, we built an internal developer portal from scratch. The bill ran into the millions. <a href="https://roadie.io/blog/from-a-spreadsheet-and-a-usd2m-bill-why-we-built-roadie/">That experience taught me</a> that every engineering organization over 100 people eventually needs this infrastructure. But it also showed me how expensive it is to build and maintain. Even with AI coding agents to help with development, there’s a thousand small decisions to make about business logic.</p><p>Now, as CEO of Roadie, I talk to engineering leaders every week who are making this build vs. buy decision. Most underestimate what self-hosting Backstage actually costs. Not just in dollars, but in team capacity and time to value.</p><p>Here's what the decision really looks like.</p><h2>The Problem You're Actually Solving</h2><p>The developer portal problem starts small. You're managing 50 services across 30 teams. Someone asks which services depend on the auth API. You don't have a good answer.</p><p>Another team needs virus scanning. You think it exists somewhere, but you're not sure.</p><p>So you make a spreadsheet. Service name, owner, what it does. Problem solved.</p><p>This works for about three weeks. Then nobody updates the spreadsheet. It's out of date. People stop trusting it.</p><p>This is the discoverability problem. 
At 10 engineers, you just shout across the office. At 100+ engineers, that breaks down completely. You need a <a href="https://roadie.io/product/catalog/">software catalog</a> that actually stays current.</p><p>But discoverability is just the first problem. You also hit:</p><p><strong>The self-service bottleneck</strong>: A mobile developer needs an S3 bucket. They don't know Terraform. They open a JIRA ticket. Your platform team gets to it in two weeks. The developer either waits or bypasses your platform entirely and creates shadow IT.</p><p><strong>The governance gap</strong>: Your security team needs to verify every service has proper on-call setup. Your compliance team wants to check access controls. You need automated checks against your entire software catalog, not manual audits.</p><p>These three problems drive every developer portal evaluation. The question isn't whether you need a solution. The question is whether you build it or buy it.</p><h2>What Backstage Actually Gives You</h2><p><a href="https://roadie.io/backstage-spotify/">Backstage</a> is not a developer portal. It's a framework for building one.</p><p>This distinction matters. When Spotify open sourced Backstage in 2020, they released their toolkit for building developer portals, not a turnkey product. You get a collection of TypeScript libraries and React components. You have to assemble them into something useful.</p><p>This creates immediate friction for most platform teams:</p><p><strong>You need TypeScript expertise</strong>. Most platform teams work in Go, Python, and YAML. Web development is a different skillset than infrastructure engineering. You either hire TypeScript developers or retrain your existing team.</p><p><strong>You're building, not configuring</strong>. Out of the box, Backstage gives you basic catalog functionality. It doesn't give you <a href="https://roadie.io/product/access-control/">role-based access control</a>. 
It doesn't give you production-grade search (it runs on Postgres full-text search). It doesn't give you most enterprise features. You build those.</p><p><strong>You're maintaining a web application</strong>. Backstage releases a new version every couple of weeks. Each upgrade can break your plugins. Each new integration requires custom TypeScript code. This is ongoing work, not a one-time project.</p><h2>The Real Cost: Team Capacity</h2><p>When we surveyed the Backstage community this year, organizations that reported being happy with their self-hosted deployment had at least three dedicated engineers. Some teams were as large as 12 people.</p><p>Let me translate that into budget terms. Three mid-level engineers cost around $450,000 per year in salary, benefits, and overhead. That's the minimum for a successful deployment.</p><p>But the real cost is what those engineers aren't doing.</p><p><strong>Time to production: 6-12 months</strong>. You're not launching next sprint. You're building the catalog model, integrating CI/CD tools, setting up authentication, configuring plugins, and training teams. The organizations we talk to consistently report 6-12 months before they had something teams would actually use.</p><p><strong>Opportunity cost</strong>. Those three engineers aren't improving your CI/CD pipeline. They're not hardening security. They're not building platform capabilities that differentiate your business. They're maintaining a developer portal.</p><p><strong>Missing features you have to build</strong>. Need <a href="https://roadie.io/product/access-control/">RBAC</a> so your security services aren't visible to everyone? You're building that. Want better search? You're integrating Elasticsearch. Want <a href="https://roadie.io/product/documentation/">API documentation</a>? 
You're configuring and maintaining that integration.</p><p><a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">The actual costs break down</a> into several categories:</p><ul><li>3 engineers minimum at $150k loaded cost each = $450,000/year</li><li>9 months to production at 60% team efficiency = ~$200,000 in delayed value</li><li>Ongoing maintenance and feature development</li><li>TypeScript training or new hires</li></ul><p>First year total exceeds $800,000. Every year after that is still $450,000 minimum, assuming no team growth.</p><p>And you still haven't built all the features you need.</p><h2>When Building Makes Sense</h2><p>Self-hosting Backstage gives you something valuable: control.</p><p>If you have unique requirements that no vendor can handle, being able to modify every line of code matters. If you're integrating with complex legacy systems, having full access to the source code can be critical.</p><p>You also get the Backstage ecosystem. The community is active. New <a href="https://roadie.io/backstage/plugins/">plugins</a> appear regularly. If you need an integration with a specific tool, someone in the community might have already built it.</p><p>Some engineering leaders prefer ownership for critical infrastructure. They don't want vendor dependencies. They want the source code running in their environment.</p><p>These are legitimate reasons to self-host. 
But they need to justify the cost.</p><p><strong>Build if:</strong></p><ul><li>You have genuinely unique requirements no vendor can handle</li><li>You already have a team with TypeScript expertise who wants to own this</li><li>You're large enough (500+ engineers) that control benefits outweigh costs</li><li>You have specific security requirements that mandate on-premises deployment</li><li>Your platform team has capacity to spare</li></ul><p><strong>Don't build if:</strong></p><ul><li>Your platform team is already stretched thin</li><li>You want to launch in weeks, not months</li><li>You need enterprise features without building them</li><li>You'd rather focus engineering resources on your actual platform</li></ul><p>The key question: What do you want your platform team working on? If the answer is "building platform capabilities that make our engineering organization more effective," you probably shouldn't be maintaining a developer portal.</p><h2>The Managed Alternative</h2><p>When we built <a href="https://roadie.io/">Roadie</a>, the goal was straightforward: give you Backstage without the team overhead.</p><p>Here's what that means:</p><p><strong>Day one deployment</strong>. You connect your GitHub organization. Your services start populating. No six-month buildout. No TypeScript work. You're <a href="https://roadie.io/docs/getting-started/overview/">configuring, not coding</a>.</p><p><strong>Enterprise features included</strong>. <a href="https://roadie.io/product/access-control/">RBAC</a>, production-grade search, authentication integrations. The pieces you'd have to build yourself are already there, built from feedback across hundreds of deployments.</p><p><strong>Automatic upgrades</strong>. When Backstage releases a new version, we test it, validate it, and roll it out. You don't manage the upgrade cycle. You don't deal with breaking changes.</p><p><strong>No TypeScript team required</strong>. You use a web UI to configure Backstage. 
Your platform team stays focused on platform work.</p><p>The financial calculation is simple. Managed Backstage costs a fraction of a three-engineer team. But the more important comparison is what your platform team accomplishes.</p><p>Would you rather have three engineers maintaining Backstage, or three engineers improving your CI/CD pipeline and building platform capabilities?</p><h2>The Hybrid Approach</h2><p>This is Roadie's actual positioning: the hybrid model.</p><p>Proprietary developer portals lock you into their data model. If they don't support your integration, you're stuck. If they sunset a feature, you adapt. You have no control.</p><p>Self-hosted Backstage gives you control but requires a dedicated team and significant TypeScript expertise.</p><p><a href="https://roadie.io/backstage-comparison/">Managed Backstage</a> sits in the middle:</p><ul><li>The flexibility of Backstage's open source ecosystem</li><li>Day-one usability and enterprise features</li><li>No team overhead, no TypeScript requirement, no year-long buildout</li></ul><p>You're not locked into a proprietary platform. If you decide you want to self-host later, you can. The catalog format is standard Backstage. Your <a href="https://roadie.io/backstage/plugins/">plugins</a> are standard Backstage.</p><p>But you also don't need to staff a team just to keep the portal running.</p><h2>The Path Few Consider</h2><p>Most engineering leaders frame this as "self-host Backstage vs. buy a proprietary portal." But there's a third option: start managed, migrate to self-hosted later if needed.</p><p>You can validate that Backstage solves your problems without burning six months and half your platform team's capacity. Get your <a href="https://roadie.io/docs/getting-started/model-software/">catalog</a> populated. Get teams using it. Prove the value.</p><p>Then, if you decide you need more control, migrate to self-hosted. The data model is identical. The plugins are the same. 
You're not locked in.</p><p>Several of our <a href="https://roadie.io/case-studies/">customers</a> took exactly this path in reverse. They self-hosted Backstage, realized it was consuming their platform team, and <a href="https://roadie.io/case-studies/why-celonis-switched-from-selfhosted-backstage-to-roadie/">migrated to Roadie</a> to free up those engineers for higher-value work.</p><h2>Making the Decision</h2><p>Here's the framework I use when talking to engineering leaders:</p><p><strong>Start with team capacity</strong>. Can your platform team absorb a year-long project plus ongoing maintenance? If not, you're not really choosing to build. You're choosing to delay.</p><p><strong>Calculate opportunity cost</strong>. What else could three engineers accomplish in a year? New CI/CD capabilities? Better security tooling? Improved developer experience? Is maintaining a developer portal more valuable than those alternatives?</p><p><strong>Consider your timeline</strong>. Do you need this solved in weeks or months? If you need it fast, you're buying. If you have a year to spare, you might build.</p><p><strong>Evaluate your requirements</strong>. Do you have genuinely unique needs, or do you just need a software catalog with good search, RBAC, and common integrations? Most organizations overestimate how unique their requirements actually are.</p><p><strong>Think about your team's preferences</strong>. Does your platform team want to work on TypeScript and React components, or do they want to work on platform capabilities? Forcing engineers to maintain infrastructure they don't want to maintain leads to burnout and turnover.</p><p>The honest answer for most organizations under 500 engineers is that building doesn't make financial sense. 
You can get Backstage with all the ecosystem benefits for a fraction of the cost, with immediate deployment, and without tying up your platform team.</p><h2>What This Really Comes Down To</h2><p>The question isn't whether you need a developer portal. If you're managing 50+ services across 30+ teams, you probably do.</p><p>The question is whether building and maintaining that portal is the best use of your platform engineering resources.</p><p>For most engineering leaders, the answer is no. Your platform team should be <a href="https://roadie.io/blog/what-to-think-about-when-youre-thinking-about-an-idp/">improving your platform</a>, not maintaining a web application.</p><p>That's why Roadie exists. You get Backstage without the team overhead. You get the open source ecosystem without the TypeScript requirement. You get day-one deployment without the year-long buildout.</p><p>And your platform team gets to focus on the work that actually differentiates your business.</p><p>The build vs. buy decision isn't really about cost. It's about what you want your team working on. Choose accordingly.</p><h2>Next Steps</h2><p>If you're evaluating Backstage for your organization, here are concrete next steps to help you move forward:</p><p><strong>Explore Backstage capabilities</strong>: Review our <a href="https://roadie.io/backstage-spotify/">comprehensive guide to Backstage</a> to understand what's possible with the platform and how organizations are using it today.</p><p><strong>See how others decided</strong>: Read <a href="https://roadie.io/case-studies/">case studies from engineering teams</a> who've made the build vs. buy decision. 
Learn from <a href="https://roadie.io/case-studies/why-celonis-switched-from-selfhosted-backstage-to-roadie/">Celonis's experience migrating from self-hosted to managed</a> and how <a href="https://roadie.io/case-studies/maintaining-velocity-through-hypergrowth-contentful/">Contentful maintained velocity through hypergrowth</a>.</p><p><strong>Understand the full cost picture</strong>: Dive deeper into <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">the complete cost breakdown of self-hosting Backstage</a> to validate your budget estimates.</p><p><strong>Try it yourself</strong>: <a href="https://roadie.io/request-demo/">Request a demo</a> to see how managed Backstage works, or <a href="https://roadie.io/free-trial/">start a free trial</a> to experience Roadie firsthand and get your catalog populated in hours, not months.</p><p><strong>Compare your options</strong>: Review <a href="https://roadie.io/blog/backstage-alternatives/">Backstage alternatives and approaches</a> to ensure you're making an informed decision about your developer portal strategy.</p>
]]></content:encoded></item><item><title><![CDATA[Self-Hosting Backstage: The Real To-Do List]]></title><link>https://roadie.io/blog/self-hosting-backstage-the-real-to-do-list/</link><guid isPermaLink="false">https://roadie.io/blog/self-hosting-backstage-the-real-to-do-list/</guid><pubDate>Fri, 23 Jan 2026 08:00:00 GMT</pubDate><description><![CDATA[Many teams self-host Backstage. Few plan for what it takes to run it at scale. This guide lays out the real engineering effort behind a production-grade Backstage deployment, based on years of experience running Backstage in production at scale, so platform teams and engineering leaders can plan with eyes open.]]></description><content:encoded><![CDATA[<p>Considering Backstage? Great! Want to self-host it? Sure - lots of organizations do. In fact, in the <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">State of Backstage 2025 survey</a>, 91% said they self-host, versus 9% on managed platforms. Just because loads of people do it doesn’t mean it’s easy, though. The real question is what it takes to run Backstage as a production platform once you have thousands of engineers, hundreds of teams, and a catalog that keeps growing. The gap between “we stood it up” and “engineers rely on it every day” is where most of the cost and complexity lives.</p><p>We’ve spent the last five years running Backstage in production across dozens of organizations, and building the missing pieces that make it reliable at scale: performance work, background job isolation, governance and scorecards, RBAC, search, TechDocs operations, plugin maintenance, and the unglamorous reality of constant upgrades. If you want to know what’s the real cost and effort involved in standing up a Backstage instance, we’re uniquely positioned to tell you.</p><p>This post is a roadmap of that work. If you only have a minute, the table below is the executive summary detailing what will probably need to be done. 
Think of it as your ultimate to-do list for self-hosting Backstage.</p><p>And as always, many of these learnings and insights are captured in this year’s <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">State of Backstage Report</a>, where you can hear firsthand from self-hosted users about their Backstage journey.</p><h2>Executive summary</h2><p><em>Some assumptions about these numbers:</em></p><ul><li>You’re aiming for a production-grade portal, not a demo.</li><li>You expect meaningful adoption across the organization, not a niche tool used by one team.</li><li>You’ll run a non-trivial plugin surface (CI, SCM, cloud, observability, security, incident tooling).</li><li>Effort is shown as engineering weeks, where one engineering week equals 5 working days of one engineer, or roughly 40 hours.</li><li>Ranges vary by organizational complexity, catalog size, and how strict your governance/security requirements are.</li></ul><table><thead><tr><th>Initiative</th><th>What you’re building</th><th>Effort (engineering weeks)</th></tr></thead><tbody>
<tr><td>Performance and scalability</td><td>Server-side catalog pagination, worker isolation, stability profiling, general optimization, scaffolder scaling</td><td>16-24 weeks, plus ongoing maintenance. This depends substantially on the size of your catalog; expect the higher end for larger catalogs with 50,000+ entities</td></tr>
<tr><td>Catalog customization and data modeling</td><td>Configurable catalog UI, decorators/fragments to enrich without PRs, custom kinds/schema, refresh triggers, completeness tracking</td><td>18-30 weeks, depending on how much customization is necessary to meet your requirements, plus ongoing time for maintenance</td></tr>
<tr><td>Search</td><td>Operate a real search backend (OpenSearch), indexers and relevance tuning, UX refinements, AI search tie-in</td><td>8-12 weeks for search, plus another 8-12 weeks for AI search if you want a real assistant experience</td></tr>
<tr><td>Plugins and integrations</td><td>Ongoing plugin lifecycle, auth quirks, API drift, version compatibility</td><td>0.5-2 weeks per plugin to productionize (auth, permissions, UI polish, support), plus a few hours per upgrade cycle</td></tr>
<tr><td>Tech Insights and scorecards</td><td>Facts ingestion, rule engine, no-code builder UI, built-in checks library, aggregation/history/reporting</td><td>~100 weeks (roughly 6 months for a team of 4 engineers) to build Tech Insights</td></tr>
<tr><td>RBAC, security and governance</td><td>Role mapping, policy engine, admin UI, token issuance/revocation tied to RBAC</td><td>~50 weeks (6 months for a team of 2 engineers) to build RBAC</td></tr>
<tr><td>TechDocs operations</td><td>Hybrid build modes, webhooks/rebuilds, curated MkDocs environment, performance tuning</td><td>2 weeks for initial setup, plus ongoing maintenance to address any issues</td></tr>
<tr><td>Developer experience polish</td><td>Catalog UI QoL, homepage improvements, admin UX</td><td>Ongoing commitment that can very easily be a full-time job for a platform engineer</td></tr>
<tr><td>AI and MCP</td><td>AI assistant over catalog/docs, embeddings/vector store, permission-aware retrieval, MCP servers</td><td>12-24 weeks, with a significant ongoing investment to continually implement and improve</td></tr>
<tr><td>Upgrades and release engineering</td><td>Test suites across plugin surface, staged rollouts, triage and rollback processes</td><td>Ongoing investment of a few hours a week, with significant effort of 5-20 engineering days around major Backstage releases</td></tr>
</tbody></table><p>The rest of this post breaks down each initiative, why it exists, what tends to go wrong in real usage, and the type of engineering work required to make it solid. If you’re a platform engineer, treat it like a checklist. If you’re an engineering leader, treat it like a way to pressure-test the real cost and opportunity cost of self-hosting.</p><h2>1. Performance and scalability: keeping Backstage fast at 200k entities</h2><p>Backstage starts out fast enough. Keeping it performant and responsive at scale is the hard part.</p><p>Once you have tens or hundreds of thousands of entities, TechDocs for most services, scorecards, search indexing, CI integrations and so on, the default architecture begins to strain. Several of our customers run catalogs with north of 200,000 entities, so this is where we had to invest heavily.</p><p>Some of the larger pieces you should plan for:</p><h3>Server-side catalog pagination</h3><p>Vanilla Backstage loads the full result set for catalog pages, then filters on the frontend. That is fine for a few hundred entities. At hundreds of thousands, it becomes very slow.</p><p>We rewired the catalog list to use server-side pagination and filtering. That meant changing how queries are constructed, how counts are calculated, and how the UI behaves when it only has a slice of the data. The result is dramatically lower load times for big catalogs. Reproducing that means touching both backend and frontend, and being ready to debug subtle performance and UX regressions when filters combine in unexpected ways.</p><p>Factor in at least 200 engineering hours to implement server-side catalog pagination. 
This accounts for a backend refactor, restructuring the frontend table, compatibility handling for pagination edge cases, and API shape changes.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/31cXG91KsUBa8EA97IFHMI/8eb3cf0f1cba84c500b1822fea59a48c/screencapture-demo2-roadie-so-catalog-2026-01-21-12_04_04.png" alt="screencapture-demo2-roadie-so-catalog-2026-01-21-12 04 04"></p><p><em>Catalog pagination becomes an absolute necessity as the size of the catalog increases</em></p><h3>Background job isolation</h3><p>In open source Backstage, long running tasks often share a process with the user facing API. TechDocs builds, scaffolder jobs, scorecard computations, data ingestions and so on can all compete with login and catalog requests.</p><p>We pulled the heavy backend work out into separate worker containers. That improves stability and reduces resource contention, but it also means you now have to design and operate a small distributed system: one set of pods serving interactive traffic, another running asynchronous jobs, plus the plumbing to schedule, observe, and scale those workers safely.</p><p>Effort wise, a reasonable baseline is one to two days for the Backstage specific configuration and app construction, plus at least a few more days for the infrastructure work. All told, expect a small team to spend a week or two on this.</p><h3>Deep stability work</h3><p>On the way to running multi tenant Backstage, we have spent a lot of time on the unglamorous work: memory leaks, readiness probes that misreport their status, and so on. 
Examples include fixing global arrays that never stopped growing, ensuring in memory caches actually drop expired items, and tuning Kubernetes probes so that pods are only marked ready when they really are.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/71Re1Ihkuz92uauDpn2l9m/99424026b1e471a28def23359f4dc185/image-tenant-performance-sentry.png" alt="image-tenant-performance-sentry"></p><p><em>Tenant performance is meticulously logged and continually optimized</em></p><p>These are the bugs that only show up after weeks of production traffic. If you self host, you need to plan for that kind of ongoing investigation with profiling, heap dumps and careful rollouts.</p><p>Factor in at least a couple of engineering weeks up front to ensure all your infrastructure is properly profiled and optimized, plus several hours per month of monitoring to keep everything running smoothly.</p><h3>Scaffolder scalability: popularity changes the problem</h3><p>The Backstage Scaffolder is easy to run when usage is low, but once it becomes the default way engineers create services and automate workflows, it turns into a production workflow engine you need to manage. Scaling isn’t just “add more pods” - it’s making sure you can absorb bursts without the rest of Backstage slowing down, separating template execution from other traffic, and having enough visibility to support it day to day: knowing what’s running, what’s queued, what’s stuck, what’s failing, and why.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2O0yi5PZmlezsGVgJxZnM9/2b017b1f24ab346edd4439ccb218a31d/image-scaffolder.png" alt="image-scaffolder"></p><p><em>The Scaffolder as a production workflow engine</em></p><h3>Preloading, caching and endless optimizations, frontend and back</h3><p>On the frontend we made numerous changes to how Backstage handles content. 
This includes optimizing API calls, ruthlessly culling what is loaded, and optimizing how we serve static content such as TechDocs.</p><p>In the backend we added visibility and controls around the job scheduler, because once your tooling grows and you’ve got multiple ingestion refreshes, scorecard jobs and TechDocs builds, it is easy to end up with invisible backlogs that impact performance.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7ouHIKuDByABAH34bouV0X/1f15973c9d77141b9b497a5d1b86e053/image-scheduler.png" alt="image-scheduler"><em>Roadie’s scheduler view showing background jobs across the instance</em></p><p>None of these changes is individually complex, but together they make the difference between “Backstage mostly works” and “Backstage feels fast even on a Monday morning with thousands of users.”</p><p>It’s hard to estimate this effort reliably, but even at the lower end, for a smaller catalog, expect to sink multiple months of engineering effort into optimizing Backstage. For us, with multiple enterprise customers, that investment makes sense. For an individual team, the cost-benefit case is much harder to make.</p><h2>2. Catalog customization and data modeling</h2><p>Backstage’s catalog is designed to be endlessly flexible and extensible, but in practice organizations quickly run into the limits of pure YAML in git.</p><p>Over time we have had to turn the catalog into something that behaves more like a product, both in how it’s populated and in how it’s used and consumed by users and services.</p><h3>Custom columns, tabs and views</h3><p>Platform teams want to surface critical metadata directly in the catalog table: criticality, lifecycle, tier, compliance score, owner health, and so on. 
We built configurable catalog columns that can display arbitrary metadata, numeric scales, links and even scorecard results, along with the ability for users to save filtered views as tabs.</p><p>Replicating this means building a more dynamic catalog UI and a way to define, persist and share those views. The alternative is endless “export to CSV and slice in a spreadsheet” workflows, which ultimately defeat the purpose of an IDP.</p><p>This was one of the most useful features we’ve built for Roadie, and there are hundreds of hours of engineering effort behind the work.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5zzOyACsWyx8HggVSU42V8/57a55a69ea00e95b6ffbf6b0760bae1b/image-custom-tabs.png" alt="image-custom-tabs"><em>Custom tabs in action - a must have for the more mature catalog</em></p><h3>Decorators and “glue of truth”</h3><p>Real organizations almost never have a single source of truth. Teams want to combine data from git, SSO, HR systems, cloud providers and ad hoc spreadsheets.</p><p>We built the Decorator and the Fragments API, which allow extra metadata to be stored in Roadie’s database and merged into entities at runtime, without changing the YAML. That is the foundation for things like business ownership, cost centre tags, custom maturity ratings, and so on. The Decorator works in the UI, while the Fragments API achieves the same thing programmatically.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/XFD5Pj3LJ5AIpuNnSOSOJ/206bf5277bfe26fc3741afa985c75a1b/image-decoraters.png" alt="image-decoraters"><em>Skip the YAML - decorate entities with metadata from within the Backstage UI</em></p><p>If you self host, you will need your own answer for how to enrich entities from multiple systems without forcing every change through a pull request. 
We’ve learned from experience that trying to batch annotate entities via organization-wide PRs is unlikely to succeed, which is why we empower platform teams to decorate entities at the IDP level. Factor in at least two months of engineering investment to make the data model changes and build the logic and UI elements.</p><h3>Repositories, products and custom kinds</h3><p>Very quickly people want to catalogue more than just “services.” They want to represent repositories, shared libraries, products, data models, infrastructure resources and more.</p><p>We have added new kinds such as Repository and Product, plus guidance and tooling for extending the entity model safely. Doing this yourself means working with Backstage’s schema system, updating layouts and cards, and then thinking about how all of this will be queried and shown in search.</p><p>These are some of the more foundational and impactful changes and, in the Backstage scheme of things, are on the simpler side. Factor in 40-60 hours for the changes to the data model schema and building the custom processor.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1rxY4C9Hl4F7YRaftgm2Iv/3397fde63994c77e2d333f064af10575/image-catalog-graph.png" alt="image-catalog-graph"></p><p><em>Richer kinds unlock richer relationships: the catalog graph is one way to visualize how services, repos, and products connect</em></p><h3>Instant updates and completeness tracking</h3><p>The catalog is a living thing, and it’s important that it stays fresh. As usage grows, you will want more control over how your catalog is refreshed, in an idempotent way, from your underlying systems. We use webhooks and APIs from SCM systems to trigger catalog refreshes in near real time, and we track catalog completeness using scorecards that measure ownership, labels and other metadata.</p><p>A self hosted team will need some equivalent in order to trust the catalog as a real time view of its software. 
Factor in a month or more to get this done.</p><h2>3. Search: how people can actually find things</h2><p>Search is one of the most visible parts of a developer portal. If it feels off, people give up quickly. Open source Backstage has gradually improved search, but we found that large organizations quickly needed more than the out-of-the-box search experience.</p><h3>A real search engine</h3><p>We moved to OpenSearch for search, with analyzers that handle mixed case names, hyphenation, and partial matches. For example, engineers can type part of a service name, or a fragment of an API route, and still find what they need. This is not the case for out-of-the-box self-hosted Backstage.</p><p>That work required running and maintaining a search cluster, setting up indexers, and designing relevance rules. It also meant a pass over search UX, so that results are presented in a way people can actually work with. This can take 2-3 months, depending on the level of customization required.</p><h3>AI search</h3><p>A recent addition, but a critical one - our MCP tooling is now available to a built-in AI assistant in the Roadie UI. This allows engineers to ask natural language questions about their catalog and have the answers displayed in Roadie. It is a step change from regular search, and a similar order of magnitude in terms of engineering effort. The real investment here was in the addition of MCP tooling, but factor in at least a couple of weeks here.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1mAd0XpJgO90SAw4Ckso3s/b9408eb993a4e6b3f9858b6ec7fe2c18/image-ai-search.png" alt="image-ai-search"></p><p><em>AI search uses the same MCP tooling to expose a richer, conversational search experience</em></p><h2>4. 
Plugins and integrations: owning the long tail</h2><p>Backstage’s plugin model is its greatest strength and one of its sharper edges.</p><p>Most large organisations end up using a long list of plugins: Argo CD, AWS, Azure DevOps, GitLab, Jira, Datadog, Sentry, SonarQube, security scanners, incident tools and more. Each of those brings its own authentication quirks, rate limits, API changes and version compatibility issues.</p><p>Over the last five years we have:</p><ul><li>Maintained and updated dozens of plugins when Backstage core moved forward or vendor APIs changed</li><li>Built new plugins such as AWS resource ingestion, Wiz security integration, LaunchDarkly enhancements, Shortcut integration and more</li><li>Created a secure connectivity pattern with Snyk using an open source broker to reach on prem systems without opening inbound access</li></ul><p>If you choose to self host, the integrations you rely on today will keep evolving. The work is less about “install plugin X once” and more about “own a small product surface for each plugin indefinitely.”</p><p>That is not a reason to avoid self hosting, but it is a cost you should be explicit about.</p><h2>5. Tech Insights and scorecards: turning Backstage into a governance tool</h2><p>Most organizations adopting Backstage eventually want more than a catalog. They want to use it to drive standards: security adoption, SLO coverage, migration progress, documentation quality, and so on.</p><p>Open source Tech Insights gives you some primitives, but it expects you to write a lot of the logic and UI yourself. We turned that into a full product, and it was a six-month engineering lift for a fairly substantial engineering team. Factor in at least the same for your own effort to turn Tech Insights into a fully featured product. Here’s what we’ve built:</p><h3>A no code scorecard builder</h3><p>Platform teams can define checks and scorecards in a UI. 
Under the hood, data is pulled from SCMs, CI, security tools and other sources, stored as facts, and evaluated regularly. We ship a large library of built in checks so that people do not start from a blank page.</p><h3>Aggregation, history and reporting</h3><p>Results are rolled up by team and group, graphed over time, and shown either on a dedicated Scorecards section, and/or directly on entity pages and in the catalog. That lets you answer questions like “which team is lagging on SAST adoption” or “how has our documentation coverage changed over the last quarter.”</p><p>To replicate this yourself you will need to implement three things: a data ingestion layer, a rule engine, and a UI that surfaces all of it in Backstage. It is very doable, but it is also a multi month engineering effort.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6wOH9SGZ9JiICqnEGQjSwu/589b5d3545bc94ca4ddee94395920325/image-tech-insights.png" alt="image-tech-insights"><em>Tech Insights scorecards in action, turning standards into automated checks across your entire software catalog</em></p><h2>6. RBAC, security and governance: controlling who can do what</h2><p>Once you have real adoption, permissions move from “nice to have” to “non negotiable.”</p><p>Backstage’s permission framework is flexible, but it does not ship with a full RBAC system out of the box. Building a full RBAC product (roles, admin UI, policy engine, tokens) is typically a six month effort for a small two-person team. 
We had to build:</p><h3>Fine grained roles and permission policies</h3><p>We support custom roles, mapping from identity provider groups to roles, and a policy engine that can express rules like “owners of a service can edit it, others can only view” or “only this group can run these templates.” That is exposed through an admin UI rather than code.</p><h3>API tokens and service accounts</h3><p>To allow automation and external tools such as MCP clients, we added API token support tied into the same permission system. That means designing token issuance and revocation flows, and making sure tokens are not a back door around your RBAC rules.</p><h2>7. TechDocs: documentation without the pain</h2><p>Docs like code is one of Backstage’s most attractive features, but it can be surprisingly hard to operate in anger.</p><p>Problems tend to show up over time: builds that are slow or flaky, large monorepos that generate huge docs sites, Markdown features that people expect but are not configured, and so on.</p><p>We addressed this with:</p><h3>Hybrid build modes</h3><p>Teams can choose between on demand builds or CI based publishing to a shared bucket. For large or frequently accessed docs, CI publishing gives much better performance.</p><h3>Autodiscovery and webhooks</h3><p>When a docs folder changes in git, webhooks trigger a rebuild in the background so that by the time someone opens the page, the new version is already there.</p><h3>A curated MkDocs environment</h3><p>We ship a standard set of MkDocs plugins and extensions for diagrams, tabs, admonitions and monorepos. 
That saves teams from having to build their own MkDocs image and solve the compatibility problems that come with it.</p><p>None of this is conceptually complex, but if you skip it you can easily end up with TechDocs that are slow, unreliable, or underused.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4bUaZZqQ5sIUM41dVuXDCG/63864610c73b61b76e08c80b5ccfb095/image-techdocs.png" alt="image-techdocs"></p><p><em>TechDocs pages rendered in Backstage, powered by MkDocs and a build pipeline running behind the scenes</em></p><h2>8. Developer experience improvements</h2><p>Developers judge a portal by how it feels in day to day use.</p><p>A lot of our work has been on the small things: catalog table layout, sticky filters, configurable columns, a useful homepage, sensible defaults for layouts, certified templates, better error messages, and an admin area that is navigable when you have dozens of integrations.</p><p>On top of that we are experimenting with new ways to let teams extend the UI quickly, such as MDX based homepage cards that can fetch and present data without building a full plugin. The direction here is to give engineers “power tools” so they can shape the portal around their workflows.</p><p>If you self host, this is the kind of work that rarely makes it onto a roadmap, but that strongly influences whether people actually enjoy using Backstage.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/53L2pRJGvToqiFgNBeHEcY/d202560bae4495903bb6c0781918a14a/image-homepagecards.png" alt="image-homepagecards"></p><p><em>Homepage cards - a quick and easy way to expose information from internal and external APIs in Roadie</em></p><h2>9. AI and MCP: the next layer</h2><p>Finally, there is AI.</p><p>We touched on this under search - having an AI assistant that can answer questions about your documentation and catalog, and we have implemented Model Context Protocol servers so that external AI tools can safely talk to Roadie’s APIs. 
That allows agents in editors to discover templates, understand APIs and query scorecards, all through Backstage as a system of record.</p><p>You do not need AI on day one of a self hosted journey. It is, however, part of where the ecosystem is going. If you want Backstage to be a first class participant in your AI tooling, you will likely need to invest in similar capabilities: embeddings, vector storage, permission aware retrieval, and APIs tailored for LLM clients.</p><h2>10. Saving the best for last - upgrades</h2><p>If there is one topic that consistently surprises teams adopting Backstage, it is the upgrade burden.</p><p>Standing up Backstage for the first time is the easy part. Keeping it running, stable, secure, and compatible with the fast-moving upstream project is where the long-term work lives. Backstage releases frequently. Plugins evolve independently. Breaking changes land often and sometimes without much warning. And because Backstage is a framework rather than a product, the blast radius of every change is potentially wide.</p><h3><strong>Backstage moves fast</strong></h3><p>Backstage core typically publishes new releases every few weeks. These releases regularly include:</p><ul><li>Deprecations or removals of catalog processors or entity fields</li><li>Changes to authentication flows and permission boundaries</li><li>Scaffolder API changes</li><li>Search backend rewrites</li><li>Plugin architecture changes (backend plugin framework migration, for example)</li></ul><p>If you fall behind, the upgrade path compounds. A one-version bump is manageable. A six-month gap can become a multi-week project.</p><h3><strong>Plugins evolve at their own cadence</strong></h3><p>Each plugin (Datadog, Argo, GitHub, Jira, TechDocs, API Docs, AWS, Azure, Wiz, LaunchDarkly, and dozens more) sits on top of Backstage but is not coordinated with it. 
They evolve independently, and breaking changes in one can cascade across your installation.</p><p>A self-hosted team effectively ends up with a plugin garden in which each plugin has its own update cycle, bugs, API quirks, and upstream issues.</p><h3><strong>The systems we had to build to keep this sustainable</strong></h3><p>After years of running Backstage in production, we realized that upgrades needed as much engineering investment as performance or catalog work. We ended up building test suites that run Roadie’s full plugin surface against new Backstage versions before anything reaches customers. This catches breakages early and reliably.</p><p>We also never upgrade every tenant or environment at once. Rollouts happen gradually, with internal test tenants and rollback paths if something unexpected happens.</p><p>When something does break (and it often does), we have documented processes for diagnosing plugin failures, dependency mismatches, deprecations, and migration regressions, and clear steps for rolling forward or back safely. There’s also the experience that comes from having done this for years across multiple versions - hard won experience that allows our team to quickly triage and resolve highly specific Backstage issues.</p><p>These systems collectively represent hundreds of hours of engineering time. They exist because without them, upgrades would regularly break real customer portals.</p><h3><strong>The hidden cost: this work never ends</strong></h3><p>This is the part people most often underestimate. Upgrades are not a one-time project. They are ongoing rent. Every new Backstage release, every new plugin version, every upstream API deprecation requires attention.</p><p>Self-hosting teams often start enthusiastically, then gradually slow down as the backlog of breaking changes grows. 
Eventually they end up frozen on an old version, unable to upgrade without major intervention.</p><h3><strong>Why this matters for your roadmap</strong></h3><p>If you plan to self-host Backstage, you need to view upgrades as a primary, recurring body of work, not a background task. The cost lives in:</p><ul><li>Testing every combination of plugins and Backstage core on every upgrade</li><li>Managing breaking changes across dozens of moving parts</li><li>Keeping your team aware of upstream changes and migration guides</li><li>Avoiding drift so upgrades remain tractable</li><li>Making sure your internal plugins and catalog processors don’t fall behind</li><li>Ensuring that upgrades don’t take down developer workflows</li></ul><p>This is not a warning; it’s a reality. Many teams can absolutely do this. But you should plan for it with eyes open. If there is one lesson from the <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">State of Backstage</a> report, it is this: upgrades are the number one pain point for self-hosters. Not performance. Not TechDocs. Not plugins. Upgrades.</p><h2>Bringing it together: a realistic roadmap</h2><p>Taken together, the list above is long. That is the point.</p><p>Self hosting Backstage is not simply about standing up a Node process and pointing it at your git repos. 
It is about:</p><ul><li>Operating a complex web application with many background jobs and external dependencies</li><li>Owning a growing set of plugins and integrations and keeping them healthy</li><li>Deciding how you will model your organisation and software, and then enforcing and evolving that model (we provide extensive engagement and support here through a structured onboarding and ongoing Customer Success motion)</li><li>Providing governance through scorecards and permissions, not slide decks and spreadsheets</li><li>Smoothing the user experience enough that engineers actually want to use the portal</li><li>Keeping up with upstream Backstage changes and broader trends such as AI</li><li>Upgrades. Friction around keeping your Backstage instance up-to-date with the upstream upgrades is a major, major source of pain for self-hosters.</li></ul><p>None of this is impossible. Many of our customers could build a lot of it themselves, or did, before migrating to Roadie. Hundreds or thousands of organizations are successfully hosting their own Backstage implementations.</p><p>The question is not “can you,” but “which pieces do you want to own, and how much time do you want to commit.” Every engineering hour spent on an upgrade, or an enhancement already offered by Roadie is an hour not spent on improving developer experience, or custom plugins that unlock real value for your internal users. It’s a question of opportunity cost.</p><p>Roadie is far more than a hosted wrapper for Backstage - we offer production-grade infrastructure and enhancements that address fundamental gaps in security, governance, performance and developer experience that organizations that want to self-host would inevitably need to replicate.</p><p>If you decide to continue your journey on self hosted Backstage, we hope this roadmap helps you plan that work with eyes open. 
And if you prefer to have someone else carry this complexity for you, that is the role Roadie continues to play: a production grade distribution of Backstage, shaped by years of learning across many organizations. Feel free to <a href="https://roadie.io/request-demo/">request a demo</a> anytime.</p>
]]></content:encoded></item><item><title><![CDATA[The Three Big Problems Every Platform Engineering Team Must Solve]]></title><link>https://roadie.io/blog/the-three-big-problems-every-platform-engineering-team-must-solve/</link><guid isPermaLink="false">https://roadie.io/blog/the-three-big-problems-every-platform-engineering-team-must-solve/</guid><pubDate>Tue, 13 Jan 2026 11:30:00 GMT</pubDate><description><![CDATA[Platform engineering teams hit the same breaking points as they scale. Learn the three core problems every team must solve — discoverability, self-service, and governance — and why internal developer portals exist to address them.]]></description><content:encoded><![CDATA[<p>I learned about platform engineering the hard way. At Workday, I was an infrastructure product manager building what was essentially a private AWS inside the company. We had virtualization platforms, logging systems, monitoring systems, the works.</p><p>Everything worked fine when we had 10 or 20 services. Then we hit 50. That's when people started asking questions we couldn't answer: What is all this stuff? Who owns which service? If something breaks at 2 AM, who do we call? How do we know everything is secure?</p><p>We tried a spreadsheet. It lasted about three months before it became hopelessly out of date. So we built a UI from scratch to list services, show where they were running, and let people create new services without going through us for everything.</p><p>That was 10 years ago. Today, this problem has a name, <a href="https://roadie.io/backstage-spotify/">internal developer portal</a>, and there's an entire market built around solving it. But after talking to hundreds of platform teams, I've found they all struggle with the same three fundamental problems.</p><h2>Problem 1: The Discoverability Crisis</h2><p>The discoverability problem shows up when your organization crosses a threshold. At 10 engineers, everyone knows what everyone else is working on. 
At 100+ engineers, it's chaos.</p><p>Here's what this looks like: Your security team builds a virus scanning service. Your mobile developers need to scan files for viruses, but they don't know this service exists. In an enterprise, they can't just sign up for some random SaaS tool because of compliance requirements. So they open a Slack thread, ask around, wait for responses, schedule a meeting, and maybe get an answer in a week.</p><p>Meanwhile, your platform team is getting bombarded: "Do we have an API for X?" "Who owns service Y?" "Which services depend on this database?"</p><p>The discoverability problem has three common variations:</p><p><strong>Documentation sprawl</strong>: Teams scatter docs across Confluence, GitHub wikis, Google Docs, and Notion. No one can find anything. You need to centralize documentation in a <a href="https://roadie.io/product/catalog/">software catalog</a> where people can actually discover it.</p><p><strong>Dependency mapping</strong>: You need to understand which services depend on each other. When you're planning to upgrade a database, you need to know what breaks. Without a dependency graph, you're operating blind.</p><p><strong>API discovery</strong>: Different teams build APIs, but there's no central place to see what exists, what they do, or how to use them. You need a searchable catalog of API specs.</p><p>This problem scales exponentially. We don't work with companies under 50 engineers because they don't face this yet. Most of our customers have 100+ engineers. At that scale, you can't just shout across the office anymore.</p><h2>Problem 2: The Self-Service Bottleneck</h2><p>Your platform team becomes a bottleneck. Developers want to get things done without opening a JIRA ticket and waiting days for someone to configure their network rules or provision an S3 bucket.</p><p>At most organizations, you're waiting days for basic requests. 
This creates two problems:</p><p>First, developers get frustrated and stop being productive. They're blocked on simple tasks that should take five minutes.</p><p>Second, they bypass your platform entirely. This is shadow IT. A mobile developer who doesn't know Terraform just goes into the AWS console and clicks "Create S3 Bucket" because they can't wait. Now you have untracked infrastructure that's not in your Terraform state, doesn't follow your naming conventions, and might not meet your security requirements.</p><p>The self-service problem shows up in two main ways:</p><p><strong>Project creation</strong>: A developer wants to start a new service on your platform. They need a repo, CI/CD pipeline, monitoring, logging, and all your platform integrations. If they configure this manually, they'll get it wrong. You want to give them a template that sets everything up correctly in five minutes, a <a href="https://roadie.io/product/scaffolder/">"golden path"</a>.</p><p><strong>Infrastructure requests</strong>: A developer needs an S3 bucket, a database, or network rules configured. They don't know Terraform and shouldn't have to learn it. You want to let them fill out a form that opens a templated pull request against your Terraform repository. Someone reviews it, merges it, and the infrastructure gets created. The request is tracked, the developer doesn't need specialized knowledge, and you maintain control.</p><p>Self-service doesn't mean "let developers do whatever they want." It means giving them fast, easy ways to do things the right way. You're creating guardrails, not removing them.</p><h2>Problem 3: The Governance Gap</h2><p>You have hundreds of services running on your platform. You need to know they're secure, reliable, and following your best practices. But you have no single place to check.</p><p>Here's a concrete example: You use <a href="https://incident.io/">Incident.io</a> for incident management. 
Every service should be registered in Incident.io with an on-call person assigned. How do you verify this? You could manually check each service, but that's impossible at scale. You could send a Slack message asking people to audit their services, but half won't respond.</p><p>The governance problem shows up in several ways:</p><p><strong>Security compliance</strong>: You need to verify that all services are scanning dependencies for vulnerabilities, using approved authentication methods, and following your security policies. Without automated checks, you're relying on self-reporting.</p><p><strong>Reliability standards</strong>: You need to know which services have proper monitoring, alerting, and on-call rotations. When an outage happens, you need to immediately know who to call.</p><p><strong>Best practices enforcement</strong>: Your platform team has defined standards: code review requirements, test coverage thresholds, documentation expectations. You need to see which teams are falling behind so you can work with them to improve.</p><p>The governance problem is about visibility rolled up across your org chart. You want to ask: "Show me the director of engineering who has the most services failing our security checks." Then you can work with that director to improve things.</p><p>Some teams also want <a href="https://dora.dev/">DORA metrics</a> (deployment frequency, lead time, change failure rate, mean time to recovery) visible in one place. This is harder than it sounds. One of the DORA metrics is deployment frequency: the more often you deploy, the smaller your changes are and the less likely they are to cause issues. But what counts as a "deployment" when different teams use Argo CD, Netlify, and five other deployment tools? Some normalization has to happen before you can just look at a DORA metric. Teams aren't necessarily there yet, and they're expecting a bit of magic that isn't that simple.</p><p>The same applies to defining what counts as a "service." 
That's a hard question to answer, and it's one of the most important challenges when trying to get any developer portal working in your organization.</p><h2>Why This is Hard to Solve</h2><p>These three problems (discoverability, self-service, and governance) all stem from the same root cause: you have a lot of software, and you need organized metadata about it.</p><p>At Workday, we spent a lot of money building a custom solution. When I left in 2020, I talked to other companies and found they'd all built similar things.</p><p>The problem is that building this from scratch takes a year and requires a dedicated team to maintain. You're essentially building a product inside your company.</p><p>In 2020, <a href="https://backstage.io/">Spotify open sourced Backstage</a>, which gave the world a framework for building developer portals. But Backstage isn't a ready-to-use portal; it's a set of TypeScript libraries you use to build your own portal. This creates new problems:</p><p><strong>Language barrier</strong>: Platform teams typically work in Go, Python, or YAML. Many of them don't know TypeScript, which is primarily a web development language.</p><p><strong>Build time</strong>: Because you're building from libraries, not deploying a container, it takes six months to a year to get Backstage into production.</p><p><strong>Team requirements</strong>: We surveyed the Backstage community and found that teams who report being happy with self-hosted Backstage have at least three dedicated engineers. Large deployments have 12+ engineers working on Backstage full-time.</p><p><strong>Missing features</strong>: Backstage doesn't include basic features like <a href="https://roadie.io/product/access-control/">role-based access control</a> out of the box. The search runs on PostgreSQL full-text search, which is okay but not as good as Elasticsearch. 
Your search won't be great unless you manage an Elasticsearch cluster as well as your Backstage instance.</p><p>Getting Backstage into production can take a year and a team of five people. That's why the internal developer portal market exists.</p><h2>How Companies Choose Solutions</h2><p>When people come to us, they're typically in one of three situations:</p><p><strong>They already have Backstage</strong>: Someone stood it up at some point and people are using it. They're realizing it's a lot of effort and they don't want to staff that team of five people.</p><p><strong>They want Backstage specifically</strong>: They want a developer portal, like the idea of Backstage, and don't want to be locked into a proprietary data model. They want to customize their solution because they have legacy tools they need to integrate. But they don't want to staff the team around it.</p><p><strong>They just want a developer portal</strong>: They don't care if it's Backstage or not. In this case, it's more competitive between proprietary solutions and Backstage-based options.</p><p>The first two groups are doing a <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">build versus buy evaluation</a>: how much will it cost us to build and maintain this versus how much will it cost to buy a managed solution?</p><p>For the third group, it's more of a feature comparison and proof-of-concept scoring across vendors.</p><h2>What You Actually Need</h2><p>Solving these three problems requires a few key capabilities:</p><p><strong>A software catalog</strong>: Your source of truth for what software exists, who owns it, and how it's configured. The catalog needs to integrate with your existing tools (your repos, your CI/CD, your cloud providers) so it stays up to date automatically.</p><p><strong>Self-service actions</strong>: Your developers need a UI for common tasks that generates the right pull requests, kicks off the right workflows, and follows your standards. 
This keeps them moving fast without bypassing your platform.</p><p><strong>Automated scoring</strong>: You need <a href="https://roadie.io/product/tech-insights/">automated checks</a> that run against everything in your catalog and tell you what's not meeting your standards. This gives you the visibility to work with teams on improvements.</p><p><strong>Easy onboarding</strong>: Getting services into the catalog can't require each team to manually register everything. You need <a href="https://roadie.io/docs/getting-started/autodiscovery/">automated ingestion</a> that pulls metadata from your existing systems. This is table stakes, but it's an area where Backstage is weak compared to proprietary competitors.</p><p>The challenge is that every organization has a slightly different definition of what counts as a "service" or a "deployment" or a "team." You need a solution that's flexible enough to adapt to your organization while being opinionated enough to actually work.</p><h2>The Path Forward</h2><p>These three problems, discoverability, self-service, and governance, get worse as your engineering organization grows. If you're at 100 engineers now, imagine what happens at 200 or 500. The chaos compounds.</p><p>You're not the first platform team to face these problems. Every company with more than 50 engineers hits them eventually. Software catalogs, self-service automation, and governance scoring aren't experimental anymore.</p><p>The decision you need to make is whether to <a href="https://roadie.io/backstage-comparison/">build or buy</a>. Building gives you complete control but requires significant investment. Buying gets you there faster but means accepting someone else's opinions about how things should work.</p><p>Whatever path you choose, the problems won't solve themselves. 
Your platform team is already overwhelmed with questions, your developers are already frustrated with bottlenecks, and your managers can't answer basic questions about what's running in production.</p><p>These problems only get worse with time. The earlier you address them, the easier they are to solve.</p><h2>Next Steps</h2><p>If you're ready to address these platform engineering challenges, here are some practical next steps to consider:</p><p><strong>Evaluate Backstage for your organization</strong>: Learn more about <a href="https://roadie.io/backstage-spotify/">what Backstage is and how it works</a> to understand if it's the right foundation for your developer portal needs.</p><p><strong>See a developer portal in action</strong>: <a href="https://roadie.io/request-demo/">Request a demo of Roadie</a> to see how a managed Backstage solution can solve your discoverability, self-service, and governance problems without the overhead of building and maintaining it yourself.</p><p><strong>Calculate your total cost of ownership</strong>: Use our guide on <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">how much Backstage really costs</a> to compare the build versus buy decision for your specific situation.</p><p><strong>Learn from teams who've solved these problems</strong>: Read our <a href="https://roadie.io/case-studies/">case studies</a> to see how companies like Contentful, Celonis, and others tackled similar platform engineering challenges.</p><p><strong>Start with a free trial</strong>: If you're ready to experiment, <a href="https://roadie.io/free-trial/">try Roadie free</a> to get hands-on experience with a fully managed developer portal built on Backstage.</p>
]]></content:encoded></item><item><title><![CDATA[Platform Engineering in 2026: Why DIY Is Dead]]></title><link>https://roadie.io/blog/platform-engineering-in-2026-why-diy-is-dead/</link><guid isPermaLink="false">https://roadie.io/blog/platform-engineering-in-2026-why-diy-is-dead/</guid><pubDate>Tue, 23 Dec 2025 09:00:00 GMT</pubDate><description><![CDATA[Platform engineering is entering its maturity phase. As IDPs and best practices standardize, building your own platform interface is becoming a costly distraction. We explore why DIY platform engineering is dead—and what winning teams do instead.]]></description><content:encoded><![CDATA[<p>The software industry loves a good pendulum swing. For decades, we watched as centralized IT teams controlled every aspect of infrastructure, then witnessed the dramatic DevOps revolution that pushed that responsibility directly onto developers. Now, as organizations grapple with the consequences of both extremes, Platform Engineering is being touted as the discipline that finally gets the balance right.</p><p><a href="https://www.gartner.com/en/infrastructure-and-it-operations-leaders/topics/platform-engineering">Gartner forecasts</a> that by 2026, 80% of large software engineering organizations will establish platform teams as internal providers of reusable services, components, and tools for application delivery, up from 45% in 2022. The <a href="https://www.cncf.io/blog/2024/11/15/internal-developer-platforms-at-scale-with-the-certified-backstage-associate-cba-certification/">CNCF Backstage project</a> now boasts <a href="https://thenewstack.io/five-years-in-backstage-is-just-getting-started/">over 3,400 adopters</a> worldwide. 
What started as a bunch of internal teams hacking together their own tools has turned into a mature discipline with established best practices, dedicated conferences, and a rapidly consolidating technology landscape.</p><p>For VPs of Engineering and Platform Engineering leads who have moved beyond the "why do we need this" phase, the question now is: <strong>how do we mature this practice without getting stuck in endless portal maintenance?</strong></p><h2>The Evolution of Platform Engineering</h2><p>To understand where we are going, we must briefly understand how we got here. We have moved from the "Ticket Queue" era (where centralized IT provided stability but strangled velocity) to the "DevOps Revolution," where developers gained speed but drowned in infrastructure complexity.</p><p>Platform Engineering represents the synthesis of these two extremes. It centralizes complexity without removing autonomy. Platform teams build and maintain the underlying infrastructure, but they expose it through self-service interfaces. This allows developers to move quickly without needing to master every implementation detail, effectively treating the platform as a product with developers as customers.</p><h2>From Infrastructure to Interfaces</h2><p>Early platform engineering efforts focused heavily on infrastructure primitives. Teams invested enormous energy in standardizing <a href="https://roadie.io/docs/integrations/kubernetes/">Kubernetes deployments</a>, building CI/CD pipelines, and creating Infrastructure as Code templates. These were necessary foundations, but they were not sufficient for driving developer adoption.</p><p>The modern Platform Engineering conversation has shifted decisively toward Developer Experience. The infrastructure layer still matters, but what sets teams apart is how they present that infrastructure to developers. 
That's why Internal Developer Portals (IDPs) have become popular.</p><h2>The Rise of the IDP</h2><p>An IDP is the storefront of your platform. It gives developers a unified interface to discover services, access documentation, spin up new projects from templates, and view their deployments. Without this interface layer, even the most sophisticated platform remains opaque and underutilized.</p><p>While various tools exist, the market has overwhelmingly converged on a single standard. <a href="https://newsletter.getdx.com/p/backstage-and-the-developer-portal-market">Recent analysis</a> indicates Backstage holds approximately 89% market share among organizations that have adopted an IDP. Originally developed by Spotify and now a CNCF project, it has moved from early experimentation to essential infrastructure.</p><p>This dominance is reflected in the project's momentum. Backstage counts <a href="https://www.cncf.io/blog/2024/11/15/internal-developer-platforms-at-scale-with-the-certified-backstage-associate-cba-certification/">over 270 public adopters</a>, including global brands like LinkedIn, CVS Health, and Vodafone. It was also the <a href="https://backstage.io/blog/2024/12/18/backstage-wrapped-2024/">top CNCF project by end-user commits</a> and the fourth most contributed-to CNCF project in 2024, trailing only Kubernetes, OpenTelemetry, and Argo.</p><h2>Platform as a Product: From Philosophy to Practice</h2><p>The shift to "Platform as a Product" marks a critical maturity point in Platform Engineering. Instead of mandating tools, modern platform teams treat developers as customers, using a competitive dynamic to force a relentless focus on value delivery.</p><p>This approach manifests in two concrete practices:</p><h3>Golden Paths</h3><p>Rather than offering infinite flexibility, platform teams curate Golden Paths: opinionated, well-supported pathways for common tasks. 
These paths come with excellent documentation, proven templates, and integrated tooling. Developers can deviate when necessary, but the "Golden Path" represents the path of least resistance and highest support.</p><h3>Measuring Success</h3><p>Success measurement has fundamentally changed. Uptime is no longer the only metric that matters. Leading teams now track impact using:</p><ul><li><strong>Adoption rates:</strong> Are developers voluntarily choosing the platform?</li><li><strong>Time-to-hello-world:</strong> How fast can a new engineer deploy code?</li><li><strong>DORA metrics:</strong> Tracking deployment frequency and lead time for changes.</li><li><strong>Satisfaction scores:</strong> Using frameworks like SPACE to measure developer sentiment.</li></ul><p>For example, <a href="https://backstage.io/docs/overview/adopting/">Spotify reported</a> that their time-to-tenth-pull-request metric for new developers dropped by 55% after deploying Backstage.</p><h2>The Tooling Landscape: Consolidation and Standardization</h2><p>The Platform Engineering technology stack has matured into defined categories. The infrastructure stack is well-defined: cloud providers (AWS, GCP, Azure) at the bottom, Kubernetes for orchestration, and Terraform for Infrastructure as Code. Increasingly, Backstage or a commercial derivative serves as the interface layer on top. <a href="https://www.gartner.com/en/documents/6586902">Gartner's Hype Cycle for Platform Engineering</a> now tracks dozens of technologies across this stack, reflecting the maturity of the space.</p><p>This consolidation is creating a dilemma for the interface layer: Build vs. Buy. Organizations must choose between self-hosting Backstage, purchasing a commercial offering, or using a managed Backstage solution.</p><p>Self-hosting Backstage provides maximum flexibility but comes with significant costs. 
<a href="https://platformengineering.org/blog/platform-engineering-predictions-for-2025">Industry observers report</a> common pitfalls including:</p><ul><li><strong>Long time-to-value:</strong> Teams often spend 6-12 months on setup, with complex implementations extending to 18+ months.</li><li><strong>Maintenance burden:</strong> The plugin architecture requires continuous maintenance, and breaking changes in recent releases have created upgrade challenges.</li><li><strong>Low adoption:</strong> <a href="https://thenewstack.io/spotifys-backstage-roadmap-aims-to-speed-up-adoption/">Organizations outside of Spotify struggle with adoption</a>, with average internal rates hovering around 10%, often because teams burn out on maintenance before delivering features developers actually want.</li></ul><p>This is not a failing of Backstage itself, but rather a reflection of the reality that <a href="https://roadie.io/blog/from-day-0-to-day-2-a-guide-to-planning-and-implementing-backstage/">building and maintaining a production-quality developer portal</a> requires dedicated ongoing investment. Many organizations discover that maintaining the portal consumes so much of their platform team's capacity that they never get to building the unique platform capabilities that would actually differentiate their developer experience.</p><p>Commercial and managed offerings address this challenge. 
<a href="https://roadie.io/blog/backstage-alternatives/">Several alternatives exist</a> in the IDP market, each with different approaches and strengths:</p><ul><li>Solutions like <a href="https://roadie.io/">Roadie</a> provide Backstage as a service, eliminating the operational overhead while preserving the extensibility and ecosystem compatibility that make Backstage attractive.</li><li><a href="https://www.redhat.com/en/about/press-releases/red-hat-accelerates-internal-developer-portal-adoption-latest-version-red-hat-developer-hub">Red Hat Developer Hub</a> offers an enterprise-grade alternative for organizations already invested in the Red Hat ecosystem.</li><li>Platforms like Port, Cortex, and OpsLevel provide different architectural approaches to the IDP challenge, while Humanitec focuses on platform orchestration.</li></ul><p>These approaches allow platform teams to skip directly from concept to value delivery, focusing their energy on the platform logic and Golden Paths that are unique to their organization rather than on maintaining commodity infrastructure.</p><h2>The Future: AI and the Intelligent Platform</h2><p>The integration of AI into IDPs is the next frontier for Platform Engineering. <a href="https://www.gartner.com/en/newsroom/press-releases/2025-07-01-gartner-identifies-the-top-strategic-trends-in-software-engineering-for-2025-and-beyond">Gartner predicts</a> that by 2028, 90% of enterprise software engineers will use AI code assistants, up from less than 14% in early 2024. The role of developers is shifting from implementation to orchestration, focusing on problem-solving and system design. 
Early implementations are already showing promising results, and the trajectory suggests fundamental changes to how developers interact with their platforms.</p><h3>Model Context Protocol</h3><p>The <a href="https://roadie.io/blog/announcing-the-roadie-mcp/">Model Context Protocol (MCP)</a>, introduced by Anthropic and rapidly gaining adoption, provides a standardized approach for connecting AI systems with data sources and tools. Platform teams are beginning to expose their capabilities through MCP servers, enabling developers to interact with their platforms using natural language through AI assistants. Rather than navigating a web interface to provision a new service, a developer can simply describe what they need in conversation with an AI agent that understands the organization's platform capabilities.</p><h3>Agentic AI</h3><p><a href="https://roadie.io/blog/ai-is-showing-up-in-developer-portals/">Agentic AI</a> takes this further. Instead of passively waiting for developer requests, intelligent platform agents can proactively identify issues, suggest optimizations, and implement routine fixes autonomously. This shifts the platform team's role from building tools to defining the rules and organizational knowledge that enable these agents to operate safely.</p><p>These capabilities are not speculative. Organizations are already deploying AI-powered observability that automatically analyzes logs and flags anomalies before they impact production. <a href="https://roadie.io/product/scaffolder/">Scaffolding workflows</a> are becoming increasingly intelligent, generating not just boilerplate code but production-ready configurations tailored to the specific context. Documentation is being synthesized and queried through natural language interfaces rather than searched manually.</p><h3>The Prerequisites for AI-Powered Platforms</h3><p>The platform teams who will thrive in this environment are those who have already built strong foundational platforms. 
AI agents need reliable, well-documented APIs to interact with. They need accurate <a href="https://roadie.io/product/catalog/">software catalogs</a> to understand system relationships. They need established Golden Paths that encode organizational best practices. Organizations that have invested in mature Internal Developer Portals have the infrastructure in place to adopt AI capabilities rapidly.</p><h2>The End of DIY</h2><p>Platform Engineering is entering its "boring" phase, and this is exactly what success looks like. The fundamental patterns are established, and the technology choices are converging.</p><p>The teams achieving the best outcomes have recognized that maintaining commodity infrastructure is not a competitive advantage. They buy or use managed solutions for the interface layer, freeing their platform engineers to focus on the unique Golden Paths and integrations that actually differentiate their developer experience.</p><p>Building a developer portal is not the same as building a platform. The portal is the interface; the platform is the substance behind it. Organizations that conflate the two often find themselves with impressive storefronts and empty shelves. The path forward is to source the interface from a specialist and focus your energy on the substance.</p><h2>Next Steps</h2><p>If you're ready to move from concept to implementation, here are concrete next steps you can take:</p><p><strong>Evaluate Your Build vs. Buy Decision:</strong> Before committing significant engineering resources to building and maintaining your own Backstage instance, consider the <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">true cost of self-hosting</a>. 
Understanding the full scope of ongoing maintenance, upgrades, and support requirements will help you make an informed decision.</p><p><strong>Start with a Software Catalog:</strong> The foundation of any successful Internal Developer Portal is a comprehensive <a href="https://roadie.io/product/catalog/">software catalog</a>. Begin by modeling your existing services, APIs, and resources. Learn about <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">effective strategies for creating a complete catalog</a> to ensure discoverability across your organization.</p><p><strong>Implement Self-Service Templates:</strong> Reduce friction and standardize best practices by deploying <a href="https://roadie.io/product/scaffolder/">software templates</a> that encode your Golden Paths. This accelerates onboarding and ensures consistency across your engineering organization.</p><p><strong>Define Engineering Standards with Tech Insights:</strong> Move beyond anecdotal evidence and establish <a href="https://roadie.io/blog/how-to-define-engineering-standards/">measurable engineering standards</a> using data-driven scorecards. This helps you track compliance, identify gaps, and demonstrate continuous improvement.</p><p><strong>Plan Your Adoption Strategy:</strong> Technical implementation is only half the battle. 
Develop a comprehensive <a href="https://roadie.io/blog/the-adoption-journey-initiatives-and-strategies/">adoption strategy</a> that addresses organizational change management, stakeholder engagement, and measuring success metrics that matter to your leadership team.</p><p><strong>Explore Managed Backstage Options:</strong> If you want to skip the lengthy setup and ongoing maintenance burden, <a href="https://roadie.io/free-trial/">try Roadie</a> to get a production-ready Backstage instance deployed in minutes rather than months, allowing your team to focus on building unique platform capabilities instead of maintaining infrastructure.</p><p><strong>Stop building the tool and start building the platform. Get a production-ready Backstage instance today with <a href="https://roadie.io/request-demo/">Roadie</a>.</strong></p>
]]></content:encoded></item><item><title><![CDATA[7 Best Developer Portals for Enterprise Engineering Teams]]></title><link>https://roadie.io/blog/7-best-developer-portals-for-enterprise-engineering-teams/</link><guid isPermaLink="false">https://roadie.io/blog/7-best-developer-portals-for-enterprise-engineering-teams/</guid><pubDate>Tue, 16 Dec 2025 14:00:00 GMT</pubDate><description><![CDATA[Choosing a developer portal is a multi-year, multi-million dollar decision. This guide compares the 7 best enterprise IDPs, breaking down cost, maintenance, lock-in, and scalability.]]></description><content:encoded><![CDATA[<p>Your platform team built an internal developer portal. Or maybe you bought one. Either way, you spent $2 million and a year of engineering time. Sound familiar?</p><p>Here's the uncomfortable truth about <a href="https://roadie.io/blog/developer-portals-are-a-superpower/">internal developer portals</a> in 2025: the market has fundamentally matured, and the old "build versus buy" calculus no longer applies. Companies that chose self-hosted Backstage discovered that "free and open-source" means 3-12 dedicated engineers. Organizations that bought proprietary platforms found themselves locked into data models they can't migrate away from. 
And those who picked the wrong solution are now facing a painful rip-and-replace.</p><p>The IDP landscape has evolved into three distinct approaches, each with radically different total cost of ownership:</p><ul><li><strong>Build</strong>: Self-hosted Backstage installations offering maximum flexibility at the cost of $1M+ annual operational overhead.</li><li><strong>Buy</strong>: Proprietary SaaS platforms like Cortex and Port with polished interfaces but permanent vendor lock-in.</li><li><strong>Hybrid</strong>: Managed Backstage solutions delivering open-source ecosystem benefits without the engineering tax.</li></ul><p>This guide evaluates the seven platforms that enterprise engineering leaders are actually deploying at scale, focusing on the strategic tradeoffs that matter three years after your initial decision, when you discover whether you made the right choice.</p><h2>What Makes a Great Enterprise Developer Portal?</h2><p>Before diving into specific platforms, let's establish the evaluation criteria that separate enterprise-ready IDPs from tools that work well in demos but fail in production.</p><h3>Ecosystem and Extensibility</h3><p>Can the platform integrate with your entire toolchain, or are you limited to what the vendor supports? At enterprise scale, you're likely using 50+ different tools across CI/CD, monitoring, security, and cloud infrastructure. The difference between supporting 20 integrations versus 250+ becomes critical when half your value comes from having everything in one place.</p><h3>Vendor Lock-In Risk</h3><p>If you decide to move away from the platform in two years, what happens to your data model? Open-source-based solutions like <a href="https://roadie.io/backstage-spotify/">Backstage</a> use standardized YAML entity definitions that you can export and migrate. 
Proprietary platforms often use custom data models that trap your organizational knowledge inside their systems.</p><h3>Maintenance Overhead</h3><p>Does running the platform require a dedicated team, or can your existing platform engineers manage it alongside their other responsibilities? Self-hosted solutions can consume 3-5 full-time engineers just for maintenance, upgrades, and troubleshooting. This is what we call the "TypeScript tax," the hidden cost of maintaining frontend infrastructure that most DevOps teams aren't equipped to handle.</p><h3>Enterprise Readiness</h3><p>Does the platform provide <a href="https://roadie.io/product/access-control/">role-based access control</a> (RBAC), single sign-on (SSO), and SOC 2 compliance out of the box, or do you need to build these capabilities yourself? For regulated industries or companies with strict security requirements, these aren't nice-to-haves; they're table stakes.</p><h3>Day 2 Operations</h3><p>Look beyond the initial setup. How difficult is it to upgrade the platform when breaking changes occur? How do you handle search infrastructure at scale? What happens when you need to migrate to a new backend system? The platforms that look easiest on day one often become maintenance nightmares on day 700.</p><hr><h2>1. Roadie: The Backstage Platform for Enterprises</h2><p><strong>Category</strong>: Hybrid (Managed Backstage)</p><p><strong>Best For</strong>: Teams that want Backstage's ecosystem without the operational burden</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6lmXocsOuSd6aOTTg8d1aa/773f95ba5b99cb40279cf0d83aa9be5b/image2.png" alt="Roadie"></p><p><a href="https://roadie.io/">Roadie</a> delivers the full power of Spotify's open-source <a href="https://roadie.io/backstage-spotify/">Backstage platform</a> as a managed SaaS service. 
This hybrid approach gives you access to the entire Backstage ecosystem, 211 <a href="https://roadie.io/backstage/plugins/">open-source Backstage plugins</a>, standardized data models, and active open-source development, while eliminating the maintenance overhead that typically requires 3+ dedicated engineers.</p><p>The platform handles all infrastructure concerns: hosting, security patches, database management, enterprise-grade search, and complex upgrades like the <a href="https://roadie.io/blog/migrating-to-backstages-new-backend-a-step-by-step-guide/">New Backend System migration</a> that challenged self-hosted installations throughout 2024. Your team focuses on configuring integrations and building workflows, not on TypeScript debugging or React component updates.</p><p><strong>Key Features</strong>:</p><ul><li>Minimal-maintenance Backstage platform with automated upgrades</li><li>Access to entire open-source plugin ecosystem (211 plugins, 82 supported out-of-the-box)</li><li>Built-in <a href="https://roadie.io/product/tech-insights/">Tech Insights</a> for scorecards and engineering standards (paid add-on)</li><li>Enterprise <a href="https://roadie.io/product/access-control/">RBAC</a> (basic RBAC in Teams plan, custom RBAC in Growth plan)</li><li>No vendor lock-in, data model is standard Backstage YAML</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>No TypeScript Tax</strong>: Platform engineering teams can focus on platform capabilities rather than <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">maintaining TypeScript and React frontends</a>.</li><li><strong>Open Ecosystem</strong>: Any <a href="https://roadie.io/backstage/plugins/">Backstage plugin</a> works, including community-developed integrations.</li><li><strong>Automatic Migrations</strong>: Complex upgrades like the New Backend System transition happen automatically.</li><li><strong>Faster Time-to-Value</strong>: Most customers <a 
href="https://roadie.io/blog/from-day-0-to-day-2-a-guide-to-planning-and-implementing-backstage/">see value within weeks</a>, not months.</li><li><strong>Standard Data Model</strong>: Your <a href="https://roadie.io/docs/catalog/modeling-entities/">catalog definitions</a> are portable YAML files, not proprietary formats.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>Backstage UI Constraints</strong>: Less drag-and-drop flexibility compared to tools like Port, though more structured.</li><li><strong>Requires SaaS Comfort</strong>: While Roadie offers secure connectivity options (like the <a href="https://roadie.io/docs/integrations/broker/">Roadie Broker</a>) for on-premises resources, it is primarily a hosted service.</li></ul><p><strong>Pricing</strong>: Teams plan starts at $24/developer/month with a 50-seat minimum (50-150 developers). Growth plan pricing is custom with a 100-seat minimum (100+ developers). Only active contributors to your source control management (SCM) system incur costs; non-coding team members like product managers and leadership get access for free. Tech Insights is an optional paid add-on. <a href="https://roadie.io/pricing/">View pricing details</a>.</p><h2>2. Cortex: The Engineering Metrics Specialist</h2><p><strong>Category</strong>: Proprietary SaaS</p><p><strong>Best For</strong>: Organizations obsessed with service maturity scorecards and reliability metrics</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/njTzWK2RKpPAK6SUh9ZML/3dc641c8284af97ac4cc4ba2cb0dbef9/image4.png" alt="Cortex"></p><p><a href="https://www.cortex.io/">Cortex</a> offers a polished, opinionated developer portal focused heavily on measuring and improving service quality through scorecards and engineering metrics. 
The platform excels at gamifying service ownership with detailed maturity models, SLO tracking, and automated scoring based on engineering best practices using Bronze/Silver/Gold levels and point-based systems.</p><p>The UI feels modern and intuitive, particularly for teams familiar with SaaS tools like Datadog or PagerDuty. Cortex's scorecard system is more sophisticated than most alternatives, offering fine-grained control over scoring criteria with flexible rule definitions and excellent visualization of engineering standards compliance across your organization.</p><p><strong>Key Features</strong>:</p><ul><li>Advanced scorecard system with customizable rubrics using Bronze/Silver/Gold levels and point-based scoring</li><li>Strong reliability engineering focus (SLOs, incidents, on-call)</li><li>Polished, modern UI optimized for service discovery</li><li>AI-powered features including Ownership Prediction and Velocity Dashboard for DORA metrics</li><li>60+ out-of-the-box integrations with major monitoring and development tools</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>Scorecard Sophistication</strong>: Best-in-class service maturity tracking with Bronze/Silver/Gold level visualization and detailed point-based scoring.</li><li><strong>Beautiful UI</strong>: Modern design that impresses stakeholders.</li><li><strong>Strong Reliability Focus</strong>: Excellent for teams prioritizing SRE practices.</li><li><strong>Fast Initial Setup</strong>: Can get basic catalog running quickly.</li><li><strong>AI Integration</strong>: New AI features for ownership prediction and metrics analysis.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>High Price Point</strong>: Known for being expensive at enterprise scale, especially compared to Backstage-based alternatives.</li><li><strong>Proprietary Lock-In</strong>: Data model is Cortex-specific, making migration difficult.</li><li><strong>Limited Ecosystem</strong>: Rely on Cortex to build integrations, can't 
leverage community plugins.</li><li><strong>Scaffolding Limitations</strong>: Template/workflow capabilities lag behind <a href="https://roadie.io/product/scaffolder/">Backstage's Software Templates</a>.</li></ul><p><strong>Pricing</strong>: Not publicly disclosed; requires signing up for a demo to receive pricing information. However, the <a href="https://tei.forrester.com/go/Cortex/IDP/?lang=en-us">Forrester Total Economic Impact study</a> from July 2024 lists pricing at approximately $65/user/month at scale. Multiple tiers available (Engineering Intelligence, Accelerate, Full IDP, Site License) with features scaling from basic catalog and scorecards to full platform capabilities with AI-powered features. <a href="https://www.cortex.io/pricing">Request pricing</a>.</p><h2>3. Port: The No-Code Builder's Platform</h2><p><strong>Category</strong>: Proprietary SaaS</p><p><strong>Best For</strong>: Teams that need to model non-standard assets or want maximum UI customization</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/23UiJYWqhSthUKK1loK2o9/ebec972a84806fd85c21648910f4808d/image5.png" alt="Port"></p><p><a href="https://www.port.io/">Port</a> takes a radically different approach: instead of providing an opinionated developer portal, it gives you building blocks to create your own. The platform's "no-code" interface lets you define custom data models (called Blueprints) for any asset type, not just services and APIs, but also environments, IoT devices, or cloud resources.</p><p>This flexibility makes Port uniquely suited for organizations with complex, non-standard infrastructure that doesn't fit typical service catalog patterns. You can build custom views, define relationships between any entity types, and create workflows that match your exact processes. 
Port has recently rebranded as an "Agentic Internal Developer Portal" with enhanced AI capabilities.</p><p><strong>Key Features</strong>:</p><ul><li>Fully customizable data models and UI views via Blueprints</li><li>No-code interface for defining entities and relationships</li><li>Strong visualization capabilities for complex systems</li><li>Self-service actions using Cookiecutter templates</li><li>50+ integrations including DORA metrics tracking</li><li>AI agent capabilities and Engineering360 dashboard</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>Ultimate Flexibility</strong>: Model anything your organization needs, not just traditional services.</li><li><strong>Custom Views</strong>: Build exactly the interface your teams need.</li><li><strong>Good for Complex Infrastructure</strong>: Excellent for multi-cloud, hybrid environments with diverse asset types.</li><li><strong>Visual Workflow Builder</strong>: Create automation without writing code.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>Blank Slate Problem</strong>: Maximum flexibility means you build everything from scratch; less out-of-the-box value.</li><li><strong>Proprietary Ecosystem</strong>: Cannot leverage the Backstage open-source plugin ecosystem, though the Ocean Framework provides extensibility through data integrations and workflow automation.</li><li><strong>Weaker Documentation Features</strong>: While it supports Markdown, it lacks the full <a href="https://roadie.io/docs/getting-started/technical-documentation/">TechDocs</a> build pipeline and search capabilities found in Backstage.</li><li><strong>Steeper Learning Curve</strong>: Teams need time to master the data modeling concepts.</li></ul><p><strong>Pricing</strong>: Free tier available (up to 15 seats, 10,000 entities). Startup tier at $30/developer/month. Enterprise tier available with premium features including SSO, advanced RBAC, ISO 27001 and SOC2 Type 2 certifications, and dedicated support. 
<a href="https://www.port.io/pricing">View pricing details</a>.</p><h2>4. OpsLevel: The Service Maturity Tracker</h2><p><strong>Category</strong>: Proprietary SaaS</p><p><strong>Best For</strong>: Teams focused primarily on service ownership and maturity tracking</p><p>
<img src="//images.ctfassets.net/hcqpbvoqhwhm/4zIcUIyWcjHdedrZCyHWzy/581079d535a27592023a5a1d65f6ce4e/image8.png" alt="OpsLevel"></p><p><a href="https://www.opslevel.com/">OpsLevel</a> started as a service maturity and ownership tool before evolving into a broader developer portal. This heritage shows in its excellent service ownership features and straightforward approach to tracking engineering standards compliance through its "Rubric" system with Bronze/Silver/Gold levels, plus separate Scorecards for team-specific standards.</p><p>The platform offers the fastest time-to-initial-value for basic service cataloging, with typical deployments completing in 30-45 days. You can have a working catalog with ownership information and basic checks running within hours. However, this simplicity comes at the cost of extensibility, OpsLevel's feature set is more constrained than platforms built on extensible architectures. Recent additions include AI-powered features for check generation and catalog enrichment.</p><p><strong>Key Features</strong>:</p><ul><li>Fast setup for basic service catalog (typical 30-45 day deployment)</li><li>Strong focus on service ownership and maturity rubrics</li><li>Good integration with CI/CD systems for automated checks (60+ integrations)</li><li>AI-generated checks and AI-enriched catalog</li><li>Package version inventories for SBOM visibility</li><li>Clean, straightforward UI</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>Quickest Setup</strong>: Get a basic catalog running faster than any alternative.</li><li><strong>Service Ownership Focus</strong>: Excellent for clarifying who owns what.</li><li><strong>Maturity Rubrics</strong>: Good built-in templates for measuring service quality.</li><li><strong>Intuitive Interface</strong>: Less complex than more feature-rich platforms.</li><li><strong>AI Integration</strong>: New AI features streamline catalog management.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>Limited 
Extensibility</strong>: Can't add community plugins or build custom integrations easily.</li><li><strong>Rigid UI</strong>: Less flexible than Port or Backstage for customization.</li><li><strong>Narrower Feature Set</strong>: Primarily focused on cataloging and checks, with less emphasis on docs or scaffolding.</li><li><strong>Proprietary Lock-In</strong>: Migration path unclear if you outgrow the platform.</li></ul><p><strong>Pricing</strong>: Not publicly disclosed; pricing is based on team size, with custom quotes. Per-developer pricing model with volume discounts available. Includes SOC2 Type 2 compliance and SAML-based SSO. Pricing customizable based on needs including self-hosted options and support levels. <a href="https://www.opslevel.com/pricing">Request pricing</a>.</p><h2>5. Atlassian Compass: The Jira Companion</h2><p><strong>Category</strong>: Proprietary SaaS</p><p><strong>Best For</strong>: Organizations deeply invested in the Atlassian ecosystem</p><p>
<img src="//images.ctfassets.net/hcqpbvoqhwhm/Sakhng7268r7KFqQwySXD/f493382aeca2d643b59f6cf1db7207eb/image6.png" alt="Compass"></p><p>If your company runs on Jira, Bitbucket, and Confluence, <a href="https://www.atlassian.com/software/compass">Compass</a> offers the most seamless native integration you'll find. The platform leverages Atlassian's identity system, pulls in data from other Atlassian products automatically, and feels like a natural extension of your existing toolchain.</p><p>Compass provides automated service health monitoring, tracking metrics from integrated tools and surfacing problems before they escalate. For teams already paying for Atlassian products, Compass represents an incremental cost with minimal integration effort. The platform has scorecards with a new "Maturity Levels" feature added in 2025.</p><p><strong>Important Note</strong>: Atlassian deprecated the Templates/scaffolding feature on December 1, 2025. This represents a significant capability reduction for teams requiring self-service service creation.</p><p><strong>Key Features</strong>:</p><ul><li>Native integration with Jira, Bitbucket, Confluence, and Opsgenie</li><li>Automated service health monitoring</li><li>Component tracking with Atlassian-native data models</li><li>Integrated incident management through Opsgenie</li><li>Scorecards with Maturity Levels feature</li><li>Built on Atlassian Forge with GraphQL APIs for extensibility</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>Atlassian Integration</strong>: Unbeatable if you use Jira for everything.</li><li><strong>Familiar UX</strong>: Feels consistent with other Atlassian products.</li><li><strong>Automated Health Monitoring</strong>: Good automated health checks.</li><li><strong>Standalone or Bundled</strong>: Available as standalone product or with some enterprise packages.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>The Atlassian Trap</strong>: Struggles with non-Atlassian tools like <a 
href="https://roadie.io/docs/integrations/github-discovery/">GitHub</a>, <a href="https://roadie.io/docs/integrations/gitlab/">GitLab</a>, or CircleCI.</li><li><strong>Scaffolding Removed</strong>: The Templates feature was deprecated on December 1, 2025 and is no longer available for service creation.</li><li><strong>Proprietary Ecosystem</strong>: Can't extend with community plugins.</li><li><strong>Not a Full IDP</strong>: More of a service catalog with add-ons than a complete platform interface.</li></ul><p><strong>Pricing</strong>: Free tier available (3 full users, unlimited basic users). Standard tier at $8/user/month includes basic features. Premium tier at $25/user/month includes IP Allowlisting, advanced integrations, 99.9% uptime SLA, and premium support. Discounted rates available for teams above 101 users. Compass is a standalone product with separate billing from other Atlassian tools. <a href="https://www.atlassian.com/software/compass/pricing">View pricing details</a>.</p><h2>6. Backstage (Self-Hosted)</h2><p><strong>Category</strong>: Open Source (Self-Hosted)</p><p><strong>Best For</strong>: Large enterprises with dedicated platform engineering teams and specific compliance requirements</p><p>
<img src="//images.ctfassets.net/hcqpbvoqhwhm/4En3HTl4YiAGrBHe7LJjTt/b46df4f2b3009fa27cba3c993e7deaf5/image1.png" alt="Backstage"></p><p>Spotify's <a href="https://backstage.io/">Backstage</a> is the industry-standard open-source developer portal framework. It powers IDPs at Spotify, American Airlines, Pinterest, and thousands of other organizations. The platform offers ultimate flexibility: you own the code, control the infrastructure, and can customize anything.</p><p>But this flexibility comes with significant operational costs. Self-hosting Backstage requires 3-5 dedicated engineers to manage infrastructure, handle upgrades, maintain search systems, and keep up with the rapidly evolving codebase. Roadie's survey of the Backstage community found that users reporting satisfaction with self-hosted deployments had at least three engineers dedicated full-time, with some companies staffing teams of 12 people just for Backstage. Breaking changes occur regularly with monthly releases, and major migrations like the <a href="https://roadie.io/blog/migrating-to-backstages-new-backend-a-step-by-step-guide/">New Backend System</a> transition that completed in 2024 consumed months of engineering time for self-hosted teams.</p><p>For perspective on the true cost of building similar developer portals, <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">Zalando invested over $4 million</a> across four years developing their internal platform before open-sourcing their work as part of Backstage.</p><p><strong>Key Features</strong>:</p><ul><li>Fully open-source with Apache 2.0 license</li><li>250+ <a href="https://roadie.io/backstage/plugins/">community plugins</a> covering every major tool</li><li>Extensible architecture for <a href="https://roadie.io/docs/custom-plugins/overview/">custom plugins</a> and integrations</li><li>Active community and regular monthly releases</li><li>CNCF Incubating project with strong enterprise 
adoption</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>Ultimate Control</strong>: You own everything and can customize without limits.</li><li><strong>No License Costs</strong>: Free to download and use.</li><li><strong>Massive Ecosystem</strong>: Largest community and plugin library (250+ plugins).</li><li><strong>No Vendor Dependency</strong>: Run anywhere, modify anything.</li><li><strong>Industry Standard</strong>: Backed by CNCF and major enterprises.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>Operational Overhead</strong>: Requires at least 3-5 FTE engineers for successful deployments; some teams reach 12 FTEs.</li><li><strong>The TypeScript Tax</strong>: Most DevOps teams lack frontend skills for React/TypeScript customization.</li><li><strong>Upgrade Complexity</strong>: Monthly breaking changes and major migrations (like New Backend System) require significant engineering investment.</li><li><strong>Day 2 Operational Burden</strong>: You manage databases, search infrastructure (Elasticsearch), monitoring, and security patches.</li><li><strong>Hidden Costs</strong>: The real cost, once you factor in engineering time, opportunity cost, and infrastructure, can reach millions of dollars. At typical senior platform engineer compensation ($250K/year fully loaded), 3-5 engineers cost $750K-$1.25M annually, plus infrastructure costs ($12K-$24K/year). TCO typically exceeds $2M over three years when including engineering time and opportunity cost.</li></ul><p><strong>Pricing</strong>: Free (open source under Apache 2.0 license). However, total cost of ownership includes 3-5 engineer salaries ($750K-$1.25M+ annually at senior platform engineer compensation of $250K/year fully loaded) plus infrastructure costs ($12K-$24K+ annually for hosting, databases, and search infrastructure). TCO typically exceeds $2M over three years when factoring in engineering time, opportunity cost, and infrastructure.</p><h2>7. 
Configure8: The Universal Catalog Platform</h2><p><strong>Category</strong>: Proprietary SaaS</p><p><strong>Best For</strong>: Organizations prioritizing discovery and cost analytics</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3msywkJ5nfMhWOlE04Zbas/6dc081a1da5747722e475b0e6903ee2e/image7.png" alt="Configure8"></p><p><a href="https://configure8-io.webflow.io/">Configure8</a> positions itself as a "universal catalog" that can ingest and relate data from virtually any source. The platform emphasizes discovery features, helping teams understand what they have and how it's interconnected. It also offers strong cloud cost integration, surfacing spending data alongside technical resources.</p><p>While Configure8 has solid core features including 30+ integrations and workflow-based Self-Serve Actions, its smaller market presence and proprietary nature make it a riskier choice than platforms with larger ecosystems or open-source foundations.</p><p><strong>Key Features</strong>:</p><ul><li>Universal catalog supporting diverse asset types</li><li>Strong discovery and search capabilities</li><li>Cloud cost analytics integration</li><li>Relationship mapping across resources</li><li>Workflow-based Self-Serve Actions</li><li>Available as SaaS or on-premises deployment</li></ul><p><strong>Pros</strong>:</p><ul><li><strong>Comprehensive Discovery</strong>: Good at helping teams understand what exists.</li><li><strong>Cost Integration</strong>: Unique focus on cloud spending alongside technical resources.</li><li><strong>Multi-Source Ingestion</strong>: Can pull data from many systems.</li><li><strong>Deployment Flexibility</strong>: Offers both SaaS and on-prem options.</li></ul><p><strong>Cons</strong>:</p><ul><li><strong>Smaller Ecosystem</strong>: Less community support and fewer integrations (30+) than larger platforms.</li><li><strong>Proprietary Lock-In</strong>: Data model is Configure8-specific.</li><li><strong>Limited Track Record</strong>: Fewer public case 
studies and enterprise deployments than alternatives.</li><li><strong>Market Position Uncertainty</strong>: Smaller player in a competitive market.</li></ul><p><strong>Pricing</strong>: Free tier available (up to 10 users for scorecards). Paid tiers available with SOC2 certification and RBAC features. Enterprise pricing available with additional features and volume discounts. Available as both SaaS and on-premises deployment. Contact Configure8 for detailed pricing information. <a href="https://configure8-io.webflow.io/pricing">View pricing page</a>.</p><h2>Comparison: At a Glance</h2><table><thead><tr><th>Platform</th><th>Foundation</th><th>Maintenance</th><th>Ecosystem Size</th><th>Lock-In Risk</th><th>Enterprise Features</th></tr></thead><tbody>
<tr><td><strong>Roadie</strong></td><td>Open Source (Backstage)</td><td>Minimal (Managed)</td><td>211 plugins</td><td>Low (standard YAML)</td><td>RBAC, SSO, SOC2 Day 1</td></tr>
<tr><td><strong>Cortex</strong></td><td>Proprietary</td><td>Minimal (SaaS)</td><td>60+ integrations</td><td>High (proprietary)</td><td>Strong</td></tr>
<tr><td><strong>Port</strong></td><td>Proprietary</td><td>Minimal (SaaS)</td><td>50+ integrations</td><td>High (proprietary)</td><td>Good</td></tr>
<tr><td><strong>OpsLevel</strong></td><td>Proprietary</td><td>Minimal (SaaS)</td><td>60+ integrations</td><td>High (proprietary)</td><td>Basic</td></tr>
<tr><td><strong>Compass</strong></td><td>Proprietary</td><td>Minimal (SaaS)</td><td>Atlassian-centric</td><td>High (proprietary)</td><td>Good (if Atlassian)</td></tr>
<tr><td><strong>Backstage</strong></td><td>Open Source</td><td>High (3-12 engineers)</td><td>250+ plugins</td><td>None</td><td>DIY</td></tr>
<tr><td><strong>Configure8</strong></td><td>Proprietary</td><td>Minimal (SaaS)</td><td>30+ integrations</td><td>High (proprietary)</td><td>Moderate</td></tr>
</tbody></table><h2>How to Choose the Right Developer Portal</h2><p>The right IDP depends on your organization's constraints, technical culture, and platform engineering maturity:</p><ul><li><strong>Choose Self-Hosted Backstage if</strong> you have a team of 10+ platform engineers, unlimited budget, and specific requirements that absolutely cannot be met by managed solutions. You're willing to invest significant engineering time in maintenance and customization for maximum control. Be prepared for 3-12 dedicated engineers and $2M+ total cost of ownership over three years.</li><li><strong>Choose Cortex or Port if</strong> you value polished UI above all else, don't mind proprietary lock-in, and want specific workflow capabilities their platforms emphasize. Budget for potentially higher costs at scale (~$65/user/month for Cortex, ~$30+/user/month for Port enterprise tiers).</li><li><strong>Choose OpsLevel if</strong> you need a basic service catalog immediately and your primary use case is <a href="https://roadie.io/docs/catalog/ownership/">tracking ownership</a> and maturity, not building complex workflows or maintaining extensive documentation.</li><li><strong>Choose Compass if</strong> you live entirely in the Atlassian ecosystem and can accept its limitations with non-Atlassian tools. Note that scaffolding/templates were deprecated December 1, 2025. The integration efficiency may outweigh the platform's constraints if you're already in the Atlassian ecosystem.</li><li><strong>Choose Roadie if</strong> you want the industry standard (Backstage) with its entire ecosystem and community support, but you value your engineers' time too much to spend it on infrastructure maintenance. 
It's the "golden path" for enterprises that want modern platform capabilities without the platform tax or the $2M+ cost of building from scratch.</li></ul><p>The key question isn't which platform has the most features; it's which platform lets your engineers focus on building platform capabilities instead of maintaining platform infrastructure. At the 150+ engineer scale where IDPs become critical, that distinction determines whether your portal becomes a force multiplier or just another thing to maintain, potentially at a cost exceeding $2M over three years if you go the self-hosted route.</p><h2>Next Steps</h2><p>Ready to implement a developer portal for your organization? Here are some practical next steps to guide your journey:</p><p><strong>Evaluate Roadie with a Free Trial</strong>: Experience the power of managed Backstage firsthand. <a href="https://roadie.io/free-trial/">Start your free trial</a> to explore Roadie's features and integrations and see how quickly you can get value without the operational overhead.</p><p><strong>Learn from Customer Success Stories</strong>: See how other enterprise teams have successfully implemented developer portals. Read our <a href="https://roadie.io/case-studies/">case studies</a> to understand real-world adoption strategies, challenges overcome, and measurable results achieved.</p><p><strong>Dive Deeper into Backstage</strong>: If you're new to Backstage or want to understand its capabilities better, explore our comprehensive <a href="https://roadie.io/backstage-spotify/">guide to Backstage</a> and learn why it's become the industry standard for developer portals.</p><p><strong>Plan Your Implementation</strong>: Moving from evaluation to production requires careful planning. 
Our guide on <a href="https://roadie.io/blog/from-day-0-to-day-2-a-guide-to-planning-and-implementing-backstage/">planning and implementing Backstage from Day 0 to Day 2</a> provides a roadmap for successful adoption.</p><p><strong>Compare Your Options</strong>: Still weighing self-hosted versus managed solutions? Read our detailed <a href="https://roadie.io/backstage-comparison/">comparison of managed vs. self-hosted Backstage</a> to understand the true total cost of ownership.</p><p><strong>Schedule a Demo</strong>: See Roadie in action with your specific use cases. <a href="https://roadie.io/request-demo/">Request a personalized demo</a> to discuss your requirements, explore integrations, and understand how Roadie can accelerate your platform engineering initiatives.</p>
]]></content:encoded></item><item><title><![CDATA[From a Spreadsheet and a $2M Bill: Why We Built Roadie]]></title><link>https://roadie.io/blog/from-a-spreadsheet-and-a-usd2m-bill-why-we-built-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/from-a-spreadsheet-and-a-usd2m-bill-why-we-built-roadie/</guid><pubDate>Thu, 11 Dec 2025 11:00:00 GMT</pubDate><description><![CDATA[David Tuite shares the origin story of Roadie. Discover why building an Internal Developer Portal from scratch is a $2M trap, and how managed Backstage solves it.]]></description><content:encoded><![CDATA[<p>I was an infrastructure product manager at a large enterprise SaaS company when I first ran into the problem that would eventually lead to Roadie. We were building out what was essentially a private AWS inside the company, virtualization platforms, logging systems, monitoring infrastructure, the works. It was going well. Maybe too well.</p><p>Teams kept deploying more stuff onto our platform. The security team had services running. There was a virus scanning team. Dozens of other teams I didn't even know existed were all running their software on our infrastructure. At around 50 services, people started asking questions we couldn't answer.</p><p>Who owns this thing? If it goes down at 2am, who do I call? Is it secure? How many resources should we allocate to it?</p><h2>The Spreadsheet Solution</h2><p>To help answer these questions, we started with what everyone starts with: a spreadsheet. One column for service names, another for owners, another for what it does.</p><p>It lasted about three weeks before it became useless. Nobody updated it. The information got stale. 
We had services running in production that weren't in the spreadsheet and entries for services that had been decommissioned months ago.</p><p>So we did what any self-respecting infrastructure team would do, we built a custom solution.</p><h2>The $2 Million Developer Portal</h2><p>We built a UI that could list all the services, show where they were running, and let people create new services with the click of a button. You'd fill out a form, and you'd get a repo with code that would run perfectly on our platform. You could configure your networking rules without going through ten Slack messages trying to find the right person. It was genuinely useful.</p><p>When I look back at the team size and time involved, I estimate we spent about $2 million USD building that thing from scratch. That isn't an exaggeration; it’s standard math when you factor in the salaries of the engineers, product managers, and designers required to build and maintain an internal product over several years.</p><p>And here's the thing: when I talked to people at other companies, they'd all built some version of the same solution to the same problem.</p><h2>The Turning Point for Developer Tools</h2><p>Fast forward to 2020. I'm leaving that role, and I can't stop thinking about this problem. Companies are clearly willing to spend millions solving it.</p><p>Around the same time, <a href="https://roadie.io/backstage-spotify/">Spotify open-sourced Backstage</a>, their <a href="https://roadie.io/blog/developer-portals-are-a-superpower/">internal developer portal</a> that they'd been using since 2016. Suddenly, there were two approaches:</p><ul><li><strong>Proprietary solutions</strong> like Cortex and OpsLevel were building closed-source developer portals and selling them as SaaS products. Quick to set up, but you're locked into their data model and their way of doing things.</li><li><strong>Open-source Backstage</strong> gave you maximum flexibility. You could customize everything. 
But it came with a catch.</li></ul><h2>The Hidden Cost of "Free"</h2><p>Here's what people don't realize about Backstage until they're deep into implementation: <em>it's not a developer portal</em>. It's a <a href="https://roadie.io/backstage-spotify/">framework for building developer portals</a>. It's TypeScript libraries that you use to construct your own solution.</p><p>That's a problem for platform teams because most of them don't know TypeScript. They know Go and YAML and infrastructure-as-code. Not React and frontend frameworks.</p><p>When we <a href="https://roadie.io/blog/backstage-alternatives/">surveyed the Backstage community</a>, we found something interesting: people who reported being happy with their self-hosted Backstage deployment had at least three engineers dedicated to it full-time. Some companies had teams of 12 people just working on Backstage.</p><p>Think about that. You're trying to solve the developer productivity problem, and you need to staff an entire team to maintain your solution. That's not solving the problem, that's just moving it around.</p><p>And keep in mind, there are features Backstage doesn't give you out of the box. <a href="https://roadie.io/docs/details/permissions/">Role-based access control</a>? Build it yourself. Better search than Postgres full-text? Run and maintain your own Elasticsearch cluster.</p><p>It takes about a year to get a self-hosted Backstage instance into production. A year and a team of five people. 
Or you could go with a proprietary solution and be live in weeks, but give up all the flexibility and customization that made Backstage so attractive in the first place.</p><h2>The Best of Both Worlds</h2><p>We built Roadie to give people the power of Backstage's open-source ecosystem, all those plugins, the community contributions, and the flexibility, without requiring them to staff a team around it.</p><p>When you use Roadie, you get access to the <a href="https://roadie.io/backstage/plugins/">hundreds of open-source plugins</a> that the Backstage community maintains. Every time someone contributes an improvement to the Backstage core, you get it automatically. You're part of this massive ecosystem that's constantly getting better.</p><p>But you don't need a team of TypeScript engineers. You don't need to spend six months setting it up. You log in on day one and start using it.</p><p>We handle all the infrastructure, all the upgrades, all the features that Backstage doesn't include by default. We've added <a href="https://roadie.io/product/access-control/">role-based access control</a>, better search, and <a href="https://roadie.io/product/tech-insights/">scorecards for tracking standards</a>.</p><h2>Who Comes to Roadie and What Problems We Solve</h2><p>Here's what we see when companies come to us. They're usually in one of three situations:</p><p><strong>They already have Backstage.</strong> Someone got excited about it, stood up an instance, and now they're realizing it's eating up way more engineering time than they expected. They want to flip it to Roadie without starting over.</p><p><strong>They want Backstage specifically.</strong> They love the open-source ecosystem and the flexibility, but they don't have the bandwidth to build and maintain it themselves. 
They're doing the build-versus-buy calculation and realizing Roadie is cheaper than five full-time engineers.</p><p><strong>They just want a developer portal.</strong> They don't particularly care if it's Backstage or something else. They have the discoverability problem (too many services, nobody knows who owns what) and they need to solve it.</p><p>For the first two groups, the decision is easy. For the third group, it comes down to whether they value flexibility and avoiding vendor lock-in. If they do, <a href="/backstage-comparison/">Backstage through Roadie</a> makes sense. If they want something more opinionated and are okay with proprietary data models, other solutions might work.</p><p>These companies are trying to solve three main problems:</p><ul><li><strong>Discoverability.</strong> This is the main complaint we see, and the same problem we had when I first worked on developer portals. "We have too much stuff and nobody knows what any of it is." This extends to <a href="https://roadie.io/product/documentation/">centralizing documentation</a>, mapping service dependencies, and <a href="https://roadie.io/docs/details/openapi-specs/">cataloging API specs</a>. It all falls under the umbrella of making it possible to find things.</li><li><strong>Self-service automation.</strong> Platform teams want developers to be able to get things done without opening Jira tickets and waiting days. Need to create a new service? There should be a button for that. Need an S3 bucket? Fill out a form that generates a Terraform pull request instead of waiting for someone to do it manually. Our <a href="https://roadie.io/product/scaffolder/">software templates feature</a> handles this.</li><li><strong>Governance and standards.</strong> Leadership wants to know: is our software secure? Is it reliable? Are teams following our best practices? They want automated checks: is this service properly configured in our incident management tool? Does it have someone on-call? 
Are there critical security vulnerabilities we haven't addressed? <a href="https://roadie.io/docs/tech-insights/introduction/">Tech Insights</a> makes this possible.</li></ul><p>A developer portal doesn't magically solve these problems. But it gives you a place to tackle them systematically instead of through a patchwork of spreadsheets, Slack messages, and tribal knowledge.</p><h2>What No Tool Can Fix</h2><p>The biggest mistake I see companies make is thinking this is just a technical problem. They'll come to me and say "We want DORA metrics in our portal" and I'll ask "What's a deployment in your organization?" and then we discover that half their teams deploy through ArgoCD, some use Netlify, and others have their own custom pipeline.</p><p>You can't measure deployment frequency across your organization until you agree on what a deployment is. No tool can solve that for you, not Backstage, not Roadie, not anyone. That's an organizational alignment problem disguised as a technical one.</p><p>The same thing happens with service catalogs. "Show me all our services" sounds simple until someone asks "What's a service?" and three engineering managers give you three different answers.</p><p>Developer portals give you a forcing function to solve these problems and a place to encode the answers once you've figured them out.</p><h2>Where Roadie's Headed</h2><p>Two things are taking up most of our roadmap energy right now.</p><h3>AI Integration</h3><p>When you think about the questions people want answered, "Who's the engineering manager for the virus scanning service?" or "Which services depend on our authentication API?", they're perfect for conversational interfaces. You should be able to type that into a chat and get an answer. We're building that.</p><h3>Automated Ingestion</h3><p>This is less sexy but maybe more important. Right now, getting data into Backstage is too manual. You have to go to each team and say "put your stuff in the catalog" and that doesn't scale. 
We need to <a href="https://roadie.io/docs/getting-started/autodiscovery/">automatically discover and ingest</a> services, infrastructure, and dependencies. We're making Backstage better at this than the proprietary competitors, which is crucial for wider adoption.</p><h2>The Hybrid Approach</h2><p>Here's how I explain Roadie to people: self-hosted Backstage sucks because you need a team of five engineers. Proprietary developer portals suck because you're locked into their data model and limited customization.</p><p>Roadie gives you the best of both. You get the massive open-source ecosystem that's constantly improving through community contributions. But you don't need to staff a team around it. You don't need to spend a year bringing it into production.</p><p>We built Roadie because we lived through building it from scratch. We know exactly how much it costs and how long it takes. And we knew there had to be a better way.</p><p><a href="https://roadie.io/request-demo/">Request a demo</a> or <a href="https://roadie.io/free-trial/">start a free trial</a> to see how it works for yourself. You can also check out this case study to see how <a href="https://roadie.io/case-studies/why-celonis-switched-from-selfhosted-backstage-to-roadie/">Celonis switched from self-hosted Backstage to Roadie</a> and the results they achieved.</p>
]]></content:encoded></item><item><title><![CDATA[Backstage: How much does it really cost?]]></title><link>https://roadie.io/blog/backstage-how-much-does-it-really-cost/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-how-much-does-it-really-cost/</guid><pubDate>Thu, 04 Dec 2025 04:00:00 GMT</pubDate><description><![CDATA[Backstage promises a low-cost, open-source solution for building an Internal Developer Portal, but the true cost of ownership can be surprisingly high. We explore how hidden costs like setup, customization, and ongoing maintenance often make Backstage more complex and costly than you may think...]]></description><content:encoded><![CDATA[<p>"Open source is free like a puppy, not free like a beer."</p><p>This adage has never been truer than it is for Spotify's Backstage in 2025. While the repository is free to clone, turning that code into a secure, scalable, and adopted <a href="https://roadie.io/backstage-spotify/">Internal Developer Portal</a> (IDP) is a massive financial and operational undertaking.</p><p>For engineering leaders, the question isn't "Can we build this?" As Roadie CEO David Tuite highlighted at <a href="https://youtu.be/3_esw4UfYC4?si=ht7xlv6Hc_Q0Tsr5&#x26;t=450">BackstageCon</a>, the real question is: "Should we spend millions of dollars a year to maintain this?"</p><p>In this guide, we break down the real math behind self-hosting <a href="https://roadie.io/backstage-spotify/">Backstage</a>. We look beyond the initial setup and examine the deep technical challenges, from the misalignment of skills on your platform team to the relentless pace of upgrades, that make "free" software incredibly expensive.</p><h2>1. The Headcount Reality</h2><p>Before discussing technology, we must look at the human capital required.
Backstage is not a tool you install; it is a product you build and maintain.</p><p>Our research into the <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">State of Backstage</a> found that organizations satisfied with their self-hosted setup typically dedicate 3 to 12 full-time engineers to the platform.</p><p>If you assign a single engineer to "figure it out," they will spend 100% of their time on maintenance—dependency updates, security patches, and keeping the lights on—leaving zero capacity for feature development or driving adoption.</p><h2>2. The "TypeScript Tax": A Fundamental Skill Mismatch</h2><p>Beyond the number of heads, there is the issue of the <em>type</em> of skills required. Backstage is not a binary you configure with YAML; it is a set of libraries written in TypeScript and React that your engineers combine to build an IDP.</p><p>This presents a critical problem for most infrastructure and platform teams.</p><p>The DevOps Skill Set: Your platform engineers are likely experts in Go, Python, Terraform, and Kubernetes. They live in the terminal and think in terms of infrastructure-as-code.</p><p>The Backstage Requirement: To <a href="https://roadie.io/blog/the-power-of-customization-making-backstage-work-for-you-with-roadie/">customize Backstage</a>, you need to write React components, debug frontend hooks, manage CSS themes, and handle complex Node.js build pipelines.</p><p>The Reality: When you ask a Senior Cloud Engineer to center a <code>&#x3C;div></code> or debug a React context error, you are not only misusing their expensive skill set but also setting them up for frustration.
Even if they can technically write the code needed to customize Backstage, these engineers won't have the user experience skills required to produce a portal that your teams love to use.</p><p>To run Backstage successfully, you essentially need to hire Frontend Platform Engineers, a niche and expensive role, or constantly borrow time from product teams who would rather be shipping features.</p><h2>3. The Hidden Infrastructure Bill</h2><p>Getting Backstage running on a laptop takes ten minutes. Getting it ready for the enterprise takes months. When you choose to self-host, you take ownership of a distributed system that requires significant "plumbing."</p><h3>Authentication &#x26; Security</h3><p>Backstage provides the framework, but you build the doors.</p><p><strong>The Work:</strong> You must manually register OAuth applications (Okta, Google, GitHub), configure backend resolvers, handle session cookies, and manage CORS policies.</p><p><strong>The Risk:</strong> You are responsible for vulnerability management. Backstage relies on hundreds of Node.js dependencies. When a security advisory comes out for a nested dependency, you are the one who has to patch, rebuild, and redeploy.</p><h3>The TechDocs Pipeline</h3><p><a href="https://roadie.io/docs/getting-started/technical-documentation/">TechDocs</a> (docs-like-code) is a flagship feature, but it requires external infrastructure to work at scale.</p><p><strong>The Work:</strong> It is not "plug and play." You must provision cloud storage (AWS S3, GCS), configure IAM roles, and build a dedicated CI/CD pipeline to generate static HTML from Markdown and push it to storage.</p><h3>Search Infrastructure</h3><p>The default search engine is basic. For a production-grade experience, you need to manage a dedicated Elasticsearch or OpenSearch cluster, adding another layer of cost and maintenance.</p><h2>4. The Maintenance Treadmill (Day 2 Operations)</h2><p>Backstage moves fast. 
According to the <a href="https://roadie.io/blog/the-2025-state-of-backstage-report/">2025 State of Backstage report</a>, 56% of adopters cite upgrades as their biggest pain point.</p><p>Unlike other infrastructure tools where an upgrade is often just bumping a version number in a config file, upgrading Backstage often requires code changes.</p><p><strong>Breaking Changes:</strong> You will frequently need to refactor your App.tsx or backend wiring to accommodate changes in the core APIs.</p><p><strong>The New Backend &#x26; Frontend Systems:</strong> In 2024, Backstage introduced a <a href="https://roadie.io/blog/migrating-to-backstages-new-backend-a-step-by-step-guide/">completely new backend architecture</a> followed closely by a <a href="https://backstage.io/docs/frontend-system/architecture/index">new frontend system</a>. Teams self-hosting Backstage are currently spending months refactoring their entire plugin ecosystem and UI logic to migrate to these new systems.</p><p><strong>Plugin Rot:</strong> Open-source <a href="https://roadie.io/backstage/plugins/">plugins</a> are often abandoned. If a community plugin you rely on breaks, you become the maintainer.</p><h2>5. The Financial Math: Calculating TCO</h2><p>Now, let's translate that technical effort into dollars.</p><h3>The Cost of Waiting (Time to Value)</h3><p>First, consider the cost of <em>not</em> having an IDP while you build it. Self-hosted implementations typically take 6 to 12 months to reach production. That is up to a year of paying salaries before seeing value. In contrast, managed solutions like Roadie typically go live in under a month, accelerating your ROI significantly.</p><h3>The Hard Costs</h3><p>The single largest cost of self-hosting is Headcount.</p><p>You are building an internal product team.
Based on data from hundreds of implementations, a "minimum viable" self-hosted setup requires:</p><ul><li>Year 1 (Build &#x26; Launch): 3 Full-Time Engineers (FTEs).</li><li>Year 2+ (Maintenance): 2 FTEs to handle upgrades, migrations, and support.</li></ul><p>The 2025 Math: According to 2025 market data, the fully loaded cost (salary, equity, benefits) of a Senior Platform Engineer in top tech hubs averages $250,000.</p><table><thead><tr><th>Cost Category</th><th>Year 1 Cost (DIY)</th><th>Recurring Annual Cost (DIY)</th></tr></thead><tbody>
<tr><td>Engineering Salaries (3 FTEs Year 1, 2 FTEs Year 2+)</td><td>$750,000</td><td>$500,000</td></tr>
<tr><td>Cloud Infrastructure (Hosting, DB, Elasticsearch)</td><td>$12,000</td><td>$12,000</td></tr>
<tr><td>Maintenance &#x26; Upgrades (20% of time)</td><td>Included in Salary</td><td>Included in Salary</td></tr>
<tr><td><strong>Total TCO</strong></td><td><strong>$762,000</strong></td><td><strong>$512,000</strong></td></tr>
</tbody></table><h2>6. Comparison: Self-Hosted Backstage vs. Roadie</h2><p><a href="https://roadie.io/">Roadie</a> provides the exact same underlying technology (Backstage) but removes the technical implementation layer.</p><table><thead><tr><th>Feature / Requirement</th><th>Self-Hosted Backstage</th><th>Roadie (Managed)</th></tr></thead><tbody>
<tr><td>Time to Value</td><td>6 to 12 Months</td><td>&#x3C; 1 Month</td></tr>
<tr><td>Required Skills</td><td>TypeScript, React, Node.js</td><td>No Code / Config Only</td></tr>
<tr><td>Team Required</td><td>2–3 Full-Time Engineers</td><td>Part-time Admin</td></tr>
<tr><td>Upgrades</td><td>Manual Code Refactoring</td><td>Automated / Handled by Roadie</td></tr>
<tr><td>Search Engine</td><td>Manage your own Elasticsearch</td><td>Included &#x26; Optimized</td></tr>
<tr><td>Security / SOC2</td><td>You handle the audit</td><td><a href="https://roadie.io/blog/soc2-compliance/">SOC2 Type 2 Certified</a></td></tr>
</tbody></table><h2>7. The Opportunity Cost</h2><p>This is the cost that doesn't show up on a spreadsheet, but it's the one that kills momentum.</p><p>If your best Platform Engineers are spending their weeks debugging React components, upgrading Node.js versions, or fixing broken TechDocs pipelines, they are not building your platform.</p><p>They aren't optimizing your Kubernetes clusters, they aren't improving your CI/CD pipelines, and they aren't helping product teams ship faster. They are maintaining a CMS tool.</p><h2>Conclusion: Buy the Outcome, Don't Build the Tool</h2><p>Backstage is the standard for IDPs, but "free" software is expensive.</p><p>Roadie allows you to buy the outcome, a centralized, adopted, and secure developer portal, without paying the tax of building and maintaining it yourself.</p><p>Ready to see how Roadie can transform your developer experience?
<a href="https://roadie.io/request-demo/">Contact us</a> for a personalized demo or <a href="https://roadie.io/free-trial/">start your free trial</a> today.</p><h2>Next Steps</h2><p>If you're evaluating whether to self-host Backstage or use a managed solution like Roadie, here are some recommended resources to guide your decision:</p><ul><li><a href="https://roadie.io/docs/getting-started/overview/">Getting Started with Roadie</a> - Learn how to quickly set up your Internal Developer Portal with minimal overhead and start seeing value in days instead of months.</li><li><a href="https://roadie.io/docs/integrations/">Explore Backstage Plugins &#x26; Integrations</a> - Discover the 80+ pre-configured plugins and integrations that Roadie supports out of the box, eliminating weeks of setup and configuration work.</li><li><a href="https://roadie.io/product/tech-insights/">Tech Insights: Scorecards for Engineering Standards</a> - Understand how to define and track engineering standards across your organization without building custom tooling.</li><li><a href="https://roadie.io/docs/getting-started/scaffolding-components/">Software Templates (Scaffolder)</a> - Learn how to create golden paths to production that embed best practices and accelerate service creation.</li></ul>
]]></content:encoded></item><item><title><![CDATA[The Best Backstage Alternatives: The 2026 Buyer's Guide]]></title><link>https://roadie.io/blog/backstage-alternatives/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-alternatives/</guid><pubDate>Thu, 27 Nov 2025 10:05:00 GMT</pubDate><description><![CDATA[Backstage isn’t for everyone, and that’s okay - for some software engineering teams Backstage may just not be a good fit. From small teams that thrive on shared knowledge and simplicity, to organizations with clean, modern tech stacks that don’t face legacy complexity, we break down where Backstage might not shine. Plus, we highlight Backstage alternatives that can meet your needs until size, scale or complexity demand a more robust solution.]]></description><content:encoded><![CDATA[<p>Spotify’s <a href="https://roadie.io/backstage-spotify/">Backstage</a> defined the Internal Developer Portal (IDP) category. It proved that a central hub for services, documentation, and tooling could transform the developer experience. But as many engineering leaders discover, it is not a one-size-fits-all solution.</p><p>If you are looking for alternatives, you have likely realized that adopting Backstage is a major engineering undertaking. It requires a dedicated team, TypeScript expertise, and months of setup. For many teams, the operational overhead simply outweighs the value.</p><p>However, the problem Backstage attempts to solve is unavoidable. As organizations grow, they inevitably hit a ceiling known as the <a href="https://en.wikipedia.org/wiki/Dunbar&#x27;s_number">Dunbar Number Effect</a>. Anthropologist Robin Dunbar proposed that humans can only maintain a stable social network of about 150 people. Beyond this number, tribal knowledge evaporates, ownership becomes unclear, and the informal communication channels that worked for a startup turn into chaos.</p><p>You need a system to solve this chaos—a "system of record" for your engineering team. 
But you don't necessarily need the complexity of self-hosted Backstage to get it.</p><p>This guide covers the best Backstage alternatives for 2026. We will outline the three paths you can take, Build, Buy, or Hybrid, and help you find the right solution to conquer the Dunbar Number without drowning in maintenance work.</p><h2>The Three Paths to an IDP: Build, Buy, or Hybrid</h2><p>Before we dive into the specific tools, it's crucial to understand the three fundamental approaches you can take:</p><ol><li><strong>Build (Self-Hosted Backstage)</strong>: You take the open-source Backstage project and dedicate a team of engineers to build, customize, and maintain your own portal. This offers ultimate flexibility but comes with significant headcount and operational costs.</li><li><strong>Buy (Proprietary IDPs)</strong>: You purchase a SaaS solution from a vendor like Cortex or Port. These platforms are often quick to set up and feature-rich, but they lock you into a proprietary data model and ecosystem.</li><li><strong>Hybrid (Managed Backstage)</strong>: You use a service like Roadie, which handles the hosting, maintenance, and enterprise-grade features for Backstage. This gives you the power of the open-source ecosystem without the operational burden, offering a "best of both worlds" approach.</li></ol><h2>Backstage Alternatives: At a Glance</h2><table><thead><tr><th>Tool</th><th>Core Technology</th><th>Hosting Model</th><th>Key Strength</th><th>Ideal For</th></tr></thead><tbody>
<tr><td>Roadie</td><td>Backstage</td><td>SaaS (Managed)</td><td>The Backstage ecosystem without the overhead.</td><td>Teams who want Backstage’s open-source power but need a fast, managed, enterprise-ready solution.</td></tr>
<tr><td>Cortex</td><td>Proprietary</td><td>SaaS</td><td>Engineering metrics &#x26; scorecards.</td><td>Organizations focused on measuring and improving service quality and engineering performance.</td></tr>
<tr><td>Port</td><td>Proprietary</td><td>SaaS</td><td>Developer-friendly API &#x26; flexibility.</td><td>Teams that want to build custom workflows and integrations on a flexible platform.</td></tr>
<tr><td>Atlassian Compass</td><td>Proprietary</td><td>SaaS</td><td>Deep Atlassian ecosystem integration.</td><td>Companies heavily invested in the Atlassian stack (Jira, Confluence, Bitbucket).</td></tr>
<tr><td>OpsLevel</td><td>Proprietary</td><td>SaaS</td><td>Service maturity &#x26; reliability checks.</td><td>SRE and platform teams focused on enforcing production readiness standards.</td></tr>
<tr><td>Self-Hosted Backstage</td><td>Backstage (OSS)</td><td>Self-Hosted</td><td>Ultimate customization and control.</td><td>Large organizations with a dedicated platform team (5+ engineers) to manage the instance.</td></tr>
</tbody></table><hr><h2>The Hybrid Approach: Managed Backstage</h2><h4>Roadie</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5c4NeTlUx6yJC7iPOcVmGm/f77db8a541ec4a404faeb48ead0ca008/image6.png" alt="Roadie"></p><p><a href="https://roadie.io/">Roadie</a> isn't just an alternative to Backstage, it's a different way to <em>adopt</em> Backstage. It’s built on the belief that you shouldn't have to choose between the power of a vibrant open-source community and the convenience of a SaaS product.</p><p>As we discussed in <a href="https://roadie.io/blog/from-a-spreadsheet-and-a-usd2m-bill-why-we-built-roadie/">our founding story</a>, the "aha!" moment for Roadie was realizing that while Backstage is a fantastic framework, it requires a dedicated team of 3-12 engineers and a 6-12 month investment to become production-ready. Roadie solves this by providing a secure, scalable, and fully managed Backstage experience out of the box.</p><ul><li><strong>Best for</strong>: Teams who have decided on Backstage for its extensibility as their platform but want to accelerate time-to-value and reduce their operational burden.</li><li><strong>Key Features</strong>:
<ul><li>Get a production-ready Backstage instance in minutes, not months. Roadie handles upgrades, security, and maintenance.</li><li>Includes critical features missing from open-source Backstage, like Role-Based Access Control (RBAC), enterprise-grade search, and scorecards, from day one.</li><li>Instantly access and install all of the best open-source Backstage plugins without the hassle of rebuilding your instance.</li></ul></li><li><strong>Considerations</strong>: Because Roadie uses Backstage as its foundation, it shares the same data model and core experience. Teams looking for a completely different, highly opinionated UI may be better served by a proprietary vendor.</li></ul><h2>The "Buy" Approach: Proprietary IDPs</h2><h4>Cortex</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6kpKbNT3kLi0sMdiVGB8Pf/5cc9efe897983f2eec1deb63858ca202/image7.png" alt="Cortex"></p><p><a href="https://www.cortex.io/">Cortex</a> has established itself as a leader in the IDP space, with a strong focus on service quality, reliability, and engineering metrics. It excels at helping platform teams define standards and drive adoption through its powerful Scorecards feature.</p><ul><li><strong>Best for</strong>: Organizations focused on establishing and tracking engineering standards and service maturity.</li><li><strong>Key Features</strong>: A central inventory for all microservices, applications, and APIs; scorecards to define rules and initiatives for service health; and a scaffolder to create new services from templates.</li><li><strong>Considerations</strong>: Cortex is a proprietary platform. While powerful, you are locked into its data model.
Migrating away in the future could be a significant undertaking.</li></ul><h4>Port</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4qDkEWb3LdigA9ZUshPcuH/f2ee499d995e336c9a429da634adff7b/image4.png" alt="Port"></p><p><a href="https://www.getport.io/">Port</a> is designed for flexibility, centered around a developer-friendly API that allows teams to ingest any data and build custom workflows. It positions itself as a platform for building a developer portal, rather than a rigid, out-of-the-box solution.</p><ul><li><strong>Best for</strong>: Platform teams with strong development capabilities who want to build highly custom developer experiences and workflows.</li><li><strong>Key Features</strong>: A highly flexible "blueprint" model to define any asset, a self-service hub for custom actions, and scorecards to track quality and security metrics.</li><li><strong>Considerations</strong>: Port's flexibility is its greatest strength but can also mean a steeper learning curve and more initial setup compared to more opinionated platforms.</li></ul><h4>Atlassian Compass</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/10CvQupxOwtnSGEro8DzsB/e794bf9f32406dfc7bb71a6a50995e13/image1.png" alt="Atlassian Compass"></p><p><a href="https://www.atlassian.com/software/compass">Atlassian Compass</a> is Atlassian's entry into the developer experience space. Its primary advantage is its seamless integration with the broader Atlassian ecosystem, making it a natural choice for teams already standardized on Jira, Confluence, and Bitbucket.</p><ul><li><strong>Best for</strong>: Companies deeply embedded in the Atlassian suite of tools.</li><li><strong>Key Features</strong>: A component catalog to track ownership, health scorecards to monitor best practices, and deep, native integration with other Atlassian products.</li><li><strong>Considerations</strong>: Its greatest strength is also its weakness.
If you are not an Atlassian-centric organization, Compass may feel less integrated and compelling compared to other options.</li></ul><h4>OpsLevel</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3oexkip746yR0QnvX6RFA2/e40b0ef2d64584197a7f3c4e349fcc8d/image5.png" alt="OpsLevel"></p><p><a href="https://www.opslevel.com/">OpsLevel</a> is a mature player that focuses heavily on service ownership and reliability, making it a favorite among SRE and platform teams. It helps organizations answer the question, "Is our software ready for production?"</p><ul><li><strong>Best for</strong>: SRE-driven organizations looking to enforce service maturity standards and improve on-call processes.</li><li><strong>Key Features</strong>: A complete service catalog, an extensive library of automated maturity checks, and integrations with on-call tools like PagerDuty.</li><li><strong>Considerations</strong>: OpsLevel's focus is more on reliability and standards than on developer self-service and scaffolding, which are stronger in other platforms.</li></ul><h2>The "Build" Approach: Self-Hosted Backstage</h2><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3tptJ4QqxfK9UZlek4NUuk/09cef8a260b64ed965c17cbd041a8129/image3.png" alt="Self-Hosted Backstage"></p><p>Choosing to self-host Backstage is a significant engineering commitment. It should be treated as building an internal product, not just deploying a tool.</p><ul><li><strong>Best for</strong>: Large organizations with a well-funded, dedicated platform team (5+ engineers) that has a clear mandate to build and maintain a highly customized developer portal.</li><li><strong>Key Features</strong>: You have complete control over the code and can mold the platform to your exact specifications, and you own and control your instance and data model.</li><li><strong>Considerations</strong>: This path carries a high cost of ownership. 
As detailed in our <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/">cost analysis of self-hosting Backstage</a>, you must account for the fully-loaded salaries of a dedicated engineering team, the 6-12 month initial build time, and the ongoing operational burden of maintenance and upgrades.</li></ul><hr><h2>How to Choose the Right Path for You</h2><p>Making the right choice depends entirely on your organization's priorities, resources, and philosophy. Ask your team these questions:</p><p><strong>1. How important is the open-source ecosystem to us?</strong></p><p>If you want to avoid vendor lock-in and leverage the innovation of a massive community, your choice is between self-hosting Backstage or using a managed service like Roadie. If you prefer the all-in-one experience of a single vendor, a proprietary option like Cortex or Port is a better fit.</p><p><strong>2. What is the size and skill set of our platform team?</strong></p><p>If you have a team of 5-10 engineers with TypeScript experience and a mandate to build a custom portal, Self-Hosted Backstage is a viable path. If your platform team is smaller, or you want them focused on other priorities, Roadie or a proprietary vendor is the more efficient choice.</p><p><strong>3. 
What is our most critical "job to be done" right now?</strong></p><ul><li>If you prefer a proprietary, opinionated model for scorecards and don't mind vendor lock-in, tools like Cortex or OpsLevel offer polished but rigid solutions.</li><li>If you want to build custom workflows from scratch and are comfortable within a closed-source ecosystem, Port offers a flexible proprietary API.</li><li>If your organization exists entirely within the Atlassian suite and you don't require broad third-party integrations, Atlassian Compass is a natural extension of that walled garden.</li><li>If you want enterprise-grade features (Scorecards, RBAC, Self-Service) combined with the freedom and extensibility of the open-source ecosystem, Roadie is your answer.</li></ul><p>Ultimately, an Internal Developer Portal is a long-term investment in your developer experience. By understanding the fundamental trade-offs between the "Build," "Buy," and "Hybrid" approaches, you can make a decision that will empower your teams for years to come.</p><p>Ready to see the hybrid approach in action? <a href="https://roadie.io/free-trial/">Try Roadie for free</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Announcing the 2025 State of Backstage Report]]></title><link>https://roadie.io/blog/the-2025-state-of-backstage-report/</link><guid isPermaLink="false">https://roadie.io/blog/the-2025-state-of-backstage-report/</guid><pubDate>Wed, 26 Nov 2025 10:00:00 GMT</pubDate><description><![CDATA[Backstage has moved from early experimentation to essential infrastructure for platform engineering. To understand how teams are using it today, Roadie surveyed 105 active practitioners across industries, maturity levels, and hosting models. The 2025 State of Backstage Report captures where Backstage is delivering real value, where teams continue to face friction, and how the ecosystem is evolving as organizations scale their internal developer platforms.]]></description><content:encoded><![CDATA[<p>Five years on, Backstage has moved far beyond its experimental roots. What started as a way to catalog services is now embedded in the core of platform engineering across enterprises. Teams aren’t asking whether they should use Backstage anymore, they’re asking how to operate it well, how to scale it, and how to keep it maintainable as their environments grow more complex.</p><p>To get an accurate picture of how teams are navigating this shift, we surveyed 105 active Backstage practitioners across industries, company sizes, and maturity levels. The
<a href="https://downloads.ctfassets.net/hcqpbvoqhwhm/84OkiFFLGSnkLIipDtlvH/5711720dc369d304921507539bc4b0e1/Roadie_State_of_Backstage_2025_report.pdf">State of Backstage 2025 Report</a> brings together those insights and paints a clear picture of where Backstage excels and where teams continue to struggle.</p><p>A few themes stood out strongly.</p><p>Hosting choices are shaping outcomes more than ever. Teams running Backstage themselves describe a very different day-to-day reality compared to those using managed platforms, especially around stability and upgrades.</p><p>Upgrades remain a defining challenge. Even experienced teams report friction in keeping pace with Backstage’s release cadence, particularly when plugins and customizations accumulate.</p><p>Automation is quickly becoming the heart of successful Backstage implementations. As teams lean into templates and workflow automation, the platform shifts from a visibility layer to a genuine control plane.</p><p>And maturity isn’t simply a matter of time. Some organizations advance rapidly with the right structure and investment, while others stall even after years of use. Catalog health, governance, and sustained ownership are emerging as the real differentiators.</p><p>These findings reflect a framework entering a new phase. Backstage is stabilizing, expectations are rising, and the questions are becoming more operational and long term. The report explores these patterns in detail and highlights what separates thriving implementations from those that struggle to sustain momentum.</p><p>You can read the full State of Backstage 2025 Report here:</p><p><a href="https://downloads.ctfassets.net/hcqpbvoqhwhm/84OkiFFLGSnkLIipDtlvH/5711720dc369d304921507539bc4b0e1/Roadie_State_of_Backstage_2025_report.pdf"><strong>State of Backstage 2025 Report</strong></a></p><p>If you want to explore how these trends map to your own platform journey, we’re always happy to <a href="https://roadie.io/request-demo/">talk</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie unaffected by npm supply chain attack]]></title><link>https://roadie.io/blog/roadie-unaffected-by-npm-supply-chain-attack/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-unaffected-by-npm-supply-chain-attack/</guid><pubDate>Tue, 25 Nov 2025 12:00:00 GMT</pubDate><description><![CDATA[Recently a large-scale supply chain attack was reported targeting the npm ecosystem. Malware has been found that propagates through packages, steals credentials, and even includes a destructive “dead-man’s switch” mechanism. Roadie is unaffected by this attack.]]></description><content:encoded><![CDATA[<p>A <a href="https://about.gitlab.com/blog/gitlab-discovers-widespread-npm-supply-chain-attack/">significant</a> <a href="https://www.wiz.io/blog/shai-hulud-2-0-ongoing-supply-chain-attack">threat</a> has emerged across the npm ecosystem. Attackers are using malicious packages that infiltrate repositories, spread to other packages maintained by affected developers, harvest tokens for GitHub, npm, AWS, GCP and Azure, and include a destructive payload that triggers if the attacker infrastructure is disrupted.</p><p>As soon as these reports surfaced we initiated a full dependency review covering all our services, builds and publish pipelines:</p><ul><li>We scanned for any use of known malicious or compromised packages.</li><li>We compared our versions of all npm dependencies against published indicators of compromise.</li><li>We verified none of our services include packages with the propagation or “dead-man’s switch” behaviour described in the reports.</li><li>We maintain continuous monitoring of our package supply chain for new threats.</li></ul><p><strong>We can confirm: we are not affected by this incident.</strong></p><p>We do not use any of the compromised versions described in public reports; any overlapping package names in our dependency tree are on safe, non-compromised versions.</p><p>We will continue to 
monitor developments, update our scanning and tooling, and share any relevant security guidance. If your team is reviewing internal risks or supply-chain posture and would like supporting detail from us, please reach out.</p>
]]></content:encoded></item><item><title><![CDATA[More MCP Servers, moving to the AI SDK, en masse changes in the UI, and OpenSearch for all]]></title><link>https://roadie.io/blog/more-mcp-servers-moving-to-the-ai-sdk-en-mass-changes-in-the-ui-and-open-search-for-all/</link><guid isPermaLink="false">https://roadie.io/blog/more-mcp-servers-moving-to-the-ai-sdk-en-mass-changes-in-the-ui-and-open-search-for-all/</guid><pubDate>Mon, 01 Sep 2025 11:00:00 GMT</pubDate><description><![CDATA[August is a time for summer holidays and beaches and getting vaguely sunburnt but in a nice way. Or, if you're Roadie folks: building AI tools for Backstage. This month we integrated some MCP server tools, pushed out API token permissions for everyone, made mass updates possible, and moved OpenSearch to GA.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>More Roadie MCP Server(s)</h2><p>Last month we <a href="https://roadie.io/blog/announcing-the-roadie-mcp/">launched a set of Model Context Protocol servers</a> to provide data to any LLM-powered MCP client that can accept authenticated external servers, like Cursor or VSCode with Copilot.</p><p>This month we've extended that offering to include a few more of the bells and whistles you get with Roadie: backend config and the ability to manipulate Catalog entities via the Decorator.</p><p>For those not au fait with the current boom in MCP server creation, MCP is a protocol <a href="https://www.anthropic.com/news/model-context-protocol" title="MCP Server Announcement">created by Anthropic last year</a> to help standardise the way data is sent to LLMs and how models can access tools. 
Via an MCP server (or servers) you can give LLM-powered applications access to data and tools that the foundational LLMs could never have.</p><p>We've added two new ones:</p><ul><li>The <a href="https://roadie.io/docs/details/roadie-mcp/backend-config/" title="Roadie MCP Server - Backend Config">Backend Config Server</a> provides specialized MCP tools for managing and querying backend configuration in Roadie.</li><li>The <a href="https://roadie.io/docs/details/roadie-mcp/catalog-decorators/" title="Roadie MCP Server - Catalog Decorator">Catalog Decorators Server</a> provides MCP tools for managing catalog entity decorators/fragments in Roadie.</li></ul><p>Now, from the comfort of your IDE or MCP client, in addition to accessing the Catalog and the Scaffolder, you can:</p><ul><li>List proxies, see which ones are correctly configured, and get the path for any given proxy</li><li>List the available secrets that can be used in proxy configurations</li><li>Query and then append additional information to Catalog entities via a fragment, like new specifications, metadata, etc</li></ul><p>All you need to get going is an API token from Roadie and away you go.</p><p>Full docs for all released MCP servers can be found here: <a href="https://roadie.io/docs/details/roadie-mcp/">https://roadie.io/docs/details/roadie-mcp/</a></p><h2>API tokens for all</h2><p>We've added a new role, <code>api-token-creator</code>, which, when added to a user, allows them to (wait for it...) 
create API tokens.</p><p>This has previously been a permission attached by default to the Admin role, and customers not paying for Role-based Access Control couldn't easily delegate that permission to non-Admins.</p><p>That API token opens up all the MCP server capabilities we've been working on, so it makes a lot of sense for us to make it as easy as possible to get a token.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/W7cH5Lp7ySoYZcbvRnVrF/f26f8425f9caf6c506c2ad8e6843fb29/Screenshot_2025-12-17_at_12.23.20.png" alt="api-token-creator"></p><h2>En Masse Changes in the UI</h2><p>To make it easier to add that role to all users (and generally make mass changes for various Admin tasks) we've introduced en masse changes.</p><p>You can now make mass changes to:</p><ul><li>Add or remove roles for sets of users</li><li>Delete or clean up Locations en masse if they're no longer necessary</li></ul><p>Simple.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2834FbNNN2c2ww2pmdzXpb/2aebbd3ea3eff23055eb466bf3d7e1ae/Screenshot_2025-12-17_at_11.32.32.png" alt="En Masse changes"></p><h2>Moving to the AI SDK from Mastra</h2><p>This is a quasi-behind-the-curtain update from us, but as we move forward with AI feature building we've made the choice to move from <a href="https://mastra.ai/" title="Mastra AI">Mastra</a>, a way to build agents in TS, to <a href="https://ai-sdk.dev/docs/introduction" title="Vercel AI SDK">Vercel's AI SDK</a>, a more fully-fledged toolkit. 
Mastra wrapped AI SDK v4 and provided some added bells and whistles, but as this space moves so darn fast Vercel caught up (and actually out-paced Mastra) and we found ourselves looking longingly at the features we could unlock by adopting the AI SDK directly.</p><p>What this means for those beta testing new AI features from us is:</p><ul><li>Streaming of chats (and generally a 'hey this feels like ChatGPT or Cursor' feel)</li><li>History, UI elements and a much more dynamic chat</li><li>Chat as a pop-up or slideout (coming soon)</li></ul><h2>OpenSearch hits GA</h2><p>OpenSearch has hit GA for Roadie so now all customers can benefit from our new search backend.</p><p>We'll be making tweaks to result ranking and indexing over the next few weeks and months as we gather more feedback from users, so let us know if your results are not what you'd intuitively expect.</p><h2>Anything else?</h2><p>No.</p><p>Happy Summer everyone (or at least for those in the Northern hemisphere)! 😎⛱️</p>
]]></content:encoded></item><item><title><![CDATA[AI, IDPs and Platform Teams: What We’re Seeing]]></title><link>https://roadie.io/blog/ai-is-showing-up-in-developer-portals/</link><guid isPermaLink="false">https://roadie.io/blog/ai-is-showing-up-in-developer-portals/</guid><pubDate>Tue, 12 Aug 2025 10:00:00 GMT</pubDate><description><![CDATA[AI is already shaping internal developer portals, with platform teams testing RAG search, service ownership bots via Slack, and agent-driven automations. From the front lines, the same hurdles keep surfacing though: noisy context, hallucinations, fragmented tooling, and no governance. We take a look at how AI is affecting IDPs, and the challenges faced by the platform teams who run them.]]></description><content:encoded><![CDATA[<p>Over the past few weeks, we’ve spoken with many platform teams about how they’re experimenting with AI in their developer portals and in their workflows. While there’s a lot of experimentation in various directions, one pattern is already abundantly clear: AI isn’t a future add-on, it’s already an expectation, both for platform teams and broader engineering organizations.</p><p>But like everything in (platform) engineering, adoption isn’t linear. It’s messy, exploratory, and sometimes wildly inventive. Teams are hacking together tools, wiring up LLMs, and figuring out what “AI” actually means in the day-to-day of a developer experience and platform engineering. It’s the wild west at the moment, and we thought we’d share some of our observations from the front lines.</p><h2>What Platform Teams Are Trying</h2><p>We've seen a few consistent experiments in the wild:</p><ul><li><strong>RAG over TechDocs</strong>: Letting engineers ask natural language questions over their internal documentation using Retrieval Augmented Generation (RAG). 
More than one customer has built a version of this themselves using OpenAI and their TechDocs content, wired through the developer portal UI or third-party integration (Slack is especially popular).</li><li><strong>Slack + Catalog Q&#x26;A</strong>: “Who owns this service?” is still one of the most common questions. Several teams have built Slack bots to pull ownership annotations from the catalog, and some are layering in code insights too - like linking to recent deployments or PRs.</li><li><strong>Agent-like automations</strong>: Some teams are building agent workflows and expressing interest in agent-driven automations. For example, the idea of automatically creating Jira tickets for failing scorecards has come up in multiple conversations as a natural evolution from today’s Tech Insights visibility.</li></ul><h2>What’s Hard Right Now</h2><p>Despite the momentum, we’re also hearing consistent friction:</p><ul><li><strong>Context engineering</strong>: AI outputs are only as good as the context they get. Teams are struggling with what data to pass, how to structure it, and how to keep it fresh across multiple sources. Too much context can overwhelm the model; too little leads to incomplete answers.</li><li><strong>LLM hallucination</strong>: Without a structured source of truth, LLMs invent scaffolder actions, make up ownership metadata, or reference outdated or non-existent docs. The quality is unpredictable.</li><li><strong>Tool fragmentation</strong>: Documentation lives in TechDocs, Confluence, Notion, Google Docs. Data is siloed. One customer told us: “We want developers to be able to ask a question, and it shouldn’t matter where the answer lives - TechDocs or Confluence or Slack or wherever.”</li><li><strong>Lack of discoverability and reusability</strong>: Internal agents are often one-offs. Teams struggle to expose them in a central place. There’s no “catalog of agents” yet - just siloed agents. 
All the usual caveats here about reusability and discoverability apply.</li><li><strong>Governance and ownership</strong>: Once you’ve got three agents running, who owns them? Who updates them when the data model changes? This quickly becomes tech debt if left unstructured.</li></ul><h2>What We’re Doing About It at Roadie</h2><p>We’re leaning into this space because we believe AI + platform engineering is a natural fit. Here's how we’re helping:</p><h3><strong>1. Cataloging Agents and MCP Servers</strong></h3><p>As AI agents start creeping into more parts of the internal developer experience, answering questions, triggering automations, guiding workflows, a new kind of infrastructure is quietly taking shape. These aren’t services or systems in the traditional sense, but they’re just as operationally important. And yet, for now, most teams are flying blind.</p><p>Right now, AI agents live in the shadows. Someone built a Slack bot last quarter. Who owns it? Someone else built RAG for TechDocs. Where’s the codebase? A third team spun up a tool that opens Jira tickets based on scorecard failures. These AI tools are useful, but they’re not documented, discoverable, or governable. Nobody knows who owns them. Nobody knows if they follow security and reliability best practices, or even if they still work.</p><p>If it sounds familiar it’s because it’s the same chaos we saw with microservices before developer portals came along. So here’s the idea: what if we treat internal agents the same way we treat services or systems? What if we give them a proper home in the catalog, with metadata, ownership, links to their code, a purpose tag, and maybe even their scopes and capabilities?</p><p>We’ve already started floating this model in conversations with customers. The feedback is consistent: “Yes, please. We need to know what exists, who owns it, what it does, and whether or not we can trust it. 
It’s maybe not a huge problem now that we only have two or three, but it’s going to be an issue soon enough.”</p><p>And this isn’t just for visibility. Once agents are modeled in the catalog, a lot of downstream benefits open up:</p><ul><li><strong>Discoverability</strong>: Engineers can browse or search for “agents that help with onboarding” or “Slack bots connected to Roadie.”</li><li><strong>Governance</strong>: Platform teams can track which agents are connected to which datasets, where PII might flow, or whether an agent is deprecated.</li><li><strong>Ownership and auditing</strong>: Just like services, agents need owners. Catalog entries can help enforce that.</li><li><strong>Integration</strong>: Once agents are cataloged, they can show up in scorecards, dependency graphs, and documentation—just like any other component.</li></ul><p>This is an initial observation, but it fits a broader pattern: as AI agents become part of the developer experience fabric, they deserve the same treatment as other pieces of platform infrastructure, and to be treated like first-class citizens.</p><p>Any mature IDP can help by making agents first-class citizens in the developer ecosystem: discoverable, governed, and owned just like services. When agents are in the catalog, they’re easier to monitor, trust, evolve, and reuse. At Roadie, we’re building native support for this because we see it becoming table stakes for running AI at scale.</p><h3><strong>2. MCP Server: Making Roadie’s Metadata AI-Ready</strong></h3><p>One of the biggest issues platform teams run into when trying to use LLMs is that the data they need isn’t structured or accessible in a way that AI can reliably work with. You end up with hallucinations, brittle prompts, and tools that kind of work, sometimes, if you’re lucky. 
And that’s not good enough, especially for workflows where trust and correctness matter.</p><p>That’s why we’re building MCP (<a href="https://modelcontextprotocol.io/introduction">Model Context Protocol</a>) servers into Roadie. Roadie is already the canonical source of truth for your software, so the idea is simple: make everything inside Roadie (your catalog, your scorecards, your API specs, your scaffolder actions) available to other tools, including AI agents and copilots, through a structured and queryable API surface.</p><p>The use cases are compelling:</p><ul><li>Developers working in their IDE who can ask a copilot to “generate a Python client for the Streetlights API,” and have the LLM automatically retrieve the correct OpenAPI spec via MCP.</li><li>Slack agents that can respond to questions like “Who owns this service?” by pulling ownership directly from Roadie’s catalog, without users having to context switch and log into another window.</li><li>Agents that can look up Tech Insights data (like whether a service is failing a scorecard) and take action or report it, without needing human intervention.</li></ul><p>An IDP should act as the live, structured source of truth that AI agents and developer tools can trust, in much the same way it already serves human developers. Exposing this data through standard protocols like MCP means your automation and AI layers always work from the latest, correct information. Roadie has already begun <a href="https://roadie.io/docs/details/roadie-mcp/">implementing this</a> so our customers’ portals can be both the human and machine-facing source of truth.</p><h3><strong>3. Multi-Source RAG: Unifying Your Docs into One Knowledge Graph</strong></h3><p>Ask any platform team what slows down developer onboarding, and you’ll hear the same thing: “We have docs, but they’re all over the place.” There’s TechDocs, Confluence, internal wikis, Notion, Google Docs, even Slack threads. 
It’s no wonder teams are looking at RAG to unify this mess and make information queryable with natural language.</p><p>We’re hearing from teams who want their developers to be able to ask questions like “How do I provision a new database?” and get a real answer, without needing to know <em>where</em> that answer lives!</p><p>But this only works if the RAG system can reach across all your documentation sources. Most tools today are single-source: they might work over TechDocs, but not Confluence. Or vice versa. That’s why we’re looking at multi-source RAG support as part of Roadie’s broader AI strategy.</p><p>The intention is to build a unified knowledge graph that spans TechDocs, Confluence, and other internal documentation sources. That way, developers can get answers in any channel (Slack, their IDE, within Roadie) regardless of where the answer is, and platform teams don’t have to duplicate effort or field the same Slack questions over and over.</p><p>We’ve heard teams ask for this explicitly: “We want our engineers to ask a question once and get the right answer no matter which system it’s in.”</p><p>One of the most valuable things an IDP can do is act as the unifying layer for knowledge, irrespective of whether it lives in TechDocs, Confluence, or anywhere else. Multi-source RAG turns the IDP into a single place where developers (and their AI copilots) can find answers without hunting. Roadie’s goal is to be that connective tissue for our customers.</p><h3><strong>4. Making Scaffolding AI-Ready</strong></h3><p>Roadie’s Scaffolder is basically a UI form that does stuff - an easy way for developers to fill in parameters and create components or request infrastructure changes. But with the rise of AI-assisted development, platform teams are thinking about how the Scaffolder might be a programmable interface, something that can be invoked by agents or copilots directly from an IDE or chat environment. 
This reframing turns the Scaffolder into a backend surface, not just a frontend tool.</p><p>We’ve heard from teams who want to use LLMs to help author templates from scratch, generate the right parameters, and even guide developers through the scaffolding process conversationally. The challenge they face is that LLMs often hallucinate or suggest unsupported actions - because they don’t have access to the source of truth.</p><p>This is where Roadie can help. Since Roadie maintains the canonical configuration of Scaffolder actions per tenant, we can expose that metadata (through an MCP server) in a structured, machine-readable format. That allows developers to point their LLMs at a live, up-to-date catalog of what’s actually supported in their environment.</p><p>The result? AI that’s context-aware and grounded in real platform capabilities. Instead of manually digging through docs or trial-and-error authoring, developers get accurate suggestions, faster feedback, and the ability to trigger safer automation from within their IDEs.</p><p>Teams are imagining workflows like:</p><ul><li>“Create a new microservice” prompts in Slack that create a template based on the available actions and automatically open a PR.</li><li>IDE-based copilots that suggest valid actions and parameters based on intent as you write a new template.</li><li>AI agents that help debug broken scaffolder flows by understanding action compatibility.</li></ul><p>Any IDP can add value by making its automation surfaces - whether that’s Scaffolder, Ansible playbooks, or other tools - accessible to both humans and AI in a safe, structured way. That means exposing live metadata, enforcing supported actions, and reducing the gap between what teams <em>want</em> to automate and what they <em>can safely</em> automate. 
At Roadie, we’re starting with Scaffolder but see this as a broader pattern.</p><h2>What Comes Next?</h2><p>If your team is experimenting with AI, building internal agents, or even just asking “Where do we start?”, we’d love to chat. You’re not alone; the best practices are being invented right now, and we’d love to explore that together.</p>
]]></content:encoded></item><item><title><![CDATA[MCP Servers for Roadie, AI Search enters beta, vibe code your own UI inside Roadie, and the launch of the Scheduler page]]></title><link>https://roadie.io/blog/ai-cometh/</link><guid isPermaLink="false">https://roadie.io/blog/ai-cometh/</guid><pubDate>Mon, 04 Aug 2025 11:00:00 GMT</pubDate><description><![CDATA[AI cometh... In July we launched our MCP servers to help pull Roadie data into your LLM-powered IDEs and MCP Clients, took a step towards fully integrating RAG AI into Roadie Search, allowed customers to vibe code their own UIs and launched the Scheduler to improve transparency of the inner workings of Roadie. Oh, and updates to our Launch Darkly plugin.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>Roadie MCP Server(s) for your LLM-powered Applications</h2><p>We're <a href="https://roadie.io/blog/announcing-the-roadie-mcp/">launching a set of Model Context Protocol servers</a> that can provide data to any LLM-powered MCP client that can accept authenticated external servers, like Cursor or VSCode with Copilot.</p><p>For those not au fait with the current boom in MCP server creation, MCP is a protocol <a href="https://www.anthropic.com/news/model-context-protocol" title="MCP Server Announcement">created by Anthropic last year</a> to help standardise the way data is sent to LLMs and how models can access tools. Via an MCP server (or servers) you can give LLM-powered applications access to data and tools that the foundational LLMs could never have.</p><p>It's a client-server architecture, so you can then connect those MCP servers to whatever MCP client you wish. 
MCP clients range from chat interfaces like the Claude App to full-blown AI IDEs like Cursor.</p><p>Now you can, from the comfort of your IDE or MCP client, do all kinds of things:</p><ul><li>Access Catalog entity data, relationships, and documentation</li><li>Find, validate, and execute scaffolder templates</li><li>Discover and retrieve API documentation and specifications to build API clients</li><li>Access Tech Insights fact data (TI subscription needed, of course)</li></ul><p>All you need to get going is an API token from Roadie and away you go.</p><p>Full docs for all released MCP servers can be found here: <a href="https://roadie.io/docs/details/roadie-mcp/">https://roadie.io/docs/details/roadie-mcp/</a></p><h2>AI-enabled search [in beta]</h2><p>In 2024 we open-sourced the RAG AI plugin to make AI features in Backstage as widely available as possible.</p><p>Implementing those features in Roadie though has taken some time. We wanted to generally improve search before bringing AI into the mix. 
With OpenSearch-powered search results now fully rolled out, we turned our attention to integrating the RAG AI plugin fully into our search.</p><p>You can:</p><ul><li>Ask natural language questions of your TechDocs and Catalog, like <code>Summarise the  Streelight service API docs</code> or <code>How does the Streelight service interact with other services?</code></li><li>Focus on a specific Doc or Catalog entity to narrow results and ask detailed questions only about that specific entity.</li></ul><p>General availability is expected in September / October.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/kQjlaaYSbOQDEoRvZD9ql/07c44ebdb7270f0dbae4d461b8797108/Screenshot_2025-08-04_at_11.43.42.png" alt="AI Search"></p><h2>Vibe code your own UI inside Roadie with MDX [in beta]</h2><p>As part of a Support task, we accidentally allowed anyone to vibe code their own UI inside Roadie.</p><p>That might sound silly, but hear me out...</p><p>To cover a simple use-case of 'I just want to call an API and render some data back' we exposed an ApiViewerCard to call an API from the frontend (as long as you have a proxy configured for that call).</p><p>Then we added MDX configurability to that card and it morphed into the MdxPluginCard.</p><p><em>Then</em> we exposed some common Roadie components to make it easier to compose a card that looked and felt at home in the rest of the UI.</p><p>When you put it all together (as some early customers have now started to do) you get some fairly nice cards that do exactly what you need them to.</p><p>Not only that, given the widespread use of MDX, you can also get an LLM to vibe code them for you.</p><p>For simple use-cases where you don't want to build a full custom plugin, but you do want some ability to pull in data that is not currently supported by a card or plugin: this is your guy.</p><p>The next step in this direction? 
The MDX can be fully generated from prompts inside Roadie, so you don't even have to leave your instance.</p><p>Then fully generated entity page layouts...</p><p>The future is now...</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3aPsKXfPM4vt3i65YwMdAC/a06cabf855705a7ad877da492cebf75a/Screenshot_2025-09-24_at_16.52.11.png" alt="Simple My Groups MDX card"></p><h2>Tasks now exposed via the Scheduler</h2><p>To take a small step back from <strong>AI-AI-AI-AI, ALL THE TIME, EVERYWHERE</strong>, we also introduced a new Scheduler page to the Admin interface. This allows you to see the active jobs and re-trigger them yourself if necessary.</p><p>We're continuing to expose Admin controls like this as a necessary part of our AI work. As AI starts to take over Roadie, we want Platform teams to have more control and visibility, not less.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1zAzKM926RRHtumXzexVx6/3a836b1400b9c0e5502b1f9a9c13a3d9/Screenshot_2025-07-10_at_13.44.28__1_.png" alt="Scheduler"></p><h2>Improved Launch Darkly plugin</h2><p>The Launch Darkly plugin we spun up last year was looking a little lacklustre and so we've spruced it up.</p><p>For those that use Launch Darkly for feature flag management:</p><ul><li>There's now a card displaying environment statuses for each feature flag</li><li>Updated visual indicators for feature flag status (pills rather than text)</li><li>A general spruce of the card UI to make it easier to understand what's happening at a glance</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5A1R8o3SAbKD7ljusegTLn/7e576bef07ebb0b202443775fe5636ab/launchdarkly-plugin-card.webp" alt="launchdarkly-plugin-card"></p><p>More info can be found in the docs: <a href="https://roadie.io/docs/integrations/launchdarkly/">https://roadie.io/docs/integrations/launchdarkly/</a></p><h2>Anything else?</h2><p>No.</p><p>That was quite a lot for one month :)</p>
]]></content:encoded></item><item><title><![CDATA[Announcing the Roadie MCP Server(s)]]></title><link>https://roadie.io/blog/announcing-the-roadie-mcp/</link><guid isPermaLink="false">https://roadie.io/blog/announcing-the-roadie-mcp/</guid><pubDate>Thu, 31 Jul 2025 11:00:00 GMT</pubDate><description><![CDATA[We now expose data and functionality from your Roadie instance to MCP clients which can remotely authenticate (i.e. IDEs like Cursor or VSCode with Copilot). All protected by the same API token system that guards the Roadie API and gated behind the same user permissions a user would normally have.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>Introducing the Roadie MCP Server</h2><p>Roadie has always served a wide range of users, but first and foremost we're a software catalog and automation portal for software engineers.</p><p>As such, Roadie (and Backstage, upon whose shoulders we stand) has always been a good dashboard, acting as a single pane of glass for all kinds of fragmented dev tools and scattered pieces of documentation.</p><p>But it's always been a tool that you had to step out of your development workflow in order to access.</p><p>The classic flow for most engineers using Roadie was to either have Roadie open in a browser window as they worked, or to step out of the IDE to ask questions of the Catalog, get wider context on their work, or provision some new service using the Scaffolder.</p><p>That context switch has always been a problem we'd like to solve.</p><p>Luckily the answer is here: it's the Roadie Model Context Protocol (MCP) Server(s).</p><h2>What's an MCP Server?</h2><p><a href="https://docs.anthropic.com/en/docs/claude-code/mcp">MCP is a protocol</a> introduced by <a href="https://www.anthropic.com/news/model-context-protocol">Anthropic in November 2024</a>.</p><p>The protocol itself is based on a simple client-server architecture. 
MCP Clients - like the new wave of IDEs (Cursor, Windsurf, VSCode with Copilot, etc) - act as clients that can call MCP servers to get additional information.</p><p>MCP servers allow the LLMs at the heart of newer software development applications to receive additional, specific context about the query that a user is making which wouldn't be well represented in the foundational model. MCP servers are therefore great for enriching responses when a base LLM simply wouldn't be able to generate a good enough answer.</p><p>For example, if you ask ChatGPT or Claude to generate you a <a href="/backstage-spotify/#the-main-features-of-backstage-by-spotify">Backstage Scaffolder Template</a> it will do a good-ish job because it was trained on enough open-source templates to do that. The format will be largely correct, for example. But, it will also hallucinate actions because it thinks they might logically exist. It also won't know which scaffolder actions you have installed. This isn't great if you want to actually run that template or automate a development flow involving the scaffolder.</p><p>If you tried the same case again but this time with an MCP which returned useable scaffolder actions upon request, then the LLM has a much better chance of generating a functional Template.</p><h2>How can I use it?</h2><p>The Roadie MCP server works with any tools that can act as an MCP client, including Claude Desktop, VSCode Copilot, and Windsurf Editor.</p><p>If you have your own MCP client with the capability to interact with remote MCP servers with authentication headers then this should also work.</p><p>You'll need a Roadie API key. Admins have access by default. There's a role in Roadie which, when enabled, can give users the ability to create their own API key.</p><p>Docs on how to get started can be found here:
https://roadie.io/docs/details/roadie-mcp/</p><h2>What can the Roadie MCP servers do?</h2><p>With the four MCP servers we've currently exposed, you can:</p><ul><li>Access catalog entity data, relationships, and documentation</li><li>Discover and retrieve API documentation and specifications</li><li>Find, validate, and execute <a href="https://roadie.io/backstage-spotify/#the-main-features-of-backstage-by-spotify">Backstage scaffolder templates</a></li><li>Access operational metrics, security data, and compliance information from Tech Insights (if you're a Tech Insights customer)</li></ul><h3>Example: OpenAPI specs.</h3><p>By asking questions in your MCP client, you can integrate two systems, assuming the one you're integrating with exposes OpenAPI specs documented in Roadie.</p><p>You would say something like this to your LLM:</p><p><code>“I need to integrate with our User Management system”</code></p><p>The MCP client will then make a call to our MCP server. The steps will look something like this:</p><ul><li>Searches for user-related APIs using find-api-specs</li><li>Retrieves specifications for relevant APIs</li><li>Explains available endpoints, authentication, and schemas</li><li>Provides integration guidance and code examples</li></ul><p>You'll then get an implementation of an integration between your current code base and the User Management system, that takes into account how the User Management API actually works.</p><h3>Example: Creating a new project using the scaffolder</h3><p>You would say something like this to your LLM:</p><p><code>“Create a new React frontend application”</code></p><p>The MCP client will then make a call to our MCP server. 
The steps will look something like this:</p><ul><li>Searches for React templates using find-scaffolder-templates</li><li>Shows available templates and their requirements</li><li>Guides user through providing necessary parameters</li><li>Validates inputs using validate-template-values</li><li>Executes template using run-scaffolder-template</li><li>Monitors progress and reports results</li></ul><p>You'll then get a complete application based on the provided template.</p><h2>How do authentication &#x26; permissions work?</h2><p>Getting permissions right in the context of MCP Servers, and AI and agentic systems more broadly, can be tricky.</p><p>To solve that for the Roadie MCP Server(s), we've architected the Roadie MCP to have access to the authentication and permissions that the same user would have in the web app. We do that by combining two things:</p><ul><li>Roadie RBAC plugin</li><li>Roadie API using user-scoped tokens</li></ul><p>A user generates an API token in the web app for use in their MCP Client of choice. That token inherits the permissions that the user has. It's evaluated on each call to ensure the permissions stay in sync. Tokens expire after 1 year.</p><h2>When will it be generally available?</h2><p>Folks who have opted in to our early-access AI features will have access shortly. After that we'll begin a wider rollout. Existing customers will be sent a message in their communication tool of choice (Slack or Teams) to onboard them.</p>
]]></content:encoded></item><item><title><![CDATA[Unified Search and Roadie Local enter beta, extended GitHub & GHES support, OpenSearch for all, and new plugins arrive]]></title><link>https://roadie.io/blog/roadie-local-enters-beta-extended-github-and-ghes-support-new-search-for-all/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-local-enters-beta-extended-github-and-ghes-support-new-search-for-all/</guid><pubDate>Mon, 30 Jun 2025 11:00:00 GMT</pubDate><description><![CDATA[We're introducing search improvements (both in the underlying search engines we use as well as the UI), preparing the way for AI feature rollouts this summer, supporting more SCM combinations and beta testing Roadie Local. Oh, and the Rootly and Terraform plugins to enhance your developer portal.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>OpenSearch for all</h2><p>The OpenSearch rollout began last month and has now reached general availability.</p><p>If you missed it last month that means:</p><ul><li>Casing is better handled: camelCase, kebab-case and snake_case are factored in, so the user doesn’t need to know which was used when they’re searching</li><li>Partial search terms and descriptions are better handled: for example, skipping some-word in camel-some-word-case and just typing camel case will still return it as a result.</li><li>Word fragments allow for partial word matches. 
We’re currently using 4-7 chars for the ngrams, so fragments of words should be well matched</li></ul><p>But that's not the only thing happening with search...</p><h2>Unified Search enters [beta]</h2><p>We've also tweaked the UI to create a new search UI that captures more information and provides a generally slicker interaction with search.</p><p>The new UI allows you to:</p><ul><li>Use keyboard shortcuts (Cmd-K from anywhere in the app to bring it up)</li><li>Navigate as well as search (try searching for “Catalog”, for example)</li><li>Do things a little faster than before (performance is improved)</li></ul><p>Unified search not only enables greater discoverability within Roadie, but is also our first step in enabling RAG AI-enhanced searching of the Catalog and Tech Docs. In fact, it'll be the entry point for a few different AI features we're dreaming up.</p><p>We're already seeing search volume and click-through to an entity that then doesn't result in a new search (i.e. "Yay, I found what I was looking for") go up as a result.</p><p>More on that over the next couple of months as our AI journey continues...</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6OJVV0cUIceAdHybGS9UU2/bea134385601142396d69b0fae251469/Screenshot_2025-07-31_at_18.02.37.png" alt="Unified Search"></p><h2>Supporting both GitHub.com &#x26; GitHub Enterprise Server</h2><p>As our customers grow, so does the number of different SCMs deployed in any given organisation.</p><p>For a while we've dealt with the GitHub Cloud &#x3C;> BitBucket combo.</p><p>Now we can handle the GitHub.com &#x3C;> GHES combo too. 
Aside from setting up both SCMs there's nothing additional you'll need to do - it's all handled by a new GitHub switcher.</p><p>Nice.</p><h2>Roadie Local [beta] begins</h2><p>Roadie Local, the locally runnable version of Roadie you can deploy on your infrastructure, has entered beta.</p><p>We weren't expecting a long beta but who knows when you get into the weird and wacky world of on-prem deployment. More to come here I'm sure...</p><h3>Rootly Plugin</h3><p>The <a href="https://github.com/rootlyhq/backstage-plugin">Rootly plugin</a> recently released v1.0.0 of their plugin, which we've dutifully upgraded. It comes with a lot of new fun stuff, including the ability to:</p><ul><li>View and search a list of entities</li><li>View and search a list of services</li><li>View and search a list of functionalities</li><li>View and search a list of teams</li><li>View and search a list of incidents</li></ul><p>Rootly &#x3C;> Roadie customers: enjoy. https://roadie.io/docs/integrations/rootly/</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3cCnt0Yr0TzrQ2dHmr9iQ2/a37148a822bcd486481951560ce1af28/rootly-global-page.png" alt="rootly-global-page"></p><h3>Terraform Plugin</h3><p>We also introduced the <a href="https://github.com/globallogicuki/globallogic-backstage-plugins/tree/main/plugins/terraform">Terraform plugins</a> to help Terraform Cloud users manage their infrastructure.</p><p>Docs on it can be found here: https://roadie.io/docs/integrations/terraform/</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3AuiZsq4pPuF0ovgvy3g7J/0b61062d38262fff21badf5d8d26c1ec/terraform-resources-tab-1.webp" alt="terraform-resources-tab-1"></p><h3>Upgrade to v1.39</h3><p>Last but not least, Roadie is now on <a href="/backstage-spotify/">Backstage</a> version 1.39.</p><p>Full change notes for that update can be found here: https://backstage.io/docs/releases/v1.39.0/</p>
]]></content:encoded></item><item><title><![CDATA[AI is coming, but first it's Kratix, Kubernetes and Crossplane]]></title><link>https://roadie.io/blog/ai-but-first-kratix-kubernetes-crossplane/</link><guid isPermaLink="false">https://roadie.io/blog/ai-but-first-kratix-kubernetes-crossplane/</guid><pubDate>Sat, 31 May 2025 11:00:00 GMT</pubDate><description><![CDATA[We're diving deep into AI over the next few weeks, but we've also shipped a bunch of incredibly powerful plugins to help increase the ease of ingestion of resources into the Catalog. First and foremost: Kratix and Kubernetes/Crossplane.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>AI is coming...</h2><p>We've been playing around with AI features in Roadie for around 18 months now, ever since we open-sourced the RAG AI plugin way back in 2024.</p><p>We've struggled to understand <em>how</em> though.</p><p>A breakthrough came this month as we got stuck into the developments that have taken place in late 2024 and early 2025, notably the improvements in foundational models and in the introduction of the Model Context Protocol (MCP) paradigm.</p><p>We'll be pulling at a few different threads over the next few weeks and months to see what's feasible in the short-term, so watch this space for things like:</p><ul><li>RAG-AI-powered search</li><li>MCP servers to return data and actions from your Roadie instance</li><li>Integration of common (but complex) actions into the UI, powered by AI</li></ul><h2>Plugins, plugins, plugins</h2><p>In the meantime, we've introduced two substantial plugin improvements into Roadie that radically improve ingestion of entities.</p><h3>Kratix</h3><p>The Kratix Platform has grown in popularity recently as more folks a) adopt the idea of an internal platform and b) want an open-source alternative to the proprietary platforms that exist out on the market.</p><p>That's something that we want to support and foster. 
We're big believers in open-source, as you'd imagine given we're built on top of an open-source product and regularly contribute to various OSS projects.</p><p>Kratix plugins are now available in Roadie. You'll need to be a paid Kratix user to access them, as is the model for the folks behind Kratix (Syntasso, in case you were wondering), but we now support:</p><ul><li>Kratix FE plugin</li><li>Kratix BE plugin</li><li>Kratix scaffolder actions</li></ul><p>Enjoy. https://roadie.io/docs/integrations/kratix/</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7CqaBmAmIUewqmULSWcp25/fd8c437e9d25e94ecec899b374957ecc/kratix-promise-overview.webp" alt="kratix-promise-overview"></p><h3>Terasky K8s Ingestor and Crossplane plugins</h3><p>We also folded many of <a href="https://github.com/terasky-oss/backstage-plugins">Terasky's Kubernetes plugins</a> into Roadie.</p><p>Terasky have managed to thread the needle with improving the way Backstage interacts with Kubernetes and done so in order to ingest entities directly from K8s clusters.</p><p>They've also open-sourced Crossplane plugins to allow Backstage to:</p><ul><li>Ingest Kubernetes Components</li><li>Ingest XRDs</li><li>Ingest Claims</li></ul><p><a href="/docs/integrations/kubernetes-ingestor/">Check out our docs here</a>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/64fK6ovlwEoFjaGycldTCd/8a295b58e84869301600eedf28790031/Screenshot_2025-07-31_at_13.59.09.png" alt="crossplane-1"></p>
]]></content:encoded></item><item><title><![CDATA[Improving Search, some mobile UX tweaks, and Tech Insights historical data ingestion]]></title><link>https://roadie.io/blog/improving-search-mobile-ux-tech-insights-historical-data/</link><guid isPermaLink="false">https://roadie.io/blog/improving-search-mobile-ux-tech-insights-historical-data/</guid><pubDate>Wed, 30 Apr 2025 11:00:00 GMT</pubDate><description><![CDATA[April was all about getting data in and out of the Catalog. We improved search by introducing a new search engine, tweaked the mobile UX of Roadie to make it more usable (at 4am when you need to grab some techdocs for an incident), and extended the limits on historical data ingestion for Tech Insights. Oh, and introduced the DX plugin to make syncing data between DX <> Roadie easier. A busy month.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>Introducing OpenSearch for better discoverability</h2><p>As the volume of data in the Catalog grows, we often see customers hit a discoverability problem: how do you find items when the Catalog is tens of thousands of often very similar-seeming entities?</p><p>Well: search. Duh.</p><p>But, how do you make sure search results are fast, accurate and scalable? <a href="/backstage-spotify/#the-main-features-of-backstage-by-spotify">Backstage</a> supports different search engines, but the better ones (Elastic-like ones like OpenSearch) tend to get expensive at scale, especially when you run almost isolated stacks for each customer as we do.</p><p>This month we dedicated a bit of time to figuring out how to do it at scale, and voilà! 
Better search has arrived!</p><p>We stripped out the old search engine and brought in <a href="https://github.com/opensearch-project/OpenSearch" title="OpenSearch">OpenSearch</a>, an Elasticsearch-like search engine (an open-source one, of course).</p><p>That means you get a few improvements out of the box:</p><ul><li><strong>Casing is better handled</strong>: camelCase, kebab-case and snake_case are factored in, so the user doesn't need to know which was used when they're searching</li><li><strong>Partial search terms and descriptions are better handled</strong>: for example, skipping some-word in camel-some-word-case and just typing camel case will still return it as a result.</li><li><strong>Word fragments allow for partial word matches</strong>: We're currently using 4-7 chars for the ngrams, so fragments of words should be well matched</li></ul><p>We have also bought ourselves a lot more headroom here for optimisation - we'd reached the end of the line with pg_search after many years of trying to force it to be the search engine that we'd have liked it to be. OpenSearch has a lot more we can tweak.</p><p>So what's next? We'll be rolling out OpenSearch progressively to customers, starting this month.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2Hwot7Cu9KMpyMuWlPh0Jq/c3adca4bf4296b8a20ef700d8e2b3d50/Screenshot_2025-05-16_at_09.45.01.png" alt="opensearch"></p><h2>Mobile UX gets some love</h2><p>Being able to search for something in the Catalog is one thing, but if you can't readily view it or the experience is poor then that's not much use.</p><p>Roadie is largely optimised for web (big tables of software == a laptop or desktop-based experience is always going to be best) but there's a baseline for usability that should be met to make some key use-cases function.</p><p>This month we focused on the use of tech docs on mobile. 
If it's 4am, you've just been woken up by an incident, and you need to quickly read a runbook to see what to do next: probably best to not have to get your laptop out when you have your phone <em>right there</em>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6Pqgtst243DM9CBIaW7kYB/1b21f5294ab987925961064b3d548df9/image__42_.png" alt="mobile-ux"></p><h3>Tech Insights historical data ingestion gets 12x more data-ness</h3><p>Tech Insights push-based Data Sources allow for historical data ingestion for our customers when migrating from other systems to Roadie. That's been in place for a while, but this month we bumped the volume of data you can ingest.</p><p>Over the last few months we've seen more and more customers want to ingest <em>a lot</em> of data. Part of that is the growing maturity of scorecards as a concept within the wider industry, and part is the wider shift towards managed Backstage as the IDP of choice for many orgs.</p><p>Down to brass tacks though: the push-based data source can now ingest <strong>one year</strong> of data. Previously it was 1 month.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3CHEucBSQ0uXxGicqtAzJu/c965b0955ed12f5fcc68d20fa45a07fc/Screenshot_2025-05-16_at_09.49.47.png" alt="push-based-historical"></p><h3>DX Plugin</h3><p>Last but not least, we've introduced the DX plugin to Roadie to help sync the Catalog between the two.</p><p>More details here for those that are interested: https://roadie.io/docs/integrations/dx/</p><p>This is something we're expecting to see a lot of in the coming months: dev tools that need a software Catalog simply sync to a Catalog like Roadie/Backstage rather than build it themselves. 
Tools like DataDog and Sentry are dipping their toes into the software Catalog world but it's a tricky business keeping up to date with all the different sources of truth and ingestion mechanisms: that's why open-source tools like Backstage will always have the edge.</p><p>More on this trend on the <a href="https://thenewstack.io/the-hidden-costs-of-multiple-service-catalogs-in-development/" title="Hidden Cost of Multiple Service Catalogs">New Stack</a> for those that are interested.</p>
]]></content:encoded></item><item><title><![CDATA[Entering the Danger Zone: Multiple Software Catalogs]]></title><link>https://roadie.io/blog/entering-the-danger-zone-multiple-software-catalogs/</link><guid isPermaLink="false">https://roadie.io/blog/entering-the-danger-zone-multiple-software-catalogs/</guid><pubDate>Wed, 09 Apr 2025 23:00:00 GMT</pubDate><description><![CDATA[Developer tooling often requires service catalogs to scope the data that is created within them, especially when that tooling relates to every piece of software created by your organization.
Because of this, you often end up with multiple different service catalogs for services you’ve bought. Roadie (and Backstage more broadly) can help.]]></description><content:encoded><![CDATA[<blockquote><p>A version of this article appeared on the New Stack: https://thenewstack.io/the-hidden-costs-of-multiple-service-catalogs-in-development/</p></blockquote><p>Developer tooling often requires service catalogs to scope the data that is created within them, especially when that tooling relates to every piece of software created by your organization.
Because of this, you often end up with multiple different service catalogs for services you’ve bought.</p><p>Commonly, you’ll see a service catalog at the heart of classes of tools like:</p><ul><li>Data and observability, like DataDog and New Relic</li><li>Incident management, like Incident.io, PagerDuty, and FireHydrant</li><li>Developer Experience, like DX</li></ul><p>Manually creating a catalog in these applications takes valuable time away from Platform and Software teams.</p><p>It’s also not the core of any of these services, so the quality of the catalog ingestion, visualization and manipulation is rarely optimal. Some form of a catalog describing the types of software built by an organization is required to make them work effectively for your organization, but that doesn’t mean that that piece of software is focused on creating a fantastic ingestion mechanism for catalog creation.</p><p>You can easily sleepwalk into having multiple service catalogs in multiple places with multiple scope levels. This is inefficient, and these catalogs quickly fall out of sync.</p><p>It’s a pain.</p><h3>Why you should use an IDP to build one catalog</h3><p>To avoid multiple catalogs you need a single source of truth. One catalog to rule them all.
There’s a strong case for an Internal Developer Portal (IDP) to be that catalog.
IDPs like Backstage, Port, and Cortex are all, at their core, software catalogs. They have some other important features (scorecards, some automation runner, etc.) but the bread and butter is making a service catalog easy to create, configure, and use.</p><p>Information from various systems is surfaced to development teams to create a single pane of glass. In building an IDP, organizations inherently create an integration and enrichment point for data about their software, which in turn can be used as part of a wider and more complex data flow.</p><p>Think in terms of data flows:</p><ul><li>Metadata about software goes in, either auto-ingested or manually added.</li><li>Rich objects are created for each piece of software.</li><li>Structural information is defined and included as part of the data model for that catalog, allowing a graph of software to be constructed of pieces of software and how each one relates to other pieces.</li></ul><p>That catalog is then a rich store of information about the software you have built. It’s just a single step away from being the source of truth for that information to other services that require it.</p><h3>Enter: Roadie</h3><p>Backstage comes with a few built-in advantages as an IDP that help it excel at this use-case.
As the dominant IDP on the market, it garners a lot of support from the third-party service providers you then need to connect to, as well as providers of catalog information (like AWS, who are particularly active plugin developers):</p><p>Plugin ecosystem. Third parties are constantly building new options for supporting this use-case. These plugins support either the visualization of information in the catalog or often, more crucially, the ingestion or extraction of catalog data from Backstage.</p><p>Auto-ingestion. For example, AWS recently released a plugin that supports auto-ingestion of resources like S3 buckets and RDS instances, which makes completing your software catalog much easier than using another service.</p><p>Ease of editing. Backstage comes with a slew of simple enrichment options, leaning heavily on democratically edited yaml files in a format that</p><p>Extraction of data. The Backstage Catalog API and plugin ecosystem make it easy to get data out of Backstage when you’re ready to connect to a third-party system.</p><h3>How to use the Roadie Catalog as a source of truth</h3><p>Let’s take a look at how this can be done, with examples from incident management, data visualisation and developer experience:</p><ul><li>DataDog</li><li>Incident.io</li><li>DX</li></ul><h4>DataDog</h4><h5>Using catalog-info.yaml files</h5><p>The core of the Backstage software catalog is a series of yaml files stored alongside code in your source code management (SCM) tool of choice (Backstage supports them all). These are often simply referred to as catalog-info.yaml files. They’re basically just service metadata and reference keys to other services.
DataDog maintains its own ingestion mechanism that uses these catalog-info.yaml files to ingest Catalog information. The integration constantly scans repositories in your SCM for Backstage YAML files named service.datadog.yaml and catalog-info.yaml, which you create when you add your service to the Backstage Software Catalog. The code snippet below shows an example of catalog-info.yaml.</p><p>You’ll need to enable the GitHub integration for this example.</p><h5>Using DataDog’s API</h5><p>You can also <code>POST</code> Backstage YAML files to the Datadog API. This allows you to programmatically send Backstage service definitions that may not exist in your GitHub repositories. The Backstage Catalog API can respond with your whole Catalog (or just a subset of it), so syncing the two is possible using this route.</p><p>https://www.datadoghq.com/blog/service-catalog-backstage-yaml/</p><h4>Incident.io</h4><p>Incident.io maintains a variety of different ways to connect their internal software catalog to sources of truth.</p><h5>Using catalog-info.yaml files</h5><p>Incident.io works in a similar way via their catalog-importer.
The catalog-importer is a little more involved though, so it’s worth taking a look at.
The importer can pull data from a variety of sources, “catering for all the ways people normally store their catalog data” as they so delightfully put it.
One option is GitHub. This works in much the same way as the DataDog ingestion mechanism outlined above.</p><h5>Using the Catalog API</h5><p>Another option is to read Catalog information directly from Backstage itself, via the Backstage Catalog API. This in essence makes a GET /entities call to your Catalog and retrieves information directly. You can filter that as you see fit to make sure you’re only extracting the subset of information that’s relevant for Incident.io.</p><h4>DX</h4><p>DX takes a different approach. They’ve built a full Backstage backend plugin to handle the extraction of data from Backstage.</p><h5>Using a Backstage backend plugin</h5><p>The DX Backstage backend plugin sets up jobs within Backstage to sync the DX catalog.
Those jobs make a call to the DX API in order to send Catalog information.
As this can be a lot of data (at Roadie we routinely see Catalogs with 300k entities), you probably want to use the optional params for filtering. You can set these in your app-config.</p><p><strong>app-config.yaml</strong></p><pre><code>dx:
  catalogSyncAllowedKinds: [API, Component, User, Group]
</code></pre><p>You may also want to control the schedule of the sync, so as not to spam your Catalog. Again, just a bit of config in app-config.</p><p><strong>app-config.yaml</strong></p><pre><code>dx:
  schedule:
    frequency:
      minutes: 45
</code></pre><h5>Using the Roadie API</h5><p>At Roadie we run a managed SaaS version of Backstage for many different customers. We often are asked how to make it as easy as possible to use Backstage as a source of truth for other systems.
We spend a lot of time making it as simple as possible to take catalog data out of Backstage and use it meaningfully in other applications and workflows.</p><p>To help, we expose several endpoints to allow easy syncing with different systems (either to ingest new information or pull catalog information out):</p><ul><li>Catalog endpoints: the <code>/entities</code> endpoint allows you to query the Catalog API and programmatically access your software catalog in its entirety, and a <code>/fragment</code> endpoint allows you to sync different third-party systems with the Catalog (e.g. ingest Slack handles for your users) and fluidly update your data as you see fit</li><li>A set of endpoints for scorecards and software standards</li><li>A set of endpoints to expose Scaffolder template information</li></ul>
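<p>To make the <code>/entities</code> route a little more concrete, here is a minimal Python sketch of how a filtered Catalog query URL might be built. It leans on the <code>filter</code> and <code>fields</code> query parameters of the Backstage Catalog API; the base URL and the <code>build_entities_url</code> helper are hypothetical placeholders, and authentication (for example, sending an API token) is deliberately left out, so treat it as an illustration rather than the definitive way to call the Roadie API.</p><pre><code>from urllib.parse import urlencode


def build_entities_url(base_url, filters, fields=None):
    """Build a Catalog API /entities URL (hypothetical helper).

    Each dict in `filters` becomes one `filter` query parameter.
    Keys inside a single dict are ANDed together by the Catalog API;
    separate `filter` parameters are ORed.
    """
    params = []
    for group in filters:
        conditions = ",".join(f"{key}={value}" for key, value in group.items())
        params.append(("filter", conditions))
    if fields:
        # Only return the listed fields, which keeps large syncs lighter.
        params.append(("fields", ",".join(fields)))
    return f"{base_url.rstrip('/')}/entities?{urlencode(params)}"


# Placeholder host: swap in the Catalog API base URL of your own instance.
url = build_entities_url(
    "https://backstage.example.com/api/catalog",
    [{"kind": "Component", "spec.type": "service"}],
    fields=["metadata.name", "spec.owner"],
)
print(url)
</code></pre><p>Narrowing the query with <code>filter</code> and trimming the payload with <code>fields</code> matters most for the very large Catalogs mentioned earlier, where pulling every entity in full on each sync would be wasteful.</p>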
]]></content:encoded></item><item><title><![CDATA[An all new (paginated) Catalog, the All Tab, and changes to the Admin area]]></title><link>https://roadie.io/blog/pagination-the-all-tab-and-changes-to-the-admin-area/</link><guid isPermaLink="false">https://roadie.io/blog/pagination-the-all-tab-and-changes-to-the-admin-area/</guid><pubDate>Mon, 31 Mar 2025 11:00:00 GMT</pubDate><description><![CDATA[Pagination and Catalog filtering labs, the All tab has arrived to make the Catalog even more composable, and the Admin area is getting a glam-up.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>Performance Improvements: Catalog Pagination &#x26; Service Workers</h2><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2pcYH9beU2ON2DVv8JoQ6/1d9da37f8a1eb5fe5cdaaf2cadc1c705/Screenshot_2025-03-19_at_10.35.55.png" alt="Paginated Catalog"></p><p>We mentioned in the previous Changelog that we've been working on a buffet of different improvements to both performance and stability to enable us to handle large Catalogs.</p><p>Top of mind this month has been performance of the backend and response time when users make simple requests of the Catalog. That means: pagination time.</p><p>The first pass of pagination has proven we can dramatically reduce page load times without impacting the current UI or UX. For the first customers to receive a paginated Catalog we're already seeing a minimum of ~300ms decreases in load time for smaller Catalogs but at least 66% for XL Catalogs (i.e. 100s of thousands of entities).</p><p>This is comparing the P95 scores for page load that we see across all our customers. Red is the rollout where we migrated a few customers over but some customers remained on the older, unpaginated Catalog. 
Green is the period where the paginated Catalog was fully rolled out.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1utLYqzWF67O6Z0rPFNoXv/db92c0f64d930a29ee28247eec66d4b0/pagination.png" alt="pagination"></p><p>We've also tweaked the filtering UI to handle more filters, while disabling some (Some of the in-table filtering and sorting options are not possible with a paginated catalog).</p><p>And last but not least we’ve also changed the UI a little around filtering to clean up how crowded it can become if you have a table with lots of potential filters in place.</p><p>We're also separately using service workers to pre-load some elements of the application to further speed up page load. More on that to come.</p><p>The rollout of the paginated Catalog has already begun, so more on this soon.</p><p>Fun stuff!</p><h2>All hail the All tab</h2><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3GUcnW226MOmK5GmTgBJI8/5e8094881cb75eb310e77e89831c1248/Screenshot_2025-03-19_at_10.35.51.png" alt="All tab"></p><p>On a similar theme, the Catalog has also received a major update in the form of an All tab.</p><p>About a year ago we started the effort to make the Roadie Catalog as composable as possible.</p><p>The Backstage Catalog is extremely extensible and configurable, but sometimes the UI belies that flexibility and makes it feel a little restrictive.</p><p>We wanted to highlight the power and extensibility of Backstage in the Roadie-fied version, but also make it significantly easier to make changes to the Catalog.</p><p>That's why we have Catalog Tabs (for making easy filters of the Catalog) and the Decorator (to make changes to entities via the UI or the API, without having to use YAML).</p><p>One issue with this novel approach is it created a long-tail of issues with plugins upstream. OSS plugins sometimes assume things like a Domain filter can be applied over the Catalog and a filtered list of Domains will be presented to the user. 
We don't really have that concept by default (and we allow users to remove it) so we've hit a few bugs over the last few months.</p><p>The All tab solves that.</p><p>The All tab shows you your whole, unfiltered Catalog. You can apply filters over the top of it, as you like. Previous cards that linked through to filtered lists of Domains, Systems etc, will now land here.</p><p>We like it a lot - we hope you like it too.</p><p>Enjoy!</p><h3>Admin Area glam-up [in beta]</h3><p>The Admin area has also seen some significant changes this month.</p><p>The old Admin area had grown into a bit of a sprawl in recent times. 70+ plugins and integrations, all mixed together, don't lend themselves to a clean and easy-to-navigate UX.</p><p>We decided a revamp was in order. We focused on the discoverability of plugins - namely moving to a card-based layout and refreshing the search. We also took the opportunity to make changes to the information architecture of this area to chunk up the config options into different tabs.</p><p>Next up we'll be standardising each page to make it easier to add secrets, see which of the cards and tabs that plugin exposes are being used, and understand at a glance how it is configured. After that we'll be looking at how simple health checks and status indicators for each plugin can give greater feedback to users about the health of that connection.</p><p>Enjoy.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4Vo2wB73BvaBREyxvsO49U/1f1311aaf069340c2914107da1d5e4f8/Screenshot_2025-03-19_at_10.34.36.png" alt="Admin Area"></p>
]]></content:encoded></item><item><title><![CDATA[From Day 0 to Day 2 - a Guide to Planning and Implementing Backstage ]]></title><link>https://roadie.io/blog/from-day-0-to-day-2-a-guide-to-planning-and-implementing-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/from-day-0-to-day-2-a-guide-to-planning-and-implementing-backstage/</guid><pubDate>Thu, 13 Mar 2025 10:00:00 GMT</pubDate><description><![CDATA[Backstage is powerful, but getting it right takes work. We break down the Day 0, Day 1, and Day 2 journey of implementing Backstage, covering the planning, deployment, and long-term maintenance challenges that teams face. From the hidden complexities of catalog population to the skill gaps in platform teams, we explore what it really takes to make Backstage successful.]]></description><content:encoded><![CDATA[<p>Backstage’s biggest strength is its <a href="https://roadie.io/blog/the-power-of-customization-making-backstage-work-for-you-with-roadie/">flexibility</a>. It’s an open and highly extensible platform, designed to be <a href="https://roadie.io/blog/understanding-the-backstage-system-model/">flexible</a> enough to adapt to the way your engineering team works. With the right setup, it can become the backbone of your internal developer portal, streamlining service discovery, documentation, self-service workflows and <a href="https://roadie.io/blog/how-to-define-engineering-standards/">software standards</a> governance.</p><p>But that same flexibility comes with a cost. Backstage doesn’t come pre-configured for your organization’s needs - it’s a toolkit; more a framework for <strong>building</strong> an internal developer platform than a platform in and of itself. 
Moreover, it requires customization and ongoing maintenance, and crucially, it demands <a href="https://roadie.io/blog/backstage-how-much-does-it-really-cost/#breaking-down-the-backstage-tco">TypeScript expertise</a> - a skillset that many platform teams don’t typically have in-house. Unlike infrastructure-as-code or cloud automation tools (which use more familiar Python or Go), Backstage development requires working directly with a React and TypeScript codebase. This mismatch in skillsets is one of the biggest hurdles teams face when trying to operationalize Backstage, and it’s a key reason why many underestimate the effort required - especially beyond the initial deployment.</p><p>It’s helpful to think of the Backstage journey following a lifecycle: the planning and setup phase (Day 0), the deployment and integration work (Day 1), and the ongoing operations, scaling, and optimization (Day 2 and beyond). Each of these phases comes with its own challenges, and if you’re not thinking ahead, it’s easy to get stuck and see momentum stall.</p><h2><strong>Day 0: Planning and Strategy - Laying the Foundation for Backstage</strong></h2><p>Before diving headfirst into writing YAML, it makes sense to ensure Backstage is set up to solve real problems for your engineering teams. Many organizations install Backstage expecting quick wins, only to find that a half-filled service catalog and a couple of integrations deliver little immediate value. A successful implementation starts with understanding what engineers actually need from Backstage - whether it’s faster onboarding, better service discoverability, or standardization across teams. Without this clarity, adoption stalls before it even begins.</p><p><strong>Key considerations for Day 0</strong></p><p><strong>Defining success.</strong> What problem are you solving? Service observability? Developer self-service? Compliance automation? What’s the burning problem that you’re hoping an IDP will solve? 
Speak to your broader engineering organization and find out if the major source of frustration is a lack of centralized information around services, painful and lengthy environment creation times, or a lack of engineering standards. Knowing what problem you’re solving helps prioritize your Backstage implementation and rollout.</p><p><strong>Building organizational alignment.</strong> One of the most underestimated challenges isn’t technical at all - it’s cultural. Backstage success relies on organizational buy-in, and many teams struggle with adoption of the developer portal concept itself. Engineers may be skeptical about whether Backstage (or any portal) will truly save them time or improve their workflows. A strong internal advocacy plan is just as important as a technical implementation plan. As such, it’s important to ask: who owns the portal? Which teams need to be involved? How will you drive adoption? Successful Backstage implementations have a clear owner who can lead this - whether that’s the platform team, developer experience group, or an infrastructure lead.</p><p><strong>Understanding complexity.</strong> Are you dealing with multiple repositories? Do you have existing metadata in spreadsheets? What integrations do you need from day one? Backstage thrives in complex engineering environments, but that complexity also makes implementation harder. If your services are spread across multiple repos (because you’re in the process of changing SCMs, for example), you’ll need to define a strategy for catalog completeness. Similarly, if engineering teams have historically tracked metadata in spreadsheets or Confluence, plan for how that information will migrate into Backstage.</p><p><strong>Choosing a hosting model.</strong> Do you have the bandwidth to stand up and then maintain a self-hosted Backstage instance, or would a managed solution free up your team? 
Self-hosting Backstage means setting up your own infrastructure, handling upgrades, and troubleshooting outages - responsibilities that many platform teams don’t want to take on. A managed solution like Roadie allows you to focus on adoption and feature development instead of infrastructure headaches.</p><p><strong>How Roadie helps on Day 0</strong></p><ul><li><strong>Expert-led onboarding.</strong> We guide teams through best practices and pitfalls based on real-world experience with different customers across multiple verticals.</li><li><strong>Pre-built integrations.</strong> Get out-of-the-box support for GitHub, Jira, PagerDuty, and more without spending months setting up plugins.</li><li><strong>No infrastructure headaches.</strong> Roadie fully manages Backstage, so you don’t need to worry about performance, upgrades, security patches, or downtime.</li></ul><p>Many teams get stuck in Day 0, unsure of where to begin. With Roadie, you don’t have to navigate this alone. Our Solutions team has helped hundreds of organizations set up Backstage Proof of Concepts, and helps ensure a good fit for Backstage and your organization from the very beginning.</p><h2><strong>Day 1: Deployment and Integration - Getting Backstage Live</strong></h2><p>Once the strategy is in place, it’s time to make Backstage a reality. This phase involves setting up the initial Backstage implementation, including the software catalog, integrating with existing developer tools, and ensuring that authentication, access control, and self-service workflows are functional.</p><p><strong>Key considerations for Day 1</strong></p><p><strong>Service catalog population.</strong> Where will Backstage source its metadata? Do you need GitHub sync, YAML definitions, or both? A common mistake is launching Backstage with an incomplete catalog, which can hamper efforts at driving adoption. If engineers search for a service and can’t find it, they won’t come back. 
You need a strategy for <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">automatic catalog population</a> so that new services appear without manual effort. Putting Backstage into the production path is also a good way to ensure long-term catalog completeness - if developers are required to register their new services in Backstage, it builds a positive feedback loop to drive catalog completion.</p><p><strong>Tooling integrations.</strong> Which services (CI/CD, observability, cloud platforms) should be connected? Backstage is most powerful when it integrates with the tools developers use daily. Are you pulling in logs from <a href="https://roadie.io/docs/integrations/datadog/">Datadog</a>? Linking to on-call rotations in <a href="https://roadie.io/docs/integrations/pagerduty/">PagerDuty</a>? Surfacing deployment insights from <a href="https://roadie.io/docs/integrations/argocd/">ArgoCD</a>? The sooner Backstage provides visibility into a team’s actual workflow, the sooner it becomes indispensable. One of Backstage’s most compelling propositions is the single pane of glass, which reduces the need for developers to switch windows. This is only possible if you’ve integrated all the necessary tools into your Backstage environment.</p><p><strong>Authentication and RBAC.</strong> How do you ensure secure, role-based access control? Not every engineer needs full edit access to Backstage and all of its various moving parts. Define your authentication strategy early - will you use GitHub SSO? Okta? A custom identity provider? Role-based access control (RBAC) should align with your engineering org’s structure so that only relevant teams can modify services and templates, while visibility remains open. 
Fine-grained control allows you to go deeper, restricting access to sensitive entities or applying sensible limits on who can (for instance) create new scaffolder actions.</p><p><strong>Scaffolder workflows.</strong> Speaking of which, what self-service templates will accelerate developer productivity? The scaffolder is often underutilized because teams don’t set up useful templates from the start. Think about what engineers repeatedly request from platform teams (or check your ticket volumes) - service creation, infrastructure provisioning, spinning up a new API - and build those workflows into Backstage’s scaffolder. The goal is to reduce developer toil by automating common tasks.</p><p><strong>How Roadie helps on Day 1</strong></p><ul><li><strong>Instant deployment.</strong> A fully managed Backstage instance, set up in minutes instead of months.</li><li><strong>Automated catalog population.</strong> GitHub sync, YAML registration and the <a href="https://roadie.io/docs/api/catalog/">Roadie catalog API</a> streamline the onboarding process.</li><li><strong>Pre-configured <a href="https://roadie.io/product/access-control/">RBAC</a>.</strong> Secure access controls without needing to build policies from scratch. Fine-grained control gets you the granularity you need on top of sensible defaults.</li><li><strong><a href="https://roadie.io/product/scaffolder/">Scaffolder templates</a> and best practice.</strong> We can help get you going with advice and guidance on workflows for service creation, infrastructure provisioning, and more.</li></ul><p>By handling the operational complexity, Roadie ensures that Day 1 is about getting value from Backstage, not wrestling with configuration files.</p><h2><strong>Day 2: Operations and Optimization - Keeping Backstage Running Smoothly</strong></h2><p>Great, you’ve got Backstage live, but now the real work begins! 
More than anything, for teams approaching Day 2, adoption becomes the focus, but there’s still a significant governance and performance management component. Without a strong Day 2 plan, Backstage can quickly become outdated, underutilized, or a maintenance burden.</p><p><strong>Key considerations for Day 2</strong></p><p><strong>Monitoring and performance.</strong> Is Backstage responsive, scalable, and available when developers need it? Engineers are notoriously performance sensitive, so Backstage should feel snappy, not sluggish. A slow portal can affect adoption -  monitoring page load speeds, request latency, database queries, and plugin performance ensures that Backstage remains fast and scalable as more teams onboard, and the size of the catalog grows.</p><p><strong>Governance and compliance.</strong> Are service owners keeping metadata up to date? Is technical debt being tracked? Backstage is only as useful as the data inside it. If ownership details, API versions, or compliance statuses are outdated, trust erodes. Setting up Tech Insights to track stale metadata and enforce policies helps maintain catalog health.</p><p><strong>Driving adoption.</strong> Are developers using Backstage as their central hub? How do you encourage engagement? A Backstage rollout is only successful if engineers <a href="https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/">actually use it</a>. This means marketing it internally, gathering feedback, and ensuring that the portal delivers real value - whether that’s through dashboards, integrations, or self-service features.</p><p><strong>Ongoing feature development.</strong> What additional capabilities and custom plugins should you roll out over time? Backstage evolves. As teams become comfortable with the service catalog, they may want to expand into TechDocs, scorecards, security insights or even custom plugins specific to their needs. 
Having a roadmap for feature adoption keeps momentum going.</p><p><strong>How Roadie helps on Day 2</strong></p><ul><li><strong>Automated upgrades and maintenance.</strong> Always running the latest, most secure version - no manual updates required.</li><li><strong>Tech Insights for governance.</strong> We help you identify gaps in ownership, enforce security policies, and track catalog health.</li><li><strong>Performance and scaling support.</strong> Roadie <a href="https://roadie.io/blog/roadie-bts-running-backstage-at-scale/">continuously optimizes</a> for speed and reliability, even as usage grows.</li><li><strong>Ongoing guidance and best practice.</strong> Access to our team of solution engineers and customer success for troubleshooting, strategy, and long-term platform growth and adoption.</li></ul><p>Maintaining Backstage isn’t just about keeping it running - it’s about keeping it useful. Many teams struggle with the high maintenance effort and slow feature development velocity, as each upgrade or new integration requires engineering time. With Roadie, we handle upgrades, performance optimization, and <a href="https://roadie.io/tags/changelog/">feature rollouts</a> for you, ensuring your Backstage instance stays modern without burdening your team.</p><h2>A Simpler, Faster Path to Backstage Success</h2><p>Backstage adoption isn’t just about getting to Day 2 - it’s about ensuring long-term success through proper planning, a smooth launch, and continuous improvement.</p><p>Without Roadie, teams often find themselves bogged down in maintenance, debugging, and internal support. <a href="https://roadie.io/blog/why-hybrid-is-best-for-idps/">With Roadie</a>, Backstage is effortless to deploy, easy to scale, and always up-to-date. That’s why to date, we’ve had several customers migrate from self-hosted Backstage to a Roadie-managed solution.</p><p>If you’re considering Backstage or struggling with an existing implementation, let’s talk. 
<a href="https://roadie.io/request-demo/?referringPathname=blog">Book a demo today</a> and see how Roadie accelerates your journey from Day 0 to Day 2 and beyond.</p>
]]></content:encoded></item><item><title><![CDATA[Introducing: the Wiz plugin for Backstage]]></title><link>https://roadie.io/blog/introducing-the-wiz-plugin-for-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/introducing-the-wiz-plugin-for-backstage/</guid><pubDate>Tue, 04 Mar 2025 08:00:00 GMT</pubDate><description><![CDATA[Wiz helps engineering teams stay on top of security risks, but flipping between tools to check vulnerabilities slows everything down. With the new Wiz Plugin for Backstage, teams can see their security insights right alongside their services, documentation, and pipelines - no extra steps required!]]></description><content:encoded><![CDATA[<p>Platform engineering has changed how teams build and deliver software, yet security checks often remain siloed in their own dashboards or require context switching that slows everyone down. Roadie’s new <a href="https://roadie.io/docs/integrations/wiz/">Wiz Plugin</a> for Backstage tackles this issue head-on by bringing Wiz’s cloud security insights directly into the Backstage interface.</p><p>It’s no secret that identifying misconfigurations or vulnerabilities early can save engineering teams huge amounts of time and risk. But doing that efficiently means centralizing actionable security data in the same place developers already work. That’s where this new plugin comes into play.</p><p>According to Roadie Software Engineer Irma Solakovic (who built the plugin), it’s all about speed and clarity, and a single pane of glass:</p><p><em>“As an engineer, you want to shorten the time it takes to spot issues - whether that’s a misconfiguration or a vulnerability. Save time and stop hopping between different platforms - if you're already looking at your services in Backstage, why not see what's wrong there, too? 
Now teams can see Wiz issues alongside their services, documentation, and CI/CD pipelines.”</em></p><h2>What Does the Plugin Do?</h2><p>The plugin surfaces Wiz’s prioritized security findings - such as vulnerabilities, compliance risks and misconfigurations - right inside Backstage, giving teams a unified overview of their software landscape and its security posture. Key highlights include:</p><ul><li><strong>Single Pane of Glass:</strong> No need to jump between separate tools. Check your critical Wiz findings the moment you inspect a service in Backstage.</li><li><strong>Contextual Issue Prioritization:</strong> Wiz assesses vulnerabilities and misconfigurations in the context of your specific environment, ensuring that the most pressing threats are highlighted first.</li><li><strong>Comprehensive Risk Insight:</strong> Access up-to-date, Wiz-scanned data so you can focus on the most pressing issues first.</li><li><strong>Open Source &#x26; Managed:</strong> The plugin is open source for self-hosted Backstage users, and Roadie customers benefit from automated token renewal and streamlined setup.</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1R0ODp3ZyR2CAWa3Tfjncd/b19c7cf162a6834ac99653ea11123442/wiz-issues-widget.png" alt="wiz-issues-widget"><em>See your Wiz issues by status, from inside Backstage</em></p><h2>Certified by Wiz</h2><p>This plugin has been certified by Wiz, so it meets Wiz’s standards for reliability and quality. Organizations using Wiz can be confident that their data is displayed accurately in Backstage - letting them act on the right information at the right time.</p><p>According to Irma, the Wiz team provided valuable support during the development process:</p><p><em>“We got great feedback from the Wiz engineers, especially around handling API usage guidelines. 
Their insights were really helpful in making this plugin both accurate and efficient.”</em></p><p>The feeling goes both ways, and the Wiz team were equally excited by the development of the plugin by Roadie: <em>“We’re thrilled to see Roadie’s Wiz Plugin seamlessly integrate prioritized Wiz security insights into Backstage, empowering developers and security teams to collaborate more effectively, quickly identify critical issues, and streamline remediation directly within their existing workflows.”</em></p><h2>Why Choose Roadie for Wiz + Backstage?</h2><p>While anyone can adopt the open-source plugin, Roadie offers a managed solution and we can assist with initial setup and ongoing usage of the plugin. Additionally, Roadie allows for managing Wiz secrets via the UI, while open-source users will need to add these to their configuration file (<code>app-config.yaml</code>). With Roadie, you spend less time fiddling with integrations and plugins, and more time actually fixing issues.</p><h2>Get Started</h2><p>Ready to give it a spin? Check out our resources:</p><ul><li><strong>Roadie users:</strong> <a href="https://roadie.io/docs/integrations/wiz/">Roadie Wiz Plugin Docs</a></li><li><strong>Self-hosted users:</strong> Install via NPM (<code>@roadiehq/plugin-wiz-backend</code>) and <a href="https://roadie.io/backstage/plugins/wiz/">follow our guide</a>.</li></ul><p>We’re excited to hear your feedback, and look forward to seeing how the open community continues to improve and enhance this plugin.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie BTS: Running Backstage at Scale  ]]></title><link>https://roadie.io/blog/roadie-bts-running-backstage-at-scale/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-bts-running-backstage-at-scale/</guid><pubDate>Thu, 20 Feb 2025 07:00:00 GMT</pubDate><description><![CDATA[Backstage is powerful, but like any application, keeping it stable at scale can be a challenge. Memory leaks, unnecessary restarts, and background jobs interfering with performance can slowly erode reliability - until they become major headaches. At Roadie, we’ve been making deep, structural improvements to eliminate these issues, ensuring a smoother, more resilient experience for platform teams. Here’s how the Roadie engineering team tackled hidden performance pitfalls and made Backstage run better than ever.]]></description><content:encoded><![CDATA[<p>At Roadie, we know that platform teams rely on their Internal Developer Platform (IDP) to be always available, performant, and stable. When an IDP goes down, developer productivity suffers. That’s why we’ve been working behind the scenes on a series of improvements to make Roadie Backstage implementations more reliable and resilient, especially at scale. These aren’t necessarily flashy new features, but they make a huge difference in ensuring everything runs smoothly.</p><p>Our engineering team tackled a few key problem areas - memory leaks that were leading to crashes, liveness probes that were restarting services unnecessarily, and background jobs that were sometimes slowing things down. Let’s dig into what was happening, how we fixed it, and why it matters.</p><h3>The Challenge: Unstable Backends, Memory Leaks, and Unnecessary Restarts</h3><p>Like any complex platform, Backstage and Roadie’s multi-tenant setup requires ongoing maintenance to stay performant. Over time, we started noticing some recurring issues. Some customers were seeing intermittent backend errors. 
Our monitoring showed that some instances were restarting more than they should, and memory usage in certain environments was creeping up in a way that suggested something wasn’t being cleaned up properly.</p><h3>Diagnosing the Issues: A Deep Dive Into Backstage Stability</h3><p>The first step was figuring out exactly what was happening. We leaned heavily on our monitoring tools, using Grafana dashboards, CloudWatch logs, and Prometheus metrics to track memory usage, CPU performance, and garbage collection behavior. We also took a lot of heap dumps—essentially snapshots of what was in memory at a given time—to compare them over time and spot patterns.</p><p>Some issues were straightforward and easy to resolve quickly, but others were far more subtle. While some fixes were immediate, the more nuanced memory leaks and stability concerns required deeper investigation. These problems took time to surface and were harder to track down, requiring careful analysis across days or even weeks of data. Debugging them meant running long-term comparisons of heap dumps, identifying small but persistent changes, and testing fixes incrementally to ensure they actually solved the issue without introducing new problems.</p><h3>Fixing Memory Leaks</h3><p>One of the biggest culprits turned out to be a global constant that was continually appended to with the same value, causing unbounded growth in memory usage. Instead of replacing old data, the service kept adding to the same constant over time, gradually increasing memory usage until the instance would crash.</p><pre><code class="language-tsx">const A_CONSTANT = [];

export class CustomProcessor implements CatalogProcessor {
  async preProcessEntity(
    entity: Entity,
    location: LocationSpec,
  ): Promise&#x3C;Entity> {
    // The following causes a memory leak: the module-level array grows on every call
    A_CONSTANT.push('blah');
    return entity;
  }
}
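</code></pre><p>A minimal sketch of one way to avoid this kind of leak, offered as an illustration rather than the exact change made at Roadie: freeze data that should never change, and keep state that does change in a structure that cannot grow when the same value is recorded repeatedly. The names <code>SUPPORTED_KINDS</code>, <code>seenValues</code>, <code>recordValue</code> and <code>tryToAppend</code> are hypothetical.</p><pre><code class="language-tsx">// Hypothetical names throughout: an illustration, not Roadie's actual code.

// Data that should never change: freeze it so an accidental push throws
// a TypeError instead of silently growing the array.
const SUPPORTED_KINDS = Object.freeze(['Component', 'API']);

// State that must change at runtime: a Set stores each distinct value once,
// so recording the same value on every call does not grow memory.
const seenValues = new Set([] as string[]);

const recordValue = (value: string) => {
  seenValues.add(value);
  return seenValues.size;
};

// Attempts the same append as the leaky example, against the frozen array.
const tryToAppend = () => {
  try {
    (SUPPORTED_KINDS as string[]).push('blah');
    return 'appended';
  } catch {
    return 'rejected';
  }
};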
</code></pre><p>Additionally, Express request handlers were starting un-awaited promises, which in certain edge cases could remain unresolved, holding onto memory and preventing proper cleanup.</p><pre><code class="language-tsx">const makeLargeObject = async () => {
  // assign large object and sleep.
}
router.get('/consume-memory', async (_, res) => {
  makeLargeObject(); // un-awaited promises can cause memory leaks
  return res.status(200).send();
});
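</code></pre><p>As a hedged sketch of the two usual remedies, using plain functions as hypothetical stand-ins for Express handlers (<code>handleAwaited</code> and <code>handleDetached</code> are illustrative names, not part of any library): either await the work before responding, or detach it deliberately with a <code>catch</code> attached so that a failure is at least observed and logged. Neither option makes a hanging promise settle, so long-running work still needs its own timeout.</p><pre><code class="language-tsx">// Hypothetical stand-ins: plain functions play the role of Express handlers.
const makeLargeObject = async () => {
  // Allocate a large array, then yield to the event loop to simulate slow work.
  const large = new Array(100000).fill('x');
  await new Promise(resolve => setTimeout(resolve, 10));
  return large.length;
};

// Option 1: await the work, so the handler responds only after it settles
// and any rejection propagates to the caller.
const handleAwaited = async () => {
  const size = await makeLargeObject();
  return { status: 200, size };
};

// Option 2: respond immediately, but attach a catch so a failure is
// observed and logged rather than left as an unhandled rejection.
const handleDetached = () => {
  void makeLargeObject().catch(error => {
    console.error('background task failed', error);
  });
  return { status: 202 };
};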
</code></pre><p>Backstage’s default in-memory cache behavior also contributed to the problem. Expired cache items were only removed when accessed, meaning if something wasn’t retrieved after expiring, it would just sit there, taking up space indefinitely. To fix all of this, we cleaned up how constants were stored, replaced the problematic <code>setTimeout</code> loops with Backstage’s built-in scheduler, made sure all promises were handled correctly, and implemented a scheduled cache cleanup process to prevent memory bloat.</p><h3>Tuning Kubernetes Liveness Probes</h3><p>Liveness probes are a great feature in Kubernetes, but they need to be carefully configured to reflect the actual health of a service. Previously, our probes were reporting success too early in the startup cycle, meaning that broken or incomplete processes were being marked as healthy, as if they had started up successfully, even when they weren’t fully ready. To fix this, we separated out readiness and liveness probes, ensuring that Kubernetes only considered a service healthy once it had completed its startup process correctly.</p><h3>Worker Separation for a More Reliable API</h3><p>Backstage, by default, runs background jobs in the same process as the API. That means if something in the background is taking a while - say, a Tech Insights check - it could impact API performance. We saw cases where this led to slow response times or, in extreme cases, crashes. The fix here was relatively straightforward in theory: move background jobs into separate worker processes so they no longer affect the API’s stability. Now, even if something in the background is taking longer than expected, it won’t slow down the user-facing parts of the platform.</p><h3>The Impact: A More Stable and Resilient Roadie</h3><p>The results of these changes were immediate. One of the most noticeable improvements was the end of Sunday night crashes. 
Previously, during the week, our continuous updates to customer Backstage instances meant that services were restarted regularly, which helped keep memory leaks from accumulating. But over the weekend, with no updates happening, memory leaks would build up unchecked, leading to crashes by Sunday night or Monday morning with some level of predictability. Now, with these fixes in place, memory usage remains stable throughout the week, and these weekend crashes have disappeared.</p><p>Backend restarts are also happening much less frequently, and overall system stability has improved across the board. API performance is more predictable, memory usage is under control, and customers are seeing fewer errors and disruptions.</p><p>For platform teams using Roadie, this means a smoother experience with fewer unexpected issues. If you’re running an IDP, you shouldn’t have to worry about whether it’s up and running—it should just work. And that’s what these improvements were all about.</p><h3>Lessons Learned &#x26; What’s Next</h3><p>One of the biggest takeaways from this work is just how important it is to keep a close eye on certain key metrics. If Kubernetes is restarting your pods more than usual, something’s probably wrong. If Backstage’s processing queue is growing, that’s a sign that background tasks aren’t keeping up. And if global constants aren’t being handled properly, they can lead to slow, creeping performance issues that only become obvious after weeks or months.</p><p>We also learned a few best practices along the way. Freezing global constants with <code>Object.freeze</code> helps prevent unintended modifications. Handling all promise rejections properly ensures that errors don’t silently accumulate in memory. 
And separating background jobs from API processes is a simple but powerful way to improve overall system stability.</p><p>Looking ahead, we’re continuing to refine our observability and monitoring, making sure that stability issues can be detected even earlier. We’re also exploring ways to improve how Backstage handles large-scale processing workloads so that performance remains smooth even as usage grows.</p><h3>Roadie: Making Backstage Better for Platform Teams</h3><p>The goal at Roadie is to make sure platform teams get the best possible experience with Backstage. That means not just adding new features, but constantly refining and improving what’s already there. These stability fixes are part of that ongoing effort, ensuring that Backstage is as reliable and hassle-free as possible for teams who depend on it.</p><p>If you’re using Roadie today, these improvements are already live and working for you. If you’re thinking about adopting Backstage and want to ensure you’re getting a battle-tested, highly available implementation, <a href="https://roadie.io/request-demo/?referringPathname=blog">we’d love to chat</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie Local: Self-hosted Backstage, ready in minutes]]></title><link>https://roadie.io/blog/roadie-local-self-hosted-backstage-ready-in-minutes/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-local-self-hosted-backstage-ready-in-minutes/</guid><pubDate>Wed, 19 Feb 2025 11:00:00 GMT</pubDate><description><![CDATA[Roadie has been SaaS-only since it was founded, but the times they are a changin'. A locally runnable and deployable version of Roadie is on its way and now in beta.]]></description><content:encoded><![CDATA[<h2>💈 Introducing Roadie Local (currently in beta)</h2><p>Over the years the ability to self-host Roadie has come up time and again.</p><p>Platform teams at security-conscious, government-facing, or generally pro-self-hosting organizations would often approach us about using Roadie but were blocked. They couldn't access all the Roadie goodies that our SaaS customers know and love, like scorecards, RBAC, and a no-code UI for crafting entity pages and enabling plugins.</p><p>Even folks who simply wanted to spin up a quick PoC of Backstage would approach us and ask if we could help. 
There isn't a default image for running Backstage and it can be tricky to even stand up an instance, let alone craft a version of Backstage that reaches parity with proprietary offerings like Cortex or Port in a PoC timeframe.</p><p>All that's changing.</p><p>We're launching self-hosted Roadie.</p><p>We call it <strong>Roadie Local</strong>.</p><ul><li>You'll be able to self-host it.</li><li>You'll be able to use it for free (for fewer than 15 contributing users).</li><li>And it's currently in beta.</li></ul><h2>Why self-host Roadie?</h2><h3>Running Roadie in a security-conscious system</h3><p>Maybe you're a government department, defense contractor, high-frequency trading firm, or simply an organization that takes security very, very seriously: you need the software you use to be on-prem or at the very least hosted in your cloud account.</p><p>Roadie Local is that. We're even working on an air-gapped solution.</p><h3>Spinning up Backstage, fast</h3><p>Early in the cycle of adopting an Internal Developer Portal (IDP) like Roadie or Backstage you want to kick the tires and test things out. In fact, you're usually testing a whole bunch of different IDPs to understand the market and what each one offers. That ranges from completely open-source offerings like Backstage, through proprietary offerings like Cortex or Port, to hybrid open-source-but-with-support-and-some-nice-features offerings like Roadie.</p><p>Spinning up Backstage for these kinds of PoCs usually takes a bit of time though, and you don’t necessarily get to feature parity with the proprietary offerings in the time allotted.</p><p>With Roadie Local you can stand up a best-in-class IDP, based on open-source Backstage, in minutes.</p><h3>Migrating from Self-hosted Backstage</h3><p>Migrating from self-hosting Backstage to Roadie is now a well-trodden path, but the act of switching is still non-trivial. That is often because of the snags from moving to SaaS. 
Moving from self-hosting Backstage to self-hosting Roadie is a much simpler adoption path.</p><h2>How it works</h2><p>We've tried to keep it simple:</p><ul><li>You request a license key</li><li>We give you a license key</li><li>You pull the Roadie Local image and install it</li><li>You have a local version of Roadie</li></ul><p>That's abstracted a little 👆, but not much.</p><h2>What's in the box?</h2><p>In essence, everything that you know and love now as well as everything we're building for the SaaS version of Roadie. There will be no meaningful difference in terms of feature availability.</p><p>That means:</p><ul><li>A <strong>composed and polished version of Backstage</strong> that you can configure both in the UI and with file-based config options for added flexibility.</li><li>All the <strong>usability and quality-of-life improvements</strong> that we've built over the years, like no-code page editing and no-YAML ways to add catalog data.</li><li><strong>70+ plugins</strong> to help you get going on Day 1.</li><li><strong>Scorecards</strong> that let you document and enforce engineering standards.</li><li>And built-in <strong>RBAC</strong> to control who sees what and who can trigger certain actions.</li></ul><h2>The best part? Roadie Local is free for fewer than 15 contributing users.</h2><p>We've never had a fully free tier for Roadie.</p><p>We offer 30-day free trials of the SaaS product, but the cost of running a full, isolated Backstage stack for each SaaS tenant meant that the economics never worked out for us to offer Roadie to smaller teams, en masse, for free.</p><p>The economics are different for Roadie Local, so we get to be different.</p><p>Roadie Local will be free for &#x3C;15 contributing users*.</p><ul><li>What's a Contributing User? 
Someone who has committed code to a piece of software tracked in the Catalog in the last 3 months.</li></ul><p>If you're playing around or running a quick Proof of Concept, then it's free.</p><p>If you're a smaller team and you have 10 engineers, then it's free.</p><p>Simple.</p><h2>What's next</h2><p>We don't like to spend long in beta, so expect Roadie Local to reach general availability in Spring 2025.</p><p>We're targeting BackstageCon EU in April 2025 for a big, ol' fashioned product launch.</p><h2>And then?</h2><p>In short, deployment options:</p><ul><li>Helm charts</li><li>Images and deployment options available on the AWS, Azure, and GCP Marketplaces</li><li>Other, non-Kubernetes, non-cloud provider-based deployment options</li></ul>
]]></content:encoded></item><item><title><![CDATA[Roadie Local is coming, State of Backstage launches, and Announcements lands]]></title><link>https://roadie.io/blog/more-performance-work-announcements-and-local-roadie-in-beta/</link><guid isPermaLink="false">https://roadie.io/blog/more-performance-work-announcements-and-local-roadie-in-beta/</guid><pubDate>Mon, 03 Feb 2025 11:00:00 GMT</pubDate><description><![CDATA[This month we're announcing our newest venture: on-prem Roadie, as well as a new community initiative, and the addition of announcements capability to the product.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>💈 Roadie Local [coming soon]</h2><p>We're going on-prem.</p><p>We've been toying with this idea for a long time and we've decided to go all-in on enabling Roadie to be run on customer infrastructure.</p><p>We call it Roadie Local.</p><p>More soon - we're aiming for a beta launch in March, moving to general availability in Q2 2025.</p><h2>💈 State of Backstage launches</h2><p>We're also launching a survey.</p><p>When a community reaches a certain level of maturity, getting a good lay of the land each year becomes a good idea. Backstage has hit that point - maybe we're even a little late to realise it - so we thought it was time to launch an annual State of Backstage survey.</p><p>This is a non-commercial activity for us. It’s solely to foster a greater sense of community and information sharing.</p><p>We’ll publish the report from this survey in Spring 2025 for free (of course).</p><p>Check it out at <a href="https://stateofbackstage.io/">stateofbackstage.io</a>.</p><h3>🔌 Plugins &#x26; Integrations roundup</h3><ul><li><strong><a href="https://roadie.io/docs/integrations/announcements/">Announcements plugin</a></strong>: we pulled in the Announcements plugin to give customers the ability to post updates and requests to their userbase within Roadie. 
It's a plugin that's been around for some time, but it's developed markedly in the last year or so, so we thought it was time. Enjoy!</li></ul>
]]></content:encoded></item><item><title><![CDATA[State of Backstage Launches!]]></title><link>https://roadie.io/blog/state-of-backstage-launch/</link><guid isPermaLink="false">https://roadie.io/blog/state-of-backstage-launch/</guid><pubDate>Tue, 21 Jan 2025 05:00:00 GMT</pubDate><description><![CDATA[Deploying Backstage is just the beginning. The real challenge is turning it into a tool your developers rely on. Successful adoption requires treating Backstage like a product, aligning it with workflows, and fostering collaboration. Discover how to sustain adoption through strategic planning, automation, and creating meaningful value for your teams. ]]></description><content:encoded><![CDATA[<p>Today we’re announcing the launch of the <a href="https://stateofbackstage.io/" title="State of Backstage">State of Backstage survey</a>! 🥳🎉</p><p>Backstage has grown, diversified and differentiated a lot since it was open sourced back in 2020.</p><p>We have new features, new patterns of adoption, and many, many new members of the Backstage community.</p><p>One downside of having a sprawling community of businesses adopting Backstage is that we often don’t have much insight into how it is actually being used. Organisations tend to keep that information close to their chest.</p><p>There are some exceptions and some information is in the public domain to help new adopters and those who are looking to super-charge their instance. For instance, case studies are extremely useful for spreading the word about patterns of adoption. The Backstage Discord channels are great for questions about individual features and plugin adoption. 
Consultants and SaaS providers like Roadie are good for seeing a wider subset of good patterns.</p><p>Yet even cumulatively, all of these channels mean that information is fragmented and the flow of information and knowledge is inconsistent.</p><p>We lack a systematic and widespread data capture mechanism to ping the community for its current state.</p><p>That’s why we’re launching the <a href="https://stateofbackstage.io/" title="State of Backstage">State of Backstage survey</a>.</p><p>This is a non-commercial activity for us. It’s solely to foster a greater sense of community and information sharing.</p><p>We’ll publish the report from this survey in Spring 2025 for free, on the stateofbackstage.io site.</p><p>It aims to provide a general map of the community and an understanding of its current state.</p><p>A novice Backstage user, new to the community, maybe even standing up Backstage for the first time, should be able to understand at a glance current best practice, plugin adoption, and patterns that are working for organisations successfully adopting Backstage.</p><p>Check it out (and fill it in) at <a href="https://stateofbackstage.io/" title="stateofbackstage.io">stateofbackstage.io</a>!</p>
]]></content:encoded></item><item><title><![CDATA[Backstage Adoption: The Day 2 Problem]]></title><link>https://roadie.io/blog/backstage-adoption-the-day-2-problem/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-adoption-the-day-2-problem/</guid><pubDate>Tue, 21 Jan 2025 05:00:00 GMT</pubDate><description><![CDATA[Deploying Backstage is just the beginning. The real challenge is turning it into a tool your developers rely on. Successful adoption requires treating Backstage like a product, aligning it with workflows, and fostering collaboration. Discover how to sustain adoption through strategic planning, automation, and creating meaningful value for your teams. ]]></description><content:encoded><![CDATA[<p>Deploying Backstage is a <a href="https://roadie.io/blog/what-to-think-about-when-youre-thinking-about-an-idp/">huge milestone</a> for any engineering team. Whether you’ve set it up to improve service discoverability, streamline onboarding with templates, or enhance governance, the possibilities it unlocks are transformative. But here’s the hard truth: deploying Backstage is just the beginning. What comes next - <a href="https://roadie.io/blog/roadie-solving-the-day-2-problem-with-backstage/">the Day 2 experience</a> - is where the real work begins.</p><p>Day 2 with Backstage is about moving beyond the initial setup and figuring out how to turn Backstage into a tool your developers use as part of their daily or weekly workflows, and in time, come to rely on. Many teams start with specific use cases like <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">service discoverability</a> or <a href="https://roadie.io/blog/how-to-define-engineering-standards/">software governance</a>, but these don’t always translate neatly into widespread adoption. That’s because solving a single pain point doesn’t necessarily create long-term habits. 
To make Backstage stick, you need a strategic approach that aligns its features with your developers’ daily workflows. So, let’s explore why Backstage adoption often stalls, and how to set your organization up for sustained success.</p><h2>Build it and they <del>will</del> might come</h2><p>It’s easy to assume that once Backstage is deployed, adoption will naturally follow. After all, it solves critical <a href="https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/">pain points</a>: creating a unified service catalog, automating repetitive tasks with <a href="https://roadie.io/blog/the-backstage-scaffolder-a-powerful-new-orchestration-tool/">templates</a>, and introducing <a href="https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/#governance-standards-adherence-and-complexity-management">governance tools</a> to improve software quality. But in practice, adoption requires much more than simply solving these problems. It requires building habits, demonstrating value, and integrating Backstage into the fabric of your engineering culture.</p><h2>Strategy: treat Backstage like a product</h2><p>The key to overcoming adoption challenges is to treat Backstage like a product, not just a one-off project. This mindset is at the heart of successful adoption and ensures that Backstage evolves over time to meet the needs of its users. Treating it as a product means committing to continuous improvement, guided by user feedback and measurable outcomes. That does mean that if you’re a platform engineer, you may need to put on a product manager hat, thinking strategically about what your users need, prioritizing features, and continuously communicating the value of Backstage to developers and leadership alike. 
This means balancing the technical work with a clear focus on solving problems and delivering value internally, which may represent a big change.</p><p>It needn’t be overwhelming though - start by engaging your most important internal users - your developers. Communicate with them to understand their workflows, pain points, and priorities. Backstage adoption is a collaborative effort, and the more involved your developers feel in shaping its direction, the more likely they are to embrace it. Use tangible metrics to track success, such as catalog completeness, daily active users, or ROI from key features like templates (engineering time saved for each template run, for instance). These metrics not only highlight areas for improvement but also help demonstrate value to leadership, ensuring sustained investment.</p><p>The ‘treat it like a product’ approach to a developer portal such as Backstage is really well articulated by Adam Rogal, who leads Developer Productivity and Platform at DoorDash. In the podcast episode “<a href="https://www.youtube.com/watch?v=een198m7gfg">Bootstrapping a Developer Portal</a>,” Rogal shares how his team built their developer portal, DevConsole, with a clear focus on delivering immediate value to their engineering customers. By engaging engineers early and iterating based on their feedback, the platform team ensured that DevConsole directly addressed real pain points. This approach not only drove higher adoption but also built trust with their internal users.</p><p>Rogal also underscores the importance of creating a community - not just of users but of contributors. By empowering teams to actively participate in shaping the portal, DoorDash fostered a culture of collaboration and continuous improvement. 
This communal effort allowed the portal to evolve alongside the organization’s needs, ensuring its long-term relevance and success.</p><p>By adopting this product mindset and prioritizing user engagement, platform teams can transform Backstage into an indispensable tool that enhances productivity and aligns with organizational goals. The experience at DoorDash serves as a powerful example of how this approach can drive both adoption and satisfaction.</p><h2>Tactics for sustaining adoption</h2><p>While adopting a product mindset gives you the strategic foundation for long-term Backstage success, tactics offer some actionable focus areas on a day-to-day basis. Let’s explore some key tactics to help drive and sustain <a href="https://roadie.io/tags/adoption/">adoption</a>.</p><h3>Automate catalog completeness</h3><p>A rich and accurate catalog is the foundation of Backstage, but manually maintaining it is time-consuming. The best approach is automation. By using integrations with GitHub or AWS, you can <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">automatically populate the catalog</a> with metadata from your repositories or clusters. In our experience, organizations with the highest levels of catalog completeness (90% or more) have all made extensive use of automations such as templates and scripts, and entity <a href="https://roadie.io/docs/getting-started/autodiscovery/">autodiscovery</a> and ingestion to reduce the manual effort involved.</p><h3>Leverage templates for early wins</h3><p><a href="https://roadie.io/docs/scaffolder/writing-templates/">Templates</a> are one of the easiest ways to showcase the value of Backstage. They can <a href="https://roadie.io/blog/using-backstages-scaffolder-to-fill-up-your-catalog/">automate repetitive tasks</a>, like setting up a new microservice or creating a CI/CD pipeline, saving developers significant time and ensuring software governance and best practice is baked in. 
The ROI here is often immediate and measurable - reducing a process from months to minutes is something both developers and leadership can get behind.</p><h3>Measure and communicate value</h3><p><a href="https://roadie.io/product/tech-insights/">Metrics</a> are your best friend when it comes to adoption. Track data like catalog completeness, template usage (and time saved), and daily active users to understand what’s working and what needs improvement. Share these metrics with leadership to demonstrate the impact of Backstage and justify further investment.</p><h3>Make adoption a cultural effort</h3><p>Adoption isn’t just a technical challenge - <a href="https://roadie.io/blog/the-adoption-journey-initiatives-and-strategies/">it’s a cultural one</a>. Evangelism plays a crucial role here. Host “lunch and learns,” demos, or office hours to show developers how Backstage can make their lives easier. Engage other engineering teams and create internal champions who can advocate for the platform within those teams. Adoption grows when developers see Backstage as part of their workflow, not an extra step.</p><h3>Invest in custom plugins</h3><p>While Backstage’s open-source plugins are a great starting point, one of Backstage’s biggest selling points is its near limitless extensibility through <a href="https://roadie.io/docs/custom-plugins/overview/">custom plugins</a>. These <a href="https://roadie.io/blog/live-custom-backstage-plugins-within-seconds/#deploying-custom-plugins-to-roadie">plugins</a> can address your organization’s unique workflows and challenges, creating a toolset that developers can’t find elsewhere.</p><h2>Plan for Day 2 from Day 1</h2><p>The real value of Backstage lies not in its deployment but in its adoption. To unlock this value, you need to approach Backstage as a product - gathering feedback, iterating on features, and aligning its capabilities with your organization’s goals. 
Adoption doesn’t happen overnight, but with a strategic mindset and sustained effort, Backstage can become an indispensable tool for your engineering teams.</p><p>So, as you set up Backstage, don’t just think about the first deployment. Think about what comes next - the Day 2 experience. Plan for it, invest in it, and you’ll set your organization up for long-term success.</p>
]]></content:encoded></item><item><title><![CDATA[Performance, pagination, Unified Search, new Tech Insights visualisations, GHES support and Wiz certification]]></title><link>https://roadie.io/blog/performance-pagination-new-tech-insights-visualisations-ghes-support-wiz-certification/</link><guid isPermaLink="false">https://roadie.io/blog/performance-pagination-new-tech-insights-visualisations-ghes-support-wiz-certification/</guid><pubDate>Wed, 01 Jan 2025 11:00:00 GMT</pubDate><description><![CDATA[This is another bumper changelog that covers November and December: buckle up. The new year is upon us and we're covering all things performance related, as well as enhancements to our support for GitHub Enterprise Server and the security service Wiz. Oh, and a new way to visualise Tech Insights Fact Data to make it more seamlessly integrated into the Catalog.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>💈 Performance, Stability &#x26; Pagination</h2><p>We've been working on a bunch of different improvements to both performance and stability.</p><p>First among them is pagination of the Catalog, without losing filtering.</p><p>We'll be tweaking the filtering UI in the next few weeks and months but the first pass of pagination has proven we can dramatically reduce page load times without impacting the current UI or UX. We're already seeing ~300ms decreases in load time and increased stability for giant Catalogs (i.e. 100s of thousands of entities).</p><p>We're separately using service workers to pre-load some elements of the application to further speed up page load. More on that to come.</p><p>Last but not least, we've also been chunking up Backstage into separate pods to help us scale the application as usage grows. 
More on that to come in blog posts, but if you're running a giant instance or have reached scale with Tech Insights, splitting individual parts of the application out into separate pods is generally a good idea.</p><p>Fun stuff!</p><h2>Tech Insights Facts: Aggregated</h2><p>Sometimes you want to just see a Fact in Tech Insights. You don't want to look at whether it passed or failed a check or whether it is meeting a Scorecard requirement: sometimes you just want the data.</p><p>An entity-by-entity view is one thing (which we implemented with Fact Tables) but often you'll want to see a team-by-team view of fact data.</p><p>For checks and scorecards we have roll-ups, and now for Facts we have Fact Cards.</p><p>Say you're tracking DORA metrics or want to see mean time to resolution for incidents. Now you can :)</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/426KnqJ5xwfge9yCNXohrV/c494fdcf4b9e34144c952b27b35cb77d/facts.png" alt="facts"></p><h2>GitHub Enterprise Server (GHES) support</h2><p>We now support both GHES using the Roadie Agent and GHES combined with github.com (if that floats your boat).</p><p>Previous iterations of our GHES support have relied on a connection to the GHES instance over the public internet. This is ok for some organizations but not all, and usually if you're using GHES you're pretty pro-security and pro-don't-send-things-over-the-public-internet.</p><p>In the past for these use-cases we used the Roadie Agent, which is based on the open-source Snyk Broker, to create a long-lived websockets connection that is fully controlled by the customer. That hasn't, up until this point, worked with GHES though.</p><p>Now it does! 
:)</p><p>For good measure we threw in the ability to configure both GHES and github.com simultaneously to allow folks who are migrating from one to the other to support both SCMs on their Roadie instance.</p><p>Enjoy!</p><h2>🧘‍♂️ Scaffolder Actions table</h2><p>We have also just launched some good ol' fashioned documentation.</p><p>The Scaffolder is an extremely powerful tool, but it doesn't have the same level of documentation or cataloging that plugins receive. It can be tricky to find out which open-source actions exist and how to use them.</p><p>To solve that, we created the Scaffolder Actions Directory. We searched the internet high and low to find all the actions we could. Most of them we support; some are hot off the press and not yet included in Roadie. We'll keep updating it as the Actions ecosystem evolves.</p><p>Check it out: https://roadie.io/docs/scaffolder/scaffolder-actions-directory/</p><h3>🔌 Plugins &#x26; Integrations roundup</h3><ul><li><strong><a href="https://roadie.io/docs/integrations/wiz/">Wiz plugin</a></strong>: Our Wiz plugin is now certified! The new Wiz security frontend plugin surfaces Wiz data inside Backstage. We've worked with Wiz to give it the ol' seal of approval. It's available both inside Roadie and as an OSS plugin. 👩‍🚀🚀</li></ul>
]]></content:encoded></item><item><title><![CDATA[What to think about when you’re thinking about an IDP]]></title><link>https://roadie.io/blog/what-to-think-about-when-youre-thinking-about-an-idp/</link><guid isPermaLink="false">https://roadie.io/blog/what-to-think-about-when-youre-thinking-about-an-idp/</guid><pubDate>Fri, 13 Dec 2024 09:00:00 GMT</pubDate><description><![CDATA[Thinking about implementing an IDP? Here’s what we think you should be thinking about. From tackling discoverability issues and enabling self-service environments, to streamlining the developer experience and leveraging insights, we explore how to think about the things that actually matter when you're planning out your IDP implementation. ]]></description><content:encoded><![CDATA[<p>Thinking about implementing an Internal Developer Portal (IDP)? You're in good company - <a href="https://www.gartner.com/en/information-technology/technology-adoption-roadmap">Gartner believes</a> 80% of platform engineering teams will use IDPs by 2026. There’s a growing sense that every forward-looking technology organization should “have an IDP,” but without a clear rationale as to the "why", this mindset risks building something that (at best) is broad and shallow but that fails to address the real issues slowing your teams down. If you focus on too many things at once, you could end up with a platform that may look impressive but doesn’t actually help people do their jobs.</p><p>Instead, a successful IDP emerges when you carefully pinpoint and address the actual challenges developers face. Is discoverability a major and constant headache, with engineers spending hours trying to figure out who owns which service, or whether a certain piece of functionality already exists somewhere else? Are operations teams buried in tickets because developers need help every time they want a new environment or test database spun up? 
Do your devs waste time hopping between half a dozen interfaces to track deployments, check logs, and find documentation?</p><p>These are the signals to tune into when you’re thinking about building an IDP. Aligning your IDP closely with well-understood problems ensures every feature is both purposeful and valued. Equally important, it <a href="https://roadie.io/blog/the-adoption-journey-initiatives-and-strategies/">sets the stage for adoption</a>. Developers and team leads won’t embrace a platform just because it’s there—they’ll embrace it if it truly solves their everyday pain points. By narrowing your focus to the challenges that matter, you can go deep on solutions that genuinely improve how your team works, rather than going wide and hoping something sticks.</p><h2>Focus on the Three Big Challenges</h2><h3>1. Discoverability</h3><p><strong>The Challenge:</strong></p><p>Have you ever seen teams re-implement a piece of functionality simply because they didn’t know it already existed somewhere else? Or maybe someone spends half a day sifting through old wikis, outdated Confluence pages, and random Slack threads looking for an API endpoint. That’s poor discoverability in action. It’s not just about knowing what services exist, but also understanding who owns them, what their dependencies are, and where the latest <a href="https://roadie.io/blog/adopting-backstage-documentation-and-support/">documentation</a> or runbooks can be found. When this information is hard to find, it leads to frustration, wasted time, and sometimes unnecessary duplication of effort.</p><p><strong>What to Prioritize and Implement:</strong></p><p>A <strong>Service Catalog</strong> that brings together all your services, their owners, docs, performance data, and runbooks in one easily searchable location. 
Using something like Backstage, you create a <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">living directory</a> of what your organization offers internally, so developers spend less time hunting and more time building.</p><p><strong>Measuring Impact and ROI:</strong></p><p>Track how often teams ask questions like, “Do we have a service that does X?” or “Who’s responsible for Service Y?” If these inquiries drop significantly, you’ve hit a milestone. Similarly, if your onboarding times shrink—new hires who previously took weeks to understand the landscape now feel comfortable in days—that’s your IDP delivering tangible value.</p><h3>2. Self-Service</h3><p><strong>The Challenge:</strong></p><p>Ever see a feature get stuck in limbo because the developer can’t get the right environment spun up? Or watch an ops team drown under a pile of infrastructure requests that never seem to end? Without self-service capabilities, your velocity takes a hit. Developers have to wait on someone else’s schedule to get a test database or a staging cluster. By the time the resource is ready, the developer may have lost context or moved on to something else. Multiply that by all the teams and projects in flight, and it’s a huge drag on efficiency.</p><p><strong>What to Prioritize and Implement:</strong></p><p><strong>Template-driven provisioning</strong> and <strong>automated pipelines</strong> that let developers handle common requests themselves. <a href="https://roadie.io/blog/the-backstage-scaffolder-a-powerful-new-orchestration-tool/">Pre-approved templates</a> can ensure that every provisioned environment adheres to best practices and security standards, so you’re not just removing a bottleneck—you’re also improving consistency and reliability.</p><p><strong>Measuring Impact and ROI:</strong></p><p>Start by noting how long it currently takes to get a new environment—maybe it’s three days. 
After introducing templates, see if you’ve brought that down to a few hours or less. Fewer infrastructure-related tickets and faster environment turnarounds mean your teams can maintain their momentum, delivering features and fixes quicker than before (never mind the increase in <a href="https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/">developer productivity</a>).</p><h3>3. Developer Experience (DX)</h3><p><strong>The Challenge:</strong></p><p>Picture a developer’s daily routine: they log into one tool for CI/CD pipelines, another for metrics, another for logs, and a separate browser tab for documentation. This constant context-switching slows them down and increases cognitive load. Over time, this fragmented experience can lead to frustration and reduced morale. It’s not that your teams don’t have the tools—they might have too many, scattered across different interfaces with inconsistent user experiences and integration points.</p><p><strong>What to Prioritize and Implement:</strong></p><p>An IDP such as Backstage that has all the necessary plugins and integrations properly configured and working becomes a <strong>Single Pane of Glass</strong> that consolidates these critical elements. By giving developers one place to view logs, metrics, deployments, code reviews, and documentation, you’re streamlining their workflow and cutting down on wasted mental effort. Having an IDP that pulls in data and functionality from multiple sources, presenting it in a coherent, intuitive manner is a surefire way to improve developer experience for your internal engineering teams.</p><p><strong>Measuring Impact and ROI:</strong></p><p>Survey your developers before and after implementation. Ask how easy it is to find information, how often they switch tools, and how smoothly they can move from coding to testing to deploying. 
You can also keep an eye on DORA metrics—if the team starts shipping more frequently or fixing issues faster, your integrated interface may be part of the reason why.</p><h2>Tying It All Together</h2><p>These three challenges—discoverability, self-service, and DX—often feed into one another. Better discoverability saves teams from reinventing the wheel, reducing the complexity of what you need to maintain. Self-service capabilities speed up delivery, easing the workload on ops and freeing developers to move quickly. Improved DX lowers friction, streamlines daily workflows, and keeps developers happier and more productive. Together, these improvements create a virtuous cycle that helps your engineering organization move faster and build more resilient services.</p><p>With that as a baseline, your IDP can really step up and take things a step further. For example, <a href="https://roadie.io/blog/tech-insights-for-roadie-backstage/">Roadie’s Tech Insights</a> can add another layer of value to your IDP by providing a data-driven, real-time view into the overall health and quality of your services. Rather than relying on anecdotal evidence or gut feelings, Tech Insights surfaces concrete metrics—like compliance scores, dependency health, and adherence to security best practices—within the same platform your developers already use. This makes it easier for engineering leaders to identify hot spots, measure improvement over time, and align investments with areas of greatest need.</p><p>Ultimately, you don’t implement an IDP just to say you have one. You implement it because you’ve identified specific, costly problems and want to solve them. Start by pinpointing the real issues—where are you losing time, where are developers frustrated, where are processes too complex or opaque? Then map each problem to a targeted solution—like a service catalog, a set of ready-made provisioning templates, or an integrated interface—and track the results. 
By focusing on the areas that matter most, you ensure that your platform isn’t broad and shallow, but narrow and deep—truly making a difference where your teams need it most.</p><p>Get the “why” and the “what” clear first. From there, your IDP will have real purpose, real value, and real staying power within your organization. It’ll be something people rely on, not just another system they’re forced to use.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie is now live on AWS Marketplace]]></title><link>https://roadie.io/blog/roadie-is-now-live-on-aws-marketplace/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-is-now-live-on-aws-marketplace/</guid><pubDate>Wed, 11 Dec 2024 08:00:00 GMT</pubDate><description><![CDATA[Roadie is now available on AWS Marketplace, making it easier than ever before to start your Backstage journey.]]></description><content:encoded><![CDATA[<p>We’re excited to share that Roadie is now live on the <a href="https://aws.amazon.com/marketplace/pp/prodview-xzcdojv2n7gw6?sr=0-1&#x26;ref_=beagle&#x26;applicationId=AWSMPContessa">AWS Marketplace</a>!</p><p>For engineering teams thinking about adopting Backstage, this is great news. Here’s why:</p><p><strong>Skip the procurement hassle</strong> - Dramatically simplify the procurement process and get Roadie through the AWS Marketplace. No need to onboard a new vendor, no unnecessary paperwork, and no invoicing headaches - enjoy simplified billing, consolidated into your existing AWS bill.</p><p><strong>Faster adoption</strong> - Roadie’s hosted Backstage solution is quick to set up and effortless to maintain, allowing you to skip the complexities, hosting and maintenance of self-hosted Backstage. 
Purchase Roadie through AWS Marketplace and you can get started even faster by avoiding lengthy approval and onboarding processes, which means you get to delivering value to your developers sooner.</p><p><strong>Built for AWS Users</strong> - Roadie works seamlessly with your AWS environment, making it easy to manage services, track operational metrics, and boost productivity and DevEx across your teams.</p><p><strong>Maximize your AWS discounts</strong> - As an added benefit, your Roadie spend on AWS Marketplace also counts toward AWS discount programs, helping you get even more value from your existing Cloud spend.</p><p>Visit Roadie on <a href="https://aws.amazon.com/marketplace/pp/prodview-xzcdojv2n7gw6?sr=0-1&#x26;ref_=beagle&#x26;applicationId=AWSMPContessa">AWS Marketplace</a> and see how we can help your team unlock the full potential of Backstage.</p><p><a href="https://aws.amazon.com/marketplace/pp/prodview-xzcdojv2n7gw6?sr=0-1&#x26;ref_=beagle&#x26;applicationId=AWSMPContessa" title="Roadie on AWS"><img src="//images.ctfassets.net/hcqpbvoqhwhm/1jchDOOkzbQQoV5bwyZ655/2aea10097735fa508b61e901a05d0cd4/Screenshot_2024-12-11_at_16.05.28.png" alt="Screenshot 2024-12-11 at 16.05.28"></a></p>
]]></content:encoded></item><item><title><![CDATA[The Backstage Scaffolder, a Powerful New Orchestration Tool]]></title><link>https://roadie.io/blog/the-backstage-scaffolder-a-powerful-new-orchestration-tool/</link><guid isPermaLink="false">https://roadie.io/blog/the-backstage-scaffolder-a-powerful-new-orchestration-tool/</guid><pubDate>Tue, 03 Dec 2024 04:00:00 GMT</pubDate><description><![CDATA[Curious about the Backstage Scaffolder? We take a look at what you can do with this powerful tool as well as some of its limitations in comparison to its rivals.]]></description><content:encoded><![CDATA[<p>Backstage, the open source internal developer platform (IDP) created by Spotify, has a powerful unsung tool up its sleeve. The <a href="https://roadie.io/docs/scaffolder/writing-templates/">Backstage Scaffolder</a> is a cloud orchestration tool that allows a wide variety of meta orchestration workflows. It can leverage sub-workflows like <a href="https://github.com/features/actions">GitHub Actions</a> or Infrastructure as Code cloud orchestrators like <a href="https://www.env0.com/">Env0</a> or <a href="https://www.hashicorp.com/en/products/terraform">Terraform Cloud</a> to make a hugely powerful and flexible templating and automation platform for any organization.</p><p>Orchestration tools like <a href="https://www.jenkins.io/">Jenkins</a>, <a href="https://www.ansible.com/">Ansible</a> or <a href="https://saltproject.io/">Salt</a> have been around for a while. Yet the ease of use and accessibility for engineers sets the Backstage Scaffolder apart. Unlike traditional tools, it is built directly into an IDP platform, making it readily available to the entire engineering organization and low cost. 
It also <a href="https://backstage.io/docs/features/software-templates/authorizing-scaffolder-template-details/">supports role-based access control</a> (RBAC), which allows fine-grained control over who can create, modify or use the templates.</p><p>The Backstage Scaffolder was built originally around the idea of templating repositories in Source Control Management systems to create “golden path” standardization in organizations. However, as with many orchestration tools with broad integrations and API scripting opportunities, the Scaffolder can be effectively used for a broad set of orchestration workflows within cloud native and even more traditional companies with HTTP API access to internal systems and orchestration layers.</p><p>Let’s look at some examples of what you can do with this tool as well as some of its limitations in comparison to its rivals.</p><h2>Creating Accessible Golden Paths</h2><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2cNqfSBe9OCulyAfrOspnN/3c857478b0f74ff590e20591772216de/sb-image1.png" alt="sb-image1">
Golden paths in software organizations are essentially agreed-on best practice templates for how to develop, build, deploy and monitor software. Templates could simply be a workflow, such as a best practice CI/CD deployment pipeline or a skeleton code repository for certain workloads, like an HTTP server.</p><p>The Backstage Scaffolder was initially designed around this use case — templating new code repositories with a golden path. For instance, when a backend developer wants to create a new AWS Lambda function for a ticket, instead of writing it from scratch or making their own decisions on repository layout, tooling, testing and language, they can run a Scaffolder template that gives them a standardized starting point with a common layout, build tooling and CI/CD workflows.</p><p>The Scaffolder has integrations to all the major cloud source control management systems like <a href="https://github.com/">GitHub</a> or <a href="https://azure.microsoft.com/en-us/products/devops">Azure DevOps</a> as well as support for <a href="https://mozilla.github.io/nunjucks/">Nunjucks</a> and <a href="https://cookiecutter.readthedocs.io/en/2.0.2/">Cookiecutter</a> templating languages. This allows engineers to get new projects off the ground in a matter of minutes using a clear and secure architecture and workflow. 
It can embed testing standards and approaches in a codebase to save engineering time and reduce bugs.</p><p>At the organizational level, it can ease the transfer of developers between teams: because each team has similar codebase structures and workflows, new joiners get up to speed faster.</p><h2>Golden Paths at Every Level</h2><p>But the Scaffolder can also be used for much higher-level golden path orchestrations such as AWS account creation, infrastructure bootstrapping on top of IaC automation tools and even orchestrating workflows in other automation tools such as GitHub Actions.</p><p>There is a dizzying array of <a href="https://roadie.io/docs/scaffolder/scaffolder-actions-directory/">open source Scaffolder actions available</a> to use as well as a relatively straightforward path to <a href="https://roadie.io/docs/scaffolder/self-hosted-scaffolder-actions/">writing custom ones</a> or combining existing generic steps like <a href="https://roadie.io/docs/scaffolder/call-external-api/">calling an HTTP API</a> and <a href="https://roadie.io/docs/scaffolder/scaffolder-actions-directory/#roadiehqutilsjsonata">parsing the response</a> to achieve almost any workflow.</p><p>For instance, you can write a Scaffolder template that sets up a new team with AWS accounts and users using Terraform and <a href="https://roadie.io/docs/scaffolder/scaffolder-actions-directory/#githubactionsdispatch">GitHub Actions</a>, stubs Confluence docs with their own set of templated internal team documentation such as engineer onboarding checklists, creates team Slack channels and documents some initial team Objectives and Key Results like getting set up and releasing the first bit of production software.</p><p>Templates can save days of engineering and management time as well as ensuring standardized approaches that embed governance tooling and security standards in every team by default.</p><h2>The Discoverability Problem</h2><p>There are other tools in this area that allow templating 
repositories, such as <a href="https://yeoman.io/">Yeoman</a>. Combined with an internally available CI/CD workflow that can be bookmarked, golden path templating can be achieved for code repositories without the Backstage Scaffolder.</p><p>However, discoverability is a key factor in how much these kinds of golden paths are actually used. Discoverability in the CI/CD tooling space is generally very poor as they are often associated with individual repositories or lack search and accessible metadata such as a clear title, description and tags.</p><p>The Backstage Scaffolder places discoverability at its core, listing templates in a searchable and filterable page with metadata displayed on cards.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1C3AvZi1mgwm2i7K0Ad1yE/5f23669d011970d87245d903e1c00bfe/sb-image2.png" alt="sb-image2"></p><p>If we compare this to GitHub Actions, a popular cloud-based CI/CD workflow tool, we can see the difference clearly.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1Q7WoUFvMINntd9DEkWr1Z/eb6f37389addc1a8ffceed2668ed7d67/sb-image3.png" alt="sb-image3"></p><p>Without easily accessible and discoverable templates that are front of mind, it can be a challenge to get engineers and managers using templates, even if they are there.</p><h2>Combining Sub-Orchestration Workflows</h2><p>The Backstage Scaffolder can call any other specialized orchestration tool that exposes an HTTP API, chaining together orchestration workflows into larger automation processes with GitHub Actions, Terraform Cloud, Jenkins etc.</p><p>Other orchestration tooling can of course do similar things with options including <a href="https://docs.chef.io/360/1.0/courier/">Chef Courier</a>, <a href="https://www.jenkins.io/">Jenkins</a>, <a href="https://saltproject.io/">Salt</a> or <a href="https://www.redhat.com/en/technologies/management/ansible/features">Ansible Automation Platform</a>. 
However, these tools are often gated and accessible only to DevOps engineers or infrastructure engineers, partly due to pricing models, thereby restricting their impact on day-to-day engineering practices. By contrast, Backstage is intended for use by all members of an engineering organization on a daily basis. Having a powerful automation tool like the Scaffolder front and center in a daily tool makes driving adoption of golden path templates and time-saving workflows much easier.</p><h2>Limitations</h2><p>While the Backstage Scaffolder can be immensely powerful as an orchestration tool, it does not compete with dedicated enterprise-grade orchestration tools in certain areas such as the ability to create scheduled workflows, perform complex logic branching in an easy-to-visualize way or run shell commands. Additionally, it must be manually set up using TypeScript code in your Backstage instance with integrations added individually with code changes. If you need RBAC, you will need to implement and configure Backstage RBAC in code as well, which can be time consuming.</p><p>Alternatively, <a href="https://roadie.io/">managed Backstage solutions like Roadie</a> can provide a ready-to-use Scaffolder with RBAC integrated out of the box and secure runtime features. This is similar to tools like Ansible, which has been packaged with additional UI layers in Red Hat’s <a href="https://www.redhat.com/en/technologies/management/ansible/features">Ansible Automation Platform</a>. 
Roadie’s Scaffolder comes with additional features such as easy configuration of template groups, <a href="https://roadie.io/docs/custom-plugins/connectivity/proxy/">self-serve proxy creation</a> for use with the <a href="https://roadie.io/docs/scaffolder/scaffolder-actions-directory/#httpbackstagerequest">HTTP Request action</a> and <a href="https://roadie.io/docs/scaffolder/certified-templates/">certified templates</a> that help users know which templates are stable and ready to use.</p><hr><p>Originally published on <a href="https://thenewstack.io/the-backstage-scaffolder-a-powerful-new-orchestration-tool/">The New Stack</a>.</p><p><em>Image from Jevanto Productions on Shutterstock.</em></p>
]]></content:encoded></item><item><title><![CDATA[Wrap up: BackstageCon & KubeCon North America 2024]]></title><link>https://roadie.io/blog/wrap-up-backstagecon-and-kubecon-north-america-2024/</link><guid isPermaLink="false">https://roadie.io/blog/wrap-up-backstagecon-and-kubecon-north-america-2024/</guid><pubDate>Wed, 20 Nov 2024 12:00:00 GMT</pubDate><description><![CDATA[BackstageCon & KubeCon North America 2024 was a riot, but what happened? What did we learn? And which sessions should you catch up on via the CNCF YouTube channel?]]></description><content:encoded><![CDATA[<h1>Wrapping up BackstageCon &#x26; KubeCon North America 2024</h1><p>Last week the platform world decamped to Salt Lake City, Utah to attend the annual BackstageCon &#x26; KubeCon North America conference.</p><p>Roadie joined in, of course, and contributed to a lively debate about the future of both spaces. We didn't present at either conference this time around (more from us in this space in April next year when BackstageCon &#x26; KubeCon return to the EU), but some great names did.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6aZeHFOH1wO0ZKcApzGVc3/e7895bca237daccc7b44bfbdc84043e5/IMG_8413.jpg" alt="KubeCon NA"></p><h1>BackstageCon</h1><p>Organised by the fine folks at Amazon (<a href="https://www.linkedin.com/in/blandes">Byran Landes</a>) and Red Hat (<a href="https://www.linkedin.com/in/balajisiva/">Balaji Sivasubramanian</a>), this year's North American edition of BackstageCon was everything we've come to expect and enjoy from a BackstageCon. 
We heard from Microsoft, Booking.com, and Roku, as well as AWS, Red Hat and Spotify.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5n7dtBtMIbbX4xU24BiBWu/9e66f652aafa7237bf46ae48c4aa6dd3/54135070492_e6e86f7f2c_o.jpg" alt="BackstageCon Area"></p><p><strong>Key take-aways:</strong></p><ul><li>Adopting a product mindset for your Platform continues to be the best way to drive adoption</li><li>A lot of first-time adopters were present, learning about the journey to adoption and the best path to get to a complete Backstage instance</li><li>There was a big focus on defence and security and how Backstage can help organise information for companies in those sectors</li></ul><p><strong>Our pick of the sessions:</strong></p><ul><li><a href="https://www.youtube.com/watch?v=FACtDHQvNf0&#x26;list=PLj6h78yzYM2O3YsKnBocZZPv0M6f-wLu5&#x26;index=13&#x26;ab_channel=CNCF%5BCloudNativeComputingFoundation%5D">Himanshu from Harness</a> gave a great overview of what it takes to drive adoption of Backstage, based on his experience both at Spotify and Humanitec. Key for us were the practical tips in the final third of the talk.</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3mzpAU4nYkJkTwJcNZiSKS/36d16e9181c9a77d630c9eb9f48e5856/Screenshot_2024-11-20_at_20.30.24.png" alt="Himanshu from Harness"></p><p>All the videos for the BackstageCon sessions can be found <a href="https://www.youtube.com/playlist?list=PLj6h78yzYM2O3YsKnBocZZPv0M6f-wLu5">here</a>.</p><p>The full schedule of the event can be found <a href="https://colocatedeventsna2024.sched.com/overview/type/BackstageCon?iframe=no">here</a>.</p><h1>KubeCon</h1><p>KubeCon is inherently a much broader, less Backstage-focused affair, but Backstage still made a lot of appearances in other talks. 
As an enabler of broader platform initiatives or as the necessary precursor to delivering platform functionality to development teams, Backstage is still the #1 go-to product for Platform teams looking to make an impact.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2lAcdXXxK3zCcxsiLzdJZ5/0f5db7e22f491774d5f7a95872bece5b/54142752729_87d38f476e_k.jpg" alt="KubeCon "></p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/FAQK6uuEwcEMQhApQkUXz/a4d41887fb0dcea0d586caecc173825c/54142427976_4d436f6229_k.jpg" alt="Wiz at KubeCon"></p><p><strong>Key take-aways:</strong></p><ul><li>CNCF officially launched the <a href="https://training.linuxfoundation.org/blog/just-launched-certified-backstage-associate-cba/">Certified Backstage Associate</a></li><li>Security rose in importance, with a full day of keynote addresses dedicated to it, and Wiz and Akamai particularly prominent sponsors</li><li>Patrik and Ben from the core maintainer team ran their update session on upcoming changes, focusing on the new declarative frontend 💪</li></ul><p><strong>Our pick of the sessions:</strong></p><ul><li><a href="https://www.linkedin.com/in/bensonphillipsiv">Benson Phillips</a> and <a href="https://www.linkedin.com/in/robheckel">Rob Heckel</a> from <a href="https://kccncna2024.sched.com/event/1i7oE/shifting-gears-leveraging-cncf-tools-to-streamline-operations-at-toyota-connected-benson-phillips-rob-heckel-toyota-connected">Toyota Connected</a> ran through their implementation of a platform composed solely of CNCF projects. They've focused heavily on ArgoCD and Backstage when building out their platform and gave a great overview of what they've been up to.</li></ul><p>All the KubeCon sessions will be uploaded soon and can be found <a href="https://www.youtube.com/c/cloudnativefdn">here</a>.</p><p>The full schedule of the event can be found <a href="https://kccncna2024.sched.com/">here</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding the Backstage System Model]]></title><link>https://roadie.io/blog/understanding-the-backstage-system-model/</link><guid isPermaLink="false">https://roadie.io/blog/understanding-the-backstage-system-model/</guid><pubDate>Wed, 06 Nov 2024 12:00:00 GMT</pubDate><description><![CDATA[How does the Backstage System Model actually work? How can you use it to create a structured, flexible Catalog? How do you make sure you can both model how your organisation thinks and speaks, without locking yourself into the same model forever?]]></description><content:encoded><![CDATA[<p>The <a href="/backstage-spotify/">Backstage Internal Developer Portal</a> is, at its heart, a software catalog. As a catalog, Backstage relies on a structured System Model to represent and organize individual items, in order to make it easier to find the information development teams need. When you are setting up or running Backstage you’ll often want to tweak this Model (or make wholesale changes to it) to make it fit your organization.</p><p>In this blog we’ll explore the Backstage System Model and how you can extend it if you need to.</p><h1>Why do we need a system model?</h1><p>Catalogs require at least some structure. If you don’t have a common taxonomy for how to describe each element inside it then it lacks coherence, like a library with no labels on the shelves (or worse yet, contradictory labels). You could pour in all of your various repositories, components, gateways, resources and clusters into a catalog and it will closely resemble a giant blob of nothing.</p><p>In a Catalog, information needs to be sorted to have value. 
Decisions need to be made about what gets included and what does not, and you need an idea of what goes where - how things are categorized now and how they should be categorized in the future.</p><h1>The Basics</h1><p>The Backstage data model is made up of nodes ("entities") and edges ("relationships").</p><h2>Entities</h2><p>The Backstage data model is built around "entities." Entities are the core units within the Backstage catalog that represent various elements of your software ecosystem.</p><p>Each entity is defined via metadata (name, description, labels etc), spec (custom properties), and relations (connections with other entities). In OSS Backstage this information is often piped into Backstage via YAML files that adhere to Backstage's entity specification. Sometimes entities can also come from "Providers" which provide the entity from some source of truth (i.e. <a href="/docs/integrations/okta/">Users and Group entities from Okta</a>)</p><p>This model allows teams to maintain a structured, discoverable Catalog by distributing the load across every team who owns part of the Catalog.</p><p>Friction Warning:</p><ul><li>Backstage advocates for distributed ownership (i.e. each team owns the information in the Catalog that represents the software that it owns) so it can be tricky to update your model and change it over time. For example, if you wanted to replace a Kind all of the various teams would need to update their catalog files. To get around this, a lot of self-hosted Backstage users have built API-based methods for mass updates.</li></ul><h3>Kinds</h3><p>Entities are grouped into Kinds. 
Kinds are like an aisle at a supermarket - everything within it is broadly cohesive and organised around similar principles.</p><p>Kinds have a schema and they require a processor to correctly ingest them into the Catalog.</p><p>You get some core Kinds out-of-the-box with Backstage, like:</p><ul><li><strong>Domain</strong>: Defines larger business domains, organizing systems and components</li><li><strong>System</strong>: Higher-level abstraction representing a collection of components working together</li><li><strong>Component</strong>: Represents deployable units like services, websites, or libraries</li></ul><p>Friction Warning:</p><ul><li>In OSS Backstage you can extend existing Kinds or write new Kinds to include whatever you’d like, but you need to build or modify a processor each time. That means writing code.</li><li>You will also need to consider the long-term impact of a new Kind. You’ll likely be supporting that Kind for a long time unless you want to deprecate it and force entities that use that Kind to fail.</li></ul><h3>Types</h3><p>Kinds have Types, allowing grouping within these larger buckets.</p><p>Types can be defined on-the-fly. Nothing special is needed to make Types work: any team can create a new Type just by declaring it in their catalog-info.yaml file.</p><p>Friction Warning:</p><ul><li>This can lead to a Cambrian explosion of Types, so you may want to introduce some constraint there. 
Validation of Types is common.</li><li>Annoying errors can creep into Types (e.g. <code>Website</code> and <code>Wesbite</code>) unless you’re validating them in some way.</li></ul><h2>Relationships</h2><p>Relationships exist between entities to provide the connective tissue of the Backstage Catalog.</p><p>Each Kind has a preset series of permissible relationships that are built when the processor runs for that Kind.</p><p>For example, a simple Component might have some API relationships and dependencies defined:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: artist-web
  description: The place to be, for great artists
spec:
  type: website
  lifecycle: production
  owner: artist-relations-team
  system: artist-engagement-portal
  dependsOn:
    - resource:default/artists-db
  dependencyOf:
    - component:default/artist-web-lookup
  providesApis:
    - artist-api
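---
# Illustrative companion entity, not part of the original example: a sketch of
# the API that providesApis points at, assuming it is an OpenAPI service owned
# by the same team. The description and definition stub are placeholders.
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: artist-api
  description: Retrieve artist details
spec:
  type: openapi
  lifecycle: production
  owner: artist-relations-team
  system: artist-engagement-portal
  # Minimal placeholder; a real entry would carry or reference the full spec.
  definition: |
    openapi: "3.0.0"
    info:
      title: Artist API
      version: 1.0.0
    paths: {}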
</code></pre><h2>The Core System model</h2><p>Out of the box, Backstage comes with a lot of built-in Kinds with attendant relationships so you can get started as quickly as possible.</p><p>Some Kinds, like software templates and Locations, are effectively atomic and compartmentalised away from other Kinds. The remainder are tied to how the Catalog is built and used to represent entities.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2c89CI4rKHDNmWvCM2IHhM/37022568f4168c497956da8d9615511a/software-model-entities.drawio-3ce7f43dd236c3934209fde8f21a4d9e.svg" alt="Backstage System Model"></p><p>They in effect represent "The Spotify Way" to model software. That’s not for everyone and won’t necessarily work perfectly for you.</p><p>If that’s the case, you have two options:</p><ul><li><code>Force it a little</code>: aka shoehorn your existing concepts into Spotify’s version. This works in a lot of cases, but is necessarily a compromise.</li><li><code>Re-model</code>: if that doesn’t do the trick, you need to get to work remodeling Backstage entity Kinds and Types to fit your needs. Some of this can be done without code changes, but some of it needs you to get your hands dirty.</li></ul><h1>Going beyond the basics and extending the Backstage System Model</h1><p>The Backstage framework is designed to be highly extensible, allowing you to modify or add new Kinds, Types, and Relationships based on the requirements of your organisation.</p><p>That said, there are a few things you need to think about when extending the model:</p><h3>1. No-code extensibility</h3><p>Backstage has flexibility baked in for a large degree of software definition. Using Types or built-in relationships handles most situations when you want to model your software inside the Backstage System model. 80-90% of the time this will do the trick, but will often come with some degree of compromise. 
For example, let’s say you want to articulate <code>Value Streams</code> as a top-level concept, but have to make do with <code>Value Streams</code> being a Type associated with the <code>Domain</code> Kind. It’s imperfect, but it’ll do in a pinch.</p><p>At Roadie, we evaluate and extend the System Model for our customers regularly. That works a lot of the time, but sometimes customers have niche requests that we don’t feel would benefit all our users. This is non-optimal. We want to give customers the freedom to extend the model without talking to us or writing code. To achieve that we’re building a fully self-serve, no-code UI for dynamically generating Kinds and defining a system model that can be as arbitrary as you’d like: if you want a Kind called <code>purple-monkey-dishwasher</code> you should be able to have one.</p><h3>2. Extending the framework using code</h3><p>Backstage is built around <a href="https://backstage.io/docs/features/software-catalog/external-integrations/">providers</a> and <a href="https://backstage.io/docs/features/software-catalog/external-integrations/#custom-processors">processors</a>. Providers pull data in; processors manipulate and validate that data to build the Catalog entities and relationships.</p><p>You can create wholly new providers to handle the ingestion of data from sources not currently handled by Backstage. The Backstage community has built a lot of Providers over the years, but they may require tweaks to fit your specific use-case. For example, Roadie has rebuilt the GitHub provider to use webhook-based ingestion because the size of the Catalogs we habitually deal with breaks the GitHub rate limits.</p><p>You can also modify processors for existing Kinds. For example, to extend the list of allowed relationships between Kinds you need to tweak those processors.</p><p>You can also create wholly new processors to define new business logic or processes for manipulating and validating that data when you create a new Kind. 
Going back to the Value Stream example, now you can differentiate <code>Value Stream</code> from <code>Domain</code> and allow the Kinds to deviate usefully from one another. Maybe they each need different allowed relationships, or they’ll build their entities differently: the choice is yours.</p><h3>3. Data</h3><p>In the out-of-the-box OSS Backstage model, the data for the system model comes from YAML files. This follows the GitOps model, where changes are made in git-tracked repositories and then ingested by other systems (in this case, the Backstage Catalog).</p><p>That means if you want to change or update your model you need to change all those files. That in turn means opening PRs against every repo that contains a relevant YAML file. This is often a large undertaking, adding significant friction. That’s why most high-volume users of OSS Backstage have built API- and database-based mechanisms to do mass updates. Roadie has two: the Decorator UI and APIs to do a variety of different update patterns (idempotent updates to sync data from a source of truth into Backstage, or just pushing in whole entities via the Roadie Entities API).</p><h1>Levers to pull when extending the model</h1><p>Below are some common methods for extending the Backstage data model:</p><h3>1. <strong>Custom Annotations</strong></h3><p>Difficulty: Trivial</p><ul><li><strong>Why</strong>: If you need to add metadata specific to your organization (like security labels, compliance levels, etc.), you can define custom annotations.</li><li><strong>How</strong>: Annotations are added as key-value pairs within the <code>metadata.annotations</code> field in your YAML definitions. 
These annotations can be used to enhance search functionality, create custom views, or provide additional context.</li><li><strong>Example</strong>: Adding <code>security-level: high</code> as an annotation for services that handle sensitive data allows you to quickly filter and prioritize compliance and monitoring for these services.</li></ul><p><strong>References</strong>:</p><ul><li><p><a href="https://backstage.io/docs/features/software-catalog/well-known-annotations/#annotations">Backstage Annotations Documentation</a>: Documentation on creating custom annotations to extend metadata.</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fraud-detection-model
  description: "AI model for fraud detection"
  annotations:
    security-level: high

...
</code></pre></li></ul><h3>2. <strong>Custom Types</strong></h3><p>Difficulty: Easy</p><ul><li><strong>Why</strong>: In cases where the commonly used Types for a Kind (for a Component: service, website, library, etc.) do not fit your specific resources, you can define a custom Type.</li><li><strong>How</strong>: Define a new Type in any valid catalog-info.yaml. This simply involves setting a new value for <code>spec.type</code> in the YAML file.</li><li><strong>Example</strong>: Suppose you have machine learning models as a core resource in your project. You could define a new <code>model</code> type.</li></ul><p><strong>References</strong>:</p><ul><li><p><a href="/blog/kinds-and-types-in-backstage/">Roadie Kinds and Types documentation</a> talks a lot about how to use Types without introducing problems</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fraud-detection-model
  description: "Machine learning model for fraud detection"
  annotations:
    security-level: high
spec:
  type: model
  version: "1.0"
  trainingDataset: "transactions-v1"
  accuracy: "95%"
</code></pre></li></ul><h3>3. <strong>Modifying Existing Kinds to Add Custom Relations</strong></h3><p>Difficulty: Normal</p><ul><li><strong>Why</strong>: Relationships between entities help you capture dependencies, ownership, and team structures within your catalog. If your use case involves additional relationship types, custom relations can improve representation.</li><li><strong>How</strong>: Modify the relevant processor for a given Kind to enable new types of relationships to be built for that kind. Then define relations within the <code>spec.relations</code> section of the YAML file.</li><li><strong>Example</strong>: Suppose you want to track models associated with data sources. You could create a custom relation <code>usesDataFrom</code>, linking ML models to the Resource entities that document data sources they rely on.</li></ul><p><strong>References</strong>:</p><ul><li><a href="/blog/kinds-and-types-in-backstage/">Roadie Kinds and Types Documentation</a>: Provides practical examples of defining and extending Kinds.</li></ul><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: fraud-detection-model
  description: "Machine learning model for fraud detection"
  annotations:
    security-level: high
spec:
  type: model
  version: "1.0"
  trainingDataset: "transactions-v1"
  accuracy: "95%"
  relations:
  - type: usesDataFrom
    targetRef: resource:exampleorg/some-data-source
    target:
      kind: resource
      namespace: exampleorg
      name: some-data-source
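---
# Illustrative companion entity, not part of the original example: a sketch of
# the Resource that usesDataFrom targets. The type, owner and description are
# hypothetical; usesDataFrom itself only resolves once a processor emits it.
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: some-data-source
  namespace: exampleorg
  description: Transaction records used to train the fraud detection model
spec:
  type: database
  owner: data-platform-team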
</code></pre><h3>4. <strong>Creating Entirely New Custom Kinds</strong></h3><p>Difficulty: Normal / Hard</p><ul><li><strong>Why</strong>: When the System Model cannot adequately encapsulate how you build software or the relationships between various parts of your organisation, you will need to build a custom Kind.</li><li><strong>How</strong>: Write a new processor for that Kind and define a custom schema for that Kind. This ensures all entities adhere to required fields, valid types, and constraints, providing an additional layer of validation. Then add new catalog-info.yaml files for the new Kind to relevant resources, or modify existing catalog-info.yaml files.</li><li><strong>Example</strong>: For the <code>MLModel</code> entity, you could create a new Kind to represent that in your System model. Using that new Kind you could then model properties such as <code>version</code>, <code>trainingDate</code>, and <code>accuracy</code>.</li></ul><p><strong>References</strong>:</p><ul><li><a href="https://backstage.io/docs/features/software-catalog/descriptor-format/">Backstage JSON Schema Documentation</a>: Explains how to define and enforce custom schemas.</li></ul><h1>Conclusion</h1><p>Backstage is an extremely flexible framework for modelling software and, once the building blocks and options are understood, it’s simple enough to fully customise the model.</p><h3>Useful links:</h3><ul><li><a href="https://backstage.io/docs/features/software-catalog/system-model">Backstage System Model</a>: official docs and a good starter diagram for how entities in the Catalog interact.</li><li><a href="https://backstage.io/docs/features/software-catalog/life-of-an-entity">Backstage Entities</a>: official docs on the lifecycle of entities</li><li><a href="https://backstage.io/docs/features/software-catalog/well-known-relations">Backstage Relationships</a>: official docs on how relationships work inside Backstage</li><li><a href="/blog/modelling-software-backstage/">Modelling 
software in Backstage</a>: Roadie blog from 2021 about how to model software in Backstage using the core system model. This still represents a great primer on the out-of-the-box system model and how you could use it.</li></ul>
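<p>To make the custom Kind lever described above a little more concrete, here is a rough sketch of what a catalog file for an <code>MLModel</code> Kind might look like. The <code>apiVersion</code> group, the owner and all field values are assumptions made for illustration; the real shape is whatever your own schema and processor accept.</p><pre><code class="language-yaml"># Hypothetical custom Kind: the apiVersion group and every value below are
# placeholders, valid only if your own processor and schema accept them.
apiVersion: exampleorg.io/v1alpha1
kind: MLModel
metadata:
  name: fraud-detection-model
  description: "Machine learning model for fraud detection"
spec:
  owner: fraud-team
  version: "1.0"
  trainingDate: "2024-10-01"
  accuracy: "95%"
</code></pre><p>Until a processor that recognises this Kind is installed, the Catalog would be expected to reject an entity like this, which is why this lever involves writing code.</p>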
]]></content:encoded></item><item><title><![CDATA[Why Hybrid is Best for IDPs]]></title><link>https://roadie.io/blog/why-hybrid-is-best-for-idps/</link><guid isPermaLink="false">https://roadie.io/blog/why-hybrid-is-best-for-idps/</guid><pubDate>Tue, 05 Nov 2024 23:00:00 GMT</pubDate><description><![CDATA[Discover why the hybrid model for Internal Developer Portals (IDPs) offers the best of both worlds. We explore the pros and cons of open-source, proprietary, and hybrid IDPs, ultimately revealing how hybrid solutions like Roadie combine the extensibility and community support of open-source with the ease of use and quick deployment of proprietary options. ]]></description><content:encoded><![CDATA[<p>The Internal Developer Portal (IDP) market is only 5 or 6 years old, but we’ve seen a huge amount of growth in that time, and many different options emerging.</p><p>According to Gartner, IDPs are reported as the most frequently piloted technology in their 2022 - 2024 Technology Adoption Roadmap Survey, and 75% of organizations with platform engineering teams will provide IDPs by 2026.</p><p>With this growth, 3 main categories of Internal Developer Portal products have emerged:</p><ul><li><strong>Open-source</strong> IDPs are exactly what you think. The code is open-source and you host and maintain it yourself.</li><li><strong>Proprietary</strong> IDPs are typically software as a service startups funded by venture capital. You purchase access to them just like you purchase access to PagerDuty or GitHub Enterprise.</li><li><strong>Hybrid</strong> IDPs are a combination of open-source and proprietary. They’re built on an open-source foundation, but come with commercial support and proprietary features.</li></ul><p>In this article, I’m going to dive into each category one by one, and learn the pros and cons of each.</p><p>I’ll also explain why I believe the Hybrid model brings the best of both worlds. 
It combines the community, extensibility and lack of vendor lock-in of the open-source model with the low cost, ease of use, and fast deployment of the proprietary model.</p><h2>Open-source</h2><p>Although Backstage is technically not an IDP, and not the only open-source option, you can’t talk about open-source IDPs without the conversation focussing on it.</p><p><a href="https://roadie.io/backstage-spotify/">Backstage</a> was open-sourced in early 2020 by Spotify. It was a rewrite of their internal IDP that they’d been using for years. It’s not an IDP because it’s actually a set of TypeScript libraries that developers can combine together to build an IDP, but it has still captured a huge amount of attention in the IDP space.</p><p>When Backstage launched, it was often compared to <a href="https://github.com/hygieia/hygieia">Hygieia</a> from Capital One, which predates it by almost 4 years. Even after Backstage appeared, a third contender emerged with a strong start. Ride-hailing company Lyft threw their hat into the ring when they released <a href="https://github.com/lyft/clutch">Clutch</a> a few months after Backstage was open-sourced. Clutch describes itself as "an extensible platform for infrastructure management".</p><p>In 2024, Clutch’s popularity seems to have stalled, Hygieia is deprecated, and the reality is that Backstage dominates the open-source IDP market. Although not a perfect metric by any means, GitHub stars show the difference in popularity of each tool.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/3ztAVoKhmjcidfD90xcIYq/20debb7c682bb4b7384a105e0f2645e1/idp1.png" alt="idp1"></p><h3>Benefits of the open-source model for IDPs</h3><p>In many ways the open-source model is perfect for building an Internal Developer Portal. IDPs are a window into all of the tools that developers use. They’re the venerable “single pane of glass”.</p><p>Developers use a lot of tools though, and the developer tool landscape is thousands of tools strong and growing. Even the <a href="https://landscape.cncf.io/">Cloud Native Landscape</a> alone has more than 1,000 tools in its ecosystem.</p><p>How can a single IDP integrate with and plug-in to all these tools? Well, by creating an open-source community who will help with the effort.</p><p><strong>Huge community</strong></p><p>This is exactly what Backstage has accomplished. 
The Backstage community is huge and engaged.</p><ul><li>Backstage was the <a href="https://www.cncf.io/reports/cncf-annual-report-2023/">top end-user contributed CNCF project of 2023</a>, with more than 4,000 contributions from end user companies.</li><li>There are 17,000 people in the Backstage Discord channel, with dozens and dozens of questions being asked or answered every day.</li><li>Backstage has a <a href="https://backstage.io/community">deep partner ecosystem</a>. Whether you’re a Fortune 50 enterprise or a Series C startup, there’s a partner who can help you deploy and use Backstage.</li><li>Backstage has official support from leading DevTools companies. Companies like Snyk, PagerDuty, Dynatrace and even AWS have released officially supported plugins for Backstage.</li></ul><p>This vibrant community means that you can always get help with Backstage. The other benefit is the extensibility and the number of integrations available.</p><p><strong>Huge number of integrations</strong></p><p>The result of this large community is that Backstage is installed and adopted at thousands of organizations. These organizations use a diverse set of engineering tools, and many adopters have built and open-sourced Backstage plugins for these tools.</p><p>The <a href="https://backstage.io/plugins">Backstage plugins directory</a> has more than 200 plugins available, so you can be fairly confident that there are plugins available for the tools you use.</p><p><strong>Highly extensible</strong></p><p>Underlying all these plugins is a framework which is designed from the ground up to be maximally extensible. 
<a href="https://backstage.io/docs/plugins/">The docs say</a>:</p><blockquote><p>Backstage is a single-page application composed of a set of plugins.</p></blockquote><blockquote><p>Our goal for the plugin ecosystem is that the definition of a plugin is flexible enough to allow you to expose pretty much any kind of infrastructure or software development tool as a plugin in Backstage.</p></blockquote><p>Practically, this means you can customize and extend your Backstage install to your specific requirements. If you need to build your catalog in Gerrit rather than GitHub, then you can swap out the GitHub integrations and replace them. The same concept applies to most parts of Backstage.</p><p>By coupling this technical extensibility with a permissive Apache 2.0 license and a <a href="https://github.com/backstage/community/blob/main/GOVERNANCE.md">mature governance model</a>, adopters can make wholesale changes to Backstage if they like.</p><p>The result is that many organizations have customized Backstage heavily. Take a look at this screenshot of Sunrise, <a href="https://platformengineering.org/talks-library/sunrise-zalandos-internal-developer-platform">the Backstage-based IDP that Zalando created</a>, as an example.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/maWMqKoK0KLgvYfvTZn8F/cab20948d658be53033be67abe3f577f/idp2.png" alt="idp2"></p><p>This completely custom UI shows the progress of changes as they flow through CICD on their way to production. Very little of what you see here comes out of the box with Backstage. Instead, it’s wired together from the basic building blocks of Backstage plugins.</p><p><strong>Lack of vendor lock-in</strong></p><p>Because Backstage is open-sourced and backed by large end user companies and the CNCF, adopters can feel comfortable that it will be around for a long time, and will continue to meet their needs into the future.</p><p>The IDP market is still very early. 
The majority of the players have been founded in the past 5 years. We will likely see some consolidation of this market as players get acquired and wound down. Any customers of these companies will be forced to migrate to a different solution.</p><p>We’ve seen this play out in the CICD market already. There was a time when Travis CI was a dominant provider of continuous integration. In 2019, Travis CI was <a href="https://news.ycombinator.com/item?id=18978251">acquired by private-equity firm Idera</a>. One month later, <a href="https://news.ycombinator.com/item?id=19218036">layoffs began</a>. In early 2020, Travis CI <a href="https://news.ycombinator.com/item?id=25338983">stopped providing free support for open-source products</a>, and in late 2020 <a href="https://news.ycombinator.com/item?id=24964601">the pricing changes began</a>.</p><p>Companies can avoid this fate in the IDP market by building on an open-source project instead of a commercial vendor with its own data model and lock-in.</p><h3>Drawbacks of the open-source model of IDPs</h3><p><strong>It’s expensive to build</strong></p><p>Backstage is a larger undertaking than most people realize before they get into it. To quote Gartner in their <a href="https://www.gartner.com/en/documents/4010078">Innovation Insight for Internal Developer Portals report</a>:</p><blockquote><p>Gartner inquiries indicate that Backstage implementations may require substantial effort in standing up the service.</p></blockquote><p>In order to replicate something approaching the Sunrise portal mentioned above, companies should expect to allocate 2 to 5 engineers to the project for a number of years.</p><p>Don’t believe me? 
Zalando said it themselves in <a href="https://youtu.be/zowEfZoZycs?si=OYdNtCUZ9jee1ydu&#x26;t=1515">this presentation they did</a> at the Autodesk Developer Productivity Summit.</p><blockquote><p>All that I’ve showed you was done through the contributions of many teams, but the core team was three engineers and a engineering manager.</p></blockquote><p>That team has been working on Sunrise since 2020.</p><blockquote><p>And that’s where my team is focussed on, actually like, the discovery part of our application landscape, and where we actually decided to, a few years ago in the beginning of the Backstage era in 2020 when Backstage was released, to actually onboard ourselves into Backstage.</p></blockquote><p>Let’s do the math. Assume an engineer or engineering manager costs $250k per year, so the team of 4 costs $1 million per year. 4 years have passed since 2020, so they’ve spent $4 million in total on their core Backstage team. Add in the “contributions of many teams” and we could easily be talking about a $6 million outlay.</p><p>Now you’re probably saying “yeah but we’re not as big as Zalando”, but trust me, even small companies will spend half a million dollars pretty easily.</p><p><strong>It requires uncommon skills</strong></p><p>Backstage is typically owned and managed by platform teams or DevOps teams. These teams are probably skilled in YAML and Kubernetes-native languages like Go.</p><p>Backstage is written in TypeScript. The backend runs on Node.js, and the frontend is written in React.</p><p>Organizations who are considering self-hosted Backstage should consider whether or not they have the skills to develop in these languages. As mentioned above, Backstage is a set of libraries that users combine together to make a developer portal. That means you need to write TypeScript to make use of it. You need to understand frontend development, HTML and CSS to customize it. 
It may not make sense to build up these skills in a Platform and Infrastructure organization because they will not be easily transferable to other projects.</p><p>The second set of uncommon skills that a Backstage deployment needs are project management and developer relations. Once it’s live, you need to evangelize it. You have to go out to the rest of the organization and teach them about it and how to use it. Are your platform developers really going to want to do this work?</p><p><strong>It’s got a long time to value</strong></p><p>The typical Backstage rollout has a few phases. It starts with an initial deployment, which typically takes a few months to get a basic Backstage instance into production. Some parts of Backstage will work straight out of the box at this point. You’ll be able to use the scaffolder to create new software projects from predefined templates, for example.</p><p>Other parts of Backstage require an adoption phase. The software catalog is usually one of them. In order to have a complete software catalog, teams must first put their software into it.</p><p>In 2023, at BackstageCon North America, I <a href="https://youtu.be/Ar9Tk1t6toQ?si=LfC7CgV77FowTUYr">spoke to the experiences of some Backstage adopters</a> who had attempted this feat. I first explained that 60% of the Backstage adopters I had interviewed were attempting to populate their catalog by writing YAML files that contain metadata about their services (these are called <code>catalog-info.yaml</code> files).</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2djFTALB80KSSETrFon7ma/14cf1ced01166a13f0f17323626ab4ed/idp3.png" alt="idp3"></p><p>I then shared some quotes from adopters who had followed this path:</p><blockquote><p>The catalog never really worked for us. We struggled with adoption and rate limiting. Half the info in the catalog was wrong when they first pushed adoption.
<strong>2,000 engineers, 2 years experience</strong></p></blockquote><blockquote><p>The thing we struggled with the most was getting teams to create the YAML file. We spent 6 months actively encouraging teams to register their components with Backstage. After 1 year, only 30 or 40% of the active repos were there.
<strong>120 engineers, 1 year experience</strong></p></blockquote><p>Without a lot of product development and project management, this catalog population process takes a long time. There are many Backstage adopters who are at less than 50% catalog completeness after two years with the tool. Even Spotify admit that <a href="https://thenewstack.io/how-spotify-achieved-a-voluntary-99-internal-platform-adoption-rate/">Backstage adoption rate often stagnates at 10%</a>. You may have dreams of your software catalog becoming a single pane of glass for all of the software development in your organization, but it simply cannot fulfill its purpose if the software is not in there.</p><h2>Proprietary</h2><p>At the other end of the spectrum, there are a number of proprietary developer portals like Cortex and Port.</p><p>The concept of proprietary developer portals is largely the same as Backstage. They have similar features and target a similar user in the engineering organization. The main differences are in their extensibility. Users can’t edit the code of a proprietary developer portal, and there won’t be a large community who are producing open-source plugins.</p><h3>Benefits of the proprietary model for IDPs</h3><p>While proprietary developer portals may never be as extensible as Backstage, they certainly do have some benefits.</p><p><strong>Quicker to get started</strong></p><p>Proprietary IDPs will mostly just work out of the box. You don’t have to plan sprints and do research and write code to get some value. They come with admin interfaces where you can set some properties, connect up your tools, and start using the product straight away. This is likely a better experience for platform engineers who don’t have the time or expertise to deploy and customize Backstage from scratch.</p><p>When you do need to configure something, the documentation provided by a proprietary developer portal will likely be more comprehensive and up to date. 
They’re simply more motivated and able to produce high quality documentation than an open-source community. This makes setup faster and less confusing, and helps to prevent misconfigurations.</p><p><strong>User-friendly interface</strong></p><p>Proprietary IDPs bring their own user interface (UI). They’re developed from the ground up with design systems in place to help to ensure consistency across integrations and pages. They also have designers and UI engineers working to implement design practices in a coherent way.</p><p>While the Backstage core team have done a lot to put the tool in a good place, the reality is that Backstage’s default interface leaves a lot to be desired.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2c5i5wKrKGkM4u3tJGTccS/ad6033f7fd1eafc4cc2414aa5e4695c3/idp4.png" alt="idp4"></p><p>This reality is brought on by the combination of overly familiar Material UI libraries and community contributed code that can be hit-or-miss in terms of how it looks and feels.</p><p>Backstage plugins can be contributed by anyone, so there’s no guarantee that they will look good, or look consistent. It’s very likely that different plugins will work completely differently even though they are installed in the same Backstage instance.</p><p><strong>More guidance</strong></p><p>Proprietary IDPs are more opinionated in how they should be used. When you work with a vendor you should get access to solutions engineering and customer success resources who can help to share lessons from other customers.</p><p>Working with open source can sometimes feel like being handed a box of car parts and asked to build a car. 
Sure you can figure it out eventually, but you’re going to make mistakes along the way, and it probably won’t be the most efficient path.</p><p>Having a partner to work alongside and share lessons is worth a lot, and helps to cut the time to value for your IDP project.</p><h3>Drawbacks of the proprietary model for IDPs</h3><p><strong>Vendor lock-in</strong></p><p>Proprietary IDPs have their own in-house APIs, their own data model, and their own way of doing things. Many of the options on the market are early stage startups. There’s a good chance some of them will disappear over the next few years. If you’re a customer, your IDP will go with them.</p><p>When they do survive, you’ll likely be forced to choose between paying up for price hikes, or completing an expensive migration project to a different solution.</p><p><strong>Limited extensibility</strong></p><p>It’s not possible to change the code of a proprietary IDP. If there’s something about the way it works that doesn’t suit the needs of your company, your best hope is to contact support and hope that they have the desire and the roadmap space to change it.</p><p>Most development organizations have some homegrown tools that have sprung up over the years internally. These tools only exist inside the company where they were created, and no IDP vendor will have an out-of-the-box integration for them. For these tools, you need to create your own plugins if you wish them to be part of your single pane of glass IDP. Some proprietary IDPs do support custom plugin development, but you’ll need to do so using only the concepts and libraries that the vendor provides, and the plugins you create will not be transferable to any other solution.</p><p>Proprietary IDPs have fewer integrations than the massive Backstage community. Even the most well funded proprietary vendor has 48 plugins listed on their integrations page, fewer than a quarter of the number that are available for Backstage. 
No matter how big this vendor grows their team, they won’t be able to compete with the 500+ companies who have contributed to Backstage.</p><p><strong>Limited speed of execution</strong></p><p>The tech landscape is constantly shifting and evolving. Before serverless we had Kubernetes, before Kubernetes we had virtualization, before virtualization we had bare metal. In between and around these major platform shifts we had and continue to have a massive number of different types of serverless, containerization and virtualization options.</p><p>A developer portal is supposed to wrap all of these shifts and technologies in order to give the best possible coverage over your internal engineering landscape. Proprietary IDPs may struggle to capture all of these technology shifts into the future, leading to gaps in the portal that platform teams put in front of their internal users.</p><h2>Hybrid</h2><p>Last, but not least, we have the Hybrid model. This model takes an open-source foundation with hundreds of plugins and integrations available, and makes it available in an out-of-the-box format which is easy to get started with.</p><p>This is what Roadie is. It’s the only standalone IDP based on Backstage, and it offers the best of both worlds - extensibility <em>and</em> speed of execution.</p><p><strong>Huge number of integrations</strong></p><p>The vast majority of plugins that have been created for Backstage can work on Roadie. <a href="https://roadie.io/docs/integrations/">Our integrations library</a> contains 70+ plugins and integrations today, and we’re adding more all the time, in line with customer requests. 
Once a new open-source plugin is created, it takes us about a day to make it available in Roadie.</p><p>The only reason we wouldn’t add a Backstage plugin is that it’s of such low quality that it doesn’t work or provide any value.</p><p>This means that Roadie customers can effectively choose from 200+ plugins and integrations that the Backstage community has created and use them on our platform. In fact, <a href="https://github.com/RoadieHQ/roadie-backstage-plugins">we’ve created some of the most popular plugins ourselves</a> and open-sourced them.</p><p><strong>Highly extensible</strong></p><p>Roadie strives to support as much customization of our IDP as possible. In addition to theming and branding, we support a custom data model, custom Backstage plugins, custom proxies, layouts, scaffolder actions and on and on. You name it, <a href="https://roadie.io/blog/the-power-of-customization-making-backstage-work-for-you-with-roadie/">we can probably let you customize it</a>.</p><p>This means that you can customize Roadie to meet the specifics of your organization, improving adoption and facilitating ease of use.</p><p><strong>No vendor lock-in</strong></p><p>Roadie is API compatible with Backstage. The same software metadata schema that works on Backstage will work on Roadie. Custom frontend plugins written for Backstage will also work on Roadie. We’re just supporting and extending the same APIs.</p><p>This has two benefits for customers.</p><ol><li>They can migrate off self-hosted Backstage onto Roadie very easily. Numerous organizations have already done this. Simply connect us to your source code management tool and we’ll automatically ingest any software metadata files you’ve created.</li><li>They can migrate back to self-hosted Backstage very easily. 
In fact, our Terms of Service dictate that we will give you your data back within 30 days of termination.</li></ol><p><strong>User-friendly interface</strong></p><p>Roadie’s interface is vastly improved compared to the barebones Backstage experience. We’ve worked with our users to streamline how people use the product. This means better out of the box defaults and easier customization.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1TIBiSInyg1DhJjCj5TvOm/797e7071e5948feec6242bdce60e63ca/idp5.png" alt="idp5"></p><p><strong>Inexpensive</strong></p><p>In many cases, Roadie is 5 times cheaper than self-hosting Backstage.</p><p>People tend not to realize this up front, but the most highly developed Backstage deployments in the world have cost millions of dollars. We talk to companies all the time who have built a team of 5+ engineers around Backstage and run those teams for 3+ years. The most expensive Backstage deployment we’ve seen cost $7.5 million over 3 years.</p><p>Roadie doesn’t require such an extensive team to be built around it. We take away all of the operations and deployment effort, and much of the customer support that has to happen. We also give you out-of-the-box features like <a href="https://roadie.io/product/tech-insights/">Scorecards</a> and <a href="https://roadie.io/product/access-control/">Role-Based-Access-Control</a> that you would otherwise have to build yourself to realize the potential of your IDP.</p><p>Lastly, we charge based on usage. You only pay for the engineers who write code which is tracked in the catalog. Everyone else gets to log in for free. This means you can ramp gradually throughout your org, rather than paying for the development of the entire IDP up front.</p><p><strong>Requires few skills</strong></p><p>Roadie doesn’t require any TypeScript skills. You just drag and drop plugins where you want them, and configure the product in our administration panels. It’s dead simple. 
This is perfect for platform teams who are trained on cloud-native technologies. We even have no-code plugins that you can use to display lists or charts inside the UI without writing any code.</p><p>If you do need to make something completely custom and you don’t want to write a Backstage plugin yourself, we can sort you out with professional services to suit your needs.</p><p><strong>Short time to value</strong></p><p>We’ve built a ton of features into Roadie to help shorten the time to value. For example, we help teams build out their catalog quickly and we see customers reaching a <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">high level of catalog completeness within 4 months</a>.</p><p>We also provide customer success and guidance to help organizations effect change with their IDP. We take the lessons we’ve learned over the years and use them to guide you to success in as short a time as possible.</p><h2>Conclusion</h2><p>We’ve designed Roadie from the ground up with flexibility and easy adoption in mind. We believe engineering organizations should not waste their resources on undifferentiated heavy lifting projects like deploying an open-source IDP. At the same time, they shouldn’t be locked into a proprietary data model or limited in terms of what they can build or the integrations that are available.</p><p>It turns out that you can have your cake and eat it too. An Internal Developer Portal can be both easy to use and supported by a massive and growing open-source community. This means that you can focus on unlocking productivity in your engineering organization while knowing you have a rock-solid and flexible foundation beneath.</p><p>--</p><p><em>I’ve also written about the difference between IDPs and the IDP space in general over on <a href="https://thenewstack.io/internal-developer-portals-is-open-source-enough/">The New Stack</a></em></p><hr><p><em>Image by MSTORK from Pixabay</em></p>
]]></content:encoded></item><item><title><![CDATA[RBAC, the new Backend, Scaling, Notifications and new Catalog customisation options]]></title><link>https://roadie.io/blog/rbac-new-backend-scaling-notifications-catalog-customisation/</link><guid isPermaLink="false">https://roadie.io/blog/rbac-new-backend-scaling-notifications-catalog-customisation/</guid><pubDate>Thu, 31 Oct 2024 11:00:00 GMT</pubDate><description><![CDATA[This is a bumper changelog that covers September and October, so buckle up. The clocks have rolled back, Halloween is upon us, and it's time to wrap up what we've been working on for the last two months. We've got RBAC, the new Backend, Notifications, scaling improvements, custom columns and tables, and a whole raft of plugins.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>🚨👮 RBAC with Fine-Grained Control</h2><p>Earlier this year we launched our Role-based Access Control (RBAC) feature to help our customers have greater control over who does what within Roadie.</p><p>That started off at a fairly coarse-grain, with the ability to restrict basic CRUD actions within the application to a subset of users.</p><p>The ultimate goal was always to have much tighter and more specific controls than that though, so we built a new layer for our RBAC controls.</p><p>Now, with our RBAC plugin you can:</p><ul><li>Create and surface custom permissions for your Custom Plugins</li><li>Create custom permissions policies to group those permissions together, either using our built-in permissions or Custom Plugin permissions you've created</li><li>Attach ALLOW or DENY options to those permissions policies so that you can more flexibly articulate the policy you'd like to create</li><li>Target annotations and other entity metadata to attach permissions</li></ul><p>That means you'll be able to:</p><ul><li>Target specific catalog items</li><li>Block scaffolder templates that you only want certain users to be 
able to see or run</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7d1Dygnglf6M6WgAtmw6gx/724e688867a88827f65f054555462c7d/Screenshot_2024-11-04_at_15.16.33.png" alt="Policies Management"></p><h2>🏛️ New Backstage Backend (and we're now on v1.30)</h2><p>We've made the jump to the new Backend now that it's officially stable. It wasn't a trivial thing though, and there are a few potholes that we hit along the way. To help others making the transition we even <a href="https://roadie.io/blog/migrating-to-backstages-new-backend-a-step-by-step-guide/">wrote a blog</a> about it.</p><p>We also took the opportunity to upgrade to v1.30 of the core project, so that's fun.</p><h2>🎨 Notifications have come to Roadie</h2><p>A set of <a href="https://drodil.medium.com/backstage-notifications-ceedf812ceef">Notifications plugins</a> landed with Backstage v1.28 and they've recently been integrated into Roadie. There's still some way to go for Notifications to mature in the OSS project (for example, individual plugins are responsible for emitting notifications and there currently isn't a way to really manage those - they're all just transparently sent to users 😅) but it's moving fast. There should be some notification management in the next release for example.</p><p>We'll be working in the next few months on integrating Notifications into various plugins that we manage, including Tech Insights. 
Exciting stuff!</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/38SDXmh1zIuJJpA2CI0qB7/6fb74bc541a37bf890104d5bfae6d6d6/notifications.webp" alt="Notifications"></p><h2>💈 Scaling, scaling, scaling</h2><p>Roadie customers have pushed the limits of Backstage lately.</p><p>Multiple Roadie customers now have:</p><ul><li>Catalogs with 200k+ Component and Resource entities</li><li>Catalogs with 10k+ users and groups entities</li><li>Tech Insights facts with millions of data points captured</li></ul><p>That presented some major scaling challenges for our infrastructure and the way we have architected our version of Backstage.</p><p>After a lot of work to identify bottlenecks, scale cloud resources where required, and significant engineering time introducing performance improvements, we're now able to comfortably handle even the largest Backstage installations in the world.</p><h2>🧘‍♂️ Custom Columns v2 &#x26; Fact Tables (in beta)</h2><p>Software Catalogs tend to be tables. They're the most intuitive mechanism for displaying large quantities of information.</p><p>Making those tables super-flexible is top-of-mind for Roadie at the moment. So much so that we have not one but two features in beta-testing to help introduce much more flexibility into our tables.</p><p>The first is Custom Columns v2. The first cut of Custom Columns allowed you to take arbitrary metadata from each Catalog entity and add it to your Catalog tab as a new column. The values were rendered as strings though, so we're adding a whole bunch more types to allow easier comprehension of the information. Think: numbers, colourful ranges, links and Catalog entities, all accessible and filterable from within a Custom Catalog Tab.</p><p>The second is Fact Tables, which allow you to display Fact data that you've ingested for your entities from Tech Insights Data Sources (either built-in ones that we provide or custom ones you've created). 
You can filter it by team and display multiple Data Sources side-by-side.</p><p>Soon the two worlds will collide 🌚 🌝, with facts accessible inside Custom Columns. That means you'll be able to have a Catalog tab with data from any configured source alongside existing Catalog data.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2UJy2DSUb9xAkZEMa1Tes2/1183c92f9fc38ecac22b91c96467ca7d/Screenshot_2024-11-04_at_15.25.07.png" alt="Fact Tables"></p><h3>🔌 Plugins &#x26; Integrations roundup</h3><ul><li><strong><a href="https://roadie.io/docs/integrations/wiz/">Wiz plugin</a></strong>: we made a new plugin! The new Wiz security frontend plugin now surfaces Wiz data inside Backstage. It's available both inside Roadie and as an OSS plugin. Wiz certification for this plugin is ongoing but should happen soon. We'll be working on a Wiz Tech Insights data source after that. Watch this space 👩‍🚀🚀</li><li><strong><a href="https://roadie.io/backstage/plugins/launchdarkly/">LaunchDarkly plugin</a></strong>: we also made a LaunchDarkly plugin for managing feature flags and surfacing configuration inside your Backstage instance.</li><li><a href="https://roadie.io/backstage/plugins/shortcut/" title="Shortcut Plugin | Roadie"><strong>Shortcut plugin</strong></a>: we made a Shortcut plugin. It's not quite as widespread as Jira (that's a pretty high bar...) but Shortcut is gaining some traction as a nicer way to engage with software development tickets and workflows. We use Shortcut internally and we're big fans. Check it out.</li><li><strong><a href="https://github.com/RoadieHQ/roadie-backstage-plugins/issues/1537">Okta</a> &#x26; <a href="https://roadie.io/backstage/plugins/jira/">Jira</a> plugins</strong>: we've refreshed our Okta and Jira plugins to make them more seamlessly work with the new Backstage backend. 
Enjoy.</li><li><strong><a href="https://roadie.io/docs/integrations/aws-resources/">AWS account creation</a></strong>: last but by no means least, on the topic of catalog data sources this time rather than plugins - our AWS provider has been modified to ingest EKS clusters and containers auto-magically. This is the first step in allowing a one-hit configuration of both AWS and the Kubernetes plugin (we're currently modifying the K8s plugin to ingest these resources and take the config from AWS).</li></ul>
]]></content:encoded></item><item><title><![CDATA[The Adoption Journey - Initiatives and Strategies]]></title><link>https://roadie.io/blog/the-adoption-journey-initiatives-and-strategies/</link><guid isPermaLink="false">https://roadie.io/blog/the-adoption-journey-initiatives-and-strategies/</guid><pubDate>Tue, 29 Oct 2024 12:00:00 GMT</pubDate><description><![CDATA[Part of the Backstage Adoption series, this post lays out two broad categories of proven initiatives and tactics that can be used to drive adoption of Backstage in your organisation.]]></description><content:encoded><![CDATA[<h1>Successful Strategies for Backstage Adoption in Organizations</h1><p>Driving adoption of Backstage—or any internal developer portal—in an organization usually hinges on organization-wide initiatives. These initiatives attempt to coordinate the adoption of specific parts of the product. <strong>But behavior-change initiatives can be tricky to navigate and succeed with.</strong></p><p>This post outlines a variety of proven strategies for fostering Backstage adoption. The strategies you select should depend on your goals for Backstage and the current structure and realities inside your organization.</p><p>The strategies highlighted here are <strong>based on experiences from Roadie customers</strong> who’ve already implemented them successfully. Determining whether they’re the right fit for <em>your</em> context is down to you.</p><h2>Defining Successful Backstage Adoption</h2><p>First, let’s consider what successful Backstage adoption looks like for your organization. How will you measure this success? 
Do you envision:</p><ul><li><strong>Daily usage</strong>: 80% of engineers logging into Backstage each day?</li><li><strong>Catalog completeness</strong>: 90% of your software fully cataloged?</li><li><strong>Dependence</strong>: 90% of teams reporting Backstage as essential for their workflow?</li></ul><p>Most likely you'll have a combination of goals: Some teams may use scaffolder templates only occasionally, but this could save months of engineering time. Some might use Backstage solely for API documentation, which provides immense daily value on its own. And you may want to <a href="https://roadie.io/product/tech-insights/">automate software maturity reporting with TechInsights</a> so senior management has real-time insights into metrics like software security.</p><p>Each goal has intermediate milestones. For example:</p><ol><li>Every team uses the scaffolder at least once per quarter.</li><li>50% of new software is templated by the scaffolder.</li><li>90% of new production software is templated by the scaffolder.</li></ol><p>And each of these steps requires a distinct strategy to move forward. Broadly, these can be split into:</p><ul><li><strong>Land and Expand</strong>: Start small, growing adoption progressively across teams.</li><li><strong>Expand and Land</strong>: Launch with widespread value, aiming for rapid, organization-wide onboarding.</li></ul><p>Below, we’ll explore tactics within each approach. You don’t have to pick just one—mixing strategies can often yield the best results.</p><h2>1. Land and Expand</h2><p>This approach usually involves piloting Backstage with a few teams or a specific area of the organization before expanding it. 
It’s especially useful for organizations with a more varied and independent engineering culture, or one lacking top-down support.</p><h3>Benefits of “Land and Expand”</h3><ul><li><strong>Learning and iteration</strong>: Working closely with a small group allows you to gather insights, refine processes, and tailor adoption strategies for a broader rollout.</li><li><strong>Inspire others</strong>: Early adopters create a model that other teams can learn from.</li><li><strong>Build organic momentum</strong>: Introducing Backstage “aha” moments, such as the scaffolder, can drive excitement and encourage others to adopt.</li></ul><h3>Proven Tactics for “Land and Expand”</h3><p>The tactics that work best depend on your organization’s circumstances. Here are some approaches that have worked well in real-world applications:</p><h4>1. Automate multi-step processes with Scaffolder templates</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2YM9YIgGmooXdfwZ9NRZiV/9e37fc29f9ede926517589171c3bb915/Screenshot_2024-09-25_at_15.48.45.png" alt="Scaffolder template in Backstage"></p><p>Start by streamlining time-consuming tasks. Even a handful of templates can deliver significant and measurable savings that are appreciated by both engineers and leadership.</p><p><strong>Example</strong>: If you’re setting up a team to build a customer communications pipeline, a few clicks in Backstage can provision AWS dev/prod accounts, a secure VPC, a sample server with CI jobs, and internal docs, all in under 20 minutes.</p><h4>2. 
Get content into the catalog early</h4><p>Starting with an initial set of entities can provide immediate value and help teams see the utility of Backstage as a source of information.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1bwmBGVdjhQMLWIrazs8WE/68d1eb502eb006364cf4b1887c839a92/Screenshot_2024-10-28_at_15.14.35.png" alt="Team maps in Backstage"></p><p><strong>Example</strong>: Import users and teams from an identity platform like Okta to build a visual map of team structures with minimal setup. Teams can instantly use Backstage to answer questions like “Who is the product manager for the Customer Success team?”</p><h4>3. Promote API documentation as a go-to resource</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3Bn9JtBFtFmT0xi9j9IIkH/228efaad591934ffd85bb7b293408681/Screenshot_2024-10-28_at_15.16.33.png" alt="API docs in Backstage"></p><p>Make Backstage the primary source for looking up API documentation by encouraging teams to add their APIs early. This can encourage wider catalog usage.</p><h4>4. Develop custom plugins for organization-wide needs</h4><p><a href="https://roadie.io/docs/custom-plugins/overview/" title="Custom plugins in Roadie">Create your own custom plugins</a> that address common pain points across the organization. This can create a compelling reason for users to engage with Backstage regularly.</p><p><strong>Example</strong>: <a href="https://www.lunar.app/">Lunar Bank</a> built a plugin for RabbitMQ dead letter queues, which drove high initial usage and catalog engagement.</p><h2>2. Expand and Land</h2><p>This approach aims for rapid onboarding of multiple teams by delivering value across the entire organization. 
It generally requires top-down support or a dedicated platform team to do the initial heavy lifting.</p><h3>Key Tactics for “Expand and Land”</h3><h4>Prerequisites - Tracking usage and integration</h4><p>Team-level adoption data helps you track progress in several key areas.</p><ul><li><strong>Catalog</strong>: <a href="https://roadie.io/blog/repositories-in-the-catalog/">Adding repositories to the catalog</a> can help measure catalog adoption and provide comparative progress data for teams. For example: <em>The Identity team has 30 repositories with CI pipelines and only 5 entities in the catalog.</em> Roadie comes pre-populated with the repositories in your catalog, a <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/#how-to-measure-catalog-completeness">graph of catalog completeness</a> and ready-made Scorecards to track it in <a href="https://roadie.io/product/tech-insights/">Tech Insights</a>.</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1snHg4GTEpQOLQfdFiTYYM/f69d180a393deea37e651ea60a34876e/tech_insights_catalog_scorecard__1_.png" alt="Catalog Completeness can be measured via a Tech Insights Scorecard"></p><ul><li><p><strong>Scaffolder</strong>: Data on Scaffolder usage is already available in the database as standard, though you’ll have to extract and analyse it.</p></li><li><p><strong>TechDocs and Plugins</strong>: Usage of various plugins can be assessed with tracking tools like Google Analytics, which can <a href="https://backstage.io/docs/plugins/analytics/">easily be integrated into Backstage early on</a>.</p></li></ul><p>Roadie provides usage analytics like catalog completeness out of the box to its customers, because understanding this part of the ROI equation is such a common requirement.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3ddCSNsNJsbCTgTz4FzHoF/d18ad4f2ac513e76fac5358daa5221f8/Screenshot_2024-10-29_at_14.24.33.png" alt="A Catalog Completeness graph comes out of the box in Roadie"></p><h4>1. 
Set clear mandates</h4><p>Organizational mandates can drive adoption effectively if they’re tied to broader goals, have backing from senior leadership, and are reviewed at a regular cadence or by a set date.</p><p><strong>Example:</strong> “Each team must have its production services cataloged with a PagerDuty annotation by September 21st to facilitate incident impact mapping.”</p><h4>2. Regular check-ins and visualizations</h4><p>Regular team check-ins or public dashboards showing migration progress can improve prioritization and motivate teams. <a href="https://youtu.be/P-JMwuuobgY?si=BNihgof1uNBUM2pw&#x26;t=1235">Twilio explained how they do this</a> at the Autodesk Developer Productivity Summit.</p><h4>3. Celebrate success</h4><p>Highlight the progress of individual teams to foster collaboration and enthusiasm. Regular shout-outs for API additions or new scaffolder templates can build positive momentum.</p><h4>4. Integrate Backstage into new hire onboarding</h4><p>Familiarize new hires with Backstage from day one. Add onboarding docs to TechDocs, making Backstage the go-to place for engineers starting out.</p><h4>5. Pre-populate the catalog</h4><p>Consider using scripts or providers like the <a href="https://github.com/RoadieHQ/roadie-backstage-plugins/tree/main/plugins/backend/catalog-backend-module-aws">AWS provider</a> to populate the catalog with essential data, reducing the manual workload.</p><h2>Conclusion</h2><p>There are many potential strategies for achieving your incremental Backstage adoption goals via organization-wide initiatives. Deciding what to try, and when, is not easy. The ideas above are not an exhaustive list, but hopefully they give you some fresh approaches for your own adoption journey.</p><hr><p><em>Image by Dirk (Beeki®) Schumacher from Pixabay</em></p>
]]></content:encoded></item><item><title><![CDATA[Improving and Measuring Developer Experience with Backstage]]></title><link>https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/improving-and-measuring-developer-experience-with-backstage/</guid><pubDate>Mon, 28 Oct 2024 09:00:00 GMT</pubDate><description><![CDATA[Tracking and measuring developer experience is crucial for driving efficiency and productivity as your organization grows. We explore two effective approaches to aggregating DX metrics, either through third-party tools or natively within Backstage using Tech Insights, helping you make informed decisions to optimize your development workflows.]]></description><content:encoded><![CDATA[<h2>What is Developer Experience?</h2><p>Developer Experience (DX) refers to the tools, systems, and culture that impact how developers work. It reflects how efficiently developers can do their best work without <a href="https://www.willett.io/posts/developer-friction/">frustration or impediment</a>, and as such, DX is <a href="https://newsletter.getdx.com/p/impact-of-developer-experience">strongly</a> <a href="https://www.offerzen.com/blog/4-strategies-driving-developer-excellence-at-offerzen">associated</a> <a href="https://queue.acm.org/detail.cfm?id=3639443">with</a> <a href="https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/tech-forward/why-your-it-organization-should-prioritize-developer-experience">high-performing</a> teams that deliver impactful, value-creating software. 
When building software, the highest cost is almost always engineering time, so it makes sense to optimize for developer efficiency and productivity.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4guyCmEw2BXbWCjbuOrt9S/6ccf6062bd23725c2465413379cad61d/image0.png" alt="Developer experience metrics"><em>The benefits of improved DevEx: https://newsletter.getdx.com/p/impact-of-developer-experience</em></p><p>As organizations grow, developer experience often suffers due to increased <a href="https://dev.to/tiuwill/friction-how-human-behavior-influences-code-development-46on">friction</a>, formal processes, and fragmented knowledge, <a href="https://www.thoughtworks.com/insights/articles/friction-developer-portals-solve">which hampers productivity and collaboration</a>. Informal communication channels that once worked well in small teams become ineffective, and developers struggle to find information or navigate complex systems - a sort of software development <a href="https://en.wikipedia.org/wiki/Dunbar%27s_number">Dunbar’s Number</a>. Maintaining a strong DX requires proactive strategies and consistent adaptation to evolving team sizes, ensuring that workflows remain efficient and developers stay engaged and motivated.</p><h2><strong>How to Improve Developer Experience with Backstage</strong></h2><p>How can large software organizations deliver a great DX? Improving DX typically involves addressing core challenges that hinder efficiency as organizations scale, many of which can be resolved by using an Internal Developer Portal (IDP) like Backstage:</p><h3><strong>Discoverability of Services and Information</strong></h3><p>As organizations grow, developers often struggle to find services, APIs, and documentation. 
A developer might spend hours searching for documentation or rebuilding a service simply because they can't locate what they need.</p><p><strong>How Backstage Solves This:</strong></p><p>The Backstage <a href="https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/">Service Catalog</a> centralizes services, APIs, documentation, and ownership, making it easy for developers to find what they need, see dependencies, and know who to contact - eliminating wasted time. <a href="https://roadie.io/docs/details/techdocs/">TechDocs</a>, a core feature of Backstage, makes internal documentation easily accessible and up-to-date by storing it alongside the code itself. This centralization reduces friction and boosts productivity by providing a single interface for all <a href="https://roadie.io/blog/adopting-backstage-documentation-and-support/">documentation needs</a>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6Gvl2Jr464bBsEomZvqzjs/cb7026d6f0ae7953941974b76b71deb1/screencapture-demo2-roadie-so-catalog-2024-10-15-13_14_52.png" alt="A typical Backstage software catalog"><em>A typical Backstage software catalog</em></p><h3><strong>Quality and Speed of Releases</strong></h3><p>In software delivery, pace and consistency are critical. Fragmented workflows and inconsistent CI/CD pipelines can lead to uncertainty and delays, slowing down releases and increasing errors.</p><p><strong>How Backstage Solves This:</strong></p><p>Backstage templates standardize releases by automating workflows and enforcing best practices. By integrating with tools like GitHub, Backstage ensures a consistent, transparent software delivery process, reducing friction and increasing developer confidence.</p><h3><strong>Self-Service Capabilities and Workflow Simplification</strong></h3><p>In larger organizations, developers often depend on other teams for tools and approval, which can cause delays and frustration. 
Self-service tooling helps developers maintain momentum without waiting for approvals.</p><p><strong>How Backstage Solves This:</strong></p><p>Backstage enables developers to self-provision resources, deploy services, and configure environments through integrations with cloud providers like AWS or Kubernetes - removing bottlenecks and keeping developers focused on coding.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/14JFhCw4VdOndqBtKYovrU/edb41ac84aa25d9d40248981484d7fd7/image.png" alt="A collection of software templates that automate common workflows - everything from creating new customer environments to opening a new PR to add software into a Backstage software catalog."></p><p><em>A collection of software templates that automate common workflows - everything from creating new customer environments to opening a new PR to add software into a Backstage software catalog.</em></p><h3><strong>Governance, Standards Adherence, and Complexity Management</strong></h3><p>As organizations grow, maintaining governance and ensuring adherence to standards becomes challenging. Over time, undocumented services and lack of enforced policies lead to vulnerabilities, with leadership often lacking visibility into best practice adherence.</p><p><strong>How Backstage Solves This:</strong></p><p>Backstage’s Tech Insights plugin enforces governance by setting up custom checks to monitor security, testing, and compliance, helping teams ensure their services meet internal standards and guidelines. 
Developers use Tech Insights to address gaps, while leadership tracks governance adherence, providing a unified view of service health and managing complexity as the organization scales.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4txrYZlAe4iW9Dj46vjwKZ/c35f5e02266c4b6c2a9ab3a208a01fee/image2.png" alt="Tech Insights scorecard showing adherence of the organization to Backstage component best practices over time"></p><p><em>Tech Insights scorecard showing adherence of the organization to Backstage component best practices over time</em></p><h3>How to Track and Measure DX</h3><p>Implementing an IDP like Backstage is an important first step to improving developer experience, and helps address some of the more obvious impediments to DX. The crucial next step is understanding how to track and measure DX. This matters more as your organization grows, because its complexities grow with it. Continuous feedback from tools that track DX is critical to ensure that you’re always ahead of potential bottlenecks or compliance gaps. Measuring DX can take many forms, but there are two main approaches to consider, depending on your organization's needs and existing tools.</p><h3>Approach 1: Aggregate DX Metrics in a Third-Party Tool</h3><p>One approach is to use specialized third-party tools to collect DX metrics and then surface those insights within Backstage. Tools like <a href="http://getdx.com/">getdx.com</a> and open-source <a href="https://github.com/DevoteamNL/opendora/tree/main/backstage-plugin/plugins/open-dora#readme">DORA plugins</a> are specifically designed to aggregate developer productivity data. 
They track metrics like cycle time, lead time for changes, and deployment frequency, and then present the results in a dashboard format.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1n8i09uZpZuNj1u5iErZ5v/df1063af484f7c0194d16e92ffbb4916/image3.png" alt="OpenDORA courtesy of https://nl.devoteam.com/expert-view/introducing-opendora-team-performance-observability-for-your-organization/"></p><p><em>OpenDORA courtesy of https://nl.devoteam.com/expert-view/introducing-opendora-team-performance-observability-for-your-organization/</em></p><p>For example, you could integrate a DX Plugin that aggregates productivity metrics from across your CI/CD systems and other development tools. These metrics could then be displayed in Backstage, where developers can easily see how well they’re performing against industry benchmarks. Metrics like DORA (Deployment Frequency, Lead Time for Changes, Mean Time To Recovery, and Change Failure Rate) are especially valuable for tracking operational efficiency and developer productivity.</p><p>This approach uses specialized tools for deeper analytics while keeping insights in Backstage, minimizing context-switching.</p><h3>Approach 2: Aggregate Metrics Directly in Backstage with Tech Insights</h3><p>Alternatively, you can collect and track DX metrics natively within Backstage using the Tech Insights plugin. Tech Insights enables you to define and track custom checks and metrics, giving you full control over what data you collect and how it's presented. This approach allows you to integrate metrics from your existing systems (like GitHub or monitoring tools) and aggregate them in Backstage, providing a centralized view of DX without relying on third-party dashboards.</p><p>Using Roadie’s Tech Insights plugin, you can create scorecards that track various metrics related to DX. 
For instance, you could monitor the percentage of services adhering to security policies, the number of projects using automated testing, or DORA metrics - all from within Backstage itself. Scorecards can also be viewed for each individual team - so with this data, those teams can quickly identify areas of improvement, measure changes, and identify focus areas to optimize DX.</p><p>This approach centralizes everything in Backstage, allowing for more tailored metrics and scorecards based on your organization’s needs while reducing context-switching for developers.</p><p><em><img src="//images.ctfassets.net/hcqpbvoqhwhm/6gCdkXPTEV2BwRYeUJDXls/af0d7cfabf28c9fe5fbadb64a1dbc92f/image4.png" alt="Tech Insights scorecards on a Team page in Backstage "></em></p><p><em>Tech Insights scorecards on a Team page in Backstage</em></p><p>The right DX tracking approach depends on your organization's goals, tools, and the complexity of your developer workflows. Whether you aggregate metrics in a third-party tool and display them in Backstage or centralize everything within Backstage using Tech Insights, the key is making DX measurable and visible to all stakeholders. A strong developer experience is about tracking, iterating, and improving workflows while providing transparency and actionable insights.</p><hr><p><em>Image by GrumpyBeere from Pixabay</em></p>
]]></content:encoded></item><item><title><![CDATA[Migrating to Backstage’s New Backend:  A Step-By-Step Guide]]></title><link>https://roadie.io/blog/migrating-to-backstages-new-backend-a-step-by-step-guide/</link><guid isPermaLink="false">https://roadie.io/blog/migrating-to-backstages-new-backend-a-step-by-step-guide/</guid><pubDate>Fri, 11 Oct 2024 11:00:00 GMT</pubDate><description><![CDATA[Thinking about upgrading to Backstage’s new backend system? Check out our comprehensive guide that walks you through a minimal-disruption migration. Learn how to prepare your environment, use the legacy plugin bridge for existing plugins, and smoothly transition to the modern architecture. We'll help you navigate this migration so you can unlock improved scalability, efficiency, and set the stage for future development.]]></description><content:encoded><![CDATA[<h2>Introduction</h2><p>Migrating to Backstage’s <a href="https://backstage.io/docs/backend-system/">new backend system</a> is more than just an upgrade - it’s a step toward a more scalable, efficient, and streamlined platform. The new backend introduces a modern architecture that simplifies plugin integration, reduces complexity, and improves overall performance. This is a significant shift from the monthly updates and incremental improvements you may be used to. By adopting this new architecture, you’ll gain better control over plugin dependencies, reduce overhead, and set the stage for easier future development.</p><p>This migration is particularly exciting because it marks the culmination of a long journey towards a more modular and efficient backend. Originally introduced as an experimental feature, the new backend has matured into a fully supported system that greatly enhances how plugins interact with core services. 
It’s a big leap forward from the previous architecture, and while the transition may require some effort, the long-term benefits—such as improved scalability, clearer separation of concerns, and better dependency management—make it a worthwhile investment for teams looking to future-proof their Backstage setup.</p><h2>Overview</h2><p>In this guide, we’ll focus on helping you migrate your existing Backstage backend with minimal disruption. We’ll walk through key steps, such as backing up your current system, creating a bridge for your existing plugins, and gradually integrating them into the new architecture. By the end of this article, you’ll have a clear understanding of how to approach the migration while maintaining your current functionality, and how to unlock the benefits of Backstage’s new backend.</p><p>We’ll focus on performing a minimal migration that gets your main backend (<code>index.ts</code> in <code>packages/backend</code>) up and running in the new system. At this stage, we won't migrate all plugins, as that can be a more time-consuming process. Instead, we’ll leverage a temporary bridge function to convert existing plugins into the new system using a transitional environment, allowing you to start benefiting from the new architecture right away.</p><h2>Preparation for Migration</h2><p>Before diving into the migration process, it's essential to ensure that your new system will function as expected post-migration. Proper preparation can significantly reduce the risk of issues arising during or after the transition.</p><h3>Establish a Testing Plan</h3><ol><li><strong>Manual Testing</strong>: Develop a comprehensive plan for manually testing your application with the new backend. This should include detailed scenarios that mimic real-world usage of your application. 
Identify critical paths and functionality that must work seamlessly in the new environment.</li><li><strong>Integration Tests</strong>: In addition to manual testing, it's crucial to have a robust set of integration tests. These tests should cover all functionalities, particularly those related to any custom modifications you've implemented in your existing system. Ensure that your tests validate the expected behavior of all integrated components.</li></ol><p>Pay special attention to any custom features or modifications you’ve made. These areas are often the most prone to issues during migration, so thorough testing is essential. And of course it goes without saying, but make sure all your existing tests don’t fail!</p><p>By laying the groundwork with a well-defined testing plan—comprising both manual and automated integration tests—you can increase your confidence in the migration process. This preparation will help ensure that your new backend system operates smoothly and consistently after the migration is complete.</p><h3>Migration Guide from Backstage</h3><p>Before proceeding with this migration guide, it's essential to familiarize yourself with the resources available from Backstage. The official <a href="https://backstage.io/docs/backend-system/building-backends/migrating">migration guide</a> on the Backstage website provides valuable insights and best practices for transitioning your backend system. While there may be some overlap with this guide, repetition can reinforce key concepts and ensure that you don't miss critical steps.</p><p>Take the time to read through the Backstage migration guide before continuing with this document. Doing so will provide you with a solid framework and context that will aid in your migration efforts, ultimately leading to a smoother transition.</p><h3>What’s the Plan for Rollout?</h3><p>To ensure a seamless transition to the new backend, it’s crucial to carefully plan and prepare your rollout strategy. 
Here are some key considerations to guide your approach:</p><ol><li><strong>Deployment Strategy</strong>:
<ul><li><strong>New Image Creation</strong>: Consider creating a new Docker image for the updated backend. This allows you to maintain clear versioning and ensures that your deployment environment matches your development and testing environments.</li><li><strong>Blue-Green Deployment</strong>: If feasible, implement a blue-green deployment strategy. This involves running two identical environments (blue and green) where one is live while the other is idle. You can switch traffic between them to minimize downtime and facilitate rollback if necessary.</li></ul></li><li><strong>Monitoring and Metrics</strong>:
<ul><li><strong>Set Up Monitoring Tools</strong>: Implement monitoring solutions to track performance metrics, error rates, and resource usage. Tools like Prometheus, Grafana, or your existing APM solutions can provide valuable insights during and after the rollout.</li><li><strong>Health Checks</strong>: Ensure that health checks are in place to quickly identify any issues with the new backend as it goes live.</li></ul></li></ol><h3>Migrating Your <code>index.ts</code> to the New Backstage Backend System</h3><p>In this section, I’ll walk you through the process of migrating your Backstage backend to the new backend system. We’ll focus on updating your <code>index.ts</code> file, which serves as the main entry point for your backend. This migration will allow you to start leveraging the streamlined architecture and dependency injection provided by the new system.</p><h3>Step 1: Backup Your Existing <code>index.ts</code></h3><p>Before starting any migration, it’s essential to create a backup of your current <code>index.ts</code> file. This way, you can reference it or roll back if needed. Let’s save this backup as <code>index.backup.ts</code>.</p><p>Additionally, to avoid issues with type checking while you’re in the middle of the migration, add <code>@ts-nocheck</code> at the top of your backup file.</p><pre><code class="language-tsx">// index.backup.ts
// @ts-nocheck
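</code></pre><p>If you'd rather script this step than do it by hand, something along these lines would work. It's a minimal sketch: the <code>backupIndex</code> helper is hypothetical, and the directory you pass in is an assumption about where your backend's <code>index.ts</code> lives.</p><pre><code class="language-tsx">import * as fs from 'fs';
import * as path from 'path';

// Hypothetical helper: copies index.ts to index.backup.ts in the same
// directory and prepends // @ts-nocheck so the backup is skipped by type checking.
export function backupIndex(dir: string): string {
  const source = path.join(dir, 'index.ts');
  const target = path.join(dir, 'index.backup.ts');
  const original = fs.readFileSync(source, 'utf8');
  fs.writeFileSync(target, '// @ts-nocheck\n' + original);
  return target;
}

// Example call; adjust the directory to match your repository layout.
// backupIndex('packages/backend/src');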
</code></pre><h3>Step 2: Set Up the New <code>index.ts</code></h3><p>Now, create a new <code>index.ts</code> file, which will be the entry point for your new backend system. For this initial step, you can keep it as minimal as possible. The goal is to have a working backend skeleton, which we will build upon later.</p><p>Here’s an example of a minimal setup:</p><pre><code class="language-tsx">import { createBackend } from '@backstage/backend-defaults';

const backend = createBackend();

backend.start();
</code></pre><p>At this point, your new backend doesn’t do much; it's just an empty shell. But it’s important to verify that the basic setup works before adding any of your legacy plugins.</p><h3>What’s Next?</h3><p>In the next part of the migration, we’ll create a temporary legacy environment for your existing plugins and gradually integrate them into the new backend system. This approach allows for a smooth transition, letting you keep the old plugins running while starting to take advantage of the new architecture.</p><h2>Creating a temporary plugin environment</h2><p>To ease the migration process, Backstage provides a handy bridge function, <code>makeLegacyPlugin</code>, which helps create a temporary environment compatible with the old backend system. This allows your legacy plugins to continue working while you transition to the new backend architecture. The function ensures all required dependencies are injected into the plugin, simulating the old system’s behavior.</p><p>Here's an example of using <code>makeLegacyPlugin</code>:</p><pre><code class="language-tsx">import {
  cacheToPluginCacheManager,
  loggerToWinstonLogger,
  makeLegacyPlugin,
} from '@backstage/backend-common';
import { coreServices } from '@backstage/backend-plugin-api';
import { eventsServiceRef } from '@backstage/plugin-events-node';

// Maps each field of the old plugin environment to a service in the new system.
const legacyPlugin = makeLegacyPlugin(
  {
    logger: coreServices.logger,
    cache: coreServices.cache,
    database: coreServices.database,
    config: coreServices.rootConfig,
    reader: coreServices.urlReader,
    discovery: coreServices.discovery,
    tokenManager: coreServices.tokenManager,
    permissions: coreServices.permissions,
    scheduler: coreServices.scheduler,
    events: eventsServiceRef,
    eventBroker: eventBrokerService, // not a core service; assumed to be a service ref defined in your own code
    auth: coreServices.auth,
    httpAuth: coreServices.httpAuth,
    userInfo: coreServices.userInfo,
    pluginName: coreServices.pluginMetadata,
    identity: coreServices.identity
  },
  {
    logger: log => loggerToWinstonLogger(log),
    cache: cache => cacheToPluginCacheManager(cache),
  },
);

</code></pre><p>In this setup, <code>makeLegacyPlugin</code> acts as a temporary replacement for the older <code>makeCreateEnv</code> function. As long as it provides the same dependencies that <code>makeCreateEnv</code> did, your legacy plugins should keep working unchanged.</p><p>You can also pass conversion functions as the second parameter to handle type discrepancies between the old and new service structures. In the example above, <code>loggerToWinstonLogger</code> and <code>cacheToPluginCacheManager</code> convert the new logger and cache services into the shapes that legacy plugins expect.</p><h3>Caveats and Considerations</h3><p>One important note is that all of these dependencies will be instantiated for every plugin that uses the <code>legacyPlugin</code> bridge. This can become problematic, particularly if any dependencies involve heavy operations (e.g., opening database connections). While this bridge is a useful short-term solution, it's highly recommended to fully migrate your backend plugins to the new system as soon as possible.</p><h3>Using the <code>legacyPlugin</code></h3><p>You can add your legacy plugins to the backend using <code>legacyPlugin</code> like this:</p><pre><code class="language-tsx">backend.add(legacyPlugin('todo', import('./plugins/todo')));
...
backend.start();
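</code></pre><p>To make the caveat about per-plugin instantiation concrete, here is a simplified, self-contained model of a bridge of this kind. It is not Backstage's implementation: <code>makeBridge</code>, <code>countingFactory</code> and the other names are hypothetical. The point is only to show that every bridged plugin gets its own set of service instances, with optional conversions applied on the way in.</p><pre><code class="language-tsx">// Simplified model only; none of these names exist in Backstage.
type ServiceFactory = (pluginId: string) => unknown;
type Transform = (service: any) => unknown;
type Dict = { [key: string]: unknown };

let instancesCreated = 0;

// A factory that counts how many service instances it has handed out.
const countingFactory = (name: string): ServiceFactory => pluginId => {
  instancesCreated += 1;
  return { name, pluginId };
};

// Returns a function that builds a legacy-style env for a single plugin.
function makeBridge(
  factories: { [key: string]: ServiceFactory },
  transforms: { [key: string]: Transform } = {},
) {
  return (pluginId: string, createPlugin: (env: Dict) => unknown) => {
    const env: Dict = {};
    for (const [key, factory] of Object.entries(factories)) {
      const service = factory(pluginId); // a fresh instance for this plugin
      const transform = transforms[key];
      env[key] = transform ? transform(service) : service;
    }
    return createPlugin(env);
  };
}

const bridge = makeBridge(
  { logger: countingFactory('logger'), database: countingFactory('database') },
  { logger: service => ({ legacy: true, wrapped: service }) },
);

bridge('todo', env => Object.keys(env));
bridge('catalog', env => Object.keys(env));

console.log(instancesCreated); // 4: two services for each of two plugins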
</code></pre><p>By focusing on migrating only your <code>index.ts</code> file initially, you minimize disruption while keeping the migration manageable. This strategy allows you to stay on top of the process while taking advantage of the new backend architecture in stages, ensuring a smooth transition.</p><h2>Adding new backend system plugins</h2><p>You can directly add your plugins that are already migrated to the new backend system into your <code>index.ts</code> file.</p><pre><code class="language-jsx">backend.add(legacyPlugin('todo', import('./plugins/todo')));
...
backend.add(import('@roadiehq/foo-bar')); // a backend plugin in the new system
...
backend.start();
</code></pre><h2>Overriding Core Services</h2><p>Backstage offers a robust architecture that includes a set of core services added to the application by default. When a plugin or service references a core service, it automatically receives the appropriate instance.</p><h3>Custom Implementations</h3><p>If you had custom implementations of certain services in your old system, you will need to override them in the new backend. This is crucial for maintaining functionality and ensuring that your custom logic remains intact.</p><h3>Recommended Approach</h3><p>To keep your Backstage directory structure organized and maintainable, I recommend creating a separate package for these service overrides. This approach promotes clarity and separation of concerns within your project.</p><ul><li><strong>Package Naming Convention</strong>: In the upstream OSS version of Backstage, this package is commonly referred to as <code>backend-defaults</code>. Adopting this convention in your project will help standardize your codebase and make it easier for others to understand your structure.</li></ul><h3>Implementation Steps</h3><ol><li><strong>Create the <code>backend-defaults</code> Package</strong>:
<ul><li>Set up a new package in your Backstage repository specifically for service overrides.</li></ul></li><li><strong>Implement Overrides</strong>:
<ul><li>Define the necessary overrides for the core services that require customization. Ensure that these overrides integrate seamlessly with the existing Backstage architecture.</li></ul></li><li><strong>Update Your Application</strong>:
<ul><li>Modify your application’s configuration to reference the new <code>backend-defaults</code> package, ensuring that your custom implementations are used where needed.</li></ul></li></ol><p>Backstage currently provides the following core services:</p><pre><code class="language-bash">// backstage/backstage/packages/backend-defaults/src/entrypoints
auth
cache 
database
discovery
httpAuth
httpRouter
lifecycle
logger
permissions
rootConfig
rootHealth
rootHttpRouter
rootLifecycle
rootLogger
scheduler
urlReader
userInfo
</code></pre><p>You can import the core services from the respective path like this:</p><pre><code class="language-jsx">import {
  RootHttpRouterConfigureContext,
  rootHttpRouterServiceFactory
} from '@backstage/backend-defaults/rootHttpRouter'
</code></pre><h3>Creating Your <code>backend-defaults</code> Package</h3><p>To effectively manage your service overrides in Backstage, you can create a <code>backend-defaults</code> package using the Backstage CLI. This package will serve as a centralized location for your custom service implementations.</p><h3>Step-by-Step Instructions</h3><ol><li><p><strong>Run the Backstage CLI Command</strong>:
Execute the following command to create a new package:</p><pre><code class="language-bash">npx backstage-cli new
</code></pre></li><li><p><strong>Select Package Type</strong>:
When prompted, choose the option for creating a Node library:</p><ul><li><strong>Node Library</strong>: This will allow you to export shared functionality for backend plugins and modules.</li></ul></li><li><p><strong>Set the Package ID</strong>:
Add the package ID as <code>backend-defaults</code> when prompted. This step ensures that your package is correctly identified within your Backstage setup.</p></li><li><p><strong>Package Creation</strong>:
Upon completion, this command will generate a new package in your <code>packages</code> directory named <code>backend-defaults</code>.</p></li><li><p><strong>Organizing Your Overrides</strong>:
To keep your package structured and maintainable, I recommend creating a folder within the <code>backend-defaults</code> package called <code>services</code>. This folder will house your overrides or any new services you implement.</p><pre><code class="language-bash">mkdir packages/backend-defaults/services
</code></pre></li></ol><p>The following example overrides the <code>coreServices.database</code> service. It is a potential fix for cases where the string passed to <code>createEnv()</code> did not match the plugin ID, so that in your old system the database name does not match the API path.</p><p>In the new backend system, your plugin’s ID always determines both the name of its database and the path it is attached to. This “fix” is needed because upstream Backstage currently doesn't provide a way to override the database name for these rare edge cases.</p><pre><code class="language-tsx">import {
  coreServices,
  createServiceFactory,
} from '@backstage/backend-plugin-api';
import { ConfigReader } from '@backstage/config';
import { DatabaseManager } from '@backstage/backend-defaults/database';

export const databaseServiceFactory = createServiceFactory({
  service: coreServices.database,
  deps: {
    config: coreServices.rootConfig,
    lifecycle: coreServices.lifecycle,
    pluginMetadata: coreServices.pluginMetadata,
  },
  async createRootContext({ config }) {
    return config.getOptional('backend.database')
      ? DatabaseManager.fromConfig(config)
      : DatabaseManager.fromConfig(
          new ConfigReader({
            backend: {
              database: { client: 'better-sqlite3', connection: ':memory:' },
            },
          }),
        );
  },
  async factory({ pluginMetadata, lifecycle }, databaseManager) {
    const pluginId = pluginMetadata.getId();
    let databaseName;
    switch (pluginId) {
      case 'foo-bar':
        databaseName = 'foo_bar';
        break;
      case 'tech-insights':
        databaseName = 'tech_insights';
        break;
      default:
        databaseName = pluginId;
    }
    return databaseManager.forPlugin(databaseName, {
      pluginMetadata,
      lifecycle,
    });
  },
});
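</code></pre><p>If the list of exceptions grows, the <code>switch</code> statement above can become unwieldy. One option is to keep the legacy names in a lookup table and resolve them with a small pure function that is easy to unit test. The sketch below illustrates that idea only: <code>legacyDatabaseNames</code> and <code>resolveDatabaseName</code> are hypothetical names, and the two entries simply mirror the cases shown above. Inside the factory you would then pass <code>resolveDatabaseName(pluginId)</code> to <code>databaseManager.forPlugin</code>.</p><pre><code class="language-tsx">// Hypothetical lookup table: plugin ID -> database name used by the old backend.
// Only plugins whose legacy database name differs from their ID need an entry.
const legacyDatabaseNames = new Map([
  ['foo-bar', 'foo_bar'],
  ['tech-insights', 'tech_insights'],
]);

// Returns the legacy database name when one is registered, otherwise
// falls back to the plugin ID (the default in the new backend system).
function resolveDatabaseName(pluginId: string): string {
  return legacyDatabaseNames.get(pluginId) ?? pluginId;
}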

</code></pre><h2>Configuring the httpRouter service</h2><p>The backend system comes with its default configured httpRouterService. This is good for basic use cases. It contains multiple middleware and is responsible for the default configuration of the express router.</p><p>If the default configuration is not enough, or you made customizations on your router in the old system you can provide your configurations like this:</p><pre><code class="language-tsx">// index.ts
import {
  RootHttpRouterConfigureContext,
  rootHttpRouterServiceFactory,
} from '@backstage/backend-defaults/rootHttpRouter';
...
backend.add(
  rootHttpRouterServiceFactory({
    configure: (context: RootHttpRouterConfigureContext) => {
      const { app, config, logger, routes, applyDefaults } = context;
      // register your custom middlewares

      applyDefaults(); // apply the default middlewares from the Backstage core service
    },
  }),
);
</code></pre><p>Backstage comes with a default request logger that cannot be turned off or overridden. If you want to replace it (or any of the default middleware) with your own, you have only one choice: don’t apply the default middleware. Instead, register each individual middleware you need alongside your custom implementations.</p><pre><code class="language-tsx">// index.ts
...
backend.add(
  rootHttpRouterServiceFactory({
    configure: (context: RootHttpRouterConfigureContext) => {
      const { app, config, logger, routes, applyDefaults } = context;

      const backstageMiddlewares = BackstageMiddlewareFactory.create({
        config,
        logger,
      });

      // we leave out the backstageMiddlewares.logging() and use our own logger.
      app.use(myCustomRequestLoggingHandler());

      // add the rest of the default middlewares
      app.use(backstageMiddlewares.helmet());
      app.use(backstageMiddlewares.cors());
      app.use(backstageMiddlewares.compression());
      app.use(routes);
      app.use(backstageMiddlewares.notFound());
      app.use(backstageMiddlewares.error());
    },
  }),
);
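</code></pre><p>The example above calls <code>myCustomRequestLoggingHandler()</code> without showing it. A minimal sketch of what such a handler could look like follows. It is written against small local types rather than the real Express ones so that it stays self-contained. <code>MinimalRequest</code>, <code>MinimalResponse</code> and the <code>log</code> parameter are assumptions made for illustration and are not part of Backstage.</p><pre><code class="language-tsx">// Minimal local stand-ins for the parts of the Express request/response used here.
type MinimalRequest = { method: string; url: string };
type MinimalResponse = {
  statusCode: number;
  on(event: 'finish', listener: () => void): unknown;
};
type NextFunction = () => void;

// Returns an Express-style middleware that logs one line per completed request.
// `log` defaults to console.log; pass your own logger's method instead.
function myCustomRequestLoggingHandler(log: (line: string) => void = console.log) {
  return (req: MinimalRequest, res: MinimalResponse, next: NextFunction) => {
    const startedAt = Date.now();
    // 'finish' fires once the response has been handed off for sending.
    res.on('finish', () => {
      const durationMs = Date.now() - startedAt;
      log(`${req.method} ${req.url} ${res.statusCode} ${durationMs}ms`);
    });
    // Always pass control on so that the remaining middleware still runs.
    next();
  };
}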
</code></pre><h2>Healthcheck</h2><p>If you want to override the default healthcheck, you can do so by attaching a new endpoint for it.</p><p>You can create a backend plugin for your healthcheck and then register it in your <code>index.ts</code> file.</p><p>Use the <code>backstage-cli</code> to create a new plugin.</p><pre><code class="language-bash"># run and select: backend-plugin - A new backend plugin
npx backstage-cli new
</code></pre><p>This will create a new backend plugin inside your project under the <code>plugins</code> folder, named after the ID that you provided in the prompt.</p><pre><code class="language-tsx">// plugins/healthcheck-backend
import {
  coreServices,
  createBackendPlugin,
} from '@backstage/backend-plugin-api';

export const healthCheck = createBackendPlugin({
  pluginId: 'healthcheck',

  register(env) {
    env.registerInit({
      deps: {
        rootHttpRouter: coreServices.rootHttpRouter,
      },
      // only the services declared in deps are passed to init
      init: async ({ rootHttpRouter }) => {
        rootHttpRouter.use('/healthcheck', (_, res) => {
          res.json({ status: 'ok' });
        });
      },
    });
  },
});

// packages/backend/src/index.ts
backend.add(import('healthcheck-backend'));
</code></pre><h2>Using the lifecycle hooks</h2><p>The new backend system provides core services for hooking into the different lifecycle stages of the process: the plugin-scoped <code>lifecycle</code> service and the root-scoped <code>rootLifecycle</code> service.</p><p>In the old system you might have hooked into the service start promise. In the example below we call the function <code>runOnStartup</code> once the server has started.</p><pre><code class="language-tsx">  // packages/backend/src/index.backup.ts

  const service = createServiceBuilder(module)
  service
    .start().then((_server: Server) => {
      logger.info(`Startup finished`, {
        uptime: process.uptime(),
      });

      runOnStartup({
        config
      });
    })
    .catch(err => {
      logger.error(
        'The http server threw an unexpected error',
        isNativeError(err) ? err.message : `unknown error raised: ${err}`,
      );
      process.exit(1);
    });
</code></pre><p>The same functionality can be achieved with the lifecycle hooks in the new backend system.</p><p>Create a module for your functionality. The example demonstrates how to add a lifecycle hook to the tech-insights plugin:</p><pre><code class="language-tsx">export const techInsightsModuleCalculateNew = createBackendModule({
  pluginId: 'tech-insights',
  moduleId: 'calculate-new',
  register(reg) {
    reg.registerInit({
      deps: {
        config: coreServices.rootConfig,
        discovery: coreServices.discovery,
        featureFlagStore: featureFlagStoreServiceRef,
        lifecycle: coreServices.rootLifecycle,
      },
      async init({ config, discovery, featureFlagStore, lifecycle }) {
        const onStartup = () => {
          ...
        };
        lifecycle.addStartupHook(onStartup); // Add the function to be run at startup
      },
    });
  },
});
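</code></pre><p>The body of <code>onStartup</code> is elided above. As a rough illustration of one way to structure it, the sketch below wraps a startup task so that a failure is logged instead of being thrown from the hook. <code>makeStartupHook</code> and the <code>MinimalLogger</code> type are hypothetical helpers introduced here and are not Backstage APIs.</p><pre><code class="language-tsx">// Hypothetical minimal logger shape used by this sketch.
type MinimalLogger = {
  info(message: string): void;
  error(message: string): void;
};

// Wraps a startup task so that failures are logged rather than thrown.
function makeStartupHook(task: () => unknown, logger: MinimalLogger) {
  return async () => {
    try {
      // Awaiting works for both synchronous and promise-returning tasks.
      await task();
      logger.info('Startup task finished');
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      logger.error(`Startup task failed: ${reason}`);
    }
  };
}

// Possible use inside init(), assuming a logger dependency has been added to deps:
//   lifecycle.addStartupHook(makeStartupHook(() => runOnStartup({ config }), logger));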
</code></pre><h1>Testing the New Backend</h1><p>Testing your migrated backend is a critical component of any upgrade process. Ensuring that your application functions as expected after migration can prevent a host of issues down the line.</p><h3>Setting Up a Development/Test Environment</h3><p>I strongly recommend establishing a dedicated dev/test environment. This allows you to deploy your new version and conduct tests with minimal traffic interference. Since we are re-architecting the entire Express application and modifying how plugins are integrated, validating that your plugins are accessible is paramount.</p><h3>Implementing Health Check Tests</h3><p>A good starting point for testing is to implement basic health check tests using your existing end-to-end (e2e) or integration testing framework. These tests should verify that each plugin is mounted at its expected path. Here’s an example of how you can do this with Playwright:</p><pre><code class="language-jsx">import { test, expect } from '@playwright/test';

test.describe('/healthcheck', () => {
  test.describe('GET', () => {
    test('The healthcheck endpoint is configured', async ({ request }) => {
      const result = await request.get(`/healthcheck`);
      expect(result.status()).toBe(200);
    });
  });
});
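</code></pre><p>To extend this check from one endpoint to every plugin, you could probe one known route per plugin and flag the ones that come back as not found. The sketch below shows only the classification step and assumes that a 404 means nothing is mounted at the path. <code>ProbeResult</code> and <code>findUnmountedPaths</code> are hypothetical names, and the paths in the usage comment are placeholders.</p><pre><code class="language-tsx">// One probe result: the path that was requested and the HTTP status it returned.
type ProbeResult = { path: string; status: number };

// A 404 suggests that nothing is mounted at the path. Any other status
// (200, 401, 403, ...) means that some router answered the request.
function findUnmountedPaths(results: ProbeResult[]): string[] {
  return results
    .filter(result => result.status === 404)
    .map(result => result.path);
}

// Possible use inside a Playwright test. The paths are placeholders:
// choose one route that you know exists for each of your plugins.
//
//   const paths = ['/api/todo/items', '/api/catalog/entities'];
//   const results: ProbeResult[] = [];
//   for (const path of paths) {
//     const response = await request.get(path);
//     results.push({ path, status: response.status() });
//   }
//   expect(findUnmountedPaths(results)).toEqual([]);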

</code></pre><h3>Reviewing End-to-End Test Coverage</h3><p>Before migrating, take a moment to review your end-to-end test coverage. Understanding your current testing landscape will help identify gaps, especially given the extensive changes involved in this migration. The more coverage you have over critical services, the better equipped you will be to catch issues early.</p><p>Pay special attention to:</p><ul><li><strong>Custom Implementations</strong>: Review any middleware, metrics, or logging mechanisms that may impact the migration.</li><li><strong>Company-Specific Plugins</strong>: Ensure that any frontend/backend plugins unique to your organization are covered.</li><li><strong>Startup Hooks</strong>: Verify that these are functioning correctly post-migration.</li><li><strong>Legacy Functions</strong>: Check the usage of custom legacy functions to avoid conflicts with plugin IDs and database names.</li></ul><h3>Manual Testing</h3><p>Finally, don't underestimate the importance of manual testing. Take the time to navigate through your application before completing the merge. The migration touches a broad surface area, making it challenging to cover every scenario with automated tests. If you've managed to create comprehensive automated tests—kudos to you! But a thorough manual check can help catch any edge cases that might otherwise be missed.</p><h3>Conclusion</h3><p>By following these testing practices—establishing a robust test environment, implementing health checks, reviewing your test coverage, and conducting manual tests—you can significantly reduce the risks associated with your backend migration. 
This proactive approach will help ensure a smooth transition and a reliable application post-upgrade.</p><h1>Quirks Encountered in the New Backend System</h1><p>As we transitioned to the new backend system, we encountered several quirks that are important to highlight, particularly regarding database naming conventions and plugin path correlations.</p><h2>Database Name and Plugin Path Correlations</h2><p>In the updated architecture, database creation is directly tied to the plugin ID. While this design should streamline the process, it can lead to issues if there has been a lack of consistency in the previous backend, specifically within the <code>createEnv</code> function.</p><pre><code class="language-jsx">const createEnv = makeCreateEnv(config);
const todoEnv = useHotMemoize(module, () => createEnv('todo'));

</code></pre><p>In the legacy system, the database name was simply derived from the parameter passed to the <code>createEnv</code> function. The new backend generates the database using the plugin ID, or the string you pass to the <code>legacyPlugin</code> bridge function, and it uses this same string as the API path. This is an issue if the parameter you passed to <code>createEnv</code> and the path you attached your plugin to were not the same.</p><p>In the new system, the plugin ID takes precedence: it dictates both the database name and the API path.</p><h3>Key Considerations</h3><ul><li><strong>Consistency is Crucial</strong>: Ensure that you consistently use kebab-case for plugin identifiers across both the legacy and new systems to avoid unintended database creations.</li><li><strong>No Official Override</strong>: Currently, there is no configuration option to override this behavior, making it essential to align naming conventions proactively.</li></ul><h2>Catching Errors in Plugin Configuration</h2><p>It's crucial to ensure that any misconfigurations or missing settings are promptly identified. One common issue is that if a plugin configuration is not available, the application may still start up without any visible indication of the problem. This can lead to missed error messages and difficult-to-diagnose issues later on.</p><h3>Effective Monitoring Strategies</h3><p>To improve your visibility into potential errors during startup, I recommend running your backend service with output filtering that focuses on error and warning messages. Here’s how you can do this:</p><ol><li><p><strong>Start Your Application</strong>: Launch your backend application as you normally would.</p></li><li><p><strong>Use Grep for Real-Time Monitoring</strong>: Pipe the output of your application to <code>grep</code> to filter for key terms like "error" and "warn". 
This can be done in a Unix-like terminal using the following command:</p><pre><code class="language-bash">yarn start-backend | grep -E "error|warn"

</code></pre><p>This command shows, in real time, the log messages that could indicate issues with your plugin configuration.</p></li></ol><p>By integrating these practices into your development workflow, you'll significantly enhance your ability to catch and respond to configuration errors promptly.</p><h2>Conclusion</h2><p>Understanding these quirks is vital for a smooth transition to the new backend system. By ensuring consistent naming conventions, you can mitigate potential issues related to database and plugin management. Catching errors and warnings early can save you a lot of time in the upgrade process. Stay vigilant to avoid complications that could arise from discrepancies in your configurations.</p><hr><p><em>Image by GenerativeStockAI from Pixabay</em></p>
]]></content:encoded></item><item><title><![CDATA[The Power of Customization: Making Backstage Work for You with Roadie]]></title><link>https://roadie.io/blog/the-power-of-customization-making-backstage-work-for-you-with-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/the-power-of-customization-making-backstage-work-for-you-with-roadie/</guid><pubDate>Wed, 02 Oct 2024 04:00:00 GMT</pubDate><description><![CDATA[Every engineering organization is unique, and your Internal Developer Portal should reflect that. Roadie builds on Backstage’s flexibility, and makes it much easier for teams to customize their developer portal to match their tools, culture, and workflows. From theming and UI tweaks, to custom data models, entity providers, and plugins, learn how Roadie combines ease-of-use with extensive customization.]]></description><content:encoded><![CDATA[<p>Backstage is the most flexible way to build an Internal Developer Portal that exists. The downside of this flexibility is that it can take a lot of effort to use. Roadie’s mission is to take the hassle out of Backstage, but we want to deliver on this goal without preventing our users from customizing Backstage to the fullest extent possible.</p><p>This article explains many of the customization options available in Roadie today. From look-and-feel customization like theming, to deep data model flexibility, custom scaffolder actions, and completely custom plugins, Roadie has the flexibility you need.</p><h2>Backstage customizability</h2><p>Backstage is not a Developer Portal. It’s actually a framework for building developer portals. You can think of it as being more like a Software Development Kit for building your own developer portal. Each company is expected to mix &#x26; match and extend the libraries that the Backstage community make available, so that they end up with a completely customized and unique IDP.</p><p>This fact is evident from day one of using Backstage. 
To get started, you don’t download and run a Docker container like you might expect. Instead, you run a command line tool to scaffold your own Backstage instance. Once that’s done, you write your own code to customize it.</p><pre><code class="language-bash">-> npx @backstage/create-app@latest
? Enter a name for the app [required]: acme-corp-idp

Creating the app....
</code></pre><p>Another example of this flexibility is the plugin architecture. The Backstage community has created hundreds of open-source plugins that can be installed into a Backstage-based IDP in order to provide visibility into the many tools that might be used. Backstage is designed in a way that makes these plugins easy to install and configure.</p><p>From authentication hook points to source code management tool integrations to self-service scaffolder templates and UI components, each and every part of Backstage can be ripped out, replaced, and customized to your needs and desires. This is part of the philosophy of Backstage.</p><h2>Roadie customizability</h2><p>Despite being built on Backstage, Roadie is a complete, out-of-the-box developer portal. We take away all of the work of building and maintaining your IDP, so that you can focus on getting value from the tool.</p><p>We believe that most organizations shouldn’t need to build a team of 4 or 5 engineers around Backstage. You should want most of your engineering efforts focussed on initiatives that deliver direct customer value, rather than building internal tools from scratch.</p><p>This is one of the reasons Roadie is delivered as Software as a Service. We want you to show up on day one and just start using it.</p><p>At the same time, we also want to retain the philosophy of Backstage. You need to be able to customize Roadie just as much as you would customize Backstage. You should be able to connect your own authentication providers, integrate your homegrown tools, and mix and match the plugins that make sense to you. 
For this reason, we strive to make Roadie as customizable and extensible as we possibly can.</p><p>We believe that by combining Roadie’s ease of use with the flexibility and scope of the Backstage ecosystem, Roadie offers the best of both worlds, and is the best IDP on the market.</p><p>This post explores many of the major ways you can customize Roadie to meet your needs. Here are your options.</p><h2>Theming and UI</h2><p>Your IDP should look and feel familiar to your users, and it should focus their workflows in order to improve efficiency.</p><h3>Logos and branding</h3><p>Roadie lets you replace the logos and branding across the site so that you can expose an IDP that matches your brand and themes.</p><p>This image shows the dark-mode version of a theme for a fictional company called BeautyBox.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/670uquvHZffr8aHtvMxlam/1d71afd3c790d186c97a635a5f497828/customized-theme.png" alt="customized-theme"></p><h3>Sidebar</h3><p>Roadie’s sidebar configuration lets you cut the navigation down to focus it on the most important use cases in your company. If you don’t need self-service automation capabilities in your IDP then you can easily remove them to streamline things.</p><p>We also support reordering of sidebar sections, custom plugins in the sidebar, and deep links to important documentation.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/FWWZqbbVXeKfK8UNmIvUA/3afa5a4c7fa91f68d44e58a99e18e9ed/sidebar.png" alt="sidebar"></p><h2>Catalog customization</h2><p>The Roadie catalog offers control over how the catalog is collected in the first place, and how it is represented to end users who need to consume information in the catalog.</p><h3>Data model</h3><p>Many engineering organizations have custom terminologies they use to describe their particular software development life cycle. If your company uses Domain Driven Design, you might want top level concepts like “Value Streams” in your catalog. 
Other companies might want to support groups of people called “Tribes”. Probably every company will want a list of “Products”.</p><p>Roadie supports a flexible and customizable data model that lets you rename existing Kinds and create your own types of entities.</p><p>The screenshot below shows:</p><ol><li>Backstage’s <code>Group</code> kind renamed to “Teams”,</li><li>A custom entity type called “Products” has been created,</li><li>Unused kinds like Locations and Domains have been hidden from users.</li></ol><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5w6J9Mvur1n2DaemlttJPx/713ad2db34650cc2ec8929c21ce5d8b1/custom-data-model.png" alt="custom-data-model"></p><h3>Table columns</h3><p>Different IDP users have different jobs, and that means they need different views. Roadie’s tables can be customized on a per user basis, so that everyone can streamline the UI to match the workflows they need.</p><p>Users can turn columns on and off, resize columns and change the density of catalog tables. The best part is that these settings are persisted so you can easily dive back into your workflow.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6rs6SDhqgOCatiZf6wB2v6/9d013d5ef10d7db637fbc6c7473c0bad/custom-columns.png" alt="custom-columns"></p><p>The launch announcement covered <a href="https://roadie.io/blog/august-2023-product-updates/#new-catalog-page-preview">the benefits of Roadie’s catalog UI</a> in more depth.</p><h3>Entity providers</h3><p><a href="https://backstage.io/docs/features/software-catalog/external-integrations/#custom-entity-providers">Custom entity providers</a> are a way to provide entities into the catalog from external systems and existing data sources. They’re a critical integration point that helps ensure that your catalog will automatically stay up to date with external systems.</p><p>Roadie makes it easy to use custom entity providers via the <a href="https://github.com/RoadieHQ/roadie-agent">Roadie Agent</a>. 
The agent is a library that you can dump entities into in order to have them appear in the catalog.</p><p>Here’s a simple example where we use the Roadie Agent to dump an array of hardcoded entity metadata.</p><pre><code class="language-tsx">const { RoadieAgent, createRoadieAgentEntityProvider } = require('@roadiehq/roadie-agent');

const fakePayload = {
  type: 'full',
  entities: [
    {
      entity: {
        // Standard Backstage entity metadata omitted for brevity
      },
    },
  ],
};

const myEntityProviderHandler = async (emit) => {
  await emit(fakePayload);
}

RoadieAgent.fromConfig()
  .addEntityProvider(
    createRoadieAgentEntityProvider({
      name: 'testprovider',
      handler: myEntityProviderHandler
    }),
  )
  .start();
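</code></pre><p>For reference, a payload with the entity filled in could look something like the sketch below. The values (<code>sample-service</code>, the owner, and so on) are made up for illustration, while the field names follow the standard Backstage entity format.</p><pre><code class="language-tsx">// Illustrative payload with one hardcoded Component entity.
// All values are placeholders; replace them with data from your own source.
const examplePayload = {
  type: 'full',
  entities: [
    {
      entity: {
        apiVersion: 'backstage.io/v1alpha1',
        kind: 'Component',
        metadata: {
          name: 'sample-service',
          description: 'An example service emitted by a custom entity provider',
        },
        spec: {
          type: 'service',
          lifecycle: 'production',
          owner: 'group:default/engineering',
        },
      },
    },
  ],
};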
</code></pre><h3>Auto-discovery settings</h3><p>Any software metadata in YAML files should be auto-discovered by Roadie so that engineers don’t need to remember to register it via the UI.</p><p>Roadie supports <a href="https://roadie.io/docs/integrations/github-discovery/">custom auto-discovery settings</a>, including glob matching.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7bn1PECDmPvL8k3LG9oUQ6/18b5cf96c291e4ded2d5b8d12b6e0dff/custom-audodiscovery.png" alt="custom-audodiscovery"></p><h3>Group and User ingestion</h3><p>Roadie needs an organization chart in order to ensure that ownership of software can correctly be assigned to teams (aka. Groups), and to ensure that <a href="https://roadie.io/blog/rollups-tech-insights/">scorecard data is correctly rolled up</a> through the org chart.</p><p>The best way to make sure the org chart is up-to-date and correct is to automatically sync it from an HR tool. To facilitate this, Roadie supports built-in integrations for <a href="https://roadie.io/docs/integrations/ms-graph-org-provider/#step-3-configure-your-microsoft-graph-org-ingestion-provider">Microsoft Graph</a> (also known as Microsoft Entra ID, or Azure Active Directory) and <a href="https://roadie.io/docs/integrations/okta/">Okta</a>. Other solutions can easily push Users and Groups into the catalog <a href="https://roadie.io/docs/details/entity-push-api/">via our API</a>.</p><h2>Plugins</h2><p>The real power of the Backstage ecosystem comes from its plugins. Roadie supports more than <a href="https://roadie.io/docs/integrations/">70 Backstage plugins</a> out of the box at the time of writing (September 2024). It typically takes us about a day to integrate a new open source plugin once it’s created by the community.</p><h3>Plugin configuration</h3><p>Most plugins have configuration options that Roadie admins need to be able to configure to suit their particular setup. 
This is easily done via admin panels we have built into the product.</p><p>Take the <a href="https://roadie.io/backstage/plugins/argo-cd/">Argo CD plugin</a> for example. Roadie supports two completely separate ways of integrating with Argo CD:</p><ol><li>The app locator method allows you to dynamically search and identify Argo CD registered applications from multiple Argo CD instances.</li><li>The proxy method is used to construct links to individual Argo CD applications.</li></ol><p>You can choose the best one for your needs, and even configure minute details like Namespace and Resource allowlists and blocklists.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6yC7BTsQPJxsXOC6nNOokP/b251bb4b30d5f9d63d8d8ce54c4432d7/argo-cd-plugin-config.png" alt="argo-cd-plugin-config"></p><p>The APIs of your Argo CD instances are probably not exposed on the public internet, but it’s still easy for Roadie to access them securely. We provide an <a href="https://roadie.io/docs/integrations/broker/">open-source broker</a> that runs in your infrastructure and makes an outbound connection to Roadie.</p><p>For other plugins, like the Kubernetes plugin and EKS, we support AWS Role Sharing as the preferred configuration method. Whatever your connectivity needs, we probably have a solution for it.</p><p>Of course Roadie’s responsibilities don’t begin and end with plugin configuration. 
We’re also:</p><ol><li>Vetting plugins for quality issues as we integrate them,</li><li>Scanning plugins for licensing issues (which your legal team will be happy about),</li><li>Documenting plugins so people know how to use them,</li><li>Updating plugins as their authors release new features.</li></ol><h3>Frontend plugins</h3><p>If there isn’t a community created plugin that suits your needs, or you need a plugin for a homegrown tool that only exists in your company, Roadie can support that too.</p><p>Users can write native Backstage plugins that run on Roadie and securely connect back to your private internal APIs via the broker.</p><p>The plugin development workflow supports the ability to locally host custom plugins and run them right on Roadie during development. This basically means that you can make a change to your plugin code in your IDE, and simply refresh Roadie to see your changes live.</p><p>We also support running production and development versions of the same custom plugin in Roadie at the same time, so you can iterate on your plugins without disrupting your users.</p><p>Productionizing a plugin is as simple as running the Roadie CLI against it to package it up and push it to Roadie.</p><pre><code class="language-bash">roadie plugin:build --location /my-custom-plugin-folder/ --host https://static-assets.roadie.so/&#x3C;my-tenant>/myCustomPlugin
</code></pre><p>The full range of Backstage frontend APIs are available to your custom plugins. You can query the catalog, check Role Based Access Control permissions, and even push analytics to see how people are using your plugin.</p><h3>Proxies</h3><p>Plugins frequently make use of proxies to upgrade requests with token authentication before forwarding them on to a backend service to retrieve data. Roadie users can create their own proxies inside the application. These proxies can be used to connect to third-party SaaS APIs like PagerDuty, or private internal APIs.</p><p>Here’s a user created proxy for SonarCloud.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/uNqwnrwzkXX4o1K7OoJLX/1b835014c9d6a63a6d20d6c16a649215/sonarcloud-proxy-config.png" alt="sonarcloud-proxy-config"></p><h2>Entity Pages</h2><p>Entity pages are the pages in the catalog which pull together information on a particular piece of software, a team, or an infrastructure resource.</p><h3>UI layouts</h3><p>Adding plugins to Roadie interfaces is a simple matter of picking them off a list. This process is the same for both community created plugins (including plugins created by Roadie, plugins created by Spotify, and official plugins from companies like PagerDuty and Snyk) and for custom plugins you create yourself.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5PVqyqFluRAkjxiEN4BEJz/b4c4f2d272318fb363f7184430e95647/add-card.png" alt="add-card"></p><p>Once you’ve chosen your plugin, simply drag-and-drop it on the page, and resize it to suit your needs.</p><p>Different types of entities need different layouts. The <a href="https://roadie.io/backstage/plugins/lighthouse/">Backstage Lighthouse plugin</a> makes sense for a website, but not for a Kubernetes cluster. Roadie organizes the plugins in this way so that it’s easy to create custom views without too much setup. 
Our built-in role-based access control ensures that admins have the power to set up layouts, without overloading regular users with too many options. Team pages are customizable in the same way; they just typically have different cards and widgets.</p><h3>Props on plugins</h3><p>Sometimes individual cards will have their own editable properties. Roadie supports that too. For example, this card displays a list of recent GitHub Actions CI runs and their status. But how many recent CI runs should be displayed, and what layout should be used? Well, you can set that with the editable props.</p><p>This screenshot shows the edit mode of the Dynatrace Backstage plugin.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7tkVE8VIsIytv7dTGiMRd1/9e55dc3736d6628645a7e3ef62665f3f/dynatrace-plugin-props.png" alt="dynatrace-plugin-props"></p><h3>Metadata cards</h3><p>We’re aware that many platform teams wish to display custom information in Roadie, but don’t necessarily have the TypeScript skills on hand to create custom plugins from scratch. To make this easier, Roadie includes no-code cards that can display information from custom metadata on the Entity.</p><p>Imagine we have a Backstage entity with some custom properties in its metadata.</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: sample-service
spec:
  type: service
  owner: group:roadiehq/engineering
  lifecycle: production
  custom:
    engineeringManager: user:dtuite
    productManager: user:samnixon87
</code></pre><p>How would we display this in the catalog? On Roadie, it’s as simple as adding the <code>EntityMetadataCard</code> and configuring it in a couple of clicks. The card even auto-links the Engineering Manager and Product Manager to their User pages in the catalog.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7Cl1juhJ6gq3Andvtd43ZG/80d2c55d6afe3d9c76858ce06fc5b0b1/custom-metadata-card.png" alt="custom-metadata-card"></p><h3>Home page</h3><p>Of course it’s not just pages in the software catalog that can be customized. Roadie users can also customize the <a href="https://roadie.io/docs/integrations/home-page/">Home page plugin</a> to display key information that’s relevant to them. This is the place to display your open pull requests, community news, your calendar, and other useful info that’s tailored to the logged in user’s experience.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/w3ZLsGikg07UecgfSjWxY/a5d7627225e1b54b7f0f30d5912689ac/Screenshot_2024-06-20_at_15.49.28.png" alt="Roadie-instance-homepage-example"></p><h2>Scaffolder</h2><p>Roadie scaffolder supports the full range of customization that you would expect from the Backstage scaffolder.</p><h3>Actions</h3><p><a href="https://roadie.io/docs/scaffolder/self-hosted-scaffolder-actions/#writing-a-template-using-custom-actions">Custom scaffolder actions</a> are a way to execute homegrown CLI’s or arbitrary code as part of scaffolder templates. These are executed within your own infrastructure, which provides an added benefit in that it makes it easy to send internal network requests.</p><p>Users can use the <a href="https://github.com/RoadieHQ/roadie-agent?tab=readme-ov-file#custom-scaffolder-action">Roadie Agent</a> (the same library used in the Custom Entity Providers section above) to register custom scaffolder actions with Roadie.</p><pre><code class="language-tsx">RoadieAgent.fromConfig(config)
  .addScaffolderAction(
    createRoadieAgentScaffolderAction({
      name: 'custom-action', // The name of the action as defined in Roadie
      handler: async (ctx) => {
        try {
          // Write a new file into the shared workspace
          fs.writeFileSync(
            `${ctx.workspacePath}/test.txt`,
            'new file with new contents',
          );
        } catch (err) {
          console.error(err);  // Local logging on the Roadie Agent process
        }

        let count = 0;
        while (count &#x3C; 5) {  // Any additional work goes here; this example loops for roughly 5 seconds
          await new Promise((resolve) => setTimeout(resolve, 1000));
          count++;
          await ctx.log(`hello world`); // Sending a log message to be displayed to the end user
        } 
      },
    }),
  )
  // Add a second custom scaffolder action
  // .addScaffolderAction(...) 
  .start();
</code></pre><p>These can then be used in your templates like this:</p><pre><code class="language-yaml">steps:

  # This step executes on Roadie
  - id: fetchTemplate
    name: Fetch file
    action: fetch:template
    input:
      url: ./skeleton

  # This sends the workspace context to a self-hosted scaffolder runner
  # and executes the custom action we defined above. The payload passes
  # values to the self-hosted scaffolder runner and logs are written
  # back to Roadie.
  - id: invokeCustomAction
    name: Invoke a custom self-hosted action
    action: roadie:agent
    input:
      action: custom-action
      shareWorkspace: true
      payload:
        name: someValue
</code></pre><h3>Field extensions</h3><p><a href="https://roadie.io/docs/scaffolder/custom-fields/">Custom field extensions</a> give you the ability to build your own inputs and UI components for the scaffolder. You write them using the <a href="https://backstage.io/docs/features/software-templates/writing-custom-field-extensions/">normal Backstage API</a>, and push them into Roadie using the custom plugins pipeline.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2lHAUNIJCp4mWEECgnt4js/89dff377a4f75504a0502405005d86ae/Screenshot_2024-06-18_at_12.05.42.png" alt="Customize-the-Roadie-scaffolder"></p><p>Once registered, they can be used in Scaffolder templates by matching the name of the <code>ui:field</code> prop.</p><pre><code class="language-yaml">parameters:
  - title: Fill in some steps
    required:
      - name
    properties:
      name:
        title: Name
        type: string
        description: My custom name for the component
        ui:field: MyCustomInput
</code></pre><h2>Authentication &#x26; Authorization</h2><h3>Single-Sign-On (SSO) setups</h3><p>Roadie provides native support for any SSO provider you can imagine. Okta, Microsoft Entra ID, Google, AWS, Ping Identity… you name it, we’re probably already using it in production.</p><p>Setting this up is a simple matter of sending our docs to your IT team, or whoever controls your SSO setup. We will then work directly with that team to get your authentication provider configured.</p><p>Once set up, your IT team can grant and revoke access to Roadie without being blocked by us. They have full control over the process.</p><h3>Roles</h3><p>Customers who have purchased our role-based access control add-on can define custom roles which can then be assigned to groups of users.</p><p>In this example, you can create a role which can only read the catalog. It wouldn’t be able to execute scaffolder templates or register anything in the catalog.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6WAxROTDuksZKP9Hjngzyp/6319b1104af0474c5a2fb93c453d813a/custom-roles.png" alt="custom-roles"></p><p>Roles can also be assigned via your Identity Provider like Okta or Microsoft Entra ID. This is especially useful when your IT team wishes to manage access and roles in a single place.</p><h3>Permissions/policies (coming soon)</h3><p>Role-based access control gets really powerful with the ability to define custom permissions which can then be combined in roles. 
For example, perhaps you want to hide Component entities which are tagged with <code>sensitive</code> from everyone except the security team.</p><p>We’re working on this ability at Roadie and expect it to roll out in 2024.</p><h2>Tech Insights (Scorecards)</h2><h3>Custom data sources</h3><p>Data Sources are recurring jobs which gather data that can then be used to write checks against.</p><p>For example, Roadie comes with a built-in GitHub Settings Data Source which records facts about each repository in the catalog, like whether or not branch protection is turned on, or whether or not force pushes are allowed.</p><p>Users can also create their own Data Sources from scratch. At the time of writing, 7 different types of Data Sources are supported. Here are some examples:</p><ol><li>HTTP Data Sources hit third-party APIs and pick values from the JSON response</li><li>Component Repository File Data Sources inspect files in the repository associated with each Component and record values from them.</li><li>Push Based Data Sources receive webhook events and store them for processing.</li></ol><p>Creating Data Sources is a simple matter of filling out a few fields. You don’t need to write any TypeScript or YAML. 
Data Sources support advanced features like metadata variables, JSONata processing, inline testing, and error handling.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7k85THcrcjLAflDySczrQ4/5fceb794b799ee77bbca4169c7d53d0b/Screenshot_2024-06-18_at_12.43.37.png" alt="Custom-data-sources"></p><h3>Custom checks</h3><p>Checks inspect a value created by a Data Source and give it a pass or fail for each entity it applies to.</p><p>This check ensures there is a README.md in the repository associated with each piece of software.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5QIfu5EIA3phMK07hAn8pJ/4650b11efa61342810c045a201acb4b8/Screenshot_2024-06-18_at_12.45.10.png" alt="Custom-checks"></p><p>Advanced features like boolean logic and live testing of the check are supported. Checks can link to documentation when they fail, or even link to a scaffolder template that can help the owner of a service to fix the failure.</p><h3>Custom scorecards</h3><p>Users can create their own scorecards that apply to the software in the catalog in order to communicate best practices and expectations to teams.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/75CxdnNECkNEfBUzcTRjDM/c4885ac9b50147d9aacea06cc7e3db80/scorecard.png" alt="scorecard"></p><p>Each scorecard is a user defined collection of checks that shows up in reporting, in the catalog, and on team pages. Scorecard data is rolled-up through the catalog so that you can dip in at any level of the org chart and see aggregated data from below that point.</p><h2>And there’s more coming</h2><p>We’re constantly adding to the ways you can customize Roadie. If you want to stay up to date with the latest customizations, <a href="https://roadie.io/backstage-weekly/">subscribe</a> to our newsletter to hear what’s coming.</p><hr><p><em>Image by GrumpyBeere from Pixabay</em></p>
]]></content:encoded></item><item><title><![CDATA[Backstage and Cost Insights: shifting cloud costs left ]]></title><link>https://roadie.io/blog/backstage-and-cost-insights-shifting-cloud-costs-left/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-and-cost-insights-shifting-cloud-costs-left/</guid><pubDate>Thu, 26 Sep 2024 04:00:00 GMT</pubDate><description><![CDATA[As cloud costs grow, organizations are finding that empowering engineers to take ownership of cloud spend and to optimize it as part of the development process leads to more efficient resource usage. We investigate the growing phenomenon of shifting cost awareness left and how it helps teams take ownership of cloud spend and make proactive decisions to optimize efficiency and delivery.]]></description><content:encoded><![CDATA[<h2>Backstage and Cost Insights: shifting cloud costs left</h2><h3>Cloud computing - the double-edged sword</h3><p>Managing cloud costs is fast becoming a strategic priority. While cloud service providers (CSPs) like <a href="https://cloud.google.com/">Google Cloud</a>, <a href="https://azure.microsoft.com/">Microsoft Azure</a>, and <a href="https://aws.amazon.com/">Amazon Web Services</a> offer incredible flexibility and scalability, they can quickly lead to ballooning costs if left unchecked. 
We’ve all <a href="https://x.com/GergelyOrosz/status/1542449611440328704?lang=en">seen</a> <a href="https://news.ycombinator.com/item?id=8927083">the</a> <a href="https://newsletter.goodtechthings.com/p/the-cloud-billing-risk-that-scares">cloud</a> <a href="https://cloudsoft.io/blog/the-curious-case-of-the-spiralling-aws-lambda-bill">billing</a> horror <a href="https://www.troyhunt.com/how-i-got-pwned-by-my-cloud-costs/">stories</a>, but even for companies that have a good grip on their cloud costs, it’s easy to get into a situation where cost growth begins to outstrip associated revenue growth.</p><p>Such a position can undermine profitability, growth, and operational flexibility, ultimately putting the organization on an unsustainable trajectory. Beyond simple expense, runaway costs can mask inefficiencies in architecture, resource allocation, and service usage, creating a vicious cycle of wasted consumption. Left unmanaged, this can undermine even the most successful products, ultimately affecting their long-term viability.</p><p>A notable case in point is Spotify, who <a href="https://redmonk.com/videos/a-redmonk-conversation-shifting-cost-optimisation-left-spotify-backstage-cost-insights/">encountered this scenario early on</a>, which forced the company to rethink its approach to managing cloud costs. Spotify realized that to get a handle on this, they needed to empower their engineers - the people closest to the actual cloud usage - to own the costs and be responsible for optimizing them. Rather than a top-down approach mandating a cost reduction, Spotify opted to make the engineers themselves responsible for their own cloud costs, an excellent example of the phenomenon of shifting cloud costs left.</p><h3>Shifting cost awareness left: a cultural change</h3><p>Shifting cost awareness left is a concept in effective cloud cost management that is gaining traction. 
It means embedding cost considerations early in the engineering process rather than leaving them as an afterthought for finance teams. That’s exactly what Spotify did, according to <a href="https://redmonk.com/videos/a-redmonk-conversation-shifting-cost-optimisation-left-spotify-backstage-cost-insights/">James Governor at RedMonk</a>: “Spotify engineering teams are used to having a lot of autonomy, so the company couldn’t just introduce new cost guardrails as a top down concern. Therefore the Cost team tried to foster a culture where optimization would be fun.”</p><p>Shifting cost awareness left isn’t just about giving engineers tools; it’s a cultural shift where engineers take ownership of the financial impact of their work and are invested in cost optimization. When engineers are empowered with real-time cost insights, they become agents of change. Instead of waiting for an end-of-month cloud bill to identify costly inefficiencies, engineers can make informed decisions in real-time. This shifts cost management left from a reactive process driven by finance to a proactive, engineering-led initiative.</p><p>There’s a philosophical value here that goes beyond dollars and cents. By making cost awareness a core part of the development lifecycle, companies can drive a sense of ownership and even healthy competition among teams. Engineers begin to actively look for ways to optimize their cloud usage, often competing with each other to drive down costs. This culture of cost ownership not only saves money but also leads to better architecture and more efficient systems overall.</p><p>According to Janisa Anandamohan, Spotify Senior Product Manager, Cost Engineering:</p><blockquote><p>We know engineers are natural optimizers when it comes to reliability, security, performance, et cetera. And now we’re telling them, hey, add costs into the mix. And they were super excited about that. 
We had many teams that were just able to tweak their services and data pipelines and to make them more efficient. And we know efficiency doesn’t just help cost, but helps reliability and performance as well. So we were getting double, triple wins along the way.</p></blockquote><h3>Spotify’s solution: Cost Insights</h3><p>Spotify’s approach to shifting cloud cost management left took the form of a cost optimization and visualization plugin called Cost Insights, on their own internal version of Backstage. The concept is simple: provide engineers with the tools to visualize and manage the costs associated with the services they build, all within the internal platform they are already using - their internal developer portal. By tying cloud costs to specific entities in the Backstage catalog, teams are empowered to take control of their spending and optimize resources more effectively.</p><p>This approach worked well for Spotify, partly due to their internal engineering culture, but mostly as a result of the significant engineering effort invested into getting Cost Insights integrated into their developer platform. While they have since <a href="https://github.com/backstage/community-plugins/tree/main/workspaces/cost-insights/plugins/cost-insights">open-sourced a pared-down version of the Cost Insights plugin</a> back to the community, most organizations attempting to replicate their success will find it challenging, even when using their plugin. This is largely because of the technical complexity required, and because the current Spotify Cost Insights plugin, like many of the other aspects of the Backstage framework, is far from an out-of-the-box solution.</p><p>Just how much work would it take an engineering team, even working with the open-sourced Cost Insights plugin, to implement a working solution? It’s a significant lift; here’s what they’d have to do:</p><ol><li><strong>Cloud billing integration</strong>: Set up a mechanism such as an API to pull cost data. 
This involves not only access configuration but also understanding the data schema of your cloud provider, which can be complex and time-consuming.</li><li><strong>Backend development</strong>: Develop a backend to fetch, process, and store the billing data. This step requires designing a database schema that can handle potentially large amounts of data efficiently and setting up a server to run this service.</li><li><strong>Cost Insights API implementation</strong>: Implement the API endpoints required by the Cost Insights plugin. This includes endpoints for fetching cost data, projecting future costs based on current trends, and breaking down costs by services, projects, or departments.</li><li><strong>Frontend integration</strong>: Integrate the Cost Insights plugin into your Backstage instance. This may involve customizing the plugin to fit into your organization’s Backstage environment and ensuring it interacts correctly with your newly developed backend.</li><li><strong>Testing and validation</strong>: Thoroughly test the plugin with real data to ensure accuracy. Validate the cost projections and insights with your finance team to ensure they align with actual expenditures.</li><li><strong>Maintenance and updates</strong>: Continuously update the backend and frontend as cloud providers change their APIs or pricing models, and as new features or fixes become available in the Cost Insights plugin.</li></ol><p>For a small team of three to four engineers, this could take anything from several months to a year to fully implement. This is a big lift for most organizations, and for many this complexity keeps the solution out of reach. 
While the concept of cost transparency and shifting cost awareness left resonates, the time and effort required to implement and maintain a working Cost Insights tool deters widespread adoption.</p><p>As if it wasn’t hard enough, all of the work above assumes an organization is using only a single CSP in their stack. In reality, any organization that is using multiple CSPs (say, AWS and GCP) faces a thorny additional hurdle: the lack of data homogenization and normalization from CSPs. Each CSP often presents cost data in a different format, making it extremely challenging for organizations to consolidate and interpret this data in a unified manner if they’re using multiple CSPs.</p><p>Fortunately, the recent introduction of the <a href="https://focus.finops.org/">FOCUS (FinOps Open Cost and Usage Specification)</a> standard has changed the landscape for the better. Developed as a collaborative effort by the <a href="https://www.finops.org/introduction/what-is-finops/">FinOps Foundation</a>, the FOCUS standard aims to normalize the cost and usage data across different CSPs, providing a common format that makes it easier for organizations to integrate and analyze their cloud spending. This standardization is a crucial enabler, simplifying the data integration process and reducing the overhead associated with translating disparate data formats.</p><h3>Introducing: Roadie’s Cost Insights Plugin</h3><p>The adoption of the FOCUS standard significantly simplifies the integration of cost data across various cloud platforms. However, the complexity and effort of setting up Cost Insights is still high - which is where we at Roadie saw an opportunity to help. We recognized that the idea of empowering engineers to manage cloud costs was spot on, but the solution needed to be simpler, more accessible, and ready to use out of the box. 
As such, we’re in the process of refining and enhancing the existing Cost Insights plugin, making it significantly easier to deploy and use right out of the box.</p><p>Our enhanced <a href="https://roadie.io/docs/integrations/cost-insights/">Cost Insights</a> plugin builds on Spotify’s version, but aims to reduce the setup friction, allowing engineering teams to access powerful cloud cost insights immediately - no custom backend development required. This streamlined approach means engineers can spend more time optimizing their services and less time managing the infrastructure for cost tracking.</p><p>With Roadie’s plugin, teams can quickly see which services are driving up their cloud costs, how those costs have trended over time, and what actions can be taken to reduce unnecessary spending. The key here is not just providing data but making it accessible and actionable, so engineers can immediately use it to make decisions.</p><p>Roadie’s Cost Insights plugin takes full advantage of the FOCUS standard, ensuring that the data we present is both accurate and comparable across providers. By eliminating discrepancies in how cost data is reported, the plugin enables engineers to gain a clear understanding of their cloud usage without needing to reconcile different data formats.</p><p>The ability to assign costs to specific entities within a Backstage catalog is what we’re most excited by. In traditional cloud billing tools, costs are often presented at a very high level, making it hard to drill down and see what’s truly driving the expenses. By associating costs with individual services and the teams responsible for them, the plugin makes cloud costs real and relatable for developers. 
When a team sees exactly how much their service is costing, they can no longer ignore it or avoid responsibility - it becomes part of their job to optimize those costs.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2lkzBoQ0KD8G6n69u1sQY5/6ae8f0ea2d90f6dacc52c97a3c6f64e0/roadie-cost-insights-preview.png" alt="The Roadie Cost Insights dashboard displaying cloud cost by product"></p><p><em>The Roadie Cost Insights dashboard displaying cloud cost by product</em></p><h3>How it works</h3><p>Our Cost Insights plugin, currently in internal beta, integrates seamlessly with a Roadie-managed Backstage instance to provide teams with an out-of-the-box solution for tracking and managing cloud costs. Key features include:</p><ul><li><strong>Easy setup</strong>: No complicated setup required - simply connect your cloud service provider to Roadie Cost Insights via your cloud administration settings or via a secure broker, and allow the Roadie Cost Insights compatibility layer to take care of all the data modeling and translation.</li><li><strong>Project and group-based cost tracking</strong>: Track cloud costs at the entity, team, or product level, and drill down into specific projects or groups for more detailed insights.</li><li><strong>Trend visualization</strong>: View cloud cost trends over time, breaking down data by dimensions like products, services, and regions to identify patterns and anomalies.</li><li><strong>FOCUS standard integration</strong>: The plugin leverages customer-provided FOCUS data, ensuring consistency and like-for-like comparability across cloud providers.</li><li><strong>Actionable insights</strong>: Dashboards help engineers take ownership of their services and take immediate action by optimizing those services or reducing over-provisioned resources.</li></ul><p>For example, a team might notice that the costs of a particular product are rising faster than expected. 
With Roadie’s plugin, they can drill down into the cost breakdowns, identify high-cost services, and take corrective action - such as optimizing resource allocations or reducing usage.</p><p>Roadie Cost Insights also supports multiple regions, meaning a DevOps team managing multiple cloud services in different regions could use Roadie’s Cost Insights to track cloud costs by region. For instance, they could spot that their compute resources in one region are significantly more expensive compared to others. Using the plugin’s trend visualization and breakdown capabilities, the team can identify the services or instances driving up the cost and adjust their architecture to either reduce or move resources to more cost-effective regions.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1HujxcKSbrPOdrga7Qfbvo/464c993068096fe762c5967d6655c4d8/roadie-cost-insights-architecture.png" alt="Roadie Cost Insights Architecture"></p><p><em>Roadie Cost Insights Architecture - note that the user configuration overhead is limited to installing the Cost Insights client on the cloud service provider infrastructure (or the use of a <a href="https://roadie.io/docs/integrations/broker/">broker client</a>) while Roadie takes care of all the backend setup.</em></p><h3>The future of Roadie’s Cost Insights Plugin</h3><p>We believe that by simplifying cost management and putting it directly in the hands of engineers, companies can not only save money but also drive innovation, efficiency, and alignment across their teams.</p><p>Cloud cost management is no longer just a financial issue—it’s an engineering one. By providing real-time insights into cloud spending and shifting cost awareness left, Roadie’s Cost Insights plugin helps companies empower their engineering teams to take ownership of their cloud usage. 
This leads to smarter decisions, lower costs, and more efficient systems.</p><p>While Cost Insights is still currently in an internal beta, if you’re interested in learning more or becoming an early design partner, <a href="https://www.linkedin.com/company/43197350/">get in touch with us today</a>. We’d love to work together to bring the future of cloud cost management to your team.</p><hr><p><em>Title image by Brian Penny from Pixabay</em></p>
]]></content:encoded></item><item><title><![CDATA[The Ultimate Guide to Backstage Software Catalog Completeness]]></title><link>https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/</link><guid isPermaLink="false">https://roadie.io/blog/3-strategies-for-a-complete-software-catalog/</guid><pubDate>Sun, 22 Sep 2024 23:00:00 GMT</pubDate><description><![CDATA[Filling the catalog of your Internal Developer Portal doesn't need to be an insurmountable task. We've collected 3 strategies you can leverage and 12 individual tactics that our customers have used to reach 90% catalog completeness or higher.]]></description><content:encoded><![CDATA[<p>Internal developer portals (IDPs) like Backstage and Roadie are, at their core, software catalogs. The software catalog is the backbone upon which much of the other functionality hangs, and building a complete catalog is vital to the success of the IDP project.</p><p>Without a complete catalog, an IDP cannot fulfill its primary purpose: improving developer productivity. It cannot drive improved discoverability if the software that teams are trying to discover is not in there. It cannot help to measure the standardization or security posture of software it does not know about.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1I3jY1seLqybnUXxPtTPHn/7c863cdfb25add62d4c8dd5a684794f1/catalog.png" alt="Closeup of Roadie catalog data"></p><p>This article is a comprehensive guide to catalog completeness in Backstage. Roadie is built on top of Backstage, and so all of the same lessons apply. We cover why it's important, how to measure it, and how to achieve it.</p><p>First, let's learn why you want to build a complete catalog in the first place.</p><h2>What is a software catalog?</h2><p>A software catalog is a centralized registry that tracks all the software assets, services, APIs, and resources within an organization. Think of it as a searchable directory that answers fundamental questions: What software do we have? 
Who owns it? Where does it run? How do systems connect to each other?</p><p>Unlike a traditional service catalog (which typically focuses on IT services available to end users), a software catalog is developer-focused. It tracks the technical components that engineering teams build and maintain: microservices, libraries, data pipelines, APIs, and infrastructure resources. A service catalog might list "Email Service" as something employees can request; a software catalog tracks the underlying email-gateway microservice, its API endpoints, the team that owns it, and its dependencies.</p><p>Modern software catalogs go beyond simple inventory management. They serve as the foundation for improving software discoverability across development teams, enabling compliance monitoring at scale, and providing secure management of software assets. Platforms like Roadie and Backstage combine software catalog functionality with additional capabilities like scaffolding templates, technical documentation, and scorecards that measure software maturity.</p><h2>Why building a complete catalog is important</h2><p>The primary reason companies deploy an IDP is because they want to make developers more efficient. A big part of this is helping teams answer basic questions about the software around them. They want to make it easy to answer questions like "do we have a geocoding service?", "which team owns the checkout service?" and "where is the API spec for the users service?". These questions can only be answered by the IDP if the geocoding service, checkout service, and users service are registered there in the first place.</p><p>A complete catalog is essential when using an IDP to <a href="/blog/tech-insights-for-roadie-backstage/">measure the maturity of the software that teams are producing</a>. Applying scorecards to software in the catalog of the IDP can be a great way to determine the security posture of each production service. 
But scorecards can only apply to software which is actually in the catalog.</p><p>The software catalog is also the fundamental unit of navigation in the IDP. The lack of a software catalog is the reason that wikis like Confluence or Notion cannot solve the discoverability problem that IDPs solve. In a wiki, content is organized into pages, which are intentionally unstructured and flexible. In an IDP, content is organized by Component (think "piece of software") and is structured so that the same information is available for each Component.</p><h2>How software catalogs improve discoverability and compliance</h2><p>A well-implemented software catalog addresses two persistent challenges that enterprise development teams face: finding existing software assets and ensuring those assets meet organizational standards.</p><h3>Discoverability across development teams</h3><p>When organizations scale beyond a handful of engineering teams, tribal knowledge breaks down. New engineers don't know what services already exist. Teams duplicate functionality because they can't find existing solutions. The software catalog solves this by providing a single, searchable interface where developers can discover services, APIs, libraries, and resources across the entire organization.</p><p>Effective discoverability requires more than a simple list. The catalog needs to track relationships between components (which services depend on which APIs), ownership information (who to contact when something breaks), and metadata like documentation links, deployment environments, and health status. Internal developer portals like Roadie provide customizable features for organizing software services and APIs, making it easy for teams to find what they need.</p><h3>Compliance monitoring and access control</h3><p>For mid-sized tech companies and enterprises, compliance monitoring at scale is difficult without a complete software catalog. You can't measure what you don't track. 
A software catalog enables automated compliance checks: Does every production service have an owner? Are all APIs documented? Do all services meet security baseline requirements?</p><p>Roadie's <a href="/product/tech-insights/">Tech Insights</a> feature builds on the software catalog to provide scorecards that measure software against organizational standards. Teams can track metrics like test coverage, dependency freshness, and security compliance across hundreds of services. This transforms compliance from a manual audit process into continuous, automated monitoring.</p><p>Access control also benefits from catalog completeness. When every service has a defined owner and clear metadata, organizations can implement fine-grained permissions for who can view, modify, or deploy each component. This is particularly valuable for managing software assets securely across distributed teams.</p><h2>How to measure catalog completeness</h2><p>Having a complete catalog doesn't necessarily mean that every piece of software is in the catalog. Most organizations have their share of abandoned code. It may have been created during hackathons or for brief experiments that never saw production usage. Having all of this stuff in the catalog can create clutter and noise.</p><p>There may also be parts of the organization for whom it doesn't make sense to be in the catalog. We work with companies which have large hardware divisions who write code for embedded devices. They don't necessarily follow the DevOps lifecycle and thus are sometimes intentionally omitted.</p><p>Production software is the most important stuff to have in the catalog. It's the software that most engineers work with most of the time. 
Production software has the most frequently referenced APIs and docs, and it's much more important to understand the maturity of the software that runs in production environments, since it has the most attack vectors.</p><p>For this reason, we frequently see customers create a list of software which is deployed to production by referencing ArgoCD or some other deployment tool. They then compare this against software in the catalog, frequently by <a href="/docs/api/catalog/">accessing the Roadie APIs</a> or by using our CSV export functions.</p><p>The next best bet is to compare the number of pieces of software (aka. Components) in the catalog against the number of active, unarchived repositories in your source code solution. This will never give you a perfect answer, because "pieces of software" don't necessarily map perfectly one-to-one to repositories (monorepos etc), but it's a good start.</p><p><strong>How Roadie Helps</strong></p><p>At Roadie we give every customer a chart which shows the percentage of their active repositories (non-archived, with a commit in the past 12 months) that are registered as Components in the catalog.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1ARgOlDcPMyAFeoRNya90q/0d38708de2c23c6b644b3f33d3a95146/Screenshot_2024-09-20_at_13.54.53.png" alt="Catalog completeness formula"></p><h2>Roadie customers can achieve high catalog completeness</h2><p>Achieving a high level of catalog completeness is definitely possible.</p><p>The majority of established Roadie customers are happy with their level of catalog completeness and we have many customers who have a catalog completeness level (measured as a percentage of active repositories which are registered in the catalog) which is above 80%.</p><p>Here's a chart showing catalog completeness for <a href="https://uplight.com/">Uplight</a> who onboarded in September 2023. 
Four months later they were at 88% catalog completeness, with more than 600 components in their catalog.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/N1n0O0iqaKNeWu1bjhTAc/39ecee5eac4d8dae1729ae981eb82d8d/uplight_catalog_completeness.png" alt="uplight catalog completeness"></p><p>Here's another Roadie customer who increased their catalog completeness from 40% to 90% over a period of four months. They now have more than 750 components registered.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4ZGSECkA66xHAQgGV2FKgq/801c0f364b46e4c829c88b43547e7bcc/snyk_catalog_completeness.png" alt="customer catalog completeness"></p><p>There are countless more examples like this amongst Roadie customers. The goal of this post is to help all companies achieve the same results.</p><h2>Strategies for building a complete catalog</h2><p>There are multiple strategies that can be used to build a complete catalog. The strategy that is right for your company can depend on factors like the size of the company and the enthusiasm of the developers to have an IDP.</p><p>You may also want to run multiple strategies in parallel, or start with one in order to get the bulk of software into the catalog, and switch to another to get closer to full catalog completeness.</p><p>Think about expanding catalog completeness like the layers of an onion. Start with the most enthusiastic early adopters and get them onboard. Then use them as an example as you expand out to the rest of the organization.</p><p>We recommend you import users (aka. employees) and teams (called Groups in Backstage nomenclature) into the catalog before creating any software components. You will ideally want to assign ownership of each component to a team as you import it. This task is much easier if all of your teams already exist in the catalog.</p><h3>If you don't know which strategy to choose…</h3><p>Experiment! You don't need to roll out to the whole organization in one go. 
Pick some friendly teams, run one strategy on each team, and analyze the results. Did you end up with software in the catalog? What is the feedback from each team? Where did they get stuck? Take this feedback and use it to improve the process before expanding out to more teams.</p><h3>Strategy 1: Centralized automated</h3><p>This strategy involves connecting to a chokepoint in the organization in order to push software metadata into the catalog programmatically. It does not require anyone to write <code>catalog-info.yaml</code> files.</p><p>Frequently, the initial chokepoint will not have all of the information required to build a useful catalog. When this happens, the software metadata must be enriched after the fact.</p><p>Depending on your tech stack and permission settings, the following options may be viable:</p><ol><li>An ArgoCD instance which does deployments to the production environment. It has knowledge of the most critical software in the org (software that goes to production) and can thus be a good starting point for the catalog.</li><li>A Helm chart which is used by a large percentage of the deployable software in the organization.</li><li>A centrally owned CI job or build tool which can be written by the centralized team and applied to software which is owned by other teams in the organization. 
For example, Lunar Bank have talked about how <a href="https://www.youtube.com/live/a3UjbRse8yk?si=GXheV8QLzHTfPO4G&#x26;t=1307">they populate their catalog from their build tool called shuttle</a>.</li><li>A legacy software catalog, developer portal or spreadsheet.</li></ol><p><strong>Pros</strong></p><ol><li>The software catalog can quite quickly be bootstrapped to a highly complete state.</li></ol><p><strong>Cons</strong></p><ol><li>The ownership link between software and teams is not immediately established.</li><li>Companies with no source of truth or a heavily fractured deployment ecosystem may not be able to implement this strategy.</li><li>The product teams will be less well educated on the value of the IDP when the process is finished.</li></ol><p><strong>Tactics to prioritize in order to succeed with this strategy</strong></p><ol><li>Use custom entity providers</li><li>Make catalog presence a requirement for deployment</li></ol><h3>Strategy 2: Centralized manual</h3><p>This strategy tries to avoid asking the individual product teams to do work. Instead, the centralized team takes it upon themselves to populate the catalog. 
They may do this in collaboration with the product teams, but they likely won't ask them to write any YAML.</p><p><strong>Pros</strong></p><ol><li>This strategy is likely to be faster than the distributed manual strategy, especially for companies which don't have thousands of microservices.</li></ol><p><strong>Cons</strong></p><ol><li>The product teams will be less well educated on the value of the IDP when the process is finished.</li><li>It may be difficult for the centralized team to gather enough data about each individual software component.</li><li>It becomes the centralized team's job to manually keep the catalog up to date.</li></ol><p><strong>Tactics to prioritize in order to succeed with this strategy</strong></p><ol><li>Store software metadata in a single repository</li><li>Open scripted pull requests</li><li>Customize the catalog nomenclature</li></ol><h3>Strategy 3: Distributed manual</h3><p>This strategy involves asking all of the individual product teams to register the software that they own in the catalog.</p><p>Typically, the central team who own the IDP will meet with and educate the product teams on the value of the IDP, either on an individual basis, or in larger groups. The central team will provide tools and materials to the product teams in order to teach them what they need to do to get their software into the catalog (typically create a <code>catalog-info.yaml</code> file), and give them clear steps to take in order to achieve catalog completeness.</p><p><strong>Pros</strong></p><ol><li>Product teams are more likely to feel a sense of ownership over their software in the catalog because they put it there in the first place.</li><li>Product teams have an opportunity to learn about features of the IDP as they are registering their software. 
They may then choose to use features in the IDP such as technical documentation.</li></ol><p><strong>Cons</strong></p><ol><li>This strategy requires a lot of work from the central team in order to yield high catalog completeness. It will take time. They will need to educate and continually follow up with product teams throughout the company.</li></ol><p><strong>Tactics to prioritize in order to succeed with this strategy</strong></p><ol><li>Give teams a scaffolder template to register software</li><li>Share catalog completeness numbers publicly</li><li>Tie catalog completeness to a wider initiative</li><li>Write custom plugins to create value</li></ol><h2>Tactics for building a complete catalog</h2><p>The tactics you need to use will depend on the strategy you are implementing. This is a full list of all the tactics we know. Please refer to the strategies above in order to know which tactics to choose.</p><p>Some tactics will apply regardless of the strategy that is chosen. They are:</p><ol><li>Customize the catalog nomenclature</li><li>Move onboarding docs into Roadie</li><li>Present on the value of Roadie to people managers</li><li>Tie catalog completeness to a wider initiative</li></ol><p>Most of the tactics fall into one of the following categories:</p><ol><li>Reduce friction for developers who want to get their software into the catalog.</li><li>Create incentive for developers to put their software into the catalog.</li><li>Educate developers on how to use the catalog.</li></ol><h3>Give teams a scaffolder template to register their software</h3><p><strong>Category:</strong> Reduce friction</p><p>This involves writing a scaffolder template that teams can use inside Backstage in order to create a <code>catalog-info.yaml</code> file in their repositories.</p><p>The scaffolder template will ask the user to fill out some information about their software, typically by picking from pre-defined values, and will open a pull request against a repository when 
finished. Once the user reviews and merges the pull request, auto-discovery will pick up the <code>catalog-info.yaml</code> file and populate the catalog.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6JZlwjWtlviRJZ447aQSS/981d6528caffed236d0419e717fcf439/scaffolder_template_for_completing_catalog" alt="scaffolder template for completing catalog"></p><p><strong>Benefits</strong></p><ol><li>Individual developers don't need to understand the (long) <a href="https://backstage.io/docs/features/software-catalog/descriptor-format">Backstage YAML API spec</a> in detail.</li><li>The owners of the IDP can use the template to constrain the "type", "lifecycle" and other properties that are assigned to each software component. This puts guardrails in place that will help create <a href="/blog/improving-backstage-performance/">more consistency in the catalog</a>. Catalog consistency is important, and <a href="/blog/kinds-and-types-in-backstage/">will help you avoid problems in future</a>.</li><li>The form can integrate with external APIs to pull in sensible options. For example, in the screenshot above, the "Component Owner" is a selection of all the engineering teams in the company. The user doesn't need to type an exact string.</li></ol><p><strong>How Roadie helps</strong></p><p>Roadie provides a <a href="https://github.com/roadie-demo/getting-started/tree/main/scaffolder/register-new-component">starting point for a software registration template</a> in the getting-started repo. 
Customers can fork it into their own GitHub org, edit it to meet their needs and import it into their own Backstage instance.</p><h3>Archive unused repositories</h3><p><strong>Category:</strong> Reduce friction</p><p>If our measure of catalog completeness is:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1ARgOlDcPMyAFeoRNya90q/0d38708de2c23c6b644b3f33d3a95146/Screenshot_2024-09-20_at_13.54.53.png" alt="Catalog completeness formula"></p><p>then we can increase catalog completeness by archiving old repositories that nobody is using.</p><p>It may sound somewhat silly, but every organization has abandoned hackathon projects and old test repos that are unused and simply causing clutter. By archiving them, we clean up our source code management tool while improving catalog completeness.</p><p>Believe it or not, Roadie has one customer who increased catalog completeness from 45% to 75% just by archiving repositories.</p><p><strong>How Roadie helps</strong></p><p>Our catalog implementation ensures that Components are removed from the catalog when the repo they reference is archived.</p><h3>Customize the catalog nomenclature</h3><p><strong>Category:</strong> Reduce friction</p><p>By mapping the Kinds of entity that show up in the catalog to familiar concepts in your company, you can create instant recognition for developers who land in the catalog.</p><p>For example, if your company has a concept of "Valuestream" then make this front and center in the catalog so that users instantly understand what they're looking at.</p><p><strong>Benefits</strong></p><ol><li>Users can orient themselves quickly and get value rapidly.</li></ol><p><strong>How Roadie helps</strong></p><p>Roadie customers can use our admin interfaces to customize and rename the core catalog concepts, and create completely new ones.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2c7gYTA1or5GRWYqmu2C2t/236d2036ba43321d81b0acd51a9ab14a/customize_the_catalog.png" alt="customize the catalog 
terminology"></p><h3>Write custom plugins to create value</h3><p><strong>Category:</strong> Create incentive</p><p>Custom plugins provide value for product teams by giving them faster ways to perform bespoke workflows inside the catalog. By creating custom plugins, a central team can incentivize developers to add their software into the catalog.</p><p>For example, <a href="https://youtu.be/TZ6-SpoFVeY?si=l-rz2H6HsPzkln0W&#x26;t=369">Lunar Bank have custom plugins for dealing with dead letters in RabbitMQ</a>. These plugins are regularly useful for developers. This causes them to visit the catalog to use the plugin.</p><p><strong>Benefits</strong></p><ol><li>Custom plugins are quite simple to produce and can quickly create value for software engineers.</li><li>Custom plugins unlock tailored value for engineers to help them do things more quickly or more easily than they otherwise could. They sometimes even allow them to perform a task that they cannot do at all outside of Roadie or Backstage.</li></ol><p><strong>How Roadie Helps</strong></p><p>Roadie provides an interface for registering and managing custom plugins. It also facilitates live reloading of custom plugins, the ability to run multiple versions of a plugin side by side, and a <a href="https://github.com/RoadieHQ/software-templates/blob/main/scaffolder-templates/roadie-plugin/template.yaml">scaffolder template</a> to bootstrap a custom plugin monorepo. Custom plugins on Roadie can securely connect back to a private network to load data from internal APIs. <a href="/docs/custom-plugins/getting-started/">Check out our docs to learn more</a>.</p><h3>Move onboarding docs into Roadie</h3><p><strong>Category:</strong> Educate</p><p>Engineer onboarding docs are typically used to help a new engineer to set up their environment and get to productivity quickly. 
Expedia Group, Spotify and other Backstage adopters have successfully used TechDocs and scaffolder templates to speed up engineer onboarding and help new engineers become familiar with the IDP on day one. The exact same tactic can be deployed on Roadie.</p><p>Expedia Group put 850+ engineers through their Backstage based bootcamp in 2022. They discuss it in their <a href="https://backstage.io/blog/2023/08/17/expedia-proof-of-value-metrics-2/">case study on the Backstage website</a>.</p><p><strong>Benefits</strong></p><ol><li>Newly onboarded engineers start using Roadie on day one. They get used to it and understand how to come back.</li><li>Engineers learn how to use a scaffolder template to create a new service during onboarding. This service is added to the catalog, and they learn how the catalog works.</li></ol><p><strong>How Roadie Helps</strong></p><p>Roadie supports standalone TechDocs that are not tied to a particular software component in the catalog. These are perfect for onboarding docs.</p><h3>Present on the value of Roadie to people managers</h3><p><strong>Category:</strong> Educate</p><p>Roadie provides specific value to managers, directors and VPs that is different than the value that developers might care about. By educating managers on the value they will receive, you can encourage them to work with their teams to get their software into the catalog.</p><p>For example, did you know that frequent Backstage users at Spotify are 5% more likely to be at the company one year later? Retention is important for managers, so they need to know about this.</p><p><strong>How Roadie Helps</strong></p><ol><li><a href="/product/tech-insights/">Roadie provides Scorecards</a> which can help managers ensure that their teams are producing mature and high quality software. 
This feature is not available for open-source Backstage.</li><li>Roadie gives customers value calculators to help them estimate the dollar value they can expect to receive.</li></ol><h3>Open scripted pull requests</h3><p><strong>Category:</strong> Reduce friction</p><p>Opening an automated pull request containing a <code>catalog-info.yaml</code> file is a good way to ease the burden on developers who want to get their software into the catalog. All they need to do is edit the pull request a little bit, review it and merge it.</p><p>Keep in mind that your script will need permissions to open a pull request against a majority of repositories in your source code management tool. Depending on your security model, this may not be possible.</p><p>This method can work especially well in companies that operate out of large monorepos. A monorepo setup allows the generation of a single pull request that can add many <code>catalog-info.yaml</code> files in one go. It can also be reviewed and merged by a single person with elevated permissions.</p><p>While this is a tempting option to quickly build a catalog with YAML files, we have seen customers experience issues with catalog correctness when they use this method. Some teams may blindly merge the pull request without validating the information that it contains. This tactic should be executed alongside an education program to teach teams what to do. Go slowly and experiment.</p><p><strong>How Roadie Helps</strong></p><p><a href="/docs/api/catalog/">Roadie provides a token authenticated API</a> which the centralized team can use to tell which repositories are already in the catalog. 
A simple script can consume this to open a PR against the repos which are not already accounted for.</p><p>Our solutions engineering team can work with you to write a simple script that will open a pull request containing a YAML file into each repository.</p><h3>Make catalog presence a requirement for deployment</h3><p><strong>Category:</strong> Create incentive</p><p>By making catalog presence mandatory for deployment, platform teams can be confident that they have all of the important software in the catalog.</p><p>In <a href="/backstage-spotify/#the-origins-of-spotify-backstage">the early days of Backstage at Spotify</a>, teams could not SSH into their machines unless their services were in the catalog. The catalog owner was referenced to determine who was and was not allowed to access the machine.</p><p>This tactic works best when the IDP is orchestrating a new greenfield platform that other teams are migrating onto. Adding the <code>catalog-info.yaml</code> file can be one simple step in what is likely a series of steps that teams have to do to migrate. Outside of this situation, it can be politically problematic to block deployments due to a missing YAML file.</p><h3>Automate catalog collection with custom entity providers</h3><p><strong>Category:</strong> Reduce friction</p><p>If developers find writing YAML files tedious, potentially the best thing to do is to make them optional. <a href="https://backstage.io/docs/features/software-catalog/external-integrations/">Backstage's custom entity providers</a> allow adopters to programmatically shovel software entries into the catalog. Custom entity providers are a great way to connect Backstage to an existing source of truth for catalog data, such as a legacy IDP, an ArgoCD instance, a Kubernetes cluster, or a CI/CD tool.</p><p><strong>How Roadie helps</strong></p><p>Roadie gives customers the <a href="https://github.com/RoadieHQ/roadie-agent?tab=readme-ov-file#entity-provider">roadie-agent</a>. 
This wraps the custom entity provider concept with a secure connection to Roadie so that customers can easily dump software metadata into the agent and have it appear in the catalog.</p><p>Roadie has an <a href="/docs/api/catalog/">Entity Provider API</a>. Simply push an array of software metadata to this endpoint and it will appear in the catalog. To update the metadata, simply push again.</p><p>Frequently the programmatic source of truth will have some but not all of the metadata that the catalog needs. For example, it may be missing the name of the team who owns the software. Roadie allows users to <a href="/docs/integrations/github-discovery/#decorating-catalog-entities">decorate</a> software in the catalog with extra metadata within the UI. This ensures that the catalog can become complete and rich over time.</p><h3>Share catalog completeness numbers publicly</h3><p><strong>Category:</strong> Create incentive</p><p>One great way to get people bought into the idea of building a complete catalog is to make it a group effort. By transparently reporting on the completeness and "health" of the catalog, a shared sense of ownership over the goal can be created.</p><p><a href="https://youtu.be/P-JMwuuobgY?si=BNihgof1uNBUM2pw&#x26;t=1235">Twilio explained how they do this in their catalog</a> at the Autodesk Developer Productivity Summit.</p><p><strong>How Roadie Helps</strong></p><p>Roadie gives all Tech Insights customers an out-of-the-box measurement of catalog completeness.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/44W4D88ivKswQ5Idg1jZ8W/93f4e9fb1a516a926794a293973c006b/tech_insights_catalog_scorecard.png" alt="tech insights catalog scorecard"></p><p>We also report on important aspects of catalog richness, like the percentage of pieces of software in the catalog which have an owner assigned. 
Customers can use these building blocks to <a href="/docs/tech-insights/tracking-catalog-correctness/mandatory-metadata/">measure other attributes of the health of their catalog</a>.</p><h3>Tie catalog completeness to a wider initiative</h3><p><strong>Category:</strong> Create incentive</p><p>Companies usually want a complete catalog so that they can accomplish some wider engineering goal. By promoting catalog completeness in service of this wider goal, teams can better understand why it is important to be in the catalog, and why they should help.</p><p>For example, we recently worked with a customer who needed a complete and correct list of all software that had access to their users' personally identifiable information (PII). By attaching to this company-wide goal and leveraging the influence of the company CTO, the company was able to rapidly label hundreds of software components in Roadie with their PII status.</p><p><strong>How Roadie Helps</strong></p><p>We run regular customer success meetings with customers to help them identify wider engineering initiatives where Roadie can help accomplish the goal more quickly or more easily. We will then work with teams in order to project manage the delivery of a solution.</p><h3>Do a whiteboarding session</h3><blockquote><p>I did some education with a team at [company] where we brainstormed the services they wanted to cover and their relationships, this was a whiteboard exercise and I then created the PR's and had them review.</p></blockquote><p>When <a href="/case-studies/from-self-hosted-backstage-to-roadie/">Paddle started with Backstage</a>, they realized that they didn't have an existing source of truth they could lean on to populate their catalog, and they would have to collate it manually.</p><p>They started with a whiteboarding exercise so they could iterate quickly. Meeting with each team in small groups, Ioannis Georgoulas (Director of SRE) led the process. 
He first spent time brainstorming the services that the groups wanted to catalog, and defining their boundaries and the relationships between them. This information was all collected in a simple document to start.</p><p>Once he had a solid understanding of the service map, Ioannis opened pull requests against each repository with the <code>catalog-info.yaml</code> file that was needed. All the teams had to do was review and merge it. Because they had participated in the process to gather this information, they were already bought in and could understand the value of it.</p><p><strong>How Roadie Helps</strong></p><p>We provide frequent customer success calls with every customer through the initial stages of implementation. We'll partner with your implementors to run these whiteboarding sessions and gather the information you need to be successful.</p><h2>Choosing the right internal developer portal for your software catalog</h2><p>For organizations evaluating internal developer portals, the software catalog capabilities should be a primary consideration. The portal you choose will determine how effectively you can organize and track software assets, manage services and APIs, and implement automated workflows for your teams.</p><h3>Key features to evaluate</h3><p>When selecting an internal developer portal, consider these software catalog capabilities:</p><p><strong>Customizable organization</strong>: The ability to define custom entity types, relationships, and metadata fields that match your organization's terminology. A portal that forces you into rigid categories won't capture the nuances of your software landscape.</p><p><strong>Automated workflows</strong>: Look for scaffolder templates that let teams create new services with pre-configured catalog entries, reducing manual work and ensuring consistency. 
This is particularly valuable for mid-sized tech companies that need to balance speed with standardization.</p><p><strong>Discoverability features</strong>: Search, filtering, and relationship visualization help developers find existing services before building new ones. The portal should make it easy to answer "do we already have something that does X?"</p><p><strong>Compliance and security integration</strong>: Built-in scorecards or the ability to integrate with security scanning tools helps maintain governance across your software portfolio without manual audits.</p><p><strong>Access control</strong>: Role-based permissions that respect your organizational structure, so teams can manage their own catalog entries while maintaining appropriate visibility boundaries.</p><p>Roadie provides all of these capabilities as a managed Backstage solution, removing the operational burden of self-hosting while giving teams the customizable features they need for organizing software services and APIs. For organizations that want the flexibility of Backstage without dedicating engineering resources to infrastructure, this combination of comprehensive software catalog features and managed operations makes Roadie a strong option for improving discoverability and compliance monitoring in enterprise environments.</p><h2>Key takeaways/Conclusion</h2><p>Building a high level of catalog completeness in Backstage need not be intimidating. By carefully choosing the strategy that will work at your company, expanding outwards from the most eager adopters first, and communicating widely as you go, you can reach a high level of catalog completeness in a short amount of time.</p><p>Two topics we have not discussed yet are catalog richness and correctness. These areas go hand in hand with completeness, and work together to ensure that your IDP has the answers that developers need, when they need them.</p><p>We'll be covering richness and correctness in another article. 
If you want to be among the first to read it, make sure to subscribe to the newsletter below.</p>
]]></content:encoded></item><item><title><![CDATA[Adopting Backstage - Documentation and Support]]></title><link>https://roadie.io/blog/adopting-backstage-documentation-and-support/</link><guid isPermaLink="false">https://roadie.io/blog/adopting-backstage-documentation-and-support/</guid><pubDate>Thu, 19 Sep 2024 08:00:00 GMT</pubDate><description><![CDATA[Make Backstage a smoother experience for your team and boost adoption by creating effective internal getting started documentation and support channels.]]></description><content:encoded><![CDATA[<p>This is the first in a series of posts aimed at helping organizations to adopt Backstage.</p><p><strong>Making Backstage Easier for New Users</strong></p><p>Imagine opening Backstage for the first time, searching for a service, and finding... nothing. How do you even search properly? You’d need to understand concepts like entity <a href="https://roadie.io/docs/catalog/modeling-entities/#kinds">kinds</a>, <a href="https://roadie.io/docs/catalog/modeling-entities/#types">types</a>, and <a href="https://roadie.io/docs/catalog/ownership/">ownership</a>.</p><p>And if you can't find the entity, how do you add it? This requires knowing your organization’s ingestion patterns: Do you need a YAML file in the code repo? Or do you manually input the URL into the import flow? Without internal support or clear, beginner-friendly getting started documentation, this process can feel like a maze.</p><p>Backstage isn't always intuitive, especially for new users without internal support or clear documentation. Most available open source documentation is aimed at implementers, not end users. It’s often generic, overwhelming, and assumes you’re using GitHub as your SCM system. 
So, what can you do to make Backstage easier for your team?</p><h3>How to Simplify Backstage for Your End Users</h3><p><strong>Invest in User-Friendly Documentation</strong></p><p>For Backstage to succeed in your organization, internal getting started documentation that helps people use Backstage is a must. This documentation should be front and center for new users, which might mean putting it in an existing documentation platform even if your goal is to eventually move all docs into Backstage’s TechDocs. You can use the <a href="https://backstage.io/docs/getting-started/homepage/" title="Homepage plugin for Backstage">Homepage plugin</a> to highlight certain top-level info for new users, including <a href="https://backstage.io/storybook/?path=/story/plugins-home-components-featureddocscard--default" title="Featured docs card">a preview card for your getting started docs in TechDocs</a>.</p><p>Your getting started docs should open with a clear intro to what Backstage is and how it helps. Include simple examples of its core features, remembering that <strong>many new users won’t even know what an internal developer portal is meant to do</strong>.</p><p>As well as having an intro page in your internal documentation explaining this, you could also <strong>publish a blog post introducing IDPs to your engineers</strong>. 
For example, here is some sample content for an intro to IDPs and Backstage:</p><pre><code>### Internal Developer Portals
An Internal Developer Portal (IDP) is an application which is designed to give our developers easy access to information and commonly used workflows. Its goal is to reduce context switching and toil for developers by providing a “single pane of glass” that helps to improve productivity, reduce duplication of effort, and foster a more cohesive and efficient development environment.

An IDP is a place to find information about the software we build and use, the teams who build that software, and the API specs, code repositories and documentation associated with that software. It also typically provides self-service automation scripts for common tasks like creating applications or making changes to infrastructure, and scorecards to help ensure that software is adhering to best practices.

### What is Backstage
Backstage is an open-source platform for building developer portals. It was originally created by Spotify to manage and streamline their complex microservices architecture and was released for public use in 2020. 

**Key features of Backstage include**:
- Software Catalog: A centralized listing of all services, providing an overview and allowing easy management and discovery.
- Software Templates: Streamlined processes for setting up new projects and services with pre-defined templates.
- Plugins: Backstage is extensible with plugins to integrate with various tools and services used in our organization.
- TechDocs: A documentation site generator that converts Markdown files into a browsable documentation site.

Backstage aims to improve developer productivity and collaboration by providing a single, cohesive interface for all development-related activities.
</code></pre><p><strong>Identify key user journeys</strong> and craft your getting started content around them:</p><ul><li>Why would a software engineer or product manager initially visit Backstage?</li><li>What are they trying to accomplish?</li><li>What problems could Backstage help with?</li></ul><p>Talk to teams who are just onboarding or align these journeys with company-wide initiatives. If, for example, you see the Scaffolder as a high-value feature, start with docs explaining how to use it and how to modify templates.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6Hh3EjJGf0hoT9oFXK2aUy/bb31633c50f2784142b86c9a1d0b6806/Screenshot_2024-09-18_at_11.38.53.png" alt="Example documentation"></p><p>You could create these docs in TechDocs and link them from wherever engineers currently go for org-level documentation or just create them in existing documentation systems like Confluence.</p><p>Ensure they are <strong>searchable</strong> by including the right keywords in titles and descriptions, whether hosted in Backstage or elsewhere.</p><p>Lastly, publicise these docs as much as you can - get a few blog posts out on any internal news distribution channels, ping organization wide channels with the link and a short teaser description (we’ll be writing a more detailed post about internal marketing very soon with a bunch of resources for you to use).</p><p><strong>Create a Dedicated Support Channel</strong></p><p>Establish a clear support channel—like a Slack or Teams group—where users can ask Backstage-related questions. This not only helps with adoption but also builds <strong>a community where knowledge is shared and best practices emerge</strong>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7knkX68JmhePqW2cFGY2o0/e2444660df487392de08d0587bb1e979/Screenshot_2024-09-18_at_11.35.19.png" alt="Support channel example in Slack"></p><p>Pin relevant internal and external docs in these channels to avoid repeated questions. 
Support channels are also a great place to <strong>identify gaps in your documentation</strong>, allowing you to improve onboarding and make it as self-service as possible.</p><p>You could even <strong>nominate Backstage champions—advocates</strong> within your organization who can help answer questions and lighten the load on your Platform Engineering/DevOps teams.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/63QZpSEMwBDMu2rUikNYi4/ae9d4685392ef15ff35f39f1545d8929/Screenshot_2024-09-18_at_12.33.11.png" alt="Support Champion"></p><p>At Roadie, support channels with our customers have been essential to successful Backstage rollouts. While documentation is the first line of defense, having a place for follow-up questions is key to encouraging usage and reducing friction for busy engineers.</p><hr><p>By streamlining these two areas—<strong>user-friendly getting started documentation</strong> and a <strong>dedicated support channel</strong>—you’ll make engaging with Backstage a smoother experience for your team and boost adoption.</p>
]]></content:encoded></item><item><title><![CDATA[How to Define Engineering Standards]]></title><link>https://roadie.io/blog/how-to-define-engineering-standards/</link><guid isPermaLink="false">https://roadie.io/blog/how-to-define-engineering-standards/</guid><pubDate>Fri, 13 Sep 2024 15:00:00 GMT</pubDate><description><![CDATA[Engineering standards ensure consistency, reliability, and quality across projects and teams.

Defining these standards can be tricky, but that is only half the battle. To be effective, standards must be adhered to. They need to be broken down into bite-sized, actionable chunks.]]></description><content:encoded><![CDATA[<blockquote><p>At Roadie we don’t claim to be experts in writing software standards - we’ve done it ourselves, both for Roadie and before as part of other startups, scaleups and large companies, but we don’t claim to be world-beaters at it. What we can claim is that we’ve seen many, many companies go through the journey to create standards and then apply them and we have seen what works.</p></blockquote><h2>Why standardise?</h2><p>Defining and adopting engineering standards is essential for organizations as they scale.</p><p>Teams involved in Platform initiatives normally find themselves as the first intrepid explorers in this territory for larger organizations.</p><p>Building tooling for multiple different teams and departments requires consistency and a coherent set of practices. Only then can teams coordinate, share and build scalable, maintainable, and secure software together.</p><p>Without standardization, answering basic questions becomes impossible and progress is painfully slow, a problem that often hits home at times of peak stress. How can you know which teams operate publicly facing software that has critical vulnerabilities if only half of the teams are using a dependency scanning tool? How can you roll out a new security requirement when engineering teams are each using one of 5 different security tools?</p><h2>Defining Engineering Standards</h2><p>Engineering standards are formal guidelines that outline how code should be written, how systems should be designed, and how processes should be executed. These standards ensure that all engineers are working toward the same quality benchmarks.</p><p>That means you need consensus between teams about what exactly should be in those standards. 
In order to do that, even before you decide what the standards are that you’d like to focus on, it’s good to have a plan for how you can agree together on what they should be.</p><h3>Strategies for Agreeing on Engineering Standards</h3><p>Getting a team to agree on a set of engineering standards can be challenging but is crucial for their success. Here are some strategies to help facilitate agreement:</p><ol><li><strong>Start with your own <a href="https://aws.amazon.com/what-is/sdlc/">SDLC</a> - if you have one - and/or Industry Best Practices:</strong><ul><li>Software Development Lifecycle and Production Readiness documents often effectively contain a lot of standards recommendations. They’re extremely useful as an input into a formal set of software standards and the two should be synced closely together.</li><li>Use industry standards like <a href="https://owasp.org/www-project-top-ten/">OWASP for security</a> or <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">WCAG</a> for accessibility as a baseline. This helps reduce subjective debate by relying on well-known benchmarks. For example, it’s hard to argue that secure logging isn’t important when it shows up prominently in the <a href="https://owasp.org/www-project-top-ten/">OWASP table</a>.</li></ul></li><li><strong>Collaborate across teams and functions:</strong><ul><li>Involve engineers, product managers, and operations teams in defining the standards.</li><li>Run workshops where everyone can voice their opinions, then converge on a decision.</li></ul></li><li><strong>Appoint a champion:</strong><ol><li>Usually this is a member of engineering leadership who is responsible for driving this process and the eventual rollout.</li><li>Occasionally this can be a group though, like <a href="https://github.com/jakubnabrdalik/architecture-guild">an Architecture guild</a>.</li></ol></li><li><strong>Gradually implement:</strong><ul><li>Start small, either:
<ul><li>A minimal set of standards that you build on as the team gets more comfortable. For example, you can start by enforcing code formatting standards and then gradually add performance or security checks.</li><li>A full set of standards with only a few initial checks so that the team can get comfortable with the whole suite of standards</li></ul></li><li>These strategies allow teams to give feedback and for the standards to evolve before they are fully enforced</li></ul></li><li><strong>Regularly review and update:</strong><ul><li>An engineering standards document should never stand still.</li><li>Once the standards are set, encode them into your systems.</li><li>Hold regular reviews (e.g., fortnightly or monthly) to review the progress of each team</li><li>Create a regular cadence of review for the standards themselves, based on team feedback and new technology trends, and recommit. Once a year is often enough here.</li><li>As part of these reviews, use data to demonstrate the value of each standard (e.g., reduced production errors, improved system uptime).</li></ul></li></ol><h3>Common Standards</h3><p>To make this concept concrete, let’s consider some common standards teams might define (nb: this is by definition a non-exhaustive list):</p><ul><li><strong>Logging and Monitoring:</strong> Log levels, message formats, error tracking, use of a centralized tool, use of alerts.</li><li><strong>Security:</strong> Authentication, authorization, encryption, secure coding practices, and dependency vulnerability management.</li><li><strong>Performance:</strong> Response times, load management, and scalability.</li><li><strong>Reliability:</strong> Redundancy strategies, failure handling, backup and recovery.</li><li><strong>Code Quality:</strong> Style guides, review processes, formatting rules, and readability standards.</li><li><strong>Documentation:</strong> API documentation, code comments, and README files.</li><li><strong>Testing:</strong> Code coverage, test 
automation, and test environment standards.</li><li><strong>Version Control:</strong> Branching strategies, commit message guidelines, and pull request processes.</li><li><strong>Deployment:</strong> Continuous Integration/Continuous Deployment (CI/CD) pipelines, rollback procedures, and environment configurations.</li><li><strong>Accessibility</strong>: <a href="https://www.w3.org/TR/WCAG22/">WCAG 2.2</a> guidelines, color contrast, text-to-speech, keyboard navigation</li></ul><p>You don't want to over-elaborate at this point. It is important to end up with ~8-10 different areas to focus on.</p><h3>Nice vocabulary to use when setting standards</h3><ul><li><code>Must</code>. Used to define mandatory items, e.g. <code>A service must use a logger</code></li><li><code>Should</code>. Used to define items which are reasonably expected to exist. If a service chooses not to adopt this standard, the expectation is that its owners justify why not.</li><li><code>May / Could / Will</code>. Used to define items which are more aspirational or for services that are considered mature.</li></ul><h2>An example: AcmeCorp.com</h2><p>Let’s imagine <strong><em>AcmeCorp.com</em></strong> are a well-known platform selling books, clothes, food, laptops, paddling pools and power tools around the world. They’re an anything store, if you will.</p><p>Availability and reliability are key to their business, so they spend a lot of time thinking about how to measure and improve that for the software they build.</p><p>Previously, teams would simply assert that their service was reliable, performant, secure, etc., but aside from anecdotally looking at the past weeks/months/years to validate that assertion, it was hard to prove or disprove. 
It was also hard to compare across services.</p><p>To help with that, a cross-functional group at <strong>AcmeCorp.com</strong> have agreed a series of standards that they believe will ensure their services stand up to considerable load during peak periods and that, if outages or incidents do happen, they’ll recover quickly.</p><p>| Area     | Standard     |
| ---------- | ---------- |
| Monitoring       | Health checks for critical components must be defined and an ideal state determined.       |
|        | Service state must be constantly observed and recorded and dashboards should be created to show this data.       |
|        | Monitoring should have metrics that describe how effective a service is. These metrics are available and easily viewable on a dashboard.       |
|        | Events should be exported and/or sampled and collected in addition to other metrics       |
| Availability       | Service availability must be determined programmatically.       |
|        | Expected and unexpected behavior for a given service must be defined in tests and alerts.       |
|        | Basic SLIs should be defined and used to calculate SLO targets. This should include recording the number of good events / total number of events.       |
|        | SLOs should be actively measured, calculated, and displayed in a dashboard       |
|        | Error budgets may be established, along with a policy outlining what to do when a service runs out of budget.       |
|        | SLOs (and error budget policy where appropriate) should be documented in a prominent location where teams and stakeholders can easily review.       |
| Logging       | Logs must show startup, shutdown, and errors.       |
|        | Logs must have a rotation and retention policy defined.       |
|        | Logs from all hosts must be sent to a centralized location.       |
|        | Logging pipeline must be resilient to transient failures and should be fully recoverable when ingestion returns to a healthy state.       |
| Alerting       | Basic health checks must be attached to alerts when failing.       |
|        | A dashboard must display all alerts currently firing.       |
|        | The body of any alert must contain the information that is needed to diagnose and fix the problem.       |
|        | An official on-call rotation for high-priority alerts must be configured and activated.       |
|        | High-priority alerts should be tuned such that they don't fire outside of business hours unless necessary. If resolution of an issue can wait until business hours, it should not page the on-call engineer.       |
|        | High-priority alerts should be triggered only for urgent, actionable issues that require a human's intervention.       |
| Scalability       | Operating manuals for service scaling must be up to date and consumable by newly onboarded or tangentially-familiar engineers.       |
|        | Service must handle unexpected increases in load without manual effort, up to a known threshold.       |
|        | Unexpected increases and decreases in load must be handled automatically.       |
|        | Unexpected increases in load above a known threshold may be handled automatically.       |
|        | Owners of a service may run regular scaling exercises to test scaling assumptions.       |
|        | Service may be able to deprioritize features and load when needed.       |
|  Resiliency and Recovery      | Run books must exist that outline the steps for recovering from loss of capacity.       |
|        | Owners should have conducted testing on outages to validate recovery run books and quantify performance degradation.       |
|        | Owners should demonstrate manual recovery is possible with minimal performance degradation (within established threshold)       |
|        | Owners may demonstrate automatic recovery is possible with minimal performance degradation (within established threshold)       |</p><h2>Breaking Standards Down Into Scorecards and Checks</h2><p>Once the standards are defined, they need to be actionable, measurable, and concisely grouped so that teams can understand them. This is where checks and scorecards come into play.</p><h3>Scorecards</h3><p>Scorecards allow teams to measure in only a few data points how well they are adhering to their engineering standards across a project or organization.</p><p>Scorecards should flow naturally from your standards and be fairly simple to define. Name them things that align with those standards and are comprehensible as a bucket of actionable checks against those standards.</p><p>For example:</p><ul><li><code>Security</code> is a good, simple, easy to understand name. If you wanted to create levels for your scorecards to have some sense of progression, you could say <code>Security - Level 1</code></li><li><code>Secure Coding Standards</code> might be a good option if you wanted to go to a more granular level with your scorecards.</li></ul><p>Try to end up with ~10 scorecards.</p><h3>Checks</h3><p>For each scorecard/standard, you need to break it down into one or more specific checks. A "check" is a verifiable condition that can be automated or manually enforced.</p><p>Just like Scorecards, Checks should have names that are comprehensible and, crucially, actionable.</p><p>For example:</p><ul><li><code>Node version should be >18</code> is a clear true/false statement about what the expectation is for a given service that uses Node.js. 
It is also clear what needs to be done in order to pass that check.</li><li>Similarly <code>CODEOWNERS should be enabled</code> draws an even more direct line to what needs to be done for a given service to pass a check.</li></ul><p>At Roadie we use our Tech Insights plugin to build these checks - the backend for which we also <a href="https://roadie.io/backstage/plugins/tech-insights/" title="Tech Insights OSS">open sourced</a>.</p><p>Whether you’re using Roadie, OSS or hand-rolling these checks, it’s important to have an idea of what a computationally enforceable check would look like for each of your standards.</p><h3>Automating and visualising standards</h3><p>Last but not least, you need some way to repeatedly check standards are being adhered to.</p><p>This usually comes in two forms:</p><ul><li>You can programmatically check to see whether documentation exists. This is often in the form of a runbook at a given path, e.g. <code>/docs/runbooks/recovery</code></li><li>A third party tool is used to capture data that can then be interrogated programmatically. For example, this can be something as simple as an SLO existing in Datadog for a given service.</li></ul><p>Both data sources can be used to confirm at scale whether or not a given service is correctly adhering to a given standard.</p><p>Teams no longer have to assert compliance; they simply need to ensure that the evidence that proves they comply is surfaced.</p><p>Many scorecarding solutions do exactly this for you, without the need for teams to individually wire up different systems. For example, Roadie customers use the <a href="https://roadie.io/docs/tech-insights/introduction/">Tech Insights plugin</a> to provide standardised, automated checks across their entire software catalog with minimal or no intervention from individual teams.</p><h3>Returning to AcmeCorp</h3><p>Using the example of AcmeCorp.com again, let’s take one of their areas and turn it into a Scorecard with a series of checks. 
They use <a href="https://www.datadoghq.com/">Datadog</a> for their dashboards and <a href="https://sentry.io/welcome/">Sentry</a> for their logging so they can both provide sources of truth for their checks.</p><p>| Scorecard     | Underlying Standard     | Checks     | Data source     |
| ---------- | ---------- | ---------- | ---------- |
| Monitoring       | Health checks for critical components must be defined and an ideal state determined.       | Service has >1 health checks defined       | Repo file that contains healthcheck test results       |
|        |    Health checks for critical components must be defined and an ideal state determined.    | Healthcheck test results return current status codes       | Repo file that contains healthcheck test results       |
|        | Service state must be constantly observed and recorded and dashboards should be created to show this data.       | Service has a Datadog dashboard to record service health       | Datadog       |
|        | Monitoring should have metrics that describe how effective a service is. These metrics are available and easily viewable on a dashboard.       | >1 service metric is defined in Datadog       | Datadog       |
|        |    Monitoring should have metrics that describe how effective a service is. These metrics are available and easily viewable on a dashboard.    | Datadog metric monitor is configured       | Datadog       |
|        | Events should be exported and/or sampled and collected in addition to other metrics       | >1 event has been sent to Sentry in the last day       | Sentry       |</p><h3>Making a fix easy to implement</h3><p>The final stage of implementing engineering standards is to make adherence as simple and easy as possible. If a check or scorecard failure is hard to fix, then teams will take longer to resolve it, they will resolve fewer failures overall, and standards will ultimately suffer.</p><h4>Ask yourself:</h4><ul><li>How many steps does it take to bring a service into compliance for a given standard? How can that be reduced?</li><li>How can common or shared systems be leveraged to help multiple teams get into compliance? Using the AcmeCorp example above, a <a href="https://app.datadoghq.com/dashboard/lists">templated Dashboard for Datadog</a> could help all teams skip design and production steps when setting up a Monitoring dashboard.</li><li>How can the overall cycle time from error to resolution be reduced? Can common fixes be added to shared repositories or How To guides be written to help teams?</li></ul><h4>Where possible, implement quick fix options</h4><p>At Roadie we use the Backstage scaffolder to automate many of the fixes for our scorecards. To take a simple example, one of our engineering standards is that branch protection must be enabled on any repository we create. If a service is linked to a repository without branch protection, it fails that check. To resolve it, we have a 3-second Scaffolder template that can modify the GitHub settings associated with the repository. The only thing the team needs to do is look at the check and click a button.</p><h2>Conclusion</h2><p>Defining engineering standards is critical for ensuring that software systems are built and maintained with quality, security, and scalability in mind. 
Breaking these standards down into checks and scorecards allows teams to monitor compliance and ensure continuous improvement. By following a collaborative approach to defining standards and making them measurable, teams can streamline their development processes and produce more reliable software.</p><p>Whatever you are focusing on, the key to success lies in making standards actionable, measurable, and adaptable.</p>
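<p>One way to keep the chain from standard to scorecard to check explicit is to write it down as data before wiring it into any tool. The sketch below restates part of the AcmeCorp Monitoring scorecard from earlier in this post as YAML. The format is hypothetical and tool-agnostic: field names such as <code>level</code> and <code>dataSource</code> are assumptions made for illustration and do not correspond to the schema of Tech Insights or any other product.</p><pre><code class="language-yaml"># Hypothetical, tool-agnostic sketch. The field names are illustrative
# assumptions, not the schema of any real scorecard product.
scorecard: Monitoring
checks:
  - name: "Service has >1 health checks defined"
    standard: "Health checks for critical components must be defined and an ideal state determined."
    level: must        # taken from the wording of the standard
    dataSource: "Repo file that contains healthcheck test results"
  - name: "Service has a Datadog dashboard to record service health"
    standard: "Service state must be constantly observed and recorded and dashboards should be created to show this data."
    level: must
    dataSource: Datadog
  - name: ">1 service metric is defined in Datadog"
    standard: "Monitoring should have metrics that describe how effective a service is."
    level: should
    dataSource: Datadog
  - name: ">1 event has been sent to Sentry in the last day"
    standard: "Events should be exported and/or sampled and collected in addition to other metrics"
    level: should
    dataSource: Sentry
</code></pre><p>Writing checks down this way makes it easier to spot a standard with no check behind it, or a check whose data source does not exist yet.</p>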
]]></content:encoded></item><item><title><![CDATA[Easier Relationship Mapping in the Backstage Catalog]]></title><link>https://roadie.io/blog/easier-relationship-mapping-in-the-backstage-catalog/</link><guid isPermaLink="false">https://roadie.io/blog/easier-relationship-mapping-in-the-backstage-catalog/</guid><pubDate>Wed, 11 Sep 2024 22:00:00 GMT</pubDate><description><![CDATA[How to make adding relationships between entities easier for everyone in the Backstage catalog. ]]></description><content:encoded><![CDATA[<p>In the Backstage software catalog, relationships provide a glue between the different software, systems, resources and people in your organisation. They can allow you to easily answer questions which otherwise might be much harder to find out in a large or rapidly growing organisation.</p><p>Plugins like the Catalog Graph Plugin allow powerful visualisation of these relationship graphs.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6jMaB3WSx6srUkSNTJxY0o/537bbde8c602bf4535f77d51eede5034/view_full_graph.webp" alt="Catalog Graph Plugin displaying relationships"></p><p>Similarly, relationships can become a core piece of information displayed on entity Overview pages.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1PdCPYszEjsGOK9JK0xrWM/f7b2376530652c2b5e7d9400b98b0bff/Screenshot_2024-09-06_at_15.52.52.png" alt="Relationships list"></p><p>Relationships can be added in one of two ways - via a YAML file update in an SCM repository, or in a third party source like GitHub Teams or Okta. A YAML update looks something like this:</p><pre><code class="language-diff">kind: Component
metadata:
  name: identity-backend
spec:
  ...
+  dependsOn:
+    - component:default/users-backend
+    - resource:aws/identity-backend-ec2
</code></pre><h3>Restrictive Relationships</h3><p>There is a very <a href="https://backstage.io/docs/features/software-catalog/descriptor-format/#kind-component">limited and restrictive set of allowed relationships</a> between different Kinds in Backstage by default. <strong>Knowing what is allowed</strong> requires looking up the Backstage documentation each time to <a href="https://backstage.io/docs/features/software-catalog/descriptor-format">check the schemas</a>.</p><p>For instance, a Domain cannot “depend on” anything in the default Backstage relationship model. However, maybe you have a domain like Shipping that depends on another like Identity. The Identity domain exposes APIs for customer addresses that the Shipping domain cannot function without. You might even want to directly model the hard dependency on specific external APIs inside the Identity domain so that it’s easy to identify issues and notify the right people in charge of the Shipping domain when something breaks in those APIs.</p><p>Confusingly, many of the default relationship terms that can be used in the YAML <code>spec</code> definition do not map directly to the actual relationship in the catalog. For instance, if you want to add a <code>partOf</code> relationship for a Domain to another Domain, you would need to use <code>subdomainOf</code>. Adding it as an explicit <code>partOf</code> field would not work unless you add that in the extended processor as we’ll see shortly. Each Kind has its own differing semantics for each relationship type, which need to be referenced and checked to ensure a relationship is correctly added.</p><h3>Expanding available relationships</h3><p>Luckily it’s an easy engineering task to add a processor that can handle more expansive relationships and even new terms for those relationships such as <code>manages/managedBy</code> for Users. 
This processor would run in addition to the <code>BuiltinKindsEntityProcessor</code> that does the default relationship processing for the catalog and would look something like this:</p><pre><code class="language-tsx">
import { CatalogProcessor, CatalogProcessorEmit, processingResult } from '@backstage/plugin-catalog-node';
import { Entity, getCompoundEntityRef, parseEntityRef, RELATION_DEPENDENCY_OF, RELATION_DEPENDS_ON } from '@backstage/catalog-model';
import { LocationSpec } from '@backstage/plugin-catalog-common';
// Custom relation type constants, defined elsewhere in this codebase (not shown here).
import { RELATIONSHIP_MANAGED_BY, RELATIONSHIP_MANAGES } from '../constants';

export class ExtendedRelationshipsProcessor implements CatalogProcessor {
  getProcessorName(): string {
    return 'ExtendedRelationshipsProcessor';
  }

  postProcessEntity(
    entity: Entity,
    _location: LocationSpec,
    emit: CatalogProcessorEmit,
  ): Promise&#x3C;Entity> {
    const selfRef = getCompoundEntityRef(entity);
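    // Emit each relation in both directions: source -> target as
    // outgoingRelation, and target -> source as incomingRelation.
    function doEmit(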
      targets: string | string[] | undefined | null,
      context: { defaultKind?: string; defaultNamespace: string },
      outgoingRelation: string,
      incomingRelation: string,
    ): void {
      if (!targets) {
        return;
      }
      for (const target of [targets].flat()) {
        const targetRef = parseEntityRef(target, context);
        emit(
          processingResult.relation({
            source: selfRef,
            type: outgoingRelation,
            target: {
              kind: targetRef.kind,
              namespace: targetRef.namespace,
              name: targetRef.name,
            },
          }),
        );
        emit(
          processingResult.relation({
            source: {
              kind: targetRef.kind,
              namespace: targetRef.namespace,
              name: targetRef.name,
            },
            type: incomingRelation,
            target: selfRef,
          }),
        );
      }
    }

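    // Domains: accept dependsOn / dependencyOf in the spec. Unqualified
    // dependsOn refs default to kind System.
    if (entity.kind === 'Domain') {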
      doEmit(
        entity.spec?.dependsOn as string[],
        { defaultKind: 'System', defaultNamespace: selfRef.namespace },
        RELATION_DEPENDS_ON,
        RELATION_DEPENDENCY_OF,
      );
      doEmit(
        entity.spec?.dependencyOf as string[],
        { defaultNamespace: selfRef.namespace },
        RELATION_DEPENDENCY_OF,
        RELATION_DEPENDS_ON,
      );
    }

    // Users: custom manages / managedBy relations, with refs defaulting to kind User.
    if (entity.kind === 'User') {
      doEmit(
        entity.spec?.managedBy as string,
        { defaultKind: 'User', defaultNamespace: selfRef.namespace },
        RELATIONSHIP_MANAGED_BY,
        RELATIONSHIP_MANAGES,
      );
      doEmit(
        entity.spec?.manages as string[],
        { defaultKind: 'User', defaultNamespace: selfRef.namespace },
        RELATIONSHIP_MANAGES,
        RELATIONSHIP_MANAGED_BY,
      );
    }

    // Groups: a group can name the User that manages it.
    if (entity.kind === 'Group') {
      doEmit(
        entity.spec?.managedBy as string,
        { defaultKind: 'User', defaultNamespace: selfRef.namespace },
        RELATIONSHIP_MANAGED_BY,
        RELATIONSHIP_MANAGES,
      );
    }

    return Promise.resolve(entity);
  }
}
</code></pre><h3>Questions to consider</h3><p>There is a rationale that restricting the available relationships for each kind in the catalog prevents misuse of relationships by users adding things to the catalog. However, it is worth bearing in mind that relationships in the catalog are mostly just syntactic sugar over the same link between two entities. There is nothing different about <code>dependsOn</code> and <code>partOf</code> in Backstage except the perceived meaning of these terms.</p><h3>Educating users</h3><p>Easily accessible documentation on standards and definitions is the best way to educate users contributing to your Backstage catalog. Backstage has its own <a href="https://backstage.io/docs/features/software-catalog/descriptor-format/">schema docs</a> and <a href="https://backstage.io/docs/features/software-catalog/well-known-relations">descriptions of existing relationships</a>, but you can also host internal documentation in a <a href="/docs/catalog/showing-dependencies/#available-input-relationships">more compact format</a> that is easier to reference quickly.</p><h2>What we did to fix it</h2><p>At Roadie, we’ve added expanded relationships for all kinds in the catalog to make it easier for people to correctly add relationships using the terms they understand.
We’ve <a href="/docs/catalog/showing-dependencies/#available-input-relationships">documented them in a compact format</a> and are working on a UI-based editor for these relationships so that users do not have to manually edit each YAML file and go through a PR process to build up a comprehensive mapping of dependencies in an organization.</p><p>We've also added a new card that shows all relationships for an entity in list format, regardless of the kind of relationship.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1PdCPYszEjsGOK9JK0xrWM/f7b2376530652c2b5e7d9400b98b0bff/Screenshot_2024-09-06_at_15.52.52.png" alt="Relationships list"></p><p>Contact us via <a href="https://discord.gg/DJNK9Csp">Discord</a> or on this website's chat to find out more about what Roadie can help with or to talk to our engineers about this topic or other issues you might be having adopting Backstage.</p>
]]></content:encoded></item><item><title><![CDATA[Improving Backstage performance (by up to 48x)]]></title><link>https://roadie.io/blog/improving-backstage-performance/</link><guid isPermaLink="false">https://roadie.io/blog/improving-backstage-performance/</guid><pubDate>Wed, 11 Sep 2024 10:05:00 GMT</pubDate><description><![CDATA[Unlock the full potential of Backstage with our guide to optimizing its Catalog performance. Whether you're building a custom internal developer portal or scaling your Backstage deployment, this article provides the crucial insights you need to avoid common performance pitfalls.]]></description><content:encoded><![CDATA[<p>Backstage is an excellent framework for building an internal developer portal. It provides all of the fundamental building blocks to improve developer experience in an organization.</p><p>Core to Backstage is its Catalog of entities. The Catalog provides a database of software components, resources, libraries, and other kinds of software items. It provides client code and an API backend to retrieve items in the catalog, along with the software interfaces required to populate the entity catalog. Its model is flexible, customizable, and powerful.</p><p>However, with great power comes great responsibility. Without experience, it's easy to develop anti-patterns in Backstage catalog usage. These anti-patterns can then turn into major performance issues at scale. This in turn affects trust in, and usage of, the product as a whole.</p><p>At Roadie, we provide an out-of-the-box version of Backstage for our customers. We have come across many of the ways in which non-optimal Catalog client usage can affect performance of the application as a whole. We have seen these performance issues result in lagging page loads and, in extreme cases, cause page loads to fail in Backstage.</p><p>By applying the patterns explained in this post, you could see a huge improvement in Catalog response time.
In some cases, you may even see Catalog queries perform 48x faster!</p><h1>Architecture of the Entity catalog</h1><p>The entity catalog in Backstage is made up of three components. A Catalog client, the Catalog backend, and the Catalog database. When Backstage starts up for the first time, it will have a Catalog database and a catalog backend. When you visit the Backstage application in your browser, it will make use of the Catalog client to retrieve data from the Catalog backend. The Catalog backend in turn retrieves the requested catalog items from the Catalog database.</p><h1>Using the Backstage Catalog Client</h1><p>Soon after deploying Backstage in an organization, users will want to customize it.</p><p>Frequent customizations we come across include loading entities into the Catalog from an in-house platform or visualizing data from an internal system in the Backstage UI. Customization is normal and is a sign that Backstage is adding value for teams.</p><p>When developers write extensions to Backstage, it's likely they will come across the need to interact with the Catalog. There are two ways they can do this:</p><ol><li>via a Frontend Backstage extension</li><li>via a Backend Backstage extension</li></ol><p>To make use of the Catalog client in a frontend Backstage extension, you are likely to be using the <code>useApi</code> hook, along with a <code>useAsync</code> function.</p><pre><code class="language-tsx">import { catalogApiRef } from '@backstage/plugin-catalog-react';
import { useApi } from '@backstage/core-plugin-api';
import { stringifyEntityRef } from '@backstage/catalog-model';
import useAsync from 'react-use/lib/useAsync';

export const CustomReactComponent = () => {
  const catalogApi = useApi(catalogApiRef);
  const {
    value: entities,
  } = useAsync(async () => {
    const response = await catalogApi.getEntities();
    return response.items || [];
  }, [catalogApi]);

  // `entities` is undefined until the request resolves, so fall back to an empty list
  return (&#x3C;>{(entities ?? []).map(stringifyEntityRef).join('\n')}&#x3C;/>);
}
</code></pre><p>In a backend Backstage extension, you are likely to be constructing the Catalog client using the discovery client. The discovery client is a helper that allows plugins to discover the API location of other clients.</p><p>Generally, if you are writing a backend plugin, like a new REST API or a Catalog processor, you will have access to the discovery client. Depending on your particular situation, you may have access to the discovery client in a different way.</p><pre><code class="language-tsx">import { CatalogClient } from '@backstage/catalog-client';
import { DiscoveryApi } from '@backstage/core-plugin-api';

export const getAllEntities = async (discovery: DiscoveryApi) => { 
  const catalogApi = new CatalogClient({
    discoveryApi: discovery,
  });

  const response = await catalogApi.getEntities();
  return response.items;
}
</code></pre><p>You will notice that once you have an instance of the CatalogApi, it is used in the same way in either the frontend or a backend extension.</p><pre><code class="language-tsx">(await catalogClient.getEntities()).items;
</code></pre><p>Check <a href="https://github.com/backstage/backstage/blob/master/packages/catalog-client/src/types/api.ts">the Backstage docs</a> for a more comprehensive explanation of the full Catalog interface.</p><h1>How big can a Backstage Catalog get?</h1><p>When thinking about Catalog size, it’s useful to think about two things:</p><ul><li>How big is each individual entity?</li><li>How many entities do you have?</li></ul><h3>A typical Entity</h3><p>A typical Backstage Entity looks like this:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: artist-web
  description: The place to be, for great artists
  labels:
    example.com/custom: custom_label_value
  annotations:
    example.com/service-discovery: artistweb
    circleci.com/project-slug: github/example-org/artist-website
  tags:
    - java
  links:
    - url: https://admin.example-org.com
      title: Admin Dashboard
      icon: dashboard
      type: admin-dashboard
spec:
  type: website
  lifecycle: production
  owner: artist-relations-team
  system: public-websites
</code></pre><p>It describes a website called the <code>artist-web</code>. It has a few basic Backstage entity properties, and some annotations, tags, and links. It's encoded in YAML here, but in Backstage's database, it is stored as plain text in JSON format.</p><p>Uncompressed, this entity definition is about half a kilobyte. Therefore, a Catalog containing about 20,000 similarly sized entities would add up to about 10 megabytes of data uncompressed. That's a pretty big chunk of data to be sending over the wire.</p><p>However, we haven't seen anything yet…</p><p>The Backstage Catalog model defines an API Kind. These are used to document the endpoints that services make available in Backstage. API entities often contain an embedded OpenAPI doc.</p><p>For example:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: artist-api
  description: Retrieve artist details
spec:
  type: openapi
  lifecycle: production
  owner: artist-relations-team
  system: artist-engagement-portal
  # The embedded OpenAPI spec is in the definition
  definition: |
    openapi: "3.0.0"
    info:
      version: 1.0.0
      title: Artist API
      license:
        name: MIT
    servers:
      - url: http://artist.spotify.net/v1
    paths:
      /artists:
        get:
          summary: List all artists
    ...
</code></pre><p>At Roadie, we have seen multiple customers with API kind entities in their Catalog with embedded OpenAPI docs as large as 1 megabyte in size. It's easy for even a small-sized engineering organization to have 50 such APIs documented in Backstage. Unoptimized, that's 50 MB+ of data that's being transferred every time we query the full Catalog.</p><p>This Catalog size is important because poor use of the Catalog APIs can result in huge database queries and API responses, causing both unnecessary network traffic and wasted time transferring, encoding, and decoding that data.</p><h1>How to Make Good Use of the Entity Catalog</h1><p>With all this said, we wanted to run through some good practices that are going to help with improving the Backstage experience. The examples we use in the tables below are measured in the browser on a production Catalog that has 14k entities.</p><p>Unoptimized, we're looking at 2.16 seconds and 59.5 MB of data. That's our starting point. Each experiment below improves on those numbers.</p><h2>Only query the entity fields that you need</h2><p>By default, when retrieving entities from the Backstage Catalog, Backstage will return the whole entity for each item listed. As mentioned above, an entity might be as large as 1 megabyte. As such, limiting the fields requested to the ones that are strictly required can help a lot. For example, you might have code like the following that requests every entity in the Catalog and then converts the result into an array of entity references.</p><pre><code class="language-tsx">(await catalogClient.getEntities()).items.map(stringifyEntityRef);
</code></pre><p>If you look under the hood, you'll find that the <code>stringifyEntityRef</code> function only makes use of the kind, name, and namespace. As such, we can cut down on the amount of data transferred across the network by limiting the fields that are requested.</p><pre><code class="language-tsx">(await catalogClient.getEntities({
  fields: ['kind', 'metadata.name', 'metadata.namespace']
})).items.map(stringifyEntityRef);
</code></pre><p>| <strong>Operation</strong> | <strong>Time</strong> | <strong>Size</strong> |
| --- | --- | --- |
| <code>catalogClient.getEntities()</code> | 2.16 seconds | 59.5 MB |
| <code>catalogClient.getEntities({ fields: ['kind', 'metadata.name', 'metadata.namespace'] })</code> | 0.767 seconds | 1.2 MB |</p><h2>Make use of the filter option to retrieve only entities that you need</h2><p>We have seen a pattern develop whereby the Catalog client is used to retrieve all of the entities in the Catalog, and then the list is filtered client side.</p><pre><code class="language-tsx">(await catalogClient.getEntities()).items.filter((entity) => entity.kind === 'Group');
</code></pre><p>It is more efficient to send a filter to the catalog client so that filtering is done either in the Backstage backend or the Backstage database.</p><pre><code class="language-tsx">(await catalogClient.getEntities({ filter: { kind: ['Group'] } })).items
</code></pre><p>| <strong>Operation</strong> | <strong>Time</strong> | <strong>Size</strong> |
| --- | --- | --- |
| <code>catalogClient.getEntities()</code> | 2.16 seconds | 59.5 MB |
| <code>await catalogClient.getEntities({ filter: { kind: ['Component'] } })</code> | 69 milliseconds | 2.5 MB |</p><h2>Avoid retrieving all entities in order to count entities</h2><p>A common pattern we see in Backstage is for developers to download the whole contents of the Catalog in order to count the entities in that Catalog. The following code will cause Backstage to query the database for every entity, then the client will need to decode the JSON it retrieves in order to count the number of entities.</p><pre><code class="language-tsx">(await catalogApi.getEntities({})).items.length
</code></pre><p>There is a far more performant way to do this, using the query API. The following request limits the result to 1 entity and asks for only the <code>metadata.uid</code> field of that entity. The query API always returns the total count of entities for that query. As such, it gives us what we need without downloading the whole Catalog to the client.</p><pre><code class="language-tsx">(await catalogClient.queryEntities({
  fields: ['metadata.uid'],
  limit: 1,
})).totalItems;
</code></pre><p>The change suggested here is going to save work for the Backstage database, the Backstage backend, and the Backstage frontend.</p><p>| <strong>Operation</strong> | <strong>Time</strong> | <strong>Size</strong> |
| --- | --- | --- |
| <code>catalogClient.getEntities()</code> | 2.16 seconds | 59.5 MB |
| <code>catalogClient.queryEntities({ fields: ['metadata.uid'], limit: 1 })</code> | 45 milliseconds | 0.5 KB |</p><h2>Enable Gzip Compression</h2><p>When not using the Catalog Client, we recommend using <code>gzip</code> encoding to reduce the amount of data transferred. This is crucial because responses from the Backstage APIs can be massive when you request large amounts of data directly. Enabling compression significantly decreases the data volume sent to the client. You can achieve this by including the <code>Accept-Encoding</code> header with your client requests.</p><p>| <strong>Operation</strong> | <strong>Size</strong> |
| --- | --- |
| <code>curl https://backstage-server/api/catalog/entities</code> | 59.5 MB |
| <code>curl https://backstage-server/api/catalog/entities -H 'Accept-Encoding: gzip'</code> | 6.7 MB |</p><h2>Keep Backstage up to date</h2><p>The first, and perhaps the most important thing to consider is to keep Backstage up to date. If you are using Roadie, you are already using a very recent version of Backstage. However, if you are managing Backstage yourself, you may have fallen behind. Backstage releases new versions at least once a month, and these versions often contain very valuable performance improvements to the Catalog.</p><p>For example, in version 1.6.7 of the Catalog client library, there was an optimization. Previously, the Catalog client would sort all entities before returning them to the caller. This is a nice, helpful utility until there are thousands of entities to sort. Often, it is not necessary or optimal to receive a sorted list of entities.</p><h2>Collaborate on the OSS Backstage core project</h2><p>As part of research for this document, we spoke with the core maintainers of Backstage, and there are some great ideas about how to continue to improve the performance. For example, it has been discussed that by default, the getEntities function should be replaced by an iterator object. That iterator would be used to page over the list of entities rather than retrieving the whole list.</p><p>As such, keeping up to date with Backstage releases will allow you to benefit from these performance improvements.</p><h2>Conclusion</h2><p>This article is illustrative of some of the performance gains that can be achieved, and your mileage may vary. We have not delved into the performance implications of these changes on backend memory and database query performance. However, we can say that these changes can greatly improve these items too. It's difficult to quantify; however, at Roadie, we were seeing huge memory spikes and large garbage collections occurring in Backstage when the whole Catalog is queried. 
This is possibly due to the physical sizes of the entity Catalogs and the serialization and deserialization that occurs between the client, backend, and database.</p><p>We have shown that making use of some good patterns can result in much improved load times for users. We have shown some examples where timings are reduced from multiple seconds to sub-second. We have also shown that the sizes sent across the wire can be greatly reduced from multiple megabytes to tens of kilobytes.</p><p>A well-managed and optimized internal developer portal can make your software engineers more efficient and empower them with the information they need. When load times are reduced from multiple seconds to under one second, developers enjoy a fast, responsive experience that means they’re more likely to use Backstage and find what they need.</p>
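<p>As a rough sketch of how the patterns above fit together, the helpers below build request options that combine a kind filter with a minimal field list, plus a count request that asks for a single <code>metadata.uid</code>. The names <code>entityRefRequest</code>, <code>countRequest</code>, and <code>ENTITY_REF_FIELDS</code> are our own for illustration and are not part of the Backstage API; only the <code>filter</code>, <code>fields</code>, and <code>limit</code> options come from the examples in this post.</p><pre><code class="language-tsx">// Hypothetical helpers (not part of Backstage): they only build plain request objects.

// The fields that stringifyEntityRef needs, as described earlier in this post.
const ENTITY_REF_FIELDS = ['kind', 'metadata.name', 'metadata.namespace'];

// Request options for listing entity refs of the given kinds only.
function entityRefRequest(kinds: string[]) {
  return {
    filter: { kind: kinds },
    fields: [...ENTITY_REF_FIELDS],
  };
}

// Request options for counting entities of the given kinds without downloading them.
function countRequest(kinds: string[]) {
  return {
    filter: { kind: kinds },
    fields: ['metadata.uid'],
    limit: 1,
  };
}

// Usage, assuming a catalogClient created as shown earlier in this post:
//   (await catalogClient.getEntities(entityRefRequest(['Group']))).items.map(stringifyEntityRef);
//   (await catalogClient.queryEntities(countRequest(['Group']))).totalItems;
</code></pre><p>Keeping these options in one place can make it harder for an unfiltered <code>getEntities()</code> call to slip back into a plugin.</p>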
]]></content:encoded></item><item><title><![CDATA[Update the Backstage catalog instantly without touching any YAML]]></title><link>https://roadie.io/blog/immediate-catalog-updates/</link><guid isPermaLink="false">https://roadie.io/blog/immediate-catalog-updates/</guid><pubDate>Thu, 05 Sep 2024 22:00:00 GMT</pubDate><description><![CDATA[How to make updating entities in your Backstage catalog painless and quick, and unlock large scale scripted updates via API. ]]></description><content:encoded><![CDATA[<p>Updating an entity in the Backstage catalog generally means manually updating a YAML file in a repository somewhere and getting it merged to the main branch.</p><p>For many, this manual process can feel like a series of hurdles: switching contexts, waiting for pull request reviews, and depending on other team members or systems.</p><p>This friction increases significantly when you scale across hundreds or thousands of repositories. For example, changing a group name might require chasing after numerous teams to raise and merge PRs, turning what should be a simple update into a months-long process.</p><p>Having to edit YAML makes keeping your catalog up-to-date a challenge, let alone enriching it with additional data or plugins like PagerDuty.</p><h3>How long does it take to add a PagerDuty plugin?</h3><p>Let's use the <a href="https://www.pagerduty.com/">PagerDuty</a> plugin as an example. Suppose you want to add PagerDuty monitoring to a Backstage entity representing a backend service.
All you need to do is add an annotation to the YAML file with the relevant service ID.</p><p>The quickest way to edit the YAML file might seem simple: open GitHub, click the pencil icon on the About card, and start editing.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/735n0qJci6DkFMQ9dqVQ6k/f7b1a9cbbae10426615091199f448504/Screenshot_2024-09-03_at_14.12.21.png" alt="About Card"></p><h4>Manual Process in GitHub</h4><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6cOOLbx3JM4UY0tL58bjp7/e2a087118b20dc6d37630ee9e9adccf1/Screenshot_2024-09-03_at_14.12.51.png" alt="GitHub web editor"></p><p>Once inside GitHub’s web editor, you can make the change, commit it to a new branch, and open a pull request.</p><p>Sounds straightforward, right? But here’s where it gets tricky. To add the PagerDuty plugin, you need the correct service ID, which requires logging into PagerDuty, searching for the service that matches your entity, and then copying the ID from the browser’s address bar (since it’s not easily accessible in the UI).</p><p>In GitHub’s case, we can edit this immediately, commit it to a new branch and open a pull request rather quickly all in the web editor.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6ZdSjaVPiHogqYHZOflezN/75e02a9824b64cf40a2f35b420695881/Screenshot_2024-09-03_at_14.16.12.png" alt="PagerDuty annotation"></p><p>But to find the correct Service ID we must first go to PagerDuty. Hopefully we can log in and access the available services, and then search for the correct one using the names in the Component YAML.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7IYjPTjhIp9tdsLdoMAxO8/9c3f3b0797c88bf53209b059af3e3342/Screenshot_2024-09-03_at_14.19.48.png" alt="PagerDuty service search"></p><p>Now, after locating the ID (in the address bar) and adding it to the YAML file, you still need to commit the changes, submit a PR, and <strong>wait for it to be reviewed and merged</strong>. 
Depending on the team’s review cadence, this could take days or longer.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2Rs97gaqO8IDXtZ7sZ101b/8ad2f2dd91bbe2a6d5ca63d948f1803d/Screenshot_2024-09-03_at_14.23.47.png" alt="Open PR"></p><p>Multiply this by hundreds of teams and thousands of repositories, and the process becomes incredibly slow and frustrating. Minor updates can take weeks or months to implement at scale.</p><h2>A Better Way: Entity Updates with Roadie</h2><p>At Roadie we’ve built UI based tools that allow you to set things like PagerDuty annotations with a few clicks. Instead of seeing a missing annotation card you’ll be able to select from a list of PagerDuty services and update the entity immediately, all from the same page in the UI.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4ZNEaRsdC1VrjU7FbECPyW/c680669b1593f99696cc7013bc7bcce1/Screenshot_2024-09-05_at_13.31.41.png" alt="PagerDuty Annotation Card"></p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6a6dj4OCYYg4v5PfyibnOK/3f7ccad98074b58730a1dba66b449271/Screenshot_2024-09-05_at_13.31.50.png" alt="Select from existing services"></p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/22jEc6zi7JPchT3VVvpdri/2a4e593de5f3a675bcf163a1a9522d91/Screenshot_2024-09-05_at_11.52.23.png" alt="Working PagerDuty plugin"></p><h3>Bulk updates via UI</h3><p>But what if you need to update multiple entities at once? Roadie also provides a bulk editor that allows you to make updates across many entities using a simple table format. This dramatically reduces the manual effort required to keep your catalog accurate and up-to-date.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3P4MYwP1UEJXEmMaF3td5h/7eca2f377e30b4f043aecb83c0ef5f73/Screenshot_2024-09-05_at_14.06.30.png" alt="Bulk annotation editor"></p><h2>Behind the Scenes: How It Works</h2><p>Our UI-based editors are powered by a backend feature we call Fragments. 
Fragments are partial entity data stored in a database table and merged into the existing entities in the catalog by a processor. These fragments can be updated through our UI or via API.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/DLOZ3BTheXhSHwRERRm9I/04d038ed679bdd91f0658fbdfef46642/Screenshot_2024-09-05_at_15.52.34.png" alt="Overview of Backend flow"></p><h3>Leverage external APIs to make things easier</h3><p>The custom UI we’ve seen earlier in this post also leverages external APIs to provide a dropdown of available options.</p><p>In this example we’ve used PagerDuty’s API to fetch a list of services using the API token already added to Backstage to make your PagerDuty plugin work. This call is made via the <a href="/docs/integrations/create-proxy/basic/">proxy</a>, which is set up to point to PagerDuty using your API token on the backend.</p><h3>Script bulk changes via API</h3><p>However, sometimes bulk editing of metadata in the catalog is best scripted. In this scenario, Roadie users make use of our <a href="https://roadie.io/docs/api/catalog/#operations-tag-Fragments">Fragments API</a> to make changes to entities, such as changing the ownership of a group’s entities when the group gets absorbed into another group, or updating any relationship references to a user entity when that individual leaves and someone else takes their role.</p><p>We’re not the only ones who’ve done something like this - companies like <a href="https://www.twilio.com/">Twilio</a> and others have built similar fragment functionality so that they can script updates to their catalog via API and free up platform teams to help keep the catalog data rich and accurate.</p><p>Reach out to us on <a href="https://discord.gg/DJNK9Csp">Discord</a> or our website’s chat messenger if you want to hear more about this and other improvements Roadie has made to the Backstage experience.</p>
]]></content:encoded></item><item><title><![CDATA[How to customize Backstage Kinds and Types without getting in trouble]]></title><link>https://roadie.io/blog/kinds-and-types-in-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/kinds-and-types-in-backstage/</guid><pubDate>Tue, 03 Sep 2024 22:00:00 GMT</pubDate><description><![CDATA[Exploring some of the categorisation issues organisations may come up against when trying to add new entities to the Backstage catalog, and possible solutions. ]]></description><content:encoded><![CDATA[<p>Backstage comes with an <a href="https://backstage.io/docs/features/software-catalog/system-model#core-entities">opinionated set of top level groupings</a> to use in modeling the software and assets in your organization, such as “Component”, “Resource”, “API”, “System” and “Domain”.</p><p>These are called “Kinds” and they are meant to be combined with a “Type” that signifies more detailed sub categories within these generic buckets. For instance, you could have a Component of type <code>library</code> or a Component of type <code>website</code>. Types are completely up to you to define, which can lead to its own set of problems, as we will see.</p><p>When someone wants to add something to your catalog, they will need to decide first what kind that something belongs in, and then the type. Most of the time these questions arise sitting in front of a blank YAML file in your code editor, without any helpful context to hand. This can lead to several problems, as we shall see, even for those who understand the Backstage YAML approach and schema.</p><h2>The common problems</h2><h3>Which Type?</h3><p>How does this decision get made? A conscientious user could first go to the existing catalog and try to find patterns already used for a similar thing, and then decide whether to follow that pattern or not.</p><p>However it might also be the case that the user just tries to use their own common sense to choose the kind and type. 
This often leads to problems, particularly with types, as you can end up with multiple versions of the same type, e.g. <code>website</code>, <code>Website</code>, <code>site</code>, <code>web-server</code>, <code>webServer</code>, <code>web-app</code> etc. This can even be intentional when the existing type doesn’t follow the formatting of other types so a user decides to introduce a corrected duplicate.</p><p><strong>Duplicate types can cause problems</strong> when you want to search for things in the catalog by type, as you have to select all variants, both in the UI and via API. The Backstage search might not work as expected or might miss results. The cognitive load of seeing a messy catalog also has an impact on the perceived reliability of the data, which could mean users look at it less and don’t see it as a source of truth anymore.</p><h3>Which Kind?</h3><p>Types are not the only potential problem area. <strong>Kinds may not neatly map to the top level terminology or groupings used inside your organization.</strong> Maybe you are using <a href="https://en.wikipedia.org/wiki/Value_stream">Value Streams</a>, which are groupings of Domains as an organizational concept and way of grouping teams. Backstage has no ValueStream kind, only the Domain kind. Domains are not the same as a Value Stream - in fact Value Streams usually comprise multiple Domains.
If you want to model Value Streams with the default options you would have to use a kind of Domain with a qualifying type of <code>value-stream</code>, which might feel unintuitive or even stop users adding it at all.</p><h2>The solutions</h2><p>So how can we address these problems?</p><h3><strong>Problem 1: Kinds don't map to top level concepts in your organization causing confusion</strong></h3><h4>Custom Kinds</h4><p>It is possible to create custom kinds in Backstage that better align to your organization.</p><p>Creating a new kind is a relatively simple engineering task involving registering a new processor and validation for that kind that emits any desired relationship mappings.</p><pre><code class="language-tsx">import { CatalogProcessor, CatalogProcessorEmit, processingResult } from '@backstage/plugin-catalog-node';
import { ProductEntityV1 } from './ProductKind';
import { Entity, entityKindSchemaValidator, getCompoundEntityRef, parseEntityRef, ...} from '@backstage/catalog-model';
import { LocationSpec } from '@backstage/plugin-catalog-common';
import productSchema from './Product.roadie.v1.schema.json';

// Creates a validator using the JSON AJV schema imported above. 
const validateProductEntity = (entity: Entity) => 
  entityKindSchemaValidator(productSchema)(entity) === entity;

// Processors will run against every entity in the catalog
export class ProductKindProcessor implements CatalogProcessor {

  getProcessorName(): string {
    return 'ProductKindProcessor';
  }

  postProcessEntity(
    entity: Entity,
    _location: LocationSpec,
    emit: CatalogProcessorEmit,
  ): Promise&#x3C;Entity> {
    const selfRef = getCompoundEntityRef(entity);

    // Function for triggering relationships to be processed
    function doEmit(
      targets: string | string[] | undefined,
      context: { defaultKind?: string; defaultNamespace: string },
      outgoingRelation: string,
      incomingRelation: string,
    ): void {
      if (!targets) {
        return;
      }
      for (const target of [targets].flat()) {
        const targetRef = parseEntityRef(target, context);
        emit(
          processingResult.relation({
            source: selfRef,
            type: outgoingRelation,
            target: {
              kind: targetRef.kind,
              namespace: targetRef.namespace,
              name: targetRef.name,
            },
          }),
        );
        emit(
          processingResult.relation({
            source: {
              kind: targetRef.kind,
              namespace: targetRef.namespace,
              name: targetRef.name,
            },
            type: incomingRelation,
            target: selfRef,
          }),
        );
      }
    }

    // Adding relationships
    if (entity.kind === 'Product') {
      const product = entity as ProductEntityV1;
      doEmit(
        product.spec.owner,
        { defaultKind: 'Group', defaultNamespace: selfRef.namespace },
        RELATION_OWNED_BY,
        RELATION_OWNER_OF,
      );
      doEmit(
        product.spec.system,
        { defaultKind: 'System', defaultNamespace: selfRef.namespace },
        RELATION_PART_OF,
        RELATION_HAS_PART,
      );
      ...
    }
    return Promise.resolve(entity);
  }

  // This is required so that the catalog will be able to ingest this new Kind
  validateEntityKind(entity: Entity): Promise&#x3C;boolean> {
    if (entity.kind === 'Product') {
      return Promise.resolve(validateProductEntity(entity));
    }
    return Promise.resolve(false);
  }
}
</code></pre><p>In addition you will need to add the kind to an allow list in your <code>app-config.yaml</code> file in the root of your Backstage repo like so:</p><pre><code class="language-yaml">catalog:
  rules:
    - allow:
      - Component
      - Product
      ...
</code></pre><p>However, you will want to ensure that any new Kinds introduced are <strong>stable</strong> enough concepts in your organization that they are unlikely to be changed. Deprecating a top-level Kind when you have thousands of YAML files to manually update across hundreds of teams can be a time-consuming process.</p><h3>Documentation</h3><p>You will also want to document your new kind somewhere internally, which could mean duplicating Backstage documentation on catalog schemas so that your organization has a single place to go for reference to all available entity schemas.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/21jpRCyndDCX5P3D9GOXyN/b5a7f85deb396fbc4df4eaf6f93847e7/Screenshot_2024-09-04_at_13.52.16.png" alt="Example of Kind documentation in Roadie"></p><h3>Scaffolder</h3><p>Lastly, you can create a Scaffolder template to help people bootstrap a new <code>catalog-info.yaml</code> file that uses a dropdown of available kinds so that it's easy to know what options are available. An example template for GitHub can be found <a href="https://github.com/RoadieHQ/getting-started/tree/main/scaffolder/register-new-component/">here</a>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/ZZUXAgWLFqKrExK1tHy4o/51c369d93ef58a94515ddf88b4fadb6f/Screenshot_2024-09-04_at_13.50.55.png" alt="Example of API populated select in Scaffolder"></p><h2><strong>Problem 2: Types are prone to misuse and duplication</strong></h2><p>Types are in some ways a harder problem to solve as the pain point is largely caused by a disconnect between writing YAML manually in a code editor and your Backstage application.</p><p>It's also worth noting that types have a constrained set of characters and, for instance, cannot contain separate words. 
If the format is invalid, the entity will fail to be ingested into the catalog.</p><h3>Documentation</h3><p>You can write documentation on catalog entity schemas to try and keep track of agreed types and hope that your contributors will reference it.</p><h3>Using the CI Pipeline</h3><p>However, a better approach might be a step in your CI that either forces types to match existing types, tells you whether a type matches an existing one, or just tells you what existing types there are.</p><p>You can establish existing types in the catalog easily using the <code>/api/catalog/entity-facets?facet=spec.type</code> endpoint, or you could expose a dedicated type validator endpoint in your Backstage backend that also tells you if a type is already in the catalog.</p><p>Alternatively, you could use a config-based approach or an allowlist of types. With this approach, new types must be added to the validator (and ideally documentation) via a review process.</p><p>The downside of some of these less permissive CI jobs is that they can add friction, which can lead to entities being labelled with types that are not appropriate because the contributor doesn't want to go through the process of requesting a new type even if it is a valid addition.</p><p>Alternatively, a more permissive CI job could just print out existing types for the kind(s) you are adding via the <code>/api/catalog/entity-facets?facet=spec.type</code> endpoint and leave it to the contributor to make sure they align if it is not a new one.</p><h3>Using the Scaffolder</h3><p>You can also create a <a href="https://github.com/RoadieHQ/getting-started/tree/main/scaffolder/register-new-component/">Scaffolder template to help people bootstrap a new <code>catalog-info.yaml</code> file</a> that uses a dropdown of available Types so that it's easy to know what options are available and not duplicate them. 
Specifically you can use the <code>SelectFieldFromApi</code> parameter widget in your template and point it to the <a href="https://roadie.io/docs/api/catalog/">Catalog API endpoint</a> to <a href="https://roadie.io/docs/scaffolder/writing-templates/#picker-from-external-api-source">get a list of available types</a>.</p><h2>How Roadie can help</h2><p>At Roadie, we help you overcome these kinds of challenges in a few ways. We take care of <a href="https://roadie.io/docs/catalog/modeling-entities/">adding new kinds to Backstage</a> for you as well as a wide range of relationships or new relationships for those kinds.</p><h3>CI Validator</h3><p>Our <a href="https://roadie.io/docs/catalog/validator/">catalog validator</a> can run on your CI to check that the YAML you are adding or updating is valid (i.e. the type field format will not break the ingestion).</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5RRewCbM4SHw1LFe4DDG6S/2a7d20819860a51695853e60563a6978/Screenshot_2024-09-04_at_13.19.42.png" alt="Example of the Validator running as a Github Action"></p><h3>Tech Insights Scorecards</h3><p>Our Tech Insights feature allows you to track the correctness of YAML files being added to your catalog across the organisation as well as at Group levels, and even fix issues with a link to a pre-filled Scaffolder template.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3xhAvWsogGMPnrYYg3IYMX/a7b26cf9c3f82465ddac929d72818034/Screenshot_2024-09-04_at_13.23.20.png" alt="Scorecard in Roadie&#x27;s Tech Insights"></p><h3>Scaffolder</h3><p>Lastly we make available a series of <a href="https://github.com/RoadieHQ/getting-started/tree/main/scaffolder/register-new-component/">Scaffolder actions</a> that allow non-technical users to add things to the catalog with context provided by API calls that can then populate a list of existing types for you so users won't make a mistake with duplicates.</p><p><img 
src="//images.ctfassets.net/hcqpbvoqhwhm/2lnIkAOSGU3s3EKevqMg0p/ba90dd0c2b1742c1099ea0f2600bcd3d/Screenshot_2024-09-04_at_13.25.54.png" alt="Scaffolder for creating YAML Files"></p>
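<p>The permissive CI check described under "Using the CI Pipeline" could be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not Roadie's validator: the <code>FacetsResponse</code> shape is an assumption about what the <code>/api/catalog/entity-facets?facet=spec.type</code> endpoint returns, the <code>TYPE_FORMAT</code> rule is a hypothetical stand-in for the real character constraints, and <code>checkType</code> and <code>normalise</code> are illustrative names.</p><pre><code class="language-typescript">// Assumed response shape of /api/catalog/entity-facets?facet=spec.type.
// Check it against your own Backstage instance before relying on it.
type FacetsResponse = {
  facets: { [facet: string]: { value: string; count: number }[] };
};

type TypeCheck = { ok: boolean; message: string };

// Hypothetical format rule for illustration only: a single word, no whitespace.
const TYPE_FORMAT = /^\S+$/;

// Ignore case and separators so "Data_Pipeline" and "data-pipeline" compare equal.
function normalise(value: string): string {
  return value.toLowerCase().replace(/[-_.]/g, '');
}

function checkType(proposed: string, response: FacetsResponse): TypeCheck {
  const known = (response.facets['spec.type'] || []).map(f => f.value);

  if (!TYPE_FORMAT.test(proposed)) {
    return { ok: false, message: `"${proposed}" is not a valid type format` };
  }
  if (known.indexOf(proposed) !== -1) {
    return { ok: true, message: `"${proposed}" matches an existing type` };
  }
  // Flag near-duplicates that differ only by case or separators.
  const similar = known.filter(k => normalise(k) === normalise(proposed));
  if (similar.length > 0) {
    return {
      ok: false,
      message: `"${proposed}" looks like a duplicate of: ${similar.join(', ')}`,
    };
  }
  // Permissive path: allow the new type, but show what already exists.
  return {
    ok: true,
    message: `"${proposed}" is new. Existing types: ${known.join(', ')}`,
  };
}
</code></pre><p>A CI step could fetch the facets once, run <code>checkType</code> for each changed <code>catalog-info.yaml</code>, and either fail the build or just print the message, depending on how much friction you are willing to add.</p>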
]]></content:encoded></item><item><title><![CDATA[Scaling Backstage]]></title><link>https://roadie.io/blog/scaling-backstage-intro/</link><guid isPermaLink="false">https://roadie.io/blog/scaling-backstage-intro/</guid><pubDate>Fri, 30 Aug 2024 13:00:00 GMT</pubDate><description><![CDATA[The Backstage developer portal is an excellent tool for platform teams, as well as engineers, to keep a handle on their software, maintain compliance statuses and spin up new services. Unfortunately the open source Backstage is known for its difficult set up time and overall cumbersome maintainability. 
There are multiple challenges that arise when the volume of data in the Backstage grows to 1,000s and 10,000s of entities, ranging from performance to ease of use.]]></description><content:encoded><![CDATA[<p>There are multiple challenges that arise when the volume of data in the Backstage grows to 1,000s and 10,000s of entities, ranging from performance to ease of use. We’ll explore these in this article as well as suggesting possible ways around them for your own Backstage deployments.</p><p>The Backstage developer portal is an excellent tool for platform teams, as well as engineers, to keep a handle on their software, <a href="https://roadie.io/blog/backstage-gets-quality-and-compliance-scorecards-with-roadie/">maintain compliance statuses</a> and <a href="https://roadie.io/docs/scaffolder/writing-templates/">spin up new services</a>. Unfortunately the open source Backstage is known for its difficult set up time and overall cumbersome maintainability.</p><p>This pain is often made worse when the catalog within a Backstage instance gets large. Below are a few high-level pointers that we have come across during our journey to support bigger engineering organizations. We’ve scaled our installation to support organizations with multiple tens of thousands of entities over the last year. We’ll dig deeper into individual topics, based on interest, on further articles.</p><hr><blockquote><p>Let us know about your optimization problems or questions in either Roadie or Backstage Discord channels!</p></blockquote><h2>Handling the Catalog data</h2><p>We have written more extensively about catalog performance and how to improve that <a href="https://roadie.io/blog/improving-backstage-performance/">in a separate blog post.</a></p><p>When developing on top of Backstage, you are always building on the foundation of solid catalog data. This makes the CatalogAPI usually the most used API on both back- and frontend of the application. 
It may be that the entities in the system grow large (looking at you API specs) or that there is just a large quantity of them (looking at you <a href="https://roadie.io/docs/integrations/aws-resources/">automatically ingested AWS resources</a>). Therefore it is important to retrieve only the actual necessary fields that are displayed to the user and limit the amount of entities being fetched.</p><p>The <a href="https://github.com/backstage/backstage/blob/master/packages/catalog-client/src/CatalogClient.ts">default CatalogClient</a> has the option to retrieve only relevant <em>fields</em> through the API. Do use it. Also make sure to use the pagination if possible and hit the correct endpoints with your catalog client. Retrieving less data is always going to be cheaper than retrieving more data. It really makes a big difference whether you want to JSON parse/stringify the biggest API docs in the world multiple times, along with the rest of the catalog data or you just want to use ineffective string manipulations for sorting purposes.</p><p>For some cases we at Roadie have needed to introduce our own endpoints and catalog queries to improve the performance in larger catalogs. These endpoints could be as simple as creating pointed subset queries directly against the database table to identify only needed entities. Or possibly only returning partial entity data with preformatted response shapes. Having the ability to do use-case specific queries for relevant data, and use the better performance usually present in the database layer makes a big difference at times.</p><h2>Processors vs. Providers</h2><p>In the early days of Backstage the approach to ingest entities into the catalog was by using <em>Processors</em> to retrieve data from third party sources. This is still a remnant within the product, and is (unfortunately) still used by some integrations. 
The main purpose of processors nowadays is to enhance the entities, but the same caveats on their usage are still present.</p><p>CNCF maintainers of the project introduced <em>Providers</em> to Backstage at a later stage. These providers allow more maneuverability to schedule and modify the payloads that you are sending to the catalog. <a href="https://backstage.io/blog/2023/01/31/incremental-entity-provider/">Being able to chunk the ingested entities into smaller buckets</a>, having the ability to schedule the intervals with more (or less) granularity and having better visibility into the internals of the catalog are big benefits when tweaking the catalog ingestion to work optimally.</p><p>In many cases the problem may still remain though. The providers may need to <em>emit</em> locations or other intermediate data before it is finally stored in the system as a full entity. And in those cases the entity may need to go through the processing pipeline again.</p><p>When you encounter issues that may be related to this approach, make sure that your processors are nimble beasts and are definitely <strong>not</strong> blocking the event loop. Even a few milliseconds make a difference here. The system is both processing a lot of data and doing it multiple times. User experience may also suffer when processing times get large or some immediately expected entities are clogged up behind a large processing queue.</p><p>In Roadie we have taken an even more performant alternative approach for a few specific entities that we natively support and ingest. We want more fine-grained control to serve our larger users better and have created an alternative, self-contained processing module to do the processing for some specific use cases.</p><h2>Scaffolder</h2><p>By default the Scaffolder within a Backstage project runs in the same process as the rest of the application. This is by design, but there is an escape hatch that can be used to externalize this from the codebase. 
Backstage is built as a modular monolith and in theory has the possibility to be spread out into multiple services.</p><p>There is a fair amount of work to achieve that but the payoff is usually there. The decision to make here is to identify the tradeoffs that your company is willing to sacrifice. Is it ok that only a single scaffolder run is manageable at one time? Does it matter if the rest of the Backstage application begins to show signs of slowness when other processes are running?</p><p>The Scaffolder, and larger Tech Insights installations, take a lot of CPU cycles from the underlying hardware which may negatively interfere with the user experience. Blocking the event loop is the <em>big no-no</em> in the Node.js world when it comes to performance. If you are running heavy tasks within the same process that you are using to serve your users, you may encounter bad times. If bad times appear, consider externalizing some of the chunkier pieces of your instance. These may include the Scaffolder, Tech Insights, Search indexing, Cost Insights and the catalog processing loop.</p><p>Roadie has extracted the larger, more resource hungry processes into ephemeral standalone processes to avoid eating up the event loop cycles of the main application. In our case these are running in AWS, where we host your Roadie instances, as ECS tasks or Lambda functions, depending on the use case. With the backend system fully out for the Backstage project, it should be much easier to spin out supporting services into their own processes and leave the catalog alone to do what it does best, showing entities to the users in a performant manner.</p><h2>Perceived Frontend Performance</h2><p>Of course, performance is relevant only to the users if they are able to <em>feel</em> it. This is present in standard Backstage installations on the frontend layer of the application. Does your catalog load fast? Do you have a ton of frontend plugins installed and your bundle sizes are big? 
Do you need to rebuild your tech docs every time you navigate to the docs page?</p><p>For the frontend resources, there are multiple well-known performance tricks that can be included in the build process and hosting solutions that you are using to serve your frontend app. In the end, the Backstage frontend is a single-page application with all the known benefits and caveats. All the data displayed will need to be retrieved from somewhere before it enters the JavaScript runtime to be rendered on the screen.</p><p>In most cases getting the data we want to display means API calls. For some that is ok, like getting cheap values from fast endpoints, but for some the roundtrip to the server is not worth it. You can embed relevant information in the <code>index.html</code> that is either served from the backend (in newer Backstage installations) or pre-built during the deployment process. You can also use localStorage to your advantage. In fact, Backstage does use this for some of its data, but unfortunately not necessarily for caching purposes.</p><h2>YAMLs and scaling maintainability</h2><p>The canonical approach recommended by the CNCF open source maintainers of Backstage is to use catalog manifest files, usually called <code>catalog-info.yaml</code>, within code repositories to store entity data for the catalog. In a large number of cases this is the wrong approach. You may be able to keep domain and system entities up to date easily, since they are small in number and change rarely. For other kinds/types of entities we have seen with several of our customers that keeping those YAML files up to date in an engineering organization is difficult.</p><p>A better approach to ingest entities in many cases is to automate the ingestion of at least the initial entity information from a more robust source of truth. 
In the end, trusting humans to update a random file in their repository just for the sake of updating it seems unlikely to succeed 100% of the time.</p><p>In a modern engineering organization there are multiple good sources to use as the canonical starting point for your entity data. For users and groups you have your Oktas or Azure Entra IDs. For repositories you have your GitHub APIs. For components, APIs and resources you have your running instances, your exposed OpenAPI endpoints and your K8s or cloud provider APIs.</p><p>Ingesting the relevant data automatically from these allows you to trust that the relevant information is up to date and correctly mirrors the software that is actually being developed within your organization and what is running in your environments.</p><p>That is, in the end, the purpose of a developer portal: a mapping of software that you are providing to your customers, not a mapping of software that you have at some point written into a well-formatted text file.</p><h2>Rate limits</h2><p>There are downsides to automating your catalog ingestion as well. Backstage relies heavily on integrations with third-party APIs and this has implications for how up to date it can keep the catalog information. Being so reliant on other services and wanting to be the single pane of glass to display that information means that you need to be aware of the limitations of this system.</p><p>Backstage by default is a pull-based system which contacts third parties using API tokens or other authentication information and retrieves relevant data. When this happens through plugins, it isn’t a massive issue since the actual concurrent user count is relatively small. Even for the bigger clients we don’t usually see high 3 digit morning rushes. The just-in-time nature of retrieving data from third parties at runtime to display on the familiar frontend thus works well.</p><p>On the other hand, Backstage also stores data internally. 
This is data that it gathers automatically from third parties and uses to generate insights or enhance entities. These <em>processing loops</em> usually run on a schedule and try to slurp in as much as they can. Herein lies the problem where rate limits are introduced in the system.</p><p>Monitoring rate limits against different systems is extremely important and helps you identify when you are getting close to making the downstream service angry. Backstage offers a good set of monitoring primitives to expose metrics from your providers. You can for example <a href="https://backstage.io/docs/tutorials/setup-opentelemetry">set up OpenTelemetry</a> to gather the information you need. You can expose rate limit information either by querying it periodically in a loop or by embedding it directly in your fetch client implementations. The former approach gives you the ability to manually tweak your calling schedules to accommodate your integrations; the latter may give you the ability to automatically slow down the calling loop to satisfy the limits.</p><p>Let us know if you have encountered any other aspects or approaches that have helped you scale your Backstage instance and work effectively within your organization.</p>
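<p>As a rough illustration of the second approach from the rate limits section, where the fetch client itself slows the calling loop down, here is a minimal TypeScript sketch. It is not how Backstage or Roadie implement this. The header names are modelled on GitHub's REST API and will differ for other services, and <code>readRateLimit</code>, <code>delayBeforeNextCall</code> and the <code>lowWatermark</code> default are hypothetical names and values chosen for the example.</p><pre><code class="language-typescript">// Minimal structural type so the sketch works with a fetch Headers object
// or with a plain test double.
type HeaderReader = { get(name: string): string | null };

type RateLimit = { remaining: number; resetEpochSeconds: number };

// Header names modelled on GitHub's REST API; other services differ.
function readRateLimit(headers: HeaderReader): RateLimit | null {
  const remaining = headers.get('x-ratelimit-remaining');
  const reset = headers.get('x-ratelimit-reset');
  if (remaining === null || reset === null) {
    return null; // the service did not report its limits
  }
  return { remaining: Number(remaining), resetEpochSeconds: Number(reset) };
}

// How long the calling loop should pause before its next request.
function delayBeforeNextCall(
  remaining: number,
  resetEpochSeconds: number,
  nowMs: number,
  lowWatermark = 100, // hypothetical threshold, tune per integration
): number {
  const msUntilReset = Math.max(0, resetEpochSeconds * 1000 - nowMs);
  if (remaining > lowWatermark) {
    return 0; // plenty of budget left, no need to slow down
  }
  if (remaining > 0) {
    // Spread the remaining budget evenly across the rest of the window.
    return Math.ceil(msUntilReset / remaining);
  }
  return msUntilReset; // budget exhausted: wait for the window to reset
}
</code></pre><p>A provider could call <code>readRateLimit</code> on each response and wait for the number of milliseconds returned by <code>delayBeforeNextCall</code> before issuing its next request. The same two values are also natural candidates to export as metrics.</p>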
]]></content:encoded></item><item><title><![CDATA[Solving the Day 2 Problem with Backstage]]></title><link>https://roadie.io/blog/roadie-solving-the-day-2-problem-with-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-solving-the-day-2-problem-with-backstage/</guid><pubDate>Wed, 28 Aug 2024 13:00:00 GMT</pubDate><description><![CDATA[As organizations strive to innovate and deploy new features rapidly, they often encounter a significant hurdle known as the "Day 2 problem." This term refers to the challenges that arise after the initial deployment of software, including maintenance, scalability, and observability. Roadie, leveraging the power of Backstage, is here to address these issues head-on.]]></description><content:encoded><![CDATA[<h1>Roadie - Solving the Day 2 problem with Backstage</h1><p>In today's fast-paced tech landscape, the importance of efficient software delivery and operations cannot be overstated. As organizations strive to innovate and deploy new features rapidly, they often encounter a significant hurdle known as the "Day 2 problem." This term refers to the challenges that arise after the initial deployment of software, including maintenance, scalability, and observability. Roadie, leveraging the power of Backstage, is here to address these issues head-on.</p><h2>Understanding the Day 2 Problem</h2><p>The Day 2 problem emerges once a service or application is live. Initial deployments, often facilitated by tools like the <strong>Backstage scaffolder</strong>, can be exciting and successful, but afterwards organizations must ensure that their systems are robust, resilient, and scalable to handle ongoing demands.</p><p>Common challenges associated with the Day 2 problem include:</p><ol><li><strong>Operational Complexity</strong>: As the number of services grows, so does the complexity of managing them. 
Understanding dependencies, configurations, and performance metrics becomes increasingly challenging.</li><li><strong>Monitoring and Observability</strong>: Ensuring that systems are functioning correctly requires effective monitoring tools. Without proper observability, identifying issues and their root causes can be a daunting task.</li><li><strong>Scalability Issues</strong>: As user demand fluctuates, systems must scale efficiently. Inadequate scaling strategies can lead to outages and degraded performance.</li><li><strong>Technical Debt</strong>: Over time, quick fixes and shortcuts can accumulate, leading to a backlog of maintenance tasks that hinder progress.</li></ol><p>While we all know of the pitfalls of continued development of a service, we also often fall into them. So how can we avoid these problems and build better software over time?</p><h2>Roadie and Backstage: A Powerful Duo</h2><p>Enter <strong><a href="https://roadie.io/" title="Roadie">Roadie</a></strong>, a platform designed to enhance developer experience by simplifying the management of software infrastructure. Built on <strong><a href="https://backstage.io/" title="Backstage">Backstage</a></strong>, an open platform for building developer portals, Roadie provides a comprehensive solution to the Day 2 problem.</p><h3>Tech Insights</h3><p>At the core of Roadie's solution to the Day 2 problem is the <strong><a href="https://roadie.io/backstage/plugins/tech-insights/" title="Tech Insights">Tech Insights</a></strong> feature. This powerful scorecarding tool equips teams with the necessary capabilities to monitor and improve their systems effectively.</p><p>Here’s how it addresses the challenges:</p><ol><li><strong>Built-In and Custom <a href="https://roadie.io/docs/tech-insights/data-sources/" title="Data Sources">Data Sources</a></strong>: Roadie provides automated data capture of key systems and the ability to create your own custom data sources. 
Each of these data sources captures data to evaluate the health and performance of your services. These data sources run regularly, ensuring that any deviations from expected standards are identified quickly, allowing teams to address issues proactively.</li><li><strong><a href="https://roadie.io/docs/tech-insights/checks/" title="Checks">Checks</a></strong>: Roadie supports the creation of custom checks tailored to specific organizational needs. This flexibility allows teams to define criteria that are most relevant to their applications, enhancing monitoring and compliance.</li><li><strong><a href="https://roadie.io/docs/tech-insights/scorecards/" title="Scorecards">Scorecards</a></strong>: Roadie provides scorecard features that offer a clear visual representation of service standards across multiple checks. These scorecards make it easy for teams to assess standards at a glance, highlighting areas that require attention and improvement.</li><li><strong>Technical Documentation</strong>: When issues are detected, Roadie offers direct links to relevant technical documentation. This ensures that developers have immediate access to the information they need to resolve issues efficiently.</li><li><strong>Scaffolder Templates</strong>: In addition to documentation, Roadie integrates scaffolder templates into checks to guide teams in fixing source code or performing various infrastructure tasks. This feature streamlines the process of implementing changes, reducing the time spent on manual fixes.</li></ol><h3>Wider Benefits of Using Roadie with Backstage</h3><ol><li><strong>Improved Developer Experience</strong>: By providing clear visibility, Roadie enhances the overall developer experience. 
Teams can focus more on building features rather than troubleshooting issues.</li><li><strong>Faster Incident Response</strong>: With automated checks and scorecards, data piped into partner systems like Incident.io, and clear lines of ownership outlined in the <a href="https://roadie.io/docs/catalog/ownership/" title="Ownership in the Software Catalog">Software Catalog</a>, teams can respond to incidents faster, minimizing downtime and improving overall service reliability.</li><li><strong>Increased Productivity</strong>: By streamlining service management and automating fixes through scaffolder templates, Roadie empowers teams to be more productive, allowing them to deliver new features and enhancements more rapidly.</li><li><strong>Proactive Maintenance</strong>: The combination of checks, scorecards, and direct access to documentation helps teams adopt a proactive maintenance approach. This reduces technical debt and ensures that systems remain healthy and efficient.</li></ol><h2>Conclusion</h2><p>In a world where rapid deployment is essential, addressing the Day 2 problem is crucial for sustained success. Roadie, in conjunction with Backstage and its scaffolding capabilities, offers a powerful solution to navigate the complexities of service management post-deployment. By leveraging the Tech Insights feature, complete with checks, scorecards, and fix links, Roadie enables teams to focus on what they do best: building and delivering innovative software solutions.</p><p>As organizations continue to evolve, leveraging tools like Roadie and Backstage will be key to overcoming Day 2 challenges and ensuring a smooth journey from deployment to operation. Embracing this approach will not only improve operational efficiency but also foster a culture of continuous improvement and innovation.</p>
]]></content:encoded></item><item><title><![CDATA[Towards a more flexible software catalog]]></title><link>https://roadie.io/blog/towards-a-more-flexible-catalog/</link><guid isPermaLink="false">https://roadie.io/blog/towards-a-more-flexible-catalog/</guid><pubDate>Wed, 21 Aug 2024 11:00:00 GMT</pubDate><description><![CDATA[Summer is a productive time at Roadie. We've been working a lot to build more flexibility into the Backstage data model and UI. This month we're rolling out custom catalog columns and wider relationship support between elements of the software catalog. We've also revamped how we pull data in, starting with the AWS provider.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>🚨 A new Product kind</h2><p>A new kind has made its way into Roadie.</p><p>You can now leverage the <code>Product</code> kind in your catalog. It works in much the same way as other business-level kinds like domain and system, which seek to collate and create boundaries around other elements of your architecture.</p><p>It's a flexible Kind, designed to provide a neat wrapper or intermediary between existing concepts.</p><p>You might have:</p><ul><li>Domain > Product > System</li><li>Product > Domain > System</li><li>Domain > System > Product</li></ul><p>The relationships and structure of this are up to you. If you have any problems or find you are unexpectedly unable to articulate the model you're after, then just let us know on Slack/Teams.</p><p>Enjoy. 
https://roadie.io/docs/getting-started/model-software/</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4AqUfkUmoEpBvPLP30Lvmh/843abd0ffca6e9403236857122097a0e/Screenshot_2024-09-02_at_14.13.22.png" alt="Product Kind Demo"></p><h2>🧘‍♂️ More (and more-flexible) relationships between entities</h2><p>We have extended the number of permissible relationships between entities to cover different arrangements of entities.</p><p>We've also introduced the <code>managedBy</code> and <code>managerOf</code> relationships to provide referential integrity between entities that manage things and the entities they manage. People manage people and people manage services (distinct from <code>ownedBy</code>, which often does not accurately describe responsibilities within a system).</p><p>If you want to surface those relationships on an entity page you now can, using the <code>EntityRelationsCard</code>. This allows you to filter the displayed relations and only show the ones that are relevant for your use case.</p><p>More info here about what is now possible. https://roadie.io/docs/catalog/showing-dependencies/</p><p>If you require additional flexibility in relationships between entity kinds that is not supported, you can request this to be added via Roadie Support.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5EF6PbzATWULFoIHvFqYYF/63a2099a82ce4e94bd07acc492397edb/Screenshot_2024-08-21_at_16.28.21.png" alt="managedBy Relationships"></p><h2>🏛️ Custom columns for your Catalog (in beta)</h2><p>Ever wanted more control over the columns you have in the catalog? Well now you have it. 
We expose the setting of default columns and the creation of custom columns based on entity metadata, for each tab you create in the catalog.</p><p>We currently have this in beta, so let us know if you want some custom columns and we'll create them for you.</p><h2>🎨 Type-based entity layouts</h2><p>In the past you could set different entity page layouts for the Component kind based on types, but for all other kinds (Resources, Domains etc) it was limited to one per kind.</p><p>This doesn't really make much sense when you think about Resources for example, where an entity describing a Kubernetes cluster really needs different information from one describing an AWS RDS instance.</p><p>So we changed it. Type-based layouts are now available for all kinds. https://roadie.io/docs/getting-started/configure-ui/</p><h2>💈 New sidebar configuration options (in beta)</h2><p>We've been adding new functionality to the Sidebar for a while in a piecemeal fashion, so we decided it needed a full revamp. You can now:</p><ul><li>Add sidebar items that start with the root pages (e.g. /catalog), resolving a particularly annoying issue where you were blocked from doing so because one of those root pages already existed.</li><li>Hide the Login to GitHub option entirely for all users.</li></ul><p>More info here:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/jdrLb4UvGLjHLXb0NGrcH/27788d174ecd6d5806894200dcdcffd7/Screenshot_2024-08-20_at_14.14.20.png" alt="Sidebar Config Options"></p><h3>🔌 Plugins &#x26; Integrations roundup</h3><ul><li><strong>Incident.io plugin</strong>: we now expose the <a href="/docs/integrations/humanitec/" title="Humanitec Plugin">Humanitec plugin</a>, so you can view deployment information inside Roadie. 
Fun.</li><li><strong>AWS account creation</strong>: last but by no means least, on the topic of catalog data sources this time rather than plugins - our AWS provider can now <a href="/docs/integrations/aws-resources/" title="AWS Auto-ingestion">create AWS accounts and associated resources</a> auto-magically. We also now ingest AWS tags as part of AWS automatic resource discovery. You can use tags to build relationships between newly ingested AWS resources and other entities in the catalog. Neat.</li></ul>
]]></content:encoded></item><item><title><![CDATA[Role-based access control in Roadie]]></title><link>https://roadie.io/blog/role-based-access-in-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/role-based-access-in-roadie/</guid><pubDate>Mon, 24 Jun 2024 11:00:00 GMT</pubDate><description><![CDATA[Role-based access control and user management is now available in Roadie, with role management, user management and fine-grain control over the catalog, scaffolder and Tech Insights 🎉 as well as the upgrade to 1.26 and new plugins and integrations.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h2>🚨 Controlling your Catalog with Role-based Access Control (RBAC)</h2><p>We've been working on this for a while and it is with some fanfare that we announce <a href="https://roadie.io/product/access-control/" title="Role-based access control on Roadie">Role-based Access Control (RBAC) on Roadie</a>. 🤝✨🙌</p><p>Transparency of information is at the heart of Roadie (and one of the key philosophical principles of Backstage), but there are often valid reasons for gating access to information and maintaining the principle of least privilege.</p><p>For example, let's say you want to onboard Customer Service agents to your catalog so they can quickly find information about a service that may be experiencing some issues. 
You don't necessarily want to give Customer Service folks the ability to execute a scaffolder run or browse through Tech Insights Scorecards: that's just unnecessary and may even be confusing.</p><p>What you need in that situation is fine-grained control over your Roadie instance.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/354X9A68s0EEGMCkR0rVEy/732eadd64d7773983fb27b7905a93337/Screenshot_2024-06-21_at_11.17.05.png" alt="Role Management"></p><h3>What is RBAC in Roadie?</h3><ul><li>A new framework for access control in Roadie</li><li>Every customer will have access to this new framework</li><li>Some features are part of a paid add-on</li></ul><h3>And how does it work?</h3><ul><li>Every part of Roadie is behind a Permission</li><li>Roles are made up of sets of Permissions</li><li>Users have Roles</li><li>Admins can manage roles</li><li>Admins can manage users</li></ul><h3>🎭 Role management</h3><p>Several roles are available out of the box. They cover our existing access management setup and introduce some convenient shortcuts for new user groups that we've seen emerge recently:</p><ul><li>Admin</li><li>Tech Insights Admin</li><li>Maintainer</li><li>Viewer</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/QpWi5K7W067wULqi9K9ar/ebdd594e554cfae7b8ea148800ad2bd5/Screenshot_2024-06-21_at_11.16.43.png" alt="User Management and IdP"></p><h3>📚 Roles from a variety of sources, like your identity provider</h3><p>The new permissions system will allow you to send us roles from your identity provider as well as define them in the Roadie UI. 
This will mean the old GitHub Admin group is no longer required.</p><p>Roles from all sources appear in the Role Management UI so you can understand where a user is inheriting a role from.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7bfOOPurLzILJnlPFVAN1T/5728a62b4855ea0582d47618c54445c2/Screenshot_2024-06-21_at_11.16.52.png" alt="Roles from an IdP"></p><h3>🆔 User management</h3><p>You can then attach users to roles in the User Management UI and update the roles (and therefore permissions) that a user has access to on the fly.</p><p>As part of the switch-over to the new permissions system we run a background task to map the old roles system to the new one so there's no switching cost for existing customers.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/68wOJOShKad1siBeR2uqoJ/4446d1a3f29715ea38761a82fc18b5bd/Screenshot_2024-06-21_at_11.17.45.png" alt="Users and permissions"></p><h3>🛃 Custom permissions and roles</h3><p>The out-of-the-box roles will only get you so far, but many organisations need or want a higher degree of control over who can see what in their Catalog, who has the right permissions to trigger Scaffolder actions, and which elements of Tech Insights are or aren't displayed to certain users.</p><p>That's where custom permissions policies and roles step in.</p><p>Creating new roles and attaching fine-grained permissions are optional paid extras that cover use cases like hiding services in the catalog or controlling who can run individual scaffolder templates. 
We can set up trials or start discussions if you’re interested.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2cx3kN942BaakmUkL2KGZ9/6e5f30b24da50c924c937585393aca4e/Screenshot_2024-06-21_at_11.17.28.png" alt="Custom roles"></p><h2>What's next for RBAC?</h2><h3>Individual entity permissions</h3><p>At the moment, while we allow for a much higher degree of control over the catalog than before, we haven't drilled down to the individual entity level to say 'You need to have <code>permissions-to-view-X-component</code> to view X component'. That's next.</p><p>That will mean you can:</p><ul><li>Control access to individual components or entities</li><li>Control access to scaffolder templates and actions</li><li>And create your own extremely specific permissions to target these entities</li></ul><hr><h3>🙌 Backstage 1.26</h3><p>We've now upgraded everyone to <a href="https://backstage.io/docs/releases/v1.26.0/" title="Backstage 1.26">1.26</a>. This upgrade introduced some significant changes to the authentication system and lays the groundwork for the <a href="https://drodil.medium.com/backstage-notifications-ceedf812ceef" title="Heikki Hellgren&#x27;s Medium Post on Notifications">notification system</a> that lands fully in the next few versions.</p><hr><h3>🔌 Plugins &#x26; Integrations roundup</h3><ul><li><strong>Humanitec plugin</strong>: we now expose the <a href="https://roadie.io/docs/integrations/humanitec/" title="Humanitec Plugin">Humanitec plugin</a>, so you can view deployment information inside Roadie. Fun.</li><li><strong>Coder plugin</strong>: speaking of plugins, we also integrated the <a href="https://roadie.io/docs/integrations/coder/" title="Coder Plugin">new Coder plugin</a>. 
Tying neatly together Humanitec and Coder, Humanitec recently ran <a href="https://humanitec.com/events/coder-with-backstage-the-missing-link-in-developer-productivity-2024-06-18" title="Humanitec Coder Webinar">a webinar</a> on how Coder thinks about integrating into Backstage which is worth checking out.</li><li><strong>AWS tag-based relationships</strong>: last but not least, on the topic of catalog data sources this time rather than plugins - we now ingest AWS tags as part of AWS automatic resource discovery. You can use tags to build relationships between newly ingested AWS resources and other entities in the catalog. Neat.</li></ul>
]]></content:encoded></item><item><title><![CDATA[Customising the Roadie UI]]></title><link>https://roadie.io/blog/a-more-customisable-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/a-more-customisable-roadie/</guid><pubDate>Fri, 17 May 2024 11:00:00 GMT</pubDate><description><![CDATA[Customisations of the sidebar and the Catalog UI have arrived 🎉 as part of a whole suite of UI tweaks and tools designed to reduce the cognitive load for users as your catalog grows.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h3>🗂️ Customising the Catalog UI</h3><p>Your Catalog should match how you organise teams and build software. In the past, you've had to use Backstage nomenclature to model those concepts in the Catalog, but no more.</p><p>We've introduced <a href="https://roadie.io/docs/catalog/custom-views/" title="Custom Catalog Tabs">Catalog Customisation</a> in the Admin area so you can use your own naming conventions and concepts to define your Catalog using Kinds and Types and simplify the UI for everyone.</p><p>Add, edit, and reorder Catalog tabs to your heart's content.</p><p>Docs can be found <a href="https://roadie.io/docs/catalog/custom-views/" title="Custom Catalog Tabs">here</a> on how to get started.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6nWCFVgI1YOAEoZZ0cowOi/37391bc6e6d8dbbcbd6a0dec94e7d83b/Screenshot_2024-05-13_at_10.52.14.png" alt="Custom Catalog Tabs"></p><h3>💈 Sidebar slimming</h3><p>In the same vein, we have also opened up support for full customisation of the sidebar.</p><p>This was possible in a limited capacity in the past, but you can now modify the whole kit and caboodle. 
This applies across your instance of Roadie for all users, allowing Admins to slim down the sidebar and reduce information overhead for everyone.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4Mk8XleuQxmZ9IgF1Mxm6L/8f380b8c65cf28c2e9fdfe37cda7cc2f/Screenshot_2024-05-13_at_10.50.49.png" alt="Sidebar Customisation"></p><h3>🌅 Surfacing metadata</h3><p>We're also making customisation easier by introducing an <code>EntityMetadataCard</code>. <a href="https://roadie.io/docs/catalog/entity-metadata-card/" title="Entity Metadata Card">The new card</a> allows you to pull in any metadata you'd like from a catalog-info.yaml file and surface it on an Entity page. This allows you to pull in rich information on things like ownership or the last time an entity was updated, without the effort of writing a custom TypeScript plugin to do the same job.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/67AHwiIOdBHkxS7JW2YxO3/e6b38a7e5bb4d91b2c9f19f300382be7/entity-info-card.webp" alt="Entity Metadata Card"></p><h3>🌈 New default fonts, padding and colours</h3><p>You may also have noticed Roadie has had a general facelift. This is part of our efforts to make it easier for users to consume visual information on any given Roadie page. This will be a long-running piece of work for us, but expect Roadie to just look and feel <em>nicer</em> in the weeks and months to come.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2rVXuvS7JbQ9oJOMxmrbpM/544cf8cf18f9481fdcd85f0e5ccae115/Screenshot_2024-05-13_at_10.46.28.png" alt="New Roadie UI"></p><h3>🙌 Filters for all!</h3><p>We've also added Advanced Filters to all Catalog Kinds. This allows you to create filtered, linkable views to share with colleagues. It has previously been possible to use Advanced Filters for a number of different Catalog Kinds, but now it's available for all.</p><p>Happy filtering :)</p><h3>🔌 Plugin roundup</h3><p>We've added a new <a href="https://roadie.io/docs/integrations/dynatrace/">Dynatrace</a> plugin. 
This is the best plugin for customers of hosted Dynatrace services. You can also access the Dynatrace plugin for self-managed users, which has been renamed <code>Dynatrace for Managed</code> in line with Dynatrace naming conventions.</p><p>This is part of our efforts to more closely sync up with plugin maintainers like <a href="https://www.dynatrace.com/" title="Dynatrace">Dynatrace</a> to make sure Roadie customer feedback reaches them. Their plugins get better, Roadie gets better, you get a better service from both. Win, win, win. 🤝</p>
]]></content:encoded></item><item><title><![CDATA[A more customisable Scaffolder]]></title><link>https://roadie.io/blog/a-more-customisable-scaffolder/</link><guid isPermaLink="false">https://roadie.io/blog/a-more-customisable-scaffolder/</guid><pubDate>Thu, 11 Apr 2024 11:00:00 GMT</pubDate><description><![CDATA[Extensions to the Scaffolder UI and Custom Actions for Templates have arrived 🎉 as part of a whole suite of tools designed to help get more value from automation within Roadie.

We've also added the Dynatrace Plugin and adopted the all new, super-shiny PagerDuty plugin.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h3>🤖 Custom Scaffolder Actions.</h3><p>Templates and the Scaffolder get heavily used by our customers to democratise common tasks like adjusting cloud account budgets or making changes to some Terraform repos.</p><p>One of the historic limitations with the Roadie Scaffolder has been that customers were unable to create and use their own custom actions. This added some friction for more advanced Scaffolder users, who were limited to the Scaffolder Actions that we supported when they came to write a Template.</p><p>No more though! Roadie customers can now create their own self-hosted Custom Scaffolder Actions.</p><p>That means:</p><ol><li>You can now write your own Scaffolder Actions to complete a task or use some custom CLI that Roadie isn't aware of</li><li>Then register those Actions in Roadie - as many as your heart desires.</li><li>Context can then be passed back and forth between Roadie and these Actions within any template you write.</li></ol><p>Detailed docs can be found <a href="https://roadie.io/docs/scaffolder/self-hosted-scaffolder-actions/" title="Self-hosted Custom Scaffolder Actions">here</a> to get started.</p><h3>⽥ Custom Field Extensions</h3><p>In the same vein, we have also opened up support for <a href="https://roadie.io/docs/scaffolder/custom-fields/" title="Custom Field Extensions">Custom Field Extensions</a> for the Scaffolder.</p><p>You can write your own React Components and validator functions to handle the use cases not currently covered by the existing Templates. This allows you to customise a lot of the Scaffolder forms to your heart's content.</p><h3>🔌 Plugin roundup</h3><ul><li><p>The <a href="https://roadie.io/docs/integrations/dynatrace/">Dynatrace plugin</a> is now supported on Roadie. 
You can use it to surface recent problems, error traces, and synthetics results for your services.</p></li><li><p>The <a href="https://pagerduty.github.io/backstage-plugin-docs/" title="PagerDuty Plugin">PagerDuty plugin</a> has had a significant upgrade thanks to the fine work Tiago Barbosa is doing as part of PagerDuty taking maintainership of the plugin earlier this year. We've adopted the new plugin, so you'll see a slick new UI and some additional features as they roll out.</p></li></ul>
]]></content:encoded></item><item><title><![CDATA[Repositories have come to the Catalog]]></title><link>https://roadie.io/blog/repositories-in-the-catalog/</link><guid isPermaLink="false">https://roadie.io/blog/repositories-in-the-catalog/</guid><pubDate>Mon, 11 Mar 2024 11:00:00 GMT</pubDate><description><![CDATA[Repositories have come to the Catalog (in beta) 🎉 as part of a whole suite of tools designed to help get to a complete catalog. We also have an AWS Resource Provider and new Catalog editing tools. 

We've also added the End of Life Plugin (both frontend and backend parts) to allow for some neat visualisation of deprecation information.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h3>🤖 [Beta] Repositories have come to the Catalog.</h3><p>We think a lot about catalog completeness at Roadie.</p><p>One of the most successful strategies we've come across to aid in getting components into the catalog is to surface what <em>is</em> in the catalog against a list of repositories. The gap between the two helps to identify what <em>is not</em> in the catalog. Simple.</p><p>To help customers do this as easily as possible we've brought Repositories in from the cold and added them as a core part of the Catalog.</p><p>That means:</p><ol><li>There is a new Repository tab in the Catalog</li><li>Repositories are auto-discovered for GitHub users. For other SCMs <a href="https://roadie.io/docs/api/catalog/" title="Roadie API Docs">the Roadie API</a> is available to push your repositories in.</li><li>Quick editing of Repositories inside the Catalog table itself.</li></ol><h3>📝 Set owners and other properties in the Catalog table UI (no yaml required)</h3><p>We’ve expanded <a href="https://roadie.io/blog/decorators-product-announcement/">Decorators</a> again to allow editing of <code>Title</code>, <code>Owner</code>, <code>Type</code> and <code>Lifecycle</code> for Catalog entities from the Catalog table itself.</p><p>You can now edit the title, owner, type and lifecycle of entities right in the Catalog table UI. All without yaml. 
Just click the pencil icon in the <code>Actions</code> section of whatever row you'd like to edit.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1cl5EkTjzszI9voiNoZ9UJ/3de1de44f578977f62111d3040f71f8d/Screenshot_2024-03-11_at_12.19.40.png" alt="Catalog editing"></p><h3>⛏️ [Beta] Auto-discover AWS Resources with our new provider</h3><p>Our customers often want to represent AWS resources in the Catalog and have been using the Roadie API to do just that. There's a simpler way though, so we decided to build it. Behold: the <a href="https://roadie.io/docs/integrations/aws-resources/">Roadie &#x3C;> AWS Provider</a>.</p><p>It can currently be configured to pull in:</p><ul><li>Lambda functions</li><li>EKS clusters</li><li>S3 buckets</li><li>DynamoDB tables</li><li>EC2 instances</li><li>and RDS DBs</li></ul><p>Coming soon: AWS Accounts.</p><h3>⏫ Upgrade to Backstage 1.23</h3><p>We’ve upgraded Backstage to 1.23 for all customers. The release includes:</p><ul><li>A fix to a vulnerability identified by us as part of our annual third-party penetration test. It related to the Scaffolder and didn't actually affect us, but we worked on fixing it with the core Spotify team to keep the rest of the community safe. More info <a href="https://backstage.io/blog/2024/02/28/security-notice" title="Backstage Security Notice">here</a>.</li><li>A tweak to Nunjucks trimBlocks that allows for greater control over templating in the Scaffolder, worked on by our own Miklos, one of the core Roadie team.</li></ul><p>Full release notes for Backstage <a href="https://backstage.io/docs/releases/v1.23.0">1.23</a>.</p><h3>🔌 Plugin roundup</h3><ul><li>The <em>new</em> <a href="https://roadie.io/docs/integrations/endoflife/" title="End of Life Plugin">End of Life plugin</a> is now supported on Roadie. We've brought in both the frontend and the backend for this plugin, so it is able to read from repository files via the "source-location" annotation. It’s nice. 
We like it.</li></ul><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7cgHthpgj20HR0clcUysC/c7c07996740fdbc8161a800e0ccc6db6/end-of-life-example.png" alt="End of Life Example"></p>
]]></content:encoded></item><item><title><![CDATA[The Roadie API is now live and scorecards have come to the Catalog]]></title><link>https://roadie.io/blog/roadie-api-snyk-id-techinsights-exclusions/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-api-snyk-id-techinsights-exclusions/</guid><pubDate>Mon, 12 Feb 2024 11:00:00 GMT</pubDate><description><![CDATA[The Roadie API is now live for all users on our Growth Plan 🎉. You can use it to push entities to the catalog, dry-run the scaffolder and interrogate TechInsights scorecards. 

We've also added Scorecards to the Catalog, made it possible to exclude facts when creating TechInsights checks, expanded the use of decorators, upgraded to Backstage 1.21 and added a few new plugins.]]></description><content:encoded><![CDATA[<p><em>The latest features and updates from Roadie.</em></p><h3>🤖 The Roadie API is now live for all users on our Growth Plan.</h3><p>It includes:</p><ol><li>The Backstage <strong><a href="https://roadie.io/docs/api/catalog/">catalog API</a></strong>, exposed via token authentication for Roadie customers.</li><li>A <strong><a href="https://roadie.io/docs/api/templates/">scaffolder API</a></strong> for testing templates in continuous integration (via dry-run), triggering templates, and listing historical scaffolder runs.</li><li>A <strong><a href="https://roadie.io/docs/api/techinsights/">Tech Insights API</a></strong> that allows you to create, read, update and delete scorecards, checks and data sources.</li></ol><p>At the moment <a href="https://roadie.io/docs/api/authorization/">API token</a> generation is limited to Admins. If you are an Admin and need a token, simply navigate to Administration (bottom of the sidebar) and Account (in the tabs across the top). Give your token a name and click Generate Token.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6wst5xDM43P3UHIPEYYAre/08a6212019cf4b2992ff74e99c6f9dbd/Screenshot_2024-02-12_at_12.46.01.png" alt="Roadie API"></p><h3>🎨 More Decorators: you can now set owners and other properties in the Roadie UI (no yaml required)</h3><p>We’ve expanded <a href="https://roadie.io/blog/decorators-product-announcement/">Decorators</a> to allow setting many more fields on the Entities in your catalog.</p><ul><li>You can now override the owner, lifecycle and tags of Components right in the Roadie UI.</li><li>Groups and all other Entity Kinds have also been expanded so that more properties can be set.</li></ul><p>All without yaml. 
Just use the <code>Decorate Entity</code> feature in the top left corner of each Component page. Simple.</p><h3>💯 Scorecards in the Catalog</h3><p>How to surface scorecard information to teams is something we think a lot about at Roadie. A few months ago we added Rollups to help. This month we launched Scorecards in the Catalog to increase visibility and discoverability for teams. This is something with a long history (<a href="https://github.com/backstage/backstage/issues/2292" title="Backstage Issue - Status API">the original discussion in the Backstage community</a> was way back in September 2020) and it's something <a href="https://www.linkedin.com/posts/davidtuite_back-in-year-1-of-the-backstage-from-spotify-activity-7161038359232937986-wKYp/?utm_source=share&#x26;utm_medium=member_desktop" title="David Tuite&#x27;s LinkedIn Post">we're hyped about</a>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4Y5F1Y6GcftXCeKnGrplvz/d2110fe969a6ee8fe14a416b6367ad21/Scorecards.jpeg" alt="Scorecards in the Catalog"></p><h3>🙅‍♂️ Tech Insights gets exclusions (in beta) and a new Facts list</h3><p>We’re currently beta testing the ability to <em>exclude</em> entities from a Tech Insights scorecard and a check. This will give fine-grain control over the checks and scorecards you can create and pinpoint areas of your catalog.</p><p>We’ve also added a Facts list under the Data tab in Tech Insights to improve discovery of what you can and can’t do with Tech Insights data sources.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5lbeWnmujHpGlSlbeFiAlq/c5be380c7c6ee48ac9ff3ca91ff10a19/Tech_Insights___Create_a_check___Backstage_-_12_February_2024.gif" alt="TechInsights Exclusions"></p><h3>⏫ Upgrade to Backstage 1.21</h3><p>We’ve upgraded Backstage to 1.21 for all customers. 
The biggest change since the last update at the end of 2023 is the new scaffolder UI with horizontal paging.</p><p>The upgrade to <a href="https://backstage.io/docs/releases/v1.21.0/" title="Backstage 1.21 Release Notes">1.21</a> also fixes a small but annoying bug that was pointed out by some customers: the scroll position of TechDocs pages now returns to the top when you navigate between different docs.</p><p>Our next version bump will be to 1.23 when that lands (hopefully later this month).</p><h3>🔌 Plugin roundup</h3><ul><li>The <em>new</em> <a href="https://pagerduty.github.io/backstage-plugin-docs/" title="PagerDuty Plugin Docs">PagerDuty</a> plugin is now supported. PagerDuty took over support for the plugin in January, deprecated the old plugin and launched their own version. It’s nice. We like it.</li><li>The <a href="https://roadie.io/backstage/plugins/pulumi/" title="Roadie Pulumi Plugin">Pulumi</a> plugin is now supported. With it, you can bring infrastructure data associated with your Pulumi stack into Backstage.</li><li>The <a href="https://roadie.io/backstage/plugins/cost-insights/" title="Cost Insights Docs for self-hosted Backstage users">Cost Insights</a> plugin now has beta support. If you're interested reach out to one of the Roadie team on Slack, Discord or Teams.</li></ul>
]]></content:encoded></item><item><title><![CDATA[Rollups for Tech Insights]]></title><link>https://roadie.io/blog/rollups-tech-insights/</link><guid isPermaLink="false">https://roadie.io/blog/rollups-tech-insights/</guid><pubDate>Tue, 14 Nov 2023 00:00:00 GMT</pubDate><description><![CDATA[Rollups aggregate Scorecard and Check data by team and department, up and down your organisational hierarchy, and let you add scorecard information to teams in the catalog.]]></description><content:encoded><![CDATA[<p>In addition to <a href="/blog/decorators-product-announcement/">the recent announcement of Decorators</a>, and a <a href="/blog/live-custom-backstage-plugins-within-seconds/">much faster custom plugins pipeline</a>, we’ve also shipped Rollups for Tech Insights users.</p><h2>Tech Insights</h2><h3>Rollups</h3><p>Rollups aggregate Scorecard and Check data by team and department, up and down your organisational hierarchy, and let you add scorecard information to teams in the catalog. Learn more below in the Tech Insights section.</p><p>Here’s an example for a team called “engineering”.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1mpMLv2BfD1K9fcWp976xF/5f9610715aa86c6939a6b1f356f9e08b/group-profile-rollup.png" alt="group-profile-rollup"></p><p>From this, we can see that the engineering team is doing a great job of using PagerDuty correctly, but could do better at Dependabot configuration. 
If there are other teams reporting to this one in the org chart then their data will be rolled up into this view also.</p><p>Add the <code>ScorecardResultForGroup</code> and <code>ScorecardResultsTableForGroup</code> Cards to Group layouts to see results like this.</p><p>You can also see this data presented in report format on a single Scorecard, and dive into the data at different levels in the org.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/Ow2jrqj6OYVqztbK8yQyn/14b61ab16193893487c39440a31ada71/rollup-report-format.png" alt="rollup-report-format"></p><h3>Bug fixes and improvements</h3><p>This month has been packed with improvements.</p><ul><li>We’ve got a built-in Data Source that scans for errors in CODEOWNERS files.</li><li>We’ve got a built-in Data Source that ensures your branch protection is correct.</li><li>The built-in Snyk Data Source has been updated to use the <code>github.com/project-slug</code> where possible.</li><li>We fixed some rounding errors in our Check calculations.</li><li>We fixed a labelling issue where there were two inputs called Type on the New Check form.</li><li>We fixed a bug where regex comparison results were exporting incorrectly.</li><li>We fixed a bug where selected annotations or labels in the filters of Scorecards couldn’t be deselected.</li><li>Improved the performance of Data Sources which iterate over repos containing hundreds of thousands of files.</li><li>The “is not blank” operator used to incorrectly ask for a value. Now it doesn’t.</li><li>We had mislabeled the “Number” type as “Integer”. This is fixed.</li><li>Markdown is now supported in Check descriptions so you can link to supporting documentation.</li><li>The Proxy input is now a typeahead so it’s easier to find your favorite proxy.</li><li>Scorecard rings now calculate in a more accurate way. A Component used to have to pass all checks on a scorecard to be counted in the ring. 
Now all checks that are passed will contribute to the score.</li><li>We brought consistency and sanity to the positioning of the Re-run, Recalculate and Refresh buttons on Scorecards, Checks and Data Sources.</li><li>Fixed some bugs which would prevent scorecards from showing up in the catalog in some cases.</li><li>GitHub based Data Sources now filter out archived repos.</li><li>Improved a bunch of help text sections on the New Data Source page.</li></ul><h2>Catalog</h2><h3>Decorators</h3><p>Decorators allow you to easily add metadata to the stuff you track in your Roadie Backstage catalog. Check out <a href="/blog/decorators-product-announcement/">the blog post</a> for full details and to learn how to use them. One simple use case is to <a href="/docs/catalog/rich-team-pages/">use decorators to add a Team Charter and some links to groups in the catalog</a>.</p><h3>Bug fixes and improvements</h3><ul><li>API specs are now searchable. Start your endpoint searches with a forward slash.</li><li>Renamed “Create…” in the sidebar to Templates. “Create…” was ambiguous.</li><li>We removed Tools from the sidebar and moved its pages into Administration.</li><li>The card used for displaying Links now hides itself from the interface when there are no links.</li><li>Catalog table column visibility is now independently set for each Kind of Entity.</li><li>Catalog tables can now display a links column.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/6xaMoTIyKVuC7eigEaHR3V/149e19c863db6be4d5e2275f8de1a3bf/links-column.png" alt="links-column"></li><li>Entity Titles are now displayed in the catalog table instead of name when possible.</li><li>Admins can now change the sidebar color in the Theme settings.</li><li>We improved the Locations Log and renamed it to Administration → Entity Locations. You can also find it in the tabs of the Import page.</li></ul>
]]></content:encoded></item><item><title><![CDATA[Measuring Catalog Correctness and completeness]]></title><link>https://roadie.io/blog/measuring-catalog-correctness-and-completeness/</link><guid isPermaLink="false">https://roadie.io/blog/measuring-catalog-correctness-and-completeness/</guid><pubDate>Mon, 16 Oct 2023 15:00:00 GMT</pubDate><description><![CDATA[A Catalog with rich information for every software asset in your organization is the ultimate dream. But you won’t get there in a day. Thus, you need a way to track your progress toward Catalog correctness and completeness.]]></description><content:encoded><![CDATA[<p>A comprehensible Catalog in Backstage is the ultimate goal for many teams. To achieve it you need a plan and a way of tracking your progress. This article will not delve into the former, as a suitable plan is something only you and your team can come up with.</p><p>But measuring how well your Catalog is doing is something everyone needs and can help you tune your plan along the way.</p><h3>What is a correct and complete Catalog?</h3><p>Backstage is used by organizations large and small, and for different purposes. Thus, there’s no universal definition of “correct” or “complete” for the Catalog. But let me explain what I’m referring to in this context.</p><p>With “correct,” I mean the data in individual entities. It’s not enough that a component shows up in the Catalog. It must have all the information required for its type. For example, if a component of type service shows up in the Catalog but doesn’t have PagerDuty and API Docs annotations, it’s not rich enough to meet my definition of “correct.” Most importantly, all the meta-data and annotations in the entity must correspond to the entity for it to be correct.</p><p>With “complete,” I am referring to coverage of software assets surfaced in the Catalog. 
The absolute meaning would refer to having every single component, user, and other kind of entity in the organization reflected in Backstage. However, this is rare. More often, teams may define “complete” within a scope, such as having an entity tracked for every business-critical service, or covering a subset of teams in the Catalog.</p><h3>Tracking Catalog correctness</h3><p>Once you have more than a few entities registered in your Catalog, tracking how rich and correct the metadata in each of them is becomes impossible, especially if you’re welcoming dev teams to onboard their own components.</p><p>Here’s when you need Tech Insights. Tech Insights lets you check data points across your Catalog entries and summarize the findings in Scorecards.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4IRXh0sjIuiMBjhyvBQLa4/ae8ef8c807c7ac297e1da5b0f1380eea/Screenshot_2023-10-13_at_13.30.22.png" alt="Screenshot: Backstage best practices scorecard"></p><p>Tech Insights is available as building blocks in the <a href="https://roadie.io/backstage/plugins/tech-insights/">OSS plugin</a> and as a fully-fledged Scorecards solution as a <a href="https://roadie.io/product/tech-insights/">paid addon in Roadie</a>. The OSS version will provide you the fundamentals so you can implement your own Scorecards solution, while Roadie offers a no-code UI with hundreds of pre-built checks. You can apply the ideas in this article using either of them.</p><h3>What to check for?</h3><p>With Tech Insights, the possibilities of what you can check for are pretty extensive. Thus, I recommend designing a schema that defines what constitutes best practices in your Catalog.</p><p>For example, I expect every component to have, at a minimum, a name, a description, a group owner, and a valid type. 
To cover all of these criteria I need to define several checks on the component metadata:</p><ol><li>Check that name and description are present: make Tech Insights check that the <a href="https://roadie.io/docs/tech-insights/tracking-catalog-correctness/mandatory-metadata/">metadata for entities is present</a> for the fields you’re interested in.</li><li>Check that the component is associated with a GitHub repository: linking a component to GitHub unlocks several features in Backstage, so make sure to check that the <code>github.com/project-slug</code> <a href="https://roadie.io/docs/tech-insights/tracking-catalog-correctness/github-annotation/">annotation is present</a>. You can also check that it matches an expected format using a regex.</li><li>Check that the type of entity is valid: the type field in Backstage can be an arbitrary string, which is prone to human error. You can check that the types defined in all components belong to a <a href="https://roadie.io/docs/tech-insights/tracking-catalog-correctness/valid-type/">set of strings that you expect</a>.</li><li>Check that components are labelled correctly: if you need your components to be tagged with their corresponding region or tier, using labels is the simplest way to go. Thus, you must assess which components are <a href="https://roadie.io/docs/tech-insights/tracking-catalog-correctness/correct-labels/">complying with the expected labels</a>.</li></ol><p>You can check any other detail from the entity metadata, and decide whether the check applies to only a subset of entities.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/Ogc27ZAn7UICC7A15rVO6/4ac4be0014033cdd1557a2a7d769656a/Screenshot_2023-10-13_at_15.10.03.png" alt="Screenshot: Backstage component check"></p><p>Once you have defined all the checks you want to apply, you can aggregate them in a Scorecard that keeps track of which entities are complying with all the checks.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/54J3uAV2NOM8Wea0icd6wc/634df1720eb80a04ad645390da0e1453/Screenshot_2023-10-13_at_15.22.37.png" alt="Screenshot: Scorecards composition"></p><h3>Go beyond Catalog correctness</h3><p>Happiness in your Catalog becomes a measurable objective with Tech Insights. However, you can check more than the validity of entity metadata. For example, you can:</p><ul><li>Monitor <a href="https://roadie.io/docs/tech-insights/track-docker-base-image-migration/">Docker image migrations</a></li><li>Track <a href="https://roadie.io/docs/tech-insights/track-sonarqube/">on-prem SonarQube</a> metrics</li><li>Or, enforce <a href="https://roadie.io/docs/tech-insights/enforce-branch-protection/">branch protection</a>.</li></ul><p>If you’d like to see what Roadie’s Tech Insights can do for your team, feel free to <a href="https://roadie.io/request-demo/">book a demo</a>!</p>
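<p>To make the four metadata checks listed earlier concrete, here is a minimal sketch of a <code>catalog-info.yaml</code> that would satisfy all of them. The entity name, owner, repository slug, and the label keys and values are hypothetical placeholders, so substitute whatever your own schema defines.</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  # Check 1: name and description are present
  name: payments-api            # hypothetical name
  description: Handles card payments for the checkout flow.
  annotations:
    # Check 2: the component is linked to a GitHub repository
    github.com/project-slug: my-org/payments-api   # hypothetical org/repo
  labels:
    # Check 4: expected labels are set (keys and values are assumptions)
    region: eu-west-1
    tier: "1"
spec:
  # Check 3: type belongs to the set of strings you expect
  type: service
  lifecycle: production
  owner: group:default/payments-team   # hypothetical group owner
</code></pre><p>An entity missing any of these fields would fail the corresponding check and show up as non-compliant in the Scorecard.</p>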
]]></content:encoded></item><item><title><![CDATA[Live custom Backstage plugins within seconds]]></title><link>https://roadie.io/blog/live-custom-backstage-plugins-within-seconds/</link><guid isPermaLink="false">https://roadie.io/blog/live-custom-backstage-plugins-within-seconds/</guid><pubDate>Mon, 02 Oct 2023 23:00:00 GMT</pubDate><description><![CDATA[Develop custom plugins with a live preview within Roadie Backstage, and deploy them to production in seconds with the Roadie CLI.]]></description><content:encoded><![CDATA[<p><em>Develop custom plugins with a live preview within Roadie Backstage, and deploy them to production in seconds with the Roadie CLI.</em></p><p>An Internal Developer Portal is only as good as how well it tackles your teams’ unique challenges. While Roadie comes with <a href="https://roadie.io/backstage/plugins/">dozens of integrations</a> (such as PagerDuty, ArgoCD, and Sentry), your teams most likely rely on custom workflows, private systems, or in-house tools as part of their software development life cycle. Bringing those specific requirements into your Developer Portal can streamline your developer experience significantly.</p><p>For example, Lunar Bank built a <a href="https://www.youtube.com/watch?v=6T3Mf6pdg7E&#x26;list=PLj6h78yzYM2PyrvCoOii4rAopBswfz1p7">dead-letter management plugin</a>, while American Airlines centralized their permissions requests through a custom section of their Backstage instance.</p><p>Roadie offers tools to simplify the development and deployment of your custom plugins.</p><h3>Getting Started scaffolder template</h3><p>Register Roadie’s <em><a href="https://roadie.io/docs/custom-plugins/getting-started/#1-register-the-roadie-plugins-monorepo-scaffolder-template-into-your-application">New Custom Plugin</a></em> scaffolder template to jump-start your plugin development. 
The template will ask you for a few details about your plugin and then create a new repository with a basic plugin structure and sample code.</p><h3>Dev previews within your instance</h3><p>Once you have your custom plugin running on your machine, you can get a live preview right within your Roadie instance. Any code changes are picked up automatically and applied when you refresh the page.</p><p>When writing a custom plugin for Roadie, you can use <a href="https://roadie.io/docs/custom-plugins/available-apis/#discoveryapi">APIs provided as React hooks</a> so you don’t have to deal with async requests or authentication at the plugin level.</p><p>You can preview all your plugin’s views within Roadie: pages, widgets, and cards. Furthermore, you can rely on <a href="https://roadie.io/docs/details/previewing-changes/">preview entities</a> to help you develop faster.</p><h3>Deploying custom plugins to Roadie</h3><p>Using the Roadie CLI, you can <a href="https://roadie.io/docs/custom-plugins/deploying/">build and deploy your Backstage plugins</a> and see them in your instance within a few seconds of pushing them upstream. The simplest option for deployment is to let Roadie host your plugin, but you can also deploy your plugin to other services like Netlify or GitHub Pages.</p>
]]></content:encoded></item><item><title><![CDATA[Our First 12 Month SOC2 Type 2 Report]]></title><link>https://roadie.io/blog/our-first-12-month-soc2-type-2-report/</link><guid isPermaLink="false">https://roadie.io/blog/our-first-12-month-soc2-type-2-report/</guid><pubDate>Tue, 26 Sep 2023 23:00:00 GMT</pubDate><description><![CDATA[We're very excited to say that we just gained our first 12 Month SOC2 Type 2 Report! This report is a big deal for us and shows just how dedicated we are to keeping your data secure and private.]]></description><content:encoded><![CDATA[<p>We're very excited to say that we just gained our first 12 Month SOC2 Type 2 Report! This report is a big deal for us and shows just how dedicated we are to keeping your data secure and private.</p><p>The SOC2 Type 2 certification is a third-party audit that assesses your compliance over a period of time, to ensure your security, availability and confidentiality controls are operating as they should. Back in July 2022 we achieved <a href="https://roadie.io/blog/soc2-compliance/">our first SOC2 Type 2 report</a>, but it was only a 3 month assessment period. This time around we were audited over an entire 12 month period.</p><p>A SOC2 Type 2 audit covers all the nitty-gritty details, like how we handle data, control access, respond to incidents, and keep a close eye on things. We're leaving no stone unturned when it comes to security and compliance.</p><p>We believe getting a 12 month SOC2 Type 2 certification shows just how committed we are to keeping your sensitive information safe and is a testament to the hard work and dedication of our entire team. We will continue to demonstrate this commitment year on year as we continue to comply with the SOC2 Type 2 standard. We also aim to expand our compliance to additional standards as we grow.</p><p>If you are an existing customer and would like to see a copy of our SOC2 Type 2 Report just reach out via any of the usual channels. 
We would be happy to share it!</p>
]]></content:encoded></item><item><title><![CDATA[Decorators for rich Team pages]]></title><link>https://roadie.io/blog/decorators-product-announcement/</link><guid isPermaLink="false">https://roadie.io/blog/decorators-product-announcement/</guid><pubDate>Mon, 25 Sep 2023 23:00:00 GMT</pubDate><description><![CDATA[Today we released a feature we call Decorators. Decorators allow you to easily add metadata to the stuff you track in your Roadie Backstage catalog. This metadata is stored within Roadie, and not written to YAML.]]></description><content:encoded><![CDATA[<p>Today we released a feature we call Decorators. Decorators allow you to easily add metadata to the stuff you track in your Roadie Backstage catalog. This metadata is stored within Roadie, and not written to YAML.</p><h2>How to use Decorators</h2><p>Using Decorators is simple:</p><ol><li>Visit an Entity in your Roadie Backstage catalog (e.g. a team, service or system).</li><li>Click the three dots in the top right corner and click “Decorate entity”.</li><li>Add links or annotations to the Entity and press Save.</li></ol><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4h1q3yO1NT3FxMp2aJpkVg/3e256457df259c031d43d61394a3c4a1/usage.gif" alt="usage.gif"></p><p>Decorators you create are <strong>stored inside Roadie and not written back to the YAML file</strong> that backs the Entity.</p><p>This means that Entity metadata can come from multiple places for the same Entity. One annotation could come from YAML, and another from Decorators.</p><p>You can see where a specific piece of metadata comes from by clicking the three dots again and clicking “Inspect entity”. 
In the example below, the <code>backstage.io/source-location</code> annotation is internal to Backstage and the other items are applied by Roadie Decorators.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/E0Zw1isylkgpkJ8Ile4YX/433c187ef187de65fa2432379eabff55/Screenshot_2023-09-19_at_13.18.53.png" alt="decoratorsscreenshot1"></p><h2>Why we’re doing this</h2><p>The introduction of Decorators is a slight deviation from the Backstage way of doing things, so it’s important that we explain why we’re doing this.</p><h3>Auto-ingested sources need decoration</h3><p>Backstage implementations frequently source the hierarchy of Users and Groups from a tool like GitHub Teams, Okta, or a Human Resources application like BambooHR. To support this, Backstage has a number of integrations with these tools.</p><p>These integrations will typically stream the hierarchy of Users and Groups into the Backstage catalog.</p><p>The problem with automatically ingesting Users and Groups is that Backstage users don’t get a chance to enrich their Group with information like links to Slack channels or a team charter. This leads to dead-looking Group pages in Backstage.</p><p>Decorators give Backstage users a way to enrich their Group with the information that they want to show off.</p><h2>What this is not</h2><p>We’re not introducing anything brand new in the Backstage ecosystem and we’re not introducing vendor lock-in.</p><h3>There’s precedent</h3><p>Backstage itself adds internal annotations to each Entity. These annotations are not written back to the YAML files. 
They are instead stored inside the Backstage database.</p><p>You can see some examples of this internal metadata here:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1K4L8wOBioMn7WkqohIDG5/8d7f0010e23bcf9318753ae4ab3fa6f5/Screenshot_2023-09-19_at_14.54.43.png" alt="decoratorsscreenshot2"></p><p>Roadie is simply piggybacking on this mechanism, to add the ability to store links and annotations.</p><h3>We’re Backstage API compatible</h3><p>We’re not introducing any Roadie-specific changes to the Backstage API or the spec for Backstage YAML files. We’re just making it easier to add values to the existing spec.</p><p>Entity Decorations are available via the standard Backstage HTTP API that we expose, so you can always write them back to YAML if you wish.</p><p>In the future, we will look to support “exporting” Decorators into any YAML file that backs the entity.</p>
]]></content:encoded></item><item><title><![CDATA[New Catalog UI, certified templates, more tutorials]]></title><link>https://roadie.io/blog/august-2023-product-updates/</link><guid isPermaLink="false">https://roadie.io/blog/august-2023-product-updates/</guid><pubDate>Mon, 04 Sep 2023 23:00:00 GMT</pubDate><description><![CDATA[This month we're rolling out a huge visual update to the catalog with much more space to get your work done. We've also got a bunch of new Tech Insights tutorials to help you improve software across your org.]]></description><content:encoded><![CDATA[<p>This month we're rolling out a huge visual update to the catalog with much more space to get your work done. We've also got a bunch of new Tech Insights tutorials to help you improve software across your org.</p><h2>Catalog</h2><h3>New catalog page preview</h3><p>You will shortly see a new catalog page roll out on Roadie. This update affects the main software catalog table, and the filters around it.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7KI6pUUtyeyIhaRNpyNV0b/7891467b8cd1c191e2cedf3af05f7c31/new-catalog-page.png" alt="new-catalog-page"></p><p>This new catalog table brings a number of enhancements:</p><ol><li><strong>More horizontal space</strong> for reading the table. We’ve moved the filters to the top so the table is wider.</li><li><strong>Per user configurable columns</strong>. You can customize the table to show the info that’s important to you. We’ll soon be persisting column choice in your browser so you can pick up where you left off (this is coming imminently).</li><li><strong>Kind specific columns</strong>. Groups used to have an owner column and Users were missing Display Name. We’ve tidied these up and introduced more sensible defaults.</li><li><strong>New table features</strong>. Configurable densities and full screen mode make for slicker presentation. 
Filter highlights make it easier to find what you’re looking for.</li><li><strong>Persisted filter choices</strong>. If you mostly work with Templates, we’ll keep you on the template page when you navigate away and back.</li><li><strong>Sharable filter choices</strong>. Filters will be in the URL so you can share a link to a specific subset of data.</li></ol><p>All this is building up to the ability to bring Tech Insights data front and centre in the catalog. We want to show scorecards in a column so you can drive more action around important migrations and software quality issues. More on this in the coming weeks and months.</p><h2>Fixed: Disappearing Azure repos entities</h2><p>We spent 3 weeks tracking down and fixing a tricky bug that would cause Entities discovered from Azure Repos to periodically and temporarily disappear from the catalog.</p><p>This is demonstrated by the wiggliness of the entities count before the fix, compared to how flat it is after.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/72CAZ4wOpZZSLFBr2eQGVR/f31a60ffd7f70dcc920724dc193a2785/azure-counts.png" alt="azure-counts"></p><p>It turns out the Azure APIs don’t return consistent results unless a sort is specified on the queries. <a href="https://github.com/backstage/backstage/pull/19478">Here’s the upstream fix we made to Backstage.</a></p><p>This is a really good example of the value Roadie adds. Are bugs like this how you want your Developer Experience team spending their time?</p><h2>Tech Insights</h2><h3>Group check data by owner</h3><p>Check results are now grouped by owner as well as by Component. 
This makes it easier to track down the team who own the most Components which are failing the check.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/ptAybGAfD7wZpt3EsXVSW/98ade626385901a28b9f297e8d5157ef/group-by-owner.png" alt="group-by-owner"></p><p>We’re currently working to expand this to scorecards, and to aggregate the data up and down the hierarchy of teams, so you can view it at any level.</p><h3>New tutorials</h3><p>We added 4 Tech Insights tutorials this month. Learn how to…</p><ol><li><a href="/docs/tech-insights/track-codeowners-usage/">Find software that doesn’t have a valid CODEOWNERS file</a>.</li><li><a href="/docs/tech-insights/enforce-branch-protection/">Find software that doesn’t have branch protection enabled</a>.</li><li><a href="/docs/tech-insights/track-sonarcloud/">Connect Tech Insights to SonarCloud to collect security hotspot data</a>.</li><li><a href="/docs/tech-insights/track-sonarqube/">Connect Tech Insights to an on-prem SonarQube instance</a>.</li></ol><h3>Bug fixes and improvements</h3><ul><li>We fixed a bug where some Data Sources would fail with an Out of Memory error.</li><li>We now support the YAML content type response when sending HTTP requests in Data Sources.</li><li>Data Sources can now send POST requests for GraphQL APIs and other use cases.</li><li>We rolled out a new version of our broker to patch a <a href="https://github.com/snyk/broker/issues/579">security vulnerability</a>.</li></ul><h2>Scaffolder</h2><h3>Certified label for scaffolder templates</h3><p>You can now add the Certified label to scaffolder templates to designate them as Platform Team approved and ready for use.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/381yzSvf8mK51Ywv4pcjtj/9469016d7c00239f5f9ed4d0b1295ba9/certified-template.png" alt="certified-template"></p><p>Just add the certified annotation to make this work.</p><pre><code class="language-yaml">apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: my-template
  annotations:
    roadie.io/certified: "true"
</code></pre><h3>Bug fixes and improvements</h3><ul><li>The scaffolder now supports a task to open a pull request against Azure repos.</li><li>We updated <a href="/docs/scaffolder/writing-templates/">our scaffolder docs page</a> to include more examples and APIs.</li></ul>
]]></content:encoded></item><item><title><![CDATA[Backstage Gets Quality and Compliance Scorecards with Roadie]]></title><link>https://roadie.io/blog/backstage-gets-quality-and-compliance-scorecards-with-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-gets-quality-and-compliance-scorecards-with-roadie/</guid><pubDate>Mon, 31 Jul 2023 15:00:00 GMT</pubDate><description><![CDATA[Roadie, the no-code Backstage Developer Portal, is launching Tech Insights to help organizations track software quality, security and compliance
]]></description><content:encoded><![CDATA[<p>DUBLIN, July 2023. <a href="https://roadie.io">Roadie</a>, the Backstage-based SaaS Developer Portal, announced the general availability of its first feature on top of Backstage: Tech Insights.</p><p>With <a href="/product/tech-insights/">Roadie Tech Insights</a>, software engineering teams can use Scorecards to monitor their software assets in the Backstage catalog and make sure they meet the quality and compliance standards they have set. Scorecards are based on Data Sources and Checks that the user can customize through a UI.</p><p>Roadie Tech Insights comes with more than a hundred different types of checks across dozens of data sources like GitHub, Datadog, and Snyk (plus, you can add custom sources). The feature release was covered by <a href="https://devops.com/roadie-adds-scorecard-tool-to-backstage-saas-platform/">DevOps.com</a>, <a href="https://www.tfir.io/roadie-tech-insights-helps-organizations-meet-internal-performance-standards-for-security-operations-and-more/">TFiR</a>, and <a href="https://www.benzinga.com/pressreleases/23/07/n33185786/roadie-announces-general-availability-of-tech-insights-an-enhancement-to-its-spotify-backstage-saa">BENZINGA</a>.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2i78gN8tbDBSgr7nbC88Jv/b62f18c6aa2dffc7cbe3c99a26ae1026/roadie-tech-insights.png" alt="roadie-tech-insights"></p><p>Roadie was founded in 2020 to help organizations boost developer productivity through a Developer Portal. The company raised US$3.7 million in a seed funding round led by <a href="https://boldstart.vc/">Boldstart Ventures</a> and <a href="https://www.firstminute.capital/">Firstminute Capital</a>.</p><p>Back in 2021, Roadie released an <a href="/backstage/plugins/tech-insights/">open-source Tech Insights</a> plugin with primitive APIs so that any Backstage adopter could build their own Scorecards functionality. 
A few enterprise Backstage adopters like HP and Lunar Bank have implemented their own solutions based on this version of the plugin. However, the open-source Tech Insights requires each team to design its own UI and implement its Data Sources and Checks, on top of managing integrations, security, and databases.</p><p>The new fully-fledged version of Tech Insights, available only to Roadie customers, features more than a hundred facts that you can check against across dozens of data sources like GitHub, Snyk, and PagerDuty. For example, Martin Froehlich, vice president of engineering at <a href="https://www.sumup.com/">SumUp</a>, is using Roadie and Tech Insights to “promote and track adoption of supply chain security and code analysis tools like Dependabot, CodeQL, and others, across all of our production service repositories.”</p><p>Roadie Tech Insights helps engineering teams build a culture of quality and accountability.
Using Tech Insights, teams are nudged towards improved software quality over time. Antony Rinaldi, head of architecture and application platform at <a href="https://www.bailliegifford.com">Baillie Gifford</a>, is using Roadie Tech Insights to “promote and track adoption of security tools with our 250 developers.”</p><p>He also added that Baillie Gifford has “created automated checks and scorecards that help us understand which teams have adopted the tools letting us understand how successful our rollout initiative is. We’re excited to expand it to more use cases over the coming months.”</p><p>Roadie Tech Insights is a natural development to make the most out of a Software Catalog and keep quality under control.</p>
]]></content:encoded></item><item><title><![CDATA[Using Backstage’s Scaffolder to Fill up your Catalog]]></title><link>https://roadie.io/blog/using-backstages-scaffolder-to-fill-up-your-catalog/</link><guid isPermaLink="false">https://roadie.io/blog/using-backstages-scaffolder-to-fill-up-your-catalog/</guid><pubDate>Mon, 24 Jul 2023 15:00:00 GMT</pubDate><description><![CDATA[Make software onboarding a one-click experience in your Dev Portal and boost your adoption rate. Here's a step-by-step tutorial for that.]]></description><content:encoded><![CDATA[<p>Backstage is a great framework for building Internal Developer Portals. However, having a successful Dev Portal requires more than simply standing it up. Independently of whether you go with self-hosted or managed Backstage, the task of onboarding entities into your Software Catalog will be primarily in your hands.</p><p>Most Dev Portals rely on putting metadata files (YAML, Terraform, etc) into the services so they can be updated by the teams that work on them. The more friction you cut from the process of creating these metadata files, the easier it is to convince people to create them.</p><p>Backstage’s Scaffolder can make the software onboarding a one-click experience that gives Developers a chance to try out your Dev Portal and easily add their own services to the Catalog.</p><p>In this article, I’ll show you how you can write a software template that prompts the user to tell you about their service and opens a PR on their repository. Once that PR is merged, the Catalog will pick up the service automatically (if you have auto-discovery enabled).</p><h3>Onboard your service with a few clicks</h3><p>The experience you’re after will let developers onboard their service by filling in a few inputs. 
The Scaffolder then takes care of generating a <code>catalog-info.yaml</code> file and opening a PR against the service's repository.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/usuxhX2IPMzUVpDkk6jJZ/f21b3f6888505a6443b74051ec2ba18f/Screenshot_2023-07-24_at_11.37.05.png" alt="Screenshot of the software template to onboard services into the Catalog"></p><p>In the first section, you’ll ask for basic information about the service. In the second one, you’ll prompt the user to locate their repo and associate it with an owner. And finally, you’ll ask for integration details such as <a href="https://roadie.io/backstage/plugins/argo-cd/#support-for-multiple-argocd-instances---option-1">ArgoCD’s app name</a> or <a href="https://roadie.io/backstage/plugins/pagerduty/#connecting-an-entity-to-a-pagerduty-service">PagerDuty integration key</a>.</p><p>Once you’ve collected all the information, your software template will generate a <code>catalog-info.yaml</code> file and open a PR against the service’s repository.</p><h3>Writing a scaffolder template</h3><p>Software templates in Backstage have two parts: <a href="https://roadie.io/docs/scaffolder/writing-templates/">parameters and steps</a>. The parameters define the inputs you want from the user. The steps are actions (like cloning a repo, editing files, creating an AWS secret, or making an HTTP request) that run one after the other. In this section, you’ll learn more about how both parts can be implemented for a service onboarding template.</p><p><strong>Defining parameters</strong></p><p>Let’s start with the parameters. Parameters can be organized into sections in the UI. In this case, you want to have three sections: one for general information, one for the repository, and one for additional details. Here’s what the code that describes the form shown in the previous section could look like:</p><pre><code class="language-yaml">parameters:
    - title: What is your service about?
      required:
        - name
      properties:
        name:
          title: Service name
          type: string
          description: Human readable name. We'll generate a dasherized version from it.
        description:
          title: Service description
          type: string
        owner:
          title: Service Owner
          type: string
          description: Owner of the component
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
    - title: Where is your codebase?
      required:
        - repoSlug
      properties:
        repoHost:
          type: string
          default: github.com
          ui:widget: hidden
        repoOwner:
          title: Repository owner
          type: string
          default: roadiehq
          enum: ['roadiehq', 'jorgelainfiesta']
        repoSlug:
          title: Repository slug
          type: string
    - title: Integrations (optional)
      properties:
        argoAppName:
          title: Argo CD App Name
          type: string
        pagerdutyKey:
          title: PagerDuty integration key
          type: string
</code></pre><p>The input field that the Scaffolder renders depends on the property’s type. In all of the cases in this example, we’re dealing with strings, but you can also specify numbers, objects, and arrays. You can also specify a component to be rendered as the input with <code>ui:field</code>. For a comprehensive list of these options check out our <a href="https://roadie.io/docs/scaffolder/writing-templates/#parameters">Scaffolder documentation</a>.</p><p>You’ll want to customize the form according to how you want to register services in your Catalog. For example, I’m hard coding the repository’s host and providing two owner options, but you may have more than one host option. You could also use a <a href="https://roadie.io/docs/scaffolder/writing-templates/#picker-from-external-api-source">dynamic select box</a> to make selecting repositories easier.</p><p>To help you get the form right, Backstage (and thus, Roadie) comes with a form editor that you can find under <code>/create/edit</code> → Template Editor. It’s quite handy, especially as your form becomes more complex.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7wEqWkjcy8NuawNnAmszYv/6467b5480cc67d85d6ffb550a45c3de8/Screenshot_2023-07-21_at_11.12.21.png" alt="Screenshot of Backstage&#x27;s Scaffolder Form Editor"></p><p><strong>Defining steps</strong></p><p>Now, let’s review the steps side of the template. You’ll need two steps. First, you’ll fetch an existing YAML file and replace the placeholders within it with the user’s values. The resulting file will be available in the Scaffolder workspace’s root with the same file name. From there, it’ll be possible to open a Pull Request against the target repo with the content of the Scaffolder workspace, which will be a <code>catalog-info.yaml</code> file that describes the service.</p><p>This code shows what the steps look like:</p><pre><code class="language-yaml">steps:
      - id: fetch-template
        action: fetch:template
        input:
          url: ./skeleton
          values:
            name: ${{ parameters.name }}
            description: ${{ parameters.description }}
            owner: ${{ parameters.owner }}
            repoOrg: ${{ parameters.repoOwner }}
            repoSlug: ${{ parameters.repoSlug }}
            argoAppName: ${{ parameters.argoAppName }}
            pagerdutyKey: ${{ parameters.pagerdutyKey }}
      - id: create-pull-request
        name: create-pull-request
        action: publish:github:pull-request
        input:
          repoUrl: ${{ parameters.repoHost }}?owner=${{ parameters.repoOwner }}&#x26;repo=${{ parameters.repoSlug }}
          branchName: onboard-to-catalog
          title: Onboard service to Catalog
          description: This PR adds a meta data file about this service so that it can be registered in our software catalog.
</code></pre><p>Let’s unpack what’s going on in each step. In the <code>fetch-template</code> step, I’m loading a relative path that contains a file with placeholders written in <a href="https://mozilla.github.io/nunjucks/templating.html">nunjucks templating</a> syntax. The file is a <code>catalog-info.yaml</code> that looks like this:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name | replace(" ", "-") | lower}}
  title: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: ${{ values.repoOrg }}/${{ values.repoSlug }}
    {%if values.argoAppName %}argocd/app-name: ${{values.argoAppName}} {% endif %}
    {%if values.pagerdutyKey %}pagerduty.com/integration-key: ${{values.pagerdutyKey}} {% endif %}
spec:
  type: service
  owner: ${{ values.owner }}
</code></pre><p>You can manipulate strings using <a href="https://mozilla.github.io/nunjucks/templating.html#filters">filters</a>, use conditional blocks, and pretty much any other templating option available in nunjucks.</p><p>Regarding <code>publish:github:pull-request</code>, the only thing worth mentioning is that <code>repoUrl</code> doesn’t look like a familiar URL. That’s because it’s a standard reference used across the Scaffolder actions rather than an actual repository URL.</p><p>To help test the steps’ inputs and outputs, you can dry-run your template with <code>/create/edit</code> → “Load Template Directory.”</p><h3>Conclusion</h3><p>Reducing the friction of onboarding to your Catalog will improve its adoption rate. It also gives developers a clear first touchpoint for getting familiar with the Developer Portal that you’re building for them. You can find the complete template in our <a href="https://github.com/RoadieHQ/software-templates/tree/main/scaffolder-templates">software templates repository</a>.</p>
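<p>For orientation, here is a trimmed-down sketch of how the parameters and steps from this article might fit together in a single template file. The template name, title, and owner are hypothetical, the repository host is hard coded instead of using the hidden <code>repoHost</code> field, and the optional integration fields are left out to keep it short. The complete version lives in the repository linked above.</p><pre><code class="language-yaml">apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: onboard-service                 # hypothetical template name
  title: Onboard a service to the Catalog
  description: Generates a catalog-info.yaml file and opens a PR against the service's repository.
spec:
  owner: group:default/platform-team    # hypothetical owner
  type: service
  parameters:
    - title: What is your service about?
      required:
        - name
      properties:
        name:
          title: Service name
          type: string
        description:
          title: Service description
          type: string
        owner:
          title: Service Owner
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
    - title: Where is your codebase?
      required:
        - repoSlug
      properties:
        repoOwner:
          title: Repository owner
          type: string
          default: roadiehq
        repoSlug:
          title: Repository slug
          type: string
  steps:
    # Render ./skeleton/catalog-info.yaml with the user's answers
    - id: fetch-template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          repoOrg: ${{ parameters.repoOwner }}
          repoSlug: ${{ parameters.repoSlug }}
    # Open a PR containing the rendered file
    - id: create-pull-request
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=${{ parameters.repoOwner }}&#x26;repo=${{ parameters.repoSlug }}
        branchName: onboard-to-catalog
        title: Onboard service to Catalog
        description: Adds a catalog-info.yaml file so this service can be registered in the software catalog.
</code></pre><p>Because the integration values are not passed in this sketch, the conditional annotation lines in the skeleton render as empty.</p>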
]]></content:encoded></item><item><title><![CDATA[Backstage Users Unconference: a wrap-up]]></title><link>https://roadie.io/blog/backstage-users-unconference-a-wrap-up/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-users-unconference-a-wrap-up/</guid><pubDate>Mon, 26 Jun 2023 15:00:00 GMT</pubDate><description><![CDATA[Last week we had a blast hosting the 4th Backstage Users Unconference! Over 95 people showed up to share how they’re using Backstage and brainstorm solutions to common problems. ]]></description><content:encoded><![CDATA[<p>Last week we had a blast hosting the 4th Backstage Users Unconference! Over 95 people showed up to share how they’re using Backstage and brainstorm solutions to common problems. The vibe of this edition was more intimate, as we were able to go deeper into the discussions around documentation beyond markdown, contributions to the portal by other teams, and creating infrastructure through the Scaffolder.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/49p7Rfemz3sgzB9r4tElDh/9a2ebf78b06040a984ccbef428d27a7e/Screenshot_2023-06-22_at_17.27.25.png" alt="Chosen topics "></p><h3>The people make the Unconference</h3><p>We had over 95 attendees from around the globe, with people connecting from cities in the US such as San Diego, Chicago, Omaha, and Orlando. We were also joined by folks from Buenos Aires, Madrid, Paris, London, and Cambridge, and from further east in Israel, Kazakhstan, India, and Singapore.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/1FmKL46rGrqY7eAyQJAqw5/ca8751bdeaa350e89bd4147bb91738d6/Screenshot_2023-06-26_at_10.04.24.png" alt="Group of people during a session in the Unconference"></p><p>Most of the people who attended are at a growing internal adoption stage in their Backstage journey (40%), followed by a large group of teams getting started with their PoCs (34%). The Unconference also got input from people who have achieved weekly active usage of their Portal across the board (6%) and folks who have optimized usage for a subset of their users (7%).</p><h3>Making Documentation easier</h3><p>Two out of the eight topics chosen by the community revolved around TechDocs. The first discussion was higher-level and focused on sharing how teams are using Backstage to bring their varied sources of documentation under the same roof. The second topic took a more technical perspective on how to provide tooling to developers writing TechDocs so it’s easier for them to get their docs into Backstage.</p><h3>Infrastructure and contributions</h3><p>Using the Scaffolder to spin up infrastructure is a common use case, which got some interesting takes regarding how to manage the templates and the possible race conditions as you manage Terraform files and various PRs. Another topic of discussion was opening up the Portal to receive plugins from other teams and how to define the boundaries for integrations and outline a governance approach.</p><h3>Recordings</h3><p>Due to the unstructured nature of the conversations that occur in the Unconference, not every session resulted in a recording that is worth re-watching. We’re working to take pieces from at least two of the sessions to make videos that might be helpful to drive the Backstage discussion further. Stay tuned, we’ll be uploading them to our YouTube channel soon!</p>
]]></content:encoded></item><item><title><![CDATA[Roadie is sponsoring "Unpacked": The Ultimate Virtual DevOps Conference]]></title><link>https://roadie.io/blog/roadie-sponsoring-unpacked/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-sponsoring-unpacked/</guid><pubDate>Thu, 01 Jun 2023 15:00:00 GMT</pubDate><description><![CDATA[We are thrilled to announce that Roadie is sponsoring the first-ever "Unpacked." This virtual conference, hosted by Cloudsmith, promises to bring together DevOps professionals and engineering leaders worldwide.]]></description><content:encoded><![CDATA[<p>We are thrilled to announce that Roadie is sponsoring the first-ever "Unpacked." This virtual conference, hosted by Cloudsmith, promises to bring together DevOps professionals and engineering leaders worldwide. Unpacked aims to enlighten attendees on the intricacies of securing and scaling software delivery, discussing relevant topics such as software distribution at scale, securing open source dependencies, and reducing complexity and cost.
<a href="https://app.zuddl.com/p/a/event/389b4a90-61bc-4978-afdf-872f98b565be?utm_term=roadie&#x26;utm_campaign=2023-unpacked-employee&#x26;utm_medium=conference&#x26;utm_source=outreach"><img src="//images.ctfassets.net/hcqpbvoqhwhm/3LgbCIY5z8jmZGg9W3pQYT/995e866fbeb6726958e7ee072c61f5a5/Screenshot_2023-06-08_at_09.28.21.png" alt="Unpacked 2023: RSVP"></a></p><p>Why Attend Unpacked?</p><ol><li>Cutting-Edge Insights: Unpacked brings together a prestigious lineup of industry leaders who will delve into the latest trends, strategies, and technologies shaping the world of software delivery.</li><li>Software Distribution at Scale: Scaling software delivery is a complex task, and Unpacked aims to demystify this process. The event will explore strategies and best practices for distributing software.</li><li>Securing Open Source Dependencies: With open source software playing a crucial role in modern development, ensuring its security is paramount. Unpacked will address this challenge, offering attendees valuable guidance on securing open-source dependencies and mitigating potential vulnerabilities.</li><li>Reducing Complexity and Cost: Complexity and cost often go hand in hand with software delivery. Unpacked will provide insights into simplifying processes, reducing unnecessary complexity, and optimizing costs associated with software distribution, enabling organizations to achieve greater efficiency and ROI.</li></ol><p>"Unpacked" is set to be a groundbreaking virtual conference that will equip attendees with the knowledge and tools they need to excel in software delivery. With an exceptional lineup of industry leaders and a focus on scaling, security, and reducing complexity, Unpacked is a must-attend event for DevOps professionals and engineering leaders. Roadie is proud to be a sponsor of this transformative event, demonstrating our ongoing dedication to advancing software delivery practices. 
We look forward to virtually connecting with fellow attendees and exploring the future of software distribution together at "Unpacked"!</p>
]]></content:encoded></item><item><title><![CDATA[Backstage during KubeCon EU ‘23]]></title><link>https://roadie.io/blog/backstage-during-kubecon-eu-23/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-during-kubecon-eu-23/</guid><pubDate>Mon, 24 Apr 2023 15:00:00 GMT</pubDate><description><![CDATA[Backstage was undoubtedly one of the recurring topics of the conference, with at least five talks dedicated to the framework and several others referencing it. As you rushed through the busy venue, it was common to overhear people mentioning “Backstage”.]]></description><content:encoded><![CDATA[<p>Backstage was undoubtedly one of the recurring topics of the conference, with at least five talks dedicated to the framework and several others referencing it. It was common to overhear people mentioning “Backstage” as you rushed through the busy venue trying to catch a talk, only to find out the room was full even though you had arrived 10 minutes early.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/32VyH4BRc8IffowTbbMDYG/869eec32ba1bc41a8ff2288ce82df30a/FullSizeRender-1.jpg" alt="Kasper Niesen presenting at KubeCon"></p><p>KubeCon + CloudNativeCon EU 2023 has been by far the most impressive edition since the conference series restarted after the pandemic. With 10 thousand people in attendance and fascinating talks, being unable to enter the room, even if you arrived 10 minutes early, became commonplace.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2v4tl5ZB0ZxTEcZQSd8PmN/7d212c4a12974ab0018d31a8e7af0a62/FullSizeRender-3.jpg" alt="Solutions showcase in Amsterdam"></p><p>Let’s address the elephant in the room: there was no Backstage booth in the Project Pavilion, as there was in Detroit. 
A series of miscommunications caused this unfortunate issue, but after talking with other members of the community, including the maintainers, I can assure you: this won’t happen again!</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6jSD3aHpi5Khp9zxPvvkq6/a45e13ceff4831f6fc8bb91a53f36014/FullSizeRender-2.jpg" alt="People talking"></p><p>Despite that, I was lucky to have talked with many people in the community, including OSS partners Red Hat and VMware, current Backstage adopters, and people who are just getting started with the framework.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie mentioned in The Pragmatic Engineer ]]></title><link>https://roadie.io/blog/roadie-mentioned-in-the-pragmatic-engineer/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-mentioned-in-the-pragmatic-engineer/</guid><pubDate>Tue, 21 Mar 2023 15:00:00 GMT</pubDate><description><![CDATA[At a readership of 160k+ tech professionals, The Pragmatic Engineer has received a lot of praise from the enterprise software and scale-ups industry. For their column on Backstage, the Roadie team had the opportunity of contributing our experiences with Gergely Orosz.]]></description><content:encoded><![CDATA[<p>At a readership of 160k+ tech professionals, The Pragmatic Engineer has received a lot of praise from the enterprise software and scale-ups industry. For their column on Backstage, the Roadie team had the opportunity of contributing our experiences with Gergely Orosz. The outcome of Gergely’s research is truly a deep-dive pillar that explains how Backstage came to be, its different features, adoption stories, and many other aspects of the framework.</p><p>If you’re a Pragmatic Engineer subscriber, make sure not to miss out on this one. Gergely goes through what a Developer Portal is, how to get started, and even compares closed-sourced alternatives. <a href="https://newsletter.pragmaticengineer.com/p/backstage">Check it out</a>!</p><p><a href="https://newsletter.pragmaticengineer.com/p/backstage"><img src="//images.ctfassets.net/hcqpbvoqhwhm/2swufT3sFagjKZZ44sPH3x/8febbce5a2df79fc8bbdffb299abd7fb/Screenshot_2023-03-07_at_14.30.32.png" alt="Backstage on The Pragmatic Engineer: a deep dive"></a></p><p><a href="https://www.linkedin.com/in/martina-iglesias-fernandez/">Martina Iglesias Fernández</a>, CTO of Roadie, saw the inception of Backstage from within Spotify and shared her story with The Pragmatic Engineer. 
At the time, she was a lead backend engineer at Spotify, where she saw the pain points that drove the need for an Internal Developer Portal. In the column, she explains in detail how Backstage <a href="https://roadie.io/backstage-spotify/#the-origins-of-spotify-backstage">came from an internal tool called System-Z</a>.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/8l0hLorPPGV8LwsChxO43/afe483a516a0c1e9a49948aa68a120fb/system-z.png" alt="Screenshot: System-Z"></p><p>The Roadie team shared other experiences with Gergely, including information about Backstage’s main features and the typical adoption process from MVP to org-wide release.</p>
]]></content:encoded></item><item><title><![CDATA[New in Roadie: Automated language tagging for GitHub entities]]></title><link>https://roadie.io/blog/new-in-roadie-automated-language-tagging-for-github-entities/</link><guid isPermaLink="false">https://roadie.io/blog/new-in-roadie-automated-language-tagging-for-github-entities/</guid><pubDate>Mon, 20 Feb 2023 15:00:00 GMT</pubDate><description><![CDATA[Roadie can now automatically bring this information from GitHub and associate the corresponding languages to your entities through a tag or a label, depending on your preference.]]></description><content:encoded><![CDATA[<p>Part of understanding your software assets is knowing, at a glance, the languages used in each of them. Now, Roadie can automatically bring this information from GitHub and associate the corresponding languages to your entities through a tag or a label, depending on your preference.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4BoB9FB6afZCi819OMRGTQ/801d89f996112e2dfe21047603066ea9/Screenshot_2023-02-16_at_15.24.52.png" alt="Screenshot: Catalog with tags"></p><p>Imagine you’re browsing your software catalog looking for a library that you can use for setting up your Go API routes or to find patterns others have used to set up a test harness for their Java apps. In these cases, it would be helpful to be able to filter entities by language.</p><p>Since GitHub already compiled this information for you, you only need to bring it into your Developer Portal. 
Now, Roadie can do this automatically: we’ll label or tag—according to your preference—your entities with their associated language.</p><p>If you’re a Roadie customer, you can get your Catalog to start tagging your entities with the corresponding languages by switching the feature on in Administration > Settings > Catalog.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3ch0Q2nw3JDmkpwsZRvCVe/68888c5c5604241556b0b2548d719eee/Screenshot_2023-02-20_at_15.47.08.png" alt="Screenshot: Catalog settings page"></p>
]]></content:encoded></item><item><title><![CDATA[Incident Management in your Backstage Developer Portal]]></title><link>https://roadie.io/blog/incident-management-in-your-backstage-developer-portal/</link><guid isPermaLink="false">https://roadie.io/blog/incident-management-in-your-backstage-developer-portal/</guid><pubDate>Fri, 20 Jan 2023 15:00:00 GMT</pubDate><description><![CDATA[Backstage has a handful of plugins that integrate incident managers into your Internal Developer Portal. In this post, you'll get an overview of each of them, including what they do. ]]></description><content:encoded><![CDATA[<p>It’s a typical operations day. You’re taking cozy sips of your lapsang tea, and there are only a few meetings on the horizon. Life is good. But, just then, an alarm goes off: the MetaQueriesPostProcessor service is down. It’s a relatively new service, not too critical. Given you’re not entirely familiar with the service, it wouldn’t hurt to get some more context of what’s going on, so you visit your Backstage-based Internal Developer Portal:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1kVAiXmMCtzJf74LvfzFoj/2c482ce7c579411ceb20030313ac11bb/Screenshot_2023-01-18_at_16.26.30.png" alt="Screenshot of an Internal Developer Portal with a PagerDuty integration"></p><p>From the service’s entity page in Backstage, you can get a glimpse of the recent workflows runs, the service’s dependencies, and other issues like code quality, vulnerabilities, and documentation without jumping around platforms. Along with the ownership information, you’ll be up to speed to address this incident and contact responsible parties more easily. Additionally, other colleagues who work with this entity will get visibility on the ongoing incident and get ready to collaborate to resolve it.</p><p>Thankfully, Backstage has a handful of plugins that integrate incident managers into your Internal Developer Portal. 
Below is a list, in alphabetical order, of the incident management plugins you can integrate into your Backstage Internal Developer Portal:</p><h3>FireHydrant</h3><p>With <a href="https://roadie.io/backstage/plugins/firehydrant/">FireHydrant’s Backstage plugin</a>, you can manage your incidents within Backstage. Teams can stay organized and quickly identify information about services, such as active incidents and health analytics. The plugin includes an entity widget so you can embed FireHydrant’s information on the entity’s overview page.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1DTGb9tPNuIvCo1oeG0NUs/db624553b3d28d2de80ad7db986ad742/firehydrant-service-card.webp" alt="Screenshot: Firehydrant service card"></p><p>To install it on a self-hosted Backstage instance, check out our <a href="https://roadie.io/backstage/plugins/firehydrant/">FireHydrant Plugin guide</a>. If you’re a Roadie customer, you don’t need to install it; just <a href="https://roadie.io/docs/integrations/firehydrant/">set it up in your Admin panel</a>.</p><h3>OpsGenie</h3><p>The <a href="https://roadie.io/backstage/plugins/opsgenie/">OpsGenie Backstage plugin</a> offers two options: an entity widget and an additional standalone page. 
The entity widget shows the alerts for its corresponding entity on the overview page:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/74EYBeAhEAmsQJqrVE1rse/ace34b9d6da6cee19364d19fa1711334/opsgenie-plugin-alerts-on-entity-page.webp" alt="Screenshot: Opsgenie plugin with alerts on entity page"></p><p>Additionally, the OpsGenie plugin comes with a standalone page where you’ll get to see who’s on call and an aggregated list of alerts happening in the entities registered in Backstage.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1Pp5I3ENM484pn4awZmILA/52037a6ce4e35d91a4434870bf9b526e/opsgenie-plugin.webp" alt="Screenshot: Opsgenie plugin"></p><p>To install this plugin on a self-hosted Backstage instance, follow our <a href="https://roadie.io/backstage/plugins/opsgenie/">OpsGenie Plugin Guide</a>. If you’re a Roadie customer, check out the <a href="https://roadie.io/docs/integrations/opsgenie/">no-code guide to OpsGenie</a>.</p><h3>PagerDuty</h3><p>With <a href="https://roadie.io/backstage/plugins/pagerduty/">PagerDuty’s Backstage plugin</a>, you can view the ongoing incidents related to an entity, as well as who’s on call. Additionally, you get handy buttons to create an incident right from the entity’s overview page.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5u2pmguozXbBMpUa3WGm6K/e5c0564e77ce2f7590e157dc8e2f1c16/pagerduty-plugin-2077x955.webp" alt="Screenshot; pagerduty plugin in internal developer portal"></p><p>To install it on a self-hosted Backstage instance, check out our <a href="https://roadie.io/backstage/plugins/pagerduty/">Pager Duty Plugin Guide</a>. If you’re a Roadie customer, <a href="https://roadie.io/docs/integrations/pagerduty/">set up PagerDuty from your Admin panel</a>.</p><h3>Rootly</h3><p>The <a href="https://roadie.io/backstage/plugins/rootly/">Rootly Backstage plugin</a> is the most generous: it provides you with three options for viewing incident information in your Internal Developer Portal. 
First, there’s the standard entity widget for the overview page that every plugin offers:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4AEsRRk1gHRyVUPS8Rwgk2/25f542bb58639ae3c3fe969fb14ef98a/rootly-entity-overview.png" alt="Screenshot: Rootly overview card"></p><p>It shows the ongoing incidents right on the entity overview page and gives you handy links to create an incident or see a more detailed list. Second is a dedicated entity tab to get more details about the ongoing incidents for the associated entity:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1dilsv7exsxJsdPpb1sDSf/426ea2dac36c072d81f00935a5561d69/rootly-entity-incidents.png" alt="Screenshot: Rootly entity incidents"></p><p>Third, <a href="https://rootly.com">Rootly</a> also offers a dedicated page where the incidents from across all entities are aggregated:
<img src="//images.ctfassets.net/hcqpbvoqhwhm/2EAMsnDlQF58vDDkZrcY9/1c96a44a918b62f333e20112c0c989e6/rootly-incidents-page.png" alt="rootly-incidents-page"></p><p>To install this plugin in a self-hosted Backstage instance, follow our <a href="https://roadie.io/backstage/plugins/rootly/">Rootly Plugin Guide</a>. If you’re a Roadie customer, <a href="https://roadie.io/docs/integrations/rootly/">set up Rootly in your Admin panel</a>.</p><h3>Splunk On-call</h3><p>The Splunk On-call  (formerly VictorOps) plugin for Backstage provides you with a widget for your entity overview page. The plugin will show a list of ongoing incidents and provide links to open an incident or acknowledge/resolve one right from the entity page. If you want, you can also set the widget to read-only so that no actions can be triggered from Backstage.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5gmqoH2am1TBeX3ehfluRt/72acbba3db0b3edd44ca8100faf1c9c9/image__6_.png" alt="image (6)"></p><p>To install this plugin in a self-hosted Backstage instance, <a href="https://github.com/backstage/community-plugins/tree/main/workspaces/splunk/plugins/splunk-on-call">check out the plugin’s README</a>. If you’re a Roadie customer, support for this plugin is being implemented at this very moment as requested last week by a customer.</p><hr><p>And that’s a wrap! Setting up your incident manager with your Backstage Internal Developer Portal can help you manage incidents and prepare your teammates to collaborate when needed. If you know of an incident management plugin that I missed, please let me know through <a href="https://twitter.com/jorgelainfiesta">Twitter</a> or <a href="https://www.linkedin.com/in/jrlainfiesta/">LinkedIn</a>! I’ll be pleased to add it to the list.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie customers are not affected by Backstage’s RCE vulnerability]]></title><link>https://roadie.io/blog/roadie-customers-are-not-affected-by-backstages-rce-vulnerability/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-customers-are-not-affected-by-backstages-rce-vulnerability/</guid><pubDate>Wed, 23 Nov 2022 06:00:00 GMT</pubDate><description><![CDATA[Last week, the Oxeye research team published a report of a vulnerability found in Backstage caused by an outdated vm2 third-party library. Roadie customers are unaffected by this vulnerability because their instances are upgraded regularly (currently at v1.8) and due to extra security measures in the Scaffolder implemented in Roadie from the beginning. ]]></description><content:encoded><![CDATA[<p>Last week, the Oxeye research team published a report of a vulnerability found in Backstage that could allow a threat actor to execute remote code by exploiting an outdated vm2 third-party library. The Backstage team patched this issue in version 1.5.1 back on August 29th. Roadie customers are unaffected by this vulnerability because their instances are upgraded regularly (currently at v1.8) and due to extra security measures in the Scaffolder implemented in Roadie from the beginning.</p><h3>The problem</h3><p>The remote code execution (RCE) vulnerability was possible due to a known issue in the vm2 library used in the Scaffolder, which has been patched since Backstage 1.5.1. By overloading definitions through a software template, the researchers managed to create a function outside the Scaffolder’s sandbox context that allowed them to execute arbitrary code in the instance.</p><p>Furthermore, the researchers pointed out that Backstage by default doesn’t provide authentication for backend requests. 
This allowed unauthenticated actors to access the Scaffolder, and therefore, exploit the vulnerability from outside the Developer Portal.</p><h3>Roadie customers are not affected</h3><p>Roadie customers were running on Backstage 1.8 at the time of the vulnerability disclosure and were patched for this vulnerability shortly after Backstage 1.5.1 was released because the team keeps a close eye on CVE notifications.</p><p>Furthermore, due to Roadie’s architecture, the risk from this vulnerability was greatly mitigated for Roadie customers. Instead of using the default execution strategy, Roadie executes templates on a transient ECS task that only has access to the scoped, temporary credentials required to run the template.</p><p>Also, Roadie provides authenticated access to both frontend and backend requests, which means no unauthenticated actor could have accessed the Scaffolder in the first place.</p><h3>Upgrade your instance ASAP</h3><p>If you’re running a self-hosted Backstage instance and still use a version earlier than 1.5.1, you’re facing a vulnerability with a 9.8 CVSS score, which falls in the Critical severity range.</p><p>If you’d rather not run upgrades yourself, switch over to Roadie! We’ll keep your instance safe through regular upgrades and extra security layers. Request a demo!</p>
]]></content:encoded></item><item><title><![CDATA[Backstage consolidating its role in the Cloud native ecosystem]]></title><link>https://roadie.io/blog/backstage-consolidating-its-role-in-the-cloud-native-ecosystem/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-consolidating-its-role-in-the-cloud-native-ecosystem/</guid><pubDate>Thu, 03 Nov 2022 06:00:00 GMT</pubDate><description><![CDATA[At the moment, Thoughtworks, Red Hat, Gartner, VMWare, and the Linux Foundation endorse Backstage as a viable solution for improving the developer experience of growing engineering teams through a Developer Portal. ]]></description><content:encoded><![CDATA[<p>Last week, <a href="https://roadie.io/blog/wrap-up-backstagecon-and-kubecon-na-2022/">the very first BackstageCon</a> brought along news from influential firms voicing the maturity that the Backstage project has achieved. At the moment, Thoughtworks, Red Hat, Gartner, VMWare, and the Linux Foundation endorse Backstage as a viable solution for improving the developer experience of growing engineering teams through a Developer Portal.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/qtNTyiu94usjt2qt3khmi/6ac228e054b9419bb331becc665e6756/techradar-adopt-backstage.png" alt="Thoughtworks tech radar highlighting Backstage on the adopt area"></p><p>In the most recent Tech Radar, Thoughtworks <a href="https://www.thoughtworks.com/radar/platforms?blipid=202010066">moved Backstage to <strong>Adopt</strong></a> in their Platform quadrant. This means that their advisors “feel strongly that the industry should be adopting” Backstage. 
And in fact, Thoughtworks offers Backstage as part of their digital transformation offering to their customers.</p><p>On the other hand, <a href="https://developers.redhat.com/articles/2022/10/24/red-hat-joins-backstageio-community?utm_medium=email&#x26;_hsmi=231622490&#x26;_hsenc=p2ANqtz-__nDAtR6CoUhDpb-ROTQghiD66wD7wF3VjbYGHBZ6D1xNfV4f0GNxrIEOkUMsL8lvGBPHa44rTMARZCBtfjQ_Cn5rQQg">Red Hat announced</a> that they will start actively participating in the Backstage community. They will be contributing to the framework to support OpenShift and improve the Kubernetes experience in Backstage.</p><p>Additionally, VMWare has been a commercial partner of Backstage, offering it as a UI for a security product in their Tanzu suite. VMWare has committed an Open Source team dedicated to contributing back to Backstage Open Source.</p><p>Earlier this year, Gartner reported on Backstage as a solution for Developer Portals. The report also mentions Roadie as a solution to tackle the complexity of adopting the framework via the self-hosted route.</p><p>Finally, the Linux Foundation is also betting on Backstage after having witnessed the success it has brought to most adopters in the Cloud native space. The Linux Foundation is investing in creating learning resources, starting with an <a href="https://www.edx.org/course/introduction-to-backstage-developer-portals-made-easy?index=product_value_experiment_a&#x26;queryID=14e994e787035593a6f4ec5bc75f1fbb&#x26;position=1">Introduction Course</a> to the framework.</p><p>For Roadie customers, this news means you’re about to get all the benefits from the substantial contributions of a robust community without having to do anything. It also shows you’re on the right track by adopting Backstage through Roadie!</p>
]]></content:encoded></item><item><title><![CDATA[Wrap up: BackstageCon and KubeCon NA 2022]]></title><link>https://roadie.io/blog/wrap-up-backstagecon-and-kubecon-na-2022/</link><guid isPermaLink="false">https://roadie.io/blog/wrap-up-backstagecon-and-kubecon-na-2022/</guid><pubDate>Mon, 31 Oct 2022 06:00:00 GMT</pubDate><description><![CDATA[Backstage made its way to the center stage last week in Detroit, as maintainers, contributors, and adopters deepened their relationship and shared their excitement about the framework with the wider Cloud Native community.  ]]></description><content:encoded><![CDATA[<p>Backstage made its way to the center stage last week in Detroit, as maintainers, contributors, and adopters deepened their relationship and shared their excitement about the framework with the wider Cloud Native community.</p><h3>Monday: BackstageCon</h3><p>During the first conversations about setting up a Backstage-exclusive conference, the CNCF event organizers said we could aim for 50-100 attendees because it’d be the first time. Well, BackstageCon ended up being the largest co-located event at KubeCon + Cloud Native NA 2022 with 150+ attendees!</p><p>The event was co-hosted by Suzanne from Spotify and Martina from Roadie. They did a fantastic job at curating the line-up, <a href="https://www.youtube.com/playlist?list=PLj6h78yzYM2OKySsTuiip3BqmdYZQRnSf">all talks are available on Youtube</a> so check them out!</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4gxhWHmzQlYiKPuqsdmUJw/889b7caa53cdb9c192c1d99b49f82f6a/suzanne-martina-cohost.png" alt="Suzanne and Martina co-hosting BackstageCon"></p><p>Roadie had a very orange table at BackstageCon, where we greeted everyone.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/CeS4ucJTkrLUEMFI9Gt3Y/62d3fdae873363630a142578c48f0601/Screenshot_2022-10-31_at_13.49.42.png" alt="Roadie&#x27;s table at BackstageCon"></p><p>Of course, we enjoyed hanging out with each other after the conference. 
Here’s a picture of Roadie having dinner with Spotify folks, including the Backstage maintainers we all love.
<img src="//images.ctfassets.net/hcqpbvoqhwhm/4k9F3DH4txq8MOa2KTCdLW/cc54812987d82298ccc35e63dd080a85/spotify-roadie-dinner.png" alt="Roadie sharing dinner with Spotify and Backstage maintainers "></p><h3>Tuesday: Backstage Project Meeting</h3><p>At first, a 4-hour meeting to talk about Backstage seemed a bit much. But once we got started, it turned out to be only enough to scratch the surface on topics like the sources of truth for the Catalog, frontend performance, maintainership, and adoption challenges. Having adopters, maintainers, and partners in the same room made ideas flow endlessly and led to actionable tasks to make it easier to adopt and contribute to Backstage in the near future.</p><p>In the evening, Frontside and Roadie had what is, without doubt, the best food anybody attending KubeCon enjoyed, at Yemen Café.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4Bw8DsP9SlfF5Ces7lqhqQ/4126d581f1757a60762d970e97aa04e6/Screenshot_2022-10-31_at_13.51.57.png" alt="Roadie having dinner with Frontside"></p><h3>Wednesday-Friday: KubeCon</h3><p>Roadie had a booth at KubeCon, where we explained Backstage, and how we offer a managed version of it, to people attending the event. We were also giving away organic cotton Backstage t-shirts, hand-printed in Barcelona. We heard they were the best t-shirts of the conference in terms of comfort!</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5YcZDEV7v6e6i4qLsAfpb5/5716e3006c98f4408a044e35be337158/roadie-booth.png" alt="Roadie Booth at KubeCon"></p><p>We had the wonderful chance to meet and talk with our customers attending the event. Here we are with MyFitnessPal:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6FixgaPG6zIzUmLgd4DtAf/3f5700555549f0fbdcff1fca9e68060b/roadie-myfitnesspal.png" alt="Roadie and MyFitnessPal at KubeCon"></p><h3>See you next time!</h3><p>We all had a blast last week in Detroit and aligned on how to take Backstage further as it gains popularity. 
Next <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/">KubeCon is in Amsterdam</a> from 17th to the 21st of April, 2023. Hope to see you there!</p>
]]></content:encoded></item><item><title><![CDATA[Roadie.io is boosting Backstage Developer Portals for Scale-Ups with Scorecards]]></title><link>https://roadie.io/blog/boosting-backstage-developer-portals-scorecards/</link><guid isPermaLink="false">https://roadie.io/blog/boosting-backstage-developer-portals-scorecards/</guid><pubDate>Mon, 24 Oct 2022 06:00:00 GMT</pubDate><description><![CDATA[Roadie.io, the company offering a CNCF Backstage SaaS option, is extending the most popular Developer Portal framework by introducing Tech Insights.]]></description><content:encoded><![CDATA[<p><strong>Dublin, Ireland, October 24th, 2022.</strong> <a href="https://Roadie.io">Roadie</a>, the company offering a CNCF Backstage SaaS option, is extending the most popular Developer Portal framework by introducing Tech Insights.</p><p>The Open Source version of Backstage is used by industry leaders such as HP, VMware, and Expedia Group. But, as highlighted by Gartner, it requires significant effort and dedicated staff to stand up and maintain. Instead of self-hosting Backstage, dozens of scale-ups like Netlify, Snyk, and MyFitnessPal have adopted it through Roadie’s managed option. Now, Roadie is expanding its offering by introducing Tech Insights, which doesn’t have an equivalent in the Open Source Backstage.</p><p>Tech Insights lets Roadie users create Scorecards to keep track of quality standards within their organization. This is useful for assessing software maturity and security compliance across all services.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/61MusI6aZN8c0OuYoPCwnI/8e36ff9c7d89c15fd64a08850141eb4e/Screenshot_2022-10-11_at_14.37.27.png" alt="Screenshot: Scorecard sample">
<em>Roadie users can define the criteria to be measured in their services via a Scorecard.</em></p><p>
Given that Roadie Backstage users already have all their software assets registered in their Software Catalog, extracting insights is a valuable next step in any Developer Portal.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3ORMtxmEh5Ri1qVKMaLmJz/2740da0ed736b0f5e92f4a8a5025910a/Screenshot_2022-10-10_at_09.57.37.png" alt="Screenshot: list of scorecards values">
<em>Roadie users can get an overview of the health of their ecosystem across services.</em></p><p>Roadie is currently working with a handful of design partners to develop Tech Insights to ensure it brings out the most value for leadership and developers.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie now keeps the catalog in sync with your GitHub with the webhooks API!]]></title><link>https://roadie.io/blog/roadie-now-keeps-the-catalog-in-sync-with-your-github-with-the-webhooks-api/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-now-keeps-the-catalog-in-sync-with-your-github-with-the-webhooks-api/</guid><pubDate>Tue, 04 Oct 2022 22:00:00 GMT</pubDate><description><![CDATA[As a Roadie user, editing a Backstage YAML file in your GitHub repo will result in those changes almost immediately appearing in your Catalog. Our team designed and implemented a GitHub integration based on webhooks to replace the default poll-based discovery shipped in Backstage.]]></description><content:encoded><![CDATA[<p>As a Roadie user, editing a Backstage YAML file in your GitHub repo will result in those changes almost immediately appearing in your Catalog. Our team designed and implemented a GitHub integration based on webhooks to replace the default poll-based discovery shipped in Backstage.</p><p>Previously, we relied on Backstage’s default behavior for keeping the catalog up to date. This was a pull-based approach where Roadie polled your GitHub organization and kept the catalog in sync.</p><p>By default, the polling interval was set to 2 minutes. This is a long time to wait while you are in the middle of editing your scaffolder templates and still figuring things out.</p><p>Polling large catalogs would also result in many requests being sent to the GitHub APIs. This could result in rate limiting and a degraded user experience.</p><p>With this release, we are utilizing the GitHub webhooks API to get notifications when you change your Backstage YAML files.</p><p>We also added a new feature to the GitHub integrations settings page that lets you manually trigger a sync with your GitHub repos.
This is useful if you added a catalog-info.yaml file to a repository where you did not have the Roadie GitHub app installed.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/73rsjT80nq4xgl73H9bo9b/5c92316a4c367d359116a4e112be6692/catalog-instant.png" alt="Screenshot: Catalog settings enabling webhooks "></p><h2>The benefits for Roadie users</h2><p>We believe this new webhooks-based approach brings a number of benefits:</p><ol><li>We eliminated the usage of Location entities for discovery. This spares the additional fetches of the whole organization’s repositories for every configured github-discovery Location entity.</li><li>It results in an almost immediate reaction from the catalog when you push something to your configured branches.</li><li>Now you can safely rename your catalog files in your GitHub repo. (This will result in a deleted filename for the old file and an added one for the new file.)</li><li>It can refresh your API entity when a file it references, such as an OpenAPI or gRPC definition, is changed (if the file is hosted in GitHub).</li></ol><p>Read on for more juicy technical details about the implementation.</p><h2>Tech Stuff</h2><p>Let me walk you through this journey to implement and roll out instant updates for Roadie users!</p><h3>The Past</h3><p>Before webhooks, we relied on the default implementation of auto-discovery from Backstage. This used the processing loop and the provided processors to ingest entities from GitHub organizations.
We used the <code>GithubDiscoveryProcessor</code> from OSS Backstage.</p><p>It works like this:</p><ul><li>The processor is configured and added to your catalog builder.</li><li>Whenever an entity is processed, the processor is evaluated to decide whether it should run.</li><li>The processor executes its logic when the entity being processed is a Location entity of type <code>github-discovery</code>.</li><li>It fetches all of the repositories in your organization, then creates an optional Location entity for every repo.</li><li>These Location entities are then processed in turn. They fetch the files in their target paths and emit the entities found there.</li></ul><p>This processor has 2 main drawbacks:</p><ol><li>It is tied to the processing loop so you cannot set a different interval for it. This is a problem if you’re being rate limited by GitHub. There is no option to lengthen the loop duration.</li><li>It makes unnecessary requests to the GitHub API by fetching all of the repositories every time it runs.</li></ol><h3>The Present</h3><p>We built a Roadie-specific entity provider which can act on the incoming <a href="https://docs.github.com/en/rest/webhooks">GitHub webhooks</a>.</p><p>It uses your configured Roadie Backstage GitHub app to forward the GitHub push events from your organization’s repositories to our servers.</p><p>The GitHub webhooks API sends Roadie arrays of the <strong>modified</strong>, <strong>added</strong>, and <strong>deleted</strong> files. These indicate what happened in the event. The provider handles modifications differently from additions and deletions.</p><p><strong>When a modification event happens:</strong></p><ul><li>Get the event from GitHub</li><li>Get all the modified filenames in this push event</li><li>Trigger a refresh on the Backstage database</li></ul><p>We try to refresh with every filename and let the database decide whether there is a matching entity to schedule a refresh for.
We implemented it this way because it lets us refresh API entities instantly: when the file referenced by a <code>$text</code> placeholder is managed in GitHub and you change that OpenAPI descriptor, we refresh the API entity it belongs to.</p><p><strong>When the event contains additions/deletions:</strong></p><ul><li>Get the event from GitHub</li><li>Construct a set of filenames for added files</li><li>Construct a set of filenames for deleted files</li><li>Filter these based on the configuration</li><li>Create an optional Location entity for these files with proper location annotations</li></ul><p>This path is pretty similar to the previous discovery. For every added or deleted file that matches your configuration, we create a Location entity whose <code>spec.target</code> points to the file we got in the GitHub event. We then rely on the processing loop to fetch and emit the actual content of the file.</p><p>We removed the polling for entities, and we disabled the possibility to add github-discovery Location entities to the catalog.</p><h2>Some things to iron out</h2><p>With the current implementation, some edge cases can be confusing or not work as expected.</p><p><strong>Multiple entities in one file (catalog-info.yaml)</strong></p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: valid-same-file-entity-1
spec:
  type: library
  owner: user:kissmikijr
  lifecycle: production
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: valid-same-file-entity-2
spec:
  type: library
  owner: user:kissmikijr
  lifecycle: production
</code></pre><p>This approach can cause undesired behaviour if you end up with a validation error in one of your entities.</p><p>If this happens, the catalog will create a Location entity which points to this file, but the processing of the entities won’t finish. This means Backstage will not store the information it needs to trigger refreshes, so even if you fix your validation errors in the next commit, you’ll need to wait for the regular processing loop to handle the refresh.</p><p><strong>Registering an Entity via the <code>/register-existing-component</code> page</strong></p><p>In this case, because the entity was not added to the catalog via the webhooks, the webhook won’t be able to remove it when you delete the file from your GitHub repo.</p><p>Updating this entity will still be instant.</p><p><strong>Using the <code>Location</code> kind</strong></p><p>You may previously have used a Location entity in your repository, registering it and letting the processing loop find the other targets:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: roadie-backstage-plugins
spec:
  targets:
    - ./plugins/**/catalog-info.yaml
    - ./utils/**/catalog-info.yaml
</code></pre><p>The automatic refreshes will not work on the target entities.</p><p>This is a shortcoming of the open source implementation of the refresh handling. It is planned to be fixed. Until then, the best advice is to ditch the top-level Locations and configure the targets in the <code>/administration/settings/integrations/github</code> configuration page.</p><p>In this case, you’d add two entries to the Targets:</p><pre><code># Entry 1
https://github.com/RoadieHQ/roadie-backstage-plugins/blob/-/plugins/**/catalog-info.yaml 

# Entry 2
https://github.com/RoadieHQ/roadie-backstage-plugins/blob/-/utils/**/catalog-info.yaml 
</code></pre><p>To configure your targets check out the <a href="https://roadie.io/docs/integrations/github-discovery/">documentation</a>.</p>
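<p>To make the API refresh case described earlier more concrete, here is a minimal sketch of an API entity whose definition is loaded through a <code>$text</code> placeholder. The entity name and the <code>./openapi.yaml</code> path are assumptions made up for illustration. With a layout like this, a push that modifies the referenced file should trigger a refresh of the entity, as long as the file is hosted in GitHub.</p><pre><code class="language-yaml"># Hypothetical catalog-info.yaml; the name and path are placeholders.
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: example-api
spec:
  type: openapi
  lifecycle: production
  owner: user:kissmikijr
  definition:
    # The content of the referenced file is substituted here during processing.
    $text: ./openapi.yaml
</code></pre>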
]]></content:encoded></item><item><title><![CDATA[Measuring and Improving Software Quality with Roadie Backstage]]></title><link>https://roadie.io/blog/tech-insights-for-roadie-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/tech-insights-for-roadie-backstage/</guid><pubDate>Thu, 01 Sep 2022 10:00:00 GMT</pubDate><description><![CDATA[Improve software quality and set minimum standards in your engineering organization with Roadie's new Tech Insights feature. Coming soon.]]></description><content:encoded><![CDATA[<p>We are building Tech Insights on top of Roadie Backstage. It will help you ensure that all of your software assets have the support and maintenance they need in order to keep you secure, compliant, productive, agile, and available.</p><p>Our customers will be able to use Tech Insights to spot unloved services in production, find teams who need more support in order to produce top quality software, and help engineering organizations understand where the bar is.</p><p>Roadie users will be able to create scorecards which define what it means to build quality software within their company. They will be able to apply these scorecards to software in the Backstage catalog. They will see reports to help identify software which needs improvement, and they will be empowered to work with owners to deliver said improvements.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2r4E6j5HLOxswVTnp3JPpQ/0390dbb3e1ac13125554347bfeb48592/scorecard.png" alt="scorecard wireframe"></p><p>Tech insights will be the first major proprietary Roadie feature. 
It is in development as I write this post, and we are already working with design partners on rollout.</p><p>Throughout our work with dozens of customers, it has become clear that implementing Backstage and cataloging software is only the first step on a longer journey for most organizations.</p><p>If you would like to supercharge your Backstage experience with this feature, please <a href="/request-demo/">request a demo</a> of Roadie.</p><h2>Measuring the quality of software</h2><p>The first step towards determining whether software is of high enough quality is to define what “quality” means in your organization.</p><p>Quality is an aggregate measure which accounts for many dimensions. Quality software is usually easy to operate in production, easy to develop, secure and compliant, amongst many other attributes.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3O64QrFLrl9TMloXlZU2L9/44ba87b76181e401513c7871d97a035a/quality_software.png" alt="quality software"></p><p>Each of these individual measures is composed of its own factors, each of which is likely tracked in a different tool.</p><p>Here are some simple attributes you might check to get a measure of operability:</p><ol><li>Software which is easy to operate in production usually has adequate uptime. This might be measured against a service level objective recorded in Datadog.</li><li>It is usually connected to a logging tool. This might be Datadog again, or it might be a self-hosted ELK stack.</li><li>The software should have an on-call rota associated with it. This would be stored in PagerDuty.</li><li>It should have an owner who is responsible for keeping the software up-to-date and deploying it frequently. This would be stored in the Backstage catalog.</li><li>It should have runbooks defined.
These might be in a Confluence document.</li></ol><p>Once we factor in a few of these attributes, our measurement of good software starts to look more complex.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1KYWhQ5QyGlQ15E7UWvJPW/963d3680e16327fb5bda54a2e9293a80/dive_into_operable_quality_software.png" alt="dive into operable quality software"></p><p>Run the same process for scalable, secure, easy to develop, compliant and any other top level factors and you quickly start to realise that you have to check 40+ factors and integrate with 20+ tools to determine if a piece of software is high quality or not.</p><h3>Is there <em>enough</em> quality?</h3><p>The next level of complexity comes from attempting to determine whether or not the current level of quality is “enough”.</p><p>The required level of quality will vary depending on the purpose and exposure of the software in question. Take uptime for example. Your payment processing service might need 5 nines of uptime, while some internal reporting tool might be down for 5 months before anyone notices, and that might be ok.</p><p>To account for this in the measurement, you will need to attach different expectations to different software components.</p><p>This is frequently done by categorizing and dividing software into tiers. Software in tier zero might be mission critical and require the highest levels of quality. Tier 3 might be far less important from an overall business standpoint, and will be subject to less stringent requirements.</p><p>We can contextualize the measurements we defined above by defining quality levels, as shown in the diagram below.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/Uofx3NwrOxeUGXCJcaqxu/6a7358a7887587ba9cd366a8c0220744/quality_levels.png" alt="quality levels"></p><h2>Driving improvements in software quality</h2><p>Invariably, when you define software excellence and compare the current state of the world against it, you will find gaps. This is a good thing. 
The next step should be to encourage the improvement of this software.</p><p>There can be a multitude of legitimate reasons why a piece of software doesn’t meet a given quality bar. Teams will constantly streamline their priorities to try to match those of the wider business, and quality will sometimes suffer.</p><p>The important thing is that this quality gap is visible. We may actively choose not to rectify it, but we should at least be able to spot it and have that conversation.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3hI8yVvubJ3e3aZT9sMIov/6ec231cffa6c83f45d9531092ba1e4f1/product_quality_matrix.png" alt="product quality matrix"></p><p>Teams should be able to independently make realizations like “everyone else in the org seems to have passed the security level 1 bar… perhaps we should think about doing some security work soon”.</p><p>We believe Roadie Backstage has a role to play by increasing visibility into organizational software quality. We should nudge people towards increasing software quality, in a thoughtful and conscientious way.</p><h2>We’re working on Tech Insights at Roadie</h2><ol><li>You will be able to write Checks which continually test the software in your catalog in an automated way. These Checks can be anything from “Does the software have an SLO set in Datadog” to “Is the log4j version semantically greater than 2.16.0”. Checks can leverage custom data or data from the APIs of standard SaaS tools that you already use.</li><li>You will be able to group Checks into Scorecards, and target those scorecards to subsets of the software in the catalog. Perhaps you want to target Tier 0 and Tier 1 Java services. 
Perhaps you want to target Python services in the data science org.</li><li>You will be able to slice and dice reports of software quality so that you can find software which needs more support in order to meet the quality bar.</li><li>Teams will be carefully and conscientiously nudged towards improving the quality of their software over time.</li></ol><p>We’re already rolling out early versions of this software to our design partners. If you would like to learn more, or participate in the betas, <a href="/request-demo/">please request a demo of Roadie Backstage today</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Roadie Has Achieved SOC2 Type 2 Compliance ]]></title><link>https://roadie.io/blog/soc2-compliance/</link><guid isPermaLink="false">https://roadie.io/blog/soc2-compliance/</guid><pubDate>Fri, 15 Jul 2022 14:00:00 GMT</pubDate><description><![CDATA[We have achieved SOC2 Type 2 compliance. We have a set of mature and robust security and availability practices at Roadie and wanted to validate them against industry standards. We see this achievement of SOC2 Type 2 compliance as a milestone in our ever-improving security journey.]]></description><content:encoded><![CDATA[<p>We are delighted to announce that we have achieved SOC2 Type 2 compliance across three areas: Confidentiality, Security, and Availability.</p><p>From the start, we built Roadie with security, availability, and privacy as fundamental values, and we recognize them as essential to our success. Our team is made up of people who have worked in large enterprise companies and scale-ups such as Workday, Spotify, and Intercom, so we are no strangers to enabling and ensuring good security practices. We understand that if you build these processes early, they will grow with your company and help you scale securely and reliably.</p><p>We have a set of mature and robust security and availability practices at Roadie and wanted to validate them against industry standards. We see this achievement of SOC2 Type 2 compliance as a milestone in our ever-improving security journey.</p><h2>What this means</h2><p>A SOC2 Type 2 report is one of the most well-known IT security and compliance auditing accreditations. It is highly comprehensive: it doesn’t look at any one business area in isolation.</p><p>An accredited external audit firm scrutinized Roadie’s engineering practices—such as our database security controls, monitoring and alarming, and testing methods—as well as the ecosystem within which these practices live. 
That ecosystem includes how we train our staff, who we hire, how we restrict access to data, and how we review the vendors we choose to use.</p><h2>Why is this important?</h2><p>Simply put, we want this report to give our customers even more peace of mind when choosing to trust Roadie with their data. The SOC2 Type 2 report shows that we have opened our doors to a third party and allowed them to test and scrutinize our security and availability practices.</p><p>As the old but fitting adage goes, “trust but verify.” Our goal is to provide you with confidence that we have robust, mature, and industry-standard practices that are monitored and updated frequently.</p><h2>A milestone, not the end goal</h2><p>The comprehensive nature of this audit affirmed our confidence that we are set up with excellent foundational security and availability practices which we can continue to build on as we scale.</p><p>We will continue to keep our compliance with SOC2 up to date, and we will undergo an annual audit to test our SOC2 compliance. We also aim to expand our compliance to additional standards as we grow.</p><p>If you would like to see a copy of our SOC2 Type 2 report, reach out to <a href="mailto:legal@roadie.io">legal@roadie.io</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Backstage Home plugin on Roadie]]></title><link>https://roadie.io/blog/backstage-home-plugin-on-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-home-plugin-on-roadie/</guid><pubDate>Mon, 04 Apr 2022 15:00:00 GMT</pubDate><description><![CDATA[The Backstage Home plugin is now available to all Roadie Backstage users. Learn what it is, why it exists, and how to use it on Roadie.]]></description><content:encoded><![CDATA[<p>The Home plugin provides a view on what’s important to the currently logged in Backstage user.</p><p>Until recently, the catalog has been the primary way to interact with Backstage. You could pick a software component from the list, and quickly get a sense of how that component is doing, who owns it etc.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2XMqG6pZx54pnBcA1dBtma/5ef492e0f82ec8983574fde7db01a05b/catalog-overview-page.png" alt="catalog-overview-page"></p><p>But a single software engineer typically has to manage and track multiple components at once, and it is inconvenient to have to visit the Overview page of each individual component in order to get a sense of what’s happening. Why can’t the info be co-located in one place?</p><p>Software engineers don’t just interface with software either. They also need up-to-date information on the latest goings on in their organization, and they even have to (begrudgingly 😃) attend meetings sometimes. Shouldn’t Backstage also be plugged into this part of an engineer’s job?</p><h2>The pulse of the team</h2><p>Tackling these problems is the remit of the Home plugin. It’s a Backstage page which you can visit first thing in the morning to take the pulse of the team.
It’s a page for bringing together information from many different systems, in a way which is most relevant to you!</p><p>Here’s how the Home plugin looks for me in Roadie Backstage:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2xJRM0p5M6MIkeg6u4N7NX/19bb36c861c1354fdc6e5de821d10c19/home-set-up.png" alt="home-set-up"></p><p>As you can see, important information from a number of sources is brought together here.</p><ol><li>I have <strong>quick access to the components that I own</strong>, and the entities that I have “starred” in the catalog. I frequently use these components when I’m doing Backstage demos, so it’s useful to have them within easy reach.</li><li><strong>My calendar is available</strong>, thanks to the Google Calendar plugin created by Alex Rybchenko from Box (<a href="https://github.com/backstage/backstage/pull/9719">#9719</a>). I can even click the Zoom links to go directly to a meeting.</li><li>I can see <strong>my open review and pull requests</strong> on GitHub. If the team need me to review something, I can see that any time I log into Backstage.</li><li>Finally I have a <strong>Roadie News widget</strong>. This displays content from a static markdown URL hosted on GitHub. It can be used to share organization news or information about upcoming events. We’ve also seen our customers use it to bookmark links to important pages outside Backstage.</li></ol><h2>Where this is going</h2><p>The Home plugin is super new but we’re already seeing amazing demand from our customer base.</p><p>At Roadie, we’re hard at work producing more widgets for the Home page.
We’d love to use this space to display the Jira tickets you’re working on or keep you updated on the builds running on your PRs, for example.</p><p>Hopefully, a good portion of the 60+ open-source Backstage plugins which already exist will end up having Home widgets added.</p><p>Once this community work happens, the vision of having a single place to get the pulse of your org, your team, your software and your own work will be realised.</p><h2>Learn more</h2><p>The Home plugin is available to all Roadie Backstage users. You may need to enable it in your Administration area before it is visible.</p><p>Learn more in our <a href="/docs/integrations/home-page/">Home plugin documentation</a>.</p>
]]></content:encoded></item><item><title><![CDATA[The Backstage scaffolder is now generally available on Roadie]]></title><link>https://roadie.io/blog/roadie-backstage-scaffolder-launch/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-backstage-scaffolder-launch/</guid><pubDate>Wed, 09 Mar 2022 16:30:00 GMT</pubDate><description><![CDATA[When we first launched hosted Backstage, we made the hard decision to disable the scaffolder for security reasons. Today we are launching a re-designed and hardened scaffolder architecture which is safe for Roadie customers to use.
]]></description><content:encoded><![CDATA[<h2>Introduction</h2><p>At Roadie, we think deeply about the security of every feature we roll out. This has sometimes slowed us down, or meant that we've had to run without some features or plugins available. For example, we've written at length about steps we take to <a href="/blog/avoid-leaking-github-org-data/">properly authenticate access for GitHub Apps</a>.</p><p>Last year, to ensure our customers' security, we made the hard decision to launch Roadie Backstage with the scaffolder disabled. This cost us some customers over the past 6 months, but it was worth it for security.</p><p>Today, after months of hard work, we are proud to launch scaffolder support with a completely re-designed and hardened architecture. This new architecture ensures that scaffolder tasks are run in an isolated and ephemeral environment which keeps customer data secure.</p><p>The scaffolder is available on all Roadie Backstage environments today.</p><p>To use it, <a href="/free-trial/">start a free trial here</a>, then check out <a href="/docs/getting-started/scaffolding-components/">our docs for the scaffolder</a>, and read our <a href="/blog/roadie-backstage-scaffolder-website/">walkthrough to learn how to write scaffolder templates</a>.</p><h2>What is the scaffolder?</h2><p>Imagine you're an engineer looking to create a new microservice. You want to get started as quickly as possible, with minimal boilerplate and red tape.
At the same time, engineering organizations benefit from having consistency in production, and often put gates in place to enforce it.</p><p>Instead of creating blockers for engineering teams, Spotify uses the scaffolder to quickly create new microservices, while helping to ensure that production remains mostly consistent.</p><p>Engineers can choose a pre-defined software template, fill out a few form fields to provide values like the name of the GitHub repo that the new service will occupy, and click a button to run the template and create the new service.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3hVEO7edySTmVrPxis1Nn1/67f958dcebdf5a404d69d808c32ce761/scaffolder.png" alt="A list of software templates in the Backstage interface. Each one has its own card with a title and description"></p><p>By making it easier to start new projects, your engineers can move more quickly, while preserving standards and reducing complexity in your tech ecosystem.</p><p>See it in action in this 3 minute video where we demo a scaffolder template which creates a GitHub pages website.</p><h2>Roadie's scaffolder architecture</h2><p>Before starting the work to enable the scaffolder, we audited the open source scaffolder actions one by one and found that the Backstage Scaffolder was running them in the API backend process of Backstage. Scaffolder tasks technically had access to the same resources available to the API backend.</p><p>Tasks that copy files or access network resources are a little risky even when running Backstage inside your corporate firewall. On a SaaS platform like Roadie, they are unacceptable.</p><p>To isolate our scaffolder jobs, we run them in a separate process in a private network on AWS ECS. A single container task is spun up for each execution and destroyed once it completes.</p><p>The container can only access its own Backstage database and the public internet.
The container does not have access to the network services available to the API backend process and it cannot do things like copy files from the API backend service's local file system.</p><p>We already support the most frequently used scaffolder actions on Roadie. You can fetch pre-defined templates, use them to create GitHub repositories and write to the Backstage catalog.</p><p>You can even send HTTP requests to the public internet, using <a href="/backstage/plugins/scaffolder-http-requests/">an open source library we created</a>, so that newly templated microservices can automatically register with your SaaS tools like Circle CI or PagerDuty.</p><p>The full list of supported scaffolder actions is available inside your Roadie Backstage instance at <code>https://[sub-domain].roadie.so/create/actions</code>.</p><h2>Next steps</h2><p>We're going to continue working on the scaffolder to make it faster, more fully featured and more secure. We're already thinking about features such as custom container support and fully custom scaffolder actions.</p><p>We've also begun publishing more <a href="https://github.com/RoadieHQ/roadie-backstage-plugins/tree/main/plugins/scaffolder-actions">open-source modules for the scaffolder</a>. We've already created a general utils package and some dedicated AWS actions.</p><p>If there's something you'd like to see, please reach out on <a href="https://discord.gg/W3qEMhmx4f">our public Discord channel</a>.</p>
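<p>As a rough sketch of the HTTP request capability mentioned above, a template step using the <code>http:backstage:request</code> action from that library might look like the following. The step id, proxy path, and body fields are hypothetical placeholders, and the proxy itself would need to be configured separately, so treat this as an illustration rather than a working integration.</p><pre><code class="language-yaml">steps:
  - id: registerWithMonitoring
    name: Register with monitoring tool
    action: http:backstage:request
    input:
      method: 'POST'
      # Hypothetical proxy path; point this at a proxy you have configured.
      path: '/api/proxy/my-monitoring-tool/checks'
      headers:
        content-type: 'application/json'
      body:
        # Placeholder payload; the real fields depend on the target API.
        name: 'my-new-website'
</code></pre>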
]]></content:encoded></item><item><title><![CDATA[Deploy a GitHub pages website with the Roadie Backstage scaffolder]]></title><link>https://roadie.io/blog/roadie-backstage-scaffolder-website/</link><guid isPermaLink="false">https://roadie.io/blog/roadie-backstage-scaffolder-website/</guid><pubDate>Fri, 25 Feb 2022 14:00:00 GMT</pubDate><description><![CDATA[Learn how to deploy a GitHub pages website from a predefined skeleton. The Backstage Scaffolder will automatically create a GitHub repo, fill it with templated code, call the GitHub API to enable GitHub pages and hook our new website to a monitoring tool.]]></description><content:encoded><![CDATA[<h2>Introduction</h2><p>In this tutorial we are going to learn how to deploy a customized GitHub pages website with the Backstage scaffolder on Roadie.</p><p>The website we create will be...</p><ol><li>Created from a prepared code skeleton.</li><li>Published from its own GitHub repo.</li><li>Automatically deployed to the public web via GitHub pages.</li><li>Customized with information we collect from the user when they run the scaffolder.</li><li>Available in the Backstage catalog so others can find it.</li><li>Automatically hooked up to a monitoring service.</li></ol><p>The skills you will learn include:</p><ol><li>How to write scaffolder code skeletons and templates.</li><li>How to collect input from the user in the scaffolder UI, and pass it through to the codebase.</li><li>How to make HTTP requests from the scaffolder, and use the Roadie Backstage proxy to securely add authentication to the requests.</li><li>How to create proxies in Roadie.</li></ol><p>This tutorial involves some steps which are specific to Roadie and won’t work on a vanilla Backstage installation. To try Roadie, <a href="/free-trial/">apply for a free trial on our website</a>.
You will also see the best results if you are an admin of your Roadie Backstage instance.</p><h3>Just show me it working!</h3><p>If you'd prefer to see the scaffolder in action before working through this tutorial, you can watch this short demo video. It demonstrates the same GitHub Pages scaffolder action we create in the steps that follow.</p><p>This video is part of the <a href="/backstage-bites/">Backstage Bites series</a>.</p><h2>Step 1: Create a GitHub repo containing code from a skeleton directory.</h2><p>A Backstage Scaffolder template will usually consist of at least two things:</p><ol><li>A skeleton code structure, which is used to stamp out new websites, services, or other types of component.</li><li>A <code>template.yaml</code> file which describes the steps to run during the scaffolding process.</li></ol><h3>Create a basic skeleton and template</h3><p>To get started, make a directory structure to hold our template and the code skeleton we will use to stamp out our GitHub pages website.</p><pre><code>.
├── skeleton
│   └── index.html
└── template.yaml
</code></pre><p>Put the following HTML into the <code>index.html</code>:</p><pre><code class="language-html">&#x3C;!DOCTYPE html>
&#x3C;html lang="en">
&#x3C;head>
    &#x3C;meta charset="UTF-8">
    &#x3C;meta name="viewport" content="width=device-width, initial-scale=1.0">
    &#x3C;title>my website&#x3C;/title>
&#x3C;/head>
&#x3C;body>
  &#x3C;h1>Welcome to my website&#x3C;/h1>
  &#x3C;p>This website was created by following the &#x3C;a href="https://roadie.io/blog/roadie-backstage-scaffolder-website/">Backstage scaffolder tutorial published by Roadie&#x3C;/a>&#x3C;/p>
&#x3C;/body>
&#x3C;/html></code></pre><p>And place the following YAML into <code>template.yaml</code>. We’ll explore how this works at a later point in the tutorial. For now, let’s just get it working.</p><p>Optionally, you may wish to edit the <code>spec.owner</code> property to reference the name of a real Group (aka. a team) in your Backstage catalog.</p><pre><code class="language-yaml">apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: github-pages-website
  title: GitHub Pages Website
  description: Create a static HTML website and publish it via GitHub pages.

spec:
  owner: my-group-name
  type: website
  parameters:
    - title: Choose a Source Control Management tool to store your new website in.
      required:
        - repoUrl
      properties: 
        repoUrl:
          title: Repository Location
          type: string
          ui:field: RepoUrlPicker
          ui:options:
            allowedHosts:
              - github.com

  steps:
    - id: template
      name: Fetch Skeleton + Template
      action: fetch:template
      input:
        url: ./skeleton

    - id: publishToGitHub
      name: Publish to GitHub
      action: publish:github
      input:
        allowedHosts: ['github.com']
        # This will be used as the repo description on GitHub.
        description: 'A static HTML website. Just like the good old days.'
        repoUrl: ${{ parameters.repoUrl }}
        defaultBranch: main
        repoVisibility: public

  output:
    remoteUrl: '${{ steps.publishToGitHub.output.remoteUrl }}'
</code></pre><p>Now that we have a basic <code>template.yaml</code> and skeleton, we need to push it to GitHub where the Roadie Backstage scaffolder can access it. You must install the Roadie GitHub App before proceeding. To learn how to do that, read <a href="/docs/getting-started/getting-started-for-admins/">our Getting Started docs</a>. Turn your directory structure into a Git repository and push it to GitHub.</p><h3>Import your template into Roadie Backstage</h3><p>Copy the URL of the <code>template.yaml</code> file on GitHub and go to Roadie Backstage to import it into the catalog.</p><p>On the main catalog view, click the “CREATE COMPONENT” button, then click the “REGISTER EXISTING COMPONENT” button.</p><p>Paste the URL of the template.yaml into the URL text field and click “ANALYZE”.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/56NGuSpvh0pRVk7YxGdYyR/ba810cc50965d13b14c0b13c7868257e/analyze-template.png" alt="analyze-template"></p><p>Assuming there are no errors, click the “IMPORT” button.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2ktyndMeIG9S7Takb5zxQ/b40712c585b769122008d5e84574a3ca/import-template.png" alt="import-template"></p><p>Great. Your template is now imported into Backstage and ready to use.</p><p>Click the “Create...” link in the Backstage sidebar to go back to the main scaffolder view. You should see your template is now available.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6RoplZoqC5vmvkOEhcdtdR/6ea4e3c2398792c6e70dc2309bbbd5e2/template-imported.png" alt="template-imported"></p><h3>Run the template</h3><p>Click “CHOOSE” and you should be taken to a form like this:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1FTqegk6x01ysm8yJo0Ixg/1ff3801ca572131afbfd04712ca83c00/choose-source-code-tool.png" alt="choose-source-code-tool"></p><p>Fill in the name of your GitHub Org in the “Owner” field. 
This template will use the Roadie Backstage GitHub app to run the scaffolder, so the Org must be the same org that the Roadie Backstage GitHub app is installed into.</p><p>Choose an arbitrary GitHub repository name and fill it into the “Repository” field. Something like “my-cool-website” will work. This repository should not exist yet. The scaffolder will create it as it runs through the steps.</p><p>Click “NEXT STEP” and then “CREATE”.</p><p>After 15 or 20 seconds, you should see scaffolder logs begin to appear. These logs will give you information about what the scaffolder is doing, and will display errors if failures occur.</p><p>Assuming everything worked correctly, you will see checkmarks appear in the left column of the scaffolder interface as it does its thing.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6mvmeIt6GQIKBZJXX4A3I4/a04050ad59347752baf62b5be124b29d/task-activity-page.png" alt="task-activity-page"></p><p>Click the link labelled “Repo” in the left sidebar to visit the GitHub repo which we just created. You should see it contains an <code>index.html</code> with the same content we placed in the HTML file earlier.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4KH8i8Ig7tLOC6JBYJNyHN/1a4e0e5edc6233a3422d1397d90d31fe/index-html-created-on-github.png" alt="index-html-created-on-github"></p><h2>Step 2: Publish the site to GitHub Pages</h2><p>Having a repo with some code in it is great, but it’s not a website until you can visit it in the browser.</p><p>To turn this HTML into a GitHub pages website, we can manually visit the settings of our newly created GitHub repo and click some buttons to publish the site to the web. We don’t like manual work though. 
Let’s see if we can do this in an automated fashion instead.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6x8NfmAnjWSZQ1ntBXsI4R/804a04a56d89b79402a7532bf7a72ce5/manual-turn-on-github-pages.png" alt="manual-turn-on-github-pages"></p><p>To make this work, we’re going to use the Open Source <code>http:backstage:request</code> scaffolder action from Roadie. You can see this package on <a href="https://github.com/RoadieHQ/roadie-backstage-plugins/tree/main/plugins/scaffolder-actions/scaffolder-backend-module-http-request">GitHub.</a> Don’t worry about following any of the installation steps. This package is already installed in Roadie Backstage.</p><p>This scaffolder action allows us to send HTTP requests from our <code>template.yaml</code>. We will use it to hit the GitHub API endpoint which can <a href="https://docs.github.com/en/rest/reference/pages#create-a-github-pages-site">turn on GitHub pages for a GitHub repository</a>.</p><h3>Using the HTTP Backstage Request scaffolder action</h3><p>Add a new step called <code>publishToWeb</code> to your <code>template.yaml</code>, commit it and push it to GitHub.</p><pre><code class="language-yaml"># ... spec and other properties above this point
  steps:
    # ... existing template and publishToGitHub steps here.

    - id: publishToWeb
      name: Publish to web with GitHub Pages
      action: http:backstage:request
      input:
        method: 'POST'
        path: /api/proxy/github/api/repos/${{ (parameters.repoUrl | parseRepoUrl)["owner"] }}/${{ (parameters.repoUrl | parseRepoUrl)["repo"] }}/pages
        headers:
          content-type: 'application/json'
        body:
          source:
            branch: main
            path: '/'
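
    # Worked example of the templated path above (the names are hypothetical):
    # if the user enters the owner "my-org" and the repository "my-cool-website",
    # the scaffolder renders the path as
    #   /api/proxy/github/api/repos/my-org/my-cool-website/pages
    # before the request is sent to the Roadie Backstage proxy.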
</code></pre><p>Let’s go through this YAML to learn what it’s doing.</p><pre><code class="language-yaml">- id: publishToWeb
  name: Publish to web with GitHub Pages
  action: http:backstage:request
</code></pre><p>The first three lines are relatively self explanatory. We’re adding a new step with an <code>id</code> and a human readable <code>name</code>. Then we’re declaring that we’re going to use the <code>http:backstage:request</code> action. If you ever want a list of all the available actions, just visit <code>/create/actions</code> inside Roadie Backstage.</p><pre><code class="language-yaml">input:
  method: 'POST'
  path: /api/proxy/github/api/repos/${{ (parameters.repoUrl | parseRepoUrl)["owner"] }}/${{ (parameters.repoUrl | parseRepoUrl)["repo"] }}/pages
  headers:
    content-type: 'application/json'
  body:
    source:
      branch: main
      path: '/'
</code></pre><p>The <code>input</code> is passed to the <code>http:backstage:request</code> action when it runs. Our input says we want to make a <code>POST</code> request with a particular <code>body</code> which satisfies the requirements of the GitHub API. We also want to pass some <code>headers</code>. These are the same details you might expect to see if we were calling this API endpoint with <code>curl</code> or Postman.</p><p>The <code>path</code> is interesting. Instead of calling the GitHub API directly, we’re proxying through the Roadie Backstage backend. As the request passes through Roadie Backstage, the proxy will transparently add an authentication token to the request. This means we don’t have to hardcode the authentication token into the <code>template.yaml</code>, or ask the user to provide it at runtime.</p><p>The first part of the path, <code>/api/proxy/github/api</code>, contains the location of the proxy on the Backstage API. Everything after <code>github/api</code> is the path of the API endpoint we want to hit on GitHub.</p><p>The GitHub path we want to call is <code>/repos/[owner]/[repository]/pages</code>. Of course, we don’t know the name of the owner and repository until the user runs the template. To work around this, we use a special syntax to parse them out at runtime.</p><p>We can get the “Owner” the user types in with <code>${{ (parameters.repoUrl | parseRepoUrl)["owner"] }}</code> and we can get the “Repository” the user types in with <code>${{ (parameters.repoUrl | parseRepoUrl)["repo"] }}</code>.</p><h3>Storing the GITHUB_TOKEN securely</h3><p>Our proxy knows how to forward requests over to GitHub, but it doesn’t yet have the correct token to be able to successfully authenticate. Let’s securely store an authentication token in Roadie so the proxy can use it.</p><p>Click “Secrets” in the left sidebar of the SETTINGS tab on Roadie Backstage.</p><p>At the top of the list you should see a row for “GITHUB_TOKEN”. 
Visit your GitHub account settings and <a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token">create a personal access token</a> which has the <code>pages</code> scope. Be sure to authorize it on your GitHub org if you use SSO.</p><p>Click the pencil icon on the secrets page to open a dialog box. Paste in the personal access token you just created and click “SAVE”.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7wz1toqR5yzzzFxQC4CUmd/2819f288da4b0d7f5b4c1cfb54d8c20a/save-github-token-secret.png" alt="save-github-token-secret"></p><p>Roadie Backstage has to restart to enable this token. This can take a few seconds. Wait for the GITHUB_TOKEN table row to display a green status indicator and the text “Ready” before proceeding.</p><h3>Putting it all together</h3><p>Now we have a template which sends a HTTP request to a proxy and we have a secret which the proxy will use to authenticate with GitHub.</p><p>Since we pushed the new version of our <code>template.yaml</code> a while ago, the Backstage catalog should have already looped over it and picked up the new version with the call to the GitHub pages API in it.</p><p>Go back to your scaffolder template and fill out the details again. Remember, GitHub repos must have unique names, so if you haven’t deleted the repo you created in step 1, you’ll have to choose a different name this time.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6w2ThGaScwLr8oTRkVq9rC/aadd293d0263feb4451c64ba2a3bb373/choose-source-code-tool-2.png" alt="choose-source-code-tool-2"></p><p>Click “NEXT STEP” and “CREATE”.</p><p>This time you should see that we have 3 steps due to run in the left sidebar of the Task Activity page. 
The scaffolder is going to:</p><ol><li>Fetch Skeleton + Template</li><li>Publish to GitHub</li><li>Publish to web with GitHub Pages</li></ol><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4vPLT5ySSl8u4mxfnAKEqo/cd7046a5f1e8e94f3e1df29bc2bc8519/task-activity-page-2.png" alt="task-activity-page-2"></p><p>Success! Now visit the repo you just created on GitHub and go to the Pages part of the Settings.</p><p>Visit the following URL to see your new website:</p><pre><code>https://[owner].github.io/[repository]/
</code></pre><p>It doesn’t look like much but it’s a start!</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2r62E8xWLmQymLQ3kFRz2f/f061babfe71f5cfdc20e1d6b50f6688e/website-is-live.png" alt="website-is-live"></p><h2>Step 3: Customize the website</h2><p>Creating a website from a template is great, but it would be better if we were able to customize it a little. We don’t want our website looking exactly like hundreds of other scaffolded websites 😃.</p><p>For customization, the scaffolder lets you pass values through to the template as it is being processed. To see how this works, let’s collect a website name from the user and use it in the title tag and main heading.</p><p>First, we have to update the <code>index.html</code> in our website to render a value called <code>website_name</code>. This will be provided by the user when they run the scaffolder task.</p><pre><code class="language-html">&#x3C;!DOCTYPE html>
&#x3C;html lang="en">
&#x3C;head>
    &#x3C;meta charset="UTF-8">
    &#x3C;meta name="viewport" content="width=device-width, initial-scale=1.0">
    &#x3C;title>${{ values.website_name | title }}&#x3C;/title>
&#x3C;/head>
&#x3C;body>
  &#x3C;h1>Welcome to ${{ values.website_name }}&#x3C;/h1>
  &#x3C;p>This website was created by following the &#x3C;a href="https://roadie.io/blog/roadie-backstage-scaffolder-website/">Backstage scaffolder tutorial published by Roadie&#x3C;/a>&#x3C;/p>
&#x3C;/body>
&#x3C;/html></code></pre><p>We also need to update our template to make it ask the user for the website name.</p><p>Add the following YAML to the <code>template.yaml</code> in the <code>parameters</code> section, above the item which asks for a repo owner and repository.</p><pre><code class="language-yaml">- title: Provide some simple information
  required:
    - website_name
  properties:
    website_name:
      title: Website name
      type: string
      description: This will be displayed prominently on your website and in the title tag.
</code></pre><p>Finally, we need to name and pass the collected website name value down into the templating step so it is available in the <code>index.html</code>.</p><pre><code class="language-yaml">steps:
  - id: template
    name: Fetch Skeleton + Template
    action: fetch:template
    input:
      url: ./skeleton
      values:
        website_name: ${{ parameters.website_name }}
</code></pre><p>Commit and push those changes. Once the catalog refreshes the template from GitHub, you should now see a new text field in the interface. If you don’t want to wait, you can manually reimport the template to refresh it.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/69UagRPRpd8ry5kmsGyKQ5/d26b5dfe591e66183abd72563471e2c4/website-name-parameter.png" alt="website-name-parameter"></p><p>If we fill that in with a website name and proceed through the rest of the steps as before, we should eventually end up with a new GitHub pages website which has a unique name. In this example, I typed “My Cool Site 3” into the text field.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/KjVzAmBb6iOAtSUxCFXD0/fe691e628044cf703172d1f3a5936969/customized-website.png" alt="customized-website"></p><h2>Step 4: Add a catalog-info to the new service</h2><p>This new website looks promising, but it would be even better if it was automatically added to the Roadie Backstage catalog so that other people in your company could discover it.</p><p>To make this happen, we can add a <code>catalog-info.yaml</code> to our skeleton codebase and pass some values into it to set sensible defaults. Once this YAML file is created, we can rely on Roadie Backstage’s auto-discovery mechanism to pick it up.</p><p>Create a file named <code>catalog-info.yaml</code> inside the skeleton directory we created earlier. Place the following content in it:</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.repo_name }}
  title: ${{ values.website_name | title }}
  description: A static HTML website. Just like the good old days.
  links:
    - url: https://${{ values.repo_owner }}.github.io/${{ values.repo_name }}/
      title: Live website
  annotations:
    github.com/project-slug: ${{ values.repo_owner }}/${{ values.repo_name }}
spec:
  type: website
  owner: engineering
  lifecycle: experimental
</code></pre><p>You can see that this file relies on a few values which contain information about the website we are templating. In order to have access to these values, we need to pass them in from the <code>template.yaml</code>. Edit the step named “template” to pass in the values.</p><pre><code class="language-yaml">steps:
  - id: template
    name: Fetch Skeleton + Template
    action: fetch:template
    input:
      url: ./skeleton
      values:
        website_name: ${{ parameters.website_name }}
        repo_name: ${{ (parameters.repoUrl | parseRepoUrl)["repo"] }}
        repo_owner: ${{ (parameters.repoUrl | parseRepoUrl)["owner"] }}
</code></pre><p>Next time we run this template, we will see that an extra file called <code>catalog-info.yaml</code> is created in the newly scaffolded GitHub repo. Roadie Backstage’s auto discovery will automatically find this file and use it to populate the Backstage catalog.</p><p>You may notice that we have specified some <code>links</code> in the <code>catalog-info.yaml</code>. This metadata will automatically populate the Links Backstage plugin with a link to our website on the public internet. Clicking the link will take the user to GitHub pages.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4GgahN2FSswkRME696KtTA/43b5b1f49dc675156fbae69663686478/link-to-live-github-pages.png" alt="link-to-live-github-pages"></p><p>If you don’t see this card, ask a Roadie Backstage admin to add the <code>EntityLinkCard</code> to your website component overview page.</p><h2>Step 5: Register the service with Better Uptime</h2><p>Calling the GitHub Pages API was relatively simple, because there was already a proxy in place to use. But what if we want to call an authenticated endpoint which doesn’t have a default proxy?</p><p>To see how this works, we’re going to create our own proxy for the website monitoring service, <a href="https://betteruptime.com">Better Uptime</a>, and then use the HTTP Request scaffolder action to set up a website ping when our scaffolder template is executed.</p><p>Roadie has no affiliation with Better Uptime. They have an easy to use UI and a free tier, which is useful for a tutorial like this.</p><h3>Create a monitor in Better Uptime</h3><p>The first thing you will need to do is to sign up for <a href="https://betteruptime.com">Better Uptime</a>. Their free plan allows more than enough features to get through this tutorial.</p><p>Once you have an account you will need to take note of your API token. Click Integrations in the sidebar, then APIs in the secondary header. 
Click the Copy to clipboard link.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1ioyUG2Fm9rTdV7KhutZ83/ff430afa09b39e5687e9b3600f492ed3/better-uptime-api-key.png" alt="better-uptime-api-key"></p><p>To see how the Better Uptime API works, let’s create a monitor for <a href="https://google.com">https://google.com</a> as a test.</p><p>Run the following curl command, making sure to substitute <code>&#x3C;API_TOKEN></code> with the API token you copy/pasted from the Better Uptime integrations page.</p><pre><code class="language-bash">curl -X POST \
     --header 'Authorization: Bearer &#x3C;API_TOKEN>' \
     --header 'Content-Type: application/json' \
     --url https://betteruptime.com/api/v2/monitors \
     --data '{"url":"https://google.com"}'
</code></pre><p>Assuming that runs successfully, you should see a monitor has been created in Better Uptime.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/ki4bUvduz26gBtE83FL1b/527587bdf830c8b4673ca727d57fe109/better-uptime-google-monitor.png" alt="better-uptime-google-monitor"></p><p>Now that we know how to do that with curl and Google, let's see how to do it with the scaffolder and our GitHub Pages website.</p><h3>Calling the Better Uptime API from the scaffolder</h3><p>To set up monitoring for the website we create with the scaffolder, all we need to do is reimplement this curl command in our <code>template.yaml</code> using the HTTP Backstage Request module.</p><p>The YAML step to accomplish that looks like this:</p><pre><code class="language-yaml">steps:
  # all of the previous steps already discussed

  - id: registerInBetterUptime
    name: Register in Better Uptime
    action: http:backstage:request
    input:
      method: 'POST'
      path: /api/proxy/betteruptime/monitors
      headers:
        content-type: 'application/json'
      body:
        url: https://${{ (parameters.repoUrl | parseRepoUrl)["owner"] }}.github.io/${{ (parameters.repoUrl | parseRepoUrl)["repo"] }}/
</code></pre><p>In this case, we are sending a POST request to the path <code>/api/proxy/betteruptime/monitors</code>. This path doesn’t currently exist, so let’s create it using Roadie’s proxy UI.</p><h3>Create a proxy for Better Uptime</h3><p>To create a proxy, click “Administration” at the bottom of the main sidebar, then go to the “SETTINGS” tab and click “Proxy” in the minor left sidebar.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4WSQwrCtntxAG7xe9fsekI/fb53cfc710136113b49f9c8b6bb440f7/empty-proxy-area.png" alt="empty-proxy-area"></p><p>Click “ADD PROXY” to add a proxy.</p><p>There are a lot of fields and options available here. We don’t need to understand all of them to complete this tutorial, so we will focus on the most important ones.</p><p>“Path” is the path on which we will be able to call our proxy. This must match a part of the <code>path</code> we added to our <code>template.yaml</code> earlier. In our case, this is <code>/betteruptime</code>.</p><p>“Target” is the URL which we want to forward requests on to. The root of the Better Uptime API is at <code>https://betteruptime.com/api/v2</code>, so in this case that is our Target.</p><p>Click the Advanced Settings to see some more options.</p><p>The “Allowed Methods” field can be used to restrict the HTTP methods that the proxy will accept. For example, if we choose <code>GET</code> and <code>POST</code>, then the proxy will refuse to forward <code>DELETE</code> requests. In this case, we only need to enable <code>POST</code> requests.</p><p>The “Headers” section can be used to add headers to the request as it is being forwarded on to the Target. 
This is how we will add an authentication token to our requests.</p><p>To make a Better Uptime proxy for our scaffolder template request, fill out the following:</p><ol><li>Path = <code>/betteruptime</code></li><li>Target = <code>https://betteruptime.com/api/v2</code></li><li>Allowed Methods = <code>POST</code></li><li>Check the “Secure” and “Change Origin” checkboxes at the bottom of the page.</li></ol><p>To add the authentication token to the request, create a header called <code>authorization</code> and set its value to Bearer <code>${CUSTOMER_TOKEN_1}</code>. When the request is proxied, <code>${CUSTOMER_TOKEN_1}</code> will be replaced with an actual token, but we don’t want to store that here in plain text. Instead, we will use Roadie’s secure secrets functionality for this.</p><p>We also need to add a <code>Content-Type</code> header which specifies <code>application/json</code>.</p><p>Our proxy settings now look something like this:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2XknpFCHGa9NDOOMKdF8AE/dae36b209541ee15cf6b2465594384d0/better-uptime-proxy-config.png" alt="better-uptime-proxy-config"></p><p>Click “SAVE” and then “APPLY &#x26; RESTART” to create the proxy. You will need to wait approximately 3 minutes for the settings to be applied.</p><h3>Securely store the Better Uptime token</h3><p>In the previous section, we created an authorization header with the placeholder <code>${CUSTOMER_TOKEN_1}</code>. 
In order for this placeholder to be replaced with the actual authentication token, we have to store the token in the Roadie Secrets area, just like we did previously with the GITHUB_TOKEN.</p><p>To store the token securely, visit the Secrets page in the Roadie Backstage Administration area and set the <code>CUSTOMER_TOKEN_1</code> to the value of the API token we got from Better Uptime.</p><h3>Re-run the scaffolder job</h3><p>Assuming those steps all worked correctly, we should now be able to re-run our scaffolder template and create a GitHub pages website which is hooked up to Better Uptime.</p><p>Go back to the Roadie Backstage scaffolder, choose your GitHub pages template and fill in the website name and GitHub owner and repo name again. Remember to use unique values. Hit CREATE and watch the magic happen.</p><p>This time, 4 steps run together, including the new Better Uptime step.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/cS8MhqDQ1BBeoQQpnfkT0/f9b8f8bf3660ee1530b9d79a2a3142b9/task-activity-better-uptime.png" alt="task-activity-better-uptime"></p><p>If we check out the Better Uptime monitors page, we should see that our website is registered. Initially it will be reporting as “DOWN”. That happens because it takes GitHub pages a minute or two to publish the website. Once it goes live, Better Uptime will update and go green.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5whLrZKe7nq7GcmxOpnIFC/5f83fc1a8321dd1a4cecf2b89a4bd7fa/better-uptime-monitor.png" alt="better-uptime-monitor"></p><h2>Conclusion</h2><p>Now that you’ve learned how to use the templates, proxy and secrets together, you should be able to apply this knowledge to other tasks. 
For example,</p><ol><li>Try to add some basic TechDocs to the skeleton so that the docs show up in Backstage automatically.</li><li>Register the website with an error tracking tool like Sentry.</li><li>Enable branch protection on the GitHub repo created by your scaffolder template.</li></ol><p>The scaffolder is capable of automating all of these tasks to help your org be more productive and more standardized.</p><h3>Future work in Roadie</h3><p>We love this functionality already, but we also believe we have more work to do to make it the best it can be. Here are some of the areas we will be looking to improve next:</p><ol><li>Custom secrets so you can add your own secrets and choose their labels.</li><li>Custom scaffolder actions so you can run completely custom code in the scaffolder.</li><li>Open source scaffolder packages such as our utils package and an AWS package.</li></ol><p>To learn about these new features and releases as they roll out, please join our <a href="/backstage-weekly/">Backstage Weekly newsletter</a>.</p>
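<p>As a rough sketch of the third suggestion in the conclusion, a branch protection step could reuse the same <code>http:backstage:request</code> action and GitHub proxy that we used to turn on GitHub Pages. The step <code>id</code> and <code>name</code> below are hypothetical. The sketch also assumes that the GitHub proxy in your Roadie instance forwards <code>PUT</code> requests and that the token stored in GITHUB_TOKEN is allowed to administer the new repository. The path and body are modelled on GitHub's “update branch protection” REST endpoint, so check the GitHub documentation before relying on it.</p><pre><code class="language-yaml">steps:
  # ... existing steps here. This step must run after publishToGitHub.

  # Hypothetical step: the id and name are illustrative only.
  - id: protectMainBranch
    name: Protect the main branch
    action: http:backstage:request
    input:
      method: 'PUT'
      # Same proxy prefix as the GitHub Pages call, followed by the GitHub API path.
      path: /api/proxy/github/api/repos/${{ (parameters.repoUrl | parseRepoUrl)["owner"] }}/${{ (parameters.repoUrl | parseRepoUrl)["repo"] }}/branches/main/protection
      headers:
        content-type: 'application/json'
      body:
        # GitHub expects all four keys to be present. null turns a rule off.
        required_status_checks: null
        enforce_admins: true
        required_pull_request_reviews:
          required_approving_review_count: 1
        restrictions: null
</code></pre><p>Place a step like this after <code>publishToGitHub</code>, because the <code>main</code> branch has to exist before it can be protected.</p>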
]]></content:encoded></item><item><title><![CDATA[10 reasons to get Backstage from Roadie]]></title><link>https://roadie.io/blog/10-reasons-to-get-backstage-from-roadie/</link><guid isPermaLink="false">https://roadie.io/blog/10-reasons-to-get-backstage-from-roadie/</guid><pubDate>Thu, 27 Jan 2022 16:00:00 GMT</pubDate><description><![CDATA[From getting started faster to growing adoption faster. We believe Roadie is the right choice for many companies. Here's why.]]></description><content:encoded><![CDATA[<p>At Roadie, we ❤️  Backstage. We’ve talked to countless companies who have adopted and are getting value from the technology.</p><p>Self-hosted Backstage remains the right choice for many organizations. Large organizations with thousands of developers and abundant resources will be able to staff a team to deploy, customize and manage Backstage.</p><p>For companies with a few hundred developers, it’s not so easy. Every engineering hour spent on internal tools is an hour that could be spent delivering customer-facing value. We believe Roadie is the right answer in this situation.</p><p>Below are 10 reasons why Roadie might be right for your company.</p><h2>1) Roadie cuts the time to value</h2><p>Internal friction on the path to production can mean it takes longer than you expect to get a production deployment of self-hosted Backstage running. Consideration must be given to the TechDocs pipeline, search, authentication, config management and other complications.</p><p>When adopting new tools, speed and momentum are important. If the migration and ramp up takes too long, the initiative will lose steam before it even gets going.</p><p>Before joining Roadie, software engineer Miklós Kiss worked on the Backstage implementation at Prezi. He spent multiple weeks working with a colleague to get Backstage deployed to production there.</p><blockquote><p>A big chunk of the time went to understanding what Backstage is, how it works, what plugins are and how to utilize them. 
We spent time fighting with the GitHub rate limit, figuring out the authentication, adding telemetry into it etc. It was a lot of work!</p><p>Miklós Kiss</p></blockquote><p><strong>With Roadie, in less than an hour you can go from clicking the “Request a free trial” button to having a catalog populated with components, basic TechDocs for documentation, and plugins installed and integrated.</strong></p><h2>2) We handle the upgrades</h2><p>Once you’ve gotten it running, you have to keep it up to date. We’ve all seen examples of self-hosted software which is way out of date and generally unmaintained.</p><p><strong>At Roadie, we upgrade every Backstage instance approximately once per week, and you’re typically not much more than 2 weeks behind the latest release.</strong></p><p>We can do this in a cost-effective way because we have economies of scale working in our favor. Doing the same upgrades yourself costs valuable engineering time each week, and you may want to keep your engineers focussed on customer-facing work.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/hUaXYsZSlZ0vWzUKSURe9/c9bf2f839b6befbdb29277adb5139582/intercom-inbound.png" alt="intercom-inbound"></p><p>Upgrades can and do go wrong. At Roadie, we perform automated and manual verifications against each release to try and ensure it is going to roll out cleanly. Broken features cause frustrated users, and nobody wants to adopt a tool they believe will be flaky.</p><h2>3) We help you <em>adopt</em> Backstage</h2><p>Of course, <strong>deployment and maintenance is only half the battle.</strong> The other side of the challenge is adoption. We’ve built features into Backstage to help your engineering teams get the most from the technology.</p><p>For example:</p><ol><li>We automatically syntax check and validate your Backstage metadata YAMLs before they’re ingested. 
This catches errors which can prevent components from showing up in the catalog as expected.</li><li>We provide a Locations Log where users can go to understand why their components are not appearing as they expect.</li><li>We’ve produced <a href="/docs/getting-started/overview/">simple, user focussed documentation</a> which helps your end users get started quickly.</li></ol><p>These measures work together with a healthy dose of support provided by our customer communications channels. We’d like your org to get maximum value from Backstage while ensuring you don’t need a full-time staff employed to answer questions from your engineers.</p><p>If you want to learn more about the features Backstage provides, check out our <a href="/backstage-spotify/">Ultimate Guide to Backstage by Spotify</a>.</p><h2>4) We prioritize security</h2><p>We take security seriously at Roadie. Our founding team comes from enterprise companies like Workday and Spotify and we understand what it takes to keep data and processes safe.</p><p>Some of our security measures include:</p><ol><li>Thoughtful and careful design of every part of our architecture. For example, you can read about <a href="/blog/avoid-leaking-github-org-data/">how we designed our GitHub integration</a> to prevent cross-org access.</li><li>Multiple GitHub apps so you can choose the level of access you grant to Roadie Backstage.</li><li>Frequent and extensive third-party pen-testing. We often provide the reports to security teams as we go through procurement.</li><li>Running certification programs. We’re working on SOC 2 Type II and expect to be certified by September 2022.</li></ol><p><strong>At Roadie, we’re always working to improve security.</strong> Please <a href="/request-demo/">request a demo</a> if you would like to hear more about what we’re doing to keep customer data safe.</p><h2>5) You don’t have to edit the code</h2><p>Many people expect that the Backstage repository works like a standard UI application. 
You clone the repository, run it and start using it immediately.</p><p>In reality, it’s more like <a href="https://github.com/facebook/create-react-app">create-react-app</a>. It’s a framework or set of components and plugins that you can compose together to make a developer portal for your organization.</p><p>This means that changes like adding a plugin to the Backstage interface require editing the code, committing and re-deploying.</p><p><strong>At Roadie, we’ve built a drag and drop composer on top of the normal Backstage plugins, so adding a plugin takes a couple of clicks.</strong></p><p>Configuration is handled in a similar way. Want to set up the Kubernetes plugin? Just head to the administration area and add a cluster via the UI.</p><h2>6) 22+ plugins work straight out of the box</h2><p>Backstage wouldn’t be much without its plugins. From TechDocs documentation to Kubernetes integration, it’s the plugins which give Backstage much of its discoverability value and power.</p><p><strong>We support all of the best Backstage plugins and if there is something you don’t see, we typically integrate it for you within a couple of hours.</strong></p><p>Not only do we support all the best plugins, <a href="https://github.com/RoadieHQ/roadie-backstage-plugins">we actually built some of them</a>. We’ve created 12+ open-source Backstage plugins which are free for the community to use. Our open-source team is always looking for inspiration, and we frequently take customer feedback on board when deciding where to focus our efforts.</p><h2>7) You can bring your own plugins</h2><p>Every company has home-grown tools and technologies that only make sense in the context of the place they were invented. 
Sometimes our customers want to build Backstage plugins around these tools so they can be more easily discovered by other engineers in their company.</p><p><strong>At Roadie, every Growth Plan customer gets a private artefact repository where they can publish vanilla Backstage plugins.</strong> These plugins integrate and run inside Roadie Backstage just like all the normal open-source plugins. You don’t have to do any special magic to your plugins to make this work. Just <code>npm publish</code> them using your normal <code>npm</code> workflow, and we handle the rest.</p><h2>8) We track the community</h2><p>With 50+ pull requests being merged into the project each week, hardly a week goes by when there isn’t a new feature or plugin released.</p><p>Your teams have customer-facing work they’re trying to get done and they won’t have time to follow all of this work and understand how they can integrate it and get value from it.</p><p>At Roadie, we eat, drink and breathe Backstage, so we know what’s happening. <strong>We uptake and integrate significant new features for you, so you can stay focussed on what you do best.</strong></p><p>If you do want to keep your finger on the pulse, we publish a <a href="/backstage-weekly/">regular newsletter which tracks the project</a> which can help you stay up to date with the most important changes.</p><h2>9) We’ve got the scaffolder</h2><p>Early versions of Roadie didn’t have the scaffolder because we knew it needed special consideration to run safely in a multi-tenanted environment.</p><p>After months of hard work we’re ready to make the scaffolder broadly available. In early 2022, we’re making the scaffolder generally available on Roadie.</p><p>By making it easier to start new projects, your engineers get to the good part of coding features faster. 
And your organization’s best practices are built into the templates, encouraging standards and reducing complexity in your tech ecosystem.</p><h2>10) We’re here to support you</h2><p>We understand that most teams don’t want to go it alone. That’s why we do our best to support your company on its Backstage journey.</p><p>Every customer gets a dedicated support channel in Slack or Discord. If something is not working as expected, we’re there to help you debug it. Anyone in your company is free to join the conversation.</p><p>We also meet each one of our customers on a regular cadence so they have a place to make requests, get support, and influence our roadmap. Feedback delivered in these meetings feeds directly into our planning process.</p><p>We’re constantly working to improve Roadie. If you want to hear more about an item on the list, or ask us anything at all, why not <a href="/request-demo/">Request a Demo</a> on our website?</p>
]]></content:encoded></item><item><title><![CDATA[Roadie's response to recent log4j vulnerabilities]]></title><link>https://roadie.io/blog/roadies-response-log4j-vulnerabilities/</link><guid isPermaLink="false">https://roadie.io/blog/roadies-response-log4j-vulnerabilities/</guid><pubDate>Wed, 22 Dec 2021 16:00:00 GMT</pubDate><description><![CDATA[Roadie is not impacted by the log4j vulnerabilities, CVE-2021-44228 or CVE-2021-45046, also known as log4shell.]]></description><content:encoded><![CDATA[<p><strong>Roadie is not impacted by the log4j vulnerabilities, <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228">CVE-2021-44228</a> or <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-45046">CVE-2021-45046</a>, also known as log4shell.</strong></p><p>On December 9th, 2021 <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228">CVE-2021-44228</a> was announced, impacting versions 2.x of log4j (also known as log4j2). This issue was believed to be fixed in log4j 2.15.0, however on December 14th, 2021 <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-45046">CVE-2021-45046</a> was announced, and log4j 2.16.0 was released, fixing the additional exploitation vectors.</p><p>Roadie is written in TypeScript and JavaScript and therefore does not make use of the Java logging library, log4j or the Java Virtual Machine. There is one component in our stack, PlantUML, which is written in Java, but it <a href="https://forum.plantuml.net/15151/is-plantuml-affected-by-log4j-security-vulnerability">does not make use of log4j</a>.</p><h2>SaaS</h2><p>Roadie’s SaaS platform was not impacted by the log4j vulnerabilities. As a TypeScript application, we do not make use of log4j directly. 
While thoroughly examining our cloud environment, we determined that we are not running any impacted software in a way that is publicly available.</p><p>We have taken the following steps to ensure our infrastructure is not vulnerable:</p><ol><li>Audited our cloud environment to ensure we are not running log4j in any application code directly.</li><li>Upgraded all AWS EC2 Node Groups to the latest AMI version provided by Amazon.</li><li>Hotpatched all AWS ECS containers with the mitigations provided by Amazon.</li><li>Audited our sub-processors to ensure they are taking steps to mitigate the vulnerability in their own software stacks.</li></ol><p>Links to sub-processor responses:</p><ol><li><a href="https://aws.amazon.com/blogs/security/using-aws-security-services-to-protect-against-detect-and-respond-to-the-log4j-vulnerability/">AWS</a> - upgrades applied</li><li><a href="https://auth0.com/blog/auth0s-response-to-log4j/">Auth0</a> - not vulnerable</li><li><a href="https://security.googleblog.com/2021/12/apache-log4j-vulnerability.html">Google Analytics</a> - not vulnerable</li><li><a href="https://blog.sentry.io/2021/12/15/sentrys-response-to-log4j-vulnerability-cve-2021-44228">Functional Software</a> - not vulnerable</li><li><a href="https://amplitude.com/blog/log4j-vulnerability">Amplitude</a> - upgrades applied</li><li><a href="https://www.intercomstatus.com/">Intercom</a> - upgrades applied</li></ol><h2>Open Source</h2><p>Roadie’s OSS code is not impacted by the log4j vulnerabilities. As TypeScript applications, our Open Source code does not make use of log4j directly.</p>
]]></content:encoded></item><item><title><![CDATA[OAuth Token Exchange: AWS → GCP]]></title><link>https://roadie.io/blog/aws-gke-token-exchange/</link><guid isPermaLink="false">https://roadie.io/blog/aws-gke-token-exchange/</guid><pubDate>Fri, 03 Dec 2021 21:00:00 GMT</pubDate><description><![CDATA[How to exchange short lived tokens between AWS and GCP]]></description><content:encoded><![CDATA[<p>When working with multiple cloud providers, it can often become difficult to manage authentication. Even more so with inter communication. In this blog post, I will talk about my experience with negotiating AWS identity tokens for GCP OAuth tokens.</p><p>Normally, when trying to gain access to another AWS account, we use cross account federation. With this cross account federation, we authorize access to certain AWS principals (roles etc). This is done by assuming a "role". This "role" is exclusively controlled by the owner's account. The account owners can determine exactly what access the external account has. With this, we are able to provide a secure way for two (or more) AWS accounts to communicate with each other.</p><p>Now between cloud providers, this is a lot more complicated. Each cloud provider has their own method of authentication as well as authorization. This is where the difficulty lies when trying to exchange an AWS role identity token for a GCP token.</p><p>Thankfully AWS provides a service that allows us to add authentication to our API requests through HTTP. It adds AWS specific headers/query params, that are then used to confirm the identity of the request.</p><h1>GetCallerIdentity</h1><p>In a lot of cases when working with cloud providers, it is difficult to grasp exactly the identity a service might be using. In many cases an identity may change due to a specific behaviour. AWS provides an easy mechanism for this and it is controlled by the Security Token Service (or STS). More specifically the GetCallerIdentity API. 
It returns details about the caller, including the caller's Amazon Resource Name (ARN), its unique identifier in AWS Identity and Access Management (IAM). Using this ARN, we are able to pinpoint a user and/or a service. This can be valuable when trying to confirm the identity of a user.</p><h1>GCP</h1><h2>Service account</h2><p>Service accounts are a concept shared throughout the GCP ecosystem. They are used to gain access to certain resources and have permissions attached to them. With service accounts, we are able to restrict and regulate the actions that a particular user/service is allowed to perform. For our investigation, we will be acting on behalf of a service account.</p><h2>Creating a service account</h2><p>For the sake of simplicity, I will be using the gcloud CLI, although this can just as easily be configured through the GCP console.</p><pre><code class="language-bash">$ gcloud iam service-accounts create aws-service-account-demo \
  --description="A service account that AWS can access" \
  --display-name="aws-service-account-demo"
</code></pre><h2>Workload identities</h2><p>As stated before, using service accounts allows us to restrict access and assign specific permissions to a service or application. In order to gain access to these service accounts, we need some way of verifying our identity. This can be done in two ways: using long-lived tokens or short-lived ones. For this post we will be using the short-lived ones. This concept is referred to as workload identity federation.</p><h2>Enabling IAM services on your GCP project</h2><p>For the access token exchange flow to work, we must enable the required services on our GCP project. This can once again be configured through the console, but for simplicity the gcloud CLI is favoured.</p><pre><code class="language-bash">$ gcloud services enable sts.googleapis.com
$ gcloud services enable iamcredentials.googleapis.com
$ gcloud resource-manager org-policies allow constraints/iam.workloadIdentityPoolAwsAccounts \
    &#x3C;aws-account-id> --organization=&#x3C;gcp org></code></pre><p>Note: you will also need to ensure that you have the Workload Identity Pool Admin (roles/iam.workloadIdentityPoolAdmin) and Service Account Admin (roles/iam.serviceAccountAdmin) roles on the project.</p><h2>Creating a workload identity provider</h2><p>Here we will create a workload identity provider for our token exchange with AWS</p><p>First let's create the pool</p><pre><code class="language-bash">$ gcloud iam workload-identity-pools create aws-pool \
    --location="global" \
    --description="Workload identity pool for aws connectivity." \
    --display-name="AWS pool"
</code></pre><p>Then the provider</p><pre><code class="language-bash">$ gcloud iam workload-identity-pools providers create-aws aws-test-account \
    --location="global"  \
    --workload-identity-pool="aws-pool" \
    --account-id="&#x3C;your account>" \
    --display-name="Test AWS provider"  \
    --description="The Identity Provider for AWS test service"
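
# Optional check, not part of the original walkthrough (a sketch using the
# pool and provider names from the commands above): print the provider.
# The "name" field in the output is the provider's full resource name,
# which includes the GCP project number used in later commands.
$ gcloud iam workload-identity-pools providers describe aws-test-account \
    --location="global" \
    --workload-identity-pool="aws-pool"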
</code></pre><p>Note: if you would like to be more explicit about which AWS role can access this workload identity provider, add the following flag to the command above (replacing the account ID and role name)</p><pre><code class="language-bash">--attribute-condition="'arn:aws:sts::000000000000:assumed-role/some-role' == attribute.aws_role"  \
</code></pre><h2>Combining workload federation and service accounts</h2><p>Now that we have the ability to gain an access token using the workload federation, we need to allow the workload provider to assume the service account role.</p><pre><code class="language-bash">$ gcloud iam service-accounts add-iam-policy-binding aws-service-account-demo@example-project.iam.gserviceaccount.com \
   --role roles/iam.workloadIdentityUser \
   --member "principalSet://iam.googleapis.com/projects/$gcp_project_number/locations/global/workloadIdentityPools/aws-pool/attribute.aws_role/arn:aws:sts::${aws_account_id}:assumed-role/$role_name"
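
# Optional check, not part of the original walkthrough (a sketch): list the
# bindings on the service account and look for roles/iam.workloadIdentityUser
# with the member added above. "example-project" is the same placeholder
# project ID used in the command above.
$ gcloud iam service-accounts get-iam-policy \
    aws-service-account-demo@example-project.iam.gserviceaccount.com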
</code></pre><p>Note: if you want to allow every identity in the workload identity pool to impersonate the service account, replace the <code>--member</code> value with the following</p><pre><code class="language-bash">--member "principalSet://iam.googleapis.com/projects/$gcp_project_number/locations/global/workloadIdentityPools/aws-pool/*"
</code></pre><h1>Token exchange flow</h1><p>Combining the configuration and knowledge above, we are now ready to initiate the token exchange flow. Below is a flow chart of each of the steps.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6OFru54GarSFLEPbyA80t0/f6c0672fa0e5b783165281d4d87346b9/flow.png" alt="Flow diagram"></p><ol><li>The user logs in to their AWS environment and receives a set of AWS-specific temporary credentials.</li><li>The user signs a GetCallerIdentity request with those temporary credentials.</li><li>The user sends a POST request to GCP at <code>https://sts.googleapis.com/v1/token</code>. The request payload contains the signed GetCallerIdentity request, which GCP uses to verify the user's identity.</li><li>GCP calls the AWS STS API to verify the credentials in the payload. If the signed request matches the user's identity, GCP will return a federated workload identity token (this is a one time token).</li><li>The user now exchanges the one time token for an ephemeral service account OAuth token. 
This is done using a POST request to <code>https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/$SA-NAME@$PROJECT-ID.iam.gserviceaccount.com:generateAccessToken</code>.</li><li>The user is now able to access all the resources the service account has access to.</li></ol><p>The flow above is also demonstrated in a <a href="https://github.com/RoadieHQ/poc-gke-token-exchange">POC JavaScript project</a>.</p><p>We have also mocked up a library for you to use at your own discretion: <a href="https://www.npmjs.com/package/cloud-token-exchanger">cloud-token-exchanger</a>.</p><h1>Resources</h1><p>Information on AWS <a href="https://docs.aws.amazon.com/STS/latest/APIReference/API_GetCallerIdentity.html">GetCallerIdentity</a></p><p>Information on AWS <a href="https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html">Signed requests</a></p><p>GCP workload identity federation <a href="https://cloud.google.com/iam/docs/workload-identity-federation#conditions">conditions</a></p><p>GCP workload identity federation <a href="https://cloud.google.com/iam/docs/configuring-workload-identity-federation#gcloud">configuration of pools</a></p><p>GCP workload identity federation <a href="https://cloud.google.com/iam/docs/using-workload-identity-federation#gcloud_1">token exchange flow</a></p>
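<p>As a rough illustration of steps 3 to 5 of the token exchange flow, here is a minimal bash sketch rather than the code from the POC project. The function names are hypothetical, and the subject token is assumed to be the URL-encoded, signed GetCallerIdentity request from step 2, produced elsewhere. The audience is assumed to be the provider's full resource name, for example <code>//iam.googleapis.com/projects/$gcp_project_number/locations/global/workloadIdentityPools/aws-pool/providers/aws-test-account</code>.</p><pre><code class="language-bash">#!/usr/bin/env bash
# Sketch only: the function names are illustrative, not part of any library.

# Step 3: build the JSON body for https://sts.googleapis.com/v1/token.
# $1 = audience (full resource name of the workload identity provider)
# $2 = subject token (URL-encoded signed GetCallerIdentity request)
build_sts_payload() {
  printf '{\n'
  printf '  "audience": "%s",\n' "$1"
  printf '  "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",\n'
  printf '  "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",\n'
  printf '  "scope": "https://www.googleapis.com/auth/cloud-platform",\n'
  printf '  "subjectTokenType": "urn:ietf:params:aws:token-type:aws4_request",\n'
  printf '  "subjectToken": "%s"\n' "$2"
  printf '}\n'
}

# Steps 3-4: trade the signed request for a federated token.
# $1 = audience, $2 = subject token
request_federated_token() {
  build_sts_payload "$1" "$2" | curl -s -X POST \
    -H "Content-Type: application/json" \
    --data @- "https://sts.googleapis.com/v1/token"
}

# Step 5: trade the federated token for a service account access token.
# $1 = federated token, $2 = service account email
request_service_account_token() {
  curl -s -X POST \
    -H "Authorization: Bearer $1" \
    -H "Content-Type: application/json" \
    --data '{"scope": ["https://www.googleapis.com/auth/cloud-platform"]}' \
    "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${2}:generateAccessToken"
}
</code></pre><p>With these functions sourced into a shell, <code>request_federated_token</code> returns a JSON response whose <code>access_token</code> field holds the federated token. Passing that value and the service account email to <code>request_service_account_token</code> returns a response whose <code>accessToken</code> field holds the short lived OAuth token.</p>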
]]></content:encoded></item><item><title><![CDATA[How to model monorepos in Backstage]]></title><link>https://roadie.io/blog/backstage-monorepo-guide/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-monorepo-guide/</guid><pubDate>Mon, 22 Nov 2021 21:00:00 GMT</pubDate><description><![CDATA[There are multiple different ways to represent monorepos in Backstage, each with its own setup, benefits and drawbacks. We show you how to do it in your situation.]]></description><content:encoded><![CDATA[<p>We've been onboarding an increased number of awesome engineering organisations to our SaaS Backstage platform recently, and one question comes up again and again... "Does Backstage support monorepos?"</p><p>The good news is that Backstage does support monorepos. In fact, there are multiple different ways to represent monorepos in Backstage, each with its own setup, benefits and drawbacks. This post will teach you everything you need to know to get your monorepo code loaded and represented the way you want.</p><p>This post will be applicable whether you're using the Roadie hosted Backstage platform or self-hosted Backstage.</p><p>Huge thanks to Enrique Amodeo Rubio, Staff Engineer at Contentful (<a href="https://www.linkedin.com/in/enrique-amodeo/">linkedin</a>). He did a lot of the hard work of testing the various monorepo representations in Backstage, and was kind enough to share some tips with us. These tips formed the basis of this guide.</p><h2>Combined vs Split monorepo representations</h2><p>There are two approaches to treating monorepos in Backstage. Combined and Split monorepos.</p><h3>Combined monorepos</h3><p>Combined monorepos present as a single entity in Backstage. 
When you look at your Backstage catalog, you'll see one row to represent the monorepo, regardless of the number of sub-components contained within.</p><p>It will only have one Backstage metadata file and one set of TechDocs which describe the entire monorepo.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7dkJdgCwojlNJxp6t9mafl/f0dc188124c7374032f0cf43b2f2a04a/combined-in-catalog.png" alt="A combined monorepo in the Backstage catalog. It has only one entry, named after the monorepo, not the components contained within."></p><h3>Split monorepos</h3><p>Split monorepos treat each component of the monorepo as an individual Backstage entity. A split monorepo will have multiple associated catalog entries, one for each sub-component within the monorepo. It will contain many Backstage metadata files and many sets of TechDocs.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4hzX16s3g3BhenjVd2ShbT/ecfb46697222066ab60ca5fbcb027aa9/split-in-catalog.png" alt="A split monorepo in the Backstage catalog. There is one entry for each component in the monorepo."></p><h2>Which option should I use?</h2><p>Combined monorepos make sense when the entire monorepo is owned by a single team. The Backstage project itself is perhaps a good example of this. It's a relatively large monorepo but it's owned by a single team of maintainers.</p><p>The combined monorepo has a single place in Backstage to find the documentation for all of the sub-components. If the components are tightly coupled or frequently used together, this approach might make it easier to browse all of the docs at once.</p><p>Split monorepos make sense when different components within the monorepo are owned by different teams. 
For example, a monorepo which co-locates many different backend services which expose different HTTP APIs and are owned by different teams within a company.</p><p>Split monorepos also make more sense when each component in the monorepo exposes its own HTTP API spec.</p><h4>Summary</h4><ul><li>Use combined monorepos when the monorepo contains tightly coupled components, all of which expose one or zero HTTP APIs and are owned by a single team.</li><li>Use split monorepos when the monorepo contains loosely coupled components which each have their own HTTP API and their own owners.</li></ul><h2>Setting up your YAML files</h2><h3>Combined monorepo setup</h3><p>The combined monorepo representation is easier to set up in Backstage because it requires less YAML configuration.</p><p>Simply create a top level <code>catalog-info.yaml</code> file, of the <code>Component</code> kind, in the root of the monorepo. Name it after the monorepo.</p><pre><code class="language-yaml">---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: combined-monorepo
  description: All our components represented as a monorepo
  annotations:
    github.com/project-slug: RoadieHQ/sample-combined-monorepo
spec:
  type: service
  owner: engineering
  lifecycle: production
</code></pre><p>Here's a <a href="https://github.com/RoadieHQ/sample-combined-monorepo">public GitHub repository</a> which demonstrates this setup.</p><h3>Split monorepo setup</h3><p>The split monorepo setup uses a single metadata file with the <code>Location</code> kind in the root of the monorepo, and many metadata files with <code>Component</code> kind in the subdirectories. The <code>Location</code> acts as a pointer to each of the components in the sub-directories, ensuring they can be managed from a single location.</p><p>Assuming we have a monorepo structure something like this:</p><pre><code>.
└── services
    ├── banana-service
    │   └── src
    └── pricing-service
        └── src
</code></pre><p>Then we would create one metadata file for each component and co-locate it with the component code. In this example they are called <code>backstage.yaml</code> files.</p><pre><code>.
└── services
    ├── banana-service
    │   ├── backstage.yaml
    │   └── src
    └── pricing-service
        ├── backstage.yaml
        └── src
</code></pre><p>Then we would create a metadata file containing a <code>Location</code> at the root of the repo:</p><pre><code class="language-yaml">---
apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: split-monorepo
spec:
  type: url
  targets:
    - ./services/pricing-service/backstage.yaml
    - ./services/banana-service/backstage.yaml
</code></pre><p>Using this setup, each team can independently manage their own <code>backstage.yaml</code> files and individual components can be added or removed from the Backstage catalog simply by updating the <code>catalog-info.yaml</code> file in the root of the monorepo.</p><p>Here's an example of a <a href="https://github.com/RoadieHQ/sample-split-monorepo">monorepo set up with the split monorepo representation</a>.</p><h2>Using TechDocs in monorepos</h2><p>TechDocs is used slightly differently in each of the two possible representations, and the results can be quite different.</p><h3>Combined monorepo representation</h3><p>The combined monorepo representation makes use of the <a href="https://github.com/backstage/mkdocs-monorepo-plugin">mkdocs-monorepo-plugin</a> created by Spotify. This plugin supports having multiple sets of MkDocs TechDocs within one monorepo.</p><p>Within Backstage, the TechDocs automatically render with a nested sidebar so the reader can browse through the documentation for each component in one place. Here we can see two services, the calculator and candle service, represented in the documentation of the Combined monorepo.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1LNN9GfWeU0BeaPjhtjW5k/1ea71142600104bc22d490a24053cdb7/combined-sidebar-docs.png" alt="A table of contents with nested headings in the table. The nested parts are indented to indicate that they are sub-headings."></p><p>To set up TechDocs in the combined monorepo fashion, create a <code>docs</code> directory and <code>mkdocs.yml</code> file in the sub-directory of each component.</p><pre><code>├── services
    ├── calculator-service
    │   ├── docs
    │   ├── mkdocs.yml
    │   └── src
    └── candle-service
        ├── docs
        ├── mkdocs.yml
        └── src
</code></pre><p>The markdown documentation files live in each <code>docs</code> directory and the <code>mkdocs.yml</code> file points to them as normal.</p><pre><code class="language-yaml"># Note: Whitespace is not currently supported in this site_name
site_name: calculator-service

nav:
  - Home: index.md

plugins:
  - techdocs-core
</code></pre><p>To create the nested sidebar effect, create one more <code>mkdocs.yml</code> file in the root of the monorepo, at the same level as the <code>catalog-info.yaml</code>.</p><p>In it, include the <code>monorepo</code> plugin and use the <code>!include</code> directive to pull in each of the <code>mkdocs.yml</code> files in the sub-directories.</p><p>As a bonus, you can also reference markdown files in a <code>docs</code> directory at the root of your monorepo, as we are doing below. These root-level docs might be a good place to talk about the nature of the monorepo and the components contained within.</p><pre><code class="language-yaml">site_name: Root docs

nav:
  - Home: index.md
  - Subdirectory docs:
    - Calculator Service: '!include ./services/calculator-service/mkdocs.yml'
    - Candle Service: '!include ./services/candle-service/mkdocs.yml'

plugins:
  - monorepo
  - techdocs-core
</code></pre><p>Lastly, add the <code>techdocs-ref</code> annotation to the <code>catalog-info.yaml</code> file in the monorepo.</p><pre><code class="language-yaml">---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: combined-monorepo
  description: A combined monorepo
  annotations:
    # ..
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  owner: engineering
  lifecycle: experimental
</code></pre><h3>Split monorepo representation</h3><p>As we saw in the introduction, the split monorepo representation results in each monorepo component having its own entity in Backstage. As you can imagine, each component gets its own set of TechDocs, just like a non-monorepo component would.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6qffnQmCqLekUmm6rAa7NG/031ea58bba5da1d634d1dfb0f8b13bc9/banana-techdocs.png" alt="Some documentation rendered in Backstage. There is a heading and some text. There is also a sidebar with a table of contents."></p><p>To set up docs in the split monorepo fashion, simply create an <code>mkdocs.yml</code> file and <code>docs</code> directory in the sub-directory of each component.</p><pre><code>.
└── services
    ├── banana-service
    │   ├── backstage.yaml
    │   ├── mkdocs.yml
    │   ├── docs
    │   └── src
    └── pricing-service
        ├── backstage.yaml
        ├── mkdocs.yml
        ├── docs
        └── src
</code></pre><p>The markdown documentation files live in each <code>docs</code> directory and the <code>mkdocs.yml</code> file points to them as normal.</p><pre><code class="language-yaml">site_name: Pricing Service

nav:
  - Home: index.md

plugins:
  - techdocs-core # required to style your docs like Backstage
</code></pre><p>Don't forget to add the <code>techdocs-ref</code> annotation to each <code>backstage.yaml</code> file.</p><pre><code class="language-yaml">apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: pricing-service
  title: Pricing service
  description: Component for Pricing service
  annotations:
    # ...
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  owner: engineering
  lifecycle: production
</code></pre><h2>Conclusion</h2><p>Whether you end up using the combined or split monorepo representation, Backstage can certainly support your needs.</p><p>Have you got other tips for using monorepos with Backstage? We'd love to mention them here and credit you. Please email <a href="mailto:support@roadie.io">support@roadie.io</a> with your ideas.</p>
]]></content:encoded></item><item><title><![CDATA[Plugins migration to monorepo]]></title><link>https://roadie.io/blog/monorepo-plugins-migration/</link><guid isPermaLink="false">https://roadie.io/blog/monorepo-plugins-migration/</guid><pubDate>Mon, 23 Aug 2021 16:00:00 GMT</pubDate><description><![CDATA[Major change in the way we handle our plugins.]]></description><content:encoded><![CDATA[<p>Contributing to the Backstage community has been one of the top goals in our roadmap. We have focused on developing plugins for developers with the goal of making their job more efficient. Over time, we produced multiple plugins contained within their own repositories. This is sometimes referred to as a multirepo approach as opposed to a monorepo with a single repository that contains multiple plugins. Our multirepo setup was a reasonable approach to begin with.</p><p>Although a number of teams have embraced monorepos, there are reasons why we have stayed away up until now. We started to face challenges with the increasing number of plugins that we maintain. One of the main challenges was with dependency management across all of our repositories which eventually became very complex. Instead, we wanted to have an automated, simple solution that would not be so time consuming and would give us a solid ground for additional features we have in mind. So, we made a decision to migrate all of our plugins to the <a href="https://github.com/RoadieHQ/backstage-roadie-plugins">RoadieHQ/backstage-roadie-plugins</a> monorepo.</p><h2>Improvements</h2><p>There are a number of improvements we introduced by moving to monorepo.</p><h1>1) Better control of dependency management.</h1><p>As mentioned previously, we wanted to simplify internal and third-party dependency management. 
Having plugins in different repositories raised concerns about diamond dependency conflicts and the challenge of keeping different versions of the same dependency in different repositories.</p><p>Testing specific versions of a dependency is easier because a monorepo gives us the ability to test for breaking changes and backwards compatibility across the entire codebase when an update is needed. It is easier and faster to follow Backstage team updates so that we can make sure our plugins work with the latest versions of the Backstage packages.</p><h1>2) Better visibility of all the plugins.</h1><p>It is easier for contributors to test against other plugins and possibly make multiple plugin changes in a single commit or pull request. It can also help encourage more collaboration and code reuse.</p><h1>3) One place to store all configs and tests.</h1><p>We can reuse and improve CI/CD configuration and tests across all of our plugins at the same time without needing to have separate and sometimes duplicated configuration per plugin.</p><h1>4) Easier to keep track of upstream updates.</h1><p>We created a workflow that runs periodically to check for updates from the Backstage team. The workflow automatically creates a pull request with updates to the package versions for all of the plugins. The workflow also runs checks to ensure everything works as expected once the changes are merged to the main branch.</p><h2>Challenges</h2><p>The monorepo approach is not without its challenges. We believed we would stumble across a few, especially in terms of building and publishing packages.</p><h1>1) Build Pipelines</h1><p>Ensuring builds are efficient and practical is a challenge regardless of the team size or codebase. The monorepo approach results in a lot of source code in one place. We recognized that it may take more time for CI to run all required tasks in order to approve every pull request. 
Ultimately, we did not see a substantial increase in build time for our monorepo.</p><h1>2) Manage publishing of the packages</h1><p>Although all plugins are contained within a single source code repository, each plugin is individually published to NPM. We needed a tool that would allow us to publish multiple packages but also optimize the workflow to ensure only packages that have changes are published.</p><p>We decided to use <a href="https://lerna.js.org/">Lerna</a> to manage our monorepo. We settled on a semi-automatic build and publish workflow. The package versioning is done manually and the publishing is done automatically. Lerna helps with detecting changes in the packages and only publishes the ones that have updated versions.</p><h2>Conclusion</h2><p>All of the plugins we developed and maintain are gradually being migrated to the <a href="https://github.com/RoadieHQ/backstage-roadie-plugins">RoadieHQ/backstage-roadie-plugins</a> repository.</p><p>Plugin users will not notice any difference in how they consume our plugins from NPM. This migration does make a difference for plugin contributors. You can read more about contributing in our <a href="https://github.com/RoadieHQ/roadie-backstage-plugins/blob/main/CONTRIBUTING.md">CONTRIBUTING.md</a> file.</p><p>This type of structural change is always a bit difficult at the start but we are confident it will result in a better experience for our plugin users. We always welcome contributions to our plugins and hope that this change will also make it easier to contribute.</p>
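<p>For readers curious about the publishing step described in the Challenges section, the Lerna commands involved might look roughly like the sketch below. This is an illustration, not necessarily our exact pipeline, and it assumes the version in each changed plugin's <code>package.json</code> has already been bumped by hand.</p><pre><code class="language-bash"># List the packages that have changed since the last tagged release.
$ npx lerna changed

# Publish only the packages whose package.json version is not yet in the
# registry, skipping the interactive confirmation so it can run in CI.
$ npx lerna publish from-package --yes
</code></pre>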
]]></content:encoded></item><item><title><![CDATA[GitHub Apps - How to avoid leaking your customer’s source code with GitHub apps]]></title><link>https://roadie.io/blog/avoid-leaking-github-org-data/</link><guid isPermaLink="false">https://roadie.io/blog/avoid-leaking-github-org-data/</guid><pubDate>Thu, 19 Aug 2021 11:16:00 GMT</pubDate><description><![CDATA[How to avoid leaking your customer’s source code with GitHub apps.]]></description><content:encoded><![CDATA[<p>Security, tenant isolation and protecting our customer’s intellectual property is important to us at Roadie. While investigating options for integrating with GitHub APIs we recognized that you have to work hard to do it securely. There are a number of ways to access GitHub APIs. It is quite easy to integrate with them incorrectly and potentially leak data between customers.</p><p>In fact we found (and reported) a vulnerability in a handful of major SaaS products that allowed a user to access the resources of an organisation that the user was not a member of.</p><p><strong>Using default settings with GitHub Apps may put you at risk of leaking data between GitHub App installations.</strong></p><p>Roadie provides its customers a hosted and managed Backstage environment. Backstage is a platform that helps you build developer portals on top of a centralised software catalog. Your developers can extend Backstage by creating new or customizing existing frontend and backend plugins in order to build a developer portal that meets your user’s needs. Read our <a href="/backstage-spotify/">Ultimate Guide to Spotify Backstage</a> to learn more.</p><p>Backstage is a three tier application. The frontend tier runs in the browser and is built with React. The backend tier is built with Express running on Node.js. 
This tech stack ensures developers have access to a large community and ecosystem of packages and tools that make it even easier to extend their Backstage implementations.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/tYD8is9lJJVPfFD5wYv2q/2b743ebf665cf731e23d6d6638bf44d7/three-tier.png" alt="Backstage &#x26; GitHub"></p><p>The Backstage backend includes the <em>Backstage Software Catalog</em> to help bring visibility to your software components. The software catalog is a core component of Backstage. It can discover and index Git repositories hosted on GitHub and GitHub Enterprise using GitHub REST and GraphQL APIs. Indexed data is exposed via backend REST APIs to the frontend.</p><p>Backstage plugins can be added to the frontend to extend the core software catalog and integrate with external services. For example, the <a href="https://roadie.io/backstage/plugins/github-pull-requests/">GitHub Pull Request plugin</a> allows you to see a list of open pull requests associated with an item in the software catalog. Frontend plugins such as this one can use GitHub APIs to retrieve information and render it directly in Backstage. This is just one very specific yet common use of GitHub.</p><p>When it comes to integrating with GitHub APIs, there are at least six ways to authenticate. Some of these authentication methods are suitable for the frontend, the backend or both. Let’s go through each.</p><table><thead><tr><th>Authentication type</th><th>Frontend</th><th>Backend</th></tr></thead><tbody>
<tr><td>Personal Access Token (PAT)</td><td>No</td><td>Yes</td></tr>
<tr><td>GitHub OAuth Apps</td><td>Yes</td><td>No</td></tr>
<tr><td>GitHub Apps acting as a GitHub app itself</td><td>No</td><td>No</td></tr>
<tr><td>GitHub Apps acting as an Installation</td><td>No</td><td>Yes</td></tr>
<tr><td>GitHub Apps acting as an OAuth provider</td><td>Yes</td><td>No</td></tr>
<tr><td>Anonymously</td><td>Yes</td><td>Yes</td></tr>
</tbody></table><h2>Personal Access Token (PAT)</h2><p>Any GitHub user can create a Personal Access Token (PAT) via their GitHub developer settings. The user chooses a set of permissions to allow for the token. This token can be used to access any resource from the GitHub API on behalf of that user. It is a sensitive, long-lived token that should not be used in a frontend application. Technically it could be used on a backend application to retrieve data from GitHub APIs. However, this approach is concerning from a security perspective. The backend would essentially be using credentials that allow it to act on behalf of the token’s owner. This is not desirable because the backend, and in turn its token, is shared by all users of the application. In order to mitigate this, you could create a special user in GitHub, and create a PAT for the application. This can be difficult to manage and maintain.</p><h2>GitHub OAuth Apps</h2><p>GitHub provides a way to create an OAuth app that can be used to log in application users via a web frontend. Let’s disregard the details about how the OAuth token negotiation works in this article. Effectively, the frontend application sends the user to GitHub to get a token from the OAuth app. The user is then redirected back to the frontend application with a code that can be exchanged for a token. This token can be used by the frontend application to call GitHub APIs on behalf of the user. Any permissions that apply to the user also apply to the API requests made with the token. This is a reasonable mechanism for frontend-only use cases.</p><h2>GitHub Apps</h2><p>GitHub allows developers to create what is referred to as a GitHub app. A GitHub app can be installed on a GitHub organization or a personal GitHub account. 
Once installed, the GitHub app can request a new token for each <em>installation</em> of the app.</p><p>We believe that although it can be difficult to implement correctly, GitHub Apps is the best way to provide GitHub API access to a backend application.</p><p><strong>“GitHub Apps is the best way to provide GitHub API access to a backend application”</strong></p><p>As we mentioned earlier and alluded to in the title, it can be difficult to get GitHub Apps configured in a way that ensures customer isolation between installations of a GitHub App. This is especially true for a multi-tenanted application such as the one we provide to our customers.</p><p>So let’s take a look at the three ways a GitHub app can be used to authenticate to GitHub APIs:</p><h3>Acting as a GitHub app itself</h3><p>The GitHub App has a private key that is used to generate a GitHub App token. This token can be used for a subset of the GitHub APIs. One of the available APIs can be used to retrieve a list of its app <em>installations</em> and request GitHub to generate a token for each installation. This GitHub App private key is very sensitive. Suppose your service has two customers, and two installations of the GitHub app. Technically speaking, that private key can be used to retrieve a token for both customers and then read and write data for both customers with GitHub APIs. As such the token should only be used minimally for the purposes of retrieving an installation token.</p><p>In order to generate a GitHub App token, the GitHub App encodes a JWT with the GitHub App ID and signs it with the private key of the GitHub App. Here is what it looks like in Ruby:</p><pre><code class="language-ruby">require "openssl"
require "jwt" # provided by the jwt gem

GITHUB_APP_PRIVATE_KEY_FILE = "private-key.pem"
GITHUB_APP_ID = "12345678"

private_key = OpenSSL::PKey::RSA.new(File.read(GITHUB_APP_PRIVATE_KEY_FILE))

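payload = {
  # Issued at: backdated 60 seconds to allow for clock drift.
  iat: Time.now.to_i - 60,
  # Expiry: GitHub accepts at most 10 minutes.
  exp: Time.now.to_i + (10 * 60),
  # Issuer: the GitHub App ID.
  iss: GITHUB_APP_ID
}

# Sign the JWT with the app's private key using RS256.
GITHUB_TOKEN = JWT.encode(payload, private_key, "RS256")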
</code></pre><p>The token can then be used to list installations of that GitHub App:</p><pre><code class="language-bash">curl -X GET https://api.github.com/app/installations \
     -H "Authorization: Bearer ${GITHUB_TOKEN}"
</code></pre><pre><code class="language-json">[
  {
    "id": 12345678,
    "account": {
      "login": "AcmeInc",
      "id": 12345678,
      …
    }
  },
  {
    "id": 12345679,
    "account": {
      "login": "SomeCorporation",
      "id": 12345679,
      …
    }
  }
]
</code></pre><p>And retrieve a token for a specific GitHub App installation:</p><pre><code class="language-bash">curl -X POST https://api.github.com/app/installations/12345678/access_tokens \
     -H "Authorization: Bearer ${GITHUB_TOKEN}"
</code></pre><pre><code class="language-json">{
  "token": "ghs_&#x3C;redacted>",
  "expires_at": "2021-08-17T13:16:07Z",
  "permissions": {
    "members": "read",
    "organization_administration": "read",
    "actions": "read",
    "contents": "read",
    "metadata": "read",
    "security_events": "read"
  },
  "repository_selection": "selected"
}
</code></pre><p>Once the GitHub App has retrieved a token for a specific installation, it can call GitHub APIs. The set of APIs that it is allowed to access is configured in the GitHub App and requested during the installation of the app. <strong>This installation-specific token should only be used in a backend.</strong></p><p>It is incumbent upon the GitHub App owner to make sure that the resources retrieved from one installation are only available to members of the organization of the installation.</p><h3>Acting as an OAuth provider</h3><p>The GitHub App can also act as an OAuth provider, so that users of the backend can retrieve a token in a web frontend. The GitHub App’s settings contain the credentials required to allow the web frontend to log in users with GitHub in the same way that the GitHub OAuth App does. Once the user’s browser has retrieved the token from GitHub, the web frontend application can gather data and perform actions on behalf of the user in GitHub.</p><h3>Acting as an Installation</h3><p>We saw that the GitHub App’s private key can be used to retrieve an access token for any installation. So how do developers of large multi-tenanted SaaS software make sure that users are only accessing data from installations that they are supposed to? For example, if one of the customers chooses to install the GitHub App, then no other customers should be able to see that customer’s data.</p><p><strong>“Avoid using the Setup URL Callback and validate the ownership of installation ids”</strong></p><p>So how do we make sure that a customer is allowed to use a GitHub App installation?</p><p>GitHub Apps allow a developer to provide a URL to which the user is redirected after installation. The setting is called “Setup URL”. 
When the GitHub App installation completes, the user is redirected to the Setup URL with the id of the installation.</p><p>The problem is that GitHub does not provide any means for the installing application to verify the ownership of the installation. Installation ids are 8 digits and not considered secure. They can be guessed easily. Consider the highlighted message in the sequence diagram below. In this setup, it is very easy to bombard the application with any guessed or known installation id.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5GBU8v4CKyJut1XQwk2FBV/2e8e0ed04483bd38e2ebb6c22b03a251/setup-url-flow.png" alt="Setup URL Flow"></p><p>We found (and reported) this vulnerability in a handful of major SaaS products. This would have allowed us to access other GitHub organizations.</p><p>The way to ensure that this does not occur is to verify the user’s identity and organisations and check whether or not they are allowed to access or install the app. This can be done by setting “Request User Authorization” and providing a “Callback URL” on your app.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5PZdYXTf7skzwIy0ReWt52/224fa1405e51f696cdf1c6bbdb909eb1/request-user-auth.png" alt="Request User Auth"></p><p>With this setting enabled, users are forced to log in and the callback contains a code that can be exchanged for an auth token for the user. The backend should use this code to validate that the user is allowed to access this installation.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/Bi61t2gl2pvspkFjLZPEX/c3e0aa82abe807454303f9f0c3126af0/callback-url-flow.png" alt="Callback URL Flow"></p><p>So what does this validation look like? Here is what we have done in our backend Express application:</p><pre><code class="language-tsx">router.post('/installations', async (req, res) => {
 const code = req.query.code as string;
 const installationId = req.query.installation_id as string;
 const setupAction = req.query.setup_action as string;

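 if (setupAction !== 'install') {
   logger.error(`Action is not of type 'install'. Got: ${setupAction}`);
   res.sendStatus(400);
   return;
 }

 // `appOctokit` is assumed to be an Octokit instance created elsewhere with the
 // GitHub App auth strategy; exchanging the code yields a client acting as the user.
 const userGitHub: Octokit = (await appOctokit.auth({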
   type: 'oauth-user',
   code: code,
   factory: (options: OAuthAppAuthOptions) => {
     return new Octokit({
       authStrategy: createOAuthUserAuth,
       auth: options,
     });
   },
 })) as Octokit;

 const { data } = await userGitHub.request('GET /user/installations');

 // Query parameters are strings, while installation ids from the API are numbers.
 if (data.installations.some(installation => installation.id === Number(installationId))) {
   // The following is pseudocode for storing the installation id.
   database.saveInstallationId(installationId);
   res.sendStatus(201);
 } else {
   res.sendStatus(403);
 }
});
</code></pre><p>This works because we have ensured that the request contains a code. Only a user who has logged into GitHub could have access to this code, and then we can validate that the user is a member of this organization before persisting the installation.</p><h2>In conclusion</h2><p>If you are rolling out features that require GitHub API access to your customers, be mindful of how you are doing it. We hope you will appreciate how easy it is to unintentionally and unexpectedly expose a customer’s GitHub data to unauthorized users.</p>
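<p>As a footnote to the validation step in the Express handler above, the ownership check can be pulled out into a small pure function that is easy to unit test. The following is a minimal TypeScript sketch under stated assumptions: the <code>UserInstallation</code> type and the <code>userOwnsInstallation</code> name are made up for illustration, and the list of installations is assumed to have been fetched already from <code>GET /user/installations</code> with the user’s token.</p>

```typescript
// Hypothetical, trimmed-down shape of one entry returned by `GET /user/installations`.
interface UserInstallation {
  id: number;
}

// Returns true only when the installation id from the callback query string
// appears in the list of installations that the logged-in user can access.
function userOwnsInstallation(
  installations: UserInstallation[],
  installationIdParam: string,
): boolean {
  // Query parameters arrive as strings. Accept plain positive integers only,
  // so malformed values such as "12abc" or "" can never match.
  if (!/^[1-9][0-9]*$/.test(installationIdParam)) {
    return false;
  }
  const requestedId = Number(installationIdParam);
  return installations.some(installation => installation.id === requestedId);
}

// Example: the user can access installation 12345678 but not 12345679.
const accessible: UserInstallation[] = [{ id: 12345678 }];
console.log(userOwnsInstallation(accessible, '12345678')); // true
console.log(userOwnsInstallation(accessible, '12345679')); // false
```

<p>Keeping the comparison in one place makes the string-to-number conversion explicit, which is an easy detail to miss when the id comes straight from a query string.</p>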
]]></content:encoded></item><item><title><![CDATA[Backstage TechDocs - How to embed lucid chart diagrams]]></title><link>https://roadie.io/blog/how-to-embed-diagrams-in-techdocs/</link><guid isPermaLink="false">https://roadie.io/blog/how-to-embed-diagrams-in-techdocs/</guid><pubDate>Tue, 03 Aug 2021 10:30:00 GMT</pubDate><description><![CDATA[TechDocs converts markdown files to Backstage docs so your engineering teams can find them and it is very useful. But how do we embed diagrams from lucid chart.]]></description><content:encoded><![CDATA[<p>TechDocs is the core Backstage feature which transforms markdown documentation into HTML and displays it inside Backstage where your engineering teams can find it.</p><p>You can easily embed diagrams from Lucidchart and other external sources in TechDocs. Start by exporting the generated iframe from the external application. For example, if you are using Lucidchart, you can click the Share button in the top right.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2fZlhF7uWTL3xVX14SVHpp/b44e60471174cdf17701de540dfceb35/button.png" alt="button"></p><p>This will show a dialog as follows.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2UZpVMPlf8xwn2ZuAEf3CQ/56973ae195dc01804c1bf9f5ceebb2f2/dialog.png" alt="dialog"></p><p>Click advanced and then click embed.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6xd93HkKR4o7YHrnPn0SYS/d4fe85d5c0e39284d3ea558802cea8a8/embed-dialog.png" alt="embed-dialog"></p><p>You can choose to adjust the size of the embedded diagrams.</p><p>Copy the HTML snippet and click the "Activate Embedded Code" button.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3pvqlxxyQYSZ3kIsn2zk9l/89dd5b51a2111dd2684e67b69a7d2ffc/embed-code-button.png" alt="embed-code-button"></p><p>Now paste the code snippet into your TechDocs files as is, and you will get diagrams in your TechDocs that update when the diagrams are changed in Lucidchart.</p><p><img 
src="//images.ctfassets.net/hcqpbvoqhwhm/3cw6zkaXV529vt4D6ypMun/7248a6dd63b00d5bc2ef1875aad115f7/embedded-diagram-in-techdocs.png" alt="embedded-diagram-in-techdocs"></p>
]]></content:encoded></item><item><title><![CDATA[How to model software in Backstage]]></title><link>https://roadie.io/blog/modelling-software-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/modelling-software-backstage/</guid><pubDate>Tue, 29 Jun 2021 21:00:00 GMT</pubDate><description><![CDATA[How to use Backstage concepts — components, APIs, systems and domains — to model software and represent the relationships between different pieces of code.]]></description><content:encoded><![CDATA[<p>Backstage's service catalog serves as a metadata store for useful information about the software assets being used and developed in your organization.</p><p>It can also group software in ways which make logical sense to the humans who build and use the software. Grouping software makes it easier to understand the overall architecture and can highlight previously unseen dependencies.</p><p>A more understandable architecture is easier to onboard engineers onto and faster to repair when problems occur.</p><p>To see how different types of software asset are represented in Backstage, we're going to model part of the architecture you might find in a hypothetical ride-sharing company. We'll also see how Backstage models the relationships between software and how it can diagram the network of dependencies.</p><p>Here's how our hypothetical architecture looks. We have two backend services. One of them, Passenger Backend, is dependent on two important libraries, the Core Queueing Library and the Core Auth Module. The second backend service, Trips Counter, calls the API of the Passenger Backend.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/2NcSF3qQAHfKnKAKBUwzJ4/0e888dbd33ae4d9dc035eb6d477f0e75/everything.png" alt="Rectangles with arrows pointing between them to represent the architecture"></p><p>This model doesn't demonstrate all of the modeling capabilities that Backstage has to offer. 
We have omitted Resources, which typically represent shared infrastructure, like a Kubernetes cluster. We have also omitted the sub-component relationship because it has a very niche use-case in the fat-client world.</p><p>We have also ignored the other fundamental pillar of modeling in Backstage — humans and the teams they group themselves in. Backstage provides <code>User</code> and <code>Group</code> concepts for this purpose. They are outside the scope of this document.</p><h2>Modeling components, the basics</h2><p>Let's start with a simple concept like a typical backend service, the Passenger Backend. This could be a NodeJS or Go application perhaps. It probably has some API endpoints, some business logic, a connection to a database and a bunch of libraries installed into it.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/2DmEG4S0K0MRt0zGDcUAPE/96a6d93be83623f3ea01fdbfb4703f8d/passenger-backend.png" alt="the same diagram as above with all boxes removed except for one"></p><p>Backstage represents services like this using three properties, the <code>kind</code>, <code>type</code> and <code>name</code>.</p><pre><code class="language-yaml">kind: Component
type: service
name: passenger-backend
</code></pre><p>Components are one of the fundamental building blocks in Backstage. A component is a single logical unit of code which is owned by a person or group of people. Assuming you don't use mono-repos, it might correlate to a single GitHub or GitLab repository. It has a type which indicates how it might be used. To represent a backend service like our Passenger Backend, we use the type <code>service</code>.</p><p>A good rule of thumb is to draw the boundaries between pieces of software by considering their ownership. If a codebase, or a part of a codebase, is owned by a team, that's probably a component you want to model by itself.</p><h2>Adding libraries</h2><p>Components can depend on other components. For example, our Passenger Backend has two important libraries installed into it. The Core Queueing Library is used to pass jobs over a shared queuing service and the Core Auth Module is used to authenticate incoming requests.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/565hQpEFZWgvRyklZi7eOL/db030abe26ad09efdaaf2869014646c9/libraries.png" alt="two boxes have been added under the box that already existed. They represent libraries and are connected by arrows."></p><p>The Core Queueing Library is represented as a library component.</p><pre><code class="language-yaml">kind: Component
type: library
name: core-queuing-library
</code></pre><p>The relationship between the Core Queueing Library and the Passenger Backend is defined by a property on Passenger Backend.</p><pre><code class="language-yaml">kind: Component
type: service
name: passenger-backend
dependsOn:
  - core-queuing-library
</code></pre><p>Once that relationship is defined, we can show it off in Backstage by adding the <code>EntityDependsOnComponentsCard</code> to the interface.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/5gamV8pzKG1YmNBn0c5qwm/ff2cdef587b3cbdc752b22c7e32382b7/entity-dependency-of-components-card.png" alt="A table with text for the name of the library, the owner, the lifecycle state - production - and the description of the library"></p><p>You typically wouldn't attempt to represent all dependencies of a service like this. Some services will have hundreds of libraries they depend on, and trying to account for all of them will introduce too much fragility into the model.</p><p>However, it might be appropriate to indicate a dependency on important libraries which are developed in-house and are found in lots of other components across the company.</p><p>Once we indicate that the Passenger Backend depends on the Core Queuing Library, Backstage has enough information to establish an inverse relationship. If we add the <code>EntityDependencyOfComponentsCard</code> and visit the Core Queueing Library in the Backstage catalog, we should see that it is a dependency of Passenger Backend.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/7bTn52EWEIPFdYYpoUhhfQ/39d504857f45e86249e1b5f085f76add/entity-dependency-of-components-card.png" alt="a table with text describing the passenger-backend service with a description and lifecycle state"></p><h2>Representing an API</h2><p>The Passenger Backend service exposes a RESTful HTTP API so that other software in the company can communicate with it. They may use this API to look up the current location of a passenger for example.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/6VaLP64BdE4cvKjxN4H22E/b9bc96454e41987709c45a3203ee46f7/api.png" alt="a 4th box has been added to represent an API. 
It is connected to passenger-backend via an arrow"></p><p>APIs are represented in Backstage using the same three properties as components.</p><pre><code class="language-yaml">kind: API
type: openapi
name: passenger-api
</code></pre><p>The <code>type</code> identifies the specification language you are using to describe your API. We've specified OpenAPI here but others like GraphQL and gRPC are supported.</p><p>Once we have defined the API, we can indicate that the Passenger Backend service exposes it.</p><pre><code class="language-yaml">kind: Component
type: service
name: passenger-backend
providesApis:
  - passenger-api
</code></pre><p>Once we have indicated this relationship, we can show it off in the Backstage UI by adding the <code>EntityProvidedApisCard</code>. We would typically add this card to a tab on the Passenger Backend component so that people can look that component up in the catalog in order to read its API definition.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/6nVop1GyxU1IUktCIKYtsx/2040d8a3102f41be71de6477f283b6eb/provided-apis-card.png" alt="a table element with text showing the API which is provided by the passenger-backend. The description, lifecycle and other attributes are there."></p><h2>Combining things into systems</h2><p>Together, the Passenger Backend and Passenger API make up a logical system. They are a group of entities with a well defined purpose, providing and managing information on passengers.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/4bpajEq0Cn6eNF1zY7t6n/14a6fbed7de78fa2b31a85e9d25c7199/one-system.png" alt="A box has been added above everything else to indicate a hierarchy. Passenger backend and the API have arrows up to the new box."></p><p>To represent them as a logical group in Backstage, we can define a system. Systems don't have types, they are just systems.</p><pre><code class="language-yaml">kind: System
name: passengers
</code></pre><p>We can declare that the Passenger API and the Passenger Backend are part of the system by adding the system property to their definitions.</p><pre><code class="language-yaml">kind: Component
type: service
name: passenger-backend
system: passengers
providesApis:
  - passenger-api
</code></pre><pre><code class="language-yaml">kind: API
type: openapi
name: passenger-api
system: passengers
</code></pre><p>Once the system exists in Backstage, it will get its own page in the UI where we can represent its relationships. For example, we can add the <code>EntityHasApisCard</code> to see the APIs which are part of this system.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/2FlxVT5Aukp6TeHlHV1JWB/31634f2d3e1857f6fb9bbe15287713d5/entity-has-apis-card.png" alt="A table listing the one API which is part of the system"></p><p>Similarly, we can add the <code>EntityHasComponentsCard</code> to see the components which are part of the system.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/6tc9kydEiWvltISzYGnYHG/2290683b0987ba45ef60ddcb45ca4e03/entity-has-components-card.png" alt="A table listing the one API which is part of the system"></p><p>It's important to note that the Core Queueing Library and the Core Auth Module are not considered to be part of the Passengers system. This is because they are shared libraries which are used in a large number of components throughout the org. They are probably owned and developed by a different organization within our ride-sharing company.</p><p>Now that we have defined a system, Backstage can diagram it for us. When we add the <code>EntitySystemDiagramCard</code>, we see something like the following:</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/3zgwoGVGLQFELUZbdg6dGM/944e8f899313fa606b5242c99ecd5698/system-diagram.png" alt="An architecture diagram which was produced programmatically by Backstage. It has boxes with arrows between them."></p><h2>Consuming APIs</h2><p>Of course, it takes more than just a Passengers system to make a ride-sharing company hum. Those passengers need to go on trips, and we need to count the trips to see how rich we're going to get. 
Let's add the Trips system into Backstage, give it a Component and connect it up to the Passenger API.</p><p><img src="//images.contentful.com/hcqpbvoqhwhm/3NvX75arTIWdvzFdz9jNnt/3598924faba727c62112a2016599947b/consuming-apis.png" alt="A second system has been added with two boxes. One to represent the system and another to represent the trips counter. There is an arrow pointing to the passenger API to show consumption."></p><p>The key properties required to represent this in Backstage are as follows:</p><pre><code class="language-yaml">kind: System
name: trips
---
kind: Component
type: service
name: trips-backend
consumesApis:
  - passenger-api
</code></pre><p>When we look up the Passenger API in Backstage, we can now see that the trips-backend is a downstream dependency.</p><p>For complex systems, it would be quite onerous to track and compile these dependencies manually. We are hopeful that the community will develop integrations into technologies like API gateways and service meshes so that dependencies can be inferred and represented in Backstage automatically.</p><h2>Business domains</h2><p>After some time, upper management decides that our ride-sharing company should branch out into food delivery. To achieve this vision, they establish a new arm of the company.</p><p>To differentiate the systems we have created to move passengers around from those required to move takeout around, we can create a Domain in Backstage. Domains represent collections of systems which make up a coherent business unit.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2NcSF3qQAHfKnKAKBUwzJ4/0e888dbd33ae4d9dc035eb6d477f0e75/everything.png" alt="Rectangles with arrows pointing between them to represent the architecture"></p><h2>Conclusion</h2><p>With some simple labels like <code>kind</code>, <code>type</code> and <code>name</code> and a handful of relationships like <code>dependsOn</code>, <code>providesApis</code> and <code>consumesApis</code>, complex software architectures can be accurately modeled in Backstage.</p><p>Of course, it's up to you to decide how granularly you want to represent your software. It's totally fine to add components to Backstage and to choose not to group them into systems or domains. APIs are probably the second most useful concept to include since they indicate the interfaces between components.</p><p>To learn more about this topic, please refer to the <a href="https://backstage.io/docs/features/software-catalog/descriptor-format">Backstage documentation on entities</a> and <a href="https://backstage.io/docs/features/software-catalog/well-known-relations">well-known relations</a>.</p>
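<p>The inverse relationship described in the libraries section (a library learning which components depend on it) can be illustrated with a few lines of TypeScript. This is only a sketch of the idea, not how Backstage implements relations internally: the <code>SimpleEntity</code> type, the <code>invertDependsOn</code> function and the <code>core-auth-module</code> name are hypothetical and mirror the abbreviated YAML used in this post.</p>

```typescript
// Hypothetical, simplified entity shape that mirrors the abbreviated YAML in this post.
interface SimpleEntity {
  name: string;
  dependsOn?: string[];
}

// Maps each dependency name to the names of the entities that depend on it.
function invertDependsOn(entities: SimpleEntity[]): { [name: string]: string[] } {
  // A prototype-free object, so entity names never collide with built-in keys.
  const dependencyOf: { [name: string]: string[] } = Object.create(null);
  for (const entity of entities) {
    for (const target of entity.dependsOn || []) {
      const dependents = dependencyOf[target] || [];
      dependents.push(entity.name);
      dependencyOf[target] = dependents;
    }
  }
  return dependencyOf;
}

const catalog: SimpleEntity[] = [
  { name: 'passenger-backend', dependsOn: ['core-queuing-library', 'core-auth-module'] },
  { name: 'core-queuing-library' },
  { name: 'core-auth-module' },
];

console.log(invertDependsOn(catalog)['core-queuing-library']); // [ 'passenger-backend' ]
```

<p>Only the forward <code>dependsOn</code> edge has to be declared by hand; the reverse view is derived from it, which is why the library's own definition never needs to list its consumers.</p>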
]]></content:encoded></item><item><title><![CDATA[Developer portals are a superpower]]></title><link>https://roadie.io/blog/developer-portals-are-a-superpower/</link><guid isPermaLink="false">https://roadie.io/blog/developer-portals-are-a-superpower/</guid><pubDate>Wed, 12 May 2021 21:00:00 GMT</pubDate><description><![CDATA[A rebuttal of a post by AWS guru Corey Quinn, who claims that developer portals are an anti-pattern.]]></description><content:encoded><![CDATA[<p>Last week, Cloud Economist and AWS guru Corey Quinn wrote a blog post declaring that <a href="https://www.lastweekinaws.com/blog/developer-portals-are-an-anti-pattern/">developer portals are an anti-pattern</a>. He mentioned Backstage, and explained why he believed that it was taking the industry in the wrong direction.</p><p>Despite generally excellent commentary on all things tech, in this case Corey's arguments are mistaken.</p><p>Corey's case against developer portals, and specifically Backstage, is centred around two main arguments:</p><ol><li>Building in-house tooling to wrangle cloud services "robs a company’s engineers of an opportunity to develop reusable skills."</li><li>"Developer portals inherently lag the underlying service’s capabilities".</li></ol><p>Let's look at each argument in turn, and see why Backstage, with its vibrant open-source community, is part of a better engineering future.</p><p>To learn more about Backstage in general, and understand what it can do for your engineering organization, check out our <a href="/backstage-spotify/">Ultimate Guide to Spotify Backstage</a>.</p><h2>In-house tooling</h2><p>Corey's first argument is that</p><blockquote><p>building in-house tooling to wrangle cloud services [...] robs a company’s engineers of an opportunity to develop reusable skills.</p></blockquote><p>This broad argument can be applied to any in-house tool, not just developer portals and Backstage. 
Building a bespoke continuous integration tool will rob engineers of an opportunity to learn how to use GitHub Actions or Circle CI, for example.</p><p>It's odd to see Backstage mentioned in this context because Backstage is actually part of the solution to this problem, rather than an exacerbating factor.</p><p>Backstage's open source nature means that it can be deployed inside any company. If you use <a href="https://roadie.io">Roadie</a> then you can use it as a SaaS tool just like GitHub Actions, Circle CI or any other reusable tool.</p><p>If the project eventually turns out to be as successful as something like Kubernetes, you will be able to leave a company which has Backstage, join a new one, and fire up Backstage on day one to learn about the ecosystem around you.</p><p>In fact, Backstage brings an opinionated UI/UX which increases the chance that skills will be transferable between companies, even if the internals are customized to the tools and cloud vendor of each company's choosing.</p><h2>Capability lag</h2><p>Corey's second point is that</p><blockquote><p>developer portals inherently lag the underlying service’s capabilities</p></blockquote><p>This is true of any downstream technology dependency. Features must be released in the upstream project before they can be exposed to users. Amazon's Elastic Kubernetes Service will lag new Kubernetes releases, AWS Lambda will lag new NodeJS versions. Yet, thousands of companies use these services every day.</p><p>Backstage is not trying to completely hide underlying technologies from its users. If you have a special case or you need a cutting edge feature, you are absolutely free to jump into the PagerDuty UI or call the Kubernetes API directly. Backstage doesn't block this.</p><p>Backstage's goal is to handle the use cases which make up 80% of work. 
Reading docs, checking who is on call, re-triggering builds and so on.</p><p>The fact that Backstage is open-source software will help ensure that this lag is minimised. An <a href="https://backstage.io/plugins">array of open-source plugins</a> is already being created by the community. If a feature is not supported, you can add it for yourself and for everyone else who is using that plugin. At Roadie, we are actively funding the maintenance and improvement of these plugins.</p><p>Each day, a large proportion of Spotify's engineering organization choose to use Backstage, not because they are forced to, but because it adds value for them.</p><h2>Proofpoints</h2><p>As evidence of the apparent ills of developer portals, Corey offers up the fact that he hasn't seen Backstage deployed in any company other than Spotify.</p><p>The reality is that Expedia Group, Zalando, and American Airlines have all chosen Backstage for their internal developer portal. The adopters list has <a href="https://github.com/backstage/backstage/blob/master/ADOPTERS.md">many more participants listed</a>.</p><p>Let's be clear: we are still early in the curve of Backstage adoption. The open-source version is just over a year old. It was released early and with limited functionality in place. Spotify are rapidly iterating, alongside the community and with their input, rather than simply dropping a finished product.</p><p>This development style means that open-source Backstage isn't quite baked enough for some companies. That is OK. The community is flourishing, the CNCF is backing it, and Spotify and Roadie are heavily invested in building a powerhouse project.</p><h2>Roadie</h2><p>Of course, I'm biased in my belief that Backstage will succeed. I spent years working on a developer portal and service catalog at Workday, and I've seen the value first-hand, both for the business and the end user.</p><p>Our vision is to make Backstage as ubiquitous, powerful, and pleasant to use as GitHub. 
Backstage will be a reusable skill for engineers all over the world. They will use it because it improves their work lives and gives them access to the information they need to do their jobs. Long live developer portals.</p><p>If you share this vision, <a href="https://careers.roadie.io">join us</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Backstage TechDocs - How it works]]></title><link>https://roadie.io/blog/how-techdocs-works/</link><guid isPermaLink="false">https://roadie.io/blog/how-techdocs-works/</guid><pubDate>Sun, 18 Apr 2021 21:00:00 GMT</pubDate><description><![CDATA[TechDocs converts markdown files to Backstage docs so your engineering teams can find them. But how does it work and how do you set it up?]]></description><content:encoded><![CDATA[<p>TechDocs is the core Backstage feature which transforms markdown documentation into HTML and displays it inside Backstage where your engineering teams can find it.</p><p>There are two ways to set up TechDocs in Backstage: the Basic approach and the Recommended approach. But how do they work and which should you use?</p><p>Read on to find out.</p><h2>Prerequisites</h2><ol><li>Docker installed and running locally on your machine.</li><li>The <code>git</code> version control system and a GitHub account.</li></ol><h2>Basic TechDocs</h2><p>First, let's see what the basic experience gets us and how it works.</p><p>Use <code>git</code> to clone the main Backstage repo. We have used <a href="https://github.com/backstage/backstage/tree/1570824aa8d1c2509e098d60636636b482b08ddf">this point in the history</a> but most versions should work. Run <code>yarn install</code>, <code>yarn tsc</code> and <code>yarn build</code> to prepare the codebase and then start it with <code>yarn dev</code>.</p><p>Backstage should shortly be running on <a href="http://localhost:3000">http://localhost:3000</a>. Sign in as a guest, add <a href="https://github.com/RoadieHQ/sample-service/blob/main/catalog-info.yaml">this sample-service</a> to your Backstage catalog and navigate to its docs tab.</p><p>Once the loading process completes, you should see some docs. 
Simple!</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4HxT9sbsi5bhxvNlmpvS3j/fccb415c7f87dd6948c1b81f5062e011/basic-sample-service.png" alt="basic-sample-service"></p><p>Let's take a look at what actually happened under the hood.</p><h2>How basic TechDocs works</h2><p>Docs were generated and displayed because Backstage detected the <code>backstage.io/techdocs-ref</code> annotation contained in the <code>catalog-info.yaml</code> file of our sample-service. This tells Backstage that there are docs available which it should show to the user.</p><p>There are three actors involved in building the docs: the preparer, the generator and the publisher.</p><p>The preparer cloned the sample service repository into a temp directory on our local machine so the docs can be accessed.</p><p>The generator then downloaded the <code>spotify/techdocs</code> image from Docker Hub. This image contains Python, a few dependencies and a Python library called <a href="https://github.com/backstage/mkdocs-techdocs-core">mkdocs-techdocs-core</a>. The generator ran this library against the docs directory of the sample-service in order to convert the markdown files located there into HTML, CSS and JS files.</p><p>The mkdocs-techdocs-core library is a wrapper around two other libraries, <a href="https://www.mkdocs.org/">MkDocs</a> and <a href="https://squidfunk.github.io/mkdocs-material/">Material for MkDocs</a>.</p><ol><li>MkDocs is a static-site generator which takes a directory and some config and uses it to create a documentation website containing HTML, CSS and JS.</li><li>Material for MkDocs is a MkDocs theme which emulates the Material UI design pattern.</li></ol><p>So MkDocs is generating a static website and Material for MkDocs is styling it. 
What's next?</p><p>Once the documentation site has been stamped out into a temp directory, it must be moved somewhere where Backstage can access it.</p><p>The publisher is responsible for this step and by default it chooses to move the documentation to <code>plugins/techdocs-backend/static/docs/default/Component/sample-service-1/</code>. If you open this directory you will find a sensible structure containing HTML, JS and CSS files. You should notice a clear similarity between these files and the docs you see in Backstage.</p><p>Now that the files are on the filesystem, the TechDocs frontend can simply request them and insert them into the browser's DOM inside a shadow DOM. That's how they end up in the page where you can see them.</p><h2>Limitations of basic TechDocs</h2><p>This basic architecture is easy to get started with but it has a number of downsides:</p><ol><li>Docker must be available in the place where you want to generate the docs. This may not be viable if Backstage is running in an environment like a Kubernetes pod. You can use the MkDocs binary instead, but then you end up with non-core Backstage dependencies in your Dockerfile.</li><li>It's slow on the first request because TechDocs must generate the docs and place them in the filesystem.</li><li>When running multiple Backstage backends, TechDocs may generate and store the docs once for each backend. This leads to extra slowness for the end users.</li><li>Backstage is pulling down the entire source code of the component to the local filesystem to generate the docs. This may not match your security expectations.</li></ol><p>For these reasons, the TechDocs team recommends a CI-driven architecture for generating and storing docs.</p><p>The idea is that a process, like a GitHub Action or other CI build, runs every time there is a change to the markdown files which contain our documentation. 
This process uses the mkdocs-techdocs-core library to convert the markdown files to a static website just like before. However, instead of writing the resulting HTML, CSS and JS files to the local filesystem, it pushes them to an object store like an AWS S3 bucket. From here, Backstage can request them when needed and render them in the browser for the user.</p><h2>Converting to the recommended architecture</h2><p>To convert our basic setup to the recommended architecture, we need to make a few changes. We're using AWS in this example but Google Cloud Platform, Azure and a host of other platforms are supported. We're also using GitHub Actions but CircleCI and others should work too.</p><p>We need the following things:</p><ol><li>An AWS S3 bucket to store our docs, and credentials to authenticate uploading and downloading.</li><li>A process to convert markdown docs to HTML, CSS and JS and to push the resulting files to our bucket.</li><li>Configuration to tell Backstage to pull the docs from S3 instead of generating them with Docker.</li></ol><p>Follow <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html">the official AWS documentation</a> to create an AWS S3 bucket. Acquire an Access Key ID and Secret Access Key which will authenticate requests to your bucket. You will also need to note the region your bucket lives in.</p><p>Add a GitHub Action to your component to do the markdown-to-HTML conversion and push to S3. The official Backstage docs have a <a href="https://backstage.io/docs/features/techdocs/configuring-ci-cd#example-github-actions-ci-and-aws-s3">really good example of the code required</a>. 
Don't forget to create secrets in your GitHub repo to store the bucket name and AWS credentials you created earlier.</p><p>Edit the <code>app-config.yaml</code> file in your Backstage repo.</p><ol><li>Change <code>techdocs.builder</code> to <code>external</code> to tell Backstage to stop generating docs locally.</li><li>Change <code>techdocs.publisher.type</code> to <code>awsS3</code>.</li><li>Set <code>techdocs.publisher.awsS3.bucketName</code> to the name of your bucket.</li></ol><p>The techdocs section of your <code>app-config.yaml</code> should now look like this:</p><pre><code class="language-bash">techdocs:
  builder: 'external' # Alternatives - 'local'
  generators:
    techdocs: 'docker' # Alternatives - 'local'
  publisher:
    type: 'awsS3' # Alternatives - 'local' or 'googleGcs' or 'azureBlobStorage' or 'openStackSwift'. Read documentation for using alternatives.
    awsS3:
      bucketName: 'demo.roadie.so'
</code></pre><p>Restart Backstage with the AWS credentials present in the environment variables:</p><pre><code class="language-bash">env AWS_ACCESS_KEY_ID=xxx AWS_SECRET_ACCESS_KEY=yyy AWS_REGION=ppp yarn dev
</code></pre><p>From now on, when you merge a change to the default branch of your GitHub repo, a GitHub Action will run to generate and publish docs to S3. From there, Backstage will request them and show them to the user.</p><h2>Conclusion</h2><p>Converting your TechDocs from the basic to the recommended setup brings a number of advantages and it only takes a few minutes to switch over one repo.</p>
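<p>For reference, the GitHub Action described in the conversion steps might look roughly like the sketch below. It follows the shape of the official Backstage example linked earlier, so treat that page as authoritative. The secret names, the action versions, the Python version and the <code>default/Component/sample-service-1</code> entity path are assumptions, and should be adjusted to match your own repository and catalog entry.</p><pre><code class="language-yaml"># .github/workflows/techdocs.yml (hypothetical file name)
name: Publish TechDocs Site

on:
  push:
    branches: [main]

jobs:
  publish-techdocs-site:
    runs-on: ubuntu-latest

    # Secrets created in the GitHub repo; the names are assumptions.
    env:
      TECHDOCS_S3_BUCKET_NAME: ${{ secrets.TECHDOCS_S3_BUCKET_NAME }}
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      AWS_REGION: ${{ secrets.AWS_REGION }}
      # Assumed to match the entity registered in the Backstage catalog.
      ENTITY_NAMESPACE: 'default'
      ENTITY_KIND: 'Component'
      ENTITY_NAME: 'sample-service-1'

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Install techdocs-cli
        run: sudo npm install -g @techdocs/cli

      - name: Install MkDocs and the TechDocs plugin
        run: python -m pip install mkdocs-techdocs-core

      # Build the static site without Docker, using the MkDocs install above.
      - name: Generate docs site
        run: techdocs-cli generate --no-docker --verbose

      # Upload the generated site to the bucket Backstage reads from.
      - name: Publish docs site
        run: techdocs-cli publish --publisher-type awsS3 --storage-name $TECHDOCS_S3_BUCKET_NAME --entity $ENTITY_NAMESPACE/$ENTITY_KIND/$ENTITY_NAME
</code></pre><p>The entity path passed to <code>techdocs-cli publish</code> has to match the namespace, kind and name of the component in your catalog, because Backstage uses the same path to look the docs up in the bucket.</p>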
]]></content:encoded></item><item><title><![CDATA[Deploying Backstage application to AWS ECS Fargate]]></title><link>https://roadie.io/blog/backstage-fargate-up-and-running/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-fargate-up-and-running/</guid><pubDate>Wed, 17 Feb 2021 16:00:00 GMT</pubDate><description><![CDATA[How to deploy Backstage to AWS Elastic Container Service (ECS) using the Fargate serverless computing engine to run Docker containers ]]></description><content:encoded><![CDATA[<p>In this tutorial, we're going to deploy a basic Backstage application to AWS. The application will make use of a stack of AWS resources. We'll set up a database to run PostgreSQL on AWS RDS, store our environment variables in AWS SSM Parameter Store, route our traffic through an AWS Application Load Balancer and, last but not least, run our Backstage application on the AWS Fargate compute engine.</p><p>We'll be using the AWS console for most of the actions to scaffold the application, but all steps can be done using either <code>aws-cli</code> or infrastructure-as-code tools like Terraform or Pulumi.</p><h2>Prerequisites</h2><p>To complete this tutorial, you will need:</p><ul><li><a href="https://docs.docker.com/get-docker/">Docker</a> installed and running on your local machine.</li><li><a href="https://nodejs.org/en/">NodeJS</a> and <a href="https://classic.yarnpkg.com/en/docs/install/#mac-stable">Yarn</a> installed on your local machine.</li><li>An <a href="https://aws.amazon.com/console/">AWS account</a> with permissions to create IAM policies, RDS databases, Load Balancers, ECS Fargate Clusters and managed ECR repositories.</li><li><a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html">AWS CLI</a> set up locally with your AWS credentials.</li></ul><h2>Step 1 - Spinning up your RDS Database instance</h2><p>To run properly, Backstage needs a database to store and handle data. 
In an AWS environment we can spin up an RDS PostgreSQL database to handle that for us.</p><p>Let's navigate to the <a href="https://eu-west-1.console.aws.amazon.com/rds/home">AWS RDS console</a> and do just that. We'll start off by clicking the big orange button that says 'Create database'.</p><p>We select the standard create option and select PostgreSQL as our database engine. For templates, we can for now go with the free tier one if it is still available for your AWS account.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1p3iLOrliU8NcN6ciCLHsZ/554d5f89df5af52f495faba7b3d9bc76/rds_step_1.png" alt="rds step 1"></p><p>In the settings section we will set up our database name and master username, and finally generate a password using our favorite password manager. These are good items to temporarily store somewhere, because we will be needing them later. For this deployment the database instance does not yet have to be big and beefy so we will go with the free tier <code>T2.micro</code> instance.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7cGjCQGsbc6wdYslgJXe3G/765f5634e2fdc8c410c8a8d325b4259d/rds_step_2.png" alt="rds_create_db_settings.png"></p><p>We can leave the Storage, Availability &#x26; durability and Database authentication sections at their default values and focus our attention on the Connectivity section. In this section we will select our preferred VPC and subnets. If nothing special is needed, you can use the default VPC for now as well as the default subnet group. Ideally your database subnet should not be accessible from the internet, or even be able to access the internet itself, but securing networking within AWS is out of scope for this tutorial.</p><p>We do want to create a new security group for our instance though. We'll name it <code>backstage_rds_SG</code> and select 5432 as our port. AWS will automatically create a new security group for us that grants access to the database port from our IP address. 
We will later change the source of this rule from our IP address to the security group of our Fargate service.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5NrqFLLEZw8uC4Z0Fe3MmU/a03dd84283b56c658cabc4753dee7bc8/rds_step_3.png" alt="rds_networking_security_group.png"></p><p>After these selections we can click <code>Create database</code> and wait for it to become available.</p><h2>Step 2 - Setting up proper policies to run Fargate containers</h2><p>Before we can start shipping our Backstage container to AWS we need to have a few prerequisites set up for the task to be able to run properly. We'll want good logging so we'll give the task permissions to write to CloudWatch. We also want to be able to read environment variables stored in Systems Manager Parameter Store, so we'll create a policy to do just that as well. Additionally, we are creating a private repository for our container images so we'll create a policy to be able to pull those down. All of these policies will be attached to the AWS IAM Role that we will assign to the running container.</p><p>To set up these policies and roles, let's go to the <a href="https://console.aws.amazon.com/iam/home">AWS IAM Management Console</a>. In there we will first go to the Policies section and create a new policy. The policy JSON to read SSM Parameters is the following:</p><pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameters"
      ],
      "Resource": "*"
    }
  ]
}
</code></pre><p>We should additionally restrict the star-scoped resource to match only needed parameters for this application. That could be something like <code>arn:aws:ssm:[REGION]:[ACCOUNT_ID]:parameter/roadie/backstage/*</code>, depending on the namespace we choose to use in later steps.</p><p>We also want our logs from Fargate to go to some place where we can see them so we'll create another policy:</p><pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*"
      ]
    }
  ]
}
</code></pre><p>Again, if we only want to write into a predefined log stream, we should scope the resource section to match that. We can also leave <code>CreateLogGroup</code> out in that case since the Fargate task then doesn't need permission to create the log group.</p><p>The last policy we want to create is to allow Fargate to download the Docker images we have pushed to our private ECR.</p><pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    }
  ]
}
</code></pre><p>Finally, now that we have our policies set up, we can create a Role that we can attach to the running Fargate task.</p><p>We'll jump into the Roles section of the IAM console and click the 'Create role' button. We select the trusted entity type to be 'Elastic Container Service' and our use case to be 'Elastic Container Service Task'. On the next page, where a list of permissions is displayed, we select the three policies we created above.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3lCdKPTRc9h9nCpLq8WLZZ/a902f6955c8a72d39bdcbe8c4c051a53/fargate_instance_role.png" alt="fargate instance role"></p><p>When we navigate to the role we should make sure that the correct trust policy JSON has been assigned to it. Because we use this role as both the task role and the task execution role in Step 6, its trust policy should name ecs-tasks.amazonaws.com, the principal that ECS tasks use to assume roles.</p><pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ecs-tasks.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
</code></pre><p>Now we have all the prerequisites on the IAM side ready for our deployment.</p><h2>Step 3 - Defining our environment in Systems Manager Parameter Store</h2><p>We know the connection details for our database, so next we will store them. We'll also set up our GitHub token in the same way as an environment variable (make sure you have created a GitHub token as explained <a href="https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token">here</a> if you want to use GitHub with your Backstage). There are a few different ways to pass environment variables to running containers in AWS ECS. We will be using AWS Systems Manager Parameter Store to save them in a safe place from which they can then be loaded into the running container. AWS ECS also provides the possibility to load environment variables from a flat file stored in S3 or to pass them in directly (unsecured) in the task definition.</p><p>Let's navigate to the <a href="https://eu-west-1.console.aws.amazon.com/systems-manager/parameters/?region=eu-west-1&#x26;tab=Table">Parameter Store</a> and populate the needed values in there.
By clicking Create Parameter we can create values for the database credentials of the RDS instance we created previously as well as for the GitHub token we have created.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5L8QpmhYE20857SOmQDpNR/dc7763f84bdec7bf478e1e6da1dee5e3/param_store_create.png" alt="param store create"></p><p>At least the <code>DB_PASSWORD</code> and <code>GITHUB_TOKEN</code> should be of type SecureString, so they are encrypted. We'll be using just the default KMS Key in this case to encrypt the values, but it might be worthwhile to generate a specific key for these parameters.</p><p>In the end we should end up with a list of a few parameters that we can use later.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/79stOe8AqOCuex5Zi3bBrw/fe5a7287c28eb9ed48a12c998db55ff4/param_store_list.png" alt="param store list"></p><p>Note that we have not defined the database port to be retrieved from Parameter Store here. That might be something you want to do if the ports change regularly or are non-standard, but it is not really necessary.</p><h2>Step 4 - Creating a Load Balancer for our Backstage service</h2><p>The last scaffolding bit we want to do to support our Fargate Backstage is to set up a load balancer in standby to wait for our Fargate service to attach itself to it. We do this step a bit prematurely just to have a good static URL available to point to when we eventually start building the actual Backstage application.</p><p>Let's navigate to the <a href="https://eu-west-1.console.aws.amazon.com/ec2/v2/home?region=eu-west-1#LoadBalancers">AWS Load Balancer section</a> in the console and spin one up.</p><p>We want to create an Application Load Balancer that is internet facing. It is a good idea to select all subnets, so the load balancer is able to target our containers even if they are spread out across different AWS Availability Zones. 
For now the only listener we attach to the load balancer is listening to HTTP traffic through port 80, but if you have a domain that you control and can create certificates for, you should be using HTTPS and port 443.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1JzkyiuD1UmvagrItZfcLV/383de0be482e3380713ea973619754f5/lb_basic_setup.png" alt="lb basic setup"></p><p>In the next step we configure the security of our balancer. If you didn't choose to use the HTTPS protocol, AWS will show you a little warning to do so. If you did, you should enter the certificate details for the domain name you have available.</p><p>We'll continue onwards to setting up our security groups. In this case we want to create a new one for the load balancer. The only thing we need to do (in this setup without HTTPS) is configure this group to allow traffic to port 80 from everywhere (0.0.0.0/0, ::/0). If you want to restrict access to your Backstage instance, you can define an IP range of your office network or VPN, or your personal public IP.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/3EbByHv9juIXtH6VUvnLMs/ed2ae34afe0f0c84639af168f2acfe26/lb_sec_groups.png" alt="lb sec groups"></p><p>For target groups we just change the port to 7000, which is the one our Backstage instance will be using, and give the target group a name. We will not be registering any actual targets yet. That will be handled when we spin up our Fargate services.</p><p>The load balancer will take a few minutes to spin up. While waiting for that we will take note of the DNS name of the balancer; this will be the entry we'll modify our application configuration with. Of course, if you have added a CNAME/Alias entry of your own domain to point to the load balancer, you should use that instead.</p><h2>Step 5 - Creating your Backstage image</h2><p>To deploy the Backstage application we want to have it packaged into a Docker image with configurations best suited to our environment. 
We'll start this journey in the Backstage repository. For more information on how to scaffold the initial application you can take a look at the post on getting <a href="https://roadie.io/blog/backstage-docker-service-catalog/">Backstage running with Docker Compose</a>.
For this post we start the same way and scaffold a fresh Backstage application by running <code>npx @backstage/create-app</code>. After we have figured out a good name for the app and selected PostgreSQL as our database provider, we are ready to massage our configuration files to match what we want our environment to look like.</p><p>If we take a look at the default <code>app-config.yaml</code> file we see a few environment variables that are needed to get the app running properly. For our use case, the environment variables based on the default <code>app-config.yaml</code> file are:</p><pre><code>POSTGRES_HOST
POSTGRES_PORT
POSTGRES_USER
POSTGRES_PASSWORD
GITHUB_TOKEN
</code></pre><p>These happen to be the same items we created in AWS Parameter Store previously so it looks like we are on the right track.</p><p>A lot of the values in the default configuration file are not necessary and can be removed. Things like default catalog locations can be removed since that section depends a lot on the way you want to configure your Backstage instance. For this tutorial, we will leave the whole configuration file as is.</p><p>Previously we created a load balancer to have a more stable DNS entry we can use as the application entrypoint. We'll add that to our application configuration by modifying the <code>app-config.production.yaml</code> file. We'll also turn off HTTP->HTTPS redirection in our Content-Security-Policy for now since our Load Balancer only supports port 80. This override can be omitted in more secure environments where HTTPS is set up.</p><p>The whole file would eventually look something like this:</p><pre><code class="language-yaml">app:
  baseUrl: http://roadie-fargate-loadbalancer-123456789.eu-west-1.elb.amazonaws.com

backend:
  baseUrl: http://roadie-fargate-loadbalancer-123456789.eu-west-1.elb.amazonaws.com
  listen:
    port: 7000
  csp:
    upgrade-insecure-requests: false # For tutorial purposes only
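  # For reference only (an assumption about the default app-config.yaml that
  # create-app generates when PostgreSQL is selected): the database settings
  # already read the POSTGRES_* variables roughly as sketched below, so they
  # do not need to be repeated in this file. Compare with your own
  # app-config.yaml before relying on it.
  # database:
  #   client: pg
  #   connection:
  #     host: ${POSTGRES_HOST}
  #     port: ${POSTGRES_PORT}
  #     user: ${POSTGRES_USER}
  #     password: ${POSTGRES_PASSWORD}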
</code></pre><p>That is all the configuration needed to build an image we can run on Fargate. To create the actual deployable we can rely on the built-in <code>build-image</code> command that produces a Docker image with the current content of our workspace. We run <code>yarn build-image</code> and wait for the Backstage CLI to do its thing. By running <code>docker images</code> we can see that the previous command has created a Docker image for us with the repository name <code>backstage</code> and tag <code>latest</code>.</p><p>To run containers in Fargate we need to store the Docker image somewhere the ECS service can download it from. For that we will create a new repository in ECR which we can use as the home for our container. The easiest way to do this is to use the <code>aws-cli</code> tool. Note that we are using the <code>eu-west-1</code> region throughout this tutorial, so be sure to change it to your preferred region.</p><pre><code class="language-bash">aws ecr create-repository --repository-name fargate-backstage --region eu-west-1
</code></pre><p>AWS responds with the configuration of the repository, which we can then use to tag and push our image. Here is the JSON output from the command:</p><pre><code class="language-json">{
    "repository": {
        "repositoryArn": "arn:aws:ecr:eu-west-1:123456789012:repository/fargate-backstage",
        "registryId": "123456789012",
        "repositoryName": "fargate-backstage",
        "repositoryUri": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/fargate-backstage",
        "createdAt": "2021-02-16T13:56:38+01:00",
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        },
        "encryptionConfiguration": {
            "encryptionType": "AES256"
        }
    }
}

</code></pre><p>From that configuration we grab the <code>repositoryUri</code> and use that to tag our Backstage image with the correct repository path and version number. We'll trust that this first iteration of our image is production ready, so we bravely start versioning from number 1.0.0.</p><pre><code class="language-bash">docker tag backstage:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/fargate-backstage:1.0.0
</code></pre><p>Now we are ready to push our image to our newly created repository and move to scaffold other AWS resources.
First, let's log in to ECR:</p><pre><code class="language-bash">aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
</code></pre><p>and then push the image up to AWS ECR</p><pre><code class="language-bash">docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/fargate-backstage:1.0.0
</code></pre><h2>Step 6 - Defining our Fargate tasks</h2><p>All the supporting configuration should now be done, and we can finally move on to define the actual container, service and task that will be running our Backstage instance.</p><p>We'll start off by creating a new cluster in <a href="https://eu-west-1.console.aws.amazon.com/ecs/home?region=eu-west-1#/clusters">AWS ECS</a>. Clusters in ECS are mostly just for namespacing purposes, but they are tied to a specific VPC, so make sure you choose the same VPC where the load balancer and RDS database are.</p><p>The next step is to create a task definition. This will contain the settings for our Fargate instance, our container definitions and the configuration on how we pass in our environment variables. We'll select Fargate as the type of task and fill in the needed values. The role for the task itself as well as the task execution role should be the one we created earlier for this purpose. We'll select half a vCPU and 1GB of memory for this first iteration and see how the service behaves. These can be updated later if there is a need for more resources.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/41QWq0orML0jx0aSYXVCEA/c9fa050ba680ce5f33d5298d0353faef/task_def_basic.png" alt="task def basic"></p><p>Our task of course needs a container, so we will create a new container definition by clicking 'Add container'. We'll give our container a descriptive name and in the image text field add our freshly created and pushed <code>123456789012.dkr.ecr.eu-west-1.amazonaws.com/fargate-backstage:1.0.0</code> Docker image. For port mappings we'll add a single item, exposing port <code>7000</code> from the container. Other values in this section can be left as default.</p><p>A little bit further down in the environment section we will add a few lines to retrieve our env variables from Parameter Store. The environment variable names came from our <code>app-config.yaml</code> file and were:</p><pre><code>POSTGRES_HOST
POSTGRES_PORT
POSTGRES_USER
POSTGRES_PASSWORD
GITHUB_TOKEN
</code></pre><p>Most of the environment variables we define will use the 'ValueFrom' type to retrieve needed information. For these we add the key and point the value to the ARN of the corresponding parameter in Parameter Store. The port value alone is passed in as plain text.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7rfRZ1YlG9ksOUt7szes03/d02ed01e59980e9fe225dda89d7604d9/fargate_container_env.png" alt="fargate container env"></p><p>We'll also click on autoconfigure CloudWatch Logs to be able to see the logs from the running container.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/5yDYutwlwZfNa2noWVQhHZ/28a9d7a4157793e42f3389d2ebae5b29/fargate_container_log_config.png" alt="fargate container log config"></p><p>That is all the configuration needed for the task definition for now.</p><p>The final step to start up these tasks is to create an ECS service within our cluster that points to the task definition we have created. We'll navigate back to the cluster we have created and on the services tab click Create.</p><p>We'll select the Fargate launch type and pick our just created task definition. It is a good idea to set the Platform Version to 1.4.0 since 'Latest', counterintuitively, actually points to '1.3.0'. We can leave the deployment type as a rolling update for now and the Task Tagging config at its default values.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/1CVDoDL24WLSAlmL5TLQ06/140ef18e296634fe3f7f4be39103a1ba/fargate_service_basic.png" alt="fargate service basic"></p><p>The next step in the wizard is the networking configuration. We'll choose the same VPC that our cluster, RDS and load balancer are in and select a few (or all, our LB supports all of them) of the subnets from the dropdown. We want to assign a public IP to the service in this case because we are accessing a regional AWS service, SSM Parameter Store, and we don't have a VPC endpoint set up for it.</p><p>We will create one more security group for this service. 
The security group doesn't need to accept traffic from anywhere other than our load balancer. It is good practice to keep the firewall as restrictive as possible, so we'll configure the new security group to allow access only to port 7000 and only from one source group, our load balancer security group.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4DPrFvLDH1cTQmg8MEBC8m/5148dfe03e279c9dccf54352ba1c764d/fargate_service_sg.png" alt="fargate service sg"></p><p>Note that AWS doesn't make the UX around this easy, because it displays security group IDs only. You need to navigate either to the security groups in the VPC console or directly to your load balancer to see what the ID of the security group is.</p><p>Now that our security group allows access from our load balancer, we can set the radio button in the Load Balancing section to Application Load Balancer. We select our load balancer from the first dropdown, select our container from the second dropdown and add it to the load balancer. Most of the values are autopopulated for us. We'll choose the target group we created and let ECS register the service as a target in it.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/580fl8VaYSJnyGve6lmnap/3ba1faf8563ff6a744fe035fde3bccf7/fargate_service_lb.png" alt="fargate service lb"></p><p>The rest of the settings we can leave as defaults and just click through the wizard. ECS will automatically start spinning up our service. When we navigate to the tasks tab of our ECS service, we should be able to see ECS trying hard to provision our containers.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/2R0QKEQw5CYBrensEULUwe/f85e946b106f969fe2e6748bde363616/fargate_provisioning_tasks.png" alt="fargate provisioning tasks"></p><p>It will take a few minutes before the task reaches 'RUNNING' status. 
Unfortunately it doesn't stay in 'RUNNING' status for long and instead ends up in a loop of starting a new task and failing, one after another.</p><p>We can investigate why the running container fails to stay up by checking the <a href="https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logsV2:log-groups">CloudWatch logs</a> that our task has written. In cases where the task doesn't start at all, we can take a look at the task itself from the ECS pages to see what prevents it from starting. The cause could be something like an IAM policy issue or a wrong URL for the image that we have defined.</p><p>When we take a look at the logs, we can see that the container starts up and that Backstage itself tries its best to start within the container. It fails with a Knex timeout error, telling us that Knex is unable to connect to the database.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6RJU8sHEllHmgUpTZKeo7T/bba1dfcbbf1aecad250ef240cb9e3dc0/db_connectivity_error.png" alt="db connectivity error"></p><p>There is one thing we need to do to fix that. In the first step we spun up an RDS database and created a new security group for it. This security group does not allow our container to access the database, so we need to modify it. We can navigate to the security group via RDS and edit its inbound rules. We will add a new rule allowing traffic to port <code>5432</code> from the security group that we created for our Fargate service. 
After clicking save, the change to the firewall is immediate, and the next task ECS spins up for us should be able to connect to the database and stay up and running.</p><p>And that should be it!</p><p>We can now navigate to our load balancer URL and we should see a running Backstage instance with default data scaffolded for us.</p><h2>Conclusion</h2><p>Getting Backstage up and running on AWS Fargate requires multiple steps and configurations, but it provides a secure and manageable Backstage instance once the initial configuration is done. There are a few avenues along which this solution can evolve from here. High availability and monitoring are also worth thinking about when spinning up a Backstage instance, and they will eventually bring more complexity into the solution. With this tutorial you can get going and start experimenting with Backstage before moving on to more complex architectures.</p>
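<p>For readers who prefer the command line, the database security group rule described above can also be sketched with the AWS CLI. This is a minimal sketch rather than part of the console walkthrough: both security group IDs are hypothetical placeholders, the function name is made up for illustration, and it assumes the AWS CLI is installed and configured for the same account and region.</p><pre><code class="language-bash"># Hypothetical placeholder IDs: replace with the IDs of your own security groups.
RDS_SG_ID="sg-0123456789abcdef0"      # security group attached to the RDS instance
FARGATE_SG_ID="sg-0fedcba9876543210"  # security group attached to the Fargate service

# Allow inbound PostgreSQL traffic (TCP 5432) to the database,
# but only from members of the Fargate service security group.
allow_postgres_from_fargate() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$RDS_SG_ID" \
    --protocol tcp \
    --port 5432 \
    --source-group "$FARGATE_SG_ID"
}
</code></pre><p>Calling <code>allow_postgres_from_fargate</code> once should have the same effect as adding the inbound rule in the console. Referencing the source security group rather than an IP range keeps the rule valid as ECS replaces tasks.</p>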
]]></content:encoded></item><item><title><![CDATA[How to deploy Backstage on KIND Kubernetes]]></title><link>https://roadie.io/blog/backstage-service-catalog-kubernetes/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-service-catalog-kubernetes/</guid><pubDate>Mon, 01 Feb 2021 21:00:00 GMT</pubDate><description><![CDATA[How to build and run Backstage on a local Kubernetes cluster created with KIND.]]></description><content:encoded><![CDATA[<p>In this tutorial, we're going to build a basic Backstage application and deploy it to a local Kubernetes cluster created with KIND. The application will be able to store data, such as the services in the Backstage catalog, in an in-memory Sqlite3 database.</p><h2>Prerequisites</h2><p>To complete this tutorial, you will need:</p><ul><li><a href="https://docs.docker.com/get-docker/">Docker</a> installed and running on your local machine.</li><li><a href="https://nodejs.org/en/">NodeJS</a> installed on your local machine.</li><li>The <a href="https://classic.yarnpkg.com/en/docs/install/#mac-stable">Yarn package manager</a> installed. You can use <code>npm</code> if you like, although you will have to modify the shell commands somewhat.</li><li>The <a href="https://kind.sigs.k8s.io/docs/user/quick-start/">KIND</a> Kubernetes cluster manager installed. You can skip this requirement if you already have a Kubernetes cluster which you wish to install Backstage into.</li><li>The Kubernetes <code>kubectl</code> command line tool, for interfacing with the cluster we will create.</li></ul><h2>Step 1 - Scaffold a Backstage application</h2><p>To run Backstage on Kubernetes, we first need to scaffold a Backstage application to work with. 
The main Backstage codebase does ship with a sample application we can run, but best practices dictate that we should create our own so we can customize it with our company name and other attributes.</p><p>Backstage requires a database to store information about the components, websites and other entities you want to track in the catalog. There are two built in database options, Sqlite and PostgreSQL. We're going to use Sqlite3 for this tutorial.</p><p>It is simpler and quicker to get set up with Backstage and Sqlite3. The downside is that our data will be stored in memory, and will be lost if we upgrade or restart our Backstage instance or Kubernetes pod.</p><p>This tutorial uses version <code>0.3.7</code> of the Backstage CLI to create this application. You may see different results if you're using a different version.</p><pre><code class="language-bash">npx @backstage/create-app --version
npx: installed 67 in 5.094s
0.3.7

npx @backstage/create-app
npx: installed 67 in 4.944s
? Enter a name for the app [required] scaffolded-app-sqlite
? Select database for the backend [required] SQLite

Creating the app...

 Checking if the directory is available:
  checking      scaffolded-app-sqlite ✔

 Creating a temporary app directory:
  creating      temporary directory ✔

 Preparing files:
  templating    .gitignore.hbs ✔
  copying       .eslintrc.js ✔
  copying       app-config.production.yaml ✔
  templating    app-config.yaml.hbs ✔
  templating    catalog-info.yaml.hbs ✔
  copying       README.md ✔
  copying       lerna.json ✔
  templating    package.json.hbs ✔
  copying       tsconfig.json ✔
  copying       .eslintrc.js ✔
  copying       Dockerfile ✔
  copying       README.md ✔
  templating    package.json.hbs ✔
  copying       index.test.ts ✔
  copying       index.ts ✔
  copying       types.ts ✔
  copying       app.ts ✔
  copying       auth.ts ✔
  copying       catalog.ts ✔
  copying       proxy.ts ✔
  copying       scaffolder.ts ✔
  copying       techdocs.ts ✔
  copying       .eslintrc.js ✔
  copying       cypress.json ✔
  templating    package.json.hbs ✔
  copying       apple-touch-icon.png ✔
  copying       android-chrome-192x192.png ✔
  copying       favicon-16x16.png ✔
  copying       favicon-32x32.png ✔
  copying       favicon.ico ✔
  copying       index.html ✔
  copying       manifest.json ✔
  copying       robots.txt ✔
  copying       safari-pinned-tab.svg ✔
  copying       .eslintrc.json ✔
  copying       app.js ✔
  copying       App.test.tsx ✔
  copying       App.tsx ✔
  copying       LogoFull.tsx ✔
  copying       LogoIcon.tsx ✔
  copying       apis.ts ✔
  copying       index.tsx ✔
  copying       plugins.ts ✔
  copying       setupTests.ts ✔
  copying       sidebar.tsx ✔
  copying       EntityPage.tsx ✔

 Moving to final location:
  moving        scaffolded-app-sqlite ✔

 Building the app:
  executing     yarn install ✔
  executing     yarn tsc ✔

🥇  Successfully created scaffolded-app-sqlite

See https://backstage.io/docs/tutorials/quickstart-app-auth to know more about enabling auth providers
</code></pre><h2>Step 2 - Building a Docker image</h2><p>Backstage comes with a built in command to help you build a Docker image which we can deploy into a Kubernetes cluster.</p><p>Change into the <code>scaffolded-app-sqlite</code> directory which we just created, and use <code>yarn</code> to run a command which will build the Docker image.</p><pre><code class="language-bash">yarn build-image
yarn run v1.22.10
$ yarn workspace backend build-image
$ backstage-cli backend:build-image --build --tag backstage
# Lots of output omitted...
=> => naming to docker.io/library/backstage 0.0s
✨  Done in 177.33s.
</code></pre><p>We should now see that an image has been built successfully.</p><pre><code class="language-bash">docker images
REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
backstage          latest    7b452013e713   3 minutes ago   1.1GB
</code></pre><p>And we can run it using Docker directly.</p><pre><code class="language-bash">docker run -p 7000:7000 backstage
2021-01-31T16:41:18.319Z backstage info Initializing http server
2021-01-31T16:41:18.322Z backstage info Listening on :7000
</code></pre><p>Open <a href="http://localhost:7000">http://localhost:7000</a> in your browser to check that Backstage is working correctly.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7tERp9hwauzlNUpbZkEyKA/1a47d8ab41bd05bca2a33da26c6c1200/backstage-service-catalog-screenshot.png" alt="The Backstage service catalog with default software displayed"></p><h2>Step 3 - Create a KIND Kubernetes cluster</h2><p>Now that we have a Docker image for Backstage, we need somewhere to deploy it. In this tutorial, we are going to deploy our image to a local development cluster created with <a href="https://kind.sigs.k8s.io/docs/user/quick-start/">KIND</a>.</p><p>Similar deployment steps should work on other Kubernetes providers such as minikube, AWS or Google Cloud Platform.</p><p>Use <code>kind</code> to create a Kubernetes cluster to work with. A default cluster is enough for this tutorial, because Step 4 reaches Backstage through port forwarding. If you later want to configure ingress in the cluster with Nginx, the cluster needs some special settings at creation time; see <a href="https://kind.sigs.k8s.io/docs/user/ingress/">this snippet from the KIND docs</a>.</p><pre><code class="language-bash">kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.19.1) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind
</code></pre><p>Once this completes, your <code>kubectl</code> command line utility should be automatically configured to use this newly created cluster.</p><pre><code class="language-bash">kubectl config get-contexts
CURRENT      NAME           CLUSTER        AUTHINFO       NAMESPACE
*            kind-kind      kind-kind      kind-kind
</code></pre><p>The Backstage Docker image we built previously is not automatically shared with our KIND Kubernetes cluster. Before we can use it, we have to load it into the cluster. This is covered in <a href="https://kind.sigs.k8s.io/docs/user/quick-start/#loading-an-image-into-your-cluster">the KIND docs</a>.</p><pre><code>kind load docker-image backstage:latest
Image: "backstage:latest" with ID "sha256:fe0c8bf5323b46fc145cab5832e6df4d7871d1cfd230e497d025e5bb5bdd2c05" not yet present on node "kind-control-plane", loading...
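
# Optional check, not part of the original run. It assumes the default
# node name "kind-control-plane" shown in the output above.
# List the images now present on the node; backstage should be among them.
docker exec -it kind-control-plane crictl images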
</code></pre><p>Now that the image is loaded, we can create a Backstage deployment and a service to expose it on an IP inside the cluster. Save the following YAML into a file called <code>manifest.yaml</code>.</p><pre><code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      containers:
        - name: backstage
          imagePullPolicy: Never
          image: docker.io/library/backstage:latest
          ports:
            - containerPort: 7000
---
kind: Service
apiVersion: v1
metadata:
  name: backstage-service
spec:
  selector:
    app: backstage
  ports:
    - port: 7000
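      # Optional and equivalent: targetPort defaults to the value of port,
      # so this line only makes the mapping to containerPort 7000 explicit.
      targetPort: 7000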
</code></pre><p>You'll notice that we have set the <code>imagePullPolicy</code> to <code>Never</code>. This prevents a problem where Kubernetes attempts to pull the Backstage Docker image from a registry instead of using the one we loaded onto the cluster earlier. Because the image is tagged <code>latest</code>, Kubernetes would otherwise default to always pulling it, and the pull would fail because our locally built image is not published to a registry. Without setting <code>imagePullPolicy: Never</code>, our deployment would fail.</p><p>We apply this change to the cluster with the following command.</p><pre><code class="language-bash">kubectl apply -f manifest.yaml
</code></pre><p>We can double-check that the change was applied successfully by inspecting our backstage Kubernetes pod.</p><pre><code class="language-bash">kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
backstage-64d46b7886-r7l7r   1/1     Running   0          8m14s
</code></pre><p>We know this is running successfully because the STATUS is <code>Running</code>.</p><h2>Step 4 - Access Backstage in the browser</h2><p>Our local KIND Kubernetes cluster doesn't provide a way to access Backstage from our local machine, which is outside the cluster.</p><p>To work around this, we will forward a port inside the cluster to one on our local machine, using the built-in port forwarding feature of <code>kubectl</code>. Use the pod name reported by <code>kubectl get pods</code>; yours will differ from the one shown here.</p><pre><code>kubectl port-forward backstage-64d46b7886-r7l7r 7000:7000
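
# Alternatively (a sketch, assuming the backstage-service Service from
# manifest.yaml was applied): forward through the Service instead, so the
# generated pod name does not need to be looked up.
kubectl port-forward service/backstage-service 7000:7000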
</code></pre><p>As before, open <a href="http://localhost:7000">http://localhost:7000</a> in your browser to view Backstage. It looks like nothing has changed, but this page is being rendered inside our Kubernetes cluster and exposed to the browser.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7tERp9hwauzlNUpbZkEyKA/1a47d8ab41bd05bca2a33da26c6c1200/backstage-service-catalog-screenshot.png" alt="The Backstage service catalog with default software displayed"></p><h2>Conclusion</h2><p>In this tutorial you learned how to get Backstage running in a local Kubernetes cluster and expose it to your browser.</p>
]]></content:encoded></item><item><title><![CDATA[Using GitHub Auth with Backstage]]></title><link>https://roadie.io/blog/github-auth-backstage/</link><guid isPermaLink="false">https://roadie.io/blog/github-auth-backstage/</guid><pubDate>Wed, 05 Aug 2020 21:00:00 GMT</pubDate><description><![CDATA[Setting up GitHub authentication can be a little tricky, but this post will tell you everything you need to know.]]></description><content:encoded><![CDATA[<p><strong>Update Sept 2021:</strong> Backstage now supports GitHub authentication via GitHub apps. If you are using a GitHub app, you do not need to follow the steps described below. They are only valid if you are using a GitHub Personal Access Token with Backstage.</p><p>GitHub is one of the most popular Backstage authentication mechanisms going. There's a good reason for this, Backstage ultimately needs to pull service catalog information from YAML files, those YAML files usually live in git, and the git repos usually live on GitHub.</p><p>Setting up GitHub authentication can be a little tricky, but this post will tell you everything you need to know.</p><p>There are basically two steps:</p><ol><li>Create an OAuth application on GitHub,</li><li>Pass the identity information from this application to Backstage.</li></ol><p>Let's get into it.</p><h2>Create an OAuth application on GitHub</h2><p>To create an OAuth app for local development, visit <a href="https://github.com/settings/developers">your OAuth Apps settings page on GitHub</a>. Click the "New OAuth App" button and you'll see a form you have to fill out.</p><p>Enter the following values:</p><p>Your form should now look something like this:</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/67kFoKTMRZy9DM0PT1D0ie/23e4dbd3fd16ebcceb98c90c188d58d7/github-register-new-oauth-filled.png" alt="a screenshot of the form on GitHub which allows the user to register a new OAuth application. 
The values mentioned above are prefilled."></p><p>The tricky thing with this is that the homepage URL should point to the Backstage Frontend, because that's what your users will consider to be "Backstage", but the Authorization callback URL must point to the Backstage Backend.</p><p>When GitHub authenticates a user, it will call out to the application Backend with some authentication parameters included in the URL query string. Backstage will check these parameters and then server-side render a confirmation page for the user.</p><p>Once you submit that form, GitHub provides you with a Client ID and Client Secret for your OAuth application.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/6On6ErS08FrnZKtXSFHQhO/bb3624e2563145b91c615503a16bc83e/github-id-and-secret.png" alt="A screenshot of GitHub showing the client ID and secret for a demo application"></p><p>Note these down. You'll need them in the next step.</p><h2>Tell Backstage about your OAuth application</h2><p>Go back to the command line where you run the Backstage backend and pass the Client ID and Client Secret into Backstage when you start it up.</p><pre><code class="language-shell"># starting in the root of your Backstage repo
» cd packages/backend
» env AUTH_GITHUB_CLIENT_ID=eafc816045b5533ba581 AUTH_GITHUB_CLIENT_SECRET=34922f6547991760e8f5219a529a9c00b0fd44ea yarn start
</code></pre><p>That's all there is to it. When Backstage starts up and opens on http://localhost:3000, you'll be able to log in via GitHub.</p><h2>Further reading</h2><p><a href="https://github.com/spotify/backstage/tree/master/docs/auth">The authentication docs for Backstage</a> can be found in the <code>docs/auth</code> directory, within the repo.</p><p>They are rather engineering-focussed and not the easiest to follow if you're just trying to authenticate with GitHub.</p>
]]></content:encoded></item><item><title><![CDATA[Running the Backstage service catalog with Docker Compose]]></title><link>https://roadie.io/blog/backstage-docker-service-catalog/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-docker-service-catalog/</guid><pubDate>Tue, 09 Jun 2020 21:00:00 GMT</pubDate><description><![CDATA[How to build and run Backstage Docker containers to get started with the service catalog in Docker Compose.]]></description><content:encoded><![CDATA[<hr><p>In this tutorial, we're going to build and run a basic Backstage application with Docker Compose. The application will be able to store data in a PostgreSQL database, and connect to GitHub to pull in repositories. We will also make a config change in the Backstage application and re-run it.</p><p><strong>Just want to get started quickly?</strong> Check out our community <a href="/backstage/docker-image/">Backstage Docker image</a>.</p><h2>Prerequisites</h2><p>To complete this tutorial, you will need:</p><ul><li><a href="https://docs.docker.com/get-docker/">Docker</a> and <a href="https://docs.docker.com/compose/">Docker Compose</a> installed and running on your local machine.</li><li><a href="https://nodejs.org/en/">NodeJS</a> installed on your local machine.</li><li>The <a href="https://classic.yarnpkg.com/en/docs/install/#mac-stable">Yarn package manager</a> installed. You can use <code>npm</code> if you like, although you will have to modify the shell commands somewhat.</li></ul><h2>Step 1 - Scaffold a Backstage application</h2><p>To run Backstage on Docker Compose, we need to create a Backstage instance to work with. The main Backstage codebase does ship with a sample application we can run, but best practices dictate that we should create our own so we can configure it with our company name and other attributes.</p><p>Backstage requires a database to store information about the components, websites and other entities you want to track in the catalog. 
There are two built in database options, Sqlite and PostgreSQL. We're going to use PostgreSQL for this tutorial.</p><p>Backstage comes with a CLI for creating Backstage instances. Let's use it to scaffold a new instance and configure it for PostgreSQL. We'll call this instance <code>scaffolded-app</code>, but you can choose a name that makes more sense for you.</p><p>This tutorial uses version <code>0.3.2</code> of the Backstage CLI to create this application. You may see different results if you're using a different version.</p><pre><code class="language-bash">» npx @backstage/create-app --version
0.3.2

» npx @backstage/create-app
npx: installed 68 in 14.197s
? Enter a name for the app [required] scaffolded-app
? Select database for the backend [required] PostgreSQL

Creating the app...

 Checking if the directory is available:
  checking      scaffolded-app ✔

 Creating a temporary app directory:
  creating      temporary directory ✔

 Preparing files:
  copying       README.md ✔
  copying       .npmignore ✔
  copying       lerna.json ✔
  templating    app-config.yaml.hbs ✔
  templating    package.json.hbs ✔
  copying       tsconfig.json ✔
  copying       .eslintrc.js ✔
  copying       cypress.json ✔
  templating    package.json.hbs ✔
  copying       .eslintrc.js ✔
  copying       android-chrome-192x192.png ✔
  copying       favicon-16x16.png ✔
  copying       apple-touch-icon.png ✔
  copying       favicon-32x32.png ✔
  copying       favicon.ico ✔
  copying       manifest.json ✔
  copying       index.html ✔
  copying       safari-pinned-tab.svg ✔
  copying       robots.txt ✔
  copying       App.tsx ✔
  copying       App.test.tsx ✔
  copying       index.tsx ✔
  copying       apis.ts ✔
  copying       plugins.ts ✔
  copying       sidebar.tsx ✔
  copying       setupTests.ts ✔
  copying       .eslintrc.json ✔
  copying       app.js ✔
  copying       .eslintrc.js ✔
  copying       Dockerfile ✔
  copying       README.md ✔
  templating    package.json.hbs ✔
  copying       index.ts ✔
  copying       types.ts ✔
  copying       index.test.ts ✔
  copying       auth.ts ✔
  copying       catalog.ts ✔
  copying       identity.ts ✔
  copying       proxy.ts ✔
  copying       scaffolder.ts ✔
  copying       techdocs.ts ✔

 Moving to final location:
  moving        scaffolded-app ✔

 Building the app:
  executing     yarn install ✔
  executing     yarn tsc ✔
  executing     yarn build ✔

🥇  Successfully created scaffolded-app
</code></pre><p>If we <code>cd</code> into the <code>scaffolded-app</code> directory which was just created, we can see the directory structure which was created for us.</p><pre><code class="language-bash">» ls -al
total 1776
drwxr-xr-x    19 myuser  staff     608  9 Jan 20:20 .
drwxr-xr-x     3 myuser  staff      96  9 Jan 19:17 ..
-rw-r--r--     1 myuser  staff      36  9 Jan 19:17 .eslintrc.js
-rw-r--r--     1 myuser  staff     420  9 Jan 19:17 .gitignore
-rw-r--r--     1 myuser  staff      93  9 Jan 19:17 README.md
-rw-r--r--     1 myuser  staff     184  9 Jan 19:17 app-config.production.yaml
-rw-r--r--     1 myuser  staff    3250  9 Jan 19:17 app-config.yaml
-rw-r--r--     1 myuser  staff     399  9 Jan 19:17 catalog-info.yaml
drwxr-xr-x     4 myuser  staff     128  9 Jan 19:19 dist-types
-rw-r--r--     1 myuser  staff     116  9 Jan 19:17 lerna.json
drwxr-xr-x  1698 myuser  staff   54336  9 Jan 19:19 node_modules
-rw-r--r--     1 myuser  staff    1339  9 Jan 19:17 package.json
drwxr-xr-x     4 myuser  staff     128  9 Jan 19:17 packages
-rw-r--r--     1 myuser  staff     272  9 Jan 19:17 tsconfig.json
-rw-r--r--     1 myuser  staff  829904  9 Jan 19:19 yarn.lock
</code></pre><p>The main bulk of the application is in the <code>packages</code> directory. This contains two subdirectories.</p><pre><code class="language-bash">» ls -al packages
total 0
drwxr-xr-x   4 myuser  staff  128  9 Jan 19:17 .
drwxr-xr-x  19 myuser  staff  608  9 Jan 22:23 ..
drwxr-xr-x  10 myuser  staff  320  9 Jan 19:40 app
drwxr-xr-x   9 myuser  staff  288  9 Jan 19:50 backend
</code></pre><p>The <code>app</code> subdirectory contains the frontend UI of Backstage and the <code>backend</code>, as you might expect, contains the API layer and parts that connect to the database.</p><h2>Step 2 - Building a Docker image</h2><p>Backstage comes with a built in command to help you build a Docker image which you can run with Docker Compose.</p><p>For simple deployments, the Backstage <code>backend</code> has the ability to serve the frontend <code>app</code> to the browser, so you only have to build one Docker image.</p><pre><code class="language-bash">» yarn workspace backend build-image
yarn workspace v1.22.10
yarn run v1.22.10
$ backstage-cli backend:build-image --build --tag backstage
# Lots of output omitted...
=> => naming to docker.io/library/backstage 0.0s
✨  Done in 114.02s.
</code></pre><p>Check the image has been built successfully.</p><pre><code class="language-bash">» docker images
REPOSITORY         TAG       IMAGE ID       CREATED         SIZE
backstage          latest    7b452013e713   3 minutes ago   1.1GB
</code></pre><p>Now that we have a Docker image, let's try to run it.</p><pre><code class="language-bash">» docker run backstage
2021-01-09T19:51:13.883Z backstage info Loaded config from app-config.yaml, app-config.production.yaml
2021-01-09T19:51:13.887Z backstage info Created UrlReader predicateMux{readers=azure{host=dev.azure.com,authed=false},bitbucket{host=bitbucket.org,authed=false},github{host=github.com,authed=false},gitlab{host=gitlab.com,authed=false},fallback=fetch{}}
Backend failed to start up, Error: connect ECONNREFUSED 127.0.0.1:5432
</code></pre><p>This fails because the Backstage backend cannot connect to port <code>5432</code>. Backstage needs to connect to the database in order to store catalog items and other data. It expects to find PostgreSQL running on port <code>5432</code>. When it can't, it fails and bails out.</p><p>To fix this, let's use Docker Compose to make PostgreSQL available to our Backstage backend.</p><h2>Step 2 - Adding PostgreSQL</h2><p>Below is a simple <code>docker-compose.yaml</code> file which runs the Backstage image we just created and a default PostgreSQL database. Create this file inside your Backstage application and save it.</p><pre><code class="language-yaml">version: '3'
services:
  backstage:
    image: backstage
    environment:
      # This value must match the name of the postgres configuration block.
      POSTGRES_HOST: db
      POSTGRES_USER: postgres
    ports:
      - '7000:7000'

  db:
    image: postgres
    restart: always
    environment:
      # NOT RECOMMENDED for a production environment. Trusts all incoming
      # connections.
      POSTGRES_HOST_AUTH_METHOD: trust
</code></pre><p>Once you've done that, you can use Docker Compose to start both of these Docker images.</p><pre><code class="language-bash">» docker-compose up
Creating network "blog-post-test_default" with the default driver
Creating blog-post-test_db_1        ... done
Creating blog-post-test_backstage_1 ... done
Attaching to blog-post-test_backstage_1, blog-post-test_db_1
# Lots of output omitted...
backstage_1  | Backend failed to start up, Error: Failed to initialize github scaffolding provider, Missing required config value at 'scaffolder.github.token'
blog-post-test_backstage_1 exited with code 1
</code></pre><p>It still fails, but we've made progress. Backstage has successfully connected to the database and then failed because of a missing GitHub token.</p><h2>Step 3 - Configuring GitHub</h2><p>Backstage needs a GitHub token in order to authenticate with the GitHub API for tasks like templating new applications and reading the <code>catalog-info.yaml</code> files it uses to store metadata.</p><p>Head over to the GitHub docs to learn how to <a href="https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token">create a Personal Access Token</a>. If you don't want to use GitHub, you can use a nonsense value like <code>abc</code> in place of the GitHub token value.</p><p>Once you have your token, pass it into Backstage via the environment variables.</p><pre><code class="language-yaml">version: '3'
services:
  backstage:
    image: backstage
    environment:
      POSTGRES_HOST: db
      POSTGRES_USER: postgres
      # Add your token here
      GITHUB_TOKEN: &#x3C;your-github-token>
    ports:
      - '7000:7000'

  db:
    image: postgres
    restart: always
    environment:
      POSTGRES_HOST_AUTH_METHOD: trust
</code></pre><p>Once that's done, let's give it one more go.</p><pre><code class="language-bash">» docker-compose up
Creating network "blog-post-test_default" with the default driver
Creating blog-post-test_db_1        ... done
Creating blog-post-test_backstage_1 ... done
Attaching to blog-post-test_backstage_1, blog-post-test_db_1
# Lots of output omitted...
backstage_1  | 2021-01-09T22:42:27.061Z backstage info Initializing http server
backstage_1  | 2021-01-09T22:42:27.065Z backstage info Listening on :7000
</code></pre><p>Hurray! 🎉 Now, if you visit <code>localhost:7000</code>, you should see Backstage.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/4YqK0Pp1qu4TeS9AnhJWyj/fe87757ae1738c2a9d3bf3290ca5612c/backstage-running.png" alt="Backstage running in the browser"></p><h2>Step 4 - Making a change</h2><p>Our Backstage instance isn't quite as perfect as it could be. You'll notice the header says "My Company Service Catalog". Let's change that to include the name of our company, Roadie.</p><p>This is a simple change to make. Fire up your text editor and open the <code>app-config.yaml</code> file.</p><p>In there, you'll see the following two lines:</p><pre><code class="language-yaml">organization:
  name: My Company
</code></pre><p>Simply change "My Company" to something like "Roadie", rebuild the Docker image, run <code>docker-compose up</code> and refresh your browser window to see the change.</p><h2>Conclusion</h2><p>In this tutorial you learned how to get Backstage running locally and change its configuration. As a next step, you may wish to try <a href="https://roadie.io/blog/backstage-docker-compose/">adding the Lighthouse plugin</a> to the deployment.</p>
]]></content:encoded></item><item><title><![CDATA[How to use the Backstage Lighthouse plugin]]></title><link>https://roadie.io/blog/backstage-lighthouse-plugin/</link><guid isPermaLink="false">https://roadie.io/blog/backstage-lighthouse-plugin/</guid><pubDate>Sat, 23 May 2020 21:00:00 GMT</pubDate><description><![CDATA[The first plugin shipped by the Backstage team is a Lighthouse plugin. It allows you to track your website speed over time in Backstage.]]></description><content:encoded><![CDATA[<p>Lighthouse is an open-source, automated tool for improving the quality of web pages. You give it the URL of a web page, it loads the page and runs tests to check the page's quality.</p><p>You can use it via <a href="https://developers.google.com/speed/pagespeed/insights/">the PageSpeed Insights website</a>. Simply enter a URL in the box, hit Analyze, and a few seconds later you will have a quality score for the website behind the URL, information about how long the page took to load and some suggestions about what to do better.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/7bvugyMf7ZlYkq2PLvWjo1/a2b5d8a158ee9b66a7b78f766d3fbe40/pagespeed-insights.png" alt="A site being tested in PageSpeed insights"></p><p>You can also use Lighthouse via the Chrome DevTools, the command line and as a NodeJS module.</p><pre><code class="language-shell">» npm install -g lighthouse
» lighthouse https://www.davidtuite.com/
</code></pre><h2>Lighthouse with Backstage</h2><p>In your company, there may be many teams making websites for different purposes. It is useful to track the quality of these websites over time to ensure that code changes are not hurting performance or accessibility.</p><p>If customers complain that your website is slow, it can be helpful to look back over Lighthouse results to figure out if and when the performance drop occurred. You can then look at commits around this time to pinpoint the cause of the slowness.</p><p>Backstage has a Lighthouse plugin available which makes it easy to run tests against the websites your company produces.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/10EYZC18ZQbiZ3yg4KFzIR/f5e9b5f664ecc33eb07f4104be49ffba/audit-view.png" alt="The audit view of a site in the Lighthouse plugin"></p><p>You can track the results of Lighthouse tests over time to see if your site is performing better or worse as you make changes.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/61wwIgoQJClqG7FTStDKYs/758f8b19a444f676c5c432ca42e1f797/audit-list.png" alt="Sites trending over time in the plugin"></p><h2>Running Lighthouse with Backstage</h2><p>To use Lighthouse with Backstage, you need three things:</p><ol><li>A Backstage instance you can run locally with <a href="https://github.com/backstage/community-plugins/tree/main/workspaces/lighthouse/plugins/lighthouse">the Lighthouse Plugin</a> installed.</li><li>A running <a href="https://github.com/spotify/lighthouse-audit-service">Lighthouse microservice</a> which actually executes the Lighthouse tests before sending the results back to the plugin.</li><li>A PostgreSQL database for the Lighthouse microservice to talk to.</li></ol><p>Let's set them up in reverse order.</p><h3>PostgreSQL</h3><p>Assuming you already have PostgreSQL installed and running, you can easily create a database for Lighthouse with the following command. 
The database this creates will have no password but it's fine for local development.</p><pre><code class="language-shell">» createdb -O [username] -U [username] -w lighthouse_audit_service
</code></pre><p>You can verify this database by logging into it with <code>psql</code>.</p><pre><code class="language-shell">» psql -h localhost -p 5432 -U [username] -d lighthouse_audit_service
psql (11.5)
Type "help" for help.

lighthouse_audit_service=#
</code></pre><p>Fantastic.</p><h3>Lighthouse microservice</h3><p>We're going to run the Lighthouse microservice with Docker. We'll have to pass a few environment variables to the <code>docker run</code> command. The easiest way to do this is by putting them in a file.</p><p>Create a file called <code>development.env</code> with the following variables.</p><pre><code class="language-bash">LAS_PORT=3003
LAS_CORS=true
PGUSER=[username]
PGDATABASE=lighthouse_audit_service
PGHOST=host.docker.internal
</code></pre><ul><li><code>LAS_PORT</code> tells the Lighthouse microservice which port to expose to incoming HTTP requests. It's important that this port matches the one defined in the Backstage Lighthouse plugin. Otherwise the plugin will never receive a response to its Lighthouse testing requests.</li><li><code>PGHOST</code> is important because our Lighthouse microservice is running inside Docker but our PostgreSQL database is exposing a port on localhost. We have to use a <a href="https://docs.docker.com/docker-for-mac/networking/#use-cases-and-workarounds">special Docker DNS name</a> to allow this.</li></ul><p>With that file defined, we should now be able to run the Lighthouse microservice like this:</p><pre><code class="language-shell">» docker run -p 3003:3003 -p 5432:5432 --env-file development.env spotify/lighthouse-audit-service:latest
yarn run v1.22.0
$ node ./cjs/run.js
info: building express app... {"service":"lighthouse-audit-service","timestamp":"2020-05-23T19:03:00.202Z"}
info: running db migrations... {"service":"lighthouse-audit-service","timestamp":"2020-05-23T19:03:00.290Z"}
info: listening on port 3003 {"service":"lighthouse-audit-service","timestamp":"2020-05-23T19:03:00.320Z"}
</code></pre><p>It might take a few seconds to start up when you run it for the first time because it will have to download the Docker image from the internet. When it starts, it automatically runs some database migrations to prepare your database.</p><h3>Backstage with the Lighthouse Plugin</h3><p>Luckily for us, Backstage comes with the Lighthouse Plugin installed and enabled so it's easy to try it out.</p><p>Follow the <a href="https://github.com/backstage/backstage/blob/master/docs/getting-started/index.md">Getting Started Guide</a> to get Backstage installed.</p><p>If you open <code>packages/app/src/plugins.ts</code> in your favorite code editor you should see that the Lighthouse plugin is already installed.</p><pre><code class="language-typescript">export { plugin as LighthousePlugin } from '@backstage/plugin-lighthouse';
</code></pre><p>In <code>packages/app/src/apis.ts</code> you should see that the Lighthouse plugin is configured to send requests to port <code>3003</code>.</p><pre><code class="language-typescript">builder.add(lighthouseApiRef, new LighthouseRestApi('http://localhost:3003'));
</code></pre><p>Now, run Backstage with <code>yarn start</code> and visit <code>http://localhost:3000/lighthouse</code> and you should see the Backstage Lighthouse interface.</p><p><img src="//images.ctfassets.net/hcqpbvoqhwhm/24mM1NjvqAOVvRTE9kJDdD/69ddd18ae60e33e0fd2827837ca4c28b/lighthouse-running-in-backstage.png" alt="A successful audit in Backstage"></p><p>Awesome! If you run an audit a few times on the same website you can see the trend over time. Perfect for ensuring that your websites are staying responsive and accessible.</p>
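<p>If the Lighthouse page loads but audits never show up, it can help to confirm that the audit service is answering on the port the plugin expects. The following is a minimal sketch rather than an official check. It assumes that <code>LAS_PORT</code> is <code>3003</code>, as in <code>development.env</code>, and that the service lists audits at <code>/v1/audits</code>. That path is an assumption about the service's REST API, so adjust it if your version differs.</p><pre><code class="language-shell"># Assumption: same port as LAS_PORT in development.env
LAS_PORT=3003
# Assumption: the audit service lists audits at /v1/audits
url="http://localhost:${LAS_PORT}/v1/audits"

# -f makes curl exit non-zero on HTTP errors, -sS hides progress but keeps error messages
if curl -fsS -o /dev/null "$url"; then
  echo "Lighthouse audit service is reachable at $url"
else
  echo "No response from $url - check LAS_PORT and the docker run port mapping"
fi
</code></pre><p>If the second message appears, the usual suspects are a mismatch between <code>LAS_PORT</code> and the URL passed to <code>LighthouseRestApi</code>, or a container that exited during its database migrations.</p>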
]]></content:encoded></item></channel></rss>