Roadie’s Blog

The Ultimate Guide to Backstage Software Catalog Completeness

By David Tuite, September 22nd, 2024

Internal developer portals (IDPs) like Backstage and Roadie are, at their core, software catalogs. The software catalog is the backbone upon which much of the other functionality hangs, and building a complete catalog is vital to the success of the IDP project.

Without a complete catalog, an IDP cannot fulfill its primary purpose: improving developer productivity. It cannot drive improved discoverability if the software that teams are trying to discover is not in there. It cannot help to measure the standardization or security posture of software it does not know about.


This article is a comprehensive guide to catalog completeness in Backstage. Roadie is built on top of Backstage, and so all of the same lessons apply. We cover why it’s important, how to measure it, and how to achieve it.

First, let’s learn why you want to build a complete catalog in the first place.

Why building a complete catalog is important

The primary reason companies deploy an IDP is because they want to make developers more efficient. A big part of this is helping teams answer basic questions about the software around them. They want to make it easy to answer questions like “do we have a geocoding service?”, “which team owns the checkout service?” and “where is the API spec for the users service?”. These questions can only be answered by the IDP if the geocoding service, checkout service, and users service are registered there in the first place.

A complete catalog is essential when using an IDP to measure the maturity of the software that teams are producing. Applying scorecards to software in the catalog of the IDP can be a great way to determine the security posture of each production service. But scorecards can only apply to software which is actually in the catalog.

The software catalog is also the fundamental unit of navigation in the IDP. The lack of a software catalog is the reason that wikis like Confluence or Notion cannot solve the discoverability problem that IDPs solve. In a wiki, content is organized into pages, which are intentionally unstructured and flexible. In an IDP, content is organized by Component (think “piece of software”) and is structured so that the same information is available for each Component.

How to measure catalog completeness

Having a complete catalog doesn’t necessarily mean that every piece of software is in the catalog. Most organizations have their share of abandoned code. It may have been created during hackathons or for brief experiments that never saw production usage. Having all of this stuff in the catalog can create clutter and noise.

There may also be parts of the organization for whom it doesn’t make sense to be in the catalog. We work with companies which have large hardware divisions who write code for embedded devices. They don’t necessarily follow the DevOps lifecycle and thus are sometimes intentionally omitted.

Production software is the most important stuff to have in the catalog. It’s the software that most engineers work with most of the time. Production software will have the most frequently referenced APIs and docs. It is also the software whose maturity matters most, since software that runs in production environments has the most attack vectors.

For this reason, we frequently see customers create a list of software which is deployed to production by referencing ArgoCD or some other deployment tool. They then compare this against software in the catalog, frequently by accessing the Roadie APIs or by using our CSV export functions.
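The comparison described above boils down to a set difference. Here is a minimal sketch, assuming you have already exported two plain lists of names, one from your deployment tool and one from the catalog. The function name and sample data are hypothetical, and the normalization step is only a guess at how names might differ between the two systems.

```python
def missing_from_catalog(deployed, cataloged):
    """Return deployed service names that have no matching catalog entry."""
    def normalize(name):
        # Assumption: names differ only by case and surrounding whitespace.
        return name.strip().lower()

    known = {normalize(name) for name in cataloged}
    return sorted({normalize(name) for name in deployed} - known)


# Hypothetical exports from a deployment tool and from the catalog.
deployed_services = ["checkout", "Geocoding", "users"]
catalog_components = ["checkout", "users", "billing"]

print(missing_from_catalog(deployed_services, catalog_components))
```

In practice the hard part is usually agreeing on a shared identifier between the two systems, so expect to adjust the normalization to your own naming conventions.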

The next best bet is to compare the number of pieces of software (aka. Components) in the catalog against the number of active, unarchived repositories in your source code solution. This will never give you a perfect answer, because “pieces of software” don’t necessarily map one-to-one to repositories (monorepos, for example), but it’s a good start.

How Roadie Helps

At Roadie we give every customer a chart which compares the number of Components in the catalog against the number of their active repositories, expressed as a percentage. A repository counts as active if it is not archived and has received a commit in the past 12 months.

Catalog completeness (%) = (active repositories registered in the catalog ÷ total active repositories) × 100
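To make the measurement concrete, here is a rough sketch of how the same number could be computed by hand. It is not Roadie's implementation. The record fields (`name`, `archived`, `last_commit`) and the sample data are hypothetical, and 365 days stands in for "12 months".

```python
from datetime import datetime, timedelta


def is_active(repo, now):
    """A repo is active if it is not archived and has a commit in the past 12 months."""
    # 365 days is used here as an approximation of "12 months".
    return (not repo["archived"]) and (now - repo["last_commit"] <= timedelta(days=365))


def catalog_completeness(repos, cataloged_repo_names, now):
    """Percentage of active repositories that are registered in the catalog."""
    active = [r["name"] for r in repos if is_active(r, now)]
    if not active:
        return 0.0
    registered = [name for name in active if name in cataloged_repo_names]
    return 100.0 * len(registered) / len(active)


# Hypothetical data for illustration only.
now = datetime(2024, 9, 22)
repos = [
    {"name": "checkout", "archived": False, "last_commit": datetime(2024, 9, 1)},
    {"name": "users", "archived": False, "last_commit": datetime(2024, 6, 15)},
    {"name": "hackathon-2019", "archived": True, "last_commit": datetime(2019, 5, 1)},
    {"name": "old-experiment", "archived": False, "last_commit": datetime(2022, 1, 10)},
]
print(catalog_completeness(repos, {"checkout"}, now))
```

Only two of the four sample repositories count as active, so registering one of them gives 50% completeness.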

Roadie customers can achieve high catalog completeness

Achieving a high level of catalog completeness is definitely possible.

The majority of established Roadie customers are happy with their level of catalog completeness and we have many customers who have a catalog completeness level (measured as a percentage of active repositories which are registered in the catalog) which is above 80%.

Here’s a chart showing catalog completeness for Uplight who onboarded in September 2023. Four months later they were at 88% catalog completeness, with more than 600 components in their catalog.

[Chart: Uplight catalog completeness]

Here’s another Roadie customer who increased their catalog completeness from 40% to 90% over a period of four months. They now have more than 750 components registered.

[Chart: customer catalog completeness]

There are countless more examples like this amongst Roadie customers. The goal of this post is to help all companies achieve the same results.

Strategies for building a complete catalog

There are multiple strategies that can be used to build a complete catalog. The strategy that is right for your company can depend on factors like the size of the company and the enthusiasm of the developers to have an IDP.

You may also want to run multiple strategies in parallel, or start with one in order to get the bulk of software into the catalog, and switch to another to get closer to full catalog completeness.

Think about expanding catalog completeness like the layers of an onion. Start with the most enthusiastic early adopters and get them onboard. Then use them as an example as you expand out to the rest of the organization.

We recommend you import users (aka. employees) and teams (called Groups in Backstage nomenclature) into the catalog before creating any software components. You will ideally want to assign ownership of each component to a team as you import it. This task is much easier if all of your teams already exist in the catalog.

If you don’t know which strategy to choose…

Experiment! You don’t need to roll out to the whole organization in one go. Pick some friendly teams, run one strategy on each team, and analyze the results. Did you end up with software in the catalog? What is the feedback from each team? Where did they get stuck? Take this feedback and use it to improve the process before expanding out to more teams.

Strategy 1: Centralized automated

This strategy involves connecting to a chokepoint in the organization in order to push software metadata into the catalog programmatically. It does not require anyone to write catalog-info.yaml files.

Frequently, the initial chokepoint will not have all of the information required to build a useful catalog. When this happens, the software metadata must be enriched after the fact.

Depending on your tech stack and permission settings, the following options may be viable:

  1. An ArgoCD instance which does deployments to the production environment. It has knowledge of the most critical software in the org (software that goes to production) and can thus be a good starting point for the catalog.
  2. A Helm chart which is used by a large percentage of the deployable software in the organization.
  3. A centrally owned CI job or build tool which can be written by the centralized team and applied to software which is owned by other teams in the organization. For example, Lunar Bank have talked about how they populate their catalog from their build tool called shuttle.
  4. A legacy software catalog, developer portal or spreadsheet.

Pros

  1. The software catalog can quite quickly be bootstrapped to a highly complete state.

Cons

  1. The ownership link between software and teams is not immediately established.
  2. Companies with no source of truth or a heavily fractured deployment ecosystem may not be able to implement this strategy.
  3. The product teams will be less well educated on the value of the IDP when the process is finished.

Tactics to prioritize in order to succeed with this strategy

  1. Use custom entity providers
  2. Make catalog presence a requirement for deployment

Strategy 2: Centralized manual

This strategy tries to avoid asking the individual product teams to do work. Instead, the centralized team takes it upon themselves to populate the catalog. They may do this in collaboration with the product teams, but they likely won’t ask them to write any YAML.

Pros

  1. This strategy is likely to be faster than the distributed manual strategy, especially for companies which don’t have thousands of microservices.

Cons

  1. The product teams will be less well educated on the value of the IDP when the process is finished.
  2. It may be difficult for the centralized team to gather enough data about each individual software component.
  3. It becomes the centralized team’s job to manually keep the catalog up to date.

Tactics to prioritize in order to succeed with this strategy

  1. Store software metadata in a single repository
  2. Open scripted pull requests
  3. Customize the catalog nomenclature

Strategy 3: Distributed manual

This strategy involves asking all of the individual product teams to register the software that they own in the catalog.

Typically, the central team who own the IDP will meet with and educate the product teams on the value of the IDP, either on an individual basis, or in larger groups. The central team will provide tools and materials to the product teams in order to teach them what they need to do to get their software into the catalog (typically create a catalog-info.yaml file), and give them clear steps to take in order to achieve catalog completeness.

Pros

  1. Product teams are more likely to feel a sense of ownership over their software in the catalog because they put it there in the first place.
  2. Product teams have an opportunity to learn about features of the IDP as they are registering their software. They may then choose to use features in the IDP such as technical documentation.

Cons

  1. This strategy requires a lot of work from the central team in order to yield high catalog completeness. It will take time. They will need to educate and continually follow up with product teams throughout the company.

Tactics to prioritize in order to succeed with this strategy

  1. Give teams a scaffolder template to register software
  2. Share catalog completeness numbers publicly
  3. Tie catalog completeness to a wider initiative
  4. Write custom plugins to create value

Tactics for building a complete catalog

The tactics you need to use will depend on the strategy you are implementing. This is a full list of all the tactics we know. Please refer to the strategies above in order to know which tactics to choose.

Some tactics will apply regardless of the strategy that is chosen. They are:

  1. Customize the catalog nomenclature
  2. Move onboarding docs into Roadie
  3. Present on the value of Roadie to people managers
  4. Tie catalog completeness to a wider initiative

Most of the tactics fall into one of the following categories:

  1. Reduce friction for developers who want to get their software into the catalog.
  2. Create incentive for developers to put their software into the catalog.
  3. Educate developers on how to use the catalog.

Give teams a scaffolder template to register their software

Category: Reduce friction

This involves writing a scaffolder template that teams can use inside Backstage in order to create a catalog-info.yaml file in their repositories.

The scaffolder template will ask the user to fill out some information about their software, typically by picking from pre-defined values, and will open a pull request against a repository when finished. Once the user reviews and merges the pull request, auto-discovery will pick up the catalog-info.yaml file and populate the catalog.

[Screenshot: scaffolder template for completing the catalog]

Benefits

  1. Individual developers don’t need to understand the (long) Backstage YAML API spec in detail.
  2. The owners of the IDP can use the template to constrain the “type”, “lifecycle” and other properties that are assigned to each software component. This puts guardrails in place that will help create more consistency in the catalog. Catalog consistency is important, and will help you avoid problems in future.
  3. The form can integrate with external APIs to pull in sensible options. For example, in the screenshot above, the “Component Owner” is a selection of all the engineering teams in the company. The user doesn’t need to type an exact string.
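The guardrail idea can be illustrated outside of the scaffolder too. The sketch below is not a scaffolder template (those are written in YAML). It is a small Python illustration of rejecting unexpected “type” and “lifecycle” values before a catalog-info.yaml file is produced. The allowed values are examples only, and the function name and sample arguments are hypothetical.

```python
import json

# Assumption: each organization picks its own allowed values; these are examples.
ALLOWED_TYPES = {"service", "website", "library"}
ALLOWED_LIFECYCLES = {"experimental", "production", "deprecated"}


def render_catalog_info(name, owner, component_type, lifecycle):
    """Return catalog-info.yaml text for one Component, rejecting unknown values."""
    if component_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {component_type}")
    if lifecycle not in ALLOWED_LIFECYCLES:
        raise ValueError(f"unsupported lifecycle: {lifecycle}")
    # json.dumps yields double-quoted strings, which YAML also accepts.
    return "\n".join([
        "apiVersion: backstage.io/v1alpha1",
        "kind: Component",
        "metadata:",
        f"  name: {json.dumps(name)}",
        "spec:",
        f"  type: {json.dumps(component_type)}",
        f"  lifecycle: {json.dumps(lifecycle)}",
        f"  owner: {json.dumps(owner)}",
    ]) + "\n"


print(render_catalog_info("checkout", "team-payments", "service", "production"))
```

Constraining values at the point of entry is what keeps the catalog filterable later: a catalog with “service”, “Service” and “svc” as three separate types is much harder to report on.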

How Roadie helps

Roadie provides a starting point for a software registration template in the getting-started repo. Customers can fork it into their own GitHub org, edit it to meet their needs and import it into their own Backstage instance.

Archive unused repositories

Category: Reduce friction

If our measure of catalog completeness is:

Catalog completeness (%) = (active repositories registered in the catalog ÷ total active repositories) × 100

then we can increase catalog completeness by archiving old repositories that nobody is using.

It may sound somewhat silly, but every organization has abandoned hackathon projects and old test repos that are unused and simply causing clutter. By archiving them, we clean up our source code management tool while improving catalog completeness.

Believe it or not, Roadie has one customer who increased catalog completeness from 45% to 75% just by archiving repositories.
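To see how archiving alone can move the number this much, here is a worked example. The repository counts are invented for illustration and are not the customer's real figures. They are simply one combination that produces the same percentages.

```python
def completeness(registered, active):
    """Catalog completeness as a percentage of active repositories."""
    return 100.0 * registered / active


# Hypothetical organization: 1,000 active repos, 450 of them in the catalog.
before = completeness(450, 1000)

# Archive 400 unused repos that were never registered. The numerator is unchanged.
after = completeness(450, 1000 - 400)

print(before, after)
```

Nothing was added to the catalog here. The denominator shrank because the archived repositories no longer count as active.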

How Roadie helps

Our catalog implementation ensures that Components are removed from the catalog when the repo they reference is archived.

Customize the catalog nomenclature

Category: Reduce friction

By mapping the Kinds of entity that show up in the catalog to familiar concepts in your company, you can create instant recognition for developers who land in the catalog.

For example, if your company has a concept of “Valuestream” then make this front and center in the catalog so that users instantly understand what they’re looking at.

Benefits

  1. Users can orient themselves quickly and get value rapidly.

How Roadie helps

Roadie customers can use our admin interfaces to customize and rename the core catalog concepts, and create completely new ones.

[Screenshot: customizing the catalog terminology]

Write custom plugins to create value

Category: Create incentive

Custom plugins provide value for product teams by giving them faster ways to perform bespoke workflows inside the catalog. By creating custom plugins, a central team can incentivize developers to add their software into the catalog.

For example, Lunar Bank have custom plugins for dealing with dead letters in RabbitMQ. These plugins are regularly useful for developers. This causes them to visit the catalog to use the plugin.

Benefits

  1. Custom plugins are quite simple to produce and can quickly create value for software engineers.
  2. Custom plugins unlock tailored value for engineers to help them do things more quickly or more easily than they otherwise could. They sometimes even allow them to perform a task that they cannot do at all outside of Roadie or Backstage.

How Roadie Helps

Roadie provides an interface for registering and managing custom plugins. It also facilitates live reloading of custom plugins, the ability to run multiple versions of a plugin side by side, and a scaffolder template to bootstrap a custom plugin monorepo. Custom plugins on Roadie can securely connect back to a private network to load data from internal APIs. Check out our docs to learn more.

Move onboarding docs into Roadie

Category: Educate

Engineer onboarding docs are typically used to help a new engineer to set up their environment and get to productivity quickly. Expedia Group, Spotify and other Backstage adopters have successfully used TechDocs and scaffolder templates to speed up engineer onboarding and help new engineers become familiar with the IDP on day one. The exact same tactic can be deployed on Roadie.

Expedia Group put 850+ engineers through their Backstage-based bootcamp in 2022. They discuss it in their case study on the Backstage website.

Benefits

  1. Newly onboarded engineers start using Roadie on day one. They get used to it and understand how to come back.
  2. Engineers learn how to use a scaffolder template to create a new service during onboarding. This service is added to the catalog, and they learn how the catalog works.

How Roadie Helps

Roadie supports standalone TechDocs that are not tied to a particular software component in the catalog. These are perfect for onboarding docs.

Present on the value of Roadie to people managers

Category: Educate

Roadie provides specific value to managers, directors and VPs that is different than the value that developers might care about. By educating managers on the value they will receive, you can encourage them to work with their teams to get their software into the catalog.

For example, did you know that frequent Backstage users at Spotify are 5% more likely to be at the company one year later? Retention is important for managers, so they need to know about this.

How Roadie Helps

  1. Roadie provides Scorecards which can help managers ensure that their teams are producing mature and high quality software. This feature is not available for open-source Backstage.
  2. Roadie gives customers value calculators to help them estimate the dollar value they can expect to receive.

Open scripted pull requests

Category: Reduce friction

Opening an automated pull request containing a catalog-info.yaml file is a good way to ease the burden on developers who want to get their software into the catalog. All they need to do is edit the pull request a little bit, review it and merge it.

Keep in mind that your script will need permissions to open a pull request against a majority of repositories in your source code management tool. Depending on your security model, this may not be possible.

This method can work especially well in companies that operate out of large monorepos. A monorepo setup allows the generation of a single pull request that can add many catalog-info.yaml files in one go. It can also be reviewed and merged by a single person with elevated permissions.

While this is a tempting option to quickly build a catalog with YAML files, we have seen customers experience issues with catalog correctness when they use this method. Some teams may blindly merge the pull request without validating the information that it contains. This tactic should be executed alongside an education program to teach teams what to do. Go slowly and experiment.

How Roadie Helps

Roadie provides a token authenticated API which the centralized team can use to tell which repositories are already in the catalog. A simple script can consume this to open a PR against the repos which are not already accounted for.

Our solutions engineering team can work with you to write a simple script that will open a pull request containing a YAML file into each repository.
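The first half of such a script, working out which repositories still need a pull request, might look like the sketch below. It assumes the catalog entities have already been fetched as JSON and that they carry Backstage's `github.com/project-slug` annotation. The fetch itself, the pull request step, the function name and the sample data are all left out or hypothetical.

```python
SLUG_ANNOTATION = "github.com/project-slug"


def repos_without_catalog_entry(all_repos, entities):
    """Return repo slugs (org/name) that no catalog entity points at."""
    covered = set()
    for entity in entities:
        metadata = entity.get("metadata") or {}
        annotations = metadata.get("annotations") or {}
        slug = annotations.get(SLUG_ANNOTATION)
        if slug:
            covered.add(slug.lower())
    return [repo for repo in all_repos if repo.lower() not in covered]


# Hypothetical inputs: repos from source control, entities from the catalog API.
all_repos = ["acme/checkout", "acme/users", "acme/geocoding"]
entities = [
    {"metadata": {"name": "checkout",
                  "annotations": {SLUG_ANNOTATION: "acme/checkout"}}},
    {"metadata": {"name": "users",
                  "annotations": {SLUG_ANNOTATION: "acme/users"}}},
]

print(repos_without_catalog_entry(all_repos, entities))
```

Each slug returned is a candidate for a scripted pull request. Entities registered without the annotation will be reported as missing, so treat the output as a list to review rather than a final answer.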

Make catalog presence a requirement for deployment

Category: Create incentive

By making catalog presence mandatory for deployment, platform teams can be confident that they have all of the important software in the catalog.

In the early days of Backstage at Spotify, teams could not SSH into their machines unless their services were in the catalog. The catalog owner was referenced to determine who was and was not allowed to access the machine.

This tactic works best when the IDP is orchestrating a new greenfield platform that other teams are migrating onto. Adding the catalog-info.yaml file can be one simple step in what is likely a series of steps that teams have to do to migrate. Outside of this situation, it can be politically problematic to block deployments due to a missing YAML file.
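As a rough illustration, a deployment gate can be as small as a check that runs early in the pipeline. The sketch below only verifies that a catalog-info.yaml file exists at the root of the checked-out repository. It does not confirm that the entity has actually been ingested by the catalog. The function names are hypothetical, and the temporary directory stands in for a real checkout.

```python
import sys
import tempfile
from pathlib import Path


def has_catalog_file(repo_root):
    """True if the repository root contains a catalog-info.yaml file."""
    return (Path(repo_root) / "catalog-info.yaml").is_file()


def gate(repo_root):
    """Return a process exit code: 0 allows the deployment, 1 blocks it."""
    if has_catalog_file(repo_root):
        return 0
    print(f"Blocked: {repo_root} has no catalog-info.yaml", file=sys.stderr)
    return 1


# Demonstration against a throwaway directory standing in for a checkout.
with tempfile.TemporaryDirectory() as repo:
    print(gate(repo))  # no catalog file yet
    (Path(repo) / "catalog-info.yaml").write_text("kind: Component\n")
    print(gate(repo))  # catalog file now present
```

In a real pipeline the return value would be passed to `sys.exit` so that a missing file fails the job.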

Automate catalog collection with custom entity providers

Category: Reduce friction

If developers find writing YAML files tedious, potentially the best thing to do is to make them optional. Backstage’s custom entity providers allow adopters to programmatically shovel software entries into the catalog. Custom entity providers are a great way to connect Backstage to an existing source of truth for catalog data, such as a legacy IDP, an ArgoCD instance, a Kubernetes cluster, or a CI/CD tool.

How Roadie helps

Roadie gives customers the roadie-agent. This wraps the custom entity provider concept with a secure connection to Roadie so that customers can easily dump software metadata into the agent and have it appear in the catalog.

Roadie has an Entity Provider API. Simply push an array of software metadata to this endpoint and it will appear in the catalog. To update the metadata, simply push again.

Frequently the programmatic source of truth will have some but not all of the metadata that the catalog needs. For example, it may be missing the name of the team who owns the software. Roadie allows users to decorate software in the catalog with extra metadata within the UI. This ensures that the catalog can become complete and rich over time.
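The enrichment step can also be done in code before the metadata is pushed. The sketch below does not use Roadie's UI decoration or its Entity Provider API. It is a local illustration of the merge, assuming entity records shaped like Backstage entities (`metadata.name`, `spec.owner`) and a separately maintained, hypothetical mapping from component name to owning team.

```python
import copy


def enrich_with_owners(entities, owners_by_name, default_owner="unknown"):
    """Return copies of entities with spec.owner filled in where it is missing."""
    enriched = []
    for entity in entities:
        entity = copy.deepcopy(entity)  # leave the source records untouched
        spec = entity.setdefault("spec", {})
        if not spec.get("owner"):
            name = entity["metadata"]["name"]
            spec["owner"] = owners_by_name.get(name, default_owner)
        enriched.append(entity)
    return enriched


# Hypothetical records from a deployment tool that knows nothing about teams.
from_source = [
    {"kind": "Component", "metadata": {"name": "checkout"}, "spec": {"type": "service"}},
    {"kind": "Component", "metadata": {"name": "users"},
     "spec": {"type": "service", "owner": "team-identity"}},
]
owners = {"checkout": "team-payments"}

for entity in enrich_with_owners(from_source, owners):
    print(entity["metadata"]["name"], entity["spec"]["owner"])
```

Existing owners are never overwritten, and anything without a known owner is given an explicit placeholder so that it is easy to find and fix later.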

Share catalog completeness numbers publicly

Category: Create incentive

One great way to get people bought into the idea of building a complete catalog is to make it a group effort. By transparently reporting on the completeness and “health” of the catalog, a shared sense of ownership over the goal can be created.

Twilio explained how they do this in their catalog at the Autodesk Developer Productivity Summit.

How Roadie Helps

Roadie gives all Tech Insights customers an out-of-the-box measurement of catalog completeness.

[Screenshot: Tech Insights catalog scorecard]

We also report on important aspects of catalog richness, like the percentage of pieces of software in the catalog which have an owner assigned. Customers can use these building blocks to measure other attributes of the health of their catalog.
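A richness measure such as ownership coverage is simple to compute yourself if you have the entities to hand. This is a sketch under the assumption that each entity is a dictionary shaped like a Backstage entity, with the owner stored at `spec.owner`. It is not how Tech Insights calculates its figures, and the sample data is hypothetical.

```python
def ownership_coverage(entities):
    """Percentage of entities whose spec.owner is set to a non-empty value."""
    if not entities:
        return 0.0
    owned = sum(1 for e in entities if (e.get("spec") or {}).get("owner"))
    return 100.0 * owned / len(entities)


# Hypothetical catalog contents.
catalog = [
    {"metadata": {"name": "checkout"}, "spec": {"owner": "team-payments"}},
    {"metadata": {"name": "users"}, "spec": {"owner": "team-identity"}},
    {"metadata": {"name": "geocoding"}, "spec": {"owner": "team-maps"}},
    {"metadata": {"name": "legacy-batch"}, "spec": {}},
]

print(ownership_coverage(catalog))
```

The same pattern works for any other attribute you care about, such as the presence of a description or a link to documentation.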

Tie catalog completeness to a wider initiative

Category: Create incentive

Companies usually want a complete catalog so that they can accomplish some wider engineering goal. By promoting catalog completeness in service of this wider goal, teams can better understand why it is important to be in the catalog, and why they should help.

For example, we recently worked with a customer who needed a complete and correct list of all software that had access to their users’ personally identifiable information (PII). By attaching to this company-wide goal and leveraging the influence of the company CTO, the company was able to rapidly label hundreds of software components in Roadie with their PII status.

How Roadie Helps

We run regular customer success meetings with customers to help them identify wider engineering initiatives where Roadie can help accomplish the goal more quickly or more easily. We will then work with teams in order to project manage the delivery of a solution.

Do a whiteboarding session

I did some education with a team at [company] where we brainstormed the services they wanted to cover and their relationships. This was a whiteboard exercise. I then created the PRs and had them review.

When Paddle started with Backstage, they realized that they didn’t have an existing source of truth they could lean on to populate their catalog, and they would have to collate it manually.

They started with a whiteboarding exercise so they could iterate quickly. Meeting with each team in small groups, Ioannis Georgoulas (Director of SRE), led the process. He first spent time brainstorming the services that the groups wanted to catalog, and defining their boundaries and the relationships between them. This information was all collected in a simple document to start.

Once he had a solid understanding of the service map, Ioannis opened pull requests against each repository with the catalog-info.yaml file that was needed. All the teams had to do was review and merge it. Because they had participated in the process to gather this information, they were already bought in and could understand the value of it.

How Roadie Helps

We provide frequent customer success calls with every customer through the initial stages of implementation. We’ll partner with your implementors to run these whiteboarding sessions and gather the information you need to be successful.

Key takeaways/Conclusion

Building a high level of catalog completeness in Backstage need not be intimidating. By carefully choosing the strategy that will work at your company, expanding outwards from the most eager adopters first, and communicating widely as you go, you can reach a high level of catalog completeness in a short amount of time.

Two topics we have not discussed yet are catalog richness and correctness. These areas go hand in hand with completeness, and work together to ensure that your IDP has the answers that developers need, when they need them.

We’ll be covering richness and correctness in another article. If you want to be among the first to read it, make sure to subscribe to the newsletter below.

Become a Backstage expert

To get the latest news, deep dives into Backstage features, and a roundup of recent open-source action, sign up for Roadie's Backstage Weekly. See recent editions.
