Entering the Danger Zone: Multiple Software Catalogs

A version of this article appeared on The New Stack: https://thenewstack.io/the-hidden-costs-of-multiple-service-catalogs-in-development/

Developer tooling often requires a service catalog to scope the data it creates, especially when that tooling touches every piece of software your organization builds. As a result, you often end up with several different service catalogs, one for each tool you’ve bought.

Commonly, you’ll see a service catalog at the heart of classes of tools like:

Data and observability, like DataDog and New Relic
Incident management, like Incident.io, PagerDuty, and FireHydrant
Developer Experience, like DX

Manually creating a catalog in these applications takes valuable time away from Platform and Software teams.

Catalogs also aren’t the core of any of these products, so the quality of catalog ingestion, visualization, and manipulation is rarely optimal. These tools need some form of catalog describing the software your organization builds in order to work effectively, but that doesn’t mean their vendors have focused on building a fantastic ingestion mechanism for creating it.

You can easily sleepwalk into having multiple service catalogs in multiple places, each with a different scope. This is inefficient, and the catalogs quickly fall out of sync.

It’s a pain.

Why you should use an IDP to build one catalog

To avoid multiple catalogs, you need a single source of truth. One catalog to rule them all. There’s a strong case for an Internal Developer Portal (IDP) to be that catalog. IDPs like Backstage, Port, and Cortex are all, at their core, software catalogs. They have other important features (scorecards, an automation runner, etc.), but their bread and butter is making a service catalog easy to create, configure, and use.

Information from various systems is surfaced to development teams to create a single pane of glass. In building an IDP, organizations inherently create an integration and enrichment point for data about their software, which in turn can be used as part of a wider and more complex data flow.

Think in terms of data flows:

Metadata about software goes in, either auto-ingested or manually added.
Rich objects are created for each piece of software.
Structural information is defined as part of the catalog’s data model, allowing the catalog to build a graph of your software: each piece, and how it relates to the others.
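To make that last step concrete, here’s a minimal sketch of turning catalog entities into a graph. It assumes entities shaped like Backstage Catalog API responses, where each entity carries a relations list of type/targetRef pairs and refs take the form kind:namespace/name. The entity names are hypothetical. Once the graph exists, a question like “what depends on this database?” becomes a simple lookup, which is exactly the kind of data other tools want from your catalog.

```python
def entity_ref(entity: dict) -> str:
    """Return a Backstage-style entity ref: kind:namespace/name (kind lowercased)."""
    kind = entity["kind"].lower()
    namespace = entity["metadata"].get("namespace", "default")
    return f"{kind}:{namespace}/{entity['metadata']['name']}"


def build_graph(entities: list) -> dict:
    """Map each entity ref to its outgoing (relation type, target ref) edges."""
    graph = {}
    for entity in entities:
        # setdefault keeps entities with no relations in the graph too.
        edges = graph.setdefault(entity_ref(entity), [])
        for relation in entity.get("relations", []):
            edges.append((relation["type"], relation["targetRef"]))
    return graph


def who_depends_on(graph: dict, target_ref: str) -> list:
    """List every entity that has a dependsOn edge pointing at target_ref."""
    return sorted(
        source for source, edges in graph.items() if ("dependsOn", target_ref) in edges
    )


# Hypothetical sample entities, trimmed to the fields used above.
entities = [
    {
        "kind": "Component",
        "metadata": {"name": "checkout"},
        "relations": [
            {"type": "dependsOn", "targetRef": "resource:default/payments-db"},
            {"type": "ownedBy", "targetRef": "group:default/team-payments"},
        ],
    },
    {
        "kind": "Component",
        "metadata": {"name": "refunds"},
        "relations": [{"type": "dependsOn", "targetRef": "resource:default/payments-db"}],
    },
    {"kind": "Resource", "metadata": {"name": "payments-db"}},
]

graph = build_graph(entities)
print(who_depends_on(graph, "resource:default/payments-db"))
# ['component:default/checkout', 'component:default/refunds']
```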

That catalog is then a rich store of information about the software you have built. It’s just a single step away from being the source of truth for that information to other services that require it.

Enter: Roadie

Backstage comes with a few built-in advantages as an IDP that help it excel at this use case. As the dominant IDP on the market, it garners a lot of support from the third-party service providers you need to connect to, as well as from providers of catalog information (like AWS, who are particularly active plugin developers):

Plugin ecosystem. Third parties are constantly building new options for supporting this use case. These plugins support either the visualization of information in the catalog or, often more crucially, the ingestion or extraction of catalog data from Backstage.

Auto-ingestion. Backstage can automatically ingest catalog data from a growing range of sources. For example, AWS recently released a plugin that supports auto-ingestion of resources like S3 buckets and RDS instances, which makes completing your software catalog much easier than it would be in another service.

Ease of editing. Backstage comes with a slew of simple enrichment options, leaning heavily on democratically edited YAML files that live alongside your code.

Extraction of data. The Backstage Catalog API and plugin ecosystem make it easy to get data out of Backstage when you’re ready to connect to a third-party system.

How to use the Roadie Catalog as a source of truth

Let’s look at how this works in practice, with examples from data and observability, incident management, and developer experience:

DataDog
Incident.io
DX

DataDog

Using catalog-info.yaml files

The core of the Backstage software catalog is a set of YAML files stored alongside code in your source code management (SCM) tool of choice (Backstage supports all the major ones). These are usually referred to as catalog-info.yaml files. They’re essentially service metadata plus reference keys to other services. DataDog maintains its own ingestion mechanism that uses these files to ingest Catalog information: the integration continuously scans repositories in your SCM for Backstage YAML files named service.datadog.yaml and catalog-info.yaml, which you create when you add your service to the Backstage Software Catalog.

You’ll need to enable DataDog’s GitHub integration for this to work.
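For reference, a minimal catalog-info.yaml for a single service might look like the output of the sketch below. To stay dependency-free, it renders a Python dict as block-style YAML with a tiny hand-rolled emitter that only handles nested mappings of plain strings (no lists, no quoting). The service name, description, owner, and GitHub slug are hypothetical; apiVersion, kind, metadata, and spec are the standard Backstage entity fields, and github.com/project-slug is Backstage’s standard GitHub annotation.

```python
def to_yaml(data: dict, indent: int = 0) -> str:
    """Render nested dicts of plain strings as block-style YAML.

    Deliberately minimal: no lists, no quoting or escaping.
    """
    pad = "  " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# A minimal Backstage Component entity; names and slug are hypothetical.
entity = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {
        "name": "payments-service",
        "description": "Handles card payments",
        "annotations": {"github.com/project-slug": "my-org/payments-service"},
    },
    "spec": {"type": "service", "lifecycle": "production", "owner": "team-payments"},
}

print(to_yaml(entity))
# Prints:
# apiVersion: backstage.io/v1alpha1
# kind: Component
# metadata:
#   name: payments-service
#   description: Handles card payments
#   annotations:
#     github.com/project-slug: my-org/payments-service
# spec:
#   type: service
#   lifecycle: production
#   owner: team-payments
```

In practice you’d write this file by hand or generate it with a proper YAML library; the emitter is only here to keep the example self-contained.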

Using DataDog’s API

You can also POST Backstage YAML files to the Datadog API. This allows you to programmatically send Backstage service definitions that may not exist in your GitHub repositories. The Backstage Catalog API can respond with your whole Catalog (or just a subset of it), so syncing the two is possible using this route.
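As a rough sketch of that route, the function below prepares (but doesn’t send) a POST of one Backstage entity, such as one item from a Catalog API response, to DataDog. It assumes DataDog’s service definition endpoint at /api/v2/services/definitions, authenticated with DD-API-KEY and DD-APPLICATION-KEY headers, and that it accepts the entity as JSON; check DataDog’s current API reference before relying on any of those details.

```python
import json
import urllib.request


def build_datadog_request(entity: dict, api_key: str, app_key: str,
                          site: str = "datadoghq.com") -> urllib.request.Request:
    """Prepare a POST of one Backstage entity to DataDog (assumed endpoint)."""
    # e.g. site="datadoghq.eu" targets the EU region.
    url = f"https://api.{site}/api/v2/services/definitions"
    body = json.dumps(entity).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
    )
```

To actually sync, you’d pass each request to urllib.request.urlopen, once per entity returned by your Catalog API query.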

https://www.datadoghq.com/blog/service-catalog-backstage-yaml/

Incident.io

Incident.io supports several ways to connect their internal software catalog to sources of truth.

Using catalog-info.yaml files

Incident.io works in a similar way via their catalog-importer, which is a little more involved and worth a closer look. The importer can pull data from a variety of sources, “catering for all the ways people normally store their catalog data” as they so delightfully put it. One option is GitHub, which works in much the same way as the DataDog ingestion mechanism outlined above.

Using the Catalog API

Another option is to read Catalog information directly from Backstage itself, via the Backstage Catalog API. In essence, this makes a GET /entities call to your Catalog and retrieves the entities directly. You can filter the results as you see fit, so you only extract the subset of information that’s relevant to Incident.io.
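Incident.io’s importer makes this call for you, but it helps to see what a filtered request looks like. This sketch builds a GET /entities URL for the Backstage Catalog API: conditions inside one filter parameter are ANDed, separate filter parameters are ORed, and fields trims each returned entity to the attributes you ask for. The base URL is hypothetical, and a real request would usually also need an Authorization: Bearer token.

```python
from typing import Optional
from urllib.parse import urlencode


def entities_url(base_url: str, filters: list, fields: Optional[list] = None) -> str:
    """Build a Backstage Catalog API GET /entities URL.

    Each dict in `filters` becomes one `filter` parameter whose conditions
    are ANDed; separate `filter` parameters are ORed by the Catalog API.
    """
    params = [
        ("filter", ",".join(f"{key}={value}" for key, value in f.items()))
        for f in filters
    ]
    if fields:
        # Ask the Catalog to return only these attributes of each entity.
        params.append(("fields", ",".join(fields)))
    return f"{base_url.rstrip('/')}/api/catalog/entities?{urlencode(params)}"


url = entities_url(
    "https://backstage.example.com",  # hypothetical instance
    filters=[{"kind": "component", "spec.type": "service"}],
    fields=["metadata.name", "spec.owner"],
)
print(url)
# https://backstage.example.com/api/catalog/entities?filter=kind%3Dcomponent%2Cspec.type%3Dservice&fields=metadata.name%2Cspec.owner
```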

DX

DX takes a different approach. They’ve built a full Backstage backend plugin to handle the extraction of data from Backstage.

Using a Backstage backend plugin

The DX Backstage backend plugin sets up jobs within Backstage to sync the DX catalog. Those jobs call the DX API to send Catalog information. As this can be a lot of data (at Roadie we routinely see Catalogs with 300k entities), you’ll probably want to use the optional filtering parameters, which you can set in your app-config.

app-config.yaml

dx:
  catalogSyncAllowedKinds: [API, Component, User, Group]

You may also want to control the schedule of the sync, so as not to spam your Catalog. Again, it’s just a bit of config in app-config.

app-config.yaml

dx:
  schedule:
    frequency:
      minutes: 45
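To see why the kind allow-list matters at that scale, here’s a hypothetical sketch of what a sync job like this does with it. It’s Python for illustration only and not the DX plugin’s code: the batch size and the send callback are assumptions, standing in for however the plugin actually calls the DX API.

```python
from typing import Callable, Iterable, Iterator


def filter_kinds(entities: Iterable[dict], allowed_kinds: set) -> Iterator[dict]:
    """Keep only entities whose kind is in the allow-list (case-insensitive)."""
    allowed = {kind.lower() for kind in allowed_kinds}
    return (e for e in entities if e["kind"].lower() in allowed)


def batched(items: Iterable[dict], size: int) -> Iterator[list]:
    """Group items into lists of at most `size` elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_sync(entities: Iterable[dict], allowed_kinds: set,
             send: Callable[[list], None], batch_size: int = 500) -> int:
    """Send allowed entities in batches via `send`; return how many were sent."""
    sent = 0
    for batch in batched(filter_kinds(entities, allowed_kinds), batch_size):
        send(batch)  # hypothetical: one API call per batch
        sent += len(batch)
    return sent
```

Filtering before batching means excluded kinds never leave Backstage at all, so they cost neither API calls nor payload size.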

Using the Roadie API

At Roadie we run a managed SaaS version of Backstage for many different customers, and we’re often asked how to make it as easy as possible to use Backstage as a source of truth for other systems. So we spend a lot of time making it simple to take catalog data out of Backstage and use it meaningfully in other applications and workflows.

To help, we expose several endpoints to allow easy syncing with different systems (either to ingest new information or pull catalog information out):

Catalog endpoints: an /entities endpoint lets you query the Catalog API and programmatically access your software catalog in its entirety, and a /fragment endpoint lets you sync third-party systems with the Catalog (e.g. ingest Slack handles for your users) and update your data as you see fit.
A set of endpoints for scorecards and software standards
A set of endpoints to expose Scaffolder template information
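Pulling a large Catalog out through an API means paging through it. As a sketch, the generator below follows Backstage-style cursor pagination, as used by the Catalog API’s /entities/by-query endpoint, which returns items plus a pageInfo.nextCursor. Whether the Roadie endpoint you’re calling paginates the same way is an assumption to check against Roadie’s API docs; fetch_page is a hypothetical callback that performs the authenticated GET with the given query parameters and returns the decoded JSON.

```python
from typing import Callable, Iterator, Optional


def iter_entities(fetch_page: Callable[[dict], dict], limit: int = 500) -> Iterator[dict]:
    """Yield every entity by following Backstage-style nextCursor pagination."""
    params = {"limit": limit}
    while True:
        page = fetch_page(params)
        yield from page.get("items", [])
        cursor: Optional[str] = page.get("pageInfo", {}).get("nextCursor")
        if not cursor:
            return
        # The cursor encodes the original query, so only limit and cursor are sent.
        params = {"limit": limit, "cursor": cursor}
```

Because it’s a generator, you can stream entities into another system page by page instead of holding a 300k-entity Catalog in memory.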
