# Catalog data sources

> Configure catalog data sources to fetch integration data into Roadie’s catalog datastore on a schedule.

*Published: 2026-04-09*


## Overview

**Data sources** are how you pull structured data from the outside world into Roadie’s **catalog datastore**. 

Each data source is part of an [integration](/docs/catalog/building-your-catalog/integrations/). 

Data source runs can be scheduled so that the data stored in Roadie stays fresh for the [workflows](/docs/catalog/building-your-catalog/workflows/) that read it and turn it into catalog entities.

Open **Data sources** in the catalog administration experience to:

- Create and edit sources that map integration responses into datastore objects
- See object counts, last run times, and status at a glance
- Drill into a source to test extraction and adjust configuration

Data sources are usually the **upstream** step in a generation pipeline: sync objects into the datastore with a data source, then consume them with **Data source** nodes in Entity Workflows.

![datasource-list](./datasource-list.webp)

## Creating a Data Source

<div role="alert">
  <div class="bg-teal-500 text-white font-bold px-4 py-2">Note</div>
  <div class="bg-teal-100 border border-t-0 border-teal-400 text-teal-700 px-4 py-3">
    <p>All Data Sources require a configured Integration. Unconfigured Integrations appear greyed out in the Integrations list, indicating that config and/or secrets need to be added for those services.</p>
  </div>
</div>

1. Click `+ New` on the Data Source page.
2. Select the `Integration` you will pull data from for this Data Source.
3. Select the API path to call to retrieve data. Each Integration exposes several endpoints, which are ingested from OpenAPI specs to populate this dropdown list.
4. Most API endpoints require parameters. Add them where appropriate.
5. Once you've made your edits, click `Save` and `Dry Run` to see the results of your Data Source. If there are issues, you'll see information about the errors. If the Data Source executed successfully, you'll see a count of the objects returned at each step.
6. If everything looks correct, click `Run saved version`.
7. On the Data Source screen, mark the Data Source as `Active` using the toggle.

## Filtering

If an API call returns more data than you require, you can optionally filter the response before it is saved to the Datastore.
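
As a rough illustration of what filtering achieves, the Python sketch below drops unwanted items from a response before anything is stored. It is not Roadie's filter syntax: the `filter_items` helper, the sample repository objects, and the `archived` field are all hypothetical and only show the effect of a filter.

```python
from typing import Any, Callable

Item = dict[str, Any]


def filter_items(items: list[Item], keep: Callable[[Item], bool]) -> list[Item]:
    """Return only the items for which `keep` is true (hypothetical helper)."""
    return [item for item in items if keep(item)]


# Hypothetical API response: a list of repositories, one of them archived.
response_items = [
    {"id": "1", "name": "payments", "archived": False},
    {"id": "2", "name": "legacy-billing", "archived": True},
    {"id": "3", "name": "checkout", "archived": False},
]

# Keep only repositories that are not archived; only these would be stored.
to_store = filter_items(response_items, lambda repo: not repo["archived"])
```

Filtering early keeps the Datastore smaller and means downstream workflows never see the objects you excluded.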

## Chained Sources

If the data you want requires multiple calls, you can use a Chained Source. Chained Sources can either Enrich (add additional data to each item returned from the previous Source) or Flatten (replace items from the previous Source with the results of the latest call).
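
The difference between the two modes can be sketched in Python. This is a conceptual model only, not Roadie's implementation: `enrich`, `flatten`, `fetch_members`, and the sample team and member data are hypothetical names chosen for the example.

```python
from typing import Any, Callable

Item = dict[str, Any]

# Hypothetical result of the first Source: a list of teams.
teams = [{"id": "t1", "name": "platform"}, {"id": "t2", "name": "payments"}]

# Hypothetical data returned by the chained call, keyed by team id.
MEMBERS = {
    "t1": [{"id": "u1", "login": "ana"}],
    "t2": [{"id": "u2", "login": "bo"}, {"id": "u3", "login": "cy"}],
}


def fetch_members(team: Item) -> list[Item]:
    """Stand-in for the chained API call made once per item."""
    return MEMBERS[team["id"]]


def enrich(items: list[Item], key: str, fetch: Callable[[Item], Any]) -> list[Item]:
    """Enrich: keep each previous item and attach the chained result to it."""
    return [{**item, key: fetch(item)} for item in items]


def flatten(items: list[Item], fetch: Callable[[Item], list[Item]]) -> list[Item]:
    """Flatten: replace the previous items with the chained results."""
    results: list[Item] = []
    for item in items:
        results.extend(fetch(item))
    return results


enriched = enrich(teams, "members", fetch_members)  # still two team objects
flattened = flatten(teams, fetch_members)           # three member objects
```

In this sketch, Enrich leaves you with teams that carry their members, while Flatten leaves you with members only.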

## Scheduling

Data Sources run on a schedule. For each Data Source, you can modify the schedule to control when and how frequently it runs.

## Storing in the Datastore

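Objects are indexed on their way into the Datastore. By default, the index key is the `id` of each object. To use a different identifier, set the object id expression described in [Advanced options and additional headers](#advanced-options-and-additional-headers).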
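
A minimal Python sketch of keyed indexing follows, assuming the Datastore behaves like a mapping from identifier to object. The `index_objects` helper and the sample objects are hypothetical, and how Roadie handles missing or duplicate identifiers is not covered here.

```python
from typing import Any

Item = dict[str, Any]


def index_objects(objects: list[Item], id_key: str = "id") -> dict[str, Item]:
    """Index objects by an identifier field, defaulting to `id` (hypothetical helper)."""
    indexed: dict[str, Item] = {}
    for obj in objects:
        if id_key not in obj:
            # Assumption for this sketch: an object without an identifier is an error.
            raise KeyError(f"object has no '{id_key}' field: {obj!r}")
        indexed[str(obj[id_key])] = obj
    return indexed


# Hypothetical objects returned by a Data Source run.
services = [
    {"id": 101, "name": "checkout"},
    {"id": 102, "name": "payments"},
]

by_id = index_objects(services)
```

A stable identifier matters because it lets a later run update the same stored object instead of creating a new one.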

## Advanced options and additional headers

Each Data Source supports adding extra information to requests and tuning how responses are turned into datastore objects:

| Option                                   | Detail                                                                                                                                      |
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| Additional Headers                       | Extra HTTP headers sent with each Data Source request to the integration (for example custom auth or tracing headers).                       |
| Response Parsing - array expression      | Expression that selects the array within the response whose elements should be stored as objects in the datastore.                           |
| Response Parsing - object id expression  | Expression that selects the stable identifier for each object when indexing into the datastore (see [Storing in the Datastore](#storing-in-the-datastore)). |
| Pagination settings                      | Configuration for APIs that split results across pages, so the Data Source can retrieve the full dataset.                                    |
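
To show how the two response parsing expressions and pagination fit together, here is a Python sketch. It assumes simple dotted paths such as `data.items` and a `next` cursor in each page; Roadie's actual expression language and pagination settings may differ, and `select`, `collect`, and the sample pages are hypothetical.

```python
from typing import Any, Callable, Optional

Page = dict[str, Any]


def select(path: str, data: Any) -> Any:
    """Resolve a dotted path like 'data.items' against nested dicts (assumed syntax)."""
    for part in path.split("."):
        data = data[part]
    return data


def collect(
    fetch_page: Callable[[Optional[str]], Page],
    array_path: str,
    id_path: str,
) -> dict[str, Any]:
    """Walk every page, pick the array, and index each element by its id."""
    objects: dict[str, Any] = {}
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        for element in select(array_path, page):      # array expression
            objects[str(select(id_path, element))] = element  # object id expression
        cursor = page.get("next")                     # assumed cursor field
        if cursor is None:
            return objects


# Hypothetical paginated API: two pages linked by a `next` cursor.
PAGES = {
    None: {"data": {"items": [{"meta": {"uid": "a"}, "name": "one"}]}, "next": "p2"},
    "p2": {"data": {"items": [{"meta": {"uid": "b"}, "name": "two"}]}, "next": None},
}

stored = collect(PAGES.get, array_path="data.items", id_path="meta.uid")
```

Without pagination, only the first page would be stored, so the object count after a dry run is a quick way to check that the full dataset was retrieved.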

## Further reading

- [Building your Catalog — overview](/docs/catalog/building-your-catalog/)
- [Workflows to create entities](/docs/catalog/building-your-catalog/workflows/)
- [Object graph](/docs/catalog/building-your-catalog/object-graph/)
- [HTTP integration](/docs/integrations/http/)
