The Three Big Problems Every Platform Engineering Team Must Solve

I learned about platform engineering the hard way. At Workday, I was an infrastructure product manager building what was essentially a private AWS inside the company. We had virtualization platforms, logging systems, monitoring systems, the works.

Everything worked fine when we had 10 or 20 services. Then we hit 50. That's when people started asking questions we couldn't answer: What is all this stuff? Who owns which service? If something breaks at 2 AM, who do we call? How do we know everything is secure?

We tried a spreadsheet. It lasted about three months before it became hopelessly out of date. So we built a UI from scratch to list services, show where they were running, and let people create new services without going through us for everything.

That was 10 years ago. Today, what we built has a name, the internal developer portal, and there's an entire market built around it. But after talking to hundreds of platform teams, I've found they all struggle with the same three fundamental problems.

Problem 1: The Discoverability Crisis

The discoverability problem shows up when your organization crosses a threshold. At 10 engineers, everyone knows what everyone else is working on. At 100+ engineers, it's chaos.

Here's what this looks like: Your security team builds a virus scanning service. Your mobile developers need to scan files for viruses, but they don't know this service exists. In an enterprise, they can't just sign up for some random SaaS tool because of compliance requirements. So they open a Slack thread, ask around, wait for responses, schedule a meeting, and maybe get an answer in a week.

Meanwhile, your platform team is getting bombarded: "Do we have an API for X?" "Who owns service Y?" "Which services depend on this database?"

The discoverability problem has three common variations:

Documentation sprawl: Teams scatter docs across Confluence, GitHub wikis, Google Docs, and Notion. No one can find anything. You need to centralize documentation in a software catalog where people can actually discover it.

Dependency mapping: You need to understand which services depend on each other. When you're planning to upgrade a database, you need to know what breaks. Without a dependency graph, you're operating blind.

API discovery: Different teams build APIs, but there's no central place to see what exists, what they do, or how to use them. You need a searchable catalog of API specs.
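Here is a minimal Python sketch of the dependency-mapping question ("what breaks if we upgrade this database?"). It assumes your catalog can export dependency edges as (consumer, dependency) pairs. The service names are made up for illustration. The answer is a reverse walk of the graph that collects every service depending on the database, directly or transitively.

```python
from collections import defaultdict, deque

# Hypothetical edges exported from a software catalog: (consumer, dependency).
EDGES = [
    ("checkout-api", "orders-db"),
    ("billing-worker", "orders-db"),
    ("web-frontend", "checkout-api"),
    ("mobile-bff", "checkout-api"),
    ("search-api", "search-index"),
]


def affected_by(target: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return every service that depends on `target`, directly or transitively."""
    # Invert the edges so we can walk from a dependency back to its consumers.
    consumers = defaultdict(set)
    for consumer, dependency in edges:
        consumers[dependency].add(consumer)

    # Breadth-first walk; `seen` also stops us from looping forever on cycles.
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for consumer in consumers[node]:
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


if __name__ == "__main__":
    print(sorted(affected_by("orders-db", EDGES)))
```

The database itself has only two direct consumers, but the walk also surfaces the two frontends that reach it through `checkout-api`. Those indirect dependents are the ones a spreadsheet tends to miss.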

This problem gets worse fast as headcount grows. We don't work with companies that have fewer than 50 engineers because they don't face it yet. Most of our customers have 100+ engineers. At that scale, you can't just shout across the office anymore.

Problem 2: The Self-Service Bottleneck

Your platform team becomes a bottleneck. Developers want to get things done without opening a JIRA ticket and waiting days for someone to configure their network rules or provision an S3 bucket.

At most organizations, you're waiting days for basic requests. This creates two problems:

First, developers get frustrated and stop being productive. They're blocked on simple tasks that should take five minutes.

Second, they bypass your platform entirely. This is shadow IT. A mobile developer who doesn't know Terraform just goes into the AWS console and clicks "Create S3 Bucket" because they can't wait. Now you have untracked infrastructure that's not in your Terraform state, doesn't follow your naming conventions, and might not meet your security requirements.

The self-service problem shows up in two main ways:

Project creation: A developer wants to start a new service on your platform. They need a repo, CI/CD pipeline, monitoring, logging, and all your platform integrations. If they configure this manually, they'll get it wrong. You want to give them a template that sets everything up correctly in five minutes. This is often called a "golden path."

Infrastructure requests: A developer needs an S3 bucket, a database, or network rules configured. They don't know Terraform and shouldn't have to learn it. You want to let them fill out a form that opens a templated pull request against your Terraform repository. Someone reviews it, merges it, and the infrastructure gets created. The request is tracked, the developer doesn't need specialized knowledge, and you maintain control.
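Here is a minimal sketch of the form-to-pull-request flow for an S3 bucket. The naming convention, the allowed environments, and the repository layout (`teams/<team>/...`) are hypothetical. `aws_s3_bucket` with `bucket` and `tags` is the standard resource from the Terraform AWS provider. The sketch only validates the form input and renders the Terraform file. Committing it to a branch and opening the pull request depends on your Git host and is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical house convention: <team>-<purpose>-<env>, using only lowercase
# letters, digits, and hyphens (a stricter subset of S3's own naming rules).
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")
ALLOWED_ENVS = {"dev", "staging", "prod"}  # hypothetical environment list


@dataclass
class BucketRequest:
    """Fields a developer would fill in on a self-service form."""
    team: str
    purpose: str
    env: str


def render_bucket_request(req: BucketRequest) -> tuple[str, str]:
    """Validate the form input and return (file_path, terraform_source) for a PR."""
    team, purpose, env = (v.strip().lower() for v in (req.team, req.purpose, req.env))
    if env not in ALLOWED_ENVS:
        raise ValueError(f"env must be one of {sorted(ALLOWED_ENVS)}, got {env!r}")
    name = f"{team}-{purpose}-{env}"
    if not BUCKET_NAME.match(name):
        raise ValueError(f"bucket name {name!r} breaks the naming convention")

    # Terraform resource label: prefix and underscores keep it a valid identifier.
    label = "bucket_" + name.replace("-", "_")
    hcl = (
        f'resource "aws_s3_bucket" "{label}" {{\n'
        f'  bucket = "{name}"\n'
        f"\n"
        f"  tags = {{\n"
        f'    Team        = "{team}"\n'
        f'    Environment = "{env}"\n'
        f'    ManagedBy   = "self-service-portal"\n'
        f"  }}\n"
        f"}}\n"
    )
    path = f"teams/{team}/s3_{purpose}_{env}.tf"  # hypothetical repo layout
    return path, hcl


if __name__ == "__main__":
    path, source = render_bucket_request(BucketRequest("payments", "receipts", "prod"))
    print(path)
    print(source)
```

Validating before rendering is where the guardrails live: a request that breaks the convention never reaches a reviewer, and a request that passes arrives as a small, predictable diff.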

Self-service doesn't mean "let developers do whatever they want." It means giving them fast, easy ways to do things the right way. You're creating guardrails, not removing them.

Problem 3: The Governance Gap

You have hundreds of services running on your platform. You need to know they're secure, reliable, and following your best practices. But you have no single place to check.

Here's a concrete example: You use Incident.io for incident management. Every service should be registered in Incident.io with an on-call person assigned. How do you verify this? You could manually check each service, but that's impossible at scale. You could send a Slack message asking people to audit their services, but half won't respond.
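Once this data lives in the catalog, the audit becomes a loop instead of a Slack survey. The sketch below assumes each catalog entry has already been enriched with two hypothetical fields synced from your incident tool: a registration ID and the current on-call person. It does not call Incident.io's API. It only shows the shape of the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntry:
    """A catalog entity plus metadata synced from the incident tool (hypothetical fields)."""
    name: str
    owner: str
    incident_service_id: Optional[str] = None  # set when registered in the incident tool
    on_call: Optional[str] = None              # current on-call person, if any


def check_incident_registration(entry: CatalogEntry) -> list[str]:
    """Return the reasons this entry fails the check; an empty list means it passes."""
    problems = []
    if not entry.incident_service_id:
        problems.append("not registered in the incident management tool")
    elif not entry.on_call:
        problems.append("registered, but no on-call person assigned")
    return problems


def failing_services(entries: list[CatalogEntry]) -> dict[str, list[str]]:
    """Run the check across the whole catalog instead of asking teams to self-audit."""
    results = {}
    for entry in entries:
        problems = check_incident_registration(entry)
        if problems:
            results[entry.name] = problems
    return results


if __name__ == "__main__":
    catalog = [
        CatalogEntry("checkout-api", "team-payments", "svc_123", "alice"),
        CatalogEntry("search-api", "team-search", "svc_456"),
        CatalogEntry("legacy-cron", "team-platform"),
    ]
    for name, problems in failing_services(catalog).items():
        print(name, problems)
```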

The governance problem shows up in several ways:

Security compliance: You need to verify that all services are scanning dependencies for vulnerabilities, using approved authentication methods, and following your security policies. Without automated checks, you're relying on self-reporting.

Reliability standards: You need to know which services have proper monitoring, alerting, and on-call rotations. When an outage happens, you need to immediately know who to call.

Best practices enforcement: Your platform team has defined standards such as code review requirements, test coverage thresholds, and documentation expectations. You need to see which teams are falling behind so you can work with them to improve.

The governance problem is about visibility rolled up across your org chart. You want to ask: "Show me the director of engineering who has the most services failing our security checks." Then you can work with that director to improve things.
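The org-chart rollup is a join from service to owning team to director, followed by a count. Here is a minimal sketch that assumes hypothetical ownership and reporting data, plus a set of services already known to fail at least one security check.

```python
from collections import Counter

# Hypothetical inputs: which team owns each service, which director each team
# reports to, and which services currently fail at least one security check.
SERVICE_OWNER = {
    "checkout-api": "payments",
    "billing-worker": "payments",
    "search-api": "discovery",
    "recs-service": "discovery",
    "auth-gateway": "identity",
}
TEAM_DIRECTOR = {"payments": "dana", "discovery": "lee", "identity": "dana"}
FAILING_SECURITY = {"checkout-api", "auth-gateway", "search-api"}


def failures_by_director(
    failing: set[str],
    service_owner: dict[str, str],
    team_director: dict[str, str],
) -> Counter:
    """Count failing services per director by walking service -> team -> director."""
    counts = Counter()
    for service in failing:
        # Unknown owners and teams are counted too, rather than silently dropped.
        team = service_owner.get(service, "unowned")
        counts[team_director.get(team, "unassigned")] += 1
    return counts


if __name__ == "__main__":
    counts = failures_by_director(FAILING_SECURITY, SERVICE_OWNER, TEAM_DIRECTOR)
    print(counts.most_common(1))  # [('dana', 2)]
```

Services with no known owner roll up to "unassigned" so gaps in ownership show up in the report instead of disappearing from it.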

Some teams also want DORA metrics (deployment frequency, lead time for changes, change failure rate, and mean time to recovery) visible in one place. This is harder than it sounds. Take deployment frequency: the more often you deploy, the smaller each change is, and the less likely it is to cause issues. But what counts as a "deployment" when different teams use Argo CD, Netlify, and five other deployment tools? You have to normalize deployment events from all of those tools before a DORA metric means anything. Many teams aren't there yet, and they expect a bit of magic that doesn't exist.
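To show what that normalization step involves, here is a minimal sketch that maps deployment events from two tools into one shape and then computes deployment frequency. The event shapes are hypothetical and heavily simplified. The real Argo CD and Netlify notification payloads differ and would each need their own mapping. The rule "only successful deployments count" is also an assumption, and it is exactly the kind of judgment call your organization has to make.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Deployment:
    """The single shape every tool's events are mapped into before computing metrics."""
    service: str
    finished_at: datetime
    succeeded: bool


# Hypothetical, simplified event shapes; real Argo CD and Netlify payloads differ.
def from_argo(event: dict) -> Deployment:
    return Deployment(
        service=event["app"],
        finished_at=datetime.fromisoformat(event["finishedAt"]),
        succeeded=event["phase"] == "Succeeded",
    )


def from_netlify(event: dict) -> Deployment:
    return Deployment(
        service=event["site_name"],
        finished_at=datetime.fromisoformat(event["published_at"]),
        succeeded=event["state"] == "ready",
    )


# One normalizer per source tool; an unknown source raises KeyError.
NORMALIZERS = {"argo": from_argo, "netlify": from_netlify}


def deploys_per_week(events, service: str, now: datetime, weeks: int = 4) -> float:
    """Deployment frequency: successful deploys of `service` per week over the window."""
    since = now - timedelta(weeks=weeks)
    deploys = [NORMALIZERS[source](payload) for source, payload in events]
    count = sum(
        1
        for d in deploys
        if d.service == service and d.succeeded and d.finished_at >= since
    )
    return count / weeks


if __name__ == "__main__":
    now = datetime(2024, 5, 29, tzinfo=timezone.utc)
    events = [
        ("argo", {"app": "checkout-api", "finishedAt": "2024-05-20T10:00:00+00:00", "phase": "Succeeded"}),
        ("netlify", {"site_name": "checkout-api", "published_at": "2024-05-10T09:00:00+00:00", "state": "ready"}),
    ]
    print(deploys_per_week(events, "checkout-api", now))  # 0.5
```

The metric itself is one line. Nearly all of the work sits in the per-tool normalizers and in agreeing on what a "deployment" is.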

The same applies to defining what counts as a "service." That's a hard question to answer, and it's one of the most important challenges when trying to get any developer portal working in your organization.

Why This is Hard to Solve

These three problems (discoverability, self-service, and governance) all stem from the same root cause: you have a lot of software, and you need organized metadata about it.

At Workday, we spent a lot of money building a custom solution. When I left in 2020, I talked to other companies and found they'd all built similar things.

The problem is that building this from scratch takes a year and requires a dedicated team to maintain. You're essentially building a product inside your company.

In 2020, Spotify open sourced Backstage, which gave the world a framework for building developer portals. But Backstage isn't a ready-to-use portal; it's a set of TypeScript libraries you use to build your own portal. This creates new problems:

Language barrier: Platform teams typically work in Go, Python, or YAML. Many of them don't know TypeScript, which is primarily a web development language.

Build time: Because you're building from libraries, not deploying a container, it takes six months to a year to get Backstage into production.

Team requirements: We surveyed the Backstage community and found that teams who report being happy with self-hosted Backstage have at least three dedicated engineers. Large deployments have 12+ engineers working on Backstage full-time.

Missing features: Backstage doesn't include basic features like role-based access control out of the box. The search runs on PostgreSQL full-text search, which is okay but not as good as Elasticsearch. Your search won't be great unless you manage an Elasticsearch cluster as well as your Backstage instance.

In practice, getting Backstage into production takes about a year and a team of five people. That's why the internal developer portal market exists.

How Companies Choose Solutions

When people come to us, they're typically in one of three situations:

They already have Backstage: Someone stood it up at some point and people are using it. They're realizing it's a lot of effort and they don't want to staff that team of five people.

They want Backstage specifically: They want a developer portal, like the idea of Backstage, and don't want to be locked into a proprietary data model. They want to customize their solution because they have legacy tools they need to integrate. But they don't want to staff the team around it.

They just want a developer portal: They don't care if it's Backstage or not. In this case, it's more competitive between proprietary solutions and Backstage-based options.

For the first two groups, they're doing a build versus buy evaluation: how much will it cost us to build and maintain this versus how much will it cost to buy a managed solution?

For the third group, it's more of a feature comparison and proof-of-concept scoring across vendors.

What You Actually Need

Solving these three problems requires a few key capabilities:

A software catalog: Your source of truth for what software exists, who owns it, and how it's configured. The catalog needs to integrate with your existing tools (your repos, your CI/CD, your cloud providers) so it stays up to date automatically.

Self-service actions: Your developers need a UI for common tasks that generates the right pull requests, kicks off the right workflows, and follows your standards. This keeps them moving fast without bypassing your platform.

Automated scoring: You need automated checks that run against everything in your catalog and tell you what's not meeting your standards. This gives you the visibility to work with teams on improvements.

Easy onboarding: Getting services into the catalog can't require each team to manually register everything. You need automated ingestion that pulls metadata from your existing systems. This is table stakes, but it's an area where Backstage is weak compared to proprietary competitors.
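Here is a minimal sketch of automated ingestion: it maps repository metadata onto a Backstage-style `Component` entity so no one has to hand-write it. The `RepoInfo` shape, the topic-based type rule, and the default values are assumptions for illustration. The `apiVersion`, `kind`, `metadata`, `spec.type`, `spec.lifecycle`, and `spec.owner` fields and the `github.com/project-slug` annotation follow Backstage's entity descriptor format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoInfo:
    """Metadata a crawler could pull from a Git host (hypothetical shape)."""
    org: str
    name: str
    description: str
    codeowner: Optional[str]  # e.g. the first team listed in CODEOWNERS
    topics: list[str]


def to_catalog_entity(repo: RepoInfo) -> dict:
    """Map repo metadata onto a Backstage-style Component entity."""
    # Collapse anything that isn't a lowercase letter or digit into single hyphens,
    # roughly matching Backstage's entity naming rules (max 63 characters).
    safe_name = re.sub(r"[^a-z0-9]+", "-", repo.name.lower())[:63].strip("-")
    # Hypothetical rule: repos tagged "library" become libraries, everything else a service.
    component_type = "library" if "library" in repo.topics else "service"
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {
            "name": safe_name,
            "description": repo.description,
            "annotations": {"github.com/project-slug": f"{repo.org}/{repo.name}"},
        },
        "spec": {
            "type": component_type,
            "lifecycle": "production",  # placeholder default; derive from real data
            "owner": repo.codeowner or "unassigned",  # flags repos with no owner
        },
    }


if __name__ == "__main__":
    repo = RepoInfo("acme", "Virus_Scanner", "Scans uploaded files", "group:security", ["api"])
    print(to_catalog_entity(repo))
```

Run across every repository in your organization, a mapper like this gives you a first-pass catalog on day one, which teams then correct rather than write from scratch.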

The challenge is that every organization has a slightly different definition of what counts as a "service" or a "deployment" or a "team." You need a solution that's flexible enough to adapt to your organization while being opinionated enough to actually work.

The Path Forward

These three problems (discoverability, self-service, and governance) get worse as your engineering organization grows. If you're at 100 engineers now, imagine what happens at 200 or 500. The chaos compounds.

You're not the first platform team to face these problems. Every company with more than 50 engineers hits them eventually. Software catalogs, self-service automation, and governance scoring aren't experimental anymore.

The decision you need to make is whether to build or buy. Building gives you complete control but requires significant investment. Buying gets you there faster but means accepting someone else's opinions about how things should work.

Whatever path you choose, the problems won't solve themselves. Your platform team is already overwhelmed with questions, your developers are already frustrated with bottlenecks, and your managers can't answer basic questions about what's running in production.

The earlier you address these problems, the easier they are to solve.

Next Steps

If you're ready to address these platform engineering challenges, here are some practical next steps to consider:

Evaluate Backstage for your organization: Learn more about what Backstage is and how it works to understand if it's the right foundation for your developer portal needs.

See a developer portal in action: Request a demo of Roadie to see how a managed Backstage solution can solve your discoverability, self-service, and governance problems without the overhead of building and maintaining it yourself.

Calculate your total cost of ownership: Use our guide on how much Backstage really costs to compare the build versus buy decision for your specific situation.

Learn from teams who've solved these problems: Read our case studies to see how companies like Contentful, Celonis, and others tackled similar platform engineering challenges.

Start with a free trial: If you're ready to experiment, try Roadie free to get hands-on experience with a fully managed developer portal built on Backstage.