The Ultimate Guide to Internal Developer Portals in the Age of Platform Engineering
Founder & CEO
September 15, 2023
Platform engineering and Site Reliability Engineering have become cornerstones of modern software development, championed by companies with distinguished engineering cultures. The internal developer portal is a key tool that’s central to implementing these strategies.
In this article, we'll delve into these trends, discuss the problems they solve, and evaluate the significance of these portals. We'll see how they shape developer experience, improve engineering efficiency and productivity, and ultimately enhance the quality and user experience of the products we’re building.
The rising trends of Platform Engineering & Site Reliability Engineering
Another year, another wave in how we build software. This time, it's the recent breakthroughs in AI that are revolutionizing most industries, and the software industry is no exception: many developers have completely changed the way they write code with ChatGPT and GitHub Copilot.
Yet even with these advancements, managing engineering operations remains a daunting challenge. Application development is advancing rapidly, but the growing complexity of modern systems brings intricate new problems, and even seasoned DevOps experts find the engineering operations of these systems increasingly hard to manage.
Luckily, leading tech companies have begun harnessing two distinct methodologies to address these challenges: Platform Engineering and Site Reliability Engineering (SRE).
Platform Engineering, now a major trend, calls for a dedicated team within an organization that is responsible for managing the shared services other development teams rely on. According to Gartner, this approach is about implementing "reusable tools and self-service capabilities with automated infrastructure operations", with the aim of improving developer experience and productivity while standardizing operational practices across teams.
Gartner also observes: "Platform engineering has climbed rapidly to the Hype Cycle's peak. Emerging innovations, like internal developer portals, are making notable entries."
A core driver behind this shift is the realization that traditional developer-DevOps interactions based on “ticket ops” aren't sustainable. Or, as stated by Paul Delory, Research VP at Gartner, “Platform engineering emerged in response to the increasing complexity of modern software architectures. Today, non-expert end users are often asked to operate an assembly of complicated arcane services. To help end users, and reduce friction for the valuable work they do, forward-thinking companies have begun to build operating platforms that sit between the end user and the backing services on which they rely”.
Platform Engineering, while significant, isn't the first methodology to tackle the challenges of developing and operating software. Its predecessors, such as Agile, DevOps, and Digital Transformation, have also made their mark. Yet integrating Site Reliability Engineering (SRE) principles directly into the platforms that Platform Engineering teams build is especially effective, because it instills foundational SRE practices from the very beginning.
To contextualize, George Spafford, VP Analyst at Gartner, describes SRE as: "Site reliability engineering (SRE) is a collection of systems and software engineering principles used to design and operate scalable resilient systems. Site reliability engineers work with the customer or product owner to understand operational requirements and define service-level objectives (SLOs). Site reliability engineers work with product or platform teams to design and continuously improve systems that meet defined SLOs."
Internal Developer Portals: The cornerstone of modern platform engineering with integrated SRE principles
An internal developer platform is the primary product of a platform team.
As defined by internaldeveloperplatform.org, “an Internal Developer Platform is built by a platform team to build golden paths and enable developer self-service. It consists of many different techs and tools, glued together in a way that lowers cognitive load on developers without abstracting away context and underlying technologies.”
The true measure of a platform's value lies in its adoption rate among engineering teams and its tangible impact on their workflow and the products they are building.
Platform engineering recognizes a vital reality: transitioning from monolithic or on-premises systems can be overwhelming, especially since not every developer is equipped to navigate the "you build it, you run it" philosophy. This is where internal developer platforms come in. They give developers intuitive, self-service access to DevOps resources and internal developer tools through a product-like experience. These platforms also supply the guidelines and guardrails developers need, so that every developer has the autonomy to ship new services, deploy changes, and troubleshoot production incidents while following the best practices that keep the organization on the golden path.
As described by Manjunath Bhat, Research VP at Gartner, “Internal developer portals serve as the interface through which developers can discover and access internal developer platform capabilities.”
This is why the internal developer portal is the backbone of platform engineering.
What is an internal developer portal and what problems does it solve?
Historically, the term "developer portal" might evoke thoughts of external platforms, oriented towards third-party developers consuming an organization's APIs. However, there's a recent evolution that's taking center stage: the Internal Developer Portal (IDP).
In today's tech landscape, as distributed architectures gain traction, we face a rising challenge: information fragmentation. Essential data about applications, services, and infrastructure resources is scattered across various tools, cloud platforms, and teams. This fragmentation significantly affects the organization across dimensions such as development speed, system reliability, product quality, customer experience, security, cloud costs, and overall developer experience.
So, what's the real-world fallout of this fragmentation for the different engineering roles?
Product Engineers & Developers
Velocity & developer experience – when vital data is scattered, work slows down and developer experience suffers. This leads to increased cognitive load and more manual, repetitive tasks, such as during releases or incident responses.
Onboarding delays – fragmentation of documentation, architecture, ownership, and other essential details hinders the swift integration of new engineers.
Limited autonomy – without proper guardrails for standardizing practices and quality across teams, engineering leaders face challenges in granting engineers independence without compromising speed and excellence.
Reliability & Customer Impact – the absence of automations and guardrails, like release quality checks and service scaffolding with embedded best practices, means changes introduced can hurt the product due to errors or gaps in expertise.
Incident management – in the midst of incidents, crucial information like troubleshooting checklists, service owners, and service details can be difficult to locate, adding stress to what is already one of the most stressful moments in an engineer's working life.
Focus issues – with KPIs and goals dispersed across numerous tools, maintaining clarity becomes a challenge.
Ops Professionals
Cognitive Load – fragmented information across different cloud accounts and developer tools with clunky UIs makes navigation tedious and cumbersome. It’s also unclear which environments or applications rely on certain resources.
Limited reach – if they can't automate tasks across all engineering teams (e.g. running production readiness checks automatically, instead of manually verifying in each tool before every release that a team meets all the requirements), ops professionals struggle to instill best practices across diverse teams.
High toil (aka repetitive work) – their tasks, such as infrastructure provisioning or operational audits, tend to be manual and repeated for each team and through different tools.
Incident management – as the first line of defense, ops professionals face challenges due to a lack of integrated tools for efficient incident response. The absence of a centralized source for service quality, dependencies, and relevant telemetry makes it hard to gauge customer impact, determine incident severity, pinpoint responsible services, locate appropriate runbooks, identify owning teams, and find relevant troubleshooting dashboards.
Cloud cost management – analyzing cloud costs at the service level is challenging due to the difficulty in correlating the essential data for such analysis.
Engineering Leaders
Visibility challenges – diverse and unstandardized dashboards across teams complicate the task of gathering holistic company information. This makes it difficult to collect essential metrics like security, stability, architectural drift, cloud costs, and more.
Challenges in promoting reliability and operational maturity standards across teams – overhead in communication, scattered reports, and manual verifications hinder the ability to enforce and track reliability and operational maturity standards across the organization.
Inability to prioritize initiatives effectively – with each team assessing their service's reliability and quality differently, it's challenging to identify underperforming services with significant business implications. This ambiguity masks potential bottlenecks, risks, and valuable investment opportunities.
Reduced team performance and poor user experience – the combined challenges mentioned before slow down the engineering team and lead to a less satisfying experience for customers. The mix of scattered information and lack of standardization makes it hard for the team to deliver consistently good results.
For businesses facing these challenges, the IDP is crucial. It serves as a central hub for microservices, API products, internal and customer-facing applications, documentation, and the wider software environment. With this consolidated view, teams gain better observability over their products and services, get timely insights, and can correlate disparate information across tools and teams, from service ownership to dependencies.
As described by Paul Delory, a Research VP with Gartner, “IDPs provide a curated set of tools, capabilities and processes. They are selected by subject matter experts and packaged for easy consumption by development teams. The goal is a frictionless, self-service developer experience that offers the right capabilities to enable developers and others to produce valuable software with as little overhead as possible. The platform should increase developer productivity, along with reducing the cognitive load. The platform should include everything development teams need and present it in whatever manner fits best with the team’s preferred workflow.”
In summary, IDPs address both developer and DevOps challenges, offering a simplification layer so developers can focus on coding. By tackling information fragmentation and providing a unified platform, IDPs are set to be foundational in modern software development.
What are the core components of an Internal Developer Portal (IDP)?
An internal developer portal (IDP) is not just a single tool but a suite of tools, each vital in presenting a holistic view of an organization's software environment.
What an IDP looks like varies across engineering organizations. It depends on the areas where developers need abstraction, the code base and engineering stack in use, the demands of DevOps teams and the reliability requirements of the product, the developers' backgrounds (are they accustomed to cloud-based microservices, or have they been working on on-premises monoliths?), as well as the engineering culture, processes, and tools in use.
Whatever the specifics, the core functions generally include:
Discover and unify insights through a comprehensive catalog.
Establish new services in a standardized and simplified way using centrally accessible blueprints for the entire organization.
Manage service operations via self-service tools.
At its heart, an IDP is about enhancing the developer experience. It reduces the cognitive load of routine tasks such as on-call rotations, deployments, and new-hire onboarding. Additionally, by consolidating previously isolated datasets, leaders gain far clearer visibility into their operations. They can define and assess standards such as operational maturity, production readiness, observability maturity, and compliance, all without disturbing their teams.
An effective IDP incorporates role-based access control, ensuring engineers access data aligned with their responsibilities. It uses automation for alerts and workflows and integrates with the platform's API. The software catalog within the IDP is the single source of truth, representing the current state of the organization's software. Furthermore, the IDP uses scorecards to consistently define, monitor, and evaluate standards, metrics, and KPIs for quality, production readiness, and overall efficiency.
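To make role-based access control concrete, here is a minimal Python sketch of the idea. The role names, permission strings, and the policy that actions on a service also require membership in its owning team are hypothetical assumptions for illustration, not how any particular portal implements RBAC.

```python
from dataclasses import dataclass

# Hypothetical role -> permission mapping; real portals define their own roles.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "viewer": frozenset({"catalog:read"}),
    "developer": frozenset({"catalog:read", "service:deploy"}),
    "sre": frozenset({"catalog:read", "service:deploy", "incident:declare", "slo:edit"}),
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    teams: frozenset[str]


def can(user: User, permission: str) -> bool:
    """True if the user's role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(user.role, frozenset())


def can_act_on(user: User, permission: str, owning_team: str) -> bool:
    """Assumed policy: acting on a service also requires membership in its owning team."""
    return can(user, permission) and owning_team in user.teams
```

Under this policy, a developer on the payments team can deploy payments services but not another team's services, and cannot edit SLOs at all.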
These are the core components a good IDP should have to achieve these goals:
Software Catalog
At the heart of an IDP lies the universal catalog. Acting as a single source of truth about the organization's software ecosystem, the catalog should cover applications, services, environments, and infrastructure.
The software catalog is there to provide simple answers to complex DevOps-related questions, questions that usually force engineers to wander through dozens of different tools and rely on deep tribal knowledge. A few examples:
What’s deployed and where?
Who owns this service and which API routes does it expose?
How is this service performing in production?
Which user journeys and services depend on my services?
What were the latest deployments?
What observability dashboards, runbooks and documentation do we have available to manage and troubleshoot this service?
Who's on-call at the moment for this service?
Is this service meeting the production readiness criteria?
What are the DORA metrics for this service and this team?
...and many more.
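Several of these questions reduce to simple computations once the catalog holds the raw records. As an illustration, the Python sketch below derives the four DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) from a list of deployment records. The `Deployment` record shape is a hypothetical assumption, and time to restore is approximated from the failed deploy's timestamp rather than from the actual start of the incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

HOUR = timedelta(hours=1)


@dataclass
class Deployment:
    service: str
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    caused_failure: bool = False             # did it degrade production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict[str, Optional[float]]:
    """Compute DORA metrics for the deployments observed over `period_days`."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deploys if d.caused_failure]
    lead_times = [(d.deployed_at - d.committed_at) / HOUR for d in deploys]
    # Assumption: restore time is measured from the failed deploy, as a proxy for incident start.
    restore_times = [(d.restored_at - d.deployed_at) / HOUR for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }
```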
When discussing the software catalog's data model, the specifics can vary among organizations. However, a broadly applicable model often comprises:
Service - A functional unit within a software system, whether that's a microservice, a monolithic application, or any other functional component.
Infrastructure Resource - Essentially, where services reside, such as Kubernetes clusters, cloud functions, VM instances, etc.
Environments – This defines the settings in which applications or services run. Examples are development, testing, staging, production and even preview environments, each with its own unique configuration and purpose.
Application - This encompasses customer-facing products, internal utilities, and 3rd-party software (both SaaS and self-hosted).
User Journeys - The steps a user takes to achieve a particular objective within an application.
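To make this model concrete, here is a minimal Python sketch of the five entity types and their relationships, plus two helper queries answering "what's deployed and where?" and "which user journeys depend on this service?". The field names, environment stages, and `kind` strings are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = "development"
    TESTING = "testing"
    STAGING = "staging"
    PRODUCTION = "production"
    PREVIEW = "preview"


@dataclass
class Environment:
    name: str
    stage: Stage


@dataclass
class InfrastructureResource:
    name: str
    kind: str                 # e.g. "kubernetes-cluster", "cloud-function", "vm"
    environment: Environment


@dataclass
class Service:
    name: str
    owner_team: str
    runs_on: list[InfrastructureResource] = field(default_factory=list)


@dataclass
class UserJourney:
    name: str
    steps: list[str]
    services: list[Service] = field(default_factory=list)


@dataclass
class Application:
    name: str
    kind: str                 # e.g. "customer-facing", "internal", "third-party"
    services: list[Service] = field(default_factory=list)
    journeys: list[UserJourney] = field(default_factory=list)


def where_deployed(service: Service) -> dict[str, list[str]]:
    """What's deployed and where? Group a service's resources by environment name."""
    placement: dict[str, list[str]] = {}
    for resource in service.runs_on:
        placement.setdefault(resource.environment.name, []).append(resource.name)
    return placement


def journeys_depending_on(app: Application, service_name: str) -> list[str]:
    """Which user journeys of this application depend on the given service?"""
    return [j.name for j in app.journeys if any(s.name == service_name for s in j.services)]
```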
An optimal internal developer portal should also be adaptable, allowing organizations the liberty to tweak this foundational data model to cater to their unique requirements.
Scorecards
One of the major challenges organizations with multiple engineering teams face, aside from fragmentation, is the lack of standardization.
Gartner's 2023 Hype Cycle for Agile and DevOps reflects this, noting that "few organizations showcase a high level of Agile and DevOps maturity. The mandate for software engineering leaders is dual-fold: nurture advanced teams while also guiding or restarting others on their journey."
Yet, driving reliability and operational maturity across teams is no small feat, especially when tracking their alignment and adoption. This is where scorecards in IDPs become invaluable.
These scorecards help align all teams (engineering, security, SRE, platform, and product) around one set of operational standards. These standards cover key practices like production readiness, DORA metrics, operational maturity, and observability coverage.
But IDPs don't just set standards. They make the process engaging by using gamification (e.g. live leaderboards), encouraging teams to improve their service maturity. They also provide actionable advice to help teams enhance their services.
For leaders, IDPs automate the assessment of each team and service against these standards by pulling data from various tools. This removes manual checking and highlights key areas to focus on. Leaders can then identify services that are underperforming and have a significant business impact. With this data, they can adjust their technology, processes, and overall approach to boost reliability.
In essence, IDPs with scorecards give organizations a clear path to improve their software practices and reliability.
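At its core, a scorecard is a set of named rules evaluated against the metadata the portal has already collected for each service. The Python sketch below shows that idea with a hypothetical production-readiness scorecard and a simple leaderboard ranking; the rule names and metadata keys (`owner`, `runbook_url`, `slos`, `oncall_schedule`) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Service = dict[str, Any]  # metadata gathered from integrations (assumed shape)


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Service], bool]


# Hypothetical production-readiness scorecard.
PRODUCTION_READINESS = [
    Rule("has owner", lambda s: bool(s.get("owner"))),
    Rule("has runbook", lambda s: bool(s.get("runbook_url"))),
    Rule("has SLO", lambda s: len(s.get("slos", [])) > 0),
    Rule("on-call configured", lambda s: bool(s.get("oncall_schedule"))),
]


def score(service: Service, rules: list[Rule]) -> tuple[float, list[str]]:
    """Return the fraction of rules passed plus the failing rule names (actionable advice)."""
    failed = [rule.name for rule in rules if not rule.check(service)]
    return 1 - len(failed) / len(rules), failed


def leaderboard(services: list[Service], rules: list[Rule]) -> list[tuple[str, float]]:
    """Rank services by score, highest first, for a gamified view."""
    ranked = [(s["name"], score(s, rules)[0]) for s in services]
    return sorted(ranked, key=lambda entry: entry[1], reverse=True)
```

Returning the failing rule names alongside the score is what turns the scorecard from a grade into a to-do list for the owning team.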
Self-Service (i.e. Service Scaffolding)
To simplify tasks for DevOps and developer teams, the Internal Developer Portal (IDP) offers self-service access to tools, services, and knowledge. This consolidation frees developers' focus time for activities that humans are uniquely positioned to do: ideation, developing novel solutions, fostering collaboration, and engaging in meaningful communication with peers, partners, and customers, not to mention coding, at least for the time being.
Gartner underscores the value of developer self-service in its “Software Engineering Leader’s Guide to Improving Developer Experience” report, noting that “Developer self-service has an inherent benefit to bringing consistency and repeatability to otherwise disparate processes and error-prone manual handoffs. The goal of self-service is to ensure developers have an experience that makes ‘the right thing to do, the intuitive thing to do.’ For example, the ability to self-serve pre-vetted open-source libraries from a trusted component catalog improves governance, as well as developer experience.”
The IDP empowers developers to have visibility into existing services and to seamlessly create new ones within the portal. By keeping the entire service creation process within the IDP from the start, it ensures consistent best practices. This includes tasks like selecting the appropriate infrastructure resources, standardizing telemetry data exposure, and auto-configuring alerts and dashboards for the service. This centralization not only streamlines the developer experience by offering a consistent interface but also reduces the cognitive load, sparing them the complexities often tied to spinning up new services and making changes to existing ones.
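Keeping service creation inside the portal usually comes down to rendering a shared blueprint with a few parameters while enforcing guardrails up front. The Python sketch below illustrates this with the standard library's `string.Template`; the file names, YAML keys, naming rule, and list of allowed runtimes are hypothetical placeholders, not a real blueprint format.

```python
import re
from string import Template

# Hypothetical organization-wide blueprint: file name -> template.
BLUEPRINT = {
    "service.yaml": Template(
        "name: $name\n"
        "owner: $team\n"
        "runtime: $runtime\n"
        "telemetry:\n"
        "  otel_service_name: $name\n"
    ),
    "alerts.yaml": Template(
        "- alert: ${name}_high_error_rate\n"
        "  slo_target: $slo_target\n"
        "  notify: $team-oncall\n"
    ),
}

ALLOWED_RUNTIMES = {"go", "java", "python"}          # assumed golden-path runtimes
NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]{2,39}")   # assumed naming convention


def scaffold(name: str, team: str, runtime: str, slo_target: float = 0.999) -> dict[str, str]:
    """Validate inputs against guardrails, then render every blueprint file."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid service name: {name!r}")
    if runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"runtime {runtime!r} is not on the golden path")
    params = {"name": name, "team": team, "runtime": runtime, "slo_target": slo_target}
    return {path: tmpl.substitute(params) for path, tmpl in BLUEPRINT.items()}
```

Because telemetry and alert defaults live in the blueprint rather than in each team's head, every new service starts with them already wired in.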
However, the IDP is more than just a creation tool; it should be a holistic service management platform. From accessing on-call schedules and initiating incidents to deploying services with quality checks, developers can handle these tasks in one place. The IDP also provides safeguards that give developers the freedom to work efficiently, with the assurance that they won't inadvertently disrupt the system.
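Looking up who is on call is one of the small tasks a portal can answer in place. In practice the schedule would come from an on-call tool such as PagerDuty; the Python sketch below instead assumes a simple fixed-length round-robin rotation (a hypothetical list of engineers and a start time) to show the underlying calculation.

```python
from datetime import datetime, timedelta


def current_on_call(rotation: list[str], rotation_start: datetime, now: datetime,
                    shift_length: timedelta = timedelta(weeks=1)) -> str:
    """Return who is on call at `now` in a simple round-robin rotation."""
    if not rotation:
        raise ValueError("rotation is empty")
    if now < rotation_start:
        raise ValueError("rotation has not started yet")
    shifts_elapsed = (now - rotation_start) // shift_length  # timedelta // timedelta -> int
    return rotation[shifts_elapsed % len(rotation)]
```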
Service Reliability Management
IDPs should integrate Site Reliability Engineering (SRE) practices to streamline service reliability management, especially in areas like release management, change impact analysis and incident detection & response.
One key feature is the ability to set up Service Level Objectives (SLOs) connected to your software catalog for better alignment with organizational goals. To facilitate SLO adoption, IDPs should provide automated service-specific recommendations based on available telemetry data. By using Service Level Indicator (SLI) templates, a standardized approach to monitoring and alerting on service quality and reliability becomes achievable across teams. Linking these SLOs with GitOps workflows by defining them as code further embeds best practices.
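Defining SLOs as code can be as simple as a typed, version-controlled object. The Python sketch below assumes a ratio-based SLI (good events divided by total events) over a rolling window; the SLI template name is hypothetical. It shows the arithmetic behind an error budget: with a 99.9% target, 0.1% of the events in the window may be bad, so 1,000,000 requests leave a budget of about 1,000 failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    service: str
    sli: str            # hypothetical SLI template name, e.g. "http-availability"
    target: float       # 0.999 means 99.9% of events in the window must be good
    window_days: int = 28

    def __post_init__(self) -> None:
        if not 0 < self.target < 1:
            raise ValueError("target must be strictly between 0 and 1")

    def error_budget(self, total_events: int) -> float:
        """Number of bad events the SLO tolerates over the window."""
        return (1 - self.target) * total_events

    def budget_remaining(self, good_events: int, total_events: int) -> float:
        """Fraction of the error budget left; negative once the budget is overspent."""
        if total_events == 0:
            return 1.0
        bad_events = total_events - good_events
        return 1 - bad_events / self.error_budget(total_events)
```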
With SLOs set up for the key entities of the software catalog, IDPs should enhance the release process by integrating production readiness automation and SLO-driven quality gates. This approach equips engineering teams to release swiftly and with better quality, placing necessary reliability guardrails and ensuring services adhere to best practices.
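An SLO-driven quality gate can be reduced to a pure function that a CI/CD pipeline calls before promoting a release. The Python sketch below is a hypothetical example: it blocks a release if any production-readiness check fails or if less than an assumed 10% of the error budget remains (however that figure is computed upstream), and it returns the reasons so the developer knows what to fix.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    allowed: bool
    reasons: list[str]


def release_gate(readiness: dict[str, bool], budget_remaining: float,
                 min_budget: float = 0.1) -> GateResult:
    """Allow a release only if every readiness check passes and enough error budget is left."""
    reasons = [f"readiness check failed: {name}" for name, ok in readiness.items() if not ok]
    if budget_remaining < min_budget:
        reasons.append(f"error budget remaining {budget_remaining:.0%} is below {min_budget:.0%}")
    return GateResult(allowed=not reasons, reasons=reasons)
```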
Furthermore, IDPs should offer automated configuration for symptom-based alerts (e.g. using the multi-window, multi-burn-rate alerting approach that companies like Google use to alert their engineers, described in Google's SRE Workbook), improving the on-call experience. These alerts should be actionable: they should indicate the customer impact, pinpoint the responsible services, and provide immediate access to resources like runbooks and dashboards. The goal is to reduce alert fatigue and operational overhead.
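The multi-window, multi-burn-rate approach measures how fast the error budget is being consumed (the burn rate, i.e. the observed error ratio divided by the error ratio the SLO allows) over a long and a short window, and fires only when both exceed a threshold. The Python sketch below uses the example thresholds from the Google SRE Workbook for a 99.9% SLO over a 30-day window; treat them as a starting point, and note that supplying the per-window error ratios as a plain dict is an assumption.

```python
from typing import Optional

# (long window, short window, burn-rate threshold, action), following the
# Google SRE Workbook's example policy for a 30-day SLO window.
POLICIES = [
    ("1h", "5m", 14.4, "page"),    # 2% of the budget spent in 1 hour
    ("6h", "30m", 6.0, "page"),    # 5% of the budget spent in 6 hours
    ("3d", "6h", 1.0, "ticket"),   # 10% of the budget spent in 3 days
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1 - slo_target)


def evaluate(error_ratios: dict[str, float], slo_target: float) -> Optional[str]:
    """Return the most urgent action whose long AND short windows both exceed the threshold."""
    for long_window, short_window, threshold, action in POLICIES:
        if (burn_rate(error_ratios[long_window], slo_target) > threshold
                and burn_rate(error_ratios[short_window], slo_target) > threshold):
            return action
    return None
```

Requiring the short window to agree lets an alert stop firing soon after the problem ends, which helps keep pages actionable rather than stale.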
Lastly, comprehensive reporting is essential. IDPs should offer a consolidated view of reliability and quality data that helps identify areas with strong opportunities for improvement, evaluate team performance, and make informed decisions. This reporting bridges the gap between services, user journeys, and the teams responsible for them.
Status Pages
With IDPs standardizing reliability indicators through Service Level Objectives (SLOs) for all services and user journeys, they are perfectly positioned to offer enhanced service and user journey status pages. Rather than a simple 'up' or 'down' status, these pages should offer transparent, actionable information, updated based on precise SLO measurements and integrated with the relevant insights from the incident management tools.
The goal is to present product health in a way all stakeholders can understand. Both technical and non-technical users should be able to identify service health at a glance, drill deeper when necessary, and recognize areas requiring urgent attention.
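One way to turn SLO measurements into a status anyone can read is to map each service's remaining error budget (and whether a fast-burn alert is firing) to a small set of states, then roll service states up to user journeys. The Python sketch below does exactly that; the three states and the 25% threshold are hypothetical choices, not an established standard.

```python
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    MAJOR_OUTAGE = "major outage"


# Severity order used when rolling statuses up; later entries are worse.
SEVERITY = [Status.OPERATIONAL, Status.DEGRADED, Status.MAJOR_OUTAGE]
DEGRADED_BELOW = 0.25  # hypothetical threshold on remaining error budget


def status_from_slo(budget_remaining: float, fast_burn_alert_firing: bool) -> Status:
    """Map SLO measurements for one service to a human-readable status."""
    if fast_burn_alert_firing or budget_remaining <= 0:
        return Status.MAJOR_OUTAGE
    if budget_remaining < DEGRADED_BELOW:
        return Status.DEGRADED
    return Status.OPERATIONAL


def journey_status(service_statuses: list[Status]) -> Status:
    """A user journey is reported as healthy as its least healthy service."""
    return max(service_statuses, key=SEVERITY.index, default=Status.OPERATIONAL)
```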
Who benefits from Internal Developer Portals (IDPs)?
Internal Developer Portals (IDPs) are not just another trend in the tech world; they are vital tools that cater to diverse needs within an organization, and they are one of the core things that set mature organizations apart.
Although organizational maturity isn’t solely tied to its size, the number of developers in an outfit undeniably influences the necessity for such efficient internal tools. Different team sizes have varying needs and potential gains from IDPs:
Small Teams (1-50 engineers) – Here, the intimate nature of communication shines. Developers, well-versed in operations, can pitch in as needed. And when there's a snag? It’s often settled with a brief chat across desks or by the water cooler.
Mid-sized Teams (up to 200 engineers) – These groups may lean on devtools, GitOps, and some ticket-based processes. While it's not the pinnacle of efficiency, it's functional and often sufficient.
Large Teams (roughly 200 to 1,500 engineers) – At this scale, the increased complexity of larger teams and systems means IDPs often become a must-have for teams that want to remain efficient.
Very Large Teams (Above 1500 engineers) – At this magnitude, IDPs aren’t just nice-to-haves; they're essential. Traditional methods or alternate solutions falter under the weight of such expansive teams.
Especially in industries where compliance plays a pivotal role, an IDP proves invaluable. Take SOC 2, for instance, which mandates robust permission management for infrastructure and for release management. Here, IDPs streamline compliance tasks, reduce ticket overload, and help keep everything up to par.
As for the beneficiaries of IDPs within organizational roles:
Developers – for them, an IDP is akin to an all-access pass. It streamlines access to documentation, tickets, CI/CD updates, and more. This consolidation fosters quicker issue resolution, dependable deployments, and swift onboarding – all while boosting metrics like reliability.
Ops Professionals – those rooted in operations find IDPs to be their panoramic viewfinder. Whether it's resource tracking, cloud cost insights, or compliance checks, everything is visible in one place, simplifying complex tasks that would otherwise require extensive communication overhead.
Leadership – steering an organization is no small feat (especially those with “Large” and “Very Large” teams, such as Enterprises), and for leaders, IDPs are invaluable companions. They render a crystal-clear view of organizational adherence to standards. Such instant insights empower leaders, driving efficiency-focused initiatives without the detours of traditional data collection.
While engineering roles are the primary beneficiaries, the versatility of IDPs means that other roles, such as product managers understanding service impacts on user journeys, or customer support teams gauging ticket relevance, can also extract significant value from these platforms.
How to set up a developer portal for your team?
When considering the deployment of an Internal Developer Portal (IDP), the primary debate centers around building in-house versus purchasing a ready-made solution.
DIY (Do It Yourself)
If resources allow, constructing an IDP from the ground up offers unmatched customizability. However, this route is resource-intensive and may take substantial time to bear fruit.
Build from open source
A few open-source alternatives exist for those not looking to start from scratch. Backstage is one such option that, though comprehensive in its offering, requires you to piece together a set of plugins. This means significant time and effort go into realizing its full potential, and the need to write a lot of React code and Node.js plugins has led many companies to abandon the project.
Another contender is Gimlet, which is geared towards Kubernetes-based environments. Both platforms have their merits, but they might not offer the most efficient route for businesses seeking comprehensive solutions without the associated setup complexity.
Ready-to-Use SaaS Solutions
For companies seeking a streamlined solution, SaaS platforms like Rely.io are often the answer. When evaluating such solutions, consider:
Catalog comprehensiveness – does it cater to microservices alone, or does it provide insights into applications, services, environments, resources, and teams?
Integration scope – does it integrate across multiple cloud platforms and your entire engineering stack?
Analytics depth – how robust and varied are the metrics available to build scorecards, status pages and reliability reports around?
Setup simplicity – how intuitive is the setup? Does it rely heavily on manual inputs, or does it provide automated integrations?
How to set up your internal developer portal with Rely
Rely distinguishes itself by incorporating Site Reliability Engineering (SRE) principles, paving the way for seamless adoption of these practices across organizations. What’s more, it allows customization to adapt and align with your company's specific practices and requirements.
The setup process with Rely is notably user-centric, prioritizing quick and minimal-effort deployment. Here's a step-by-step guide:
Integration with existing tools – integrate Rely with your engineering tools like GitHub, OpenTelemetry, Jira, and PagerDuty, as well as your product analytics platforms, such as Amplitude.
Automated service discovery – Rely then auto-discovers your services and user journeys, populating them with vital information like documentation, ownership, tickets, reliability measures, and more.
Maturity assessment – use Rely's built-in scorecards to gauge company-wide compliance with established benchmarks, including operational and observability maturity, production readiness, and other standards.
SLO recommendations – Rely provides alerts and recommendations based on your current observability metrics.
Auto-generated status pages – each of your services, user journeys, and products have automatically generated status pages for easy reference.
To summarize, Rely serves as a pre-configured developer portal that integrates seamlessly with various third-party tools. This keeps existing workflows undisturbed while amplifying their effectiveness.