
What is an SRE Product Manager?

António Araújo
Go To Market Lead
Rely.io
April 28, 2022
4 min read

Background

Site Reliability Engineering (SRE) and its interconnected areas, such as Observability, Platform Engineering, and DevOps, have typically operated without Product Managers. I believe that happened because IT Operations was seen solely as a cost center and not as a source of competitive advantage.

With the rise of technology giants such as Google, Amazon or Facebook, other companies started adopting similar SRE practices that improve efficiency, security, development speed, and the reliability performance of large-scale systems. Everyone is trying to move at the same speed as big tech and nimble startups. Bets on SRE or DevOps are now seen as investments with positive returns, rather than sunk costs. 

There’s little to no literature coming from Google describing how Product Managers can be part of an SRE team. Although there’s been lots to say about the SRE Team Lifecycles and their different topologies, there hasn’t been much around bringing non-engineers into this function. I think that’s going to change soon.

Why do SRE teams need Product Managers?

There’s an increasing number of product owners and program managers in SRE and Platform teams because they have to:

  • Build products for users (other engineering teams)
  • Prioritize which reliability investments have the highest impact on customers
  • Create the long-term reliability strategy for a company
  • Make build vs. buy decisions
  • Liaise with several functional groups, including teams outside of engineering 
  • Define reliability targets and report on performance from the perspective of customers' expectations
  • Manage relationships with new and existing vendors

An SRE's plate is already full, so the tasks listed above arguably steal time from reliability-ensuring activities. A few weeks ago, not thinking I'd be writing this blog today, I ran a poll on r/sre asking "How do you spend most of your time?"

The results were not that surprising. They confirmed that a large proportion of SREs do build and/or manage developer tooling, meaning that they must care for users. Those who commented also mentioned that part of their time is spent answering questions, doing admin tasks, and sitting in vendor meetings.

"We expect our Technical Product Managers in the platform tribe to have a tight working relationship with the product engineering, infrastructure and security teams as they're usually the key stakeholders and consumers of the products that our platform teams are building."

From "Life in the Wise Platform team as a Technical Product Manager", by Laura Woo

What do SRE Product Managers do?

Product Managers supporting SRE and Platform teams are asked to bring traditional product management techniques, such as user research, roadmap prioritization, and stakeholder alignment, into the reliability world. According to several job descriptions I've analyzed, their responsibilities often include:

  • Partnering with engineering and product leads to build product roadmaps for SRE
  • Creating a long-term strategy for observability and tooling investments, including managing vendor relationships
  • Implementing and maintaining Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs)
  • Creating profiles of users (software engineers) and ensuring SRE's products address their needs
  • Championing reliability ownership across non-SRE teams and enabling them to account for & track reliability of the services they’re responsible for
  • Owning the vision and strategy for: incident management, disaster recovery, performance testing, chaos engineering, etc.

Note: Responsibilities vary from one organization to another, as do job titles (SRE Product Lead, Technical Program Manager, SRE Product Owner, etc.).

Below is a visual example of how a Product Manager might be part of an SRE team, along with some of their responsibilities. Don't take the SRE work areas as absolute truth: I know many are missing, and some of these are always shared responsibilities across the team!

Example of an SRE group and some of their responsibilities

Given SRE's principle of applying software to manage and automate IT, the function has successfully taken on many areas of responsibility. And it has been able to do so with fewer people than would normally be needed to move at the same speed reliably. That means complexity has increased drastically, and there is now a need for a focused strategy, planning and management function within SRE.

I believe that we will start seeing more and more product managers step into this area or, most likely, more engineers formally take on a technical product management role within reliability. My second hypothesis is that the SLO methodology will become the product manager’s best friend because it will allow them to:

  • Agree with non-engineering functions on the reliability goals needed to meet or exceed customer expectations
  • Communicate about reliability performance with SLIs/SLOs as a standardized language
  • Prioritize the roadmap according to historical SLO performance
  • Design better alerting and incident management strategies with burn rate alerting
  • Enable teams to own reliability of their services with out-of-the-box service SLIs
  • Monitor data-driven KPIs/OKRs, allowing for weighted, justified and fast decision making
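
To make the prioritization and burn rate points a little more concrete, here is a minimal sketch of how an availability SLI, its error budget, and a burn rate could be computed and used to rank services. The `SloWindow` class, its field names, and the example services and numbers are all made up for illustration; they are not Rely.io's API or data, and the burn rate here is measured over the whole SLO window rather than over the short alerting windows a real setup would use.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one service over a full SLO window (hypothetical shape)."""
    service: str
    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # assumption: no traffic counts as fully available
        return 1 - self.failed_requests / self.total_requests

    @property
    def error_budget(self) -> float:
        """Failure ratio the SLO target allows."""
        return 1 - self.slo_target

    @property
    def burn_rate(self) -> float:
        """Observed failure ratio divided by the allowed failure ratio.

        A value of 1.0 means the budget is used up exactly at the end of
        the window; above 1.0 means the SLO is on track to be missed.
        """
        return (1 - self.sli) / self.error_budget


def rank_by_burn_rate(windows):
    """Order services so the one burning its budget fastest comes first."""
    return sorted(windows, key=lambda w: w.burn_rate, reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only.
    services = [
        SloWindow("search", slo_target=0.99, total_requests=500_000, failed_requests=1_000),
        SloWindow("checkout", slo_target=0.999, total_requests=1_000_000, failed_requests=2_000),
    ]
    for w in rank_by_burn_rate(services):
        print(f"{w.service}: SLI={w.sli:.4%}, burn rate={w.burn_rate:.2f}")
```

Both example services fail the same share of requests, but the stricter target on the hypothetical checkout service means it burns its budget ten times faster, which is the kind of signal a product manager could use to decide where reliability work goes first.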

More on the above, with demos of Rely.io, in a future blog post coming soon!
