For the past four years, Catchpoint and various partners have been running a yearly SRE Survey. This year, Blameless is excited to partner with Catchpoint for the fifth annual survey. We want to hear from you if you are in a DevOps or SRE role, or if you work on reliability under some other title. There is tremendous value in listening closely to practitioners. Fill out the survey now at https://www.catchpoint.com/sre-survey! For every completed response we receive, we’ll be donating $5 each to the International Committee of the Red Cross and Girls Who Code.
We believe deeply in the importance of reliability for all digital services. The people who are most effective at delivering that reliability are those who practice resilience engineering principles. When the survey started, these were usually people with the title of SRE (or at some companies, PE) but as resilience engineering and reliability have become more widely adopted, the practices have spread beyond the scope of these specific titles. To a certain extent, these practices have never been exclusively within the domain of specific titles. There have always been people who were passionate about software reliability in many roles long before Google coined the SRE moniker (circa 2003).
For the past 5-7 years, SRE and related roles have consistently been amongst LinkedIn’s “Most Popular Jobs” listings. This popularity grew significantly over the last two years as the pandemic dramatically accelerated the importance and urgency of digital transformation. Companies have been learning just how important reliability is to their online reputation, even in cases where they have not made this a priority. This survey is a way to look at the lived experience of these practitioners and understand the state of SRE as a whole. Since resilience engineering is based on understanding the real “work as done”, this survey is a way to keep in touch with the wider profession.
You can take the survey at https://www.catchpoint.com/sre-survey. It should take less than 10 minutes of your time. Please reshare the link widely amongst your colleagues, friends and community. The greater the number of participants, the richer the analysis will be. We look forward to sharing our findings from the survey in the SRE Report 2022, to be released later this year.
Here are some highlights from past reports:
2018 - The inaugural SRE Report focused on building a profile of what being an SRE entails and what someone looking to become an SRE expert could expect from the role. The report found that there was no one “typical” background or skill set for resilience engineers. Nonetheless, 64% of survey respondents had previously held a role as a SysAdmin, perhaps a surprising stat given that the majority of SREs report into the engineering department and not operations. The Report also found that while “the majority of SREs felt their job directly contributes to one of their organization’s core business values, they did not feel that the role was well understood and/or respected throughout the organization.”
2019 - Continuing from the previous year, this year’s SRE Report found that Site Reliability Engineering was still emerging as a practice. This time, however, the survey and report concentrated on incident management, asking “What impact do incidents have on organizations and the people responding to them?” While it was clear that organizations are focused on building resilient systems and recovering quickly, the question was raised of whether this focus extended to employee resilience and recovery from post-incident stress. Sadly, the survey found that the impact of incident management on employees is significant and highly stressful. Moreover, most companies do not effectively support engineers as they deal with these stressors.
2020 - This survey was performed in two parts: one in January (pre-pandemic lockdown) and a follow-up in June (post-lockdown). In contrast to the 2018 findings (when only 20% were, or expected to be, remote), over half the respondents were now forced into a remote arrangement, and did not see a need to return to the office in order to carry out their duties.
Other key findings centered on the importance of designing observable systems to prevent service disruptions instead of purely reacting to outages and the need to continuously work to overcome the entropy (or quicksand) of reactive operational work.
2021 - This year saw growing reliance on third parties and the growth of internal platform engineering teams to enhance developer productivity. Allocating reliability engineering cycles to make these platforms reliable in themselves was shown to be an important trend. Also, connecting the resilience work that SRE teams do to direct business-valued capabilities was highlighted as a key way to demonstrate the value of reliability efforts.
2022 - This is your chance to contribute! This year’s questions range from how SREs are spending their time to the impact of the Great Resignation on SRE teams and efforts. Participate here: https://www.catchpoint.com/sre-survey
Not only will you be actively contributing to the industry’s longest-running SRE Report, but you’ll also help provide actionable solutions for organizations worldwide.
For every survey taken, Catchpoint and Blameless will donate $5 to the International Committee of the Red Cross and $5 to Girls Who Code.
All contributions will be aggregated and kept confidential.
Participate here: https://www.catchpoint.com/sre-survey