SRE is a field defined by constant evolution: from Google’s in-house secret recipe, to the hottest new practice for the biggest enterprise orgs, to a diverse and holistic mentality practiced by orgs of all sizes. Earlier this year, we co-sponsored the Catchpoint State of SRE survey, where we took the temperature of SRE as it stands today. Now, as we did in 2020 and 2021, we’ll look to the future and speculate on what 2023 will bring for SRE.
For 2022, we predicted that the skill sets and responsibilities of SREs would become more diverse. Indeed, we’ve seen SREs take on more and more roles beyond development and operations, with some focusing entirely on process, strategy, or culture. This expansion has broadened what the field can accomplish. We’re excited to share what we predict comes next.
1. Economic factors will force companies to look for more efficient ways of managing reliability
As the global economy weakens, organizations will have to learn to do more with fewer hires. We predict that organizations in this position will prioritize SRE functions as a way to ensure stability in the face of turmoil.
Consider some of the problems that could occur during the downturn:
- Disappearance of necessary tribal knowledge
- Fewer engineers on-call
- Slower development velocity that forces teams to reprioritize
SRE processes are best suited to address these problems and more: breaking down silos of tribal knowledge, balancing on-call more effectively while investigating incidents more deeply, and aligning development goals with the highest customer impact. Organizations will find that investing in SRE skill sets and tools is a good use of their limited resources.
2. SRE will be valuable insurance for experimentation
Whether it’s AI assistance, VR immersion, or web3 decentralization, 2023 will continue to push orgs to adopt cutting-edge technology. We can’t guess which of these ideas will flourish and which will flounder, but either way, a reliable foundation will be necessary. Adopting even the most successful new ideas at scale will bring new challenges and new types of incidents, and these growing pains will require new approaches.
As organizations experience these growing pains, they’ll turn to SRE to keep their customers happy while they adjust. Incident retrospectives can help teams get a handle on new sources of incidents fast, while a reliability mindset can keep customer happiness the #1 priority.
3. A more holistic definition of reliability will emerge
We’ve often said that reliability is the subjective experience of users based on their expectations of the service. While this is a helpful way to align priorities with customer needs, we predict that 2023 will bring an even more holistic definition of reliability. Organizations will start thinking about the reliability of their systems not just in terms of their users’ experiences, but as a complete package that covers everything from the earliest stages of development ideation onward.
This new socio-technical definition of reliability will encompass the system’s health, the users’ expectations and experiences, and the resilience of the team in the face of adversity. As orgs face increasingly complex systems, greater user reliance, and more strained personnel resources, they will need a definition of reliability that addresses all of these challenges.
4. Reliability will be a growing priority for teams outside of engineering
Earlier, we discussed how SRE is expanding beyond development roles. With this new socio-technical definition of reliability, we predict that teams entirely outside of engineering will start to make reliability a priority in their processes and culture.
Consider a customer-facing team, like sales. When landing new clients, the team needs to ensure continuity even when people are out of the office. They need consistent messaging and engagement so that no deals fall through the cracks. They also need to keep unplanned work from derailing planned work, just as engineers do when dealing with incidents. The reliability mindset, together with supporting processes and tooling, is the best way for these teams to strengthen these capabilities.
5. Organizations will confront the build vs. buy dilemma
In our digital-first era, when a consistently available service is a baseline customer expectation, teams of all sizes need a process to handle incidents and downtime efficiently. Until now, many teams have gotten by with ad hoc processes scattered across Google Docs or Confluence pages. These homegrown processes may suffice for occasional incidents, but we predict that in 2023, organizations will hit their limits.
With limited budgets and growing expectations, companies will have to evaluate their incident management solutions diligently. Investing in a vendor’s solution comes with upfront costs, but maintaining a homegrown solution carries ongoing costs in engineering time, and therefore in money. A purpose-built incident solution allows you to improve after each incident, continually reducing downtime even as your service grows in complexity. A solution that’s “good enough” is no longer good enough.
We’re excited to join you all in another transformative year for SRE, incident management, and the reliability mindset. What do you see on the horizon? Share your vision with us in our community Slack channel! Want to see how Blameless can guide you through incidents with our cutting-edge workflow? Sign up for our free trial!