Here are the Top Predictions for SRE in 2021

12.2.2020

Who else is glad that 2020 is almost over? We’ve had one of the most difficult years in recent history. With everything going on, it’s been hard to think further than a few days out, much less into the new year. But we’re hopeful that 2021 will be a better year for everyone, and we’re predicting some exciting developments for SRE.

Here’s our two cents: SRE adoption will only continue to grow. Yet in 2021, the practice and the culture shift, rather than the role, will take priority. More people (not only SREs) will have a reliability mindset, which means reliability will shift left through the software lifecycle. SLIs, SLOs, and error budget policies will become common practice. Practices such as observability, runbook automation, and blameless retrospectives will continue to be table stakes.

For a full view into all the trends we expect to see in this space for 2021, read on!

Adopt, adopt, adopt

According to a survey we conducted with almost 300 industry professionals, over 50% of respondents employ either a dedicated SRE model, with engineers focused on infrastructure and tooling, or an embedded SRE model, in which full-time SREs are assigned to a service. Why? Companies are realizing that a reliability-first mindset is key. Downtime costs, customer expectations, and competitive pressures have never been higher.

We believe this means a massive hiring frenzy for SREs is coming. According to LinkedIn, demand for SREs has already grown 34% annually over the past five years. As the infrastructure supporting digital services becomes more mission-critical and complex, companies will staff dedicated teams to address reliability concerns.

We asked our Blameless teammates which SRE trends they expect to see in 2021. A common theme was that open SRE roles will triple and dominate job growth next year. Furthermore, this growth won’t be unique to the technology industry. As more industries (such as financial services) converge toward technology-enabled experiences and business models, they are heading in the direction of SRE as well. Everyone is looking for those nines.

According to Dan Bergman, Cloud Architect at BeyondTrust, “In pursuit of the mythical five 9’s availability, enterprises will scale up these teams to match the customer expectation of 24x7 product availability.” To keep systems in shipshape, the incident resolution process will need to be faster than ever. This requires well-documented runbooks and strong incident management skills. SREs will be integral to leading and scaling this evolution.
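To put “five nines” in concrete terms, here is a minimal Python sketch that converts an availability target into the downtime it permits over a period, which is the time-based view of an error budget. The `allowed_downtime` helper and its `period_minutes` parameter are hypothetical and purely illustrative (not part of any Blameless product), and the sketch assumes a 365-day year.

```python
# Hypothetical, illustrative helper: converts an availability target into
# the downtime it permits over a period (an error budget expressed in time).
from datetime import timedelta

# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime(availability: float,
                     period_minutes: int = MINUTES_PER_YEAR) -> timedelta:
    """Return the downtime permitted by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    # The unavailable fraction of the period is the error budget.
    return timedelta(minutes=(1.0 - availability) * period_minutes)


if __name__ == "__main__":
    # Print the yearly budget for two through five nines.
    for nines in range(2, 6):
        target = 1 - 10 ** -nines  # 0.99, 0.999, 0.9999, 0.99999
        budget = allowed_downtime(target)
        print(f"{target:.3%} -> {budget.total_seconds() / 60:.2f} minutes/year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which shows why incidents must be resolved so quickly at that level. Passing a different `period_minutes`, such as a 30-day window, gives the budget for a shorter SLO window instead.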

And as Maya Ber Lerner of DevOps.com states, “Increasing levels of automation will require smart ways to handle dynamic infrastructure and applications without losing control, and being able to track changes all the way back to the coder.” This will demand SRE best practices around observability, error budgets, and documented, automated incident management.

Additionally, this increase in automation can help reduce the toil engineers face, but one form of toil is often overlooked. In a CIO industry roundtable hosted by Lightspeed Ventures, Blameless Co-Founder Ashar Rizqi stated, “What we don't capture is the impact of cognitive toil. Whether it's on the developers or the operator, it’s not going away. What you're likely seeing is that the cognitive burden of operating software is shifting. Sometimes it'll shift to an ops team, or from the ops team to the dev team. Part of your strategy needs to include what you’re doing to improve and reduce that cognitive toil.”

How to implement SRE in your organization

One of the biggest perceived roadblocks for SRE is how to approach implementation for different kinds of organizations. After all, as Kelsie Pallanck of DevOps.com notes, “What succeeds at Google may not be the recipe for SRE implementation at smaller tech enterprises or startups. The role of SRE and the expectations that go with it remain fluid.”

Adopting SRE means looking past giants like Google, Netflix, and LinkedIn. Instead, examine how you can apply the principles within your own organization. There’s no one-size-fits-all implementation method, but SRE best practices can benefit all companies.

In 2021, leadership will need to consider what levels of adoption and hiring are the right mix. SRE adoption is more than hiring and relabeling teams; it’s a cultural movement.

Isaac Sacolick, President and CIO of StarCIO, predicts that many organizations will fall into this trap. “I expect adoption numbers to continue to rise, but many CIOs will struggle to get the expected results from these programs. They can't operate in silos, and adding DevOps and SRE skills to agile teams is not the optimal answer for smaller/medium IT organizations that can't find the necessary skill sets.”

This means a cultural shift will be the deciding factor in a successful SRE implementation. It also means that organizations looking to adopt SRE will need to make sure they aren’t piling SRE duties onto already overloaded teams.

Because COVID-19 has put extreme strain on socio-technical systems, burnout has become a top concern. Ashar also addressed this: “Teams are working longer hours with fewer breaks and there's a collaboration toil that isn't accounted for. We're seeing people hitting that burnout point. One of the things that we've done is to make sure that we account for burnout relief.”

Accounting for burnout relief can take many forms. One of the best proactive methods to provide burnout relief is to load balance work across the team. Reliability can’t be the responsibility of a sole engineering team. Everyone in the organization needs to own and feel accountable for it.

Practice more than personnel

SRE adoption may begin this year with a massive hiring wave, but we don’t think it will end that way. Again, our Blameless teammates had some interesting predictions on this. Since SRE functions will continue to be difficult to hire for, some of the predicted SRE spending will likely go into technology investments and tooling in lieu of senior SRE talent. This will lead to more system and solutions integrators and managed service providers entering the SRE market. Another potential offshoot of this trend is the rise of more entry-level roles and training programs. To close the skills gap, providers may also begin exploring SRE professional services focused on best practices.

Today’s SREs manage a wide array of tools across monitoring and observability, on-call and alerting, incident response, and much more. With this explosion of tooling options, teams must avoid over-rotating on data and metrics and invest in the right processes and culture in order to be maximally effective and truly oriented around the customer experience.

As with DevOps, SRE is about mindset and culture, but tools can help facilitate best practices for key responsibilities. Blamelessness will continue to be one of the key underlying principles of SRE: retrospectives, and work environments as a whole, should focus on accountability that looks forward instead of backward.

Can we call this a new age for SRE? Most definitely. Björn Rabenstein of Grafana Labs presented his theory at SREcon Europe/Middle East/Africa in his talk, “SRE in the Third Age.” He believes that as we enter a third age, we won’t need SREs, we will need SRE. There will be a convergence in the mindset, roles, and responsibilities of developers and operations engineers. Success will depend on whether the collective team can embody the SRE culture and prioritize reliability throughout the software lifecycle.

If you want to learn more about how teams are using SLOs and SRE best practices, stay tuned for our upcoming report on SRE trends, which will be available in the coming weeks. Follow us on LinkedIn or Twitter to be the first to know when this report goes live.