Site Reliability Engineering (SRE) and DevOps share a goal of building a bridge between development and operations. We'll explore and compare both approaches.
Wondering which is better for your company, SRE or DevOps? Neither SRE nor DevOps is “better,” exactly; they’re similar yet different in a few key ways:
SRE, or site reliability engineering, is a methodology developed by Google engineer Ben Treynor Sloss in 2003. The goal of SRE is to align engineering goals with customer satisfaction. Teams achieve this by focusing on reliability. SRE is an implementation of DevOps, a similar school of thought. Google is also responsible for bringing these two methods together. In this article, we'll break down more of what this looks like in practice.
Reliability is a subjective quality based on your customers’ experiences. SRE lets you estimate how happy your customers are by using SLIs. SLIs, or service level indicators, are metrics that show how your service is performing at key points on a user journey. SLOs, or service level objectives, then set a limit for how much unreliability the customer will tolerate for that SLI.
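To make the relationship between an SLI and its SLO concrete, here is a minimal Python sketch. It assumes a simple "good events over total events" SLI, which is one common way to define an indicator but not the only one. The `RequestOutcome` shape, the function names, and the latency threshold are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RequestOutcome:
    """One observed request at a key point in a user journey (hypothetical shape)."""
    succeeded: bool
    latency_ms: float


def good_event_ratio(outcomes: Sequence[RequestOutcome], max_latency_ms: float) -> float:
    """SLI: fraction of requests that succeeded within the latency threshold."""
    if not outcomes:
        raise ValueError("cannot compute an SLI from zero requests")
    good = sum(1 for o in outcomes if o.succeeded and o.latency_ms <= max_latency_ms)
    return good / len(outcomes)


def meets_slo(sli: float, slo_target: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


if __name__ == "__main__":
    # Illustrative data only: 997 fast successes and 3 failures.
    sample = [RequestOutcome(True, 120.0)] * 997 + [RequestOutcome(False, 90.0)] * 3
    sli = good_event_ratio(sample, max_latency_ms=300.0)
    print(f"SLI={sli:.4f}, meets 99.5% SLO: {meets_slo(sli, 0.995)}")
```

In this made-up sample, 997 of 1,000 requests count as good, so the service would satisfy a 99.5% SLO but miss a 99.9% one. In practice the threshold and target should come from what your customers actually tolerate, not from the numbers used here.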
SRE teaches us that 100% uptime is impossible. Some amount of failure is inevitable. Because of that, incident response is a core SRE best practice. Responding to incidents faster reduces customer impact, but you need processes in place to enable this. There are many components to incident response, including:
Nobody expects perfection. Some amount of unreliability is acceptable to your customers. As long as your performance meets your SLO, customers will stay happy with your services. The wiggle room you have before your SLO is breached is the error budget.
Your error budget can help you make decisions about prioritization. For instance, services with lots of remaining error budget can accelerate development. When the error budget depletes, teams know it's time to focus on reliability. Through this decision-making tool, SRE allows operations to influence development in a way that reflects customer needs.
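The error budget arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed policy: it treats the budget as the number of bad events an SLO allows over a window, and the function names, the `low_water_mark` threshold of 25%, and the suggested actions are hypothetical.

```python
def allowed_bad_events(slo_target: float, total_events: int) -> float:
    """Error budget for a window: (1 - SLO) * total events."""
    if not 0.0 < slo_target < 1.0:
        # A 100% target leaves no budget at all, which is why it is unrealistic.
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_remaining_fraction(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed = allowed_bad_events(slo_target, total_events)
    if allowed == 0:
        raise ValueError("no events observed, so there is no budget to measure")
    return (allowed - bad_events) / allowed


def suggested_focus(remaining_fraction: float, low_water_mark: float = 0.25) -> str:
    """Map remaining budget to a rough priority (thresholds are assumptions)."""
    if remaining_fraction <= 0.0:
        return "freeze risky releases and fix reliability"
    if remaining_fraction < low_water_mark:
        return "slow down and prioritize reliability work"
    return "keep shipping features"


if __name__ == "__main__":
    remaining = budget_remaining_fraction(0.999, 1_000_000, 250)
    print(f"{remaining:.0%} of budget left: {suggested_focus(remaining)}")
```

With a 99.9% SLO over 1,000,000 requests, the budget is roughly 1,000 failed requests. After 250 failures, about three quarters of the budget remains, so this sketch would suggest continuing feature work. Where exactly to draw the lines is a team decision.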
The cultural changes of SRE are as important as the process changes, if not more so. The cultural lessons of SRE include:
DevOps is a set of practices that connects the development of software with its maintenance and operations. Its name reflects these two parts: Development and Operations. DevOps originated from a collection of previous practices. These include the Agile development system, the Toyota Way, and Lean manufacturing. The term DevOps became well-known in the early 2010s.
The primary goal of DevOps is to reduce the time between making a change in code and that change reaching the customers, without impacting reliability. It seeks to align the goals of development with organizational needs to create business value. In this way, the goals of SRE and DevOps are very similar. Both focus on customer impact and efficiency, but the methods they use to achieve this vary.
DevOps seeks to increase the frequency of new deployments of code. Faster, more incremental changes allow a more attuned response to customer needs. It also reduces the chance of major incidents caused by large, infrequent deployments.
A core tenet of DevOps is to remove silos between development and operations teams. Rather than development “throwing code over the wall” for operations to handle, the teams work together throughout the service’s lifecycle.
Here are some DevOps practices that encourage cooperation between development and operations:
Monitoring data is central to DevOps. DevOps advocates measuring valuable data and using it as your basis for decision making. By default, data should be accessible across the organization.
Simply having a lot of data available isn’t enough to make good decisions. Metrics should be contextualized to provide deeper insights. Make sure that you're setting up monitoring that helps you learn about your system. Having too much data can actually make decision making more difficult.
Like SRE, DevOps advocates for automating wherever possible. Where SRE focuses on automating to increase consistency and reduce toil, DevOps automates to tighten the development cycle. By removing manual steps in testing and deployment, teams can achieve a faster release frequency.
You can implement both DevOps and SRE in your organization. A helpful way to combine the methodologies is to consider SRE as a way to achieve the goals of DevOps. This doesn’t mean SRE is better than DevOps. Focusing on the goals of DevOps instead of the process-focused approach of SRE is also helpful. Drawing from both methodologies as appropriate provides the best way forward.
SRE is a method of implementing the goals of DevOps. Here are some of the common goals of DevOps, and how SRE practices can help achieve them:
DevOps determines what needs to be done, whereas SRE determines how it will be done. DevOps captures a vision of a system that is developed efficiently and reliably. SRE builds processes and values that result in this system. You can establish your goals using DevOps principles, and then implement SRE to achieve them.
SRE and DevOps share many philosophies and principles. Some that they share include:
However, SRE and DevOps also have some differences in philosophy. Often these come down to priority. Some differences include:
When implementing either SRE or DevOps in your organization, you’ll need to consider how these changes will actually take place. Will you:
Depending on the maturity of your organization and your needs, different approaches will be more efficient. You should consider how you want to structure your DevOps and SRE hires.
Both DevOps and SRE teams vary based on how centralized they are. At one end is a centralized team, which creates tools, infrastructure, and processes that the entire organization shares.
The other extreme is a distributed team. DevOps/SRE engineers are assigned to individual teams and projects. They handle maintaining the reliability and velocity goals for each team.
Every engineer can work to implement DevOps and SRE best practices without holding the title of DevOps Engineer or SRE. However, if you do have dedicated staff with this title, here are the main distinctions between the responsibilities: