Resources
Browse through videos, guides, and other educational resources that cover incident management, reliability, team culture, and more.


Blog
11.26.2019
Improving Postmortem Practices with Veteran Google SRE, Steve McGhee
For many SREs, Google’s 99.999% availability seems like an untouchable dream. If anything, getting out of pager hell is already worth celebrating with all your coworkers, friends, and family. How can you get to a stage where you have time to proactively prevent incidents, and enter a mental state of calm and control?


Videos
5.13.2019
How to Champion SRE Investment to Different Levels of Leadership
Reaching higher levels of the organization is essential to achieving broader adoption of SRE. Without the right buy-in, behaviors will regress even if parts of SRE are rolled out. How, then, do you get this support? It starts with finding out what incentives matter to key levels of leadership and how to speak to them effectively. It also requires listening to the resistance to SRE adoption and then addressing that resistance with reason and metrics. In this talk, I share how to persuade the right level of leadership so that an organization can progress through the key stages of SRE adoption. I provide case studies of how real companies have succeeded or failed with their SRE adoption. This talk aims to equip the audience with the tools to promote SRE adoption through a grassroots, bottom-up approach.


Blog
10.8.2018
Getting to 99.999% Availability with Twilio’s Tyler Wells
Five 9s availability is a remarkable milestone for any company's site reliability engineering (SRE) practice. That's less than 30 seconds of service unavailability per month, and it's exactly what Twilio has accomplished. Tyler Wells, Director of Engineering at Twilio, shares the key building blocks of getting to five 9s.


Blog
Severity vs. Priority | Understanding the Differences
Wondering about severity vs. priority? We explain severity and priority and discuss their differences and their impact on the incident management process.


Ebooks
Ebook
The Comprehensive Guide on SLIs, SLOs, and Error Budgets
What each of these essential metrics is and how to implement them successfully in your organization.



Customer Stories
Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial Technology, and Incognia


Incident Impact Calculator
Find out how much you could save
Incidents can do real damage to companies that aren't sufficiently prepared for them. Use our calculator to estimate the full cost of incidents for your team.
Use the calculator