
A Journey through the Blameless Resource Library

Emily Arnott | 10.3.2023

From the very beginning of Blameless, we had two vital missions. First, to address what we saw as a mounting reliability crisis with a comprehensive, easy-to-use reliability platform. Second, to educate the companies facing this crisis on the fundamentals of incident management, cutting-edge best practices, and the cultural values that sustain learning and growth.

In recent years, we’ve seen the importance of this work repeatedly underscored, and we’ve responded by publishing a library of educational content. We’ve published almost 400 blog posts, thirteen eBooks, and dozens of educational webinars, talks, and panel discussions, adding up to hundreds of thousands of words and many hours of video. We’re proud to have built what we believe is the greatest library of SRE education, a lighthouse guiding people toward a more reliable and blameless world, all for free.

As with walking into any huge library, it can be intimidating to know where to start. Most of our readers discover our content while trying to answer a specific question, but many others don’t know what they should be learning. For those starting their reliability education, we’d like to offer a brief Blameless curriculum highlighting some of our best content.

The Fundamentals

Want to learn the 101s of SRE? Start here!

What is SRE? - What better place to start than here? This blog breaks down the origins, goals, and benefits of SRE.

The Essentials Guide to SRE - This eBook provides a more thorough breakdown of the core principles of SRE and how it differs from other approaches, like DevOps or ITIL.

The Importance of Reliability Engineering - Wondering if SRE is worth the investment? This blog breaks down why it’s not just a nice-to-have but a need-to-have.

Choosing the Right SRE Tools - Blameless’s integration suite is a major part of how it empowers your system. Learn which other types of tools belong in your reliability stack in this blog.

Building vs Buying SRE Tools - Many enterprise-tier companies try to build their own SRE tooling. This blog breaks down the pros and cons of such an approach.

Incident Management

The core realization of SRE is that things will inevitably break. The core lesson is that you can make them break better. Learn all about how in this section.

The Complete Guide to Incident Management (Parts 1 and 2) - This two-part eBook gives you everything you need to start building an incident management practice.

The Iceberg of Engineering Incident Costs - Incidents aren’t just a big deal; they’re a much bigger deal than you think. Dive into the hidden costs here.

What’s Difficult about Incident Command? - In this video discussion, four incident practitioners break down one of the most challenging parts of incident response: leading the team of responders.

SLOs

One of the most powerful tools in SRE is the service level objective (SLO). SLOs transform nebulous things like “customer satisfaction” into metrics you can track.

What are Service Level Objectives? - SLOs are nuanced, and the more you can appreciate their nuance, the more value you can get from them. Get up to speed on SLOs with this blog.

How SLIs Help You Understand Users’ Needs - The real power of SLIs (service level indicators) and SLOs comes from aligning them with the common behavior of your users. This guide walks you through building that connection.

Beyond the 4 Golden Signals - The four golden signals of latency, traffic, errors, and saturation are the fundamentals of reliability metrics. Learn how to build on these pieces to create a more complete picture.
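To make the idea of turning user experience into a trackable number a little more concrete, here is a minimal Python sketch of how raw event counts become an SLI, and how that SLI compares against an SLO and its error budget. This is an illustration under stated assumptions, not how the Blameless platform computes SLOs: the `evaluate_slo` function, the `SLOReport` fields, and the event counts are hypothetical, and what counts as a “good” event (for example, a request that succeeds within a latency threshold) is left to you.

```python
from dataclasses import dataclass


@dataclass
class SLOReport:
    sli: float               # observed fraction of good events
    objective: float         # target fraction, e.g. 0.999 for "99.9%"
    error_budget: float      # allowed fraction of bad events (1 - objective)
    budget_remaining: float  # share of the error budget still unspent; negative when overspent

    @property
    def meets_objective(self) -> bool:
        return self.sli >= self.objective


def evaluate_slo(good_events: int, total_events: int, objective: float) -> SLOReport:
    """Hypothetical helper: turn raw event counts into an SLI and compare it with an SLO."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be strictly between 0 and 1")

    sli = good_events / total_events
    error_budget = 1.0 - objective
    # Compute the bad fraction from counts directly to avoid float cancellation in 1 - sli.
    bad_fraction = (total_events - good_events) / total_events
    budget_remaining = 1.0 - bad_fraction / error_budget
    return SLOReport(sli, objective, error_budget, budget_remaining)


if __name__ == "__main__":
    report = evaluate_slo(good_events=999_500, total_events=1_000_000, objective=0.999)
    print(f"SLI: {report.sli:.4%}, meets SLO: {report.meets_objective}, "
          f"budget remaining: {report.budget_remaining:.0%}")
```

For example, 999,500 good requests out of 1,000,000 against a 99.9% objective gives an SLI of 99.95%. The objective allows 1,000 bad requests; only 500 occurred, so the SLO is met with roughly half of the error budget left.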

Retrospectives

When an incident is resolved, that’s really just the beginning. Retrospectives are documents that help you make systemic improvements and prevent incidents from recurring. Learn how to do them right.

The Ultimate Retrospective Template - Get started with retrospectives with this complete guide to all the necessary components.

How to Write Meaningful Retrospectives - Take your incident retrospectives to the next level with this guide, which connects ambitious post-incident goals with best practices for writing.

How to Communicate Retrospectives to Stakeholders - Different types of stakeholders – customers, legal teams, executives, and more – each need different things from retrospectives. This blog will show you how to tailor the document to each group.

Culture

Just as important as adopting the processes of SRE is adopting the culture. A cultural foundation will support you through inevitable gaps in your prescribed practices.

Why Every Company can Benefit from a Blameless Culture - Why do we care so much about blamelessness that we named our company after it? This blog explains.

How to Analyze Contributing Factors Blamelessly - This blog puts your blameless culture into action, guiding you to use these principles to better investigate the causes of your incidents.

The Elephant in the Blameless War Room: Accountability - A common challenge in adopting a blameless culture is reconciling it with situations that call for accountability. This blog explains how to handle this dynamic.

Five Tenets of SRE Culture - Beyond blamelessness, SRE encourages adopting other cultural tent poles. This blog shows the benefits of these other ideals.

Blameless

Want to learn how the Blameless solution enables the values and practices we’ve explained in all these other sections? Check out this 4-minute demo of Blameless in action, then reach out to one of our team members to start a free trial!
