Something big happened at Blameless this month — our “Postmortem” feature was updated to its new name, “Retrospective”. To the naysayer, I suppose you’re thinking, This seems trivial. Different teams call it different names anyway, so why bother making the change? First let me say, thank you for reading our blog and I hope you finish this one through to the end. Now, allow me to explain our reasoning and why we’re excited about this update.
A fundamental of SRE is the mindset of treating failures as systemic problems that require systemic solutions. This mindset fosters a more positive and psychologically safe environment, inspiring the SRE mantra “blameless culture”; you can read more on that in our post about SRE culture. One way to put this mindset into practice is to be more intentional with post-incident analysis. For example, you can study a post-incident report to identify gaps in systems, tooling, and processes. We always encourage teams to use the opportunity to brainstorm sustainable solutions. In fact, it’s good to remind yourself that an incident is never an isolated event. There’s virtually always something more to discover that needs addressing.
We want everyone to succeed in their SRE journey, and it’s important for us to evangelize this forward-looking approach to post-incident analysis. Earlier we noted that teams use a few different terms for the part of incident management that involves a post-incident report and its analysis. This is true. Still, most of us will agree that the most common and longest-running name is postmortem. Postmortem is actually a medical term that dates back to the 1820s. In tech, it’s used metaphorically to describe when we review an incident after its “death” and record detailed notes. In a way, sure, I guess that makes sense. The incident is over; we killed the beast. But did we? Most of us know this usually isn’t true. And if we’re thinking ahead, we take this as a lesson and brace ourselves for what’s next.
By changing the product feature name from “Postmortem” to “Retrospective”, Blameless (the company) aims to encourage teams to view incidents as learning opportunities. Analyzing an incident can prepare you for similar future events or help you identify a larger underlying issue. You’ve heard us say before that incidents are unplanned investments. Teams should cherish the post-incident process by collaborating on how to improve the system moving forward. The term postmortem implies that since the event is “over”, there’s nothing more to discuss. That’s not the case. Smart nomenclature is mindful that words are always subject to interpretation. Let’s say I’m a woodshop teacher explaining to my students that it’s important to wear goggles when chopping and to always turn on the safety lock when a saw is not in use (disclaimer: I’m not a woodshop expert). I call these “safety protocols” for everyone’s protection. I don’t call them “restrictions”. I don’t even call them “rules”. Sure, they’re effectively rules and restrictions, but the purpose behind them is to protect. That’s the theme I want everyone to keep in mind. Silly example, but I hope you see where I’m going.
Semantics! Yep, and proud of it. Earlier I mentioned that “blameless culture” is largely emphasized in the site reliability engineering community. Evolving away from using the term postmortem is helpful in engendering blameless culture. We previously wrote about how and why the words we use in engineering impact the way we think and work. Language has the power to shift our perspective. Consequently, we might be excited for something or dread it. It can impact the level of importance we attach to a thing. We might expect a situation to be combative or collaborative. In this particular case, retrospective promotes constructive conversation, discourages finger-pointing, and fosters problem-solving. By contrast, postmortem, and its association with death, implies finality and bears a strongly negative connotation. Not all incidents are Sev0, but we usually still have something important to learn. There’s no reason to associate an incident with the idea of death.
My final “battle card” is that this feature update is a request we’ve received from many Blameless customers. Several have told us that they refer to the post-incident process internally as the “retrospective” (“retros” for short), and they would love to see that reflected in the Blameless product. They do this to promote collaboration, build long-term sustainable and scalable solutions, and discourage finger-pointing. In fact, our friends at Hashicorp submitted a ticket to us for this specific feature request. Martin Smith, Senior Site Reliability Engineer at Hashicorp, explains, “We believe that retrospectives create continuous reflection and improvement whereas postmortems imply a root cause, which drives the wrong outcomes vs. future improvement. Root cause analysis is disappearing as folks build more and more distributed systems and analyze incidents more like the airlines than a mainframe.” We’re excited that we can finally deliver this update to our customers and continue to partner with you on your reliability journey. To all of our customers, whether or not you requested the update, we understand this will take a bit of getting used to. Thank you for progressing with us as we continue to embrace a blameless mindset. It’s all part of moving the needle forward for more reliable services and resilient teams.
At Blameless, our goal is to be more than just a product for incident response and SLO management. We want to share everything we know about site reliability engineering and make it more accessible. One of the ways we do this is by providing a best-in-class product for engineering and on-call teams. Whether that’s through functionality, service, or even - you guessed it - product nomenclature. Another way we like to share knowledge with the SRE community is through our blog *shameless plug* and other resources like our podcasts and webinars where we record conversations with SRE experts in the community. We welcome you to check those out, and if you’re ever interested in having a chat with our experts, feel free to request a demo. We’d be happy to walk you through the product and share some of our insights. Finally, thank you to our customers who continue to partner with us as pioneers of reliability engineering!