Customer Story

F500 Retailer Saves Multiple Hours per Incident with Blameless

High-Level Summary

This Fortune 500 retailer is a renowned global apparel brand with over 200 million registered users in its worldwide digital community.

In an industry as hyper-competitive as retail, every minute counts. The Reliability Engineering team struggled with the manual toil in its incident processes, especially during busy seasons such as the holidays. With Blameless, the team now saves hours of manual work per incident, reducing wasted time and friction so engineers can stay focused on the work that matters.

Goals

  • Codify and scale incident procedures through automation of response tasks and postmortem learning
  • Seamlessly capture high-fidelity information such as contributing factors and key ChatOps activity
  • Implement a reliability solution, including bots, integrated tagging, and reporting, that would otherwise require a team of full-time employees to build and maintain

The Challenge

Before Blameless, the team’s incident process spanned many disparate tools, such as JIRA tickets, Google Docs, and PagerDuty, which created cognitive load and made reporting difficult. Blameless automatically consolidates relevant information across these tools, providing the automation and standardization needed to scale the team’s situational awareness.

Pain Points Before Blameless

  • Resistance to logging incidents or completing postmortems because of the manual work involved (the larger the incident, the harder the data collection)
  • Collecting incident information after the fact from disparate data sources created toil and left gaps
  • Limited resources to build a robust, reliable Slack bot and internal automation tooling

“When people see the Blameless incident summary come up, everyone knows what to do. It’s improved the quality and standardization of our communication. The fact that all that data is being collected automatically and organized solves a huge pain point for incident response leaders.”

The Solution

With Blameless, the team has realized the following business benefits.

Business Impact & ROI

  • 30-60 minutes saved on response per incident
  • 1-2 hours saved per postmortem (including 30+ minutes per incident previously spent gathering information)
  • Engineering time freed for innovation instead of building and maintaining internal tools

“Blameless’ reporting language is pretty flexible, so it’s been really beneficial for Incident Commanders and leaders, customer happiness, as well as our operations team who are tracking key metrics.”

Reliability Toolchain

  • Blameless
  • Slack
  • JIRA
  • PagerDuty

Positive Business Outcomes

  • Blameless collects all incident reports and postmortems, including relevant chats, into a searchable directory for holistic context
  • Flexible reporting language provides metrics useful to multiple audiences: incident commanders, customer happiness, and operations
  • More focused incident communication through alignment and standardization, facilitated by the Blameless bot
  • Ease of use has led to more logged incidents, yielding more meaningful metrics and higher postmortem completion

“Blameless’ integrated tagging and reporting capabilities would have otherwise required us to staff a whole team to build something comparable.”