Are you trying to set your engineering team free to do their best work? Read our new case study to learn how Blameless can help you do that.
Customer Story

Agero’s Incident Management Is “Invincible” with the Help of Blameless Automation

Before

  • Manual reporting was “tedious and labor intensive”
  • Information gathering done in multiple areas by individual teams

After

  • Seamless IM tracking and reporting
  • Business leaders value the Blameless solution

Agero is leading the digitalization of driver assistance services on a massive scale, safeguarding consumers on the road through a unique combination of platform intelligence and human-powered solutions. Their white-label roadside assistance, accident management, consumer affairs, and digital dispatch solutions are informed by deep industry expertise and insight from more than 12 million annual events.

Many leading vehicle manufacturers and insurance providers use Agero’s solutions to strengthen their businesses and create stronger, lasting connections with their customers.

The Challenge

Agero needed consistency across their incident response. That was a challenge because their engineering function consists of two groups: a core engineering team and a frontline support team. Both groups are responsible for service reliability, and Ethan Cohen, Sr. Manager of SRE at Agero, wanted a methodical approach to the incident management process.

SRE Investment Reaped Instant Payoff

With the help of Blameless, Agero reimagined their approach to Site Reliability Engineering. Right off the bat, Blameless alleviated the instant burden of an alert, because it walks you through every single step of incident response. You can quickly find on-call schedules, assign roles, and begin resolving immediately. Even beyond the on-call team, all stakeholders have access to incident information, and everyone is communicating in the same channels. “It saved us a lot of administrative effort and that’s where we were struggling. During any incident response, there are a lot of moving pieces, including determining the right people to involve and when. And we need to move fast. Blameless has helped immensely,” shares Ethan.

"With Blameless running the automation for incident management, our response to critical issues has helped us significantly reduce time to resolution."

Using Blameless helped Agero eliminate manual toil during incident response. Ethan says the rollout was fast and seamless. The Blameless user interface is intuitive, so it’s easy to make use of its capabilities. From the get-go, the SRE team noticed how easily Blameless simplified and sped up their processes, including notifying the right people as quickly as possible and using emojis to capture events in the incident timeline. Before Blameless, the process could be tedious and time-consuming for the team, leading to wasted resources and inconsistent adoption of IM practices. Their next goal is to designate specific roles for managing different parts of the IM process, such as the newer Blameless CommsFlow feature, which is designed to automate communications throughout incident response. They’re working to build custom Flows that align PagerDuty with Status page and streamline notifications.

Reliability Gets Woven into Process and Culture

Today, Agero maintains a much more robust and scalable IM process, and goes so far as to describe it as “invincible”. This is all due to a renewed focus, led by Ethan, on codifying best-in-class methodologies. For example, the Agero team has been able to simplify the way they classify incidents. Where before they distinguished 5 incident severity levels, now they only need 4. The engineering team has also improved their documentation workflow, which is useful for retrospective analysis and SLO management. By leaning on the Blameless platform, Ethan has successfully trained everyone across the engineering function on their new universal IM process.

"Blameless has helped us drive efficiency around administrative efforts, enabling us to automate a lot of typically manual tasks. Blameless made our incident resolution efforts relatively seamless in that respect."

Much of the credit for this huge success goes to Ethan, other leaders, and the entire engineering team at Agero. They made thoughtful decisions to adopt specific IM methodologies, while Blameless helped put those into practice. It takes a lot of effort to align across two business units and multiple engineering groups. Kudos to the Agero team!

Blameless has been happy to partner with Agero, who are firm believers in having a “blameless” mindset from incident response all the way through retrospective analysis. Agero has always been a proponent of fostering a culture that’s positive and encouraging. Their postmortem process is truly “blameless”. In fact, they follow the motto “BPM”, which stands for “blameless postmortem”.

Agero Business Leaders Value Blameless and Reliability

The story doesn’t end here. Agero continues to grow and expand their reliability engineering efforts. They host regular “lunch and learns” to further educate engineers and other teams about SRE and how to continuously improve. In addition, whenever they have a question about using Blameless to further their reliability efforts, they call on their Customer Success Manager at Blameless, Matthew Dodge, who is always happy to assist. 

"It’s amazing what Blameless's customer support brings to the table. Matthew always gives 110%. He goes above and beyond for us in every way possible. We are very comfortable knowing that Matthew is on top of our questions and concerns every day."

Beyond the engineering team, Agero’s business leaders have recognized the benefit of using Blameless. Looking at Reliability Insights data in Blameless confirms that Agero has reduced MTTR. Ethan shared, “Engineers, frontline workers, and business leaders have all said they see the value and impact of using Blameless and improving our incident response.”
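For readers less familiar with the metric, MTTR is read here as mean time to resolution, matching the “time to resolution” wording in the quote above. The sketch below shows one common way to compute it: average the gap between when each incident opened and when it was resolved. The function name, the data layout, and the sample timestamps are hypothetical and for illustration only; they are not Agero’s data and not how Blameless Reliability Insights calculates the figure.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Return the average (resolved - opened) duration as a timedelta.

    `incidents` is a list of (opened, resolved) datetime pairs.
    This layout is an assumption made for the example.
    """
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum(
        (resolved - opened for opened, resolved in incidents),
        timedelta(),  # start value so the durations add up as timedeltas
    )
    return total / len(incidents)


# Hypothetical sample incidents (not real data).
incidents = [
    (datetime(2023, 1, 5, 9, 0), datetime(2023, 1, 5, 10, 30)),     # 90 minutes
    (datetime(2023, 1, 12, 14, 0), datetime(2023, 1, 12, 14, 45)),  # 45 minutes
    (datetime(2023, 2, 1, 22, 15), datetime(2023, 2, 1, 23, 0)),    # 45 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

Comparing this average across two periods, for example the quarter before and the quarter after a process change, is a simple way to check whether time to resolution is trending down.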