
An end-to-end incident in Blameless and PagerDuty

11.4.2020

PagerDuty is a leading on-call management platform that aggregates monitoring and alerting data, notifies on-call teams, and accelerates incident resolution. The platform is used by thousands of teams responsible for software experiences. It integrates incident triage with rapid responder mobilization, so teams can resolve incidents in real time.  

The bi-directional Blameless integration with PagerDuty lets teams add more automation to their PagerDuty workflows, minimizing the cost of incident coordination.

Blameless also offers a similar integration with Atlassian Opsgenie.

Key Workflows

Here are some of the key workflows that are supported with the Blameless and PagerDuty integration. 

  • Determine the on-call for a given service: When a Blameless incident is triggered, team members can quickly identify who is on-call. Typing the command /blameless oncall or /blameless show oncall will display the current on-call team member based on the associated PagerDuty on-call schedule.
  • Determine an escalation policy for a given service: Within Slack, users can type the command /blameless ep, which will show the associated PagerDuty escalation policy. 
  • Trigger a PagerDuty alert for a given service: A Blameless incident can generate a PagerDuty incident either manually via a command or automatically when the Blameless incident is created.
  • Trigger a Blameless incident when a PagerDuty incident is created: PagerDuty incidents can automatically generate Blameless incidents. This will spin up a dedicated Slack channel for the incident to drive automation through the Blameless bot, as well as sync incident tracking in the Blameless web app.
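Blameless handles the PagerDuty side of "trigger a PagerDuty alert for a given service" itself once the integration is configured, and this is not how Blameless implements it internally. To make the idea concrete, the sketch below builds the kind of request PagerDuty's public Events API v2 accepts. It assumes you have the integration (routing) key of a PagerDuty service that uses the Events API v2 integration. The service name, incident ID, and helper functions are hypothetical, and the request is only built, not sent.

```python
import hashlib
import json

# PagerDuty's public Events API v2 endpoint (not called in this sketch).
EVENTS_V2_URL = "https://events.pagerduty.com/v2/enqueue"

# Severities accepted in an Events API v2 payload.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def stable_dedup_key(service: str, incident_id: str) -> str:
    """Hypothetical helper: derive a repeatable dedup key so that re-sent
    triggers for the same incident collapse into one open PagerDuty alert
    instead of paging responders again."""
    raw = f"{service}:{incident_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


def build_trigger_event(routing_key: str, service: str, incident_id: str,
                        summary: str, severity: str = "critical") -> dict:
    """Build an Events API v2 'trigger' body for a given service."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    return {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",
        "dedup_key": stable_dedup_key(service, incident_id),
        "payload": {
            "summary": summary,
            "source": service,
            "severity": severity,
        },
    }


if __name__ == "__main__":
    event = build_trigger_event("YOUR_ROUTING_KEY", "checkout-api",
                                "blameless-1234", "Checkout latency above SLO")
    # POST this JSON to EVENTS_V2_URL with any HTTP client.
    print(json.dumps(event, indent=2))
```

Deriving the dedup key from stable identifiers, rather than generating a random one per request, means a retried or duplicated trigger lands on the same PagerDuty alert while it is still open.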

Managing Incidents in Blameless and PagerDuty

PagerDuty is a robust tool for alerting on-call teams and helping them take control of their incidents. Blameless helps teams maximize the ROI of their PagerDuty implementations by bringing more guardrails and automation to PagerDuty workflows. 

In a nutshell, Blameless empowers organizations to scale incident response by codifying the rules of engagement throughout the incident management and learning process.

Rafael Fonseca, Engineering Manager at Vital Healthcare, puts it this way: “We pick up habits around incident response, but I’d never found a solution that tells you where to start. You follow the prompts, and Blameless will run through your incident process. That’s when the light bulb went off for me.”

The most common use case of the Blameless-PagerDuty integration is to trigger a Blameless incident when a PagerDuty incident is created. This ensures that the correct responders are engaged via PagerDuty, while minimizing toil and ensuring process repeatability and consistency via Blameless’ responder checklists, automation, and much more. This frees up crucial bandwidth for team members to focus on decision-making.
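Blameless's own ingestion logic and data model are not described here, so the following is only a sketch of what "a PagerDuty incident automatically opens a Blameless incident" involves. It assumes the shape of a PagerDuty V3 webhook body (`event.event_type`, `event.data`), and it uses a hypothetical `IncidentRequest` type, a hypothetical urgency-to-severity mapping, and a hypothetical `inc-<number>-<title>` Slack channel naming scheme.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentRequest:
    """Hypothetical shape of the incident a Blameless-style bot would open."""
    title: str
    service: str
    severity: str
    slack_channel: str
    pagerduty_incident_id: str


def slack_channel_name(number: int, title: str) -> str:
    """Build a Slack-safe channel name: lowercase letters, digits, hyphens,
    and underscores only, at most 80 characters."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return f"inc-{number}-{slug}"[:80].rstrip("-")


def from_pagerduty_webhook(body: dict) -> Optional[IncidentRequest]:
    """Map a PagerDuty V3 webhook body to an incident request.

    Only 'incident.triggered' events open a new incident; every other
    event type (acknowledged, resolved, ...) is ignored here.
    """
    event = body.get("event", {})
    if event.get("event_type") != "incident.triggered":
        return None
    data = event["data"]
    # Assumed mapping: PagerDuty 'high' urgency -> SEV1, anything else -> SEV3.
    severity = "SEV1" if data.get("urgency") == "high" else "SEV3"
    return IncidentRequest(
        title=data["title"],
        service=data["service"]["summary"],
        severity=severity,
        slack_channel=slack_channel_name(data["number"], data["title"]),
        pagerduty_incident_id=data["id"],
    )
```

For example, a triggered incident #42 titled "Checkout API 5xx spike" with high urgency would map to a SEV1 request with the channel `inc-42-checkout-api-5xx-spike`. Filtering on the event type keeps follow-up events (acknowledge, resolve) from opening duplicate incidents.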

As shown below, Blameless can automate key workflows such as spinning up Slack channels and video conferencing bridges. Blameless also automatically surfaces critical context such as incident status and severity, team, service, and recommended action items.

Benefits of integrating PagerDuty and Blameless

Here are just a few of the reasons thousands of users pair Blameless with PagerDuty.

  • Easy setup and configuration: Blameless augments PagerDuty workflows by providing out-of-the-box checklists for each incident response role (Creator, Commander, Communications Lead, Responder, Engineering Lead, etc.). These are all easily customizable to meet your team’s specific needs.
  • Scale incident management: Blameless also helps build guardrails throughout the incident process by adding responder-based roles, checklists, and communications workflows to your existing operations toolchain.
  • Reduce postmortem toil: One of the biggest pain points in writing post-incident reports is aggregating the timeline and building context around it. Relevant information, such as graphs and key discussion points, is often siloed across monitoring tools and communication channels. The Blameless bot helps teams capture important information during the incident itself, so they don’t have to reconstruct it after the fact, when details are no longer fresh in their minds. This saves teams 1-2 hours or more on each postmortem report.
  • Standardize context and communication: According to an SRE leader from a Fortune 500 retailer, “When people see the Blameless incident summary come up, everyone knows what to do. The fact that all that data is being collected and organized automatically solves a huge pain point for incident response leaders.” Blameless standardizes important context by auto-populating information such as when responders checked into the incident, the start and end of customer impact, the incident state, and the postmortem state.
  • Automate key tasks and workflows: Team members can also automate key tasks within the incident response workflow. For example, you can configure a setting to auto-record Zoom meetings, spin up additional ChatOps channels (swimlanes) to organize triage for complex incidents, and more.
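Blameless fills in these summary fields automatically, and its internal data model is not documented here. The hypothetical dataclass below only illustrates the arithmetic behind two of them: the duration from the start to the end of customer impact, and how long after the incident was declared each responder checked in. All names are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class IncidentTimeline:
    """Hypothetical record of timestamps an incident summary might show."""
    declared_at: datetime
    impact_start: datetime
    impact_end: Optional[datetime] = None
    check_ins: Dict[str, datetime] = field(default_factory=dict)

    def check_in(self, responder: str, at: datetime) -> None:
        # Keep only each responder's first check-in; later ones are ignored.
        self.check_ins.setdefault(responder, at)

    def customer_impact(self) -> Optional[timedelta]:
        """Duration from start to end of customer impact; None while ongoing."""
        if self.impact_end is None:
            return None
        return self.impact_end - self.impact_start

    def time_to_check_in(self) -> Dict[str, timedelta]:
        """How long after the incident was declared each responder joined."""
        return {who: at - self.declared_at for who, at in self.check_ins.items()}
```

Note that customer impact can begin before the incident is declared, which is why the sketch tracks `impact_start` and `declared_at` separately rather than measuring everything from declaration.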

Check out our customer case studies for more. 

Best practices for using PagerDuty with Blameless

Here are some best practices to be aware of when integrating PagerDuty with Blameless.

  • Configure deduplication rules within PagerDuty to minimize noise. To prevent alert fatigue, configure your PagerDuty event and notification rules so that team members are only notified of actionable incidents.
  • Create an incident commander rotation. Blameless can automatically assign the listed on-call individual as the Incident Commander on activation of an incident. As such, we recommend that you create an Incident Commander on-call rotation in PagerDuty. This will ensure that the correct person is assigned as the Incident Commander within Blameless.
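Blameless performs this assignment itself; the sketch below only shows the selection logic, to illustrate why a dedicated rotation matters. It assumes you have already fetched the decoded JSON from PagerDuty's REST API `GET https://api.pagerduty.com/oncalls` endpoint (optionally filtered with `schedule_ids[]`). The schedule name "Incident Commander" and the helper function are hypothetical.

```python
from typing import List, Optional


def current_incident_commander(oncalls_response: dict,
                               schedule_name: str = "Incident Commander") -> Optional[str]:
    """Return the name of the first-level on-call for the IC schedule.

    `oncalls_response` is the decoded JSON body of PagerDuty's GET /oncalls;
    `schedule_name` is an assumed name for your IC rotation.
    """
    candidates: List[dict] = [
        entry for entry in oncalls_response.get("oncalls", [])
        # 'schedule' can be null when a user is targeted directly by a policy.
        if (entry.get("schedule") or {}).get("summary") == schedule_name
    ]
    if not candidates:
        return None
    # The lowest escalation level is the person paged first.
    first = min(candidates, key=lambda entry: entry.get("escalation_level", 99))
    return first["user"]["summary"]
```

Without a dedicated rotation, the first-level on-call for a service is usually whoever is debugging it, which is exactly the person who should not also be coordinating the response.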

Note that all of the workflows described above can also be configured with the Opsgenie integration (see integration guide). However, you should not activate the PagerDuty and Opsgenie integrations simultaneously.

Why not give Blameless incidents a spin? See how you can reduce hours spent on every incident with our free sandbox environment. And for more details on how to integrate PagerDuty with Blameless, check out our integration guide.

If you want to see more of Blameless in action, check out the following:
