
Blameless vs. PagerDuty Incident Response

What’s the difference?
Are you looking to upgrade your incident management and reliability process with Blameless or PagerDuty? Both these tools can help you reduce toil and increase consistency, depending on your needs. This page will break down how these tools can help you and guide you through questions to determine your needs.

Who is Blameless?

Founded in 2018, Blameless seeks to build a workflow that can serve as the backbone for incident response. Our small team has built a robust feature set that takes users from declaring a new incident, through an effective and easy response process, to insightful learning with incident retrospectives and reliability trends.

Who is PagerDuty?

Since 2009, PagerDuty has sought to be the gold standard of alerting software for on-call engineers. From there, it has expanded its offerings to integrate with hundreds of other tools, allowing monitoring data from many sources to trigger alerts.

Questions to Consider

Here are some of the questions you should consider before reading this guide:

  • How many engineers do I have working on-call or otherwise responding to incidents?
  • What parts of my incident process are manual, tedious, time-consuming, or difficult?
  • When something goes wrong, is it clear who should be involved, and can they be reliably contacted?
  • When are situations escalated to involve more people?
  • Do people know what they should do when they start to diagnose and solve an incident?
  • What learning is retained after an incident? What review process happens, and how are changes made after the incident?
  • How is information about the incident relayed to other stakeholders?

Thinking about these questions should help guide you to the parts of your process that need the most attention.

Blameless or PagerDuty?

Which to Use Depending On Your Needs
Blameless and PagerDuty mostly live in different stages of your incident management process. PagerDuty focuses on the logistics of alerting: defining the rules by which someone is alerted and handling the alert itself. Blameless focuses on what people do once they're alerted: what steps they should take, what infrastructure needs to be set up, and what learning happens after the incident.

In essence, PagerDuty gets people to an incident, and Blameless guides them once they’re there.

Let’s look at different stages of the incident management process, review what sort of pain points you could be experiencing at each one, and look at how PagerDuty or Blameless could help.
  1. Incident Detection using Blameless or PagerDuty
Incident detection is the first stage in incident management, where an abnormality in your system is detected and flagged. Are you finding that you’re sometimes the last to know that your service is down, with customers complaining being the first sign that something is wrong? Conversely, do you find yourself bogged down with red flags that don’t reflect actual problems? Refining your monitoring and observability tools is the key.

Having a meaningful understanding of your system's health is a challenge with modern microservices architectures and third-party tool integrations. Black-box monitoring, where you simulate use of your service the way a user would, can help bring clarity. After all, if an issue isn't experienced by users, it may not be worth dealing with.
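A black-box check can be as simple as requesting a user-facing URL and judging the result only by what a user would see: whether a response came back, its status code, and how long it took. The sketch below is a minimal illustration using Python's standard library; the function names, the 500 ms latency threshold, and the three-state "up/degraded/down" classification are assumptions, not features of Blameless or PagerDuty.

```python
import time
import urllib.error
import urllib.request


def classify_probe(status_code, latency_ms, max_latency_ms=500):
    """Judge one synthetic check purely from the user's point of view.

    The threshold and state names are illustrative assumptions.
    """
    if status_code is None or status_code >= 400:
        return "down"      # no response, or an error page the user would see
    if latency_ms > max_latency_ms:
        return "degraded"  # it works, but too slowly
    return "up"


def probe(url, timeout_s=5.0):
    """Request a user-facing URL and return (status_code, latency_ms)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        status = None      # DNS failure, refused connection, timeout, etc.
    latency_ms = (time.monotonic() - start) * 1000
    return status, latency_ms
```

Running `probe()` on a schedule against a hypothetical endpoint such as `https://example.com/login` and feeding the result to `classify_probe()` gives a user-centric health signal that your alerting tool can act on.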

Both Blameless and PagerDuty integrate with a number of monitoring tools. Blameless uses them to gather contextual data to better assess incidents. For example, if you’re dealing with a server outage due to overuse, Blameless can pull in historical monitoring data that shows you when different thresholds of use were reached to help you diagnose your server’s limits. PagerDuty integrates with monitoring tools to automatically trigger alerts. For example, if server use reaches a certain threshold, it will automatically alert specific people. Both abilities will help increase the speed with which you respond to incidents. Just make sure your monitoring tools can integrate with your other chosen tools.
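The threshold-triggered alerting described above can be sketched against PagerDuty's Events API v2, which accepts JSON events at `https://events.pagerduty.com/v2/enqueue`. In this minimal sketch, the CPU metric, the 90% threshold, the `dedup_key` naming scheme, and the function names are illustrative assumptions; in practice your monitoring tool's own PagerDuty integration usually evaluates the threshold and sends the event for you. `routing_key` is the integration key from a service's Events API v2 integration.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, host, cpu_percent, threshold=90.0):
    """Return an Events API v2 trigger event, or None if below the threshold.

    The metric, threshold, and dedup_key scheme are illustrative assumptions.
    """
    if cpu_percent < threshold:
        return None
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing one dedup_key per host lets repeat triggers collapse into
        # the same open alert instead of paging again.
        "dedup_key": f"cpu-high-{host}",
        "payload": {
            "summary": f"CPU at {cpu_percent:.0f}% on {host} (threshold {threshold:.0f}%)",
            "source": host,
            "severity": "critical",
            "custom_details": {"cpu_percent": cpu_percent, "threshold": threshold},
        },
    }


def send_event(event):
    """POST the event to PagerDuty and return the decoded JSON response."""
    req = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```

Keeping the threshold check separate from the network call makes the alerting rule easy to test without sending real pages.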
  2. Incident Alerting using PagerDuty
Once an incident is detected, the next step is alerting the on-call engineers best equipped to respond to the problem. This is where PagerDuty really shines. Reaching out to the right people is more complicated than it may seem. A good alerting tool will make sure that people aren't over-alerted, which leads to pager fatigue. If someone is pinged too often for things that don't pertain to them, they'll start to ignore the pings and could miss something important.

PagerDuty’s customization allows for sophisticated triaging and classification of incidents. Discrete user groups and dynamic on-call schedules are also supported. If you’re having issues with engineers being bogged down with too many alerts, or missing out on alerts they need, PagerDuty can help. Having this step be fast, automated, and consistent is key, as you want engineers working on the problem as quickly as possible.
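The idea behind on-call schedules and per-team routing is that an alert pages only the one person currently responsible for the affected area, rather than everyone. The sketch below models a simple fixed-length rotation per team; the data shapes and function names are hypothetical and do not reflect PagerDuty's scheduling API, which supports far richer layers, overrides, and escalation rules.

```python
from datetime import datetime, timedelta, timezone


def on_call(rotation, rotation_start, now, shift=timedelta(days=7)):
    """Return who is on call at `now` in a simple fixed-length rotation.

    `rotation` is an ordered list of engineers; each holds the pager for
    one `shift`, then it passes to the next person (hypothetical model).
    """
    if now < rotation_start:
        raise ValueError("now is before the rotation started")
    shifts_elapsed = (now - rotation_start) // shift  # timedelta // timedelta -> int
    return rotation[shifts_elapsed % len(rotation)]


def route_alert(owning_team, rotations, rotation_start, now):
    """Page only the current on-call engineer of the team that owns the alert."""
    return on_call(rotations[owning_team], rotation_start, now)
```

Routing by owning team keeps engineers on other teams from being paged for alerts that don't pertain to them, which is the main lever against pager fatigue.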
  3. Incident Response Process using Blameless

The response process is the most substantial part of incident management. It's when the assembled engineers work together to diagnose the problem, brainstorm solutions, implement them, and iterate on their ideas until the incident is resolved. Many problems can occur here:

  • Engineers don’t know what to try or where to start, due to a lack of resources like runbooks, checklists, or past incidents to study
  • Poor communication leads to poorly distributed work, so effort gets duplicated
  • Engineers keep breaking focus with other obligations and tasks, like sending updates to management or tracking down documentation, slowing their progress
  • Engineers have a poor understanding of their individual responsibilities, leaving some tasks unfinished

All of these problems are compounded by the stress and time constraints of the incident. Solving the incident will never be trivial, but the goal is to make it as easy as possible by removing toil and making things smooth. That way, the engineers can focus on applying their expertise in the most efficient way.

Blameless does exactly that. It uses a role-based checklist system to make sure every task in the response is handled without redundancy. It makes it easy to build helpful resources and infrastructure that get people up to speed fast. Previously distracting tasks, like updating stakeholders, are handled automatically so engineers stay focused. This stage of the process can be stressful and demoralizing, so investing in a tool like Blameless is key to keeping your engineers happy and productive.
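To make the role-based checklist idea concrete, here is a minimal sketch: each role owns a fixed list of tasks, each role has exactly one holder, and anything belonging to an unfilled role is surfaced so it doesn't silently fall through. The role names, tasks, and `Incident` class are hypothetical illustrations, not Blameless's data model.

```python
from dataclasses import dataclass, field

# Hypothetical roles and their tasks, for illustration only.
CHECKLISTS = {
    "incident_commander": ["Set severity", "Assign roles", "Schedule next update"],
    "communications_lead": ["Post status page update", "Notify support team"],
    "technical_lead": ["Check recent deploys", "Review dashboards"],
}


@dataclass
class Incident:
    assignments: dict = field(default_factory=dict)  # role -> person
    done: set = field(default_factory=set)           # completed (role, task) pairs

    def assign(self, role, person):
        # One holder per role avoids two people doing the same task.
        if role in self.assignments:
            raise ValueError(f"{role} is already held by {self.assignments[role]}")
        self.assignments[role] = person

    def tasks_for(self, person):
        """Open tasks for every role this person holds."""
        return [
            (role, task)
            for role, holder in self.assignments.items() if holder == person
            for task in CHECKLISTS[role] if (role, task) not in self.done
        ]

    def complete(self, role, task):
        self.done.add((role, task))

    def unowned_tasks(self):
        """Tasks whose role nobody holds yet, so nothing is left unfinished."""
        return [
            (role, task)
            for role, tasks in CHECKLISTS.items() if role not in self.assignments
            for task in tasks
        ]
```

Because every task belongs to exactly one role and every role to at most one person, responders can see their own remaining work at a glance and the gaps are explicit.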

  4. Incident Escalation using Blameless and PagerDuty
Sometimes an incident seems relatively benign at the start but reveals additional issues as you investigate it. Other times, as the demands on the affected service increase, the incident starts to require a more immediate response. These and other scenarios require you to escalate the severity of the incident and involve more people. This process needs to be predefined so that it executes consistently and quickly, because you don't want to rely on people's judgment in the heat of the moment.

Both Blameless and PagerDuty can help in this process. Blameless lets you update the severity and status of an incident from wherever you're already working on it. This loops additional people into your existing communication channels without any toil. Once you escalate, PagerDuty can handle sending effective alerts to exactly the right people. Blameless's dynamic checklists can then get these newly added people working effectively right away.
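A predefined escalation policy can be expressed as a simple mapping from severity to the groups who must be involved, so that raising the severity mechanically determines who gets pulled in. The severity scale (SEV1 most severe), group names, and `escalate` function below are hypothetical; they are not Blameless or PagerDuty configuration.

```python
# Hypothetical severity scale: SEV1 is the most severe.
SEVERITY_ORDER = ["SEV3", "SEV2", "SEV1"]

# Groups that must be engaged at each severity (illustrative names).
ESCALATION_POLICY = {
    "SEV3": {"on-call-engineer"},
    "SEV2": {"on-call-engineer", "team-lead"},
    "SEV1": {"on-call-engineer", "team-lead", "eng-director", "comms-lead"},
}


def escalate(current, new, already_engaged):
    """Return the groups to page when severity moves from `current` to `new`.

    Only groups not yet engaged are returned, so nobody is paged twice.
    """
    if SEVERITY_ORDER.index(new) <= SEVERITY_ORDER.index(current):
        return set()  # same or lower severity: nobody new is paged
    return ESCALATION_POLICY[new] - set(already_engaged)
```

Because the policy is data rather than a judgment call, the same escalation produces the same set of pages every time.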
  5. Incident Communication using Blameless
A common problem organizations face in incident management is handling communication during incidents. A big incident affects the entire organization, and unsurprisingly many people want to be in the loop. Managers, executives, customer success teams, PR teams, and more may want the latest news from the incident for many different reasons.

There are good reasons for them to stay informed and relay what they know to other stakeholders. At the same time, during these critical incidents, the time engineers spend responding to update requests, including the break in their focus, can be hugely detrimental.

The key is to automate this process. Blameless's CommsFlow feature allows you to set up communication templates that are sent automatically to preselected groups when certain triggers occur. For example, you can have specific templates for the PR team and managers that go out when an incident escalates. If you want to keep your engineers working effectively while also keeping the whole organization informed, investing in Blameless CommsFlow is a must.
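The general pattern behind trigger-based communication templates is a set of rules mapping an incident event to one or more audiences, each with its own message template filled from the incident's fields. The sketch below uses Python's `string.Template`; the trigger names, audiences, and template text are hypothetical and are not CommsFlow's actual configuration format.

```python
from string import Template

# Hypothetical rules: incident event -> list of (audience, message template).
COMMS_RULES = {
    "severity_escalated": [
        ("executives", Template("[$severity] $title: escalated. Customer impact: $impact.")),
        ("pr-team", Template("Heads-up: $title is now $severity. Holding statement pending.")),
    ],
    "resolved": [
        ("executives", Template("$title resolved after $duration_min minutes.")),
    ],
}


def messages_for(event, incident):
    """Render every message whose trigger matches `event`.

    `incident` is a dict of fields; substitute() raises KeyError if a
    template needs a field the incident doesn't have, which surfaces
    broken templates instead of sending half-filled messages.
    """
    return [
        (audience, template.substitute(incident))
        for audience, template in COMMS_RULES.get(event, [])
    ]
```

Because each message is rendered from the incident record, the updates go out without an engineer having to stop and write them.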
  6. Integrations and Tool Support for Blameless and PagerDuty
An important part of platforms like Blameless or PagerDuty is the ability for them to integrate with other tools you depend on. The manual toil for your engineers in switching between responding to the incident and working in your other tools breaks focus and slows the incident response. Having your platform automatically feed data into these other tools and receive data from them is key to a smooth incident response process.

PagerDuty and Blameless both feature a large suite of integrations for all of their features. PagerDuty integrations help reach on-call engineers in whatever apps they're already checking and use monitoring data to automatically trigger alerts. Blameless integrations help gather contextual data from monitoring tools, let you work within your favorite communication tools, create tickets for incidents to orchestrate follow-up tasks, and more.

Take a look at where toil is being generated in your current incident management process. Your engineers are probably bouncing between a number of tools, manually passing data back and forth. Platforms like Blameless and PagerDuty can expedite this work, making the process smoother and more centralized.
  7. Incident Learning and Retrospectives (Postmortems) with Blameless
In incident management, you aren’t done just when the incident is resolved. Just as important as fixing the immediate problem is taking steps to ensure it doesn’t happen again. Every incident should be an opportunity to understand your system better and improve it. If you’re having repeat incidents, you need to invest in incident learning.

Blameless facilitates easy incident learning with automatic retrospectives (postmortems) and integration with ticketing services. As you resolve the incident, Blameless automatically logs communication in incident channels and gathers relevant contextual information. Once the incident is resolved, responders fill out a questionnaire, customizable for different types of incidents, to gather additional information. This becomes a retrospective or postmortem document linked to the incident that you can further customize and review. Building up a library of these documents gives your engineers a head start in resolving similar incidents.

Your retrospective or postmortem also serves as a hub for follow-up tasks. Blameless can automatically generate tickets for follow-up tasks in platforms like Jira. Use these tasks to improve the resilience of your system and prevent recurring incidents. Once you determine the contributing factors or root causes of an incident, such as codebase bugs, insufficient resources, or missing processes, create follow-up tasks for each of them.
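The "one follow-up task per contributing factor" practice can be sketched as a small transformation from the factors recorded in a retrospective to ticket-ready records, each linked back to its incident. The `FollowUp` fields, the factor dict shape, and the fallback-owner rule are hypothetical; creating the actual tickets would go through your ticketing tool's integration rather than this code.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    incident_id: str
    category: str  # e.g. "bug", "capacity", "process" (illustrative categories)
    summary: str
    owner: str


def follow_ups_from_factors(incident_id, factors, default_owner):
    """Create one follow-up per contributing factor, linked to the incident.

    Each factor is a dict with "category" and "description" keys and an
    optional "owner"; factors without an owner fall back to `default_owner`
    so no task is left unassigned.
    """
    return [
        FollowUp(
            incident_id=incident_id,
            category=f["category"],
            summary=f"[{incident_id}] {f['description']}",
            owner=f.get("owner") or default_owner,
        )
        for f in factors
    ]
```

Prefixing each summary with the incident ID keeps the follow-up traceable to the retrospective that produced it.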
  8. Service Level Objectives (SLOs) and Reliability Patterns with Blameless
Understanding your system’s health and reliability is important to keep customers happy and make long-term strategic decisions. If you feel like you’re always reacting to problems and not proactively preparing for them, or if your customers are becoming frustrated with your unreliability, these are areas you need to invest in.

Service level objectives, or SLOs, are reliability targets set at the level your customers need in order to stay satisfied. No service can be perfect; incidents are inevitable. You need to understand which parts of your service are most critical and what level of reliability will satisfy your customers. Although your services will sometimes go down, as long as you stay above this objective, you can be confident that most users will remain happy and not look to competitors.

Blameless makes creating and tracking SLOs easy. The impact of incidents is automatically accounted for in the SLO status, so you know right away if you're approaching a breach. Policies for what to do as you approach an SLO breach can be built into the SLO itself, triggering deployment freezes or other plans to keep things afloat.
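The arithmetic behind tracking an SLO is the error budget: a 99.9% availability objective over 30 days allows 0.1% of 43,200 minutes, or 43.2 minutes, of downtime. The sketch below computes that budget, subtracts incident downtime, and applies a policy that fires when less than a quarter of the budget remains. Treating each incident's full duration as downtime, the 30-day window, and the 25% freeze threshold are simplifying assumptions; it is not how Blameless computes SLO status.

```python
from datetime import timedelta


def error_budget(slo, window=timedelta(days=30)):
    """Allowed downtime in `window` for an availability SLO such as 0.999."""
    return window * (1 - slo)


def budget_status(slo, incident_downtimes, window=timedelta(days=30), freeze_at=0.25):
    """Return (fraction of budget remaining, whether the freeze policy fires).

    Simplifying assumption: each incident's duration counts fully as downtime.
    """
    budget = error_budget(slo, window)
    spent = sum(incident_downtimes, timedelta())
    remaining = max(0.0, 1 - spent / budget)  # timedelta / timedelta -> float
    return remaining, remaining < freeze_at
```

For example, two incidents totaling 21.6 minutes against a 99.9% SLO use half of the 43.2-minute budget, so no freeze is triggered yet.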

This data is also collected in Blameless’s reliability insights platform. This feature shows you patterns for your incidents over time. Which features of your service are breaking most often? Are there particular days of the week or times of day that services break? Which of your on-call engineers are spending the most time on incidents? Answering these questions can help you get ahead of repeat incidents, burnout, and the most critical failures.
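The questions above come down to simple aggregations over incident records. This sketch counts incidents per service and per weekday and totals the hours each responder spent on incidents. The record fields (`service`, `start`, `end`, `responders`) are hypothetical, and this is only an illustration of the kind of analysis, not Blameless's reliability insights implementation.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Fixed names avoid locale-dependent output from strftime("%A").
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def reliability_insights(incidents):
    """Summarize incident records into simple reliability patterns.

    Returns (incidents per service, incidents per weekday,
    incident hours per responder).
    """
    by_service = Counter(i["service"] for i in incidents)
    by_weekday = Counter(WEEKDAYS[i["start"].weekday()] for i in incidents)
    hours_by_responder = defaultdict(float)
    for i in incidents:
        hours = (i["end"] - i["start"]).total_seconds() / 3600
        for person in i["responders"]:
            hours_by_responder[person] += hours
    return by_service, by_weekday, dict(hours_by_responder)
```

`by_service.most_common()` points at the most fragile services, while the per-responder totals can flag engineers at risk of burnout.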

Summary

Blameless and PagerDuty each substantially improve your incident management process, reducing toil, helping engineers succeed, and improving reliability.

If you’re looking to get your engineers to incidents faster, PagerDuty is the best in the field for alerting them effectively and correctly.

If you’re having trouble with engineers performing to the best of their abilities once summoned to the incident, Blameless can help you at many stages of the process.

Frequently Asked Questions

Can Blameless and PagerDuty work together?
Yes! Blameless can access the escalation policies you’ve defined in PagerDuty and trigger them when necessary. Using both tools in concert creates an even more robust and flexible solution.
Should I use Blameless or PagerDuty to be more reliable?
Both tools can improve your software’s reliability and consistency. Depending on where your reliability problems are concentrated, either tool may make more of an impact. If on-call engineers aren’t alerted quickly to new incidents, try PagerDuty first; if they don’t know what to do after being alerted, try Blameless.
What integrations do I need to use Blameless or PagerDuty?
Blameless and PagerDuty both host a suite of integrations to make them more robust and flexible. Find integrations that can capture the health of your system to provide context to these tools.