
Are you an MS Teams shop? We've got you covered with Blameless Incident Resolution

We have an exciting announcement. Blameless is providing early access to our Microsoft Teams integration. SRE and engineering teams can now resolve incidents faster without leaving the comfort of their favorite messaging tool.  

With the Blameless incident resolution product, Microsoft Teams users can now reduce toil in routine incident response processes through automation, codify processes with checklists, and craft retrospectives with the ‘add to timeline’ command. From the moment the incident commander starts an incident, they can immediately rally all the appropriate subject matter experts and work to resolve the incident faster.

Why incident response matters

Incident response is a must for modern teams. As downtime becomes more and more expensive, teams must rise to the challenge of responding quickly and efficiently. Customer expectations continue to grow in a world that’s on 24/7. Additionally, teams must ensure that they’re culturally responsive to the needs of the people who carry the pager for their systems. On-call loads must be balanced, and it becomes crucial to eliminate cognitive toil from processes.

SRE provides teams with a set of best practices for responding to incidents that allows teams to learn from failure and become more resilient. Some of these best practices include:

  • Establishing channels and standards of communication: During an incident, it’s important that your entire team has a central place to communicate. This keeps everyone on the same page, keeps people from asking the same questions multiple times, and allows teams to understand the needs and progress of each member.
  • Creating set roles and responsibilities: In critical moments, you don’t want to be stepping on another person’s toes or doing repeat work. You also want to know who is responsible for taking the lead and communicating outside the team. By assigning roles and delegating responsibilities, teams can work together seamlessly to solve problems.
  • Assigning tasks and creating repeatable processes: No two incidents are the same, but how you approach resolving them should be. Steps and processes for certain incident types and severities help guide responders through the incident.
  • Codifying information with runbooks: It’s impossible for a single person to know everything about a system. Systems are called complex for a reason, so resolving different classes of incidents for different services can be difficult. If the incidents are rare, or the team member is new, on-call can be overwhelming. Documenting processes makes sure that anyone who carries the pager can work through an incident with confidence.
  • Using automation to lower cognitive toil: Getting everyone who needs to be involved in an incident all together (even virtually) can be a chore. Organization is a top pain point for many teams we speak to. To overcome this, you can automate spinning up communication channels, data aggregation, and other key tasks to lower the cognitive toil for your team.

These best practices help teams handle incidents better. They can be implemented in engineering teams with only two or three engineers, all the way to organizations with thousands. As you scale, however, you may realize that you need a tool to help codify these best practices. This is where Blameless comes in.

How Blameless can help you resolve incidents better

With Blameless’ Incident Resolution product, teams can integrate with the communication platform of their choice (Slack or MS Teams) as well as video conferencing options (Zoom, GoTo Meeting, and Google Meet). When an alert triggers an incident, Blameless automatically spins up communication channels and drops relevant team members into the incident. You can add additional users to the incident as needed. And, you won’t have to break your workflow to share what’s happening. It’s all in the incident.

Blameless is more than just a communication hub. It’s the way teams orchestrate an incident all the way to its resolution. After kicking off an incident, you can assign response roles, add a severity, enrich the incident with tags, and view the on-call engineers for any specific service through our integration with PagerDuty.

During the incident, you can also use automated checklists of tasks to help guide your team through. These checklists can be tailored by severity and role, making sure that they fit the incident as well as the unique responsibilities of each team member.
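To make the idea of tailoring a checklist by severity and role concrete, here is a minimal sketch in Python. It is not the Blameless API or data model. The names ChecklistItem, CHECKLIST, and checklist_for, the example tasks, and the convention that SEV1 is the most severe level are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One task in a hypothetical incident checklist."""
    task: str
    roles: frozenset          # roles responsible for this task
    applies_up_to_sev: int    # least severe level the task applies to (1 = most severe)


# Illustrative tasks only; a real checklist would come from your own process.
CHECKLIST = [
    ChecklistItem("Acknowledge the page and join the incident channel",
                  frozenset({"commander", "communications", "responder"}), 3),
    ChecklistItem("Assign response roles", frozenset({"commander"}), 3),
    ChecklistItem("Post a status update for stakeholders",
                  frozenset({"communications"}), 2),
    ChecklistItem("Escalate to engineering leadership",
                  frozenset({"commander"}), 1),
]


def checklist_for(role, severity, items=CHECKLIST):
    """Return the tasks that apply to a given role at a given severity.

    A task applies when the role owns it and the incident is at least as
    severe as the task's threshold (a lower number means more severe).
    """
    return [
        item.task
        for item in items
        if role in item.roles and severity <= item.applies_up_to_sev
    ]


if __name__ == "__main__":
    for task in checklist_for("commander", severity=1):
        print("[ ]", task)
```

In this sketch, a commander on a SEV1 incident sees the escalation task, while a communications lead on a SEV3 incident sees only the shared acknowledgement step.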

You can also attach runbooks to each incident through our Runbook Documentation feature. This feature allows users to create sets of documentable tasks and actions. These drag-and-drop tasks and actions can be in basic text, rich text, or code snippets. This allows for a more robust runbook with better context, as teams can include images, scripts to run, and more.

After the incident concludes, Blameless also helps teams extract the most knowledge possible. The data automatically aggregated during the incident is displayed on your timeline, with logs, key messages, and more. Additionally, this timeline is editable. You can also assign retrospective creation to teammates through the incident channel, making sure that all information on the incident (even after it’s over) is stored centrally. If your team forgets to work on the retrospective, Blameless will send a gentle reminder.

Once the retrospective is complete, you can export it and send a report to any interested stakeholders. After that, you can use Blameless to archive the incident channel.

These capabilities help teams codify and streamline incident response processes. You’ll be able to resolve incidents faster and with less toil. As Tenzin Wangdhen, Staff SRE at Iterable, said, “The improved coordination, follow-up tracking, and visibility help us actually address what caused the incident in the first place, and prevent it from happening again. Through that iterative process of having incidents, learning from them, applying the fixes and rinsing and repeating, we’ve been able to improve the stability of our platform.”

Ready to see Microsoft Teams in action?

We’re currently seeking design partners to help make your MS Teams experience even better. If you’re interested in trying out our Microsoft Teams integration for yourself, fill out this form and we’ll reach out to you.
