A Practical Guide to Incident Communication

Emily Arnott | 8.28.2023

Even the best software fails sometimes. How quickly those failures get addressed, and how your teammates and customers feel about you after the fact, come down to how well you communicate with them. Users, customer success managers, Ops team members, IT, security, engineering leadership, and even the executive team each have a vested interest in resolving engineering incidents quickly. All need to be updated with the right information at the right time.

When a system goes down, a poor framework for communication can create chaos across an organization and distract the teams working to resolve the problem.

Throughout this guide, we’ll delve deeper into the importance of incident communication, how it works, steps to take, best practices, and more, to help your team master incident communication.

The Importance of Incident Communication

When working to efficiently resolve incidents, communication must be purposeful, reliable, and consistent. An incident response communication plan ensures everyone involved in incident response:

  • Is aware of the issue
  • Knows the current status of the issue
  • Understands the steps to take for a resolution
  • Stays updated on actions taken and planned

A quick response limits damage and the potential costs of lost data, income, productivity, and more.

Preparing for Incident Communication

Before an incident begins, there are things your team must prepare if they hope to respond effectively. Trying to establish an effective process for incident communication while responding to a high-severity incident is like trying to repair a plane in flight. Implementing incident communication best practices requires a little research into your team’s needs before something breaks.

Define What Constitutes an Incident for Your Team

Incident management looks different depending on the system or product in use. An incident is any situation disrupting your system or software. You can define ‘what is an incident’ based on the risk to your team, users, investors, and system.

Determine How You Categorize Incidents

The impact of an incident is measured by its severity level. A low number means a high impact. It’s useful to make a chart of each level and the type of incident it covers. For example:

[Chart: Incident classification and severity levels]
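
As a rough illustration of what such a chart can look like in code, here is a minimal Python sketch. The four levels, their names, and their descriptions are hypothetical placeholders rather than the contents of the original chart; the only rule taken from the text above is that a lower number means a higher impact.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale: a lower number means a higher impact."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Illustrative descriptions only; replace them with your team's own definitions.
SEVERITY_CHART = {
    Severity.SEV1: "Critical: core service unavailable for most users",
    Severity.SEV2: "Major: key feature degraded for many users",
    Severity.SEV3: "Minor: limited impact, workaround available",
    Severity.SEV4: "Low: cosmetic issue or internal-only impact",
}


def more_severe(a: Severity, b: Severity) -> Severity:
    """Return whichever level has the higher impact (the lower number)."""
    return min(a, b)


if __name__ == "__main__":
    # Print the chart from highest impact to lowest.
    for level in sorted(SEVERITY_CHART):
        print(f"{level.name}: {SEVERITY_CHART[level]}")
```

Using an integer-backed enum keeps the "low number, high impact" convention explicit, so sorting and comparing levels behaves the way the chart reads.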

Determine Your Team’s Roles and Responsibilities for Incident Response

Each department or team member should have a unique role in incident response, whether it’s monitoring, reporting, or resolving. The best way to ensure your team is prepared and organized is to:

Preparation is key to successfully navigating any incident type.

Incident Communication Steps

Once an incident is categorized and the appropriate team or team members understand their roles, communication steps are implemented. These should be outlined in your incident response communication plan. Steps to include in your strategy:

1. Categorize the Incident

There are two main ways to categorize the incident:

  • Categorize by incident type
  • Categorize by incident severity

Refer to your incident severity level chart (or add to it if needed) to categorize the current issue.

2. Select Appropriate Incident Communication Channels

Knowing the type and severity of the incident helps you identify the proper channel for communication. You’ll know whether it goes to the DevOps team, IT, operations, upper management, or even customers. Make the most of this process by:

  • Selecting appropriate communication tools for different scenarios (Slack, etc.)
  • Utilizing multiple channels for redundancy and wider reach
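
One way to picture the channel selection described above is a small routing table keyed by severity. The sketch below is only an assumption about how a team might encode it: the channel names, the four-level scale, and the `channels_for` helper are all hypothetical and do not come from any particular tool.

```python
from typing import List

# Hypothetical routing table: severity level -> channels to notify.
# Higher-impact levels (lower numbers) fan out to more channels for redundancy.
ROUTING = {
    1: ["pager", "slack:#incidents", "email:leadership", "status-page"],
    2: ["pager", "slack:#incidents", "email:leadership"],
    3: ["slack:#incidents"],
    4: ["ticket-queue"],
}


def channels_for(severity: int, customer_facing: bool = False) -> List[str]:
    """Return the channels to notify for an incident.

    An unknown severity is treated as the most severe level, on the
    assumption that over-notifying is safer than staying silent.
    """
    channels = list(ROUTING.get(severity, ROUTING[1]))
    # Customer-facing incidents always reach the public status page.
    if customer_facing and "status-page" not in channels:
        channels.append("status-page")
    return channels


if __name__ == "__main__":
    print(channels_for(3, customer_facing=True))
```

Keeping the table as plain data makes it easy to review and update alongside the incident response communication plan.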

3. Identify Predefined Communication Templates

An incident response communication template saves time that is better spent resolving the incident. These templates are customized by:

  • Incident type
  • Team
  • Severity level
  • Appropriate response

Blameless offers several types of templates for DevOps or SRE teams, including incident response and communication.
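
To make the idea of a predefined template concrete, here is a minimal sketch using Python's standard `string.Template`. It is not the Blameless template format; the wording, the field names (`service`, `impact`, `next_update`), and the `render_update` helper are assumptions chosen for illustration.

```python
from string import Template

# Hypothetical templates keyed by (incident type, severity level).
TEMPLATES = {
    ("outage", 1): Template(
        "[SEV$severity] $service outage. Impact: $impact. "
        "Next update by $next_update."
    ),
}

# Fallback used when no specific template has been defined.
DEFAULT_TEMPLATE = Template(
    "[SEV$severity] Incident affecting $service. Impact: $impact. "
    "Next update by $next_update."
)


def render_update(incident_type: str, severity: int, **fields: str) -> str:
    """Fill in the template for this incident type and severity.

    substitute() raises KeyError when a field is missing, so an
    incomplete update fails loudly instead of being sent with gaps.
    """
    template = TEMPLATES.get((incident_type, severity), DEFAULT_TEMPLATE)
    return template.substitute(severity=severity, **fields)


if __name__ == "__main__":
    print(render_update(
        "outage", 1,
        service="checkout",
        impact="payments failing",
        next_update="14:30 UTC",
    ))
```

Choosing `substitute` over `safe_substitute` is deliberate in this sketch: a half-filled message sent during an incident is usually worse than an error the responder sees right away.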

4. Alert the Incident Response Team 

An incident response team is a group designated in advance to respond to incidents at certain severity levels. Your organization may have response teams that manage all incidents, or one team for minor incidents and another for moderate to severe incidents.

Alerting the team includes:

  • Implementing pre-planned alerting mechanisms
  • Ensuring timely response to incident notifications

Communication is a key factor in the organization and implementation of the response teams.

Incident Communication Best Practices 

Incident communication involves many steps and protocols. To best support your organization and response teams, it helps to have a list of incident communications best practices. Some common best practices to include are:

  • Clearly defining communication channels and escalation paths
  • Using clear and concise language
  • Being empathetic during communication
  • Tailoring language to suit the audience (no jargon or department-specific acronyms)
  • Maintaining transparency so all team members know what is expected of them

Incident communication is not only internal. It often involves alerting users to potential downtime or disruptions. Clear communication ensures trust and credibility are maintained, despite the issue.

What to Look for in Incident Communication Platforms

Incident communication platforms are a type of software or website designed to guide and automate incident communication. When seeking the right platform for your organization or system, there are some key factors to consider.

Here are some must-have features and functionalities for efficient communication:

Real-time alerting

The faster your response team is alerted to an incident, the faster the incident is resolved. Quick turnaround times build trust in your brand and enhance authority and credibility in your industry.

Centralized communication hub

A communication system with a centralized hub helps consolidate all incident-related communication. Response teams, management, and anyone else involved in the response process can see updates, add comments, and communicate in real time.

Customizable templates

Templates go a long way toward improving consistency and speeding up the communication and response process. They include the basic information needed to report on a variety of incidents and can be customized further for specific cases.

Communication channel capability

Choose communication channels based on speed, privacy level, and system compatibility. Some examples of communication channels include email, SMS, voice, and Slack.

Collaboration tools

Collaboration tools facilitate swift decision-making and resolution. There are many collaboration tools to choose from, including New Relic, Salute Safety, Swimlane, and Freshservice.

Integration, analytics, and reporting

Communication platforms with enhanced analytics and reporting seamlessly fuse data solutions into applications and workflow. This brings data sources together for inclusive and far-reaching metric reports.

Blameless Platform’s Incident Communication Capabilities

Throughout this guide, we explored incident communication. It matters because it builds awareness of the issue, conveys incident status, explains the resolution steps, and shares planned actions. It keeps all relevant departments and individuals informed of the incident.

Some of the topics covered in this guide include:

  • Categorizing incidents by severity and type
  • Incident communication templates for streamlined communication
  • What to look for in an incident communication platform
  • Incident communication best practices

At Blameless, we offer reliability management solutions for engineering, DevOps, and IT Operations teams, such as incident management, incident response, CommsFlow, incident retrospectives, reliability insights, and SLO manager. One of the areas Blameless specializes in is the control of incident communication.

With Blameless you can inform stakeholders, create visibility, create and locate incident communication templates, and leverage swimlanes for faster, more efficient incident communication.

To learn more about the Blameless platform’s incident communication capabilities, start a free trial today.
