Curious about the incident priority matrix? We discuss how to determine the impact and urgency of an incident, and how to create a matrix that helps prioritize incidents.
What is an Incident Priority Matrix?
An incident priority matrix is a method of prioritizing incidents based on their impact and urgency. “Impact” is a measure of the extent of an incident and the potential damage it can cause. “Urgency” is a measure of how quickly a resolution is required.
What is the Impact of an Incident?
The impact of an incident is defined as the measure of the effect of an incident, change, or problem on the day-to-day business of an organization. Lost revenue and lost customers following an incident are negative effects. These business impacts result from the incident causing downtime, data loss, or other issues that cause customer pain.
The impact of an incident is proportional to the number of users that were impacted, how badly they were impacted, and how important they are as users. Therefore, a high-impact incident is one where many users are impacted, they completely lose the ability to use the service, and their service use contributes significantly to business needs. Impact can also be understood as severity.
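As a rough illustration of that proportionality, here is a minimal Python sketch that multiplies the three factors together. The field names, the 0.0 to 1.0 scales, and the cutoffs for high, medium, and low are all assumptions made for this example, not values from any standard; tune them to your own service.

```python
from dataclasses import dataclass


@dataclass
class ImpactInputs:
    users_affected: int     # users who experienced the incident
    total_users: int        # all users of the service
    degradation: float      # assumed scale: 0.0 = no loss, 1.0 = service unusable
    business_weight: float  # assumed scale: 0.0-1.0, importance of these users


def impact_score(inputs: ImpactInputs) -> float:
    """Return a 0.0-1.0 score proportional to reach, degradation, and importance."""
    if inputs.total_users <= 0:
        raise ValueError("total_users must be positive")
    # Share of the user base that was affected, capped at 100%.
    reach = min(inputs.users_affected / inputs.total_users, 1.0)
    return reach * inputs.degradation * inputs.business_weight


def impact_level(score: float, high: float = 0.5, medium: float = 0.1) -> str:
    """Bucket a score into a level; the default cutoffs are illustrative only."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"
```

With these assumed cutoffs, an outage that makes the service unusable for 90% of your most important users lands in the high bucket, while a partial degradation for 1% of low-weight users lands in the low bucket.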
Factors to Determine the Impact of an Incident
Number of users or customers impacted
Loss of revenue or cost incurred in incident resolution
Number of IT systems or services involved
Categories of Impact in an Incident Priority Matrix
What is the Urgency of an Incident?
The urgency of an incident is about time. It depends on how quickly a customer or the business expects something, such as restoring a service or providing updates. Urgency is often tied to service level targets: it increases when contracts such as service level agreements (SLAs) guarantee users that services will be restored within a set timeframe.
Generally, the urgency of an incident depends on the criticality of the affected service. Incidents that affect areas critical to the business have high urgency. You also have to consider how much time and how many resources will be needed to resolve the incident. An incident with low criticality that can be resolved easily can still be considered urgent and prioritized highly. You can also use tools to determine criticality and resources needed based on your incident response processes.
Categories of Urgency in an Incident Priority Matrix
Incident priority is defined as the intersection of the impact and urgency of an incident. Once you consider the impact and urgency of a situation, you can assign a priority and allocate adequate resources. You start by determining the impact and the urgency, and then assign the incident a priority value.
It’s important to remember that priority is relative. It defines the actions you will take in a particular situation. However, the actions are not set in stone and will change with the situation and context. It isn’t about an objective priority level, but what’s the highest priority among your options.
Incident Priority Matrix
The best way to map out incident priority is in an incident priority matrix. In the matrix, we map out various incidents according to their impact and urgency, and a priority class is automatically assigned. If both urgency and impact are low, then the incident is assigned a low priority (P4 or P5). If both values are high, then it’s a high-priority incident (P1 or P2), and if the values lie somewhere in the middle, then it’s a medium-priority (P3) incident.
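To make the lookup concrete, here is a minimal Python sketch of one possible mapping. It assumes three levels each for impact and urgency and five priorities, and it derives the priority by adding the two ranks. That additive rule is an assumption chosen for illustration; your organization may fill the cells differently.

```python
# Rank each level so that a smaller number means more severe.
LEVEL_RANK = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> str:
    """Map an impact level and an urgency level to a priority from P1 to P5.

    Assumed rule: priority number = impact rank + urgency rank - 1.
    """
    try:
        rank = LEVEL_RANK[impact.lower()] + LEVEL_RANK[urgency.lower()] - 1
    except KeyError as exc:
        raise ValueError(f"unknown level: {exc.args[0]!r}") from None
    return f"P{rank}"


# Print the full grid: one row per impact level, one column per urgency level.
for impact in LEVEL_RANK:
    print(impact, [priority(impact, urgency) for urgency in LEVEL_RANK])
```

Under this rule, high impact with high urgency gives P1, low impact with low urgency gives P5, and the middle of the grid gives P3. A plain dictionary keyed by (impact, urgency) pairs works just as well if you want to set each cell by hand.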
Incident Priority Levels
How Many Priorities Should You Have in an Incident Priority Matrix?
In an incident priority matrix, you can have as many priority levels as you want. However, no more than five priority levels are recommended. Many organizations have only three priority levels (high, medium, and low) to eliminate confusion. Having more than five priorities can make it difficult to define each priority.
Incident Priority Matrix Example
The best way to explain various priorities to your employees is by providing them with examples of the incident priority matrix. That will help them understand the matrix and use it in a more intelligent manner.
The following example shows an incident priority matrix with five levels of priority:
In the incident priority matrix above, there are:
Three levels of impact: high, medium, and low.
Three levels of urgency: high, medium, and low.
Five levels of priority: P1, P2, P3, P4, and P5.
How to Design a Priority Matrix?
A lot of thought and experience goes into designing an incident priority matrix, but five points should always be considered:
How an incident is affecting the productivity of the organization and its users.
How many users and what types of users are affected by the incident.
How many systems and services are affected and how critical those are to users.
The level of IT security and safety risks to the organization and its users.
What resources are necessary to resolve the problem.
Incident Priority Matrix in the Incident Management Process
The most common use of an incident priority matrix is in the incident management process where incidents are classified according to their severity and the area of service affected. This classification will allow you to determine who is required to respond and what resources are necessary.
In many organizations, incident management is the responsibility of SREs (Site Reliability Engineers). During the incident management process, incidents are prioritized as low-priority, medium-priority, or high-priority based on the aforementioned factors. However, prioritizing incidents can be challenging. Whenever you are in doubt about an incident’s priority, always go with high-priority, because erring on the side of caution is better than mistaking a severe incident for a minor one.
During incident response, establishing timelines is critical. The team needs to ensure that the error budget (the amount of acceptable unreliability of a service) is not exceeded and, more importantly, that the SLA (service level agreement) is not breached. Considering the error budget and SLA, your team can categorize and respond to high-priority incidents. If the issue remains unresolved, it’s critical to escalate the incident and notify the stakeholders. The priority of the incident and how it’s responded to can change during the response process depending on how successful the response is.
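To see how an error budget translates into time, here is a small Python sketch. It assumes the budget is derived from an availability target over a rolling window, which is a common convention but not the only one. The function names, the 30-day default window, and the 99.9% figure in the usage note are illustrative assumptions.

```python
def error_budget_minutes(availability_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given target.

    availability_target is a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < availability_target < 1.0:
        raise ValueError("availability_target must be between 0 and 1 (exclusive)")
    window_minutes = window_days * 24 * 60
    return (1.0 - availability_target) * window_minutes


def budget_remaining(
    availability_target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Fraction of the error budget still unspent; negative means it is exceeded."""
    budget = error_budget_minutes(availability_target, window_days)
    return 1.0 - downtime_minutes / budget
```

For example, a 99.9% target over 30 days allows roughly 43 minutes of downtime. An incident that has already consumed about half of that leaves about half the budget, which is one signal a team might use when deciding whether to raise the incident’s priority or escalate.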
How Can Blameless Help?
Having reliable service is extremely important nowadays. You need to monitor the system metrics and resolve incidents immediately to make your service reliable. The incident priority matrix helps you decide which incidents require an immediate response and which can wait. However, in an ever-changing business paradigm, it’s not easy to determine an incident priority. Blameless offers tools such as automated incident response, incident insights, and runbook documentation to help your team manage incidents more efficiently.
Noor is a software engineer who contributes educational articles on SRE and DevOps fundamentals to our blog.