Wondering about severity vs. priority? We explain severity and priority and discuss their differences and their impact on the incident management process.
What is severity vs. priority?
Severity is generally defined as the degree of impact an incident has on a project, while priority is the level of business importance assigned to an incident.
That’s the broad description, but let’s look at each one in more detail.
Breaking down severity and severity types
Severity is used to assess how much an incident has impacted a system. It’s a key part of incident management and one of the first assessments, but its actual definition depends on the business context. Most teams have a categorization system for severity that ranges from 1 to 5. That’s not a hard and fast rule, but that’s the general practice.
When defining severity types, it’s important to create a system that accurately categorizes the impact of the incident in a way that everyone understands. Too few types, and some levels of impact may be missed. Too many, and teams may struggle to identify the risk level accurately.
There are different ways to label and order severity types. Factors like impact on the customer, usability, reliability, and internal resource availability need to be considered when defining them. Severity types need to be defined in tandem with all other teams that deal with incidents, but roles focusing on service health, such as QA engineers, SREs, and operations, will initiate the definitions.
The categories for severity types can be defined as the following, although they will be modified based on business context:
- Critical/Severity 1: A catastrophic failure that has rendered the entire system unusable and inaccessible.
- Major/Severity 2: A large part of the system has failed or is inaccessible. Some functionality is still available, but most elements are missing or not working. There might be some alternative ways to access standard features, but it’s extremely difficult.
- Medium/Severity 3: A segment of customers faces errors or other issues that make access very difficult but not impossible.
- Minor/Severity 4: Issues that don’t impact the system itself but might be causing problems for customers. They don’t require an all-hands-on-deck approach but should be fixed.
- Low/Severity 5: Usability isn’t impacted. Small issues may go unnoticed by customers, but they should still be fixed when there’s time.
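As a rough illustration, the five levels above could be encoded as an enumeration so that teams and tooling share one vocabulary. This is a minimal sketch in Python, not a prescribed implementation: the `Severity` class, its `label` property, and the `more_severe` helper are hypothetical names, and your own scale may use different levels.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Five-level severity scale; a lower number means a bigger impact."""

    CRITICAL = 1  # entire system unusable and inaccessible
    MAJOR = 2     # large part of the system failed or inaccessible
    MEDIUM = 3    # a segment of customers hits errors; access very difficult
    MINOR = 4     # system itself unaffected, but customers see issues
    LOW = 5       # usability unaffected; fix when there's time

    @property
    def label(self) -> str:
        # Mirrors the "Critical/Severity 1" naming used in the list above.
        return f"{self.name.capitalize()}/Severity {self.value}"


def more_severe(a: Severity, b: Severity) -> Severity:
    """Return whichever of the two levels has the bigger impact."""
    return min(a, b)
```

With this sketch, `Severity(3).label` evaluates to `"Medium/Severity 3"`, and comparing two incidents is a plain integer comparison.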
Breaking down priority
Now that there is a way to categorize and label how severe an event is, the next step is to consider priority for severity types. How is it decided which event needs to be addressed first? Does it need an immediate fix, or is it something that can be pushed down the to-do list? Since priority is related to business resources rather than the system itself, product and business teams need to drive the discussion, working together with SREs and QA engineers to create a realistic system.
The same factors mentioned for severity, such as the impact on the customer, usability, reliability, and availability of internal resources, need to be considered for priority as well. Priority is essentially a way to communicate to teams what needs their focus and what can wait.
Priority is generally broken down into the following categories:
- Low: The issue needs to be addressed, but it can wait until other more pressing problems are solved.
- Medium: The issue needs to be addressed relatively quickly, but it can be done in normal day-to-day work the next day or during the next cycle of development work.
- High: Immediate resolution of the issue is needed as the system is unusable otherwise.
Priority vs. severity: What’s the difference?
Severity and priority call for different considerations when labeling incidents. To go deeper into the differences between priority and severity, it’s important to understand why both categorizations are needed. A high severity, high priority event requires a significant, fast response. But if a medium severity event impacts an important customer, its priority gets pushed to high rather than staying at medium or low.
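One way to picture this is a baseline priority derived from severity, which business context can then override. The sketch below is illustrative only. The `Priority` enum, the `assign_priority` function, the baseline mapping (Severity 1 and 2 to high, 3 to medium, 4 and 5 to low), and the rule that any incident affecting an important customer becomes high priority are all assumptions, not rules from a specific tool.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def assign_priority(severity: int, important_customer: bool = False) -> Priority:
    """Map a 1-5 severity (1 = critical) to a baseline priority, then escalate."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")

    # Assumed baseline: Severity 1-2 -> HIGH, 3 -> MEDIUM, 4-5 -> LOW.
    if severity <= 2:
        priority = Priority.HIGH
    elif severity == 3:
        priority = Priority.MEDIUM
    else:
        priority = Priority.LOW

    # Business context can override the baseline: in this sketch, an incident
    # that hits an important customer is pushed to HIGH.
    if important_customer:
        priority = Priority.HIGH
    return priority
```

Here `assign_priority(3)` returns `Priority.MEDIUM`, while `assign_priority(3, important_customer=True)` returns `Priority.HIGH`, matching the medium severity example above. In practice the override would likely weigh more factors than a single flag.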
The factors mentioned and the severity types are all interrelated, and context is key. Priority helps engineers understand where their attention needs to be based on business impact. Severity types help teams understand how serious the effect is, who will need to be involved in resolving it, and how much time is needed to resolve the issue. Severity is used to indicate how the incident has impacted functionality, and priority is a way to understand when it needs to be fixed.
Teams need to work together to define the severity types and severity vs. priority for their specific business context. Having an agreement on where the focus needs to be based on the kinds of incidents that have occurred ensures that the categories are realistically crafted. They also need to take into account the team’s resources and expertise and how those are distributed for different types of incidents.
Priority vs. severity: Examples
To take examples of priority vs. severity, an e-commerce business’s severity types will include a critical incident such as customers’ inability to purchase products due to a payment gateway failure. For an app, a critical incident could be the inability of customers to use a key feature due to a bug. Although there will be some parallels across the different types of systems, such as customer experience, the incident itself will differ when being categorized. That’s why it’s crucial to use past experiences, worst-case scenarios, and other specific business resources to define severity vs. priority rather than generic definitions.
To drill down deeper into examples, a high severity and high priority event would be an incident such as an app crashing every time users open it on their phone, leaving them unable to access it. Another would be users being unable to purchase items on an e-commerce website because the cart icon is no longer appearing.
Such incidents block all functionality and render users unable to use essential parts of the system. A low severity but high priority event might be a major misspelling on the website’s home page. It won’t bring down the system, but it does need to be fixed before too many customers notice. Plus, it’s an easy fix, so it can be a higher priority to get it out of the way and knock an easy one off the to-do list.
Another example of severity vs. priority, taking the same e-commerce scenario, is an incident that’s high severity but low priority. If the commenting function on product pages suddenly crashes, that would be a high severity incident, but not necessarily a high priority one. The comments section is a helpful part of your service, but it isn’t crucial to the user experience and isn’t frequently used. Since fixing the issue would be a huge investment of time and resources, it may not be the highest priority at the moment, so it could be labeled as low priority until other issues are resolved.
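To see how the two labels interact during triage, the examples above can be put into a small queue that is ordered by priority first and severity second. This is a sketch under stated assumptions: the `Incident` class and `triage_order` function are hypothetical, and the numeric severity and priority values assigned to each example are illustrative guesses at what "high" and "low" might mean on the scales described earlier.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    title: str
    severity: int  # 1 = critical ... 5 = low
    priority: int  # 3 = high, 2 = medium, 1 = low


def triage_order(incidents):
    """Sort by priority (highest first), then by severity (most severe first)."""
    return sorted(incidents, key=lambda i: (-i.priority, i.severity))


# Illustrative values for the three examples discussed above.
incidents = [
    Incident("Product page comments crash", severity=2, priority=1),
    Incident("Misspelling on the home page", severity=5, priority=3),
    Incident("App crashes on launch", severity=1, priority=3),
]

for incident in triage_order(incidents):
    print(incident.title)
```

With these values, the app crash comes first, the home page misspelling second, and the comments outage last, even though the comments outage is more severe than the misspelling. Priority decides when something gets fixed, and severity only breaks ties.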
How Blameless helps
To ensure that incidents have a standardized response that considers severity and priority, it’s crucial to use the right tools. Automated testing to flag issues is part of the solution, alongside incident response and management tools to create a holistic response. With features such as automated runbooks, teams can solve common issues faster through sophisticated automation rather than tedious manual work.
In addition, event data can be captured in real-time, giving teams a solid foundation to work together and streamline the incident response process post-event. Additional features such as incident retrospectives allow teams to go through data from incidents as learnings that can improve workflows moving forward.
With Blameless, teams can accelerate development velocity while ensuring that reliability remains a top priority. To learn more about how Blameless creates a better incident response process, request a demo today!