The blameless blog

Using AI to Auto-Detect and Remediate Incidents


Today, the number of possible failure modes in cloud and microservices applications is exploding, making it increasingly difficult to gain true observability and take the right action across IT environments. According to Lightstep’s Global Microservices Trends report, 91% of teams are using or plan to use microservices, but 73% report that troubleshooting application performance problems is harder because of the added complexity.

Service architectures are changing faster than humans can keep up with, while teams are increasingly distributed. Yet consumer expectations of application experiences will only continue to rise. High latency is no longer acceptable, since bounce rates climb sharply as page load time grows. The conflict between these trends means that resilient systems need automation and intelligence to meet always-on demands.

Gartner has identified AI and Continuous Intelligence as major technology trends, underscoring the importance of data-driven, real-time decision making. In this context, teams should invest in AI-driven tools for monitoring, response playbooks, and blameless postmortems. These tools help teams catch incidents before they become customer-impacting, save hours of manual work per incident, and celebrate learning so teams can continue leveling up their operations.

We’ve partnered with Zebrium to make this happen. To see what it looks like in action, register for our live webinar, Using AI to Auto-Detect and Remediate Incidents, on March 19 at 10 AM PST.