Is it a coincidence that “May” and “yay” rhyme? Probably not. This month has been pretty exciting for us here at Blameless, and we’d love to share why. We’ve also rounded up some of our favorite Tweets, content, and events from the SRE and resilience engineering community this month.
Incident Response for Resilient Socio-Technical Systems: In this whitepaper, we will describe why incident resolution is harder than ever, and share how SRE can help teams respond better under pressure.
Failover Conf follow-up: Your team and culture questions answered! The Gremlin team answers your questions from Failover Conf about blamelessness, building SRE teams, and more.
SRE Leaders Panel: Business Agility and SRE: SRE leaders Garima Bajpai and Jason Fraser discuss the value of crisis during incident response, the best and worst tech transformations, and more.
What Chaos Engineering Is (and Isn’t): Casey Rosenthal writes about the history of chaos engineering, what it is and the practices it entails, and what it is not.
Improve your Reliability with Blameless SLOs: Blameless is excited to announce that our SLO Manager is now generally available! This product helps SRE and engineering teams proactively make data-driven decisions about reliability efforts.
Continuous Learning as a Tool for Adaptation: Nora Jones writes about emphasizing learning over action items, increasing and disseminating insights, asking questions after incidents, and more.
Blameless is excited to announce that our SLO Manager is now generally available! SLO Manager is a new service added to the Blameless platform. This service helps SRE and engineering teams proactively make data-driven decisions about reliability efforts.
According to a survey Blameless conducted, over 80% of organizations use SLOs or will in the next 1-2 years. While there are a variety of solutions available to create SLOs using application performance monitoring (APM) tools, it remains difficult to prioritize, interpret, and leverage these to drive customer satisfaction. After building SLOs, many teams are still left asking, “So what’s next?”
With Blameless’ SLO Manager, teams can create distinct user journeys that correspond to their services. Teams can monitor these services’ corresponding SLOs and gain actionable insights via error budgeting. Blameless error budgets help teams understand how much unreliability their services have experienced over a time period, and predict when their error budget will deplete. This helps teams sort services by risk levels and take proactive measures to address any degradation of reliability before it starts affecting customer satisfaction.
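To make the error budget idea concrete, here is a minimal Python sketch of the arithmetic behind it. This is not how SLO Manager is implemented; the `ServiceBudget` class, its fields, the `rank_by_risk` helper, and the simple linear burn-rate projection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceBudget:
    """Hypothetical record for one service's SLO over a fixed window."""
    name: str
    slo_target: float       # e.g. 0.999 means 99.9% of requests should succeed
    expected_requests: int  # requests expected over the whole SLO window
    failed_requests: int    # failed requests observed so far in the window
    days_elapsed: float     # days since the window started

    def allowed_failures(self) -> float:
        # The error budget: how many failures the SLO tolerates in the window.
        return (1.0 - self.slo_target) * self.expected_requests

    def remaining(self) -> float:
        # Budget left; a negative value means the SLO is already breached.
        return self.allowed_failures() - self.failed_requests

    def days_until_depleted(self) -> Optional[float]:
        # Assumption: failures keep arriving at the average rate seen so far.
        if self.remaining() <= 0:
            return 0.0
        if self.failed_requests == 0 or self.days_elapsed <= 0:
            return None  # no burn observed, so no depletion is projected
        burn_per_day = self.failed_requests / self.days_elapsed
        return self.remaining() / burn_per_day


def rank_by_risk(services: List[ServiceBudget]) -> List[ServiceBudget]:
    """Sort services so the one projected to deplete soonest comes first."""
    def key(service: ServiceBudget) -> float:
        days = service.days_until_depleted()
        return float("inf") if days is None else days
    return sorted(services, key=key)


if __name__ == "__main__":
    checkout = ServiceBudget("checkout", 0.999, 1_000_000, 400, 10.0)
    search = ServiceBudget("search", 0.99, 1_000_000, 0, 10.0)
    for service in rank_by_risk([search, checkout]):
        print(service.name, service.remaining(), service.days_until_depleted())
```

In this sketch, a 99.9% target over one million expected requests allows roughly 1,000 failures. A service that has used 400 of them in 10 days is burning about 40 per day, so its remaining 600 would last roughly 15 more days at that pace. A real tool would likely use rolling windows and more careful forecasting than a straight-line projection.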
Learn more about our SLOs by registering for our bi-weekly live demo here.
WTF is SRE May 20: A virtual conference about site reliability engineering, DevSecOps, observability, multi-cloud, and working with complex distributed systems at scale.
Blameless Bi-Weekly Demo May 25 at 8 AM PDT: Check out a live demo of Blameless. Plus, get a sneak peek of something new we’ve been working on.
The Anatomy of Three Incidents May 25 at 9 AM PDT: Randy Shoup will share why the best response to a system outage is not "What did you do?", but "What did we learn?"
NS1 Insights 2021 June 24: This is a one-day virtual event that brings together industry experts to discuss how applications are changing the way we work, live, and solve some of the world’s biggest challenges.