The Blameless Blog
Establishing equitable on-call rotations, putting the right guardrails and automation in place, and practicing incident response regularly are key to minimizing the stress of on-call. In this blog, we’ll share the tools and practices that set your on-call engineers up for success.
Jacob Scott is an experienced engineer and enthusiastic participant in the resilience engineering community, having spent time caring for the technology systems powering high-growth startups as well as unicorns like Lyft and Stripe. See our interview with him here.
In our fourth episode, Amy chats with Craig Sebenik, SRE at Aurora and co-author of “What is SRE?” and “Salt Essentials.” He has a degree from Le Cordon Bleu (Sydney, Australia), a Master's in Italian Cuisine (Apicius in Florence, Italy), and a Master's in Gastronomy (University of Rheims, France). His greatest passion is teaching what he has learned from adventures in SRE and cooking.
It’s important to minimize alert or pager fatigue as much as possible for the health and well-being of your team members. After all, the health of your systems depends on the health of your people. Here are 5 tips on how to cut down on alert fatigue and improve your signal-to-noise ratio.
With the difficulties we’re facing during this time, it can be hard to keep up with the ever-growing demand for our services. You need to use every tool in your toolbelt to conserve your team’s cognitive resources. Two ways to do this are automating toil out of your processes and prioritizing with SLOs.
Between COVID-19 and the typical summer slowdown, offices are emptier than they’ve ever been. With team members taking some much-needed time off, it’s important to know how your team will be affected. Here are some tips to help your teams function during this time of flux.