With everything we’re facing right now, it can be hard to keep up with the growing demand for our services. You need to use every tool in your toolbelt to conserve your team’s cognitive resources. Two ways to do this are automating toil out of your processes and prioritizing with SLOs.
Flexibility is crucial right now, but it’s difficult to create during a crisis if it didn’t exist beforehand. Organizations with less toil built into their processes are better positioned to succeed than those with toil-intensive processes. This means it’s more important than ever to build some “margin” into our processes in order to remain flexible.
Brain space is at a premium during a crisis. As stress levels mount, cognitive capacity diminishes. Teams may feel too busy putting out fires to focus on automation, but that is exactly when decreasing their cognitive load matters most. Automation can also act as a buffer between the productivity teams lose during this crisis and the need to perform at increased capacity. It makes it more likely that you can hold the 50/50 split between engineering work and toil, giving you more room for innovation despite the constraints on resources.
Your team will also function better with decreased strain and toil. Richard Cook from Adaptive Capacity Labs notes that during this crisis, “Social spaces will become more tightly coupled. The effects of events and strains at work will transfer to home and vice versa. The influence of work on home (and home on work!) is usually moderated via social conventions. As stress saps energy it becomes more difficult to maintain boundaries.”
When toil becomes overwhelming, teams will lose energy and productivity. Automation helps build margin for your teams to recharge, take time with their families, and deal with this difficult time in a healthy way.
One way to bake in automation is with runbooks, easing incident response. Here are some key steps to consider when creating automated runbooks:
In addition to automated runbooks, you can also use SLOs to help create margin through compassionate prioritization.
Margin can be built into your processes in other ways, too. One useful method is through error budgets and SLOs. SLOs are powerful tools for aligning teams on how to prioritize engineering work between new features and reliability needs. This shared agreement is more important now than ever. Richard Cook from Adaptive Capacity Labs predicts that during this crisis, “Tribalism will increase. Past success in producing a ‘no blame’ and ‘learning’ environment will come under severe pressure as the strain accumulates. Groups that previously worked in harmony may be at odds. Willingness to share productivity across groups will be sapped by the loss of resources and decreased performance.”
As teams experience unprecedented strain and are hit simultaneously with increases in unplanned work as well as reduced capacity, a game of tug of war could erupt. This means that even policies and metrics of success must change during this time. As such, SLOs and error budgets should be established with the team’s context in mind. As Alex said, “The best way to use the concept of an error budget isn’t that you have to actually have measurements, but rather that the concepts behind it give you a different way of thinking about things. And to have good discussions with people with that data and to help you make decisions based upon that.”
He also stressed the importance of revisiting a target whenever necessary, whether that’s due to an incident, a change in the codebase, or a massive black swan event. Relaxing your error budget and compassionately setting flexible SLOs can help facilitate your team’s adaptive capacity, while improving shared prioritization of the work that matters most.
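To make the idea of relaxing a target concrete, here is a minimal sketch of the arithmetic behind an availability error budget. It assumes a simple time-based SLO over a 30-day window. The function names, the window length, and the example targets are illustrative assumptions, not something prescribed in this post.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


# Hypothetical scenario: 60 minutes of downtime so far in this window.
downtime = 60.0
for target in (0.999, 0.995):
    budget = error_budget_minutes(target)
    left = budget_remaining(target, downtime)
    print(f"target={target:.3f} budget={budget:.1f} min remaining={left:.0%}")
```

Under these assumptions, a 99.9% target allows roughly 43 minutes of downtime in 30 days, so an hour of downtime has already overspent the budget. Relaxing the target to 99.5% allows 216 minutes, which leaves the team room to absorb the same hour without freezing other work. Whether that trade-off is acceptable is the kind of conversation the error budget is meant to prompt.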