At Blameless, our mission is to provide teams with the tools they need to operationalize SRE and embrace a culture of resilience. We help teams automate toil and adopt best practices across integrated incident management, comprehensive retrospectives, service level objectives, reliability insights, and more. We are very excited to announce that Blameless Runbook Documentation is now generally available for all customers.
Runbooks are an industry best practice, empowering teams to codify the incident response process and drive process repeatability and consistency. These sets of instructions allow teams to resolve incidents faster with greater confidence and less toil.
Below is a description of Runbook Documentation capabilities, as well as how you can get started with it today.
Documenting tasks and actions
Blameless Runbook Documentation allows users to create sets of documentable tasks and actions. These drag-and-drop tasks and actions can be written as basic text, rich text, or code snippets. This makes for a more robust runbook with better context, as teams can include images, scripts to run, and more.
The ability to build step-by-step processes helps teams stay on the same page and respond to incidents in a codified way. When an alert fires, runbooks can decrease the cognitive toil of remembering how to react to individual situations. One of our engineers, Jamie Atyeo, experienced the benefits of Blameless Runbook Documentation.
“I was on-call and responded to a PagerDuty alert, and initially I had no idea what to do. However, I saw that we had a runbook set up for this particular alert, so I followed the runbook and was able to resolve the incident and fix the outage before the customer was even aware of it,” Jamie said.
Knowledge share and communication
Beyond helping teams resolve incidents faster with improved context and less cognitive overhead, Blameless Runbook Documentation also helps by centralizing critical documentation around operations. Information is now codified and available to all users within an organization. This limits tribal knowledge and helps engineers get onboarded much faster.
Blameless Runbook Documentation also eliminates the need to store runbooks across various tools (wikis, Confluence, PDFs, etc.), which often increases cognitive overhead both during and after incidents. Instead, runbooks are all editable within Blameless and easily attachable to incidents.
Blameless engineer Harry Hull says attaching a runbook to an incident is especially helpful. "Now that we require all of our incidents to have linked Runbook Documentation, not only has it drastically reduced the need to escalate before incident resolution, but it has forced our alerts to be more actionable."
Runbook Documentation also helps teams see the gaps in their processes. Teams can analyze runbook completion to find steps that are often skipped, ineffective, or confusing, and then revise them. This gives teams the information necessary to refine runbooks.
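To make the idea of analyzing runbook completion concrete, here is a minimal sketch in Python. It assumes a hypothetical list of run records in which each step is marked "done" or "skipped"; the record shape, step names, and the 50% threshold are illustrative assumptions, not a Blameless export format or API.

```python
from collections import Counter

# Hypothetical run records: each run maps a step name to its status.
# The shape and the step names are assumptions made for illustration.
runs = [
    {"Check dashboard": "done", "Restart worker": "done", "Notify customer": "skipped"},
    {"Check dashboard": "done", "Restart worker": "skipped", "Notify customer": "skipped"},
    {"Check dashboard": "done", "Restart worker": "done", "Notify customer": "done"},
]


def skip_rates(runs):
    """Return the fraction of runs in which each step was skipped."""
    seen = Counter()
    skipped = Counter()
    for run in runs:
        for step, status in run.items():
            seen[step] += 1
            if status == "skipped":
                skipped[step] += 1
    return {step: skipped[step] / seen[step] for step in seen}


rates = skip_rates(runs)

# Steps skipped in at least half of the runs are candidates for revision.
# The 0.5 threshold is an arbitrary choice for this sketch.
candidates = sorted(step for step, rate in rates.items() if rate >= 0.5)
print(candidates)
```

A step that responders skip most of the time is usually either unnecessary or unclear, so a list like this is a reasonable starting point for a runbook review.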
Blameless Runbook Documentation provides a foundation for additional functionality that will allow teams to automate end-to-end incident workflows. In the future, customers will have access to out-of-the-box workflows across common tools to standardize incident communications, synchronization, and more.
What makes us excited about Runbook Documentation
At Blameless, we feel that it’s important to always be customer zero for our own product. We’ve been using Runbook Documentation for several months now. Two of our engineers, Lucas Bartroli and Alicia Li, helped build and currently use this feature. These are the top 3 things they are most excited about:
Runbook Documentation allows users to document the optimal way to respond to events. This helps teams be consistent in their incident response processes. Users are guided through a series of predefined manual steps to accomplish a specific outcome. In Blameless, you can also create independent steps that let you craft custom flows, and pass metadata from one step to another.
What was run at the time of the incident is preserved as-is, even if the runbook changes in the future. This is much better than an ad-hoc comment linking to a Google Doc or Confluence page that may have since been edited, as it gives a clearer view of what responders were working with.
You can write code snippets using Monaco Editor (the code editor that powers VS Code). This gives you a lot of freedom when writing a code snippet, as it supports more than 50 languages with syntax highlighting.
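The second point above, preserving what was run at the time of the incident, can be pictured as taking a snapshot of the runbook when it is attached. The sketch below is one way to model that idea in Python. The `Runbook` and `Incident` classes and the `attach` method are hypothetical names invented for illustration; they do not describe how Blameless implements this.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Runbook:
    # Hypothetical model of a runbook: a title plus ordered steps.
    title: str
    steps: List[str] = field(default_factory=list)


@dataclass
class Incident:
    # Hypothetical model of an incident that keeps its own runbook copy.
    name: str
    runbook_snapshot: Optional[Runbook] = None

    def attach(self, runbook: Runbook) -> None:
        # Deep copy so later edits to the live runbook do not alter the record.
        self.runbook_snapshot = copy.deepcopy(runbook)


live = Runbook("High API latency", ["Check dashboard", "Restart worker"])
incident = Incident("INC-1")
incident.attach(live)

# The live runbook is edited after the incident.
live.steps.append("Scale up replicas")

print(incident.runbook_snapshot.steps)  # the steps responders actually saw
print(live.steps)                       # the current, edited runbook
```

Copying at attach time, rather than storing a link, is what keeps the incident record stable: the snapshot still shows the two original steps even though the live runbook now has three.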
Blameless Runbook Documentation is available to all customers. If you’re a current user and want to learn more, visit our documentation or reach out to your Customer Support representative.