The Blameless Blog

Failure Is Inevitable

Featured Post

SREview Issue #9 January 2021

New year, new SRE! We’ve said goodbye to 2020 and hello to 2021. Here are some of the most exciting Tweets, content, and events in the SRE and resilience engineering community so far this year.

Oct 28, 2020
Yury Niño Roa Shares her Insights on Chaos Engineering and SRE

In this interview, we’ll delve into what draws Yury to SRE and chaos engineering, how she defines resilience, as well as her predictions on emerging trends in the SRE landscape.

Oct 27, 2020
Here are 4 Ways SRE Helps New Employees Onboard

The SRE mentality can provide insights into many areas, including onboarding itself. In this blog post, we’ll cover how SRE takes onboarding to the next level.

Oct 26, 2020
This is How Blameless Integrates with JIRA

Atlassian JIRA, one of the most popular ticketing systems, allows teams to catalogue incidents, follow-up actions, bugs, stories, and more. As a common tool in any DevOps/SRE operation’s toolchain, JIRA is a key integration at Blameless. Here's how it works.

Oct 19, 2020
3 Ways SRE Can Boost your Business Value

In this blog post, we’ll look at the business value of SRE through customer focus, observability, and efficiency.

Oct 16, 2020
SREview Issue #6 October 2020

BOO! Did we scare you? We couldn’t help it, we’re just so happy it’s spooky season. Here’s the October issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

Oct 13, 2020
Can Security Teams Benefit from SRE? You bet!

In this blog post, we’ll break down how to use SRE to enhance your security procedures.

Oct 8, 2020
How to Construct a Reliability Model for your Organization

In this post, we’ll construct a basic reliability model and show you how to create one for your own organization.

Oct 1, 2020
This is your Guide for Implementing SRE in NOCs

In this blog post, we’ll look at how SRE can improve NOC functions such as system monitoring, triage and escalation, incident response procedure, and ticketing.

Sep 30, 2020
The Ultimate, Free Incident Retrospective Template

To make the most of each incident, teams need a solid post-incident template that can help minimize cognitive load during the analysis process. Here is an example of what a comprehensive, narrative incident retrospective could look like.

Sep 24, 2020
Here's your Complete Definition of Software Reliability

In this blog post, we’ll break down what software reliability means. We’ll look at how the reliability of your software is perceived, how teams operate to improve reliability, and how to contextualize reliability with customer happiness and cultural lessons.

Sep 17, 2020
Availability, Maintainability, Reliability: What's the Difference?

In this blog post, we’ll break down reliability in terms of other metrics within reliability engineering: availability and maintainability.

Sep 15, 2020
SREview Issue #5 September 2020

Here’s the September issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

Sep 11, 2020
SRE Leaders Panel: Testing in Production

Our panelists discussed testing in production, how feature flagging and testing can help us do that, and how to get managers on board with testing in production.

Sep 8, 2020
How to Improve the Reliability of a System

In this blog post, we’ll work through some helpful steps to take when improving a system’s reliability. We’ll use a development project as an example, but the essence of this advice can be applied anywhere SRE is being implemented.

Sep 3, 2020
Industry Experts Explain how to Thrive in a Post-COVID World

In a CIO panel hosted by Lightspeed Venture Partners, industry experts came together to discuss how to thrive in a post-COVID world. Here are key insights from their conversation.

Sep 2, 2020
Determining Error Budgets and Policies that Work for Your Team

In this blog post, we’ll look at the basics of error budgeting, how to set corresponding policies, and how to operationalize SLOs for the long term.

Sep 1, 2020
How to Build Your SRE Team

In this blog post, we’ll look at some of the many roles an SRE can play, and how to find people with those skill sets.

Aug 26, 2020
Here are the Important Differences Between SLI, SLO, and SLA

In this blog post, we’ll cover what SLI, SLO, and SLA mean and how they contribute to your reliability goals.

Aug 25, 2020
How SLOs Enable Fast, Reliable Application Delivery

In this blog post, we’ll discuss how SLOs are the key to modern application delivery, how to manage and measure them, the importance of observability for your SLO solution, and how to begin the journey to reliable application delivery today.

Aug 21, 2020
SREview Issue #4 August 2020

Here’s the August issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.
