The Blameless Blog

Failure Is Inevitable

Featured Post

3 Ways SRE Can Boost your Business Value

In this blog post, we’ll look at the business value of SRE through customer focus, observability, and efficiency.
Aug 13, 2020
Choosing the Right SRE Tools

Implementing SRE practices and culture can be challenging. In this blog, we’ll talk about what to look for in SRE tools and how they’ll help you on your journey to reliability excellence.

Aug 12, 2020
Look Upstream to Solve your Team's Reliability Issues

We can’t impede innovation, but we can use Dan Heath’s wisdom from upstream thinking to move away from reactive modes of work and make our teams and our systems more reliable.

Aug 6, 2020
The Importance of Reliability Engineering

What makes reliability engineering so important? In this blog, we’ll look at three big benefits of investing in reliability and explain how you can get started on your journey to reliability excellence.

Aug 5, 2020
Improving Postmortems from Chores to Masterclass with Paul Osman

At our 2019 Blameless Summit, Paul Osman spoke about how to take postmortems or incident retrospectives to a new level. The following transcript has been lightly edited for clarity.

Aug 4, 2020
How to Bring Operational Experience to your Development with GitHub's Lauren Rubin

At the 2019 Blameless Summit, Lauren Rubin spoke about how to bring operational expertise to development teams.

Jul 30, 2020
How to Improve On-Call with Better Practices and Tools

Establishing equitable on-call rotations, putting the right guardrails and automation in place, and practicing incident response regularly are key to minimizing the stress of on-call. In this blog, we’ll share key tools and practices to ensure your on-call engineers are set up for success.

Jul 29, 2020
Enabling the Stripe and Lyft Platforms Through Modern Safety Science

Jacob Scott is an experienced engineer and enthusiastic participant in the resilience engineering community, having spent time caring for the technology systems powering high-growth startups as well as unicorns like Lyft and Stripe. See our interview with him here.

Jul 24, 2020
Resilience in Action E4: The Good Ol' Days and Education with Craig Sebenik

In our fourth episode, Amy chats with Craig Sebenik, SRE at Aurora and co-author of “What is SRE?” and “Salt Essentials.” He has a degree from Le Cordon Bleu (Sydney, Australia), a Master's in Italian Cuisine (Apicius in Florence, Italy), and a Master's in Gastronomy (University of Rheims, France). His greatest passion is teaching what he has learned from adventures in SRE and cooking.

Jul 23, 2020
How to Choose Monitoring Tools for DevOps and SRE

Deciding what and how to monitor is an important decision. We’ll walk you through the basics in this blog post. We’ll also suggest a few popular monitoring tools for your consideration.

Jul 22, 2020
Leaders, Here's how to Encourage Full Service Ownership

Service ownership is becoming common practice and its benefits are well-known. Leadership will need to encourage and empower teams to adopt the “you build it, you run it” mentality. Here are some ways to get teams on board.

Jul 21, 2020
SREview Issue #3 July 2020

Here’s the July issue of SREview! This monthly zine features epic Tweets, content, and events happening in the SRE and resilience engineering community.

Jul 21, 2020
How SLOs Help Your Team with Service Ownership

Learn how SLOs can help with service ownership by using metrics to learn about system health, unifying incentives, and balancing reliability with innovation.

Jul 17, 2020
The Essential List of Top SRE Resources

Are you looking to get up to speed on SRE fundamentals with the best SRE books and best DevOps books? Or are you hoping to expand your SRE knowledge into new domains? Either way, we’ve got you covered in our list of essential SRE resources!

Jul 16, 2020
5 Tips for Getting Alert Fatigue Under Control

It’s important to minimize alert or pager fatigue as much as possible, for the health and well-being of your team members. After all, the health of your systems is dependent on the health of your people. Here are 5 tips on how to cut down on alert fatigue and improve your signal-to-noise ratio.

Jul 15, 2020
Leadership and Innovation with Instacart's VP of Infrastructure

Blameless CEO Ashar Rizqi recently had the pleasure of interviewing Dustin Pearce in a virtual executive fireside chat and AMA. Below is the transcript of their conversation.

Jul 14, 2020
Are you Promoting Continuous Learning within Your Teams?

Our work-as-done may not match what we did at the beginning of 2020. However, by prioritizing continuous improvement and learning, we can work through these issues and build more resilient socio-technical systems.

Jul 13, 2020
Fostering Teamwork and Culture in the Era of Remote Work

Remote work isn't going anywhere. Make sure your teams are working with it, not against it, by fostering teamwork and culture.

Jul 10, 2020
How to Create Margin in your Systems with SRE Best Practices

With the challenges we’re facing during this time, it can be difficult to keep up with the growing demand for our services. You need to make use of all the tools in your toolbelt in order to conserve your team’s cognitive resources. Two ways you can do this are automating toil out of your processes and prioritizing with SLOs.

Jul 9, 2020
This is What you Should do to Minimize SPOFs

Between COVID-19 and the typical summer slowdown, offices are emptier than they’ve ever been. With team members taking some much-needed time off, it’s important to know how your team will be affected. Here are some tips to help your teams function during this time of flux.

Jul 8, 2020
How to Classify Incidents

In this blog, we’ll look at some benefits of classifying incidents, how classification is distinguished from incident triage, how to set up your own classification system, and how ITIL handles incident classification as an example.
