Description

In SRE, a core message is that failure is inevitable. No matter how much you prepare, there will always be incidents you can't foresee. That doesn't make preparation useless, though. This talk focuses on one extremely valuable type of preparedness: having backups and restoration processes for the worst disasters. When your system suffers a total outage, an effective option is often to switch to a backup system before trying to fix the underlying issue, which restores service as quickly as possible. However, simply creating backup systems isn't enough. This talk exposes the complacency and blind spots that surround backup systems: many organizations feel reassured by having created backups but aren't actually prepared to use them. The talk offers practical advice on improving backup systems for organizations of all sizes, looking at how to make them more reliable, more robust, and more resilient, based on the definitions given by Dr. David D. Woods. To keep the advice inclusive, the talk avoids deep technical detail and focuses instead on mindsets and strategies. Black swan events are highly impactful incidents that are so unlikely or unimaginable that no effort is made to prepare for them. You'll learn how to run thought experiments about "meteor strikes" and other worst-case scenarios, such as ransomware, so that you feel ready for problems you can't yet imagine. You'll also see how backup systems can still be useful in such disasters. This is how you build a resilient backup system: one that can still handle what falls outside your expectations.

Speakers

Emily Arnott

Community Relations Manager, Blameless
Emily is the Community Relations Manager at Blameless, where she fosters a place for discussing the latest in SRE. She has also presented talks at SREcon, Conf42, and Chaos Carnival.

Jake Englund

Sr. Site Reliability Engineer, Blameless
Jake has an insatiable curiosity about how complex systems work. Ever since his serendipitous introduction to SRE, Jake has been fascinated by the unique challenges and innovative solutions that come with scaling web services by orders of magnitude. In his spare time, he enjoys video and tabletop games, dancing, and cooking.

Video Transcript