Survival Guide: Black Swan Events

Description

In SRE, a core message is that failure is inevitable. No matter how much you prepare, there will always be incidents you can't foresee. This doesn't mean preparation is useless, though. This talk focuses on one extremely valuable type of preparedness: having backups and restoration processes for the worst disasters. When your system experiences a total outage, an effective option is often to switch to a backup system before trying to solve the issue itself, which restores service as fast as possible.

However, just making backup systems isn't enough. This talk reveals the complacency and blind spots that surround backup systems. Many organizations feel comforted by having created backups, but aren't actually prepared to use them. The talk offers practical advice on improving backup systems for organizations of all sizes, looking at them from the perspectives of making them more reliable, more robust, and more resilient, based on the definitions given by Dr. David D. Woods. To keep the advice inclusive, there won't be much technical detail. Instead, the focus will be on mindsets and strategies.

Black swan events are highly impactful incidents that are so unlikely or unimaginable that no effort is made to prepare for them. You'll learn how to conduct thought experiments of "meteor strikes" and other worst-case scenarios, such as ransomware, to feel ready for other problems you can't yet imagine. You'll also see how backup systems can still be useful for such disasters. This is how a resilient backup system is created: one that can still handle what falls outside your expectations.

Speakers

Emily Arnott

Community Relations Manager, Blameless

Emily is the Community Relations Manager at Blameless, where she fosters a place for discussing the latest in SRE. She has also presented talks at SREcon, Conf42, and Chaos Carnival.

Jake Englund

Sr. Site Reliability Engineer, Blameless

Jake has an insatiable curiosity for learning about how complex systems work. Ever since his serendipitous introduction to SRE, Jake has been fascinated by the unique challenges and innovative solutions which come with scaling web services by orders of magnitude. In his spare time, he enjoys video and tabletop games, dancing, and cooking.