The blameless blog

9 Reliability Talks at AWS re:Invent 2019 that SREs Should Attend


Planning your schedule for AWS re:Invent 2019 but don’t know how to choose among 3,400 sessions? If you are passionate about all things reliability, we’re here to help you separate the signal from the noise.

The Blameless engineers have curated a list of nine exciting SRE talks for anyone interested in improving reliability and blameless culture at their company. We've divided these talks into three categories: cultivating culture, how successful companies built it, and reliability best practices.

How to use this guide

  1. Talks are linked to the Session Catalog. Log in, star your favorites, and a schedule will be generated to help you plan your day.
  2. Talk codes with an “R” suffix (for example, ARC306-R) have repeat sessions throughout the week, whereas other talks are offered only once.
  3. Speakers' Twitter handles are linked below, so follow and interact with your favorites from AWS!

Cultivating culture

Building a culture of observability (DOP308-S):

Cory Watson (Technical Director, SignalFx) and Julia Wong (Developer, Atlassian)
Implementing observability requires an organizational culture shift. Cory Watson and Julia Wong share their highs and lows of building teams, tools, and processes rooted in observability.

Driving change and building a high-performance DevOps culture (DOP207-R1):  

Mark Schwartz (author and Enterprise Strategist, AWS)
How do you drive cultural change from all levels of the organization? Mark Schwartz will share research and insights drawn from his experience bringing Agile and DevOps practices to US Citizenship and Immigration Services.

How successful companies built it

Reliability of the cloud: How AWS achieves high availability (ARC306-R):

Adrian Hornsby (Principal Evangelist, AWS) and Rodney Lester (Reliability Tech Lead, AWS Well-Architected, Amazon Web Services)
Adrian Hornsby leads a technical chalk talk on how AWS achieves high availability and reliability with its Well-Architected Framework.

Beyond five 9s: Lessons from our highest available data planes (ARC349-R):

Colm MacCarthaigh (Senior Principal Engineer, AWS)
How do you build a Tier 0 service that can withstand massive loads and outages? Colm MacCarthaigh explains how AWS's approach to resiliency is rooted in both its teams and its technology.

Release with confidence: Observability for microservices (DEM14-S):

Kevin Crawley (Developer Evangelist, Instana)
With great complexity comes great need for observability. Kevin Crawley shows how Instana applies the three pillars of observability in Kubernetes to help SREs and developers better understand deployment performance.

Reliability best practices

Amazon’s approach to failing successfully (DOP208-R):

Becky Weiss (Senior Principal Engineer, AWS)
Becky Weiss shares Amazon’s top strategies for understanding metrics, maximizing learning from failures, and conducting meaningful postmortems.

Failing successfully: The AWS approach to resilient design (ARC303-R):

David Yanacek (Principal Engineer, AWS Lambda)
Designing for resilience and reliability is key to building sustainable systems. David Yanacek discusses the AWS tools and best practices you can use to achieve this goal.

How Coinbase handles incident management by leveraging AWS (STP207):

Amy Li (SRE, Coinbase) and Lalita Maraj (Infrastructure Engineer, Coinbase)
The Coinbase speakers explain how the company built Misato, an incident bot, using AWS Lambda and Amazon SES to improve its incident resolution process.

Serverless architectural patterns and best practices (ARC307):

Heitor Lessa (Principal Serverless Lead, AWS)
How do you get the most out of serverless patterns without incurring an exorbitant cost? Heitor Lessa shares operational, security, and reliability best practices for serverless architectures.

Book a 1-1 with Blameless at AWS re:Invent!

With AWS re:Invent rapidly approaching, it’s vital to have a plan to make the most of your experience. If you’re passionate about reliability and building a blameless culture, we’d love to talk with you and walk you through how our team can help. Join the conversation on #AWSreInvent and #reInventSRE, or find us at Booth #3130.

If you need more information on how to navigate the conference and build the best schedule, check out these two fantastic guides written by AWS veterans.
