LISA21 - Groove with Ambiguity: The Robust, the Reliable, and the Resilient
The networked software systems we build are increasing in complexity every moment. Today the most successful builders and operators are embracing complexity through CI/CD, Chaos Engineering, and innovation in Incident Response. They realize that the adaptive world around us is advancing at such a breakneck speed, it is leaving our capacity to understand it in the dust. That humans and technology must race a gauntlet of automation surprises and collaboration challenges as a team, learning and improving along the way. This session showcases methods of deploying, running, and navigating complexity. It offers a practical view of how software systems can scale and remain robust to failure (like fallbacks or high availability), achieve highly reliable socio-technical operations (via runbooks and game days), and adapt to surprise through techniques of resilience engineering (graceful extensibility and building for adaptation).
Matt is a Sr. Infrastructure Engineer at Blameless. His expertise brings to bear a variegated background including data-center operations, storage hardware and distributed databases, IT security, site reliability, support services, observability systems, and techops leadership. He has a passion for exploring the relationships between the artistic mind and operating distributed software architectures.