Case Study

LeoLabs

Chretien Mayes is a seasoned aerospace systems engineer. He spent his early career with NASA before joining the private sector. He is an expert in software development, orbital mechanics & flight dynamics, and today leads the Technical Support team for LeoLabs.

Launching upwards with LeoLabs

The vast emptiness of space ain’t what it used to be. Since the day NASA launched astronaut Alan Shepard into space on May 5, 1961, the region surrounding our little green rock has been literally littered with junk cast off from our forays beyond the atmosphere. Now, more than 60 years after Alan Shepard’s first flight, the risk of collisions with defunct satellites and other debris makes putting anything or anyone into orbit a scary proposition. Many essential services rely on satellites, from GPS to internet relays to weather monitoring, so these collisions could have devastating consequences back on Earth. Sounds like the end of space launches, right? Don’t worry, LeoLabs is on it.

Founded in 2016, LeoLabs is at the forefront of monitoring near-Earth objects. They specialize in helping clients make it into space and avoid any nasty collisions once they get there.

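“We’re doing this to help build better situational awareness for customers and ensure a safe operating orbital regime for future generations. Space debris is a very important problem to solve. It’s becoming ever more important each day with more satellite constellations going up.”

Chretien Mayes, Technical Support Lead at LeoLabs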

LeoLabs uses a worldwide array of radar stations to track objects in space. They help customers pinpoint the position of their satellites and manage the risk of collisions with debris and other objects, while also keeping new launches from threatening assets already in orbit. Space flight isn’t what anyone would describe as a forgiving undertaking. Precision, accuracy, and reliability are essential to avoiding catastrophic disasters. Navigating sky trash is a space-age problem, and LeoLabs is a cutting-edge solution.

LeoLabs’ start on their reliability journey

Space is cool. Really cool. That’s why, when Chretien Mayes was first thinking about “real adult jobs,” this industry stood out to him. But space isn’t just rockets flying around and probes to distant planets. Behind every awesome launch is a mountain of data that needs to be available and accurate.

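“Our biggest goal is to ensure that there is continuous real-time data that’s reliable. So from the perspective of technical support, we ensure that disruptions or interruptions to that goal are as minuscule as possible.”

Chretien Mayes, Technical Support Lead at LeoLabs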

Reaching that goal is a challenge that gets tackled in two distinct ways. First, when something’s about to break, detect it as early as possible and prevent it. Second, when something does break, fix it as fast as possible.

To execute that plan, LeoLabs needed to get proactive. Their team needed a way to harness the incident response process to design better preventative measures for the future. They needed a system for learning and adapting to new challenges and implementing new solutions on the fly. Whether that meant new monitoring processes or mitigation techniques, the bottom line was if something broke, it couldn’t be allowed to happen again.

That is a hefty challenge, but when the future of space travel is on the line, you’d better bat 1.000. The LeoLabs team realized that if they were going to succeed, they needed to rethink the way they handled incidents, from response to communication to collaborative learning and remediation. For the LeoLabs team, that all started with bringing incident response into one place.

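“We had processes in place to declare incidents, but they were being handled in a decentralized way. This was making it harder for the broader team to contribute to resolutions and it created some inconsistency in terms of completing follow-up work. Essentially information about any given incident existed in Slack, but it rarely made it any further. You couldn’t locate essential information in the future, because it might have been in a DM, or a random channel, or something like that.”

Chretien Mayes, Technical Support Lead at LeoLabs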

In this sort of decentralized incident response process, incidents were being resolved more slowly and haphazardly than Chretien and the team were happy with. After all, this is space flight, and customer confidence is dependent on impeccable reliability.

When Chretien joined LeoLabs, he started looking for a way to help the tech support team achieve these goals. He discovered that another team was already employing a tool that could prove instrumental for his team: Blameless.

Starting off with Blameless

Chretien first saw the value of Blameless in improving incident tracking. He wanted to eliminate incidents that just lingered without conclusion or those that were started without sufficient information. The standardization, tracking, and tagging features of Blameless immediately appealed to him.

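“The appeal was the ability to create incidents individually, then track different tags and information under it. It was easy to not only track them, but visualize them, plot them, and report them out. It allowed me to report whether it was a customer experiencing an incident, or a product having an incident.”

Chretien Mayes, Technical Support Lead at LeoLabs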

This kicked off a sea change in how people saw incidents across LeoLabs. Chretien was able to track issues over time and present data to stakeholders. This gave managers all the way up the flagpole the perspective they needed to prioritize fixes. Rather than just being annoyances, incidents became guides for future strategic decisions.

To get the full benefits of Blameless, Chretien needed to train his team to be comfortable with using it. He needed to replace their status quo with a new muscle memory. This isn’t a trivial overnight process, but Chretien realized it was also an opportunity to get employees up to speed with the benefits of proper incident management too.

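“I gave some training for escalation response at the start of the year. We focused on Blameless as the core tool. I showed our internal people how to start an incident in Blameless, how to update incidents, and how to make sure you’re setting tasks and completing those tasks and their follow-ups. Helping folks master Blameless and having them use it to initiate escalation procedures is what helped in growing and maturing a lot of people.”

Chretien Mayes, Technical Support Lead at LeoLabs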

Using Blameless makes onboarding and upskilling your team for incident response easier, faster, and more consistent. Rather than expecting engineers to adopt unintuitive ad-hoc processes, Blameless gives them a clear pathway to contribution.

LeoLabs soaring higher with Blameless

As they integrated Blameless into their workflow, LeoLabs saw great results. They were getting closer and closer to perfect reliability.

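“When something broke, we would handle that through an incident using Blameless. We’d do our retrospective follow-up, then usually institute some new type of monitoring and integration alarm. Something that sets off some other code or process to help in alleviating whatever is going out of bounds.”

Chretien Mayes, Technical Support Lead at LeoLabs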

This feedback loop of detecting problems, investigating root causes, and implementing automatic prevention measures is already leading to fewer incidents, faster resolution, and less toil. By addressing incidents through Blameless, they’ve built up a knowledge bank of incident information that helps their team navigate future incidents.

Incident response used to mean one person toiling solo, but now many teammates contribute information and suggestions. LeoLabs is getting bigger contributions from more people.

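“Incidents are more approachable and centralized, we’ve indexed a lot of incident information, and that’s translated to a more resilient system overall.”

Chretien Mayes, Technical Support Lead at LeoLabs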



“We’re doing this to help build better situational awareness for customers and ensure a safe operating orbital regime for future generations. Space debris is a very important problem to solve. It’s becoming ever more important each day with more satellite constellations going up.”

Chretien Mayes

Technical Support Lead at LeoLabs

“Our biggest goal is to ensure that there is continuous real-time data that’s reliable. So from the perspective of technical support, we ensure that disruptions or interruptions to that goal are as minuscule as possible.”

Chretien Mayes

Technical Support Lead at LeoLabs

“We had processes in place to declare incidents, but they were being handled in a decentralized way. This was making it harder for the broader team to contribute to resolutions and it created some inconsistency in terms of completing follow-up work. Essentially information about any given incident existed in Slack, but it rarely made it any further. You couldn’t locate essential information in the future, because it might have been in a DM, or a random channel, or something like that.”

Chretien Mayes

Technical Support Lead at LeoLabs

“The appeal was the ability to create incidents individually, then track different tags and information under it. It was easy to not only track them, but visualize them, plot them, and report them out. It allowed me to report whether it was a customer experiencing an incident, or a product having an incident.”

Chretien Mayes

Technical Support Lead at LeoLabs

“I gave some training for escalation response at the start of the year. We focused on Blameless as the core tool. I showed our internal people how to start an incident in Blameless, how to update incidents, and how to make sure you’re setting tasks and completing those tasks and their follow-ups. Helping folks master Blameless and having them use it to initiate escalation procedures is what helped in growing and maturing a lot of people.”

Chretien Mayes

Technical Support Lead at LeoLabs

“When something broke, we would handle that through an incident using Blameless. We’d do our retrospective follow-up, then usually institute some new type of monitoring and integration alarm. Something that sets off some other code or process to help in alleviating whatever is going out of bounds.”

Chretien Mayes

Technical Support Lead at LeoLabs

“Incidents are more approachable and centralized, we’ve indexed a lot of incident information, and that’s translated to a more resilient system overall.”

Chretien Mayes

Technical Support Lead at LeoLabs