Customer Story

Vital Safeguards Patient Experience with Blameless

Vital’s software for hospital emergency departments uses artificial intelligence to give patients visibility into their visit, including wait times, tests, and length of stay.

This brings peace of mind to patients during stressful times in the ER. It also benefits healthcare providers as patient experience is the main driver of insurance payouts. This innovative team has also recently risen to the challenges of COVID-19 by providing a free coronavirus checker tool.

Every second counts in a critical industry such as healthcare, so the Vital engineering team focuses on efficient incident response, according to Engineering Manager Rafael Fonseca.


It's impossible to not have incidents, but we can mitigate their impact by regular training and checking that our knowledge on how the process works is crystal clear.

The goal of Vital's incident response program is to stress test and optimize three things: 1) monitoring, 2) detection and response, and 3) communications. 

By using Blameless, the team was able to scale up and codify incident response processes.

Hardening Incident Response: Where to Start?

Vital believes in the value of preparedness, which is why its engineering team has run regular game days since its inception to practice for the inevitable incident. As the team expanded, it became clear that the priority was not how often it introduced failure, but how quickly its members could bring the system back to a healthy state. Before breaking the system, the team needed to build the foundation of the process.

Additionally, keeping track of documentation around services, architectural diagrams, and other critical assets for triage was toilsome.

Pain Points before Blameless

  • No guardrails around incident response protocols
  • Low team confidence in the ability to recover from incidents
  • Toilsome documentation process

Goals

  • Drive incident response excellence while conserving engineering time
  • Automate incident tasks and democratize knowledge (eliminate ‘hand holding’)
  • Create a culture of preparedness and learning from failure


We know systems break down, it’s a natural part of the process. Instead of aimlessly hoping to avoid breakdowns, we make sure that we can recover faster from them. That's where Blameless came in: to help drive the recovery process and codify it.

The Solution: Transforming Culture, Ownership, and Confidence through Guardrails

With Blameless, incident coordination now takes seconds, and the team aligns on tasks from the very beginning.

Following the checklist guides Vital in the right direction.


We pick up habits around incident response, but I'd never found a solution that tells you where to start. You follow the prompts and it will run through your incident response process. That's when the light bulb went off for me.

With incident response kicked off automatically, Vital saves engineering hours.

With less cognitive overhead, the team is able to free up bandwidth to focus on critical decision-making.


Blameless saves us a full time engineer dedicated to just incident response. If Blameless wasn't available, I'd have to hire another person. It would be really difficult to perform at the same level of incident response excellence without it.

Vital's customers typically have SLAs that the company must work within.

Blameless helps the Vital team enforce operational rigor that allows them to stay compliant with SLAs. 


It's interesting using Blameless for this type of compliance, even though we don't have this word written anywhere. I love that I know I'm being compliant with an SLA without using the word compliance at all in that process.

Reliability Toolchain

  • Blameless
  • Slack
  • JIRA
  • Datadog
  • PagerDuty

Blameless seamlessly integrates with Vital’s toolchain and works out-of-the-box. Additionally, Vital loves the Blameless team’s responsiveness to ongoing product feedback and requests. 


The support team has been amazing. Feature requests we share in our dedicated Slack channel are always responded to. Not only is the product useful, but the team behind it does an amazing job at filling in gaps and taking in feedback.

The Business Impact

With Blameless, the team has noticed a significant cultural shift: engineers ship code with more confidence, knowing that when an incident occurs, the team will be able to handle it. Here are some other benefits:

  • A process and single source of truth for scaling incident response
  • Easier adaptation to remote work through ChatOps and integrated collaboration
  • More confidence in the team’s ability to respond to and recover from incidents


I don't want to even think about life without Blameless. It would mean a lot of work for me to try and document all the processes, all the hand-holding, that Blameless sets up for us behind the scenes.

What’s Next

As part of its goal to streamline communication, Vital has been leveraging Blameless’ internal comms feature to simulate posting status updates and continue honing the flow of task assignments. 

Vital is also looking to converge on a service registry in Blameless, to enrich services with component information such as wikis, monitoring, incidents, and more. While some of this is currently done through tags, the team looks forward to improving visibility and context by creating a tighter linkage between dynamic service components. 


It's like my AWS bill. I can pay it happily, knowing that it's the thing that I use to run the system; it's the baseline of the product. Blameless is the same thing: it’s a part of our culture. We wouldn’t be able to do as good a job without it. It makes my job easier.

The main focus of the company always remains: ‘If everything goes wrong, how do we ensure that patients are safe and receiving the care they need?’

Partnering with Blameless, Vital is able to scale its incident response process to meet growing needs. This leads to better platform reliability and happier, safer doctors, nurses, and patients.


We are now much more confident as a team and as a company that when there is an incident, we'll be able to respond in a much more organized way, reducing the impact on our customers.