The blameless blog

For incident management, should you build or buy?

Aaron Lober

Is your incident response hanging by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.

At face value, it may seem like designing a process and building a tool internally is the way to go, but doing so sets your team up to fall victim to a perfect storm when something fails. Imagine this:

Your home-grown solution isn’t working quite as expected. The company has grown, and the original solution wasn’t built to support the new size and complexity of your microservices architecture or your engineering team. Unfortunately, the team that built it years ago is no longer with the company. Nothing is properly documented, so it’s super time-consuming to figure out how it works or even where to start. Plus, you have new team members who just came on board and are totally in the dark. Wham! A Sev0 incident occurs, and it’s intense!

Your on-call team is scrambling to find a resolution to the problem as fast as possible. Messages are flying across half a dozen channels in your chat app. The process guardrails you had in place are crumbling, and your executive stakeholders are wondering where it all went wrong. Worst of all, the slow, disorganized response begins to impact customers. The minutes and hours tick by, and money is flying out the door.

If that description sent a chill down your spine, if you’ve lived through the chaos of uncoordinated incident response, you know how NOT FUN this scenario is for everyone involved. If this sounds even remotely plausible for your organization, it’s time to revisit your incident management tool stack and playbook. If you’re working with a homegrown solution, it’s time to reconsider the build vs. buy decision.

Nightmare scenario aside, the idea of building a solution in house can look attractive, especially to an enterprising engineering manager with a product background. You can build a solution to perfectly fit your organization’s specifications. If something goes wrong, you’ll be able to troubleshoot quickly because, after all, you built it. You won’t be beholden to a third party and you won’t be on the hook for annual fees. Plus, building things is what an engineering organization does.

You don’t have to look far to find examples of engineering organizations making this choice. Take Razorpay, for example. They’ve built their own home-grown system and are vocal proponents of the need for a dedicated solution. The truth is, effective incident management NEEDS tooling. It’s non-negotiable at this point. Embracing dedicated incident management tooling, home-grown or store-bought, is better than trying to work a manual process. In a world where solutions are needed now and your engineering organization is confident they CAN build it, the choice to build from within is understandable.

The problem with that thinking is that it misses the hidden costs of development and maintenance, and it ignores the benefit of letting someone else put focused effort into innovation. An engineer can make up to $250,000 a year. Developing a solution takes a team of engineers close to 20% of their time across a quarter, plus another 20% of their time in perpetuity to maintain it and stay on top of its reliability and performance. The bigger your company and the more expansive your microservices architecture, the longer this takes and the more resources it requires. Bottom line: it will be difficult, if not downright impossible, to reliably predict cost.
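To make those numbers concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above ($250,000 per engineer per year, 20% of the team's time for one quarter to build, 20% per year to maintain). The team size, the function names, and the simplification that maintenance is billed per full year after the build are illustrative assumptions, not figures from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class BuildEstimate:
    build_cost: float          # one-time cost of the initial build quarter
    annual_maintenance: float  # recurring cost per year of upkeep


def estimate_build_cost(
    team_size: int,                              # hypothetical: pick your own team size
    annual_cost_per_engineer: float = 250_000,   # upper-bound figure quoted in the text
    build_share: float = 0.20,                   # share of time spent during the build
    maintenance_share: float = 0.20,             # share of time spent on upkeep
    build_years: float = 0.25,                   # one quarter
) -> BuildEstimate:
    """Estimate the salary cost of building and maintaining an in-house tool."""
    team_cost = team_size * annual_cost_per_engineer
    return BuildEstimate(
        build_cost=team_cost * build_share * build_years,
        annual_maintenance=team_cost * maintenance_share,
    )


def total_cost(estimate: BuildEstimate, years_maintained: float) -> float:
    """Build cost plus maintenance over the given number of years (assumed linear)."""
    return estimate.build_cost + estimate.annual_maintenance * years_maintained


if __name__ == "__main__":
    estimate = estimate_build_cost(team_size=5)  # team of 5 is an assumption
    print(f"Build: ${estimate.build_cost:,.0f}")
    print(f"Maintenance per year: ${estimate.annual_maintenance:,.0f}")
    print(f"Three-year total: ${total_cost(estimate, 3):,.0f}")
```

For a hypothetical team of five, that works out to roughly $62,500 for the initial build and $250,000 a year in maintenance, or about $812,500 over three years. The sketch counts salary time only. It leaves out opportunity cost, onboarding, and the growth in complexity described above, which is exactly why the real number is so hard to pin down.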

Of course, this ignores the additional cost of any innovation on the solution itself. By working with a third party software provider, you benefit from the development, maintenance and innovation all rolled into your upfront purchase cost. Remember, this is the software provider’s business. In this case, every single one of their engineering teams is focused on developing the best incident management solution out there. Their livelihoods depend on it. Literally. Expecting your internal team to match that level of investment is unrealistic.

“Despite our best efforts, we could never really give adequate resources to the thing we built internally… It was just more work than it was worth,”
— Staff SRE, Major Software Manufacturer

The biggest problem with creating an in-house solution is that it takes your team’s focus away from the thing that keeps the lights on: BUILDING GREAT PRODUCTS FOR YOUR CUSTOMERS.

Incident management software is meant to unlock the potential of your engineering team. Reduced operational load, less toil, faster resolutions, and better reliability are all possible. But realizing any or all of these benefits requires a combination of tooling and process that reduces friction and allows engineers to focus on the highest value activities in front of them. Working with a software vendor allows your engineering team to focus on innovation within your own product, while still delivering a great user experience for your customers.

There are a few particular conditions that make it especially important for you to consider purchasing incident management software rather than building from within.

When Should I Buy Incident Management Tooling?

[Graphic: factors that may make it better to buy incident management software: scaling, unpredictable costs, and engineering teams losing focus]

Buying software rather than building it in-house isn’t without its own risks. Any tool you buy runs the risk of not mapping perfectly to your established process. There’s also the risk of the software provider not delivering on their commitments. Maybe they struggle to deliver the roadmap they promised. It’s equally possible that the vendor experiences their own reliability struggles. A major outage on their side has the potential to impact your team as much as the failure of your in-house solution. All of these risks make it critically important for you to select the right software provider. When you work with the right software vendor, however, not only can you trust them to deliver a reliable product experience, you can also play a role in guiding how they innovate and benefit from their market reach.

Bottom line: if you wait until an internal tool fails to select a dedicated incident management software provider, it’s too late. Your team is already absorbing costs that may not be obvious and that show up in many parts of your business, not the least of which is slower development of your core product.

If you’re interested in learning more about the common inputs to the build vs. buy decision, download our infographic. Then, visit www.blameless.com/trial for a free trial of the industry’s leading incident management solution.