
SRE: From Theory to Practice | What’s difficult about tech debt?

Emily Arnott
|
8.4.2022

In episode 3 of From Theory to Practice, Blameless’s Matt Davis and Kurt Andersen were joined by Liz Fong-Jones of Honeycomb.io and Jean Clermont of Flatiron to discuss two words dreaded by every engineer: technical debt. So what is technical debt? Even if you haven’t heard the term, I’m sure you’ve experienced it: parts of your system that are left unfixed or not quite up to par, but that no one seems to have the time to work on.

Pretend your software system is a house. Tech debt is the leak in your sink that you haven’t gotten around to fixing yet. Tech debt is the messy office you haven’t organized in a while. It’s also the new shelf you bought but haven’t installed. To-dos quickly build up over time. Even if certain tasks are quick, there are just so many of them that it’s tough to know where to start.

As software systems become more complex, with more integrations, microservices, and features, it becomes more likely that technical debt will accumulate. How do you get ahead of it before it piles up? And how do you deal with tech debt you already have, without sacrificing velocity on new projects?

It’s a question every engineer faces. We were excited to have two experts join our panel to dive into the issue. Liz Fong-Jones is an SRE with 16 years of experience in improving the reliability of sociotechnical systems. She’s seen a wide range of tech debt, from the challenges high-velocity startups face in avoiding it, to paying down tech debt built up over years at major enterprise orgs. Jean Clermont is a program manager at Flatiron, a medical tech company. Handling incident management and building resilience when dealing with something as critical as the human body requires a proactive mind for long-term issues, including tech debt.

Watch the recording to hear their insights. And as I did for episode one and episode two, I’ll be summarizing three key insights from their discussion!

Our four panelists discussing tech debt

Paying off tech debt means really knowing where it is

It can be difficult to track where tech debt is accumulating. Even when you have the opportunity to proactively reduce tech debt, how do you know what to tackle first? This is a question that has to be answered both proactively and reactively.

Proactively, you should try to log technical debt as it’s created. Sometimes technical debt is inevitable. You might need to implement a fix or feature that has a toil-intensive process to maintain it, or is unintuitive when expanding on it. Matt suggests logging whenever making a change that creates these issues. That way, when you’re looking to reduce tech debt, you have a list ready of issues to address. You can also use this list when estimating how long future tasks and updates will take – you can compensate for the delays that the tech debt will cause when working in specific areas of the codebase. Judging these delays can motivate you to deal with the most damaging tech debt.
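As a rough illustration of what such a log could look like, here is a minimal Python sketch. The `DebtEntry` fields, the `delay_days` guess, and the sample entries are all hypothetical, not something the panel prescribed; a ticket label or a spreadsheet would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DebtEntry:
    """One logged piece of tech debt (all fields are illustrative assumptions)."""
    area: str          # part of the codebase the debt lives in
    description: str   # what was left unfixed and why
    logged_on: date    # when the debt-creating change was made
    delay_days: float  # rough guess at the extra time it adds to work in that area


def padded_estimate(base_days: float, area: str, register: list[DebtEntry]) -> float:
    """Add the logged delays for `area` to a baseline task estimate."""
    return base_days + sum(e.delay_days for e in register if e.area == area)


def most_damaging(register: list[DebtEntry], top: int = 3) -> list[DebtEntry]:
    """Return the entries with the largest estimated delay, worst first."""
    return sorted(register, key=lambda e: e.delay_days, reverse=True)[:top]


# Hypothetical sample data
register = [
    DebtEntry("billing", "manual step to rotate API keys", date(2022, 3, 1), 1.5),
    DebtEntry("billing", "untested retry logic", date(2022, 5, 12), 2.0),
    DebtEntry("search", "hard-coded index names", date(2022, 6, 20), 0.5),
]

print(padded_estimate(5.0, "billing", register))
print([e.description for e in most_damaging(register, top=1)])
```

Sorting by estimated delay is only one way to rank entries; a team might weigh customer impact or toil instead.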

Liz pointed out that focusing entirely on this “known” tech debt can “lull you into a false sense of security”. A lot of tech debt is created without people noticing, in small decisions that have unexpected consequences. Jean described tech debt as like an “iceberg”, where the vast majority of it could be hidden below the surface. Running up against this unknown tech debt is likely to be even more damaging than expected tech debt, as you won’t be able to proactively compensate for it.

So how do you find this tech debt “beneath the surface”? You need to look for the symptoms of it. Liz highlights the two major ways tech debt manifests: making it harder to develop new software, and increasing the toil required to maintain the system. Look for common processes that are very toilsome, or review projects that hit a lot of unexpected hurdles. Underlying tech debt could be the cause.
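One way to look for the second symptom is to compare planned and actual effort across past projects. The sketch below is only an assumption about how a team might do that: using schedule overrun as a proxy for "unexpected hurdles", the 1.5 threshold, and the sample project data are all hypothetical.

```python
from collections import defaultdict


def overrun_by_area(projects):
    """Map each area to actual/planned effort.

    `projects` is an iterable of (area, planned_days, actual_days) tuples.
    """
    planned = defaultdict(float)
    actual = defaultdict(float)
    for area, planned_days, actual_days in projects:
        planned[area] += planned_days
        actual[area] += actual_days
    return {a: actual[a] / planned[a] for a in planned if planned[a] > 0}


def suspects(projects, threshold=1.5):
    """List areas whose overrun ratio meets the (arbitrary) threshold, worst first."""
    ratios = overrun_by_area(projects)
    flagged = [a for a, r in ratios.items() if r >= threshold]
    return sorted(flagged, key=lambda a: ratios[a], reverse=True)


# Hypothetical project history: (area, planned days, actual days)
projects = [
    ("checkout", 10, 20),
    ("checkout", 5, 10),
    ("search", 8, 9),
    ("reports", 4, 7),
]

print(suspects(projects))
```

A high ratio does not prove tech debt is the cause; it only marks an area worth a closer look.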

Paying down tech debt incrementally is best

It can be tempting to make proclamations like “we’ll spend until the end of the quarter dealing with all our tech debt, and then we’ll be fine going forward”. It’s the same behavior people have towards financial debt, or other tasks they’ve been avoiding. Rather than having to deal with them as they crop up, one imagines that the future will bring a totally different attitude or opportunity that allows easy cleanup of the neglected tasks.

Unfortunately, this plan doesn’t usually work out. If you aren’t in the habit of dealing with tech debt continuously, switching gears into focusing on it will be jarring and limit productivity. You likely won’t have a good understanding of where to start paying down tech debt if you usually ignore it. Moreover, you’ll immediately start accumulating tech debt again, without having the habits in place to track it and deal with it.

Instead, deal with tech debt incrementally, solving small parts as you become aware of them. Liz suggested that on-call engineers look for chances to invest in battling tech debt. Working on documentation and runbooks can help counter the toil of tech debt. Even if you don’t have the bandwidth to overhaul the code itself, having documentation and processes will reduce the problems tech debt causes. It also highlights where effort should be spent to make fundamental changes when possible. Every step you take to reduce tech debt helps more than waiting for the perfect time to try to wipe it all out.

Incentivize dealing with tech debt

Paying down tech debt isn’t always the most glamorous or visible work. Unfortunately, compared to developing new features that get celebrated releases, cleaning up or documenting old code may not get the same recognition. Rather than try to ignore this disparity between the types of work and hope that people will rise to the challenge of tech debt regardless, try to find ways to make tech debt work more celebrated.

One option, suggested by Matt Davis, is to set up programs like “bug bounties” for troublesome pieces of tech debt. Whoever finds the time and energy to rectify the tech debt would be able to claim the bounty. This bounty could be an actual financial reward, or some recognition in a team-wide or organization-wide meeting.

Another option is to add clearing tech debt to the requirements for sprints and projects, alongside new feature work. For example, a project plan could include three new features and one major piece of tech debt dealt with. This plan, suggested by Kurt, puts tech debt on the same level as feature development, equally instrumental in finishing the project. This helps the work get recognized, as it’s tracked alongside the rest of the new feature work.

How does your organization tackle tech debt? Let us know by joining the conversation in our community Slack channel! Blameless helps you manage tech debt by highlighting incidents that have the most customer impact with our SLOs and reliability insights, showing you where damaging tech debt could be lurking. See more by signing up for a demo!
