Recently I watched the Apple TV+ series Silo. The show focuses on the lives of 10,000 people living in an underground silo. They know that their ancestors have lived in the silo for hundreds of years, but they don’t know why, and they don’t know why it’s unsafe to leave.
The silo is powered by a generator and maintained by a team called Mechanical. In the third episode of the show, it’s made clear that the generator that powers the silo hasn’t worked correctly in 30 years, and it’s rapidly moving toward failure. Of course, this made me immediately think of software tech debt! In this post, I’ll examine the eight steps a team should take to address tech debt, with examples from Silo and from a project I worked on several years ago.

Step One: Be aware of the problem
In the silo, Mechanical was well aware of the problem with the generator, but no one else was aware, not even the mayor. The first step in fixing the problem was letting the mayor know that there were serious issues with the generator.
In the real world, I worked at a company that had an important payment module. The module was being built on an already overloaded server. Only a few developers knew about the problem until they raised it with management.
It’s easy to stick one’s head in the sand when there is a tech debt problem, but ignoring the problem is not going to make it go away. Viewing the problem clearly is the first step in determining how it can be fixed.
Step Two: Make a case for why the tech debt should be fixed
The generator in the silo was the only thing keeping the power running and the lights on. If the generator failed, there would be no way to fix it and the residents of the silo would be plunged into permanent darkness. Without power, there would be no way to grow food and the entire population would collapse.
At my company, if the load on the server became too great, all software on the server would begin to fail, and there wouldn’t be any good way to update it. Response times for regular user activities would slow to a crawl.
In both scenarios, it was important for the people who understood the problem to explain to management the consequences of leaving it unfixed. Decision-makers can then approve the fix and allocate the resources it needs.
Step Three: Understand the risks involved with fixing the tech debt
Tech debt is not just a risk in itself; fixing it can also pose a significant risk. In the case of the silo, the generator would need to be turned off in order to be repaired. There was a chance that it might not start again after it was stopped. Additionally, if the generator was stopped for too long, the reactor would begin heating up and would eventually explode.
In the case of the payment module, the module would need to be moved to its own server to address the tech debt. If this move wasn’t successful, millions of people wouldn’t be paid.
It’s important to communicate to management exactly what the risks are with fixing the debt so they can make an informed decision. In some rare cases, the risk of fixing the debt can be greater than the risk of not fixing it.
Step Four: Formulate a plan with all the stakeholders
In Silo, the Mechanical team needed to discuss the plan to fix the generator with the mayor and the deputy sheriff, because the entire populace would be without power during the fix. The mayor and the deputy needed to put together a plan that would keep people from panicking during the process.
At the company where I worked, we needed to discuss moving the payment module to another server with both the team that maintained the payment software and the team that oversaw the weekly payments.
All stakeholders should know how the problem will be fixed and what the potential impact will be while the work is underway.
Step Five: Limit the scope of the project
The silo’s Mechanical team had only thirty minutes to fix the generator before the reactor exploded, so they knew they had to focus on the most important fix. They concentrated on the large rotor that was out of alignment, and while it was being fixed, they did only some light hammering on other dented areas.
When I was testing the payment module, I discovered two bugs. We were told we couldn’t fix them because the fixes would have required additional testing that would have complicated the project. Both bugs had workarounds, so we left them alone.
When you’re dealing with tech debt, it’s important to focus on the most critical problem rather than getting sidetracked by other issues.
Step Six: Have a rollback plan
When fixing tech debt is risky, be sure to have a rollback plan. In the silo, the only rollback plan was to turn the generator back on and leave the rotor unfixed. This wasn’t a great solution, but it would at least mean that they could keep the lights on for a few more months.
With the payment module, we had a way to move the code back to the original server if the migration to the new server didn’t work.
Step Seven: Communicate to everyone affected
In Silo, the mayor knew that it was extremely important to alert the population that the lights would be going out. Anyone unaware would panic. So she broadcast a message to the entire silo, and sent out deputies to make sure everyone was either in their living quarters or in a group shelter while the lights were out.
For the payment module, the stakeholders already knew about the project, so they only needed to be told when the move to the new server would take place.
When service is likely to be disrupted or users will see a big change, it’s very important to communicate that information in advance.
Step Eight: Don’t go it alone
While Juliette, the main character in Silo, was a loner who often insisted on doing maintenance herself, she understood that fixing the generator would require the effort of the entire Mechanical team. Some team members worked on removing the outer plates of the generator, some worked on fixing the rotor, and some monitored the pressure of the reactor.
Similarly, when we moved the payment module from one server to another, we had a whole team of people watching for error messages and checking that the payments went through.
Tech debt is best fixed when the whole team collaborates to ensure that the changes will not negatively affect end users.
Conclusion
Your tech debt is most likely not dangerous enough to wipe out an entire community, as the failing generator threatened to do in Silo! But the principles outlined here can make fixing tech debt a rewarding, low-stress exercise that will benefit your customers.