What To Do When There’s a Bug in Production

There is nothing quite as bone-chilling to a software tester as the realization that a bug has been found in Production!  In this post, I’ll walk through a series of steps testers can take to handle Production bugs and prevent them in the future.

Step One: Remain Calm

Because we are the ones who are testing the product and signing off on the release, it’s easy to panic when a bug is found in Production.  We ask ourselves “How could this have happened?” and we can be tempted to thrash around looking for answers.  But this is not productive.  Our top priority should be to make sure that the bug is fixed, and if we don’t stay calm, we may not investigate the issue properly or test the fix properly.

Step Two: Reproduce the Issue

If the issue came from a customer, or from another person in your company, the first thing to do is to see if you can reproduce the issue.  It’s possible that the issue is simply user error or a configuration problem.  But don’t jump to those conclusions too quickly!  Make sure to follow any steps described by the user as carefully as you can, and wherever possible, make sure you are using the same software and hardware as the user: for example, use the same type of mobile device and the same build; or the same type of operating system and the same browser.  If you are unable to reproduce the issue, ask the user for more information and keep trying.  See How to Reproduce A Bug for more tips.

Step Three: Gather More Information

Now that you have reproduced the issue, gather more information about it.  Is the issue happening in your test environment as well?  Is the issue present in the previous build?  If so, can you find a build where the issue is not present?  Does it only happen with one specific operating system or browser?  Are there certain configuration settings that need to be in place to see the issue?  The more information you can gather, the faster your developer will be able to fix the problem.
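Finding a build where the issue is not present goes faster if you bisect your builds rather than checking them one by one.  Here is a minimal Python sketch of that idea.  The names first_bad_build and is_broken are hypothetical, and it assumes your builds are ordered oldest to newest and that the bug, once introduced, is present in every later build.  In practice, is_broken would be you installing that build and following your reproduction steps.

```python
from typing import Callable, Optional, Sequence


def first_bad_build(
    builds: Sequence[str], is_broken: Callable[[str], bool]
) -> Optional[str]:
    """Return the earliest build in which the bug appears, or None.

    Assumes `builds` is ordered oldest to newest and that once the bug
    appears it stays in every later build.
    """
    if not builds or not is_broken(builds[-1]):
        return None  # nothing to search, or the newest build is fine
    low, high = 0, len(builds) - 1  # invariant: builds[high] is broken
    while low < high:
        mid = (low + high) // 2
        if is_broken(builds[mid]):
            high = mid       # bug already present here: look earlier
        else:
            low = mid + 1    # bug absent here: it was introduced later
    return builds[low]


# Hypothetical build names; the set stands in for a manual check.
builds = ["1.0", "1.1", "1.2", "1.3", "1.4"]
broken = {"1.2", "1.3", "1.4"}
print(first_bad_build(builds, lambda b: b in broken))  # prints 1.2
```

With five builds this takes three checks instead of up to five, and the savings grow quickly as the list of builds gets longer.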

Step Four: Understand the Root Cause

At this point, your developer will be working to figure out what is causing the bug.  Once they have found it, ask them to explain what the problem was, and make sure that you understand it.  This will help you figure out how to test the fix and determine which regression tests to run.

Step Five: Decide When to Fix the Issue

When there’s a bug in Production, you will want to fix it immediately, but that is not always the best course of action.  I’m sure you’ve encountered situations where fixing a bug created new ones.  A rushed fix makes that more likely, so allow enough time to test any areas that might be impacted by the fix.

When trying to decide when to fix a bug, think about these two things:

  • how many users are affected by the issue?
  • how severe is the issue?

You may have an issue where fewer than one percent of your users are affected.  But if the bug is so severe that those users can’t use the application, you may want to fix the bug right away.

Or, you may have an issue that affects all of your users, but the issue is so minor that their experience won’t be impacted.  For example, a misaligned button might not look nice, but it’s not stopping your users from using the application.  In this case, you might want to wait until your next scheduled release to fix the issue.
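The two questions above can be written down as a small decision helper, which is handy if your team wants a consistent starting point for the conversation.  This is only a sketch in Python: the Severity levels, the fix_timing name, and especially the 25 percent cutoff for the middle case are my own assumptions, not a rule, so adjust them to fit your product.

```python
from enum import Enum


class Severity(Enum):
    COSMETIC = 1  # e.g. a misaligned button
    DEGRADED = 2  # feature works, but only with a workaround (assumed level)
    BLOCKING = 3  # users cannot use the application


def fix_timing(severity: Severity, affected_fraction: float) -> str:
    """Suggest when to ship a fix. The cutoff below is illustrative only."""
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0 and 1")
    if severity is Severity.BLOCKING:
        return "hotfix"        # severe, even if few users are affected
    if severity is Severity.COSMETIC:
        return "next release"  # minor, even if every user sees it
    # Middle ground: 0.25 is a made-up threshold, tune it for your team.
    return "hotfix" if affected_fraction >= 0.25 else "next release"


print(fix_timing(Severity.BLOCKING, 0.005))  # prints hotfix
print(fix_timing(Severity.COSMETIC, 1.0))    # prints next release
```

The output is a suggestion, not a verdict; the final call still belongs to the team.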

Step Six: Test the Fix

When you test the bug fix, don’t check it just once.  Be sure to check the fix on all supported browsers and devices.  Then run regression tests in any areas affected by the code change.  If you followed Step Four, you’ll know which areas to test.  Finally, do a quick smoke test to make sure no important functionality is broken.
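If it helps to see the order of operations written down, here is a rough Python sketch that turns the paragraph above into a checklist.  The fix_test_checklist function and its arguments are hypothetical, and it assumes every browser is supported on every device, which may not be true for your product.

```python
from itertools import product
from typing import List, Sequence


def fix_test_checklist(
    browsers: Sequence[str],
    devices: Sequence[str],
    affected_areas: Sequence[str],
) -> List[str]:
    """Build an ordered checklist: verify the fix, then regression, then smoke."""
    # One verification item per browser/device combination.
    checklist = [
        f"Verify fix on {browser} / {device}"
        for browser, device in product(browsers, devices)
    ]
    # Regression items come from the areas identified in Step Four.
    checklist += [f"Regression test: {area}" for area in affected_areas]
    # Finish with a single smoke test of important functionality.
    checklist.append("Smoke test: core functionality")
    return checklist


for item in fix_test_checklist(["Chrome", "Firefox"], ["desktop", "phone"], ["login"]):
    print(item)
```

For two browsers, two devices, and one affected area, this produces six items: four verifications, one regression test, and the smoke test.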

Step Seven: Analyze What Went Wrong

It’s tempting to breathe a big sigh of relief and then move on to other things when a Production bug is fixed.  But it’s very important to take the time to figure out exactly how the bug got into Production in the first place.  This is not about finger-pointing and blame; everybody makes mistakes, whether they are developers, testers, product owners, managers, or release engineers.  This is about finding out what happened so that you can make changes to avoid the problem next time.

Perhaps your developers made a code change and forgot to tell you about it, so it wasn’t tested.  Perhaps you tested a feature, but forgot to test it in all browsers, and one browser had an issue.  Maybe your product owner forgot to tell you about an important use case, so you left it out of your test plan.

Whatever the issue, be sure to communicate it clearly to your team.  Don’t be afraid to take responsibility for whatever your part was in the issue.  See my post on Extreme Ownership for more details.

Step Eight: Brainstorm Ways to Prevent Similar Issues in the Future

Now that you know what went wrong, how can you prevent it from happening again?  Discuss the issue with your team and see if you can come up with some strategies.

You may need to change a process: for example, having your product owner sign off on any new features in order to make sure that nothing is missing.  Or you could make sure that your developers let you know about any code refactoring so you can run a regression test, even if they are sure they haven’t changed anything.

You may need to change your strategy: you could have two testers on your team look at each feature so it’s less likely that something will be overlooked.  Or you could create test plans that automatically include a step to test in every supported browser.

You may need to change both your process and your strategy!  Whatever the situation, you can now view the bug that was found in Production as a blessing, because it has resulted in you being a wiser tester, and your team being stronger.