A few years ago, I wrote a blog post detailing why I thought toggles were a bad idea. It made a clever analogy between toggles and the tribbles on Star Trek’s U.S.S. Enterprise. I think it’s a fun read, so you may want to check it out, but my opinion has changed a bit since I wrote it. In this post I will explain why I now think toggles can be helpful, and I’ll propose some rules for their use.
About a year ago, my team was working on a new notification service that would send out emails and messages more efficiently than the current service. When the new service was ready, we migrated one notification type to the new service to see how it would work. We tested the notification extensively and we were sure that we had accounted for all scenarios, so we took the new service to Production.
A couple of weeks later, we discovered that there was an odd case that we hadn’t tested. If two users in the same company had the same id, the wrong user was getting the notification. We had no idea that it was possible for two users in the same company to have the same id, so we hadn’t thought to test this.
Fortunately, our new service was behind a toggle. Since we certainly didn’t want the wrong people to get notifications, we quickly toggled off the new service. There was no impact on any other customers: they still got their notifications, just through the old service. We were able to quickly fix the bug, get the fix into Production, and toggle the service back on.
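To make the idea concrete, here is a minimal sketch of what routing notifications through a toggle might look like. This is not our actual code: the ToggleStore class, the toggle name new_notification_service, and both send functions are hypothetical stand-ins, and a real toggle would usually live in a config service or database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToggleStore:
    """Hypothetical in-memory stand-in for wherever toggle state is kept."""
    flags: dict = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so the old path remains the fallback.
        return self.flags.get(name, False)


def send_via_old_service(user_id: int, message: str) -> str:
    # Placeholder for the existing notification path.
    return f"old:{user_id}:{message}"


def send_via_new_service(user_id: int, message: str) -> str:
    # Placeholder for the new notification path.
    return f"new:{user_id}:{message}"


def send_notification(toggles: ToggleStore, user_id: int, message: str) -> str:
    # The toggle is checked on every send, so flipping it takes effect
    # immediately and no deployment is needed to fall back.
    if toggles.is_enabled("new_notification_service"):
        return send_via_new_service(user_id, message)
    return send_via_old_service(user_id, message)
```

With something like this in place, turning the toggle off sends every notification back through the old path while the bug is fixed, and turning it on again restores the new one.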
If we hadn’t had the toggle, the users with the same id would have continued to get the wrong notifications until we were able to fix the bug. We would have had to rush to get a code patch into Production, and it’s possible that we would have made mistakes along the way. Because we had the toggle, we could take the time to make sure that the fix was good, and we could do all the regression testing we wanted.
So, I’ve changed my mind about toggles. I think they can be useful in situations where there’s a significant risk that accompanies a change. But if you are going to use toggles, please observe the following rules:
1. Toggles are NOT a substitute for high-quality testing. Being able to toggle something off at the first sign of trouble does not mean that you can skip testing your new feature thoroughly. Ideally you should have tested so well that you never need to turn your toggle off.
2. Make sure to test your feature with the toggle on AND with the toggle off. You don’t want to discover in the middle of dealing with a problem in Production that the toggle doesn’t actually work!
3. Once the feature has been in Production long enough that you trust it, remove the toggle so that the feature is on permanently. Otherwise, months from now, someone could inadvertently toggle the feature off. Also, the fewer toggles you have in your application, the fewer combinations of toggles you need to test.
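Rules 2 and 3 can both be backed by small automated checks. The sketch below is only an illustration under my own assumptions: deliver is a hypothetical stand-in for the feature under test, and the 90-day limit for flagging old toggles is an arbitrary number, not a recommendation.

```python
from datetime import date, timedelta


def deliver(toggle_on: bool, user_id: int) -> tuple:
    # Hypothetical feature under test: reports which path handled the
    # notification and which user received it.
    path = "new" if toggle_on else "old"
    return (path, user_id)


def check_both_toggle_states(user_id: int) -> dict:
    # Rule 2: run the same check with the toggle on AND with it off.
    results = {}
    for toggle_on in (True, False):
        path, delivered_to = deliver(toggle_on, user_id)
        assert delivered_to == user_id, "notification reached the wrong user"
        results[toggle_on] = path
    return results


def stale_toggles(created_on: dict, today: date, max_age_days: int = 90) -> list:
    # Rule 3: list toggles older than the agreed limit so that someone
    # remembers to remove them. The 90-day default is an assumption.
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, created in created_on.items() if created < cutoff)
```

For example, stale_toggles({"new_notification_service": date(2024, 1, 1)}, date(2024, 6, 15)) would report that toggle as overdue for removal.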
As with many things in software development, the best strategies are those that ensure the best possible outcome for our end users. Used wisely, toggles can limit the impact of unexpected issues found in Production.
Your second point, "2. Make sure to test your feature with the toggle on AND with the toggle off." is spot on! We work with feature toggles occasionally, usually when the feature is so big or complex that it spans multiple releases before we want to officially launch the product. Since we know the feature will be "off" for an amount of time, it is so important that we fully regression test the site in an "off" state to ensure we're not introducing anything, and also in the "on" state so we know the toggle and the feature are working as we expect.
Yes! This is especially important for those big toggles. Thanks for your comment, Amanda!