Has this ever happened to you? You arrive at work in the morning to find that many of your nightly automated tests have failed. Upon investigation, you discover that your test user has been edited or deleted. Your automation didn’t find a bug, and your test isn’t flaky; it simply didn’t work because the data you were expecting wasn’t there. In this week’s post, I’ll take a look at five different strategies for managing test data, and when you might use each.
Strategy One: Using data that is already present in the system
This is the easiest strategy, because there's nothing to do for setup, but it is also the riskiest. Even if you label your user with "DO NOT REMOVE," there's always a chance that some absent-minded person will delete it.
However, this strategy can work well if you are just making simple requests. For example, if you are testing a request that returns a list of contacts, you can assert that contacts were returned. For the purposes of your test, it doesn't matter which contacts came back; you just need to know that some did.
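As a minimal sketch, a check like this captures "some contacts came back" without pinning down which ones. The response shape (a JSON array of contact objects) is an assumption for illustration, not taken from any real API:

```python
def check_contacts_response(body):
    """Pass if the response contains at least one contact; we don't care which."""
    assert isinstance(body, list), "expected a JSON array of contacts"
    assert len(body) > 0, "expected at least one contact"
    return True

# Whatever contacts happen to exist in the system will satisfy this test.
sample_response = [{"id": 17, "name": "Whoever Is There"}]
check_contacts_response(sample_response)
```

Because the assertion is deliberately loose, the test keeps passing even as other people add, edit, or rename contacts, as long as at least one remains.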
Strategy Two: Updating or creating data as a setup step
Most automated test platforms offer the ability to create a setup step that either runs before each test or before a suite of tests. This strategy works well if it’s easy to create or update the record you want to use. I have a suite of automated API tests that test adding and updating a user’s contact information. Before the tests begin, I run requests that delete the user’s email addresses and phone numbers.
The downside to this strategy is that sometimes my requests to delete the user’s contact information fail. When this happens, my tests fail. Also, updating data as a setup step adds more time to the test suite, which is something to consider when you need fast results.
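Here is a sketch of the setup-step pattern. `FakeContactStore` is a hypothetical in-memory stand-in for the API under test; in a real suite, the setup function would send the delete requests to the actual service:

```python
class FakeContactStore:
    """Hypothetical stand-in for the user's contact-info API."""

    def __init__(self):
        # Leftover data from previous runs.
        self.emails = ["stale@example.test"]
        self.phones = ["555-0100"]

    def delete_emails(self):
        self.emails.clear()

    def delete_phones(self):
        self.phones.clear()

    def add_email(self, address):
        self.emails.append(address)


def setup_contact_info(store):
    """Setup step: delete the user's email addresses and phone numbers
    so every test starts from a known-empty state."""
    store.delete_emails()
    store.delete_phones()


store = FakeContactStore()
setup_contact_info(store)

# The test itself: add an email and assert it is the only one present.
store.add_email("new@example.test")
assert store.emails == ["new@example.test"]
```

Most test frameworks give this pattern a home of its own, such as `setUp` in unittest or a fixture in pytest, so the cleanup runs automatically before each test.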
Strategy Three: Using test steps to create and delete data
This is a good strategy when you are testing CRUD (Create, Read, Update, Delete) operations, because you can use the actual tests to create and delete your test data. If I were testing an API for a contact list, for example, I would have my first test create the contact and assert that the contact was created. Then I would update the contact and assert that the contact was updated. Finally, I would delete the contact and assert that the contact was deleted. There is no net impact on the database, because I am both creating and destroying the data.
However, if one of the tests fails, it’s likely the others will as well. If for some reason the application was unable to create the contact, the second test would fail, because there would be nothing to update. And the third test would fail because the record would not exist to be deleted. So even though there was only one bug, you’d have three test failures.
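The create–update–delete chain might look like the sketch below. `FakeContactAPI` is a hypothetical in-memory stand-in; the point is that the three tests together leave the data store exactly as they found it:

```python
class FakeContactAPI:
    """Hypothetical in-memory stand-in for a contact-list API."""

    def __init__(self):
        self._contacts = {}
        self._next_id = 1

    def create(self, name):
        cid = self._next_id
        self._next_id += 1
        self._contacts[cid] = {"id": cid, "name": name}
        return cid

    def read(self, cid):
        return self._contacts.get(cid)

    def update(self, cid, name):
        if cid not in self._contacts:
            return False
        self._contacts[cid]["name"] = name
        return True

    def delete(self, cid):
        return self._contacts.pop(cid, None) is not None


api = FakeContactAPI()

# Test 1: create the contact and assert it was created.
cid = api.create("John Smith")
assert api.read(cid) is not None

# Test 2: update the contact and assert it was updated.
assert api.update(cid, "John Q. Smith")
assert api.read(cid)["name"] == "John Q. Smith"

# Test 3: delete the contact and assert it was deleted.
assert api.delete(cid)
assert api.read(cid) is None
```

Notice the cascade risk described above: if `create` failed, `cid` would never refer to a real record, and the update and delete assertions would fail too.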
Strategy Four: Taking a snapshot of the database and restoring it after the tests
This strategy is helpful when your tests are doing a lot of data manipulation. You take a snapshot of the database as a setup step for the test suite. Then you can manipulate all the data you want, and as a cleanup step, you restore the database to its original state. The advantage to this method is that you don’t need to write a lot of steps to undo all the changes to your data.
But this method relies on having the right data there to begin with. For instance, if you are planning to do a lot of processing on John Smith’s records, and someone happened to delete John Smith before you ran your tests, taking a snapshot of the database won’t help; John Smith simply won’t be there to test on. It’s also possible that taking a snapshot will be time-consuming, depending on the size of your database.
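A runnable miniature of the snapshot-and-restore flow, using Python's built-in sqlite3 module and its `backup` API as a stand-in for whatever snapshot tooling your real database provides (the table and rows are invented for illustration):

```python
import sqlite3

# A stand-in "QA database" containing a user our tests depend on.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('John Smith')")
db.commit()

# Suite setup: take a snapshot before any test runs.
snapshot = sqlite3.connect(":memory:")
db.backup(snapshot)

# The tests may now manipulate data freely...
db.execute("DELETE FROM users")
db.commit()
assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0

# Suite cleanup: restore the snapshot, undoing every change in one step.
snapshot.backup(db)
assert db.execute("SELECT name FROM users").fetchone()[0] == "John Smith"
```

The single restore step replaces what would otherwise be a long list of hand-written undo operations, which is the whole appeal of this strategy.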
Strategy Five: Creating a mini-database with the data you need for your tests
In this strategy, you spin up your own database with only the data you need for testing, and when your tests have finished, you destroy the database. If you are using Microsoft technologies, you could do this with their DACPAC functionality; or, if you are using Docker, you could create the database as part of your Docker container. With this strategy, there is no possibility of your data ever being incorrect, because it is always brand-new and exactly how you configured it. Also, because this database will be much smaller than your real QA environment database, your tests will likely execute more quickly.
The downside to this strategy is that it requires a lot of preparation. You may have to do a lot of research on how your data tables relate to each other in order to determine what data you need. And you’ll need to do a fair amount of coding or configuration to set up the creation and destruction steps. But in a situation where you want to be sure that your data is right for testing, such as when a developer has just committed new code, this solution is ideal.
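As a small-scale illustration of the idea, the sketch below builds a throwaway in-memory sqlite database seeded with exactly the rows the tests need, then destroys it afterward. The schema and seed data are invented; a real setup would mirror the relevant slice of your production schema:

```python
import sqlite3

def build_test_db():
    """Spin up a fresh database seeded with exactly the data the tests need."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    db.executemany(
        "INSERT INTO contacts (name, email) VALUES (?, ?)",
        [
            ("Test User One", "one@example.test"),
            ("Test User Two", "two@example.test"),
        ],
    )
    db.commit()
    return db


db = build_test_db()

# The data is always exactly as configured -- no one else can have touched it.
names = [row[0] for row in db.execute("SELECT name FROM contacts ORDER BY id")]
assert names == ["Test User One", "Test User Two"]

db.close()  # "Destroy" the database when the tests finish.
```

The research cost mentioned above shows up in `build_test_db`: for a real application, that function has to recreate every table and relationship your tests touch, which is where most of the preparation effort goes.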
All of these strategies can be useful, depending on your testing needs. When you evaluate how accurate you need your data to be, how likely it is that it will be altered by someone else, how quickly you need the tests to run, and how much you can tolerate the occasional failure, it will be clear which strategy to choose.