Many of my clients have used large-scale data refresh processes to pull production data down into staging and development environments. The refresh is generally accompanied by a complicated step of depersonalising the data and masking anything which could be deemed private or confidential. In larger enterprises, the whole thing can take several days for a single environment, during which that environment is unavailable for deployments and testing. It also often breaks integrations, because relational integrity between very separate systems is lost.
So, if it’s such a large, difficult task, why does it happen?
Where it works
Let’s start with looking at where this practice is useful (the list isn’t very long).
User acceptance testing and load testing are best performed with production-like datasets, because the results can be affected by the shape, size, and detail of the data in the system. These types of tests are generally carried out toward the end of an iteration, whether that iteration is the delivery of a sprint or the delivery of a feature. They test the combined results of all the small changes which have been made. It makes sense to run these against a dataset which has been generated from the production data, as that is guaranteed to contain all your production scenarios (including data corruptions).
Because these types of tests are not run constantly, and can share a dataset, they can also share an environment. When they aren’t running, the data refresh process can update that single environment with up-to-date records from production. The refresh needs to be efficient; otherwise, as the platform grows and more data accumulates in prod, it will take longer and longer.
I’m pretty sure I’m going to cop some flak for saying that UAT and load tests aren’t run continuously, but I stand by it. UAT carried out at the story level is not real UAT unless the story encompasses an entire feature. A story can be integration tested, UI tested, auto tested, manually tested, and unit tested, but usually not UAT’d. A user acceptance test is from the point of view of a user, and that generally happens with feature releases (especially when a later story may change the functionality of an earlier story, making the earlier UAT irrelevant).
There might be some load testing carried out in other environments, but on a much smaller scale and with narrower scopes. The tests we’re talking about here are end-to-end.
Because only a single environment is being affected, temporary outages due to the complicated nature of refreshing data and masking personal data tend not to impact ongoing work.
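As a rough illustration of why masking and relational integrity pull against each other, here is a minimal sketch of deterministic masking. It assumes two hypothetical systems (a CRM and a billing system) that both key on an email address; the names `MASKING_KEY`, `mask_email`, `crm_rows` and `billing_rows` are invented for the example. Because the same input always produces the same masked value, the two exports still join after masking, whereas independently randomised masking would lose that link.

```python
import hashlib
import hmac

# Hypothetical secret for one refresh run; an assumption for this sketch.
MASKING_KEY = b"example-refresh-key"


def mask_email(email: str, key: bytes = MASKING_KEY) -> str:
    """Replace an email with a stable pseudonym derived from a keyed hash."""
    normalised = email.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return f"user-{digest[:16]}@example.invalid"


# Two hypothetical systems, exported and masked separately.
crm_rows = [{"email": "Jane.Doe@example.com", "tier": "gold"}]
billing_rows = [{"email": "jane.doe@example.com", "balance": 120}]

masked_crm = [{**row, "email": mask_email(row["email"])} for row in crm_rows]
masked_billing = [{**row, "email": mask_email(row["email"])} for row in billing_rows]

# The masked values still match, so the records can be joined across systems.
same_person = masked_crm[0]["email"] == masked_billing[0]["email"]
```

This only covers one field in one shape. Real refreshes tend to have many identifiers spread across many systems, which is part of why the process is so complicated.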
Where it doesn’t work
As a rule, don’t let developers near your production datasets. Not even obfuscated copies. This isn’t a security problem, it’s an architecture problem. If developers and architects don’t have to worry about the composition of a record, if they don’t have to think of how many different systems need data injecting into them in order for a single screen to function, then things start to sprawl in horrible ways. I’ve seen first-hand the ridiculous scenario where there is simply no known way to reliably inject a user in such a way that a system will work fully. What’s worse is that I’ve seen this more than once.
I’ve been in the situation where there is no single developer who knows exactly where a user record comes from in full. The idea of building a ‘User Service’ which could create a user seemed mind-bogglingly complicated.
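For contrast, this is roughly what a ‘User Service’ looks like when the composition of a user is well understood. It is a sketch under assumptions, not a description of any real system: the three downstream systems (identity, billing, preferences), the `InMemorySystem` stand-in, and the fields written to each are all hypothetical.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class InMemorySystem:
    """Stand-in for one downstream system that holds part of a user."""
    name: str
    records: dict = field(default_factory=dict)

    def put(self, user_id: str, record: dict) -> None:
        self.records[user_id] = record


class UserService:
    """Hypothetical single entry point that knows every system a user touches."""

    def __init__(self, identity: InMemorySystem, billing: InMemorySystem,
                 preferences: InMemorySystem) -> None:
        self.identity = identity
        self.billing = billing
        self.preferences = preferences

    def create_user(self, email: str, plan: str = "free") -> str:
        # One call writes every fragment a fully working user needs.
        user_id = str(uuid4())
        self.identity.put(user_id, {"email": email})
        self.billing.put(user_id, {"plan": plan, "balance": 0})
        self.preferences.put(user_id, {"marketing_opt_in": False})
        return user_id


identity = InMemorySystem("identity")
billing = InMemorySystem("billing")
preferences = InMemorySystem("preferences")
service = UserService(identity, billing, preferences)
user_id = service.create_user("test.user@example.invalid", plan="premium")
```

The point is not the code itself but that someone can write it. If nobody can list the systems that belong in `create_user`, that gap is the architecture problem.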
Why is this a bad thing? If your development teams don’t understand where the data is coming from, they don’t understand the behaviour of the system they’re building, and they can’t write tests which cover all scenarios. You start to rely on the (incorrect) idea that the production data is a ‘golden recordset’ which contains so much data it must cover all scenarios. Then the developers start to realise they can’t write reliable tests against data which is refreshed every few weeks and randomly masked in different ways. It becomes a manual QA effort to find records to use in tests. Problems aren’t found until much later and cost much more to solve, or worse: problems aren’t noticed.
If it isn’t possible for developers to understand and write coded tests for all behaviours and inject data to drive each behaviour, then you are slowly grinding to a halt.
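To make that concrete, here is a minimal sketch of tests which inject exactly the data each behaviour needs, rather than hunting through a refreshed dataset for a suitable record. The `User` shape, the `can_place_order` rule and the `make_user` builder are all hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass


@dataclass
class User:
    status: str = "active"
    balance_owed: int = 0


def can_place_order(user: User) -> bool:
    """Hypothetical rule under test: active users with no debt may order."""
    return user.status == "active" and user.balance_owed == 0


def make_user(**overrides) -> User:
    """Test data builder: sensible defaults, override only what the test is about."""
    return User(**overrides)


def test_active_user_can_order():
    assert can_place_order(make_user())


def test_suspended_user_cannot_order():
    assert not can_place_order(make_user(status="suspended"))


def test_user_in_debt_cannot_order():
    assert not can_place_order(make_user(balance_owed=50))
```

Each test states the scenario it covers in the data it creates, so it keeps passing (or failing) for the same reason no matter what happens to be in any shared environment.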
Avoid pushing production-like data to development, or staging, or any other environment where it isn’t needed. Behaviours should be sufficiently defined, and architecture should be properly conceived, so that injecting test data as part of automated testing is simple. There are no swings and roundabouts here: there’s just a good approach and a bad one. Please pick the good one.