How I learned to love GDPR and so can you
If you work in software development as a developer, manager or tester, you will be affected by the General Data Protection Regulation (GDPR) – the new EU law on data privacy. In many ways, the regulation is likely to have as big an impact as the Y2K problem. But this time it’s for a good cause! And you cannot ignore it, as the fines for doing so can be crippling. Still, most people who find themselves face-to-face with GDPR find it quite intimidating.
I hope this article can do something about that.
First, let’s say a few words about how to approach the regulation itself and then let’s look at some actions you should consider. I find the text of the regulation to be surprisingly well-written, but I wish I had a reading guide when I started out.
The actual text is officially published on http://ec.europa.eu. The text starts with 173 “recitals”. These are background considerations for the regulation. Even though they are well thought out and well written, they are not very much to the point. They are also often quite heavy to read, even if it’s for good reason (a perfect example is Recital 38 which talks about the data protection concerns of children). After the recitals are the actual 99 articles of the law, which are much easier to read. These are divided into chapters and many of the later sections are about the structure for enforcement, not for the actual regulation as it impacts most organizations.
Instead of reading the PDF, I recommend looking at https://gdpr-info.eu, which has organized the regulation in an easy-to-use structure. If you are responsible for an IT system, you need to understand at a minimum articles 1 through 50 and especially articles 5 through 35. Start by reading these.
Let’s look at some of the ways you may be surprised.
Surprise 1: Consent and test data
In order to collect and use personal data, you need to have permission to do so (article 6). For most people, this either means that you are required by law to use the information (for example medical information in a public health context) or that you have obtained consent from the data subject. This has actually been the case for a while, but some things will be more explicitly required: First, you must make it clear what your user is consenting to and second, you must make it optional (and non-default!) to give consent in cases where it’s not needed to provide a given service.
One very common scenario is for organizations to use production data from their customers for testing purposes. You can forget about that in the future. You could ask your customers for consent to use their data for testing purposes, but you must make this consent optional and non-default. So that means that at best, you need to find good routines to extract only data for consenting customers. Good luck! You could anonymize the data, but you are liable if there is a risk of reidentification.
Instead, I recommend that you invest in other testing strategies. In particular, most testing organizations can improve a lot by creating synthetic data. By investing in synthetic data, you can also improve your ability to stress the system with large and unusual values. Another testing method is partial production, where you gradually let more users onto a new version of the system. Perfecting this will also improve your delivery cycle a lot.
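To make the idea of synthetic data concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the SyntheticCustomer fields, the make_customers function and the rule that every tenth record gets an unusually long name are all invented, and a real system would model its own data.

```python
import random
import string
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SyntheticCustomer:
    # Hypothetical customer shape; replace with the fields your system uses.
    customer_id: int
    name: str
    email: str
    birth_date: date


def random_name(rng: random.Random, length: int) -> str:
    """Build a made-up name of the requested length from random letters."""
    letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
    return "".join(letters).capitalize()


def make_customers(count: int, seed: int = 0) -> list[SyntheticCustomer]:
    """Generate reproducible fake customers, including deliberately odd values."""
    rng = random.Random(seed)  # a fixed seed makes a failing test repeatable
    customers = []
    for customer_id in range(1, count + 1):
        # Every tenth record gets an unusually long name to stress field limits.
        length = 200 if customer_id % 10 == 0 else rng.randint(3, 12)
        name = random_name(rng, length)
        customers.append(SyntheticCustomer(
            customer_id=customer_id,
            name=name,
            # example.com is reserved for documentation, so no real inbox is hit.
            email=f"{name.lower()}.{customer_id}@example.com",
            birth_date=date(1900, 1, 1) + timedelta(days=rng.randint(0, 40000)),
        ))
    return customers


customers = make_customers(1000, seed=42)
```

Because the generator is seeded, the same call always produces the same records, so a test that fails on an odd value can be rerun with identical data. No real person appears anywhere in the data set.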
Now you have an excuse to invest in better testing.
Surprise 2: Functions required to support the rights of the data subject - data portability
When you store personal data, you now have to plan for functionality where the subject of this data can exercise their rights regarding the data. Much of this has already been the case, but the existing rights have been strengthened. This is described in articles 12 through 23. Just add them to your product backlog as system functionality or manual processes that must be implemented. Basically, your customers have the right to see what data you are storing about them, including who has accessed the data. They should be able to correct errors in the data and to ask for data to be deleted.
A new and very interesting right is the right to data portability (article 20). Your customers have the right to get their data from you and take it to your competitors in a portable format (“structured, commonly used and machine-readable”). As I understand it – if your customers can get an export in JSON, XML or CSV format with everything concerning them, you’re pretty much set. If you wonder: PDF is not good enough.
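As a rough illustration of such an export, the Python sketch below writes one customer's data as JSON and their purchases as CSV using only the standard library. The record layout and the function names export_json and export_purchases_csv are hypothetical; whether a given export satisfies article 20 is a legal question this sketch does not settle.

```python
import csv
import io
import json


def export_json(record: dict) -> str:
    """Serialize everything held about one customer as readable JSON."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


def export_purchases_csv(purchases: list[dict]) -> str:
    """Serialize a customer's purchases as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["date", "item", "amount"])
    writer.writeheader()
    writer.writerows(purchases)
    return buffer.getvalue()


# Hypothetical record; a real one would be assembled from every system
# that holds data about the customer.
customer = {
    "name": "Test Person",
    "email": "test.person@example.com",
    "purchases": [
        {"date": "2017-09-01", "item": "Coffee", "amount": "3.50"},
        {"date": "2017-09-03", "item": "Train ticket", "amount": "42.00"},
    ],
}

json_export = export_json(customer)
csv_export = export_purchases_csv(customer["purchases"])
```

The hard part in practice is usually not the serialization but collecting the customer's data from all the places it lives.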
Data portability is one of the most exciting parts of the GDPR and seems to be motivated by a desire to promote innovation. Just imagine what you can do with it! Imagine an app that can import your purchase history from all major store chains and help you analyze your own buying habits. It’s long been a dream, but the data was locked up. Until now!
Or you can use it to keep your competitors honest! Tell your customers that if they give you consent and upload their data from your competitors, you will give them discounts and prizes. You have to be prepared for the case that your customers withdraw their consent again, but you are still allowed to keep aggregated data. And if your competitors are laid-back about GDPR – well, you can really put their feet to the fire!
Surprise 3: Keep data safe during transfer and storage - everywhere!
The final important consideration I want to talk about that comes out of GDPR is keeping data safe and under control (article 32 and also article 25, which mandates Privacy by Design). You are required by law to protect personal data in transit and at rest at all locations. You are also required to report to the local authorities any breach of data that you detect.
What you need to do now is to map out every place you transfer personal data. What about temporary storage areas? What about backups? What about third parties that receive data? And what about logs?
Most organizations have less access protection of application logs than any other data. And often you log without considering the contents of the logging. I used to practice an approach of logging the full payload of all incoming and outgoing communication messages. That may no longer be a good idea.
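One way to keep logging useful while reducing what ends up in the log files is to scrub messages before they are written. The sketch below uses a filter from Python's standard logging module. The two regular expressions, the logger name "payments" and the RedactingFilter class are illustrative assumptions; pattern matching like this will miss personal data it does not recognize, so treat it as a safety net rather than a guarantee.

```python
import logging
import re

# Illustrative patterns only; real systems need patterns for their own identifiers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_DIGITS = re.compile(r"\b\d{10,12}\b")  # e.g. national ID or phone numbers


def redact(text: str) -> str:
    """Replace things that look like personal data with placeholders."""
    text = EMAIL.sub("[email]", text)
    return LONG_DIGITS.sub("[number]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that scrubs the formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # arguments are already merged into msg
        return True  # keep the record; only its text changes


logger = logging.getLogger("payments")  # hypothetical logger name
logger.addFilter(RedactingFilter())
logger.warning("Payment failed for %s", "bob.smith@example.com")
```

A filter attached to a logger only sees records logged directly through that logger, so in a larger application it may be safer to attach the filter to the handlers instead.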
The easiest way to protect data is to make sure you never collect it, and failing that, making sure that you don’t store it unnecessarily. You need to trace every piece of personal data through your systems and find out who can access it. If you can collect less, that will make your job easier.
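Collecting less can be as simple as dropping fields at the boundary of the system. The Python sketch below keeps only an explicit allow-list of fields from an incoming payload. The field names and the imagined order service are hypothetical, chosen only to show the idea.

```python
# Hypothetical allow-list: the only fields this imagined order service needs.
ALLOWED_FIELDS = frozenset({"order_id", "postal_code", "items"})


def minimise(payload: dict) -> dict:
    """Return a copy of payload containing only allow-listed fields."""
    return {key: value for key, value in payload.items() if key in ALLOWED_FIELDS}


incoming = {
    "order_id": 1001,
    "postal_code": "0150",
    "items": ["book"],
    "birth_date": "1980-02-29",  # not needed to ship an order
    "phone": "+47 000 00 000",   # not needed either
}

stored = minimise(incoming)
```

An allow-list fails safe: a new field added upstream is dropped until someone decides it is needed, whereas a block-list would let it through by default.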
Get started
There are more aspects of the GDPR that I have not discussed. In particular, the role of the Data Protection Officer is critical and has some surprising nuances.
But the goal of this article was to give you a place to start where there’s a good chance that you have something you need to do and perhaps something that you can benefit from as well. Here are three suggestions: 1. Verify that you’re not using production data for testing and develop new testing techniques if you do, 2. Add development tasks to support the rights of the data subject - a good place to start is Data portability, 3. Analyze the flow of personal data through the system, especially looking for less secured storage locations like logs and temporary files.
It’s not long until the law comes into effect and you need to get ready! The three places I’ve pointed out where you need to start touch most of the IT system related aspects of GDPR and can be used as a basis for understanding, implementing and benefiting from our improved rights to our own data as citizens and individuals!
Comments:
[Emily Bache] - Sep 15, 2017
Thanks for this, Johannes, I’m also wrestling with the implications of this new law, in particular with respect to test data. You may not be aware of a ruling in Sweden by our Datainspektion in a case where SJ (Swedish Railways) was prosecuted for breaching ‘personuppgiftslagen’ (a Swedish law basically equivalent to GDPR). They were using a copy of production data in their test environment, and a former customer discovered they had run a credit check on him, despite him not having bought anything from them at the time. The credit check had been inadvertently triggered from the test environment. He argued that he hadn’t consented to them using his personal data to perform a credit check in this case, and won. You can read about the judgement here (http://www.datainspektionen.se/press/nyheter/2014/fel-av-sj-anvanda-riktiga-personuppgifter-da-it-system-testades/) in Swedish.
What’s interesting about the judgement is that they say that if you are a customer of a service, you are implicitly giving your consent to have your data used to test that service. So it wasn’t wrong of SJ to use a production copy for testing, the problem was they used it in an unsafe way.
So basically, there are situations where you are allowed to use a copy of production data in testing. You do have to be pretty careful with it though, and they do encourage you to use fictitious data instead.
Johannes Brodwall - Sep 15, 2017
Thank you for sharing this story, Emily.
I feel it’s important to mention a few caveats: GDPR is generally moderately stronger than older laws like personuppgiftslagen. Consent is one area where GDPR is more restrictive. So the ruling may be applicable or it may not be. There is also a requirement with GDPR that it should be applied uniformly to the whole EU/EEA area, so if other countries don’t share the Swedish interpretation, that will be a factor.
Finally, as the article mentions and as you indicate was SJ’s error, you definitely must have as strong security in a test system that uses production data as you have in the actual production system.
I noticed your tweet today about your personnummer library. I think this is a much better approach!
PS: The link in your comment got misformatted with an extra “)’. Feel free to edit your comment to fix it.
[Emily Bache] - Oct 10, 2017
You make good points, Johannes. I think it’s a bit of a minefield because until the law has been tested in the courts, to some extent everyone is guessing how it will be interpreted.
I also wanted to point out my new blog post on GDPR and Swedish Personal numbers. It’s an issue I’m currently grappling with, hence the Python library I published recently.
https://www.praqma.com/stories/testing-personal-numbers/