GDPR, Database Backups, and the Right to be Forgotten


I’ve said it before, but it bears repeating: there is no cause for any kind of panic when it comes to the GDPR. None. There are, however, a number of concerns. One of those concerns is, well, concerning. How does the right to be forgotten within the GDPR impact database backups? Let’s discuss what we know.

The Right To Erasure

Each of the articles within the GDPR lays out a topic. Article 17 is pretty darned clear about the topic:

Right to erasure (‘right to be forgotten’)

Basically, individuals, also known as data subjects, also known as natural persons, in short, people, can request that you remove their data from your system. The first sentence lays out the gist of the idea quite well:

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…

Sure, there are exceptions, and it’s worth reading Article 17 to understand those, but that’s not the point of this discussion. The question is, what about backups? It’s easy to run a DELETE statement. Heck, it’s easy to put in referential integrity such that you can do a cascading delete if you so desire (I don’t, but different discussion again). What a DELETE statement you run today doesn’t do is remove any data from the backup that you took last night.

Now what?

Nothing within Article 17 talks about backups, offsite storage, readable secondaries, log shipping, or any of that stuff. In fact, there’s nothing technical there at all. No help to tell you what to do about this question.

Now, each article has expansions, called recitals, that further detail the information within the article. In the case of the right to be forgotten, there are two, Recital 65 and Recital 66. Recital 65 has nothing for us, at all. Recital 66 does talk about the fact that, because we’re dealing in an online world, the best available technical means should be used to deal with the fact that a person’s data may be in more than one location and we’ll need to clean that up.

And that’s it.

In fact, you can search the GDPR and not find the word backup. You can search the GDPR and find the word restore exactly once, in Article 32, which talks about:

…the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident;…

Basically saying that you better be able to restore your system after an outage (and that’s another discussion for another day).

Now what? Well, let’s look at the regulator behind some of the law that was foundational to the GDPR, the Information Commissioner’s Office in the UK.

Information Commissioner’s Office

The ICO is largely dealing with laws of the UK. However, some of those laws provided a lot of the basis and thought for the GDPR. In preparation for dealing with the GDPR, the ICO has a lot of information published. For example, a guide to the right to be forgotten. No, don’t look. It doesn’t mention backups either.

The ICO, in support of the Data Protection Act (DPA), talks about a bunch of scenarios, including the need to protect database backups with encryption. Also, the inability to restore data is considered a breach of the DPA, and probably the GDPR (see part 5). We do, however, find some guidance around backups here:

…When data is deleted it is rarely removed entirely from the underlying storage media unless some additional steps are taken. In addition, a cloud provider is likely to have multiple copies of data stored in multiple locations to provide a more reliable service. This may include back-up tapes or other media not directly connected to the cloud. Copies of personal data stored in a cloud service may also be stored in other forms such as index structures.

74. The cloud customer must ensure that the cloud provider can delete all copies of personal data within a timescale that is in line with their own deletion schedule….

OK, not exactly detailed, but you get the core of the idea. You have to delete the data, in all its locations, but you have a set time to take care of this. That certainly sounds like I need to clean up my backups.

And that’s all I can find there.

In fact, do some internet searches on your own. No one is quite sure what to do about the information stored on backups. There are a lot more questions than answers, so now what?

Dealing With Backups

So, upon receipt of a request to be erased, right after you delete the data from your production database and all the secondaries and the warehouse… <sigh>, you can restore all the backups, delete the data from those backups, retake the backups…<double-sigh>

Raise your hand if you want to do this? Neither do I. So where does that leave us?

Let’s go back to the GDPR. It uses very similar phrasing at multiple places in the Articles and the Recitals:

take reasonable steps, taking into account available technology and the means available to the controller, including technical measures

This is our defensible position. It’s not reasonable, see that word, to expect us to go into offline data storage with the existing technical means, again, using their words, to delete data from that offline location.

Instead, we’re going to build a process whereby we use the existing technology (assume you’re talking to a lawyer and you MUST use the same phrases the same way, over and over) to ensure that the offline information doesn’t become available online.

Yeah, cool. Sounds neat. Now, can you repeat that for me using T-SQL?

Sure. Let’s build a method that has us record each request to be forgotten by storing the date of the request and the key values (all artificial, of course, no identifying information). That record has to live somewhere a restore won’t overwrite it. Then, when we run a restore, prior to putting the database back online, we delete the data for every request received between when the backup was taken and the current time.
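Here is a minimal sketch of that process, not a finished implementation. To keep it runnable anywhere, it uses Python with SQLite standing in for SQL Server; in a real system the same statements would be T-SQL run against the restored database before it goes back online. The table names (`erasure_request`, `customer`), the column names, and the function names are all hypothetical, and the sketch assumes timestamps are stored as UTC ISO 8601 strings in one consistent format so that they compare correctly as text.

```python
import sqlite3


def create_request_log(log_db):
    # The log holds only an artificial key and a timestamp, no identifying data.
    # It lives in its own database so that restoring production cannot wipe it.
    log_db.execute(
        "CREATE TABLE IF NOT EXISTS erasure_request ("
        " customer_id INTEGER NOT NULL,"
        " requested_at TEXT NOT NULL)"
    )
    log_db.commit()


def log_erasure_request(log_db, customer_id, requested_at):
    # Record the request at the time the production delete is performed.
    log_db.execute(
        "INSERT INTO erasure_request (customer_id, requested_at) VALUES (?, ?)",
        (customer_id, requested_at),
    )
    log_db.commit()


def replay_erasure_requests(restored_db, log_db, backup_taken_at):
    # Find every request logged at or after the moment the backup was taken.
    keys = [
        row[0]
        for row in log_db.execute(
            "SELECT customer_id FROM erasure_request WHERE requested_at >= ?",
            (backup_taken_at,),
        )
    ]
    # Re-apply each delete to the restored copy before it goes back online.
    removed = 0
    for customer_id in keys:
        cur = restored_db.execute(
            "DELETE FROM customer WHERE customer_id = ?", (customer_id,)
        )
        removed += cur.rowcount
    restored_db.commit()
    return removed
```

The comparison uses `>=` rather than `>` on purpose: re-deleting a row that is already gone is harmless, while missing a request that landed at the same instant as the backup is not. A real system would also have to cover every related table that holds the person’s data, not a single `customer` table.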

Assuming we document this process and detail why we’re using it, we should be building what I’ve come to know as a defensible position. A defensible position, as it relates to the GDPR, is a full set of documents and processes that does everything reasonably feasible to meet the requirements of the GDPR. Writing it down, as a process, and following it, and publishing it, is the key to establishing this defensible position. Without all that work, you are facing serious trouble.

Conclusion

Most of the GDPR, including the right to be forgotten, is not a reason for panic. However, parts of the GDPR, specifically the right to be forgotten, present us with challenges. The key is to build a defensible position for your organization. Document your processes. Show you deal with the right to be forgotten. You’ll probably find that, in most cases, the defensible position is just a common sense approach to data management that you should have been using anyway. It’s just going to be a lot of work to implement all this.

21 Comments

  • Cal Lewis

    Reading your latest post brought to mind a problem I wrestled with for the last 10 years of my career as a Data Manager for a Police Department in a city on the low end of the large cities scale. That is Court ordered expungement of personal data for a person involved in a Police report. Handling the search and expungement process was a well rehearsed and documented job flow created by our records manager (backups included). The problem we had and still have is data sharing with other Law Enforcement Agencies, the biggest problem being with the State level Fusion Centers created after 9-11. Those centers started receiving complete data sets from all agencies in their associated state. Thankfully, the FBI only collects summary data from each agency in the country; imagine the problem for a local court to get the FBI’s attention, much less action. The fusion centers have been slowly throttled back to what their original Homeland Security mandate was, so that problem is going away. Bottom line is Law Enforcement Agencies have been handling “Right to erasure (‘right to be forgotten’)” for decades, still with too many failures but doing better as paper copies of incident reports go away. Expunging data in Local, County, State, and National government agencies is still a rat’s nest of rules and procedures.

      • Cal Lewis

        Here is a link to a small PDF with sample paperwork and petition. A brief list of things the agency(ies) are required to do starts on page 8.

        https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjQyNiYk7PaAhXIXrwKHX0WDU8QFgg5MAA&url=https%3A%2F%2Fwww.texasbar.com%2FAM%2FTemplate.cfm%3FSection%3DOur_Legal_System1%26Template%3D%2FCM%2FContentDisplay.cfm%26ContentID%3D23459&usg=AOvVaw3hH8c6PAnk1EEb0f-sxOLZ

        I will contact my old records manager to (hopefully) get documentation that details what the receiving agency is required to do when receiving an expunction order.

          • Cal Lewis

            I was Data Manager for the Fort Worth Police Department for 10 years, prior to that a Developer. Here is what I heard from the Records Manager: There is only an informal description of what needs to be accomplished in fulfillment of an expungement court order. (A) upon receipt of a notice of a court order in process, the Incident Report in question is sealed, i.e. no longer publicly visible. (B) upon receipt of confirmation from the court that the order is binding, expungement takes place. (C)(1) if the only suspect in the Incident Report is the subject of the order, then the entire Incident Report is deleted, but not from backups. (C)(2) if there are other suspects in the Incident Report, all personal identifying information for the subject of the order is deleted where possible, i.e. the person record, and obfuscated otherwise. (D) a cursory search is made for any printed copies of the Incident Report in the department, and any that are located are destroyed. These hard copies are generally in the case package sent to the District Attorney by the Detective assigned to the Incident Report; potential for a multitude of problems here. (E) backups: a list of Incident Reports involved with expunction orders is maintained manually, generally an Excel spreadsheet. If the Records Management System is restored from backup, or if just a single table is restored, then the Data Management office is sent a current copy of the spreadsheet, and the duty DBA scans the restored tables for any matches on the Incident Reports; if any are found, the related expunction is applied again. During my time the RMS was never recovered from backup wholly or partially. There is plenty of room for errors, and they did happen, usually found by the lawyer representing the person having their personal information removed.

            Here is a link to the Texas: CODE OF CRIMINAL PROCEDURE

            TITLE 1. CODE OF CRIMINAL PROCEDURE
            CHAPTER 55. EXPUNCTION OF CRIMINAL RECORDS
            See: Article 55.02, Sec 5

            http://www.statutes.legis.state.tx.us/Docs/CR/htm/CR.55.htm#55.01

            Cal

          • Marvelous. Thanks for the information. That’s very helpful.

            Looks like the process is actually similar to what I imagine will be most people’s approach to the right to be forgotten.

  • Philip Leduc

    Perhaps a dumb question, but in the case of the right to be forgotten, would you suggest deleting records for customers, or overwriting personally identifiable information with generic data, so that “Jane Doe” becomes “customer XXX”? In the former case, and depending on how the reports were built, we would get in trouble with historical information like how many customers we had on a date in the past…

    • No such thing as a dumb question, only dumb answers.

      So, here’s one.

      I’d base the solution on the requirements of your data. Probably, most of us, most of the time, are simply going to mask the information in some fashion. Which means, yeah, “Jane Doe” becomes “Former Customer52” or whatever. But, we also have to deal with the related data and ensure we also get rid of stuff that could be still personally identifying. For example, sex, do we change the ‘F’ to ‘M’? Probably not. Utter waste of time… unless we only have two or three values for ‘F’ and suddenly that value, possibly in combination with others, once again makes this personally identifying… possibly.

      The easier answer is just hard delete everything. However, the business answer is going to be, remove identifying information, and that’s where our challenges come from.

  • Jon Dobson

    Hi Grant, interesting article. One thing I was confused about is that the defensible position seems to be based around the language in Article 17 (2) and Recital 66. Both these areas seem to only apply “Where the controller has made the personal data public”. As backups are not made public, how can this be used as a defensible position?

    Many Thanks

    Jon

    • It’s back to the right to be forgotten. If you keep people’s information in your backups and don’t have any mechanisms for removal, when you restore a backup, you restore their information. So, we have to build a defensible position that we have processes in place to ensure that we don’t retain information even in offline storage.

  • Lim Torpy

    So it seems to me that to have a defensible position we must have a process that can be audited. This implies we must retain some PII that identifies the fact of data deletion for specific individual people. Now perhaps this should be stored off-line, or even on paper! It is not enough to say “Record ID 30X21CZ was deleted”; that doesn’t prove that Jane Doe’s record was deleted. Another point: I might consider keeping backups on a live server (not web connected) and running DELETE against them for the “right to be forgotten” requests, maybe batching it every day/week/month. This would depend on the expected volume of “forget me” requests; the larger the system and the number of requests, the more affordable and useful it becomes. Remember, backups do not have to be static files stored on some remote HD. There are plenty of options and we are capable of building a variety of solutions.

    • Cal

      Will a procedure need to be constructed, tested, and be in place to re-clean a DB after a restore from backup? Partial restore, log restore, … How would you address this problem, particularly with large multi-nationals?

      • One bite at a time. This is something large corporations should have been working on. A little bit over a month away, it’s going to be very hard to get something comprehensive done now.

    • There is absolutely more than one way to get this done. No question. My point is though, we have to do something. We can’t possibly remain in our current state, do nothing, and achieve anything like compliance. Reasonable steps must be taken.

  • Sam

    With that (and not having read the regs.): How do I prove I deleted the records? And is industry expected to keep records of deleted records — which means (technically) ‘a type of record’ still exists.

    • Since nothing has gone to court, we still don’t know for certain how this will all work out. However, we face a few choices. Do nothing, hope we’re not the first ones who get nailed, see how things shake out, then comply. Try to anticipate every possible need, which just won’t work. Set up a position whereby we meet the definition of “take reasonable steps”, document that, and, if we’re first, hope it’s good enough. It really is about what is technically reasonable and whether or not we define it. A completely random object like a GUID, and having that retained, couldn’t possibly be construed as PII.
