
GDPR and Tokenizing Data (Part 3 in a Series)

tdwi.org/articles/2018/06/06/biz-all-gdpr-and-tokenizing-data-3.aspx

You need to protect any personal data your enterprise collects. Tokenizing data is one way to
stay in compliance with GDPR.

By Rod Welch
June 6, 2018

In the first two parts of this series we examined the six principles of the GDPR. In this final
article, we'll look at how enterprises are protecting personal data to meet GDPR's
requirements, including with tokenization technology.

When the GDPR is mentioned, the first thing that comes to mind may be "personal data"
because that's what the GDPR is all about: protecting personal data. It's important to
understand what, exactly, is meant by personal data.

We have to be very clear here because there is "personal data" and then there is "personally
identifiable information" (popularly referred to as PII). Let's take them in reverse order.

Personally identifiable information is data that can be used to directly or indirectly
identify a particular person. This consists of such data items as a person's name,
address, email address, or phone number. For example:
John Doe, 27 First St., NY 12345

Personal data is data about the person and contains PII data. For example, this record
is personal data:
John Doe, 27 First St., NY 12345, occupation bus driver, salary $41,000

If we remove the PII data from this record, it no longer contains personal data. Instead,
it becomes anonymous:

Occupation bus driver, salary $41,000

This anonymous (or de-personalized) data could be used to analyze occupations and salaries
while remaining GDPR-compliant.
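
As a rough illustration of the bus driver example, the sketch below drops the PII fields from a record and keeps the rest. The field names and the choice of which fields count as PII are assumptions made for this example; deciding that for real data is a compliance question, not a coding one.

```python
# Assumed PII columns for this sketch; a real list would come from a data inventory.
PII_FIELDS = {"name", "address"}


def depersonalize(record: dict) -> dict:
    """Return a copy of the record with the PII fields removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


record = {
    "name": "John Doe",
    "address": "27 First St., NY 12345",
    "occupation": "bus driver",
    "salary": 41000,
}

anonymous = depersonalize(record)
print(anonymous)  # {'occupation': 'bus driver', 'salary': 41000}
```

The original record is left untouched, so the same function could feed an analysis data set without altering the source.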

In addition, you must consider sensitive personal information (SPI), often simply known as
sensitive data. As the name implies, SPI data are facts about a person considered private.
The GDPR lists specific items that are sensitive:

Racial or ethnic origin
Political opinions
Religious or philosophical beliefs
Trade union membership
Health/medical information
Sexual orientation
Genetic data
Biometric data that uniquely identifies an individual

Protecting Data: Two Approaches

What can you do if you want to collect and use personal data, and a group within your
organization has a legitimate purpose to work with all data while another group only
requires access to de-personalized information? There are two approaches. You can split the
personal data and de-personalized data into two data marts, thus "anonymizing" the
sensitive data for those who don't need it. Imagine splitting the bus driver data from our
earlier example into a data mart with PII data and a different data mart with (now
de-personalized) occupation and salary data.
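
The first approach might look something like the following sketch, which splits each record into a PII part and a de-personalized part. The function name, the field names, and the use of plain lists to stand in for data marts are all assumptions for illustration. Note that no shared key is carried into the second mart, since a key that links back to the PII would undo the anonymization.

```python
PII_FIELDS = ("name", "address")  # assumed PII columns for this sketch


def split_into_marts(records):
    """Split each record into a PII part and a de-personalized part."""
    pii_mart, analytics_mart = [], []
    for record in records:
        pii_mart.append({k: v for k, v in record.items() if k in PII_FIELDS})
        analytics_mart.append({k: v for k, v in record.items() if k not in PII_FIELDS})
    return pii_mart, analytics_mart


records = [
    {
        "name": "John Doe",
        "address": "27 First St., NY 12345",
        "occupation": "bus driver",
        "salary": 41000,
    },
]

pii_mart, analytics_mart = split_into_marts(records)
print(analytics_mart)  # [{'occupation': 'bus driver', 'salary': 41000}]
```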

The other option for complying with the GDPR is to de-personalize individual data elements
so an enterprise can keep all its data in a single data mart. Because such a data mart
contains personal data, it falls under GDPR's rules.

How to De-Personalize Data

Obfuscation (or obscuring) is a term used quite often in relation to the GDPR and "hiding"
personal data. Obfuscation is a technique that makes data unrecognizable. "John Doe"
becomes "hejn fed," for example. Obfuscation can be destructive, meaning the original data is
unrecoverable (the transformation is permanent), or non-destructive, meaning the data can be
recovered using a key or conversion table.
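
The difference between the two kinds of obfuscation can be sketched in a few lines. Here the scrambling simply swaps each letter for a random one, which is an assumption chosen to mimic the "hejn fed" example, and the ConversionTable class is a hypothetical stand-in for whatever lookup a real tool maintains.

```python
import secrets
import string


def scramble(value: str) -> str:
    """Replace each letter with a random lowercase letter; keep spaces, digits, punctuation."""
    return "".join(
        secrets.choice(string.ascii_lowercase) if ch.isalpha() else ch
        for ch in value
    )


class ConversionTable:
    """Hypothetical lookup table that makes the scrambling recoverable."""

    def __init__(self) -> None:
        self._masked_by_original = {}
        self._original_by_masked = {}

    def obfuscate(self, value: str) -> str:
        if value in self._masked_by_original:
            return self._masked_by_original[value]
        for _ in range(100):  # retry in the unlikely case of a collision
            masked = scramble(value)
            if masked not in self._original_by_masked:
                self._masked_by_original[value] = masked
                self._original_by_masked[masked] = value
                return masked
        raise RuntimeError("could not find an unused obfuscated value")

    def recover(self, masked: str) -> str:
        return self._original_by_masked[masked]


# Destructive: only the scrambled value is kept, so the original is gone for good.
destroyed = scramble("John Doe")

# Non-destructive: the table remembers the mapping, so the original can be recovered.
table = ConversionTable()
masked = table.obfuscate("John Doe")
original = table.recover(masked)  # "John Doe"
```

In the non-destructive case the conversion table itself becomes sensitive, because whoever holds it can reverse the obfuscation.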

Tokenization is a form of non-destructive obfuscation. With tokenization, data is obscured
but is recoverable via a special secure key. An example of this is credit card processing,
where the credit card number is replaced in your data mart with a set of (seemingly)
meaningless numbers and characters. The "real" credit card number is only available when
the card processing company talks to the bank, at which point it is de-tokenized.

To tokenize or not to tokenize, that is the question. Excuse my corruption of Will
Shakespeare, but I once worked on a BI project in Stratford-upon-Avon (Shakespeare's
birthplace) and a muse of fire hath upon me descended.

"Tokenizing" PII data items renders them into meaningless groups of seemingly random
characters that cannot be linked to an individual. The tokenized values can be converted back
to the original values for those people with a legitimate interest in the PII data while keeping
the PII data useless to everyone else.
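
To make the idea concrete, here is a minimal sketch of a token vault, assuming an in-memory lookup in place of the "special secure key" a commercial tool would use. The TokenVault class, its method names, and the authorized flag are hypothetical; a real product would sit behind a hardened, access-controlled service. The card number is a commonly used test value, not a real account.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self) -> None:
        self._value_by_token = {}
        self._token_by_value = {}

    def tokenize(self, value: str) -> str:
        """Return a random token for the value, reusing the token if one exists."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = secrets.token_hex(8)  # 16 random hex characters
        while token in self._value_by_token:  # regenerate on the rare collision
            token = secrets.token_hex(8)
        self._value_by_token[token] = value
        self._token_by_value[value] = token
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        """Return the original value, but only for callers with a legitimate interest."""
        if not authorized:
            raise PermissionError("caller may not see the original value")
        return self._value_by_token[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"
token = vault.tokenize(card_number)

# The data mart stores only the token; the vault holds the real value.
recovered = vault.detokenize(token, authorized=True)
```

Because the token is random, it carries no information about the card number; everything depends on who is allowed to call detokenize.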

There are a variety of tools that will tokenize personal data while maintaining referential
integrity. That is, "John Doe" will tokenize to a value common throughout your database
systems, maintaining any primary key/foreign key links.
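
One way to picture referential integrity is with a deterministic token, so the same input always yields the same output. The sketch below uses a keyed hash (HMAC-SHA256 from the Python standard library) as a stand-in; this is an assumption for illustration, not how any particular tool works. The key, the table layouts, and the 16-character truncation are all hypothetical. A keyed hash is one-way, so recovering the original value would still need a vault or conversion table alongside it.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for this sketch


def tokenize(value: str) -> str:
    """Deterministic token: the same input always produces the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Two tables linked by customer name (a made-up layout).
customers = [{"customer": "John Doe", "occupation": "bus driver"}]
orders = [
    {"customer": "John Doe", "order_id": 1001},
    {"customer": "John Doe", "order_id": 1002},
]

tokenized_customers = [{**row, "customer": tokenize(row["customer"])} for row in customers]
tokenized_orders = [{**row, "customer": tokenize(row["customer"])} for row in orders]

# The join on the tokenized column still finds both orders.
customer_tokens = {row["customer"] for row in tokenized_customers}
matched = [row["order_id"] for row in tokenized_orders if row["customer"] in customer_tokens]
print(matched)  # [1001, 1002]
```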

Tokenization Pros and Cons

Tokenization of personal and sensitive data items would seem a logical way to satisfy the
GDPR. After all:

It provides a clean method to hide personal and sensitive data (both PII and SPI)
It provides another layer of data security in case of a data breach (personal data is
hidden)
It maintains referential integrity
Good test data can be extracted from production without fear of exposing personal
data
Once the personal data is tokenized at the source, it will flow to all downstream
systems seamlessly

Unfortunately, there are drawbacks:

There is a price to pay for the tokenization tool, supporting hardware, and
configuration costs.
Performance takes a hit. Tokenizing and de-tokenizing will have an impact on your
application's performance. Whether this is significant will depend on the tool, hardware
performance, etc.

Conclusion

The GDPR is not the end but the beginning, and the regulations will surely continue to
change as BI evolves. The GDPR requires ongoing monitoring of your use of individuals'
personal data along with understanding where that data is stored and who has access to it.
With the growing use of artificial intelligence and robots for analysis and decision making,
this heralds a new era for BI.

About the Author

Rod Welch is a BI consultant with the breadth and depth of experience gained from over 15
years in the BI environment, from agile requirements gathering and dimensional modeling to
ETL programming. In addition, he has a keen interest in agile and automated data
warehouse development and the move to cloud storage. He is currently contracted to a U.K.
insurance company to assess the impact of -- and define the detailed requirements for --
implementing GDPR. You can contact the author via email.
