GDPR and Tokenizing Data (Part 3 in a Series)
tdwi.org/articles/2018/06/06/biz-all-gdpr-and-tokenizing-data-3.aspx
You need to protect any personal data your enterprise collects. Tokenizing data is one way to
stay in compliance with GDPR.
By Rod Welch
June 6, 2018
In the first two parts of this series we examined the six principles of the GDPR. In this final
article, we'll look at how enterprises are protecting personal data to meet GDPR's
requirements, including with tokenization technology.
When the GDPR is mentioned, the first thing that comes to mind may be "personal data"
because that's what the GDPR is all about: protecting personal data. It's important to
understand what, exactly, is meant by personal data.
We have to be very clear here because there is "personal data" and then there is "personally
identifiable information" (popularly referred to as PII). Let's take them in reverse order.
Personally identifiable information is data that can be used to directly or indirectly
identify a particular person. This consists of such data items as a person's name,
address, email address, or phone number. For example:
John Doe, 27 First St., NY 12345
Personal data is data about the person and contains PII data. For example, this record
is personal data:
John Doe, 27 First St., NY 12345, occupation bus driver, salary $41,000
If we remove the PII data from this record, it no longer contains personal data. Instead,
it becomes anonymous:
Occupation bus driver, salary $41,000
This anonymous (or de-personalized) data could be used to analyze occupations and salaries
and be GDPR-compliant.
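As a minimal sketch of the step just described, the following Python snippet drops the PII fields from the bus driver record. The field names and the choice of which fields count as PII are assumptions made for illustration; a real system would take both from its own data model.

```python
# Hypothetical field names; which fields count as PII is an assumption of this sketch.
PII_FIELDS = {"name", "address"}


def depersonalize(record: dict) -> dict:
    """Return a copy of the record with the PII fields removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


record = {
    "name": "John Doe",
    "address": "27 First St., NY 12345",
    "occupation": "bus driver",
    "salary": 41000,
}

anonymous = depersonalize(record)
print(anonymous)  # {'occupation': 'bus driver', 'salary': 41000}
```

The original record is left untouched, so the group that legitimately needs the full record can still work with it.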
In addition, you must consider sensitive personal information (SPI), often simply known as
sensitive data. As the name implies, SPI data are facts about a person considered private.
The GDPR lists specific items that are sensitive:
Racial or ethnic origin
Political opinions
Religious or philosophical beliefs
Trade union membership
Health/medical information
Sexual orientation
Genetic data
Biometric data that uniquely identifies an individual
Protecting Data: Two Approaches
What can you do if you want to collect and use personal data, and a group within your
organization has a legitimate purpose to work with all data while another group only
requires access to de-personalized information? There are two approaches. You can split the
personal data and de-personalized data into two data marts, thus "anonymizing" the
sensitive data for those who don't need it. Imagine splitting the bus driver data from our
earlier example into a data mart with PII data and a different data mart with (now de-
personalized) occupation and salary data.
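The split into two data marts might look something like the sketch below. The column names and the function name are hypothetical, and the sketch deliberately keeps no shared key between the two outputs, on the assumption that a shared key would let the de-personalized rows be linked back to a person.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical column names for the two data marts.
PII_COLUMNS = ("name", "address")
ANALYTIC_COLUMNS = ("occupation", "salary")


def split_into_marts(rows: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split each source row into a PII row and a de-personalized row."""
    pii_mart, analytics_mart = [], []
    for row in rows:
        pii_mart.append({column: row[column] for column in PII_COLUMNS})
        # No customer key is copied here, so the rows cannot be re-linked.
        analytics_mart.append({column: row[column] for column in ANALYTIC_COLUMNS})
    return pii_mart, analytics_mart


source_rows = [
    {
        "name": "John Doe",
        "address": "27 First St., NY 12345",
        "occupation": "bus driver",
        "salary": 41000,
    },
]

pii_mart, analytics_mart = split_into_marts(source_rows)
print(analytics_mart)
```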
The other option for complying with the GDPR is to de-personalize individual data elements
so an enterprise can keep all its data in a single data mart. Because such a data mart
contains personal data, it falls under GDPR's rules.
How to De-Personalize Data
Obfuscation (or obscuring) is a term used quite often in relation to the GDPR and "hiding"
personal data. Obfuscation is a technique that makes data unrecognizable. "John Doe"
becomes "hejn fed," for example. This can be destructive such that obfuscated data is
unrecoverable (that is, the transformation is permanent) or non-destructive (the data can be
recovered using a key or conversion table).
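To make the distinction concrete, here is a toy sketch of both variants. The destructive version replaces letters at random and stores nothing; the non-destructive version uses a letter substitution derived from a key. The function names and the key value are made up for this example, and a simple substitution like this is far too weak for real data protection; it only illustrates the idea of "recoverable with a key."

```python
import random
import secrets
import string

ALPHABET = string.ascii_lowercase


def destructive_obfuscate(value: str) -> str:
    """Replace every letter with a random one; nothing is stored, so it is permanent."""
    return "".join(secrets.choice(ALPHABET) if ch.isalpha() else ch for ch in value)


def _substitution_maps(key: str):
    """Derive a letter substitution (and its inverse) from the key."""
    shuffled = list(ALPHABET)
    random.Random(key).shuffle(shuffled)  # same key always yields the same shuffle
    forward = dict(zip(ALPHABET, shuffled))
    backward = {obscured: original for original, obscured in forward.items()}
    return forward, backward


def obfuscate(value: str, key: str) -> str:
    """Non-destructive: substitute letters using the key (case is folded to lowercase)."""
    forward, _ = _substitution_maps(key)
    return "".join(forward.get(ch, ch) for ch in value.lower())


def recover(obscured: str, key: str) -> str:
    """Reverse the substitution; only works with the same key."""
    _, backward = _substitution_maps(key)
    return "".join(backward.get(ch, ch) for ch in obscured)


key = "demo-key"  # hypothetical key, for illustration only
hidden = obfuscate("John Doe", key)
print(hidden)
print(recover(hidden, key))  # john doe
```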
Tokenization is a form of non-destructive obfuscation. With tokenization, data is obscured
but is recoverable via a special secure key. An example of this is credit card processing,
where the credit card number is replaced in your data mart with a set of (seemingly)
meaningless numbers and characters. The "real" credit card number is only available when
the card processing company talks to the bank, at which point it is de-tokenized.
To tokenize or not to tokenize, that is the question. Excuse my corruption of Will
Shakespeare, but I once worked on a BI project in Stratford-upon-Avon (Shakespeare's
birthplace) and a muse of fire hath upon me descended.
"Tokenizing" PII data items renders them into meaningless groups of seemingly random
characters that cannot be linked to an individual. The tokenized values can be converted back
to the original values for those people with a legitimate interest in the PII data while keeping
the PII data useless to everyone else.
There are a variety of tools that will tokenize personal data while maintaining referential
integrity. That is, "John Doe" will tokenize to a value common throughout your database
systems, maintaining any primary key/foreign key links.
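The behavior described here can be sketched with a small in-memory "token vault." This is an illustrative assumption, not how any particular commercial tool works: the class name, the `authorized` flag, and the table layouts are all hypothetical, and a real product would keep the mapping in a secured store with proper access control.

```python
import secrets


class TokenVault:
    """Toy token vault: maps each original value to one random token and back."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return the token for a value, creating one on first use."""
        token = self._token_by_value.get(value)
        if token is None:
            token = secrets.token_hex(8)
            while token in self._value_by_token:  # extremely unlikely collision
                token = secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        """Return the original value, but only for callers with a legitimate interest."""
        if not authorized:
            raise PermissionError("caller may not see the original value")
        return self._value_by_token[token]


vault = TokenVault()

# Two hypothetical tables that both refer to the same person.
customers = [{"customer": vault.tokenize("John Doe"), "occupation": "bus driver"}]
payroll = [{"customer": vault.tokenize("John Doe"), "salary": 41000}]

# The same value always yields the same token, so the key link still holds.
print(customers[0]["customer"] == payroll[0]["customer"])  # True
print(vault.detokenize(customers[0]["customer"], authorized=True))  # John Doe
```

Because the token is random rather than derived from the value, it reveals nothing about "John Doe" to anyone who cannot reach the vault.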
Tokenization Pros and Cons
Tokenization of personal and sensitive data items would seem a logical way to satisfy the
GDPR. After all:
It provides a clean method to hide personal and sensitive data (both PII and SPI)
It provides another layer of data security in case of a data breach (personal data is
hidden)
It maintains referential integrity
Good test data can be extracted from production without fear of exposing personal
data
Once the personal data is tokenized at the source, it will flow to all downstream
systems seamlessly
Unfortunately, there are drawbacks:
There is a price to pay for the tokenization tool, supporting hardware, and
configuration costs.
Performance takes a hit. Tokenizing and de-tokenizing will have an impact on your
application's performance. Whether this is significant will depend on the tool, hardware
performance, etc.
Conclusion
The GDPR is not the end but the beginning, and the regulations will surely continue to
change as BI evolves. The GDPR requires ongoing monitoring of your use of individuals'
personal data along with understanding where that data is stored and who has access to it.
The growing use of artificial intelligence and robots for analysis and decision making
heralds a new era for BI.
About the Author
Rod Welch is a BI consultant with the breadth and depth of experience gained from over 15
years in the BI environment from agile requirements gathering and dimensional modeling to
ETL programming. In addition, he has a keen interest in agile and automated data
warehouse development and the move to cloud storage. He is currently contracted to a U.K.
insurance company to assess the impact of -- and define the detailed requirements for --
implementing GDPR. You can contact the author via email.