Data Anonymization Patent Landscape

Mirjana Pejić Bach
Jasmina Pivar
Ksenija Dumičić
Abstract. The omnipresence of digital data, which is increasing unstoppably,
has raised awareness of the importance of data privacy. Since data is
stored in various data storages, different approaches are used to protect
the privacy of the subjects related to the data. The goal of this paper is to develop a
data anonymization patent landscape by answering the following
questions: (i) what is the trend in data anonymization patenting, (ii) what technical
content for data anonymization is protected, (iii) which organizations and
countries were most active in patenting data anonymization know-how, and (iv)
what themes emerge most often in the patent titles. Patents from the PatSeer
database related to data anonymization and patented from 2001 to 2015 (15th
August 2015) were analyzed. A longitudinal approach in combination with text
mining techniques was used to develop the data anonymization patent
landscape.
Keywords: data anonymization, patent landscape, PatSeer, data mining,
association rules, text mining
1 Introduction
Part of the difficulty of working with data that come from sensitive sources, such
as health or financial data, is protecting the privacy of the individuals or organizations
related to the data. Such data need to be anonymized with data anonymization
techniques and methods, which is a prerequisite for using the data
while at the same time retaining its privacy. Various means are used for
data anonymization, including algorithms and physical equipment. Cormode
and Srivastava [5] state that the result of data anonymization is anonymized data,
which is essentially a set of possible worlds, one of which corresponds to the
original data.
Data anonymization has been the subject of research and patenting activities in re-
cent years. Development related to data anonymization is being reinforced across
many industrial applications. That makes it important for both researchers and busi-
ness practitioners to be aware of patents in the field of data anonymization across
different companies, industries, and countries, which is the purpose of this paper.
According to the World Intellectual Property Organization, a patent is an exclusive
right granted for an invention: a product or a process which provides a new way of
doing something, or a new technical solution to a problem [14]. It offers the exclusive
right to stop or prevent others from commercially making, using, distributing,
importing or selling the patented invention without the patent owner's permission.
Those rights are only valid in the country or region where a patent has been filed or
granted [11]. Also, the protection is granted for a limited period, generally 20 years
from the filing date of the application [14].
Many countries use national patent systems based on "world patent applications"
that are made under the Patent Cooperation Treaty [14]. The World Intellectual
Property Organization maintains a database of published international patent applications,
using the International Patent Classification (IPC) system, which was established in 1971 by
the Strasbourg Agreement and is nowadays used in more than 100 countries
worldwide [14, 9]. Additionally, there are two important classification systems used by the
largest patent offices, both based on the IPC: 1) the Cooperative Patent Classification
(CPC) system, developed by the European Patent Office (EPO) and the United States
of America, and 2) the File Index (FI) Japanese patent classification system [11].
Patent documents are highly structured, providing a rich source of information
[9]. They contain fields such as patent title, description, simple family ID,
publication/issue year, filing/application year, assignee country, assignee
original/inventors, IPC codes, CPC codes, and FI codes. A patent landscape, also known
as a Competitive Technical Intelligence Report, White Space Analysis or Technical
Gap Analysis, is a study which uses a large set of patent data to extract useful
information for understanding a particular field [13]. It aims to give an overview of a
particular field and provide insights to decision makers. Such insights include, for
example, the publication trend (time) or filing trend (technology) of
patents; who the top assignees are or how many patents each company is filing;
and how patents are spread across countries [13]. Other approaches are also often
used [7]. For example, Noh, Jo and Lee [10] focused on keyword strategies for
patent analysis and offered guidelines on the selection and processing of keywords for
patent analysis, and Brügmann et al. [2] presented an operational prototype of a
workbench for patent document analysis and summarization. Text mining and
visualization based approaches have also been used for analyzing patent content in a
vast body of literature [1].
Numerous researchers have developed patent landscapes for different technology
fields. Some examples are provided here with a brief presentation of the
methodology. Han and Sohn [8] identified technological convergence in standards related to
information and communication technology by applying social network analysis
and association rules analysis. Choi and Hwang [4] analyzed patents related to the
Light Emitting Diode and wireless broadband fields by using trend analysis and a
method that combines network-based and keyword-based research. Patent
analysis was used to explore virtualization technology development in the USA [12],
analyzing the technology life cycle, assignee organization and country, patent
classification and patent citations. In [3], the authors investigated technological pervasiveness and
the variety of innovators in Green ICT, using network analysis.
The goal of this paper is to develop a patent landscape of the data anonymization
related patents during the period from 2001 to 2015 (15th August 2015) by providing
an answer to the following research questions:
RQ1: What is the trend in data anonymization patenting in terms of change in
time?
RQ2: What technical content related to data anonymization is protected by the
patenting process classified using IPC system (at the sub-class and group level)?
RQ3: Which organizations from which countries patented their innovations related
to data anonymization?
RQ4: What themes emerge most often as the subject of patenting process related to
data anonymization?
To the best of our knowledge, there has been no analysis of the patents related to
data anonymization. As an attempt to develop a patent landscape of data
anonymization approaches, this study is expected to help in understanding this area
and to shed more light on the means of protecting data privacy.
2 Methodology
The development of the patent landscape consists of four stages: (i)
patent selection and trend analysis, (ii) analysis of the areas of technology, (iii) assignee
country and organization analysis, and (iv) text mining analysis.
2.1 Stage 1: Patent selection and trend analysis
As a source for the patent search and selection, we used the PatSeer database,
which is an online global patent database covering the patent activity in 121 countries,
stored in the form of simple patents and patent families. A patent family consists of a
set of patent applications filed in different countries in order to protect the
innovation in a wider geographical area.
In order to detect the patents related to data anonymization, we searched for
patents that have in their title the word "data" and one of the following words:
"anonymizing", "anonymization", "anonymized", "anonymizy" and "anonymize".
The PatSeer database was therefore searched on 15th August 2015, using the search string
(TA:(data AND anonym*)), with the option for searching simple patent families. The
British English spelling of the words was also covered, e.g. "anonymisation". The
possible statuses of a patent are: active, inactive-rejected, refused, suspended, and
inactive-withdrawn/surrendered. In our analysis we focused only on the active simple patent
families.
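The selection rule described above can be illustrated with a minimal Python sketch. It is not the PatSeer query engine: the regular expressions, the record layout (`title`, `status`) and the function names are assumptions made for illustration. The first title is taken from an example cited later in this paper; the other two are invented.

```python
import re

# Assumed approximation of the title query "data AND anonym*".
# The wildcard also covers the British spelling (anonymis...).
DATA = re.compile(r"\bdata\b", re.IGNORECASE)
ANONYM = re.compile(r"\banonym\w*", re.IGNORECASE)

def matches_query(title: str) -> bool:
    """True if the title contains 'data' and a word starting with 'anonym'."""
    return bool(DATA.search(title)) and bool(ANONYM.search(title))

def select_active(records):
    """Keep records that match the title query and have the status 'active'."""
    return [r for r in records
            if r["status"] == "active" and matches_query(r["title"])]

# Hypothetical records; only the first title comes from the paper.
records = [
    {"title": "Anonymizer data collection device", "status": "active"},
    {"title": "Method for anonymisation of medical data", "status": "inactive"},
    {"title": "Data compression apparatus", "status": "active"},
]
selected = select_active(records)
```

Only the first record is kept: the second matches the title query but is not active, and the third is active but does not mention anonymization.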
2.2 Stage 2: Patent analysis according to the areas of technology

In line with our goal of determining the technical content related to data anonymization
that is protected by the patenting process, we conducted the analysis using the International
Patent Classification (IPC) system [14]. The IPC divides the whole body of
technical knowledge into hierarchical levels, in descending order of
hierarchy [14]. Figure 1 represents the hierarchical levels of the IPC: section, class,
subclass and group. The contents of the lower hierarchical levels are subdivisions of the
contents of the higher hierarchical levels, to which they are subordinated [14].
Fig. 1. IPC hierarchical levels (Source: According to World Intellectual Property Organisation,
2015, p. 6)
The section is the highest level of the IPC hierarchy. It is considered a
very broad indication of the technological content [14]. The IPC contains eight sections,
divided into classes, and each class comprises one or more subclasses. Finally, each
subclass is broken down into groups [14]. Patents related to data anonymization
are most often classified under the sections G-Physics and H-Electricity. Some
examples of the subclasses with data anonymization patents are: G06F-Electric
digital data processing, and G06Q-Data processing systems or methods. Some
examples of groups are: H04L9/00-Arrangements for secret or secure communication
and G06F21/62-Security arrangements for protecting computers, components thereof,
programs or data against unauthorized activity - Protecting access to data.
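The hierarchy can be made concrete by splitting a group-level code into its levels. The Python sketch below assumes codes written as in this paper (for example G06F21/62). The pattern, the `IPCCode` type and the `parse_ipc` function are illustrative names, not part of any official IPC tooling.

```python
import re
from typing import NamedTuple

class IPCCode(NamedTuple):
    section: str    # e.g. "G"
    ipc_class: str  # e.g. "G06"
    subclass: str   # e.g. "G06F"
    group: str      # e.g. "G06F21/62"

# Assumed layout: section letter, two-digit class, subclass letter,
# then the group written as "main/sub".
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")

def parse_ipc(code: str) -> IPCCode:
    """Split a group-level IPC code into section, class, subclass and group."""
    m = IPC_PATTERN.match(code.strip().upper())
    if m is None:
        raise ValueError(f"not a group-level IPC code: {code!r}")
    section, cls, sub, main, subgroup = m.groups()
    subclass = section + cls + sub
    return IPCCode(section, section + cls, subclass, f"{subclass}{main}/{subgroup}")

parsed = parse_ipc("G06F21/62")
```

Here `parsed.section` is "G" (Physics) and `parsed.subclass` is "G06F" (Electric digital data processing), which matches the examples given above.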
In this research, we analyze the active simple patent families related to data
anonymization according to sections, subclasses and groups. In addition, we use
association rules analysis at the IPC group level in order to determine the
heterogeneity of the technical content protected by the patenting process.
2.3 Stage 3: Patent analysis according to the assignee country and
organization
According to [13], the assignee is the entity that holds the property right to the
patent. The assignee is not necessarily the inventor of the new knowledge, since it is
more likely that the patent will be assigned to the organization in which the inventor is
employed. In this research we conduct an extensive analysis of organizations and
countries, focusing on the longitudinal trend when possible. The aim was to determine
the top countries and organizations to which patents related to data
anonymization were assigned.
2.4 Stage 4: Text mining patent analysis
In order to detect the main themes that emerge in patents related to data
anonymization, a text mining approach was used. Text mining of the titles of simple patent
families was used to determine which themes emerge most often as the
subject of the patenting process related to data anonymization. In order to reduce the
variability of the words, different approaches such as filtering, lemmatization or stemming
can be used [9]. We used the Statistica Text Mining software to
apply the stemming method. Examples of stemming techniques are removing the
"ing" from words and the "s" from plural nouns. By using the stemming algorithm, we
built the stems, which are natural groups of words with similar or even equal
meaning. For example, the stemming algorithm produces the stem "analy", which
represents the words "analysis" and "analytics".
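The two suffix rules mentioned above can be sketched in Python as follows. This is a deliberately simplified toy stemmer, not the algorithm implemented in Statistica, and it would not reproduce stems such as "analy". The length thresholds, the function names and the two example titles are assumptions made for illustration.

```python
import re
from collections import Counter

def toy_stem(word: str) -> str:
    """Strip a trailing 'ing' or a plural 's' (toy rules only)."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w

def stem_counts(titles):
    """Count how often each stem occurs across all titles."""
    counts = Counter()
    for title in titles:
        for token in re.findall(r"[a-z]+", title.lower()):
            counts[toy_stem(token)] += 1
    return counts

# Hypothetical titles, invented for this sketch.
titles = ["Anonymizing data records", "Systems for anonymizing data"]
counts = stem_counts(titles)
```

With these titles, "anonymizing" is reduced to "anonymiz" and "records" to "record", so that different word forms are counted under a single stem.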
3 Results
3.1 Patent search and trend analysis

The PatSeer database was searched on 15th August 2015, using the search string (TA:(data
AND anonymiz*)), with the option for searching simple patent families. There were
313 records for simple family IDs in total. Among these records, 296
simple patent families were active at the time of the search. Therefore, the
296 active simple patent families related to data anonymization were analyzed in
order to attain the goal of the research.
Figure 2 presents the patent dynamics for the period between 2001 and August
15th, 2015. An increasing trend is present: in the period from 2001 to 2010,
fewer than 10 simple patent families were registered per year. After that period, the
number of simple patent families increased, reaching 83 simple patent families
registered in 2014. Our data does not include patents that may have been assigned after
15th August 2015.
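[Figure 2: bar chart of the number of simple patent families per year. Bar labels recovered
from the figure, in extraction order rather than chronological order: 83, 83, 35, 26, 18, 9,
6, 8, 10, 3, 2, 1, 5, 3, 4.]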
Fig. 2. Number of data anonymization simple patent families (2001- August 15th, 2015)
Source: Authors; Patseer [15th August 2015]
3.2 Patent classification according to the areas of technology

Our search revealed that the 296 simple patent families were registered under the following
five IPC sections: A Human necessities; B Performing operations, Transporting; C
Chemistry, Metallurgy; G Physics; and H Electricity. The majority of patents were
assigned to section G Physics, where the sub-classes with the largest
number of patents are G06F - Electric digital data processing (283 simple patent
families) and G06Q - Data processing systems or methods (124 simple patent families).
Section H follows, with a significant number of patents assigned to the
sub-classes H04L - Transmission of digital information (101 simple patent families) and
H04W - Wireless communication networks (21 simple patent families). One patent
can have several IPC codes, which is why the total number of IPC codes (616 codes) is larger than the
number of patents examined (296 simple patent families). Appendix 1 provides
detailed information.
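The reason the code total exceeds the number of families can be seen in a small Python sketch. It assumes that a family is counted once for every sub-class in which it has at least one code. The paper does not spell out this counting rule, so it is an assumption, and the family identifiers and code assignments below are invented.

```python
from collections import Counter

def subclass_counts(families):
    """Count, per sub-class, the families with at least one code in it."""
    counts = Counter()
    for codes in families.values():
        # The first four characters of a code identify the sub-class.
        counts.update({code[:4] for code in codes})
    return counts

# Hypothetical families and IPC codes, invented for this sketch.
families = {
    "F1": ["G06F21/62", "G06F17/30", "H04L9/00"],
    "F2": ["G06F21/60", "G06Q50/22"],
    "F3": ["H04L29/06"],
}
counts = subclass_counts(families)
total_codes = sum(counts.values())
```

Three families produce five sub-class entries here, in the same way that 296 families produce 616 entries in the analysis above.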
The IPC group analysis revealed the following results. The largest number of data
anonymization simple patent families were assigned to the group G06F17/30-Digital
computing or data processing equipment or methods adapted for specific functions -
Information retrieval; Database structures (69 simple patent families). The second
most frequent IPC group is G06F21/60-Security arrangements for protecting computers,
components, programs or data against unauthorized activity - Protecting data (53
simple patent families). The third most frequent group is H04L29/06-Arrangements,
apparatus, circuits or systems, not covered by a single one of groups H04L 1/00-H04L
27/00 - Characterized by a protocol (39 simple patent families). More detailed
information on the number of patents related to data anonymization at the IPC group level
is presented in Appendix 2.
In order to assess the heterogeneity of the technical content related to
data anonymization protected by the patenting process, we used association
rules analysis [9]. Most of the patents were assigned to more than one IPC group, and
616 groups were identified for the 296 simple patent families. This indicates that, on
average, one simple patent family is registered under approximately 2 IPC groups.
Only 12 rules were generated with the minimal support and confidence set at the 1%
level, which indicates that it is difficult to find dependencies between different IPC
group level codes. The results are presented in Figure 3.
Fig. 3 Association rules network at IPC Group level patents (Source: Authors; PatSeer [15th
August 2015]; Statistica Text Miner)
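The support and confidence thresholds mentioned above can be illustrated with a minimal Python sketch of pairwise association rules. It is not the Statistica implementation and is restricted to rules between two IPC groups. The function name, the rule layout and the four example families are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.01, min_confidence=0.01):
    """Return rules (lhs, rhs, support, confidence) between pairs of items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n                       # share of families with both groups
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]  # share of lhs families that also have rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical IPC group assignments of four families, invented for this sketch.
transactions = [
    ["G06F21/60", "G06F21/62"],
    ["G06F21/60", "G06F21/62", "G06F17/30"],
    ["G06F17/30"],
    ["H04L29/06"],
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.5)
```

With a support threshold of 0.5, only the pair G06F21/60 and G06F21/62 survives, giving one rule in each direction with support 0.5 and confidence 1.0.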
Given the set limits and the generated rules, we conclude that heterogeneity
does not characterize the protected technical content related to data anonymization.
The results reveal that the following IPC groups were most often registered together:
G06F21/60-Security arrangements for protecting computers, components, programs
or data against unauthorized activity - Protecting data, G06F21/62-Security
arrangements for protecting computers, components thereof, programs or data against
unauthorized activity - Protecting access to data via a platform, G06F17/30-Digital
computing or data processing equipment or methods adapted for specific functions -
Information retrieval; Database structures, and G06F17/00-Digital computing or data
processing equipment or methods, specially adapted for specific functions.
3.3 Patent assignee organization and country analysis
Our search revealed that the patenting activity is spread across different countries,
but the majority of patents related to data anonymization have been assigned by
organizations from the USA and Japan. In some cases, organizations from two or more
countries were joint assignees. Figure 4 outlines the patent dynamics according to
countries for the period between 2001 and August 15th, 2015. The USA is the leading
country, and its organizations began publishing patents on data anonymization in
2001. Other countries followed later. European countries that have assigned more
than five patents in the given period are Germany (29 simple patent families),
Switzerland (18 simple patent families), France (13 simple patent families) and Ireland (8
simple patent families). Our data does not include patents that may have been assigned
after 15th August 2015. Appendix 3 lists the number of patents related to data
anonymization by assignee country for the period between 2001 and August
15th, 2015.
[Figure 4: chart with a vertical axis from 0 to 40; series: USA, Ireland, Japan, France,
South Korea, Germany, Switzerland.]
Fig. 4. Number of data anonymization simple patent families per country (2001- August 15th,
2015); countries with more than 5 simple patent families; Source: Authors; Patseer [15th
August 2015]
Table 1 presents the number of simple patent families related to data anonymization
according to assignee organization and country. The assignee of a patent is an
organization, which can be a company, an academic institution or, in
some cases, an individual person. The organization with the largest number of simple patent families
related to data anonymization in the observed period is NEC, registered in Japan
(22 simple patent families or 7,61%), followed by IBM, registered in the United States
of America (21 simple patent families or 7,27%). Other organizations that registered
a larger number of simple patent families are also multinational organizations, such
as Microsoft, Alcatel, Google, Siemens, Mastercard, and Amazon.
Table 1. Number of simple patent families related to data anonymization according to assignee
organization and country (Source: Authors; PatSeer [15th August, 2015])
Assignee organization        Country              Count   %
NEC                          Japan                22      7,61%
IBM                          USA                  21      7,27%
ALCATEL LUCENT SA            France               7       2,42%
MICROSOFT                    USA                  7       2,42%
ACCENTURE GLOBAL SERVICES    Ireland              6       2,08%
GOOGLE                       USA                  6       2,08%
NIFTY                        Japan, Switzerland   6       2,08%
SIEMENS AG                   Germany              6       2,08%
MASTERCARD INTERNATIONAL     USA                  5       1,73%
FUJITSU                      Japan                4       1,38%
HITACHI                      Japan                4       1,38%
SAP                          Germany              4       1,38%
AMAZON TECH                  USA                  3       1,04%
ELWHA                        USA                  3       1,04%
KDDI                         Japan                3       1,04%
NIPPON TELEGRAPH TELEPHONE   Japan                3       1,04%
Other                        -                    186     61,94%
Total                        -                    296     100%
3.4 Text mining utilization for themes identification

We used text mining analysis to extract the most common words that occurred in
the titles of simple patent families, using the stemming approach available as a feature of the
Statistica Text Miner software. Stems or phrases are generated as the output of the
stemming algorithm. Table 2 shows the most frequently used stems or phrases in the titles of the
patents related to data anonymization. As expected, "anonym*" is the most frequent
phrase. Also, "method", "data" and "system" appeared in more than 100 cases. Other
phrases that are often present are "inform", "devic", "apparatus", and "process".
In order to provide a more intuitive insight into the themes that occur in the titles of
the simple patent families related to data anonymization, a tag cloud analysis was
conducted [6]. The tag cloud has become a common way of visualizing the most frequent
themes, since it displays the most frequent words in the analyzed text and relates the
size of each word to its relative frequency. Therefore, the words that occur more often
are larger.
The Wordle software was used to generate a tag cloud of the stems that
occurred most often in the titles of the simple patent families related to data
anonymization. In order to increase the clarity of the cloud, we applied the
tag cloud algorithm to the stems that occurred more than 5 times in the titles of
the simple patent families. We excluded the stems "data" and "anonym", since
they occurred in every title because these words were the criteria for
selecting the simple patent families for the analysis. Two other stems that
occurred often were also omitted from the analysis: "system" and "method".
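The filtering and sizing steps described above can be sketched in Python. This is not the layout algorithm used by Wordle. The linear mapping from frequency to font size, the point sizes and the function name are assumptions made for illustration. The counts are taken from Table 2, except for the stem "example", which is invented to show the frequency filter.

```python
def tag_cloud_sizes(stem_counts, excluded, threshold=5, min_pt=10, max_pt=48):
    """Map each kept stem to a font size that grows linearly with its count."""
    kept = {s: c for s, c in stem_counts.items()
            if c > threshold and s not in excluded}
    if not kept:
        return {}
    lo, hi = min(kept.values()), max(kept.values())
    span = hi - lo
    sizes = {}
    for stem, count in kept.items():
        frac = (count - lo) / span if span else 1.0
        sizes[stem] = round(min_pt + frac * (max_pt - min_pt))
    return sizes

# Counts from Table 2; "example" is a hypothetical low-frequency stem.
stem_counts = {"anonym": 182, "method": 180, "data": 144, "system": 124,
               "inform": 71, "devic": 45, "apparatus": 39, "process": 39,
               "example": 3}
excluded = {"data", "anonym", "system", "method"}
sizes = tag_cloud_sizes(stem_counts, excluded)
```

In this sketch the most frequent remaining stem, "inform", receives the largest font size, and the excluded and low-frequency stems do not appear at all.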
Table 2. Most frequently used words in the titles of patents related to data anonymization;
stems with 10 or more occurrences (Source: Authors; PatSeer [15th August 2015]; Statistica Text Miner)
Stem / Phrase   Number of occurrences   Number of simple    Examples
                in the title            patent families
anonym          182                     153                 anonymous
method          180                     170
data            144                     132
system          124                     120
inform          71                      51                  information
devic           45                      44                  device
apparatus       39                      39
process         39                      28
program         30                      29
network         26                      23
comput          20                      18                  computer
ident           19                      18                  identification
manag           19                      12                  managing
analy           18                      18                  analysis, analytics
person          16                      14
privaci         16                      15                  privacy
medic           15                      12                  medical
protect         13                      11                  protected
provid          13                      12                  provider
commun          12                      12                  community
behavior        11                      9
product         11                      10
record          11                      11
servic          11                      10                  service
distribut       10                      7                   distribute
secur           10                      10                  secure
Figure 4 indicates that the stems "method" and "system" occurred most often
within the titles of simple patent families related to data anonymization. The following
groups of topics were also identified: (i) themes related to physical equipment, such as
"devic", "comput" or "apparatus"; (ii) themes related to software, such as "program",
"process", and "analy" or "manag"; (iii) themes related to the goal of the patent, such
as "protect", "ident", "encrypt" or "privaci"; and (iv) some specific themes related to
the areas of implementation, such as "commun", "medic", or "servic". Examples
of patents related to the above-mentioned groups are: (i) US20040199789A1:
Anonymizer data collection device; (ii) US20080287118A1: Method, apparatus and
computer program for anonymization of identification data; (iii)
DE102007033667A1: Method and apparatus for an anonymous encrypted mobile data
and voice communication; and (iv) US20100070306A1: Patient community system
with anonymized electronic medical data.
Fig. 4. Tag cloud of the most often used words in patent titles related to data anonymization; >
5 simple patent families (Source: Authors; PatSeer [15th August, 2015]; Wordle.org)
4 Conclusion
The paper presents an examination of data anonymization related simple patent
families, based on the data gathered from PatSeer. We analyzed 296 active simple
patent families related to data anonymization assigned from 2001 to 15th August
2015. The analysis was conducted in four stages: (i) detecting the trend in data
anonymization patenting, (ii) patent classification according to the areas of
technology, (iii) assignee organization and country analysis, and (iv) text mining
for theme identification. The analysis answered the research
questions, providing insights into the data anonymization patent landscape.
The first research question (RQ1) aimed at detecting the trend in data
anonymization patenting. The number of simple patent families has been growing, with a
sharp increase after 2010, and especially after 2014, thus indicating a positive trend in
the area of patenting data anonymization solutions. Such an increase is the result of the
increased awareness of the necessity of data privacy protection, and also of the
new challenges (e.g. big data) that lie ahead in this area.
The second research question (RQ2) aimed at detecting the protected technical content
related to data anonymization, classified using the IPC system (at the sub-class and group
level). The majority of simple patent families related to data anonymization were
assigned to section G Physics of the IPC system. The G section sub-classes with the most
patents are G06F - Electric digital data processing and G06Q - Data processing
systems or methods. Within the G06F sub-class, the largest number were assigned to the group
G06F17/30-Digital computing or data processing equipment or methods adapted for
information retrieval and database structures. Therefore, the protection of data privacy
in databases and in information retrieval has attracted the most attention from
inventors, which is the result of the omnipresent digitization of information.
Association rules analysis revealed that the patents with more than one IPC group
were homogeneous, since all of the co-occurring IPC groups were from the sub-class
G06F-Electric digital data processing.
The third research question (RQ3) asked which organizations from which
countries patented their innovations related to data anonymization. According to the
patent analysis, data anonymization technology is spread across different
countries, but the majority of simple patent families related to data anonymization
have been assigned by organizations from the USA and Japan. NEC, registered in
Japan, assigned the greatest number of patents in the observed period, followed by IBM,
registered in the USA. Numerous multinational corporations, such as Google,
Microsoft, Amazon and MasterCard, have also registered a substantial number of
patents related to data anonymization.
The fourth research question (RQ4) aimed at detecting which themes emerge most
often as the subject of the patenting process related to data anonymization. The most
frequently used word in the titles of the patents related to data anonymization was "anonym*",
followed by "method", "data" and "system". Several additional groups indicating
the most frequent themes related to data anonymization were detected: physical
equipment; software; protection, identification, encryption or privacy; and specific themes
such as community, medical, or service.
The limitations of this work result from the fact that we considered only the
simple patent families that have in their title the word "data" and one of the following words:
"anonymizing", "anonymization", "anonymized", "anonymizy" and "anonymize".
Hence, patents that have these words in the abstract but not in the title are omitted
from the analysis. Furthermore, the analysis covered only part of the
year 2015, which prevented us from drawing conclusions for the most recent period.
Recommendations for further research emerge from these limitations: the abstract
and full text should also be included in the analysis. Since this would lead to a
much larger number of results, a text mining approach should be fully utilized in such
research in order to automate the process of analysis.
Appendices
Appendix 1. Number of patents related to data anonymization according to the IPC system -
Sub-class level (Source: Authors; Patseer [15th August 2015])
Code   Simple patent families   Code description
A                               Human necessities
A61B   3                        Diagnosis; Surgery; Identification
B                               Performing operations; transporting
B60Q   1                        Arrangement of signaling or lighting devices
B65G   1                        Transport or storage devices
C                               Chemistry; Metallurgy
C12N   1                        Micro-organisms or enzymes; compositions
G                               Physics
G01C   1                        Measuring distances, levels or bearings; surveying; navigation; gyroscopic instruments; photogrammetry; videogrammetry
G01D   2                        Measuring not specially adapted for a specific variable and variables not covered by a single another subclass
G01R   3                        Measuring electric and magnetic variables
G05B   2                        Control or regulating systems
G06F   283                      Electric digital data processing
G06K   9                        Recognition and presentation of data; record carriers
G06N   6                        Computer systems based on specific computational models
G06Q   124                      Data processing systems or methods
G06T   4                        Image data processing or generation
G07B   1                        Ticket-issuing apparatus; taximeters; apparatus for collecting fares, tolls or entrance fees; franking apparatus
G07C   5                        Time or attendance registers; registering or indicating the working of machines; generating random numbers; voting or lottery apparatus; arrangements, systems or apparatus for checking
G07G   6                        Registering the receipt of cash, valuables, or tokens
G08B   2                        Signalling or calling systems; order telegraphs; alarm systems
G08C   2                        Transmission systems for measured values and control signals
G08G   2                        Traffic control systems
G09B   1                        Educational or demonstration appliances
G09C   8                        Ciphering or deciphering apparatus
G10L   3                        Speech analysis or synthesis, recognition, processing, coding or decoding
G11B   1                        Information storage based on relative movement between record carrier and transducer
H                               Electricity
H04H   5                        Broadcast communication
H04L   101                      Transmission of digital information
H04M   7                        Telephonic communication
H04N   6                        Pictorial communication
H04Q   3                        Selecting
H04W   21                       Wireless communication networks
-      2                        Missing data
Appendix 2. Number of patents related to data anonymization according to the IPC system -
Group level (Source: Authors; PatSeer [15th August 2015])
Code             Count   Code description
G06F                     Electric digital data processing
G06F12/14        7       Accessing, addressing or allocating within memory systems or architectures - Protection against unauthorized use of memory
G06F15/16        10      Digital computers in general - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register
G06F17/00        12      Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F17/30        69      Digital computing or data processing equipment or methods adapted for specific functions - Information retrieval; Database structures
G06F19/00        13      Digital computing or data processing equipment or methods, specially adapted for specific applications
G06F21/00        18      Security arrangements for protecting computers, components thereof, programs or data against unauthorized activity
G06F21/60        53      Security arrangements for protecting computers, components, programs or data against unauthorized activity - Protecting data
G06F21/62        37      Security arrangements for protecting computers, components thereof, programs or data against unauthorized activity - Protecting access to data
G06F7/00         10      Methods or arrangements for processing data by operating upon the order or content of the data handled
G06Q                     Data processing systems or methods
G06Q10/00 & 06   19      Administration; Management & Resources, workflows, human or project management; Enterprise planning; Organizational models
G06Q10/10        8       Administration; Management - Office automation; Time management
G06Q30/00        13      Commerce
G06Q30/02        16      Commerce - Marketing
G06Q40/00        8       Finance; Insurance; Tax strategies; Processing of corporate or income taxes
G06Q50/00        10      Systems or methods adapted for a specific business sector
G06Q50/10        6       Systems or methods adapted for a specific business sector - Services
G06Q50/22        14      Systems or methods adapted for a specific business sector - Healthcare
G06Q50/24        9       Systems or methods adapted - Patient record management
G09C                     Ciphering or deciphering apparatus for cryptographic or other purposes
G09C1/00         8       Apparatus or methods whereby a given sequence of signs is transformed into an unintelligible sequence of signs by transposing the signs or groups of signs or by replacing them by others according to a predetermined system
H04L                     Transmission of digital information
H04L29/06        39      Arrangements, apparatus, circuits or systems, not covered by a single one of groups H04L 1/00-H04L 27/00 - Characterized by a protocol
H04L29/08        8       Arrangements, apparatus, circuits or systems, not covered by a single one of groups H04L 1/00-H04L 27/00 - Transmission control procedure
H04L9/00         7       Arrangements for secret or secure communication
H04L9/32         12      Arrangements for secret or secure communication - Including means for verifying the identity or authority of a user of the system
H04L12/26        5       Data switching networks - Monitoring arrangements; Testing arrangements
H04W                     Wireless communication networks
H04W12/02        7       Security arrangements; Authentication; Protecting privacy or anonymity
Appendix 3. Number of simple patent families related to data anonymization according to
assignee country (Source: Authors; PatSeer [15th August, 2015])
Assignee country / region                  Assignee country code   Number of patents   %
United States of America                   US                      131                 44,26%
Japan                                      JP                      59                  19,93%
Germany                                    DE                      29                  9,80%
Switzerland                                CH                      18                  6,08%
France                                     FR                      13                  4,39%
South Korea                                KR                      9                   3,04%
Ireland                                    IE                      8                   2,70%
United Kingdom                             GB                      4                   1,35%
Sweden                                     SE                      4                   1,35%
Australia                                  AU                      3                   1,01%
Finland                                    FI                      3                   1,01%
India                                      IN                      2                   0,68%
Spain                                      ES                      2                   0,68%
United States of America-United Kingdom    US-GB                   2                   0,68%
United States of America-Japan             US-JP                   2                   0,68%
Austria                                    AT                      1                   0,34%
Denmark-United States of America           DE-US                   1                   0,34%
Ireland-United States of America           IE-US                   1                   0,34%
Israel                                     IL                      1                   0,34%
Norway                                     NO                      1                   0,34%
Russia                                     RU                      1                   0,34%
United States of America-Australia         US-AU                   1                   0,34%
Total                                                              296                 100,00%
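
The percentage column of Appendix 3 follows directly from the counts: each share is the number of simple patent families for a country code divided by the total of 296. The short Python sketch below is an illustration only and is not part of the original analysis; the dictionary keys are the country codes from the table (with IL assumed for the Israel row), and the function name `shares` is a hypothetical helper introduced here.

```python
# Counts copied from Appendix 3: assignee country code -> number of simple patent families.
# "IL" is assumed here as the country code for the Israel row.
counts = {
    "US": 131, "JP": 59, "DE": 29, "CH": 18, "FR": 13, "KR": 9, "IE": 8,
    "GB": 4, "SE": 4, "AU": 3, "FI": 3, "IN": 2, "ES": 2, "US-GB": 2,
    "US-JP": 2, "AT": 1, "DE-US": 1, "IE-US": 1, "IL": 1, "NO": 1,
    "RU": 1, "US-AU": 1,
}


def shares(counts):
    """Return each code's share of the total in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {code: round(100 * n / total, 2) for code, n in counts.items()}


total = sum(counts.values())
pct = shares(counts)

# Print the rows using a decimal comma, as in the appendix.
for code, n in counts.items():
    print(f"{code:<6} {n:>4} {pct[code]:.2f}%".replace(".", ","))
print(f"{'Total':<6} {total:>4}")
```

Rounding to two decimals is a choice made for this sketch; because each share is rounded independently, the rounded values need not sum to exactly 100.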
References
1. Abbas, A., Zhang, L., Khan, S.U.: A literature review on the state-of-the-art in patent
analysis. World Patent Information. 37, 3-13 (2014)
2. Brügmann, S. et al.: Towards content-oriented patent document processing: Intelligent
patent analysis and summarization. World Patent Information. 40, 30-42 (2015)
3. Cecere, G., Corrocher, N., Gossart, C., Ozman, M.: Technological pervasiveness and
variety of innovators in Green ICT: A patent-based analysis. Research Policy, 43(10),
1827-1839 (2014)
4. Choi, J., Hwang, Y.-S.: Patent keyword network analysis for improving technology
development efficiency. Technological Forecasting and Social Change. 83, 170-182
(2014)
5. Cormode, G., Srivastava, D.: Anonymized Data: Generation, models, usage. In: 26th IEEE
International Conference on Data Engineering, pp. 1211-1212. IEEE, Long Beach (2010)
6. De Spindler, A., Leone, S., Nebeling, M., Geel, M., Norrie, M.C.: Using
Synchronised Tag Clouds for Browsing Data Collections. In: Mouratidis, H., Rolland, C.
(eds.) Advanced Information Systems Engineering. LNCS, vol. 6741, pp. 214-228.
Springer, Heidelberg (2011)
7. Grant, E., Van den Hof, M., Gold, E. R.: Patent landscape analysis: A methodology in
need of harmonized standards of disclosure. World Patent Information. 39, 3-10 (2014)
8. Han, E.J., Sohn, S.Y.: Technological convergence in standards for information and
communication technologies. Technological Forecasting and Social Change. 106, 1-10
(2016)
9. Kim, J., Lee, S.: Patent databases for innovation studies: A comparative analysis of
USPTO, EPO, JPO and KIPO. Technological Forecasting and Social Change. 92, 332-345
(2015)
10. Noh, H., Jo, Y., Lee, S.: Keyword selection and processing strategy for applying text
mining to patent analysis. Expert Systems with Applications. 42(9), 4348-4360 (2015)
11. Patent Lens, http://www.bios.net/daisy/patentlens/ip/around-the-world.html
12. Sheau-Pyng, J., Ming-Fong, L., Chin-Yuan, F.: Using Patent Analysis to Analyze the
Technological Developments of Virtualization. Procedia-Social and Behavioral Sciences.
57, 146-154 (2012)
13. Sinha, M., Pandurangi, A.: Guide to Practical Patent Searching and how to use PatSeer for
Patent Search and Analysis. Gridlogics Technologies, Pune (2015)
14. World Intellectual Property Organization. Guide to the IPC. WIPO (2015) Available at:
http://www.wipo.int/export/sites/www/classifications/ipc/en/guide/guide_ipc.pdf
[Accessed April 21, 2016].
