
ARTICLE IN PRESS

computer law & security review (2016)

Available online at www.sciencedirect.com

ScienceDirect

www.compseconline.com/publications/prodclaw.htm

Data mining and copyright: A bittersweet technology gift for copyright owners and the Malaysian public?

Pek San Tay *, Cheng Peng Sik


Faculty of Law, University of Malaya, Kuala Lumpur, Malaysia

ABSTRACT

The Age of Big Data is marked by an explosion of digital information that is captured by new digital technologies. The volume, velocity and variability of the data that is captured surpass the processing capacity of the human mind. Understanding text contents and correlations can no longer be performed without the intervention of technology. Data mining refers to the process of using software techniques to analyze large amounts of data sets in order to discover useful information. The information is usually in the form of patterns or relationships that might otherwise not be observed if scanned by the human eye and is therefore useful for many industries in predicting future trends. However, data mining involves the reproduction of databases and, accordingly, raises copyright issues. The objective of this paper is two-fold. First, it identifies the copyright issues arising from data mining and analyses its legality under Malaysian copyright law. Secondly, it discusses how copyright law can play a role to regulate data mining so as to protect database owners without hindering the public interest to enjoy the benefits of data mining. It concludes by recommending how the Malaysian Copyright Act 1987 may be amended in order to strike a balance between the competing interests of database owners and that of the public so that the benefits of data mining may be optimised.

Keywords: Big data; Databases; Data mining; Data analytics; Copyright infringement; Fair dealing; Transient copy; Balance of interests in copyright

© 2016 TAY Pek San & SIK Cheng Peng. Published by Elsevier Ltd. All rights reserved.

1. Introduction

In today's world of big data, an ever-increasing volume of data is captured, aggregated and processed by new digital technologies such as the internet, smartphones, tablet computers, sensors and social media. There is a massive increase in the size of data sets that are generated, the speed of generation of the data is rapid and the types of data are extremely diverse.1 In relation to such data sets, the standard database software is not capable of capturing, storing, structuring and analyzing the data in order to yield insightful trends or statistics that are beneficial to most organisations. Technological advances have enabled analytics tools to be developed which can quickly sift through vast amounts of data in a database in order to extract useful information, analyse them and develop previously unknown patterns or statistics which are potentially useful for an organisation to make informed decisions. Data

* Corresponding author. Faculty of Law, University of Malaya, 50603 Kuala Lumpur, Malaysia.
E-mail address: tayps@um.edu.my (P.S. Tay).
1 Fishleigh, J., A Non-Technical Journey into the World of Big Data: An Introduction (2014) 14 Legal Information Management, Issue 2 at 149–151, available at http://www.paynehicksbeach.co.uk/docs/publications/2014legalinformationmanagementvol14no2v01szb.pdf (last accessed 16/2/2016). In this paper, the word “data” is intended to include texts, images, sounds, videos, photos and any other types of content that can be analysed electronically.
http://dx.doi.org/10.1016/j.clsr.2016.07.008
0267-3649/© 2016 TAY Pek San & SIK Cheng Peng. Published by Elsevier Ltd. All rights reserved.

Please cite this article in press as: Pek San Tay, Cheng Peng Sik, Data mining and copyright: A bittersweet technology gift for copyright owners and the Malaysian public?,
Computer Law & Security Review: The International Journal of Technology Law and Practice (2016), doi: 10.1016/j.clsr.2016.07.008

mining is the application of specific algorithms to search, index and analyse electronic databases so as to yield insightful patterns that would not have been possible if conducted manually. The ability of data mining technology to extract new knowledge from databases is widely acknowledged. For instance, in the meteorological industry, the analysis of multi-source weather data can generate trends that can be used to predict future weather.2 In the healthcare industry, patient data generated by hospital charts, sensors, monitors and doctors' records can be analysed to garner insights on improving treatment and hospital care.3 In the higher education sector, data mining of databases containing scholarly literature offers great research trends for academicians.4 In the commercial world, data about customer habits, behavior, preferences and spending patterns can be analysed to garner previously unknown trends in order to provide better products and services.5

Big data analytics is very much at the infancy stage in Malaysia but it is a growing area which has recently been given emphasis by the Government. In January 2015, the Government launched the National Big Data Analytics Innovation Network to accelerate the adoption of big data analytics in the country. Pursuant to that, a network of Big Data Innovation Centres of Excellence was established to work on dengue prediction and prevention, fighting organised crimes including drug trafficking and pinpointing tax fraud through predictive analysis.6 These projects inevitably utilise data from a variety of sources including databases from the private sector, open data7 and purchased data.

In light of the growing interest in the Malaysian public and private sectors about the benefits of big data analytics, the objective of this paper is to identify the copyright issues that arise from data mining in Malaysia and analyse whether data mining is legal or otherwise under Malaysian copyright law. In a sense, data mining is a double-edged sword. On the one hand, data mining is a boon to many industries. For instance, as noted earlier, researchers and industry players can obtain new insights in order to forge forward into new frontiers. On the other hand, it can also be a bane. For instance, competitors who gain access to the database of a business owner can mine the data and potentially obtain a competitive advantage over that business owner. Accordingly, another objective of this paper is to consider how copyright law can play a role to regulate data mining so as to protect the private interest of copyright owners in their databases without hindering the public interest to enjoy the benefits of data mining.

The paper is organised as follows. The second section of this paper provides an overview of the data mining process in order to identify the steps within the process that raise copyright issues. The third section discusses the protection which Malaysian copyright law confers on databases and analyses the nature and scope of the exclusive right conferred on database owners. The fourth section examines whether data mining interferes with the reproduction right of the database owner. The fifth section considers whether the exceptions to copyright provided by the Copyright Act 1987, namely, the fair dealing and transient reproduction exceptions, provide a defence to data mining. In light of the benefits and real threats to copyright owners posed by data mining, the final section concludes with some thoughts on how the Malaysian Copyright Act 1987 could regulate data mining while, at the same time, further the primary purpose of copyright law which is to maintain a delicate balance between the private needs of database owners and the public interest to benefit from data mining activities. As the thrust of this paper is to examine the intersection between data mining and copyright law in Malaysia, the ensuing discussion is confined to databases which enjoy copyright protection. Databases which are not protected by copyright, either because of the lack of originality or the inability to meet the intellectual creation requirement, are outside the purview of this paper.

2. Data mining operations
There are different techniques of data mining which technology offers. In his study on the legal framework of data mining, Jean-Paul Triaille8 provides a general description of the different steps which are involved in most data mining operations. The steps are essentially as follows although it is recognised that the methodology of data mining together with the details and specifics will differ on a case to case basis depending on factors such as the software that is used, the expertise of the analyst undertaking the task, the complexity of the data under consideration and the purpose of conducting the data mining exercise. Once the objective of the data mining is established, the databases to be mined are identified and the initial data corpus is determined. This is a pre-processing step which involves data cleaning to remove inconsistent data. Data for the mining process is usually obtained from more than one database. In such a case, the process of data integration takes

2 Olaiya, F. and Adeyemo, A.B., Application of Data Mining Techniques in Weather Prediction and Climate Change Studies (2012) 4 International Journal of Information Engineering and Electronic Business, No 1 at 51–59, available at http://www.mecs-press.org/ijieeb/ijieeb-v4-n1/v4n1-7.html (last accessed 16/2/2016).
3 Jensen, P.B., Jensen L.J. and Brunak, S., Mining Electronic Health Records: Towards Better Research Applications and Clinical Care (2012), 13 Nature Reviews Genetics at 395–405, Odgers, D.J. and Dumontier, M., Mining Electronic Health Records Using Linked Data (2015), Proceedings AMIA Joint Summits on Translational Science, available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4525267/ (last accessed 16/2/2016).
4 Guadamuz, A. and Cabell, D., Data Mining in UK Higher Education Institutions: Law and Policy (2014) 4 Queen Mary Journal of Intellectual Property 3.
5 See generally, Linoff, G.S. and Berry, M.J.A., Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd ed (Wiley Publishing, 2011).
6 See MDeC Launches National Big Data Analytics (BDA) Innovation Network, available at http://www.mimos.my/paper/mdec-launches-national-big-data-analytics-bda-innovation-network/ (last accessed 16/2/2016).
7 Open data are data that any person can freely use, re-use and redistribute subject to requirements of attribution and share alike.
8 See Triaille, Jean-Paul, d'Argenteuil, Jérôme de Meeûs and de Francquen, Amélie, Study on the Legal Framework on Text and Data Mining (TDM) (European Union, 2014), available at http://ec.europa.eu/internal_market/copyright/docs/studies/1403_study2_en.pdf (last accessed 16/2/2016), Han, J., Kamber, M. and Pei, J., Data Mining: Concepts and Techniques, 3rd ed (Elsevier, 2012) at 67.


place where multiple data sources are combined. The resulting data is then stored in a data warehouse. Analytics tools such as spiders, crawlers or bots are then applied to search through the databases and select data relevant to the analysis. The relevant data are then extracted from the database, which essentially is a reproduction of the relevant data. This step may involve the whole or a substantial part of the database. The purpose of the extraction is usually to convert the data into a single format that is suitable for the software's operational needs.9 The extracted data is stored in a repository in a computer. The process of analysis then takes place in order to extract data patterns. At this juncture, some key data may be separated from the rest of the content as the software systematically verifies the existence of new patterns or structure. The data may be indexed, linked, compared and contrasted, aggregated, merged or disassembled. The output of the analysis is a report which yields new knowledge such as previously unknown patterns, statistics, number of occurrences and relationships.

There may be exceptional situations where the analytics tool only crawls through the database and does not extract the data. Instead, the analytics tool directly accesses and processes the database part by part without making any copies to download. The analytics tool merely analyses one small section of the database content at a time before moving on to the next section. Such operations do not pose copyright issues because no reproduction of the database or any part of it is involved. This usually applies to simple data mining operations where a one-time access is sufficient.10

The above account is a general description of the steps that are involved in a data mining operation in order to provide the contextual background for the ensuing discussion as to what exclusive rights of copyright owners are infringed in the course of a data mining operation and when the rights are infringed. It is not intended to be a detailed description of the technical processes that take place during data mining operations.

3. Copyright protection of databases in Malaysia

As noted in the previous section, one of the steps in the process of a data mining operation involves data selection where data which are relevant to the analysis are extracted from the databases and converted into a format that is suitable for mining. Essentially, a part of or the entire database is reproduced in the process. In light of such reproduction of the database, this section and the next examine how copyright law protects databases in Malaysia especially with regard to the issue of the reproduction of data in the course of data mining.

Databases are protected in Malaysia through the Copyright Act 1987. Unlike the position in Europe where sui generis protection exists for databases under the European Union Directive on Database 96/9/EC,11 Malaysia does not have a regime for the protection of databases.12 Indeed, the term “database” does not appear at all in the Copyright Act 1987. However, the word “data” appears in section 8 which regards compilations of mere data as derivative works. “Database” is defined in the Oxford English Dictionary as a structured set of data held in computer storage and accessed or manipulated by means of specialized software. The requirement that there be some structure to the set of data excludes, as a matter of course, those collections of data that are disorderly or disorganised even though they may contain valuable bits of information that are of interest to data miners. Such collections of data are not databases as they do not meet the definition of the term. Accordingly, they do not enjoy copyright protection in Malaysia and the mining of data from such collections does not raise copyright issues.

Of greater interest to the discussion in this paper are those databases which are created with some structure in mind. The extent of investment, whether in the form of skill, labour, finance, time or other entrepreneurial effort, which is expended in creating such databases varies from insignificant to substantial. This aspect is important because only databases which meet the originality test under copyright law enjoy copyright protection while those that do not meet the test are outside the purview of copyright. The originality test as spelt out in section 7(3) of the Copyright Act 1987 requires sufficient effort to be expended to make the database original in character.

Under the Copyright Act 1987, two provisions confer copyright protection on databases. The first protects compilations of data under the broad category of literary work spelt out in section 7(1)(a). “Literary work” is defined in section 3 as encompassing a wide range of works, one of which is compilations, whether expressed in words, figures or symbols and whether in a visible form. The second provision is found in section 8(1)(b) which protects databases as derivative works.13 Pursuant to section 8(1)(b), copyright is conferred upon collections of works eligible for copyright or compilations of mere data whether in machine readable or other form, which constitute intellectual creation by reason of the selection and arrangement of their contents.

The presence of two separate provisions within the Act that can protect databases raises an anomaly because of the two different threshold levels for copyright protection spelt out in the sections. For compilations under section 7(1)(a), the test of originality is embodied in section 7(3)(a). The sub-section provides that a literary work shall not be eligible for copyright unless sufficient effort has been expended to make the work original in character. The amount of effort is a question of degree in each case but simply making a list or putting several

9 Triaille et al., ibid at 28.
10 Ibid at 47.
11 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases.
12 Compare, for instance, the position in the UK where databases are protected under two separate legal regimes, namely, copyright and database right. Database right is granted to all databases but only databases that satisfy the criteria for copyright will enjoy copyright protection. In the US, databases are protected by copyright law as compilations under the Copyright Act 1976. There is no specific database right in the US.
13 A derivative work is a work based upon or derived from one or more pre-existing works. Examples include the translation of a work, the adaptation of an existing work such as the making of a movie based on a novel or vice versa, and the arrangement of an existing piece of music.


lists together does not amount to sufficient effort.14 In the landmark English case of University of London Press Ltd v University Tutorial Press Ltd,15 Peterson J made it clear that originality in copyright law was not concerned with the originality of ideas. Neither should the notion of originality be equated to novelty or innovativeness because copyright law is not concerned with the quality of a work.16 According to Peterson J, originality meant that the work must have originated from the author and that sufficient skill and labour must have been expended in making the work. However, for databases under section 8(1)(b), which are described as collections of works or collections of mere data, it is a requirement that the selection and arrangement of the constituent items constitute intellectual creation. As in the case of compilations under section 7(1)(a), it is irrelevant whether the constituent items of a database under section 8(1)(b) enjoy copyright protection or otherwise. The phrase “intellectual creation” has not been interpreted by Malaysian courts as yet. Notwithstanding this, it is likely that Malaysian judges would take the cue from their European counterparts where the standard for originality under the European Union Copyright Directive 2001/29/EC is intellectual creation. In other words, the author must have exerted some intellectual judgment in the selection of the material or the method of their arrangement instead of the mere exercise of skill and labour, which is a less stringent requirement.17 In light of the higher standard that is required by section 8(1)(b) when compared with section 7(1)(a), it is questionable whether the former section furthers any useful purpose since it would be less exacting to meet the requirements of section 7(1)(a). This is especially so since there is no distinction in the scope of copyright protection conferred on the owner of a compilation or a database.

The above differs markedly from the position in the United Kingdom where compilations are treated differently from databases because of the existence of the sui generis database right which was put in place by the European Union Database Directive.18 The Database Directive was enacted in the United Kingdom by the Copyright and Rights in Databases Regulations 1997 (SI 1997/3032) which established a two-tier system of protection for databases.19 For databases where the selection or arrangement of the data amounts to the author's own intellectual creation, copyright protection is conferred on the database under section 3A(2) of the Copyright, Designs and Patents Act 1988. For databases where the selection or arrangement of the raw materials does not amount to the higher standard of intellectual creation but nevertheless its compilation calls for a substantial investment in the obtaining, verification or presentation of its contents, a 15-year period of protection from the date the database is made available to the public is granted to the maker of the database.20 The investment may be in the form of financial, human or technical resources and its substantiality is judged qualitatively and quantitatively.21 It is possible for a database to enjoy copyright protection if it is original22 and, at the same time, to be conferred database right if substantial investment had been expended in creating the database.23 The database right protects the raw data contained in the database and not the arrangement of the database.

In Malaysia, regardless of whether the work is a compilation or a database, what is ultimately protected by copyright is not the individual item of information but the skill and effort that has been expended in creating the compilation or in the selection and arrangement of the information.24 It is the skill and effort which has been expended that confers originality on the work to merit copyright protection. This is illustrated in the Malaysian High Court case of Penerbit Fajar Bakti Sdn Bhd v Cahaya Surya Buku dan Alat Tulis25 where the plaintiff alleged that the defendants had infringed their copyright in two books. The books were anthologies of short stories and poems. The plaintiff claimed that they had copyright in both the books as a collection of works. In defence, the defendants argued that since the plaintiff was not the author of the individual stories and poems, they could not have copyright in the collection of works. However, the court held that a distinction should be made between, on the one hand, the copyright in the individual stories or poems and, on the other hand, the copyright in the collection. According to the court, the person who compiled the books had expended skill and labour in the compilation and, accordingly, was the copyright owner of the compilation although not the copyright owner of each of the individual stories or poems.

4. Infringement of copyright in compilations and databases

The nature of the exclusive right given to the copyright owner in a compilation and a database is the same and this is provided in section 13(1) of the Copyright Act 1987. The exclusive right which is granted to the copyright owner is in essence a bundle of exclusive rights. The most relevant right in the context of data mining is the right to control the reproduction of the work in any material form. “Reproduction” is defined in section 3 of the Act as the making of one or more copies of a work in

14 Hardial Singh Hari Singh v Daim Zainuddin & Ors [1991] 2 CLJ (Rep) 701.
15 [1916] 2 Ch 601.
16 Copyright Act 1987, section 7(2).
17 Infopaq International A/S v Danske Dagblades Forening [2012] EUECJ C-302/10, Public Relations Consultant v The Newspaper Licensing Agency Limited and Others [2013] UKSC 18.
18 Above n. 11.
19 The Copyright and Rights in Databases Regulations 1997 came into force on 1 January 1998.
20 Copyright and Rights in Databases Regulations 1997, regulations 17(2) and 14.
21 British Horseracing Board Ltd (BHB) v William Hill Organisation Ltd (C-203/02) [2005] RPC 260.
22 In Football Dataco and Others v Yahoo and Others [2012] EUECJ C-604/10, the CJEU reaffirmed, at para 38, that originality for copyright purposes required the author to express his creative ability in an original manner by making free and creative choices through the selection and arrangement of the data. Skill and judgement alone, albeit significant, was not sufficient.
23 Copyright and Rights in Databases Regulations 1997, regulation 13(1).
24 Khaw, L.T., Copyright Law in Malaysia, 3rd ed (LexisNexis, 2008) at 79.
25 [1989] 1 MLJ 386.


any form or version. In turn, “copy” is also defined in the section as a reproduction of a work in a written form, in the form of a recording or film or any other material form. “Material form” is defined in the same section as including any visible or non-visible form of storage from which the work may be reproduced. The reproduction right is not limited to the making of duplicate or facsimile copies but extends also to the making of copies in a medium different from that of the original. For instance, making a copy of an electronic database that is available on a network and downloading it into the random access memory of a computer or a USB stick is an act of reproduction.

The reproduction right enables the copyright owner to control the reproduction of the whole work or a substantial part of it. As noted earlier, in order to undertake data mining, copying of the whole or a substantial part of the compilation or database is usually required.26 Where the entire work is copied by the defendant, it is fairly straightforward to establish that infringement of the compilation or database has occurred.

More problematic is the case where more than a small part of the compilation or database is reproduced by the analytics tool in the course of data mining. The difficulty lies in determining whether the part that is taken amounts to a substantial part for the purpose of proving infringement. Cases have firmly established that substantiality is a question of fact which is determined primarily by reference to the quality of that part which is copied in relation to the work as a whole although the quantity copied is also a relevant factor.27 Bearing in mind that copyright protection of databases and compilations essentially aims to protect the skill and effort in creating the databases, the question arises as to what exactly is the nature of the skill and effort where electronic databases are concerned. A general description of database creation is as follows, although obviously it is only representative because the details will differ depending on the subject, type of database and technology involved.28 The first step is for the database maker to identify the purpose of the database. Once that is done, the maker needs to carefully design the database. At this stage, he has to decide the structure of the database or its format, which includes matters such as the major subject content, the types of information that would be stored in each subject content, how the information is to be organised once it is captured, how it may be retrieved and the sort of report or overall information he wants to generate from the database. Next, he has to acquire software that is designed to carry out that purpose or engage a computer programmer to create the software programme. With that in place, he will then input any existing data that he has and search for more data continuously from sources he has identified. Unlike the manual creation of a database, once the data is input into the system, the computer software will organise the information to the assigned section. The effort at this stage is negligible in so far as the maker is concerned because the task is handled by the software. Apart from the skill and effort in searching and selecting the data for input, a significant amount of skill and effort takes place during the early stages of identifying the purpose of the database, designing it and creating software that will carry out the task of storing the information in an organised fashion. Notwithstanding that the database is only created at the point in time when the data is input into the system, the House of Lords decision in Ladbroke (Football) Ltd v William Hill (Football) Ltd,29 which has been followed in Malaysia,30 confirms that preliminary or preparatory works constitute labour and skill in assessing the originality of the compilation or database. According to the court, it is not possible to separate the different stages and focus only on the step of presenting the work in material form as the sole factor in determining sufficiency of skill and labour.

In data mining, once the data miner gains access to the database to be mined, analytics tools are applied to search through the database. The whole of the database or a large portion of it is copied and converted into a format to meet the operational needs of the software so that the data can be analysed. As discussed earlier, much skill and effort is expended in developing the structure and design of the database as well as searching and selecting the data for input into the database. By reproducing a substantial part of the structure and design of the database as well as the data itself, the data miner has appropriated for himself the skill and effort which the database maker has put into creating the database. Even if only the data is taken but not the structure or design of the database, there would still be an appropriation of the database maker's skill and effort in searching and selecting the data for input into the database. It is this aspect that is objectionable and not the reproduction of the data itself because there is no copyright in data per se since copyright law only protects the form of expression of an idea and not the idea itself, unless the data in the database is a work that is protected by copyright. Every item of data in the database is important and the copying of a large quantity of data in relation to the whole database amounts to taking a substantial part of the skill and effort of the owner.31

It is conceded that one of the factors commonly used in assessing substantiality, which is whether the part of the database taken may prejudice the sale or compete with the original work,32 may weigh in favour of the data miner. This is because the data that is extracted by the analytics tool is used for analysis purposes only in order to yield an output that produces useful information. The output is an independent creation which is an analysis of the original data and is typically in the form of a report, graph, table or chart and hence does not contain any of the original data that were mined. Notwithstanding this, since the question of substantiality is a matter

26 See section 2 above.
27 Longman Malaysia Sdn Bhd v Pustaka Delta Pelajaran Sdn Bhd [1987] 2 MLJ 359, Megnaway Enterprise Sdn Bhd v Soon Lian Hock (sole proprietor of the firm Performance Audio & Car Accessories Enterprise) [2009] 3 MLJ 525.
28 See generally, Steps in designing a database, available at http://www.technologyforall.com/TechForAll/datadesigning.htm (last accessed 16/2/2016).
29 [1964] 1 WLR 273.
30 Radion Trading Sdn Bhd v Sin Besteam Equipment Sdn Bhd & Ors [2010] 9 MLJ 648.
31 Copyright Act 1987, section 7(2A), Onestop Software Solutions (M) Sdn Bhd v Masteritec Sdn Bhd & Ors [2009] 8 MLJ 528.
32 Longman Malaysia Sdn Bhd v Pustaka Delta Pelajaran Sdn Bhd [1987] 2 MLJ 359.

Please cite this article in press as: Pek San Tay, Cheng Peng Sik, Data mining and copyright: A bittersweet technology gift for copyright owners and the Malaysian public?,
Computer Law & Security Review: The International Journal of Technology Law and Practice (2016), doi: 10.1016/j.clsr.2016.07.008

of degree and involves many mixed factors, it is submitted that analytics tools that extract a large portion of a database in terms of the quantity of data and the database's structure are actually reproducing a substantial part of the database. Accordingly, it appears that data mining operations where the whole or a substantial part of the database is extracted by the analytics tool amount to copyright infringement.

5. Are there any defences available to data miners?

Once copyright infringement is made out, it is for the defendant who wishes to escape liability to argue that his activities fall within the ambit of the statutory exceptions provided by the Copyright Act 1987. Two exceptions are relevant for consideration in the context of data mining. These are, first, the fair dealing exception in section 13(2)(a) and, secondly, the making of a transient and incidental electronic copy of a work in section 13(2)(q).

Pursuant to section 13(2)(a), it is a defence to an action for copyright infringement if the act is done by way of fair dealing including for purposes of research, private study, criticism, review or the reporting of news or current events. The provision further imposes the requirement that the act is accompanied by an acknowledgement of the title of the work and its authorship unless it is in connection with the reporting of news or current events by means of a sound recording, film or broadcast. Prior to the Copyright (Amendment) Act 2012, this exception was only available for acts carried out in respect of any of the following five purposes: non-profit research, private study, criticism, review and the reporting of current events. Acts done for any other purposes could not benefit from the fair dealing exception.33 However, after the amendment in 2012, the previously narrow scope of the provision has been broadened. The insertion of the word 'including' before the prescribed purposes has the effect of widening the scope of the exception such that it is no longer confined to those five purposes.

Nevertheless, the exact boundary of this exception remains to be seen because the section appears now to be amenable to two possible interpretations. The first, which is a restrictive interpretation, takes the view that although the exception is no longer tied to the purposes of the act, the ejusdem generis rule should apply and, therefore, the purpose of the act should be of the same general nature or kind as those specified in the provision. In other words, the act should have a purpose similar in nature to non-profit research, private study, criticism, review or the reporting of news or current events. The second interpretation, which is more liberal, takes the view that any act can benefit from the fair dealing provision without the need to be of a similar nature to the specified purposes. Regardless of whether the restrictive or liberal approach is adopted by the court, there is still the requirement that the dealing must be fair. In this regard, section 13(2A) spells out four factors which a court should consider when determining whether a dealing is fair. These factors are similar to the factors for determining fair use under the US Copyright Act of 1976,34 and are as follows:

(a) The purpose and character of the dealing, including whether such dealing is of a commercial nature or is for non-profit educational purposes;
(b) The nature of the copyright work;
(c) The amount and substantiality of the portion used in relation to the copyright work as a whole; and
(d) The effect of the dealing upon the potential market for or value of the copyright work.

The four factors, which were inserted by the 2012 Amendment Act, have arguably changed the character of the fair dealing exception under Malaysian copyright law and aligned it towards the US fair use concept. The fair dealing exception hinges on the established three-step test, which first emerged at the 1967 Stockholm Conference of the Revision of the Berne Convention for the Protection of Literary and Artistic Works. The three-step test provides that exceptions and limitations to the reproduction right may be made to national copyright laws in certain special cases provided that the reproduction does not conflict with a normal exploitation of the work and does not unreasonably prejudice the legitimate interests of the copyright owner. In contrast, the current fair dealing defence in section 13(2)(a) is an open-ended, flexible provision where fair balance is assessed based on a balancing test of a number of factors.

Data mining is essentially research oriented because it is the computational analysis of data contained in databases. The research that is carried out can either be for commercial or non-commercial purposes. While it is clear that prior to the 2012 amendment, the research had to be solely for a non-profit purpose to be considered for the fair dealing exception, the current provision has removed the non-profit qualifier so that only the word 'research' is used. This seems to suggest that commercial research can also benefit from the fair dealing exception. Support for this may be found in the Canadian Supreme Court decision in Canada Law Book Inc v Law Society of Upper Canada,35 which held that the word 'research' in its fair dealing provision should be given a liberal interpretation and not be limited to non-commercial or private contexts so as to ensure that users' rights are not unduly constrained. In that case, it was held that lawyers' research for the purpose of advising clients, giving opinions, arguing cases and preparing briefs qualified as research within the fair dealing exception. On the assumption that Malaysian courts follow this line of reasoning, all data mining activities, commercially motivated or otherwise, can potentially benefit from this exception provided that they are fair having regard to the factors stated in section 13(2A).

With regard to the first factor, it is an important consideration whether the defendant's act is for a commercial purpose or a non-profit educational purpose. This suggests that data mining that is carried out for a non-profit educational purpose is more likely to qualify for the fair dealing defence than that

33 MediaCorp News Pte Ltd & Ors v MediaBanc (JB) Sdn Bhd & Ors [2010] 6 MLJ 657.
34 17 US Code 107.
35 [2004] SCC 13.


which is done to achieve commercial ends. Be this as it may, the US Supreme Court in Campbell v Acuff-Rose Music, Inc36 stated that the focus of this factor is on whether the new work merely supersedes the objects of the original creation, or whether and to what extent it is transformative, altering the original with new expression, meaning, or message. The court stressed that '[t]he more transformative the new work, the less will be the significance of other factors, like commercialism, that may weigh against a finding of fair use'.37 In data mining, the output is undeniably highly transformative in nature since the results yield statistics, patterns, relations, trends and the like which do not at all resemble the original data or content in the databases. On the assumption that Malaysian courts recognise that transformative uses may qualify as fair use, the first factor is likely to favour data mining especially if it is carried out for non-profit educational research. Even if the data mining activity has a commercial purpose, the highly transformative nature of the output may overshadow its commercial aim. This would, therefore, mean that the first factor may weigh in favour of data mining activities irrespective of whether they are for a commercial or non-commercial purpose.

The second factor deals with the nature of the copyright work in question. In general, copyright law tolerates a use made of factual works more than that made of creative works.38 The data in databases are often factual in nature and since the scope of fair use is wider with respect to factual than non-factual works, it is likely that this factor will also favour data mining activities.

The third factor is the amount and substantiality of the portion used. This is likely to weigh against data miners because the analytics tools will usually copy large parts of the database for analysis purposes so as to ensure accuracy of the output results.

The final factor, which is the effect of the dealing upon the potential market for or value of the copyright work, is likely to favour data mining because it is difficult to conceive that the trends, patterns, statistics or relations revealed in the output would in any way compete with the market for or value of the database. The output is unlikely to be a substitute for the copyright work and therefore will not undermine the market for the work.

To summarise, the application of the four factors to data mining, on the whole, appears to suggest that data mining activities, whether for commercial purposes or otherwise, are likely to qualify as fair dealing and, accordingly, do not amount to copyright infringement. There is the additional requirement under section 13(2)(a) that the use of the copyright work must be accompanied by an acknowledgement of the title of the work and its authorship. This may be an arduous task especially if there are multiple databases which are analysed in the mining process.

Another exception that appears, at first blush, to be relevant in the context of data mining is the transient and incidental electronic copy provision under section 13(2)(q). Pursuant to the provision, it is not an infringement of copyright to make a transient and incidental electronic copy of a work that is made available on a network if the making of the copy is required for the viewing, listening or utilisation of the work. This transient copying defence is a new exception which was inserted in the Copyright Act 1987 through the Copyright (Amendment) Act 2012. According to the Explanatory Statement in the Copyright (Amendment) Bill 2010,39 the purpose of this exception is to accommodate fair usage on the internet. This provision therefore legalises activities such as browsing or caching. Essentially, it allows cached copies to be made in the random access memory of a computer when an internet user browses a website. The work contemplated by this provision is one that must be made available on a network. This would include online databases which are freely accessible as well as online databases which are only accessible through a subscription agreement restricting access through means such as a password. The exception does not apply to off-line databases in analog or electronic form.

The Act does not define the terms 'transient' and 'incidental'. In the absence of a definition of those terms in the Act and in any local case law, guidance may be obtained from judicial interpretations in foreign jurisdictions. As to what a transient act is, the Court of Justice of the European Union in the case of Infopaq International A/S v Danske Dagblades Forening40 (hereafter Infopaq 1) held that an act is transient only if its duration is limited to what is necessary for the proper completion of the technological process of which it forms an integral and essential part, being understood that that process must be automated so that it deletes that act automatically, without human intervention, once its function of enabling the completion of such a process has come to an end.41 Thus, to qualify as a transient copy in the European Union, the duration in which the copy resides in the system is assessed by reference to what is necessary to complete the technological process.42 Clearly, whether a copy which exists for a specified duration is transient or otherwise would depend on each technological process involved.

In data mining, it is typically the case that the extracted data that are deposited into repositories for mining operations are not temporary in nature or automatically deleted after a certain time. This is in contrast to copies in the computer's random access memory which are automatically deleted and replaced by other content when the internet user moves away from the website viewed or when the computer is turned off. On the assumption that the CJEU's decision in Infopaq 1 is followed by Malaysian courts, a copy qualifies as transient only if it is one where its removal takes place by an automated process without human intervention. Since copies stored in repositories after extraction are not removed automatically but their removal is dependent on human intervention,43 it does not appear that this provision is capable of providing any exception to data mining activities.
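By way of illustration only, the technical distinction drawn in Infopaq 1 between a copy that is deleted automatically once the process it serves has ended and a copy that remains until someone removes it may be sketched in Python. The class, method and variable names below are hypothetical and are not drawn from any analytics tool, the Act or the judgment; the sketch merely models, under those assumptions, why a browser-style cached copy behaves differently from a copy deposited in a mining repository.

```python
from contextlib import contextmanager


class CopyStore:
    """Hypothetical store holding copies of works, keyed by an identifier."""

    def __init__(self):
        self.copies = {}

    @contextmanager
    def transient_copy(self, key, content):
        """Cache-like copy: exists only while the process that needs it runs."""
        self.copies[key] = content
        try:
            yield content
        finally:
            # Automated deletion, with no human step, once the process ends.
            del self.copies[key]

    def deposit(self, key, content):
        """Repository-like copy: remains until someone explicitly removes it."""
        self.copies[key] = content

    def delete(self, key):
        """Removal that depends on a deliberate, human-triggered call."""
        del self.copies[key]


def count_words(text):
    """Stand-in for the viewing or analysis step that uses the copy."""
    return len(text.split())


store = CopyStore()

# Browsing-style use: the copy disappears as soon as the viewing step is done.
with store.transient_copy("page", "some text viewed in a browser") as page:
    viewed_words = count_words(page)
cache_survives = "page" in store.copies

# Mining-style use: the extracted copy stays available for later mining runs.
store.deposit("dataset", "records extracted for later mining runs")
mined_words = count_words(store.copies["dataset"])
repository_survives = "dataset" in store.copies
```

In this sketch the cached copy is gone once the `with` block exits, whereas the deposited copy persists until `delete` is called, which mirrors the human intervention that the discussion above identifies as taking repository copies outside the notion of a transient copy.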
36 510 US 569 (1994) at 579.
37 Ibid.
38 Basic Books, Inc v Kinko's Graphics Corp 758 F Supp 1522 (SDNY, 1991) at 1533.
39 Explanatory Statement, Copyright (Amendment) Bill 2010, para 11.
40 CJEU 16 July 2009, Case C-5/08.
41 Ibid at para 64.
42 Triaille et al., above n. 8 at 43.
43 Ibid at 46.


With regard to the word 'incidental' which is used in conjunction with 'transient' in section 13(2)(q), the UK Supreme Court in Public Relations Consultants Association Limited v The Newspaper Licensing Agency Limited and Others44 opined that the presence of the word 'incidental' is superfluous because copies made in the internet cache are necessarily incidental to the technological process involved. According to the court, transient copies retained on the screen or the internet cache are merely the incidental consequence of the internet user's use of a computer for viewing or browsing.

Section 13(2)(q) also stipulates that the copy must be required to be made for the viewing, listening or utilisation of the work. While activities such as viewing, browsing, reading, consulting a webpage, accessing a webpage or listening to a work have never been within the exclusive right of the copyright owner, the same cannot always be said of the utilisation of a work. A work may be used in a lawful way or an unlawful way. For instance, if a work is used for the purposes of teaching, judicial proceedings, setting of examination questions or non-profit research, the utilisation is lawful. However, if a work on the internet is downloaded by a third party numerous times for the purpose of commercial transactions, the use of the work can hardly be said to be lawful.45 In the latter case, the transient and incidental copy exception clearly cannot be invoked to protect the third party. Accordingly, section 13(2)(q) will only apply if the utilisation of the work is lawful. It is therefore submitted that section 13(2)(q) will not be available to data miners to relieve them from copyright infringement.

6. Concluding thoughts: balancing the private and public interests in data mining

As discussed above, the fair dealing provision in section 13(2)(a) is likely to provide a defence to data mining activities regardless of whether they are for a commercial or non-commercial purpose. However, as was pointed out by the judge in MediaCorp News Pte Ltd & Ors v MediaBanc (JB) Sdn Bhd & Ors,46 inherent in copyright law is the need for a balance between the copyright owner's right to protect his economic benefit, and the need for society to progress in the development of its creativity and ideas without undue restriction. The database owner who has expended a significant amount of time, labour and finance in designing the database, selecting and arranging the relevant data for his database would clearly insist on his exclusive right to control the reproduction of his database. This is especially so in cases where the data mining activity is carried out by a competitor who conducts mining operations to further his own commercial aims, such as for marketing purposes in order to predict consumers' demands or make informed decisions on new products and thereby place himself in a more advantageous position. Prior to the 2012 amendment to the Copyright Act 1987, the database owner had the exclusive right to control data mining conducted for commercial purposes because the then fair dealing provision was tied to the particular purposes spelt out in section 13(2)(a). In other words, the research had to be for non-profit purposes in order to benefit from the fair dealing defence. However, the 2012 amendment has changed this with the consequence that the right initially conferred on the database owner to control data mining activities which are conducted for commercial purposes has been severely constrained by the current fair dealing provision in favour of data miners regardless of the commercial or non-commercial nature of the data mining. Although this may not have been the deliberate intention of Parliament, the implications brought about by the changes to the fair dealing provision on database owners are inadvertently far-reaching.

If copyright law is to strike a balance between public access to creative work and the protection of this work as the property of the copyright owner,47 then the Act must, on the one hand, provide for the optimisation of the great potential offered by data mining to the public and, on the other hand, economically reward database owners for their effort in creating the databases so as to encourage rather than hinder the making of more databases by others. It is recommended that one way of achieving this balance is to classify data mining activities into two categories, corresponding to the primary purpose for which they are carried out. The first category comprises data mining which is conducted with the primary purpose of serving the public interest in the sense that its main aim is to benefit the public without any commercial motive or agenda on the part of the data miner. In the vast majority of cases, such data mining is carried out for non-commercial research, academic or educational goals which benefit the public and further the progress of society generally. The second category deals with data mining that is driven by the commercial motives of private enterprises to achieve their own financial gains. Data mining under the first category meets one of the avowed purposes of copyright law to benefit the public and, accordingly, should be permitted by the law. Indeed, as discussed above, the fair dealing provision is likely to absolve the data miner from any legal liability as the defence would apply to non-commercial research.

In contrast, the orientation of the second category of data mining is towards securing financial gains for the data miner himself or his organisation. If Malaysian judges were to follow the reasoning in Acuff-Rose48 on transformative uses, this would likely mean that data mining carried out for commercial research will also satisfy the first factor of the fairness test. However, to exempt such an activity from copyright infringement is to subordinate the database owner's exclusive reproduction right to the data miner's personal gain. As such, it is submitted that data mining carried out for commercial purposes should not be exempted from copyright infringement. Instead, the data miner has to first seek a licence from the data owner to carry out mining activities.

The above suggestion essentially restores to the database owner the exclusive right which he had enjoyed prior to the 2012 amendment. There is no discernible reason to expropriate that right from him to pave the way for data mining

44 Above n. 17 at para 33.
45 MediaCorp News Pte Ltd & Ors v MediaBanc (Johor Bharu) Sdn Bhd & Ors, above n. 33.
46 Ibid at 716 para 195.
47 Halbert, D., Mass Culture and the Culture of the Masses: A Manifesto for User-Generated Rights (2009) 11 Vand J Ent & Tech L 921 at 953.
48 Above n. 36.


activities that are commercially motivated. Indeed, this suggestion is in line with the developments in some major countries in the world. For instance, in 2014, the UK Copyright, Designs and Patents Act 1988 was amended to, inter alia, introduce a new exception to copyright infringement for text and data mining for non-commercial research provided that access to the database is lawful.49 Pursuant to the new section 29A of the Act, the making of a copy of a work by a person who has lawful access to the work is not an infringement of copyright provided two conditions are met. First, the copy is made in order that a computational analysis of anything recorded in the work may be carried out for the sole purpose of research for a non-commercial purpose. Secondly, the copy is accompanied by a sufficient acknowledgement unless this would be impossible for reasons of practicality or otherwise. In Japan, the legislature has adopted a more liberal stance by permitting data mining generally unless the database was created with the aim of data analysis. The Japanese copyright statute provides that it is not an infringement of copyright to reproduce a database for the purpose of information analysis conducted on a computer unless the database is made specifically for the purpose of using it for data analysis.50 In Korea, the copyright provision that has an impact on data mining is more restrictive.51 The copyright legislation confers on the database producer the right to control the reproduction of all or a considerable part of the database.52 However, any person may reproduce the whole or a considerable part of the database in two specified situations, unless the reproduction conflicts with a normal exploitation of the database.53 The first situation is the use of the database for education, scholarship or research purposes which are not aimed at making a profit. The second situation is the use of the database for reporting current news.

It is interesting to note that in the US, courts have held that data mining qualifies under the US fair use doctrine and accordingly does not amount to copyright infringement. This is due to the highly transformative nature of digital analytics processes where the output in the form of new knowledge does not supplant or substitute the market for the original database. For instance, in the decision of the US Court of Appeals for the Second Circuit in The Authors Guild v Google, Inc,54 it was held that the Google Books Library Project, where Google scanned millions of books into a searchable database that enables researchers to carry out data and text mining, did not amount to copyright infringement because it qualified as transformative fair use.55 This demonstrates that different jurisdictions have adopted different responses to data mining, largely because of the merits and demerits of data mining. In Malaysia, in light of the emphasis given to balancing the two competing interests, which are the private rights of copyright owners and the broader public interest to have access to the copyright work, it is submitted that the proposed recommendation would best achieve this purpose.
49 Copyright, Designs and Patents Act 1988, section 29A, as amended by Regulation 3(2) of The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014. The Regulations came into force on 1 June 2014.
50 Copyright Act (Act No. 48 of May 6, 1970, as last amended by Act No. 63 of December 3, 2010), Article 47septies.
51 Copyright Act 1957, Chapter 4 on Protection of Database Producers.
52 Ibid, Article 93-1.
53 Ibid, Article 94(2).
54 Docket No. 13-4829. The decision was delivered on 16 October 2015.
55 An appeal by The Authors Guild was turned down by the US Supreme Court on 18 April 2016.
