Executive summary
Test Data Management (TDM) is concerned with provisioning data for non-production environments, especially for test purposes but also
for development, training, quality assurance,
demonstrations or other activities. Test data
has always been required to support application development and other environments
but, until the relatively recent advent of TDM,
this has been achieved in an ad hoc manner
rather than in any formalised or managed way.
The predominant technique has been copying
some production data or cloning entire production databases.
Copying data or cloning production databases
has a number of drawbacks. To begin with:
how do organisations manage costs? Copying
production data or cloning production databases is largely a manual process. Moreover,
it requires a whole new set of hardware and
software, along with the licences to match,
and must be duplicated for each testing
environment. As a result, you can easily end
up with multiple copies of the same database
across multiple projects in development.
Alternatively, you can have multiple development teams sharing the same test database,
but sharing often leads to contention for resources, resulting in extended
delivery schedules. Indeed, getting access to
the right data at the right time can be a major
issue regardless of whether you share test
data or have multiple test databases. It takes
time to generate (copy or clone) new datasets
because database administrators, who typically complete this task, have other priorities.
Test teams often have to wait several days, or
even longer, to get test data. In agile environments, in particular, manual copying or cloning can slow development.
Unfortunately, there are additional challenges
with copying production data or cloning production databases. If the application processes any
sensitive data, such as personally identifiable
information (PII), personal health information
(PHI), credit card information, national identifiers or any company confidential information,
then it will be necessary to protect this data
so that developers or testers cannot see it. In
most cases, simply copying production data or
cloning the production database is not legally
viable since sensitive data is likely to be exposed. It is best to apply an appropriate data
protection technique, such as data masking,
which we discuss later in this paper.
break whereas normal results do not. Therefore it is important that outliers be properly
tested. If you are using sub-setting, then you
need to ensure that outliers are captured and
represented within the sub-setting process.
This means using a solution that can capture
the full range of production data rather than
simply picking a random sub-set of the
data.
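To illustrate the difference between random sampling and outlier-aware sub-setting, the Python sketch below keeps every row whose value falls outside Tukey's fences (Q1 - 1.5 x IQR, Q3 + 1.5 x IQR) and fills the rest of the subset with a random sample of ordinary rows. The function name `subset_with_outliers`, the single numeric `key` and the IQR rule are illustrative assumptions rather than how any particular TDM product works; real sub-setting also has to follow referential relationships between tables, which this sketch ignores.

```python
import random
import statistics
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def subset_with_outliers(
    rows: Sequence[Row],
    key: Callable[[Row], float],
    sample_size: int,
    k: float = 1.5,
    seed: int = 0,
) -> list[Row]:
    """Return every outlier row plus a random sample of the ordinary rows.

    Hypothetical helper: a row counts as an outlier when key(row) lies outside
    Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]. Requires at least two rows.
    """
    values = [key(r) for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles of the key values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    outlier_idx = [i for i, v in enumerate(values) if v < low or v > high]
    ordinary_idx = [i for i, v in enumerate(values) if low <= v <= high]

    # Fixed seed so the same subset can be regenerated for repeat test runs.
    rng = random.Random(seed)
    sampled_idx = rng.sample(ordinary_idx, min(sample_size, len(ordinary_idx)))

    # Preserve the original row order in the returned subset.
    return [rows[i] for i in sorted(outlier_idx + sampled_idx)]


if __name__ == "__main__":
    rows = [{"id": i, "amount": 10.0 + i % 5} for i in range(100)]
    rows += [{"id": 100, "amount": 9999.0}, {"id": 101, "amount": -500.0}]
    subset = subset_with_outliers(rows, key=lambda r: r["amount"], sample_size=10)
    print(sorted(r["id"] for r in subset if r["id"] >= 100))  # → [100, 101]
```

A purely random sample of 10 rows from this data would usually miss both extreme amounts; the fence rule guarantees they are carried into the test subset.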
An issue that occurs with both full and partial database copies or sub-setting directly
from production is that sensitive data may
be exposed. This sensitive data needs to be
protected as mandated by data privacy laws or
compliance requirements. De-identifying test
data is not a simple process and requires an
understanding of what data may be hidden.
You first need to discover where sensitive
data resides, classify and define data types,
and determine metrics and policies to ensure
protection over time. Data can be distributed over multiple applications, databases and
platforms with little documentation, and it is
often the case that relevant information is built
into application logic so that both implicit and
explicit relationships exist. This is a danger
because organisations may rely too heavily on system and application experts who are
familiar with the explicit relationships but not the implicit ones. In
practice, finding sensitive data and discovering
data relationships requires careful analysis.
Data sources and relationships should be
clearly understood and documented so that no
sensitive data is left vulnerable. Only after
understanding the complete landscape can
you define proper enterprise data security and
privacy policies and, while this can be done
manually, it is an onerous process that is better automated.
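Automated discovery can start as simply as testing sampled column values against known patterns. The sketch below is a hypothetical, deliberately minimal classifier: the `CLASSIFIERS` table, the 80% match threshold and the example rules (email addresses, card numbers confirmed with the Luhn check, and US Social Security numbers as one example of a national identifier) are assumptions for illustration. It does not attempt the relationship analysis described above, which is where most of the real effort lies.

```python
import re
from typing import Callable, Dict, List


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) check used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card(value: str) -> bool:
    digits = re.sub(r"[ -]", "", value)  # ignore spaces and hyphens
    return digits.isdigit() and 13 <= len(digits) <= 19 and luhn_valid(digits)


# Hypothetical rule set: label -> predicate on a single cell value.
CLASSIFIERS: Dict[str, Callable[[str], bool]] = {
    "EMAIL": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", v) is not None,
    "CREDIT_CARD": looks_like_card,
    "US_SSN": lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
}


def classify_columns(table: Dict[str, List[str]], threshold: float = 0.8) -> Dict[str, str]:
    """Label each column whose non-empty sample values mostly match one classifier."""
    labels: Dict[str, str] = {}
    for column, values in table.items():
        sample = [v.strip() for v in values if v and v.strip()]
        if not sample:
            continue
        for label, matches in CLASSIFIERS.items():
            hit_rate = sum(matches(v) for v in sample) / len(sample)
            if hit_rate >= threshold:
                labels[column] = label
                break  # first matching classifier wins
    return labels
```

Checking the Luhn digit rather than just counting digits cuts down on false positives from order numbers or other long numeric identifiers.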
So how do you protect the data? Typically, TDM
solutions provide masking capabilities. How
the masking is accomplished is somewhat
dependent on why you are doing the masking.
For example, you could simply hide a credit
card number by replacing each digit with an x
(xxxx-xxxx-xxxx-xxxx). This approach is fine if
you are only concerned with data protection.
However, if you want to test a payment application, you will need to work with contextually accurate numbers, such as card numbers that keep a realistic issuer prefix and a valid check digit. Similarly, simple
shuffling techniques (for example, replacing
zip code 12345 with 54321) will not work if your
application requires a valid zip code. For test
data management you will need to mask in
such a way that the data remains valid.
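The difference between hiding data and masking it so that it remains valid can be sketched for card numbers. The hypothetical `mask_card` function below keeps the first six digits (the issuer prefix) and the original separators, replaces the remaining digits with digits derived from an HMAC-SHA256 of the original number, and recomputes the final Luhn check digit so the result still passes standard card-number validation. The key, the six-digit prefix and the function names are assumptions; this is one-way pseudonymisation, not a standardised format-preserving encryption scheme such as NIST FF1.

```python
import hashlib
import hmac


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes `partial + digit` pass the Luhn check."""
    total = 0
    # Once the check digit is appended it is position 0 from the right, so the
    # rightmost digit of `partial` (i == 0 here) is one of the doubled digits.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def mask_card(card: str, key: bytes, keep_prefix: int = 6) -> str:
    """Deterministically mask a card number, keeping layout, prefix and Luhn validity."""
    digits = "".join(c for c in card if c.isdigit())
    if not 13 <= len(digits) <= 19:
        raise ValueError("expected a 13- to 19-digit card number")
    # Keyed hash of the original digits: same input + same key -> same output.
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    body_len = len(digits) - keep_prefix - 1  # digits between prefix and check digit
    body = "".join(str(b % 10) for b in digest[:body_len])
    new_digits = digits[:keep_prefix] + body
    new_digits += luhn_check_digit(new_digits)
    # Put the new digits back into the original layout (spaces, hyphens).
    replacement = iter(new_digits)
    return "".join(next(replacement) if c.isdigit() else c for c in card)


if __name__ == "__main__":
    print(mask_card("4111 1111 1111 1111", key=b"test-only-secret"))
```

Because the replacement digits depend only on the key and the original number, the same card masks to the same value wherever it appears, which helps keep joins between tables intact; the key therefore has to be protected like any other secret. Taking each hash byte modulo 10 introduces a slight bias in the digit distribution, which is usually acceptable for test data.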
Figure 2: Test flow showing integration between Rational and InfoSphere Optim components
Conclusion
According to NIST Planning Report 2002-2003, The Economic Impacts of
Inadequate Infrastructure for Software Testing, the average testing team
spends between 30% and 50% of its time setting up test environments
rather than on actual testing, and an estimated 74% of projects suffer
significant delays or quality issues. Further, according to Capers
Jones, Software Quality 2011: A Survey of the State of the Art
(http://www.sqgne.org/presentations/2010-11/Jones-Nov-2010.pdf), poor software
quality costs more than $150 billion per year in the US and over $500 billion
worldwide.
These figures strongly suggest the need for better testing and software quality. Automated test data management represents a major step
in that direction. TDM improves the quality and accuracy of testing and
supports agile development, in which errors are caught earlier and fixed more
quickly, resulting in fewer defects for the project as a whole. Building
test data management into your development process will lower your
risk of late delivery penalties and improve customer satisfaction. By implementing intelligent sub-setting and data masking, organisations can
reduce storage and software costs. In our view, a formalised test data
management discipline provides a key strategic advantage compared to
traditional ad hoc methods.
Further Information
Further information about this subject is available from
http://www.BloorResearch.com/update/2138
2nd Floor,
145–157 St John Street
LONDON,
EC1V 4PY, United Kingdom
Tel: +44 (0)207 043 9750
Fax: +44 (0)207 043 9748
Web: www.BloorResearch.com
email: info@BloorResearch.com