
Essentials for Test Data Management

2009 IBM Corporation

Agenda
Drivers for Effective Test Data Management (TDM)
Effective Test Data Management
Test Environment Creation
Data Masking Considerations
Editing Test Data
Compare
Refreshing Test Environments
IBM Optim
Q&A


The Challenge
[Figure: a 500GB production database is cloned in full for Training, Unit Test, System Test, UAT and Integration, giving six 500GB databases. Total: 3TB]
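The arithmetic behind the figure can be sketched in a few lines of Python. The environment names and the 500GB size come from the slide; the 5% subset ratio is a purely hypothetical value chosen for illustration, not a figure from this presentation.

```python
# Sizes from the slide: production plus five full-size clones.
PRODUCTION_GB = 500
CLONED_ENVIRONMENTS = ["Training", "Unit Test", "System Test", "UAT", "Integration"]


def total_storage_gb(production_gb, environments, subset_ratio=1.0):
    """Return total storage for production plus one copy per environment.

    subset_ratio=1.0 models full clones; a smaller value models subsets.
    """
    return production_gb + len(environments) * production_gb * subset_ratio


full_clone_total = total_storage_gb(PRODUCTION_GB, CLONED_ENVIRONMENTS)

# Hypothetical assumption: each subset is 5% of production (not from the slide).
subset_total = total_storage_gb(PRODUCTION_GB, CLONED_ENVIRONMENTS, subset_ratio=0.05)

print(f"Full clones: {full_clone_total:.0f} GB")
print(f"Subsets:     {subset_total:.0f} GB")
```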


Test Data Management (TDM): What & Why?


What?
TDM refers to the need to manage data used in testing and other nonproduction environments
  Extract related subsets of production data that are targeted to functionality under test
  Edit data to create error and boundary conditions
  De-identify (mask) test data to protect privacy
  Compare before and after images of test data

Why?
Your business can deploy new/improved enterprise applications faster without sacrificing quality: increase revenue generation
Your business can benefit from using IT resources more effectively: reduce costs
Your company can implement a reliable database upgrade: ensure positive customer experience


Data Privacy Considerations


Organizations need the ability to de-identify, mask and transform sensitive data
Companies can apply a range of transformation techniques to substitute customer data with contextually accurate but fictionalized data to produce accurate test results
By masking personally identifying information, you protect the privacy and security of confidential customer data, and support compliance with local, state, national, international and industry-based privacy regulations


What If I Don't Do Anything?


Infrastructure costs: higher storage costs
  Cloning databases requires more storage hardware
  Larger databases could mean more license costs

Higher staff costs
  Greater data volumes take longer to clone
  Greater data volumes equate to longer test cycles

Defects can be expensive
  Costs to resolve defects in production can be 10 to 100 times greater than those caught in the development environment

Privacy breaches


The Symptoms of Poor Testing Strategies


Management notices that new application functionality is delayed three months
The business is unable to compete for customers because their software lacks state-of-the-art functionality
The CFO is complaining about how high the IT budget has become to fix application defects
Developers are sitting around waiting for their copy of the database to work with

6/19/2009


TDM: Benefits to Key Stakeholders

CIO
  Speed time-to-market without sacrificing quality.
  Ensure consistent testing methodologies and reduce costs.
  Minimize threat of data breach.

VP, Line of Business
  Ensure a reliable, positive customer experience.
  Sustain or react to competitive situations quickly.
  Provide customers with sense of security.

Director of IT
  Populate realistic test data to improve testing and quality.
  Streamline testing processes for optimal environment.
  Consistent methodology for privatization of data.


Effective Test Data Management


Test Data Management Building Blocks


[Figure: building blocks of the test data management cycle, leading from TEST to "Go Production!". Steps in the cycle are numbered 1 to 5. Blocks shown:]
  Create/Modify Application
  Create Test Environment
  Privatization of Personal Information
  Inspect and Edit Data to Test Error Routines
  Correct Errors in Production Data
  Compare Before/After Data
  Refresh Test Data
  Archive Old Data



Environment Creation: Some Current Practices


[Figure: two flow diagrams of current practices]

#1 - Clone Production
  Request for copy, then wait
  Production database copy, changes, "after" production database copy
  Manual examination: Right data? What changed? Correct results? Unintended result? Someone else modify?
  Share test database with everyone else

#2 - Write SQL
  Write SQL to extract
  Complex, subject to change
  Extract: RI accuracy? Right data?
  Expensive, dedicated staff, ongoing responsibility


1 What is Subsetting?

Create targeted, right-sized test environments instead of cloning entire production environments
Development environments are then more manageable, speeding the testing process!

[Figure: subsets of Production or a Production Clone populate the Development, Test, QA and Training environments]


1 Testing Best Practices: Oracle


Tip #27: Test with a Representative Subset of Production Data
When performing the development upgrade, it is important to leverage a representative subset of production data instead of an exact copy; this is because the development environment usually has less capacity in both memory and hard drive space than the test and production environments. Limiting the size of the conversion files during the development upgrade will better ensure that the processes will complete in a timely manner.


1 Test Environment Creation Using Subsetting


[Figure: a baseline subset of the production environment is held in an extract/archive file and loaded into Test (DB2 LUW/AIX), Dev (Oracle/Solaris) and QA (Sybase/Linux)]

Dynamically load relationally intact data sets and objects based on selection criteria


Creating the Baseline Subset

Production Environment

Baseline Subset

2 Common Approaches:
  Clone production and truncate transactions
  Extract and seed common set up data


Extracting a Subset Using Templates


Criteria can be based on one or more modules
All Date Values
Create Date
Transaction Date
Effective Date
Organizations
Status
Order number(s)
And/Or combinations
More.


Ensure Referential Integrity in Subset


Complete Business Object
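A minimal sketch of extracting a referentially intact subset, using Python's built-in sqlite3 module. The three-table schema (customers, orders, order_items), the sample rows and the function names are hypothetical and exist only to illustrate the idea; this is not how Optim implements it. Selection criteria (status and create date) pick the driving rows, and the parent and child rows are pulled along so each "complete business object" arrives whole.

```python
import sqlite3

# Hypothetical schema: customers <- orders <- order_items.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     status TEXT, create_date TEXT);
CREATE TABLE order_items (id INTEGER PRIMARY KEY,
                          order_id INTEGER REFERENCES orders(id),
                          sku TEXT);
"""


def build_source():
    """Create a small in-memory 'production' database with sample rows."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "A"), (2, "B"), (3, "C")])
    src.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                    [(10, 1, "OPEN", "2009-01-15"),
                     (11, 2, "CLOSED", "2008-06-01"),
                     (12, 3, "OPEN", "2009-03-02")])
    src.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                    [(100, 10, "W1"), (101, 10, "W2"),
                     (102, 11, "W1"), (103, 12, "W3")])
    src.commit()
    return src


def extract_subset(src, status, created_on_or_after):
    """Copy the selected orders plus their parent and child rows."""
    dst = sqlite3.connect(":memory:")
    dst.executescript(SCHEMA)
    orders = src.execute(
        "SELECT * FROM orders WHERE status = ? AND create_date >= ?",
        (status, created_on_or_after)).fetchall()
    order_ids = [row[0] for row in orders]
    customer_ids = sorted({row[1] for row in orders})

    def fetch_in(table, column, ids):
        # Table and column names are fixed literals here, never user input.
        if not ids:
            return []
        marks = ",".join("?" * len(ids))
        return src.execute(
            f"SELECT * FROM {table} WHERE {column} IN ({marks})", ids).fetchall()

    # Parents first, then the driving table, then children.
    dst.executemany("INSERT INTO customers VALUES (?, ?)",
                    fetch_in("customers", "id", customer_ids))
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)
    dst.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                    fetch_in("order_items", "order_id", order_ids))
    dst.commit()
    return dst


subset = extract_subset(build_source(), status="OPEN",
                        created_on_or_after="2009-01-01")
orphans = subset.execute(
    "SELECT COUNT(*) FROM order_items "
    "WHERE order_id NOT IN (SELECT id FROM orders)").fetchone()[0]
print("orphaned order_items:", orphans)
```

Walking the relationships outward from the driving table is what keeps the subset small without leaving orphaned child rows behind.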


2 Data Masking
Also known as: data de-identification, depersonalization, desensitization, obfuscation, data scrubbing
Technology that helps conceal real data
Scrambles data to create new, legible data
Retains the data's properties, such as its width, type and format
Common data masking algorithms include random, substring, concatenation, date aging
Used in non-production environments as a Best Practice to protect sensitive data
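The four algorithm families named above can be sketched as small Python functions. The function names, the "CUST" prefix and the sample values are hypothetical; a real masking tool would add consistency and uniqueness guarantees that this sketch leaves out. Each function keeps the width and format of its input, as the slide recommends.

```python
import random
from datetime import date, timedelta


def mask_digits_random(value, rng):
    """Random: replace each digit with a random digit; keep separators and width."""
    return "".join(str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in value)


def mask_substring(value, keep_last=4, fill="X"):
    """Substring: keep the last `keep_last` characters, overwrite the rest."""
    hidden = len(value) - keep_last
    return "".join(fill if i < hidden and ch.isalnum() else ch
                   for i, ch in enumerate(value))


def mask_concatenate(prefix, sequence_number, width=6):
    """Concatenation: build a value from a literal and a zero-padded number."""
    return f"{prefix}{sequence_number:0{width}d}"


def age_date(value, days):
    """Date aging: shift a date by a fixed number of days."""
    return value + timedelta(days=days)


rng = random.Random(2009)  # fixed seed so repeated runs give the same result
masked_ssn = mask_digits_random("123-45-6789", rng)
print(masked_ssn)                            # same NNN-NN-NNNN shape, new digits
print(mask_substring("123-45-6789"))         # XXX-XX-6789
print(mask_concatenate("CUST", 42))          # CUST000042
print(age_date(date(2009, 6, 19), days=30))  # 2009-07-19
```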

Data Privacy General Principles


Do What Is Needed, But Not More
  Balance Costs vs. Data Breach Risks.
Identify Company Best Practices
  Designate an internal champion
Meet Regulatory/Legal Needs
  Government Regulations/Internal Privacy Policies
Understand Application and Business Requirements
  Developers should be debugging the test application, not the test data.
  Data should be masked appropriately and consistently in the application
Volume of Data
  Independent Test Environments
  Use smaller test beds of data for frequent refreshes

2 Data Masking Techniques


String literal values
Character substrings
Random/sequential numbers
Arithmetic expressions
Concatenated expressions
Date aging
Lookup values
Intelligence

Example 1: Client Information
[Two versions of the same client record appear on the slide]

Client No.   123456            112233
SSN          333-22-4444       123-45-6789
Name         Erica Schafer     Amanda Winters
Address      12 Murray Court   40 Bayberry Drive
City         Austin            Elgin
State        TX                IL
Zip          78704             60123

Data is masked with contextually correct data to preserve integrity of test data

Example 2: Personal Info Table
[Each cell shows the two values that appear for it on the slide]

PersNbr          FirstName         LastName
10000 / 08054    Jeanne / Alice    Renoir / Bennett
10001 / 19101    Claude / Carl     Monet / Davis
10002 / 27645    Pablo / Elliot    Picasso / Flynn

Event Table

PersNbr          FstNEvtOwn        LstNEvtOwn
10002 / 27645    Pablo / Elliot    Picasso / Flynn
10002 / 27645    Pablo / Elliot    Picasso / Flynn

Referential integrity is maintained with key propagation
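Key propagation can be sketched as a single mapping that is built once and then applied to every table that carries the key. The column names and values come from Example 2. The slide text does not say which value in each pair is the original, so this sketch assumes, for illustration only, that 08054, 19101 and 27645 are the originals. The function names are hypothetical.

```python
# Assumed originals (the direction of the masking is an assumption).
personal_info = [
    {"PersNbr": "08054", "FirstName": "Alice", "LastName": "Bennett"},
    {"PersNbr": "19101", "FirstName": "Carl", "LastName": "Davis"},
    {"PersNbr": "27645", "FirstName": "Elliot", "LastName": "Flynn"},
]
events = [
    {"PersNbr": "27645", "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
    {"PersNbr": "27645", "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
]

# Lookup values used as replacements.
LOOKUP_NAMES = [("Jeanne", "Renoir"), ("Claude", "Monet"), ("Pablo", "Picasso")]


def build_mask_map(rows, start=10000):
    """Map each original PersNbr to (sequential key, first name, last name)."""
    mapping = {}
    for i, row in enumerate(rows):
        first, last = LOOKUP_NAMES[i % len(LOOKUP_NAMES)]
        mapping[row["PersNbr"]] = (f"{start + i:05d}", first, last)
    return mapping


def mask_rows(rows, mapping, key_col, first_col, last_col):
    """Apply the same mapping to any table that carries the key."""
    masked = []
    for row in rows:
        new_key, first, last = mapping[row[key_col]]
        masked.append({**row, key_col: new_key, first_col: first, last_col: last})
    return masked


mapping = build_mask_map(personal_info)
masked_people = mask_rows(personal_info, mapping, "PersNbr", "FirstName", "LastName")
masked_events = mask_rows(events, mapping, "PersNbr", "FstNEvtOwn", "LstNEvtOwn")

for row in masked_people + masked_events:
    print(row)
```

Because both tables are masked through the same mapping, every masked Event row still points at an existing masked Personal Info row.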

Browse and Edit Test Data


You must be sure that all logic paths are tested
BUT
Your production data may not contain all the needed test cases
  Errors
  Boundary conditions
  Unusual combinations of data


3 Editing Test Data


Browse or edit referentially intact sets of data, from multiple related tables, simultaneously on one screen
Create data values to test program logic
Inspect and correct data that is causing problems
Verify execution results
Dynamically join related tables and views, synchronously scroll related data, and edit the data displayed.
Boundary conditions
Error conditions
Rare combinations of data.
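Where production data lacks boundary and error cases, they can be derived from a known-good row. The sketch below is one possible approach; the field names, limits and function name are hypothetical. For each constrained field it produces a copy of the row at each limit and just beyond it.

```python
import copy

# Hypothetical rules: field name -> (minimum allowed, maximum allowed).
FIELD_LIMITS = {"quantity": (1, 999), "discount_pct": (0, 100)}


def boundary_variants(valid_row, limits):
    """Yield (label, row) pairs at and just beyond each field's limits."""
    for field, (low, high) in limits.items():
        for label, value in (("min", low), ("max", high),
                             ("below_min", low - 1), ("above_max", high + 1)):
            row = copy.deepcopy(valid_row)  # never modify the source row
            row[field] = value
            yield f"{field}:{label}", row


base = {"order_id": 1, "quantity": 10, "discount_pct": 5}
cases = dict(boundary_variants(base, FIELD_LIMITS))
for name, row in cases.items():
    print(name, row)
```

The "min" and "max" rows should pass validation, while the "below_min" and "above_max" rows should exercise the application's error routines.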

Comparing Data
Compare the "before" and "after" data from an application test
Compare results after running modified application during regression testing
Identify differences between separate databases
Audit changes to a database
Compare analyzes complete sets of data, finding changes in rows in tables
Single-table or multi-table compare
Creates compare file of results
Displays results on screen


Analyzing Test Data Results


Version 1 INVOICES
27645   86-4538   Widget#1        $80.00
27645   86-4538   Widget#PG13     $20.00
Invoice Total                    $100.00

Both Invoices total $100
Composition is different
Could we have missed an error?

Version 2 INVOICES
27645   86-4538   Widget#1        $50.00
27645   86-4538   Widget#PG13     $50.00
Invoice Total                    $100.00
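The invoice example can be reproduced in a few lines of Python: a check on the totals alone passes, while a line-by-line comparison exposes the change. The item names and amounts are taken from the slide, and the pairing of amounts to items follows the order in which they appear there. The function name is hypothetical.

```python
from decimal import Decimal

version_1 = {"Widget#1": Decimal("80.00"), "Widget#PG13": Decimal("20.00")}
version_2 = {"Widget#1": Decimal("50.00"), "Widget#PG13": Decimal("50.00")}


def line_differences(before, after):
    """Return {item: (before_amount, after_amount)} for lines that differ."""
    items = sorted(set(before) | set(after))
    return {item: (before.get(item), after.get(item))
            for item in items if before.get(item) != after.get(item)}


totals_match = sum(version_1.values()) == sum(version_2.values())
diffs = line_differences(version_1, version_2)

print("totals match:", totals_match)  # True: both invoices total 100.00
print("changed lines:", diffs)        # both line items differ
```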


Browsing the Compare File

Generated for each pair of tables
Identifies tables containing unmatched rows
Identifies tables containing duplicate match keys
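A rough sketch of what a compare result for one pair of tables might contain, assuming rows are plain dictionaries matched on a single key column. The report layout, function name and sample rows are hypothetical and do not reflect the actual compare file format. Rows whose match key is duplicated are reported separately, since they cannot be paired reliably.

```python
from collections import Counter


def compare_tables(source1, source2, key):
    """Compare two lists of row dicts on a match key."""
    counts1 = Counter(row[key] for row in source1)
    counts2 = Counter(row[key] for row in source2)
    duplicate_keys = sorted({k for counts in (counts1, counts2)
                             for k, n in counts.items() if n > 1})

    # Index only rows whose match key is unique in both sources.
    index1 = {row[key]: row for row in source1 if row[key] not in duplicate_keys}
    index2 = {row[key]: row for row in source2 if row[key] not in duplicate_keys}

    return {
        "duplicate_keys": duplicate_keys,
        "only_in_source1": sorted(set(index1) - set(index2)),
        "only_in_source2": sorted(set(index2) - set(index1)),
        "changed": sorted(k for k in set(index1) & set(index2)
                          if index1[k] != index2[k]),
    }


before = [{"id": 1, "amount": 80}, {"id": 2, "amount": 20}, {"id": 3, "amount": 5}]
after = [{"id": 1, "amount": 50}, {"id": 2, "amount": 20},
         {"id": 4, "amount": 7}, {"id": 4, "amount": 9}]

report = compare_tables(before, after, key="id")
print(report)
```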


View Details of Discrepancies


Easily Refresh Test Environments

[Figure: a baseline subset of the production environment is held in an extract/archive file and reloaded into Test (DB2 LUW/AIX), Dev (Oracle/Solaris) and QA (Sybase/Linux)]

Dynamically load relationally intact data sets and objects based on selection criteria


Additional TDM Features to Consider


Compare Pre and Post Mask
Extract File Browsing
Schedule Jobs
Command Line Interface
Federated Data Access
MetaData Extracts
Re-startability


Benefits of Test Data Management


Efficient creation and management of test environments
Environment size reduction
Replication time reduction
Fewer users per environment through a segmented test process.
Increase accuracy of testing through fresher data
Reduced time to conduct the tests
More parallel testing possible
Reusable tool and methodology
Reduced risk to test data
Reduced volume of exposed data.
Reduced value of exposed data via masking
Increased regulatory compliance
Reduced risk of legal exposure

Why Do Something? TDM Saves Money

Leading North American Financial Institution

Eliminated downtime associated with rebuilding test environments: savings of up to $250,000 per year.
Achieved more than $100,000 annual savings collectively for 10 to 15 projects.

Large International Financial Services Group


Reduced the time needed to create a test environment by up to 90% (from 20 days to just 2 days).
Improved time-to-deployment of new application functionality, contributing to critical business/financial initiatives.

Leading Banking & Payment Technology Solutions


Reduced operational cost and improved efficiencies by reducing the size of the test database from 1.2TB to 24GB

Questions?


Thank You


About IBM Optim


Proven leader in Integrated Data Management (IDM):

  Manage and Control Data Growth
  Data Retention, Compliance & Discovery
  Speed Application Delivery & Quality with Test Data Management
  Speed Application Upgrades & Migrations
  Application Retirement
  Improve Storage Management ILM
  Improve Application Performance and SLAs

Solving complex data management issues since 1989
Global company: 2500 clients; 50% of Fortune 500
Recognized by Gartner, IDC, META as EDM industry leader with 46% market share.

Optim Solves the IDM Challenge


Archiving

Improve performance
Control data growth, save storage
Support retention compliance
Streamline upgrades

Test Data Management

Create targeted, right-sized test environments
Improve application quality
Speed iterative testing processes

Data Privacy

Mask confidential data
Comply with privacy policies


Application Migration & Retirement
Maintain referential integrity

IBM Optim: Enterprise Architecture


IBM Integrated Data Management
Database Design, Development & Administration, Data Growth, Data Privacy, Test Data Management, Application Upgrades & Retirements, Data Retention & E-Discovery


Trademarks and disclaimers


Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries./ Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind. The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. 
Questions on the capability of nonIBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography. Photographs shown may be engineering prototypes. Changes may be incorporated in production models. IBM Corporation 1994-2008. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.
