Data Subsetting and Masking: Advanced Techniques for Test Data Management

Agenda

Costs of production data in test environments
Application data model and data discovery
Data subsetting
Data masking
PeopleSoft data masking: a case study by Cornell U.

Two-Thirds of Sensitive and Regulated Data Resides in Databases
and growing exponentially

BUT IS IT SECURE?

[Chart: 1,800 Exabytes (2011). Source: IDC, 2008]

Challenges in Setting up Test Systems

Cannot use sensitive data in test without obfuscation
Full production copies for test systems not cost effective
Producing relationally intact subset is hard but necessary
Error-prone, manual process

[Diagram: the four challenges surround Test System Setup]

Complexity of Enterprise Applications

Financials
Supply Chain Management
Human Capital Management
Customer Relationship Management

What is the relationship between tables across the various applications?
What are the data extraction rules that produce referentially intact subsets?
Will the subsetted data fit into the available storage for test systems?
What are the various types of sensitive data used across the applications?
Where is this sensitive data stored, and how are the items related to each other?

Test Data Management Solutions

Sensitive Data Identification
Data Relationship Modeling
Data Subsetting
Data Masking
Test System Setup

Data Discovery and Modeling
Application Data Models

Scans application schemas to model relationships between tables and columns
Extracts data relationships from Oracle Applications metadata (Oracle E-Business Suite, Oracle Fusion Applications)
Stores referential relationships in a repository
Enables test data operations such as data subsetting and masking
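
As a rough illustration of reading referential relationships out of schema metadata, the sketch below uses SQLite's catalog (`sqlite_master` and `PRAGMA foreign_key_list`) as a stand-in for Oracle Applications metadata. The `discover_relationships` helper and the two tables are assumptions made for the example; it does not show how Enterprise Manager builds its Application Data Model.

```python
import sqlite3


def discover_relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    relationships = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Each row is: id, seq, parent table, child column, parent column, ...
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return relationships


# Hypothetical two-table schema used only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id TEXT PRIMARY KEY, job TEXT, min_sal INTEGER);
    CREATE TABLE employees (
        name TEXT,
        job_id TEXT REFERENCES jobs(job_id),
        salary INTEGER
    );
""")
adm = discover_relationships(conn)
print(adm)
```

The resulting list of parent/child pairs is the kind of information a subsetting or masking step would need in order to keep related rows consistent.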

Data Discovery and Modeling
Sensitive Data Identification

Sensitive data discovery
Pattern-based database scanning
Import from pre-built mask templates

Data Masking Templates for Oracle Applications
Fusion Applications

Sensitive Column Discovery

Discover the sensitive columns using patterns
Create Sensitive Column Types using regular expressions to define the pattern for column name, comment, or data
Run a discovery job to scan the database and discover the sensitive columns

Import from masking templates
Import the masking templates for Oracle Applications such as Fusion Applications
Sensitive columns will be automatically tagged

Set columns as sensitive manually
Add columns directly in the sensitive column list
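
Pattern-based discovery can be sketched as follows. The sensitive column types, their regular expressions, and the `discover_sensitive_columns` helper are hypothetical; a real discovery job would scan database metadata and sampled rows rather than an in-memory dictionary.

```python
import re

# Hypothetical sensitive column types: (column-name pattern, data pattern).
SENSITIVE_TYPES = {
    "SSN": (re.compile(r"SSN|SOCIAL", re.I),
            re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    "EMAIL": (re.compile(r"E?MAIL", re.I),
              re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
}


def discover_sensitive_columns(columns):
    """columns maps a column name to a list of sampled values."""
    found = {}
    for col, samples in columns.items():
        for type_name, (name_re, data_re) in SENSITIVE_TYPES.items():
            name_hit = bool(name_re.search(col))
            # A data hit requires every sampled value to match the pattern.
            data_hit = bool(samples) and all(
                data_re.match(str(v)) for v in samples)
            if name_hit or data_hit:
                found[col] = type_name
                break
    return found


sample = {
    "LAST_NAME": ["AGUILAR", "BENSON"],
    "SSN": ["203-33-3234", "323-22-2943"],
    "TAX_ID": ["111-23-1111"],        # caught by its data, not its name
    "CONTACT": ["a@example.com"],     # caught by its data, not its name
}
result = discover_sensitive_columns(sample)
print(result)
```

Matching on either the column name or the data is a design choice for this sketch; a stricter policy could require both to reduce false positives.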


Data Subsetting

What?
A relationally intact yet fractional representation of production data for test and development purposes

Why?
Reduce the storage overhead created by production data copies in various application environments
Allow developers to perform real-world application development by using production-class data

[Diagram: Production (application data, application metadata) is reduced to Test (application data, application metadata)]
Subset criteria: REGION = NORTH AMERICA AND FISCAL_YEAR = 2009

Select Applications for Subset

Financials
Supply Chain Management
Human Capital Management
Customer Relationship Management

Identifies the tables associated with the application(s)
Transfers relationships between tables from the Application Data Model (ADM) into the Subset Definition
Transfers additional metadata from the ADM into the Subset Definition: constraints, indexes

Define Subset Parameters

Time (FY: 2011)
Dimension (Region: Asia)
Space (Size: 10%)

Analyzes dependencies between tables to select the subset driver tables by traversing down and across the ADM
Defines the extraction (WHERE) clause to select the rows for each of the tables that will be included in the subset
Analyzes table statistics to estimate the size of the database generated from the specified subset parameters
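
The size estimate can be thought of as simple arithmetic over table statistics. The sketch below assumes each table has a row count and an average row length, and that each subset rule keeps a known fraction of rows; the statistics, table names, and the `estimate_subset_bytes` helper are all hypothetical, and the estimate ignores indexes and storage overhead.

```python
def estimate_subset_bytes(table_stats, selectivity):
    """table_stats: table -> (num_rows, avg_row_len_bytes).
    selectivity: table -> fraction of rows kept by the subset rule."""
    total = 0
    for table, (num_rows, avg_row_len) in table_stats.items():
        fraction = selectivity.get(table, 1.0)  # no rule: keep the whole table
        total += int(num_rows * fraction) * avg_row_len
    return total


# Hypothetical statistics for two tables.
stats = {"EMPLOYEES": (1_000_000, 100), "JOBS": (1_000, 50)}
full = estimate_subset_bytes(stats, {})
subset = estimate_subset_bytes(stats, {"EMPLOYEES": 0.10})
print(full, subset, round(subset / full, 3))
```

Comparing the estimate against the available test storage before running the extraction is the point of the "Space" parameter above.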

Data Subsetting - High Performance Execution

Export = writing subset data via Data Pump (Production, Data Pump export file, Import, Test)
In-place subset = deleting data in the same database (Production, Clone Production, Test)

Method             Database size   Subset size   Time*
Data Pump method   1 Terabyte      200G (20%)    1 hour 8 minutes
Clone and delete   1 Terabyte      200G (20%)    5 hours 49 minutes

*2-nodes Intel Xeon 6-core X5675 Processor w/ 216G memory running OEL 5.5

Data Subsetting: End to End Process

Production

HR.EMPLOYEES
NAME      JOB_ID   SALARY
AGUILAR   SA_MAN   40000
BENSON    SA_REP   60000

HR.JOBS
JOB_ID   JOB            Min_SAL
SA_MAN   Sales Mgr      10000
SA_REP   Sales Repres   20000

Steps in Enterprise Manager (EM)
Create Application Data Model: schemas, tables, relationships collected
Create Data Subset Definition: schemas, tables, relationships retrieved
  Table rule: Salary < 60,000
  Table rule: Min_Sal < 20,000
Create Test Database
Extract Data Subset, 2 methods: extract and import, or clone and delete

Test/Staging

HR.EMPLOYEES
NAME      JOB_ID   SALARY
AGUILAR   SA_MAN   40000

HR.JOBS
JOB_ID   JOB         Min_SAL
SA_MAN   Sales Mgr   10000
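
The HR example above can be reproduced in a small sketch. It uses an in-memory SQLite database in place of Oracle and plain `CREATE TABLE ... AS SELECT` statements in place of Data Pump or clone-and-delete; the `test_` table names are assumptions for the example. The parent table is filtered first, then the child table keeps only rows that satisfy its own rule and still have a surviving parent, which is what keeps the subset referentially intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id TEXT PRIMARY KEY, job TEXT, min_sal INTEGER);
    CREATE TABLE employees (
        name TEXT,
        job_id TEXT REFERENCES jobs(job_id),
        salary INTEGER
    );
    INSERT INTO jobs VALUES
        ('SA_MAN', 'Sales Mgr', 10000),
        ('SA_REP', 'Sales Repres', 20000);
    INSERT INTO employees VALUES
        ('AGUILAR', 'SA_MAN', 40000),
        ('BENSON', 'SA_REP', 60000);

    -- Parent table: apply its table rule (Min_Sal < 20,000).
    CREATE TABLE test_jobs AS
        SELECT * FROM jobs WHERE min_sal < 20000;

    -- Child table: apply its own rule (Salary < 60,000) and keep only
    -- rows whose parent row survived in test_jobs.
    CREATE TABLE test_employees AS
        SELECT e.* FROM employees e
        WHERE e.salary < 60000
          AND e.job_id IN (SELECT job_id FROM test_jobs);
""")
test_jobs = conn.execute("SELECT * FROM test_jobs").fetchall()
test_employees = conn.execute("SELECT * FROM test_employees").fetchall()
print(test_jobs)
print(test_employees)
```

The output matches the Test/Staging tables on the slide: one SA_MAN job row and the AGUILAR employee row.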


Secure Test System Deployment
Data Masking

Production
LAST_NAME   SSN           SALARY
AGUILAR     203-33-3234   60,000
BENSON      323-22-2943   40,000

Test
LAST_NAME   SSN           SALARY
SMITH       111-23-1111   60,000
MILLER      222-34-1345   40,000

Deploy secure test system by masking sensitive data
Extensible template library and policies for automation
Sophisticated masking: condition-based, compound, deterministic
Integrated masking and cloning
NEW in EM 11g: Heterogeneous Data Masking
NEW in EM 11g: Pre- and Post-mask commands and command line (EMCLI) support
NEW in EM 12c: Data Masking integration with Real Application Testing
NEW in EM 12c: Key-based reversible masking

Oracle Data Masking
Comprehensive and Extensible Mask Library

Mask formats for common sensitive data
Accelerates deployment of masking solutions

Extensible mask routines
Enables customization of business rules

Define once, apply everywhere
Ensures consistent enforcement of policies

Oracle Data Masking
Sophisticated Masking Techniques

Compound Masking
Sets of related columns masked together, e.g. Address, City, State, Zip, Phone

Condition-based Masking
Specify a separate mask format for each condition, e.g. driver's license format for each state

SQL-expression-based Masking
Use SQL functions, e.g. UPPER, SUBSTR, TO_CHAR, to generate mask values, e.g. SUBSTR(%ORIG_VALUE%,1,3)||'111-1111'
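
A minimal sketch of SQL-expression-based and condition-based masking, run against an in-memory SQLite database rather than Oracle. The `customers` table, its rows, and the per-state license formats are hypothetical; the phone expression mirrors the slide's example, with the column name standing in for `%ORIG_VALUE%`, and a `CASE` expression plays the role of the per-condition mask formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, state TEXT, phone TEXT, license TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [("AGUILAR", "CA", "4155551234", "D1234567"),
     ("BENSON", "NY", "2125559876", "123456789")],
)

conn.execute("""
    UPDATE customers SET
        -- SQL-expression mask: keep the first 3 characters, replace the rest.
        phone = substr(phone, 1, 3) || '111-1111',
        -- Condition-based mask: a different (made-up) format per state.
        license = CASE state
            WHEN 'CA' THEN 'X0000000'
            WHEN 'NY' THEN '000000000'
            ELSE 'MASKED'
        END
""")
masked = conn.execute(
    "SELECT name, phone, license FROM customers ORDER BY name").fetchall()
print(masked)
```

Both assignments read the original row values, so the order of the two masks within the `UPDATE` does not matter.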

Heterogeneous Data Masking

Oracle Databases
[Diagram: Production (Oracle), Staging (Oracle), Test (Oracle), monitored and managed by Enterprise Manager Cloud Control with Data Masking]

Non-Oracle Databases
[Diagram: Production (non-Oracle), Database Gateway, Staging (Oracle), Database Gateway, Test (non-Oracle), monitored and managed by Enterprise Manager Cloud Control with Data Masking]

Available for IBM DB2, Microsoft SQL Server, Sybase

Integrating ODI using EMCLI & Pre-/Post-Mask Commands

XML masking with Oracle Data Integrator
Pre-Mask: Read XML and DTD into tables
Mask: Mask using Oracle Data Masking
Post-Mask: Write masked data back to XML from tables using DTD

Flat file masking with Oracle Data Integrator
Pre-Mask: Reverse engineer flat file into tables
Mask: Mask using Oracle Data Masking
Post-Mask: Write masked data back to flat file from tables using reverse-engineered structure

Incremental masking using Oracle Data Integrator
Pre-Mask: Listen for journaled records and write to tables
Mask: Mask using Oracle Data Masking
Post-Mask: Write masked incremental transactions to Test DB
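
The pre-mask, mask, post-mask pattern for a flat file can be sketched without ODI or EMCLI. The example below loads a CSV into a staging table, applies a placeholder masking rule, and writes the masked rows back in the same layout. The column names, the `mask_flat_file` helper, and the masking rule (which keeps the last four digits) are assumptions for illustration, not Oracle Data Masking formats.

```python
import csv
import io
import sqlite3


def mask_flat_file(text):
    # Pre-mask: read the flat file into a staging table.
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (last_name TEXT, ssn TEXT)")
    conn.executemany("INSERT INTO staging VALUES (:last_name, :ssn)", rows)

    # Mask: placeholder rule that hides all but the last four characters.
    conn.execute("UPDATE staging SET ssn = 'XXX-XX-' || substr(ssn, -4)")

    # Post-mask: write the masked rows back using the original header.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(reader.fieldnames)
    writer.writerows(
        conn.execute("SELECT last_name, ssn FROM staging ORDER BY rowid"))
    return out.getvalue()


source = "last_name,ssn\nAGUILAR,203-33-3234\nBENSON,323-22-2943\n"
masked_file = mask_flat_file(source)
print(masked_file)
```

The XML and incremental variants on the slide follow the same shape: only the pre-mask load and the post-mask write change, while the masking step in the middle stays the same.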

Key-based Reversible Masking
New in EM 12c

Mask
Data format is preserved
Any pattern and any length
Numeric, alphanumeric or mixed
Deterministic
Unique
Key-based

Unmask
Reverse the masked data back to its original value with the same key
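
The properties listed above (format-preserving, deterministic, unique, reversible with the same key) can be illustrated with a toy keyed substitution. This is an assumption-laden sketch, not Oracle's algorithm and not cryptographically strong: each digit or letter is shifted within its own character class by an amount derived from the key and the character position, and separators are left untouched.

```python
import hashlib
import hmac
import string

ALPHABETS = (string.digits, string.ascii_uppercase, string.ascii_lowercase)


def _keystream(key, length):
    """Derive `length` key-dependent bytes using HMAC-SHA256 in counter mode."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hmac.new(
            key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return stream[:length]


def _shift(value, key, direction):
    out = []
    for ch, k in zip(value, _keystream(key, len(value))):
        for alphabet in ALPHABETS:
            if ch in alphabet:
                # Rotate within the same character class to keep the format.
                pos = (alphabet.index(ch) + direction * k) % len(alphabet)
                ch = alphabet[pos]
                break
        out.append(ch)
    return "".join(out)


def mask(value, key):
    return _shift(value, key, +1)


def unmask(value, key):
    return _shift(value, key, -1)


key = b"demo-key"  # hypothetical key; must be bytes
masked = mask("203-33-3234", key)
restored = unmask(masked, key)
print(masked, restored)
```

Because each position is a fixed rotation for a given key, two different inputs of the same length never map to the same masked value, and applying the opposite rotation with the same key recovers the original.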

Secure Database Testing
New in EM 12c

End-to-end testing with real workloads
[Diagram steps: Create Test System, Capture Workload, Replay Workload, Deploy Replay Clients]

SQL Performance Analyzer
SQL unit testing for response time
Identify and tune regressed SQL
Integrated into SQL Tuning Advisor and SQL Plan Baseline

Database Replay
Load, performance testing for throughput
Remediate application concurrency problems
Integrated with Oracle Application Testing Suite for a comprehensive testing solution

Seamless integration with Data Masking to preserve data privacy compliance

Masking Real Application Testing Workloads
Real Application Testing Integration with Data Masking

Copying production data to test systems puts sensitive information at risk
Perform secure, production-scale testing
Sensitive data found in workload capture files and SQL Tuning Sets (STS) is masked along with application data
  STS bind data (used with SPA)
  Workload capture files (used with DB Replay)
  AWR sensitive bind data is purged
Consistent masking across all data sources and workloads

Masking Internals of RAT Workloads

Real Application Testing routine is called from the data masking script to initialize
Sensitive columns are introspected from the masking definition
All identified sensitive binds in STS are extracted and placed into masking staging tables
All identified sensitive binds in Database Replay capture files are extracted and placed into masking staging tables
Mapping tables containing original sensitive values and masked values are built using masking routines
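
The mapping-table idea can be sketched as a dictionary from original to masked values that is built once and then applied to both table data and captured bind values, so the same original always receives the same mask. The `mask_ssn` routine, its seed, and the sample values are hypothetical; a hash-based routine like this one is deterministic but does not guarantee unique masked values.

```python
import hashlib


def mask_ssn(original, seed="demo-seed"):
    """Hypothetical deterministic masking routine for SSN-shaped values."""
    digest = hashlib.sha256((seed + original).encode()).hexdigest()
    digits = "".join(str(int(c, 16) % 10) for c in digest[:9])
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def build_mapping(values):
    """Build the original -> masked mapping table, one entry per distinct value."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = mask_ssn(value)
    return mapping


table_values = ["203-33-3234", "323-22-2943"]   # sensitive column data
captured_binds = ["203-33-3234"]                # bind value from a workload

mapping = build_mapping(table_values + captured_binds)
masked_table = [mapping[v] for v in table_values]
masked_binds = [mapping[v] for v in captured_binds]
print(masked_table, masked_binds)
```

Looking up both sources in one mapping is what lets a replayed statement still find the masked row it refers to.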

Masking Definition Compatibility and Restrictions

SQL Expression and Conditional format masking in a masking definition are not compatible
  Database Replay capture files and SQL Tuning Sets may not contain the entire data set
  Conditions may be based on values outside the workload
Masking of literals in Database Replay capture files and SQL Tuning Sets is not supported
  It is not a best practice to store sensitive data as literals

Question & Answer
