Seeking Data Quality

Using Agile Methods to Test a Data Warehouse

Copyright Ideaca 2008

Agenda Seeking Data Quality


Data Warehouse Overview
The Value of a Data Warehouse
Agile as Business Value Driver
Test Strategy
Test Techniques
Test Results
Conclusions

What is a Data Warehouse?


A non-transactional data repository
Integrates data from multiple sources
Organized around relevant subjects
Queryable by business users
Used for reporting
Used for analysis

The Structure of a Data Warehouse


Kimball's Star Schema

The Flow of Data


Typical data flow

The Value of a Data Warehouse


To provide information that will help people make better choices
This information is a solution to the problem of making choices in a complex environment
The benefit of the information is that it reduces risk by providing an accurate representation of the state of the world
This comes at the cost of building and maintaining the data warehouse now and into the future

Data Value Drivers


Our research led us to these value drivers:
  The more accurate the data is, the more useful it is, and therefore the more valuable it is
  The value of data increases when combined with other data
  The value of data increases with its use; in fact, it only has value when people use it

Focus on high-risk problems using limited resources
Emphasis on Data Quality
  Relevance
  Completeness
  Correctness
  Consistency

Agile Principles as Guides


Testing is a process of investigation and evaluation
Customer involved in deciding test relevance
Customer involved in deciding test priority
Communication of test goals and approach
Simple and lightweight test scripts
Avoid effort on low-value tasks

Test Strategy Outline


Data Warehouse Test Targets
  Stars are the business view of a data warehouse
  Stars are composed of a Fact and its Dimensions
  Fact and Dimension tables are loaded through ETLs

Each target had a similar test approach
The test backlog was a prioritized list of these tests
Detailed test scripts are expensive to produce
Our scripts outlined a guided exploration
Progress could be measured through a burndown chart
Regulatory requirements needed to be met
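One way to picture a prioritized test backlog and its burndown is the minimal Python sketch below. It is an illustration only, not the team's actual tooling: the `BacklogItem` fields, the target names, and the priority scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    target: str         # hypothetical star, fact, dimension, or ETL name
    test: str           # short description of the guided exploration
    priority: int       # assumption: 1 = highest business risk
    done: bool = False


def prioritized(backlog):
    """Return the open items, highest priority (lowest number) first."""
    return sorted((item for item in backlog if not item.done),
                  key=lambda item: item.priority)


def burndown(total, completed_per_iteration):
    """Remaining test count after each iteration, starting from the total."""
    remaining = [total]
    for completed in completed_per_iteration:
        remaining.append(max(remaining[-1] - completed, 0))
    return remaining


# Hypothetical backlog entries, one test per target.
backlog = [
    BacklogItem("Sales star", "fact row count matches source", priority=1),
    BacklogItem("Customer dimension", "no orphaned keys", priority=2),
    BacklogItem("Sales ETL", "totals reconcile to source", priority=1),
    BacklogItem("Date dimension", "consistent formatting", priority=3),
]

for item in prioritized(backlog):
    print(item.priority, item.target, "-", item.test)

print(burndown(len(backlog), [1, 2, 1]))
```

Plotting the list returned by `burndown` against the iteration number gives the burndown chart; a flat stretch would suggest that tests are being added or blocked faster than they are completed.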

Business View of a Data Warehouse


Testing progress reported on the basis of stars

Tests
We tested for completeness
  No missing records
  No missing fields

We tested for correctness
  Correct keys
  Correct calculations
  Correct aggregations
  Correct data type/size

We tested for consistency
  Consistent aggregations
  Consistent calculations
  Consistent data type/size
  Consistent granularity
  Consistent business rules
  Consistent use of nulls and defaults
  Consistent formatting
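A few of these checks can be sketched as plain SQL queries. The example below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`src_orders`, `fact_sales`, `dim_customer`) and the sample rows are hypothetical, chosen only to show one missing record and one incorrect key; they are not taken from the project.

```python
import sqlite3


def build_demo(conn):
    """Create a tiny hypothetical source table, dimension, and fact table."""
    conn.executescript("""
        CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY,
                                 customer_id INTEGER, amount REAL);
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (order_id INTEGER, customer_key INTEGER,
                                 amount REAL);
        INSERT INTO src_orders VALUES (1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.5);
        INSERT INTO dim_customer VALUES (10, 'Acme'), (11, 'Globex');
        -- Order 3 was never loaded; order 2 points at a key that does not exist.
        INSERT INTO fact_sales VALUES (1, 10, 25.0), (2, 99, 40.0);
    """)


def missing_records(conn):
    """Completeness: source orders that never reached the fact table."""
    rows = conn.execute("""
        SELECT s.order_id
        FROM src_orders s
        LEFT JOIN fact_sales f ON f.order_id = s.order_id
        WHERE f.order_id IS NULL
        ORDER BY s.order_id
    """).fetchall()
    return [row[0] for row in rows]


def missing_fields(conn):
    """Completeness: fact rows with a required field left unpopulated."""
    return conn.execute("""
        SELECT COUNT(*) FROM fact_sales
        WHERE customer_key IS NULL OR amount IS NULL
    """).fetchone()[0]


def orphan_keys(conn):
    """Correctness: fact rows whose customer_key has no dimension row."""
    rows = conn.execute("""
        SELECT f.order_id
        FROM fact_sales f
        LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
        ORDER BY f.order_id
    """).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_demo(conn)
print("missing records:", missing_records(conn))
print("missing fields:", missing_fields(conn))
print("orphan keys:", orphan_keys(conn))
```

Each function returns the offending rows rather than a pass/fail flag, which fits a guided exploration: the tester sees what is wrong and can decide where to look next.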

Test Points
Test every ETL, Fact, and Dimension
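At each test point, a simple check is to reconcile the data on either side of it. The sketch below compares row counts and a column total between a staging table and the fact table it feeds, again using `sqlite3`. The `reconcile` helper and the table names `stg_sales` and `fact_sales` are hypothetical, and the helper assumes the table and column names come from trusted test configuration, since they are interpolated into the SQL.

```python
import math
import sqlite3


def reconcile(conn, source, target, column):
    """Compare row counts and the total of one column between two tables.

    Assumption: source, target, and column are trusted identifiers.
    """
    def profile(table):
        return conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({column}), 0) FROM {table}"
        ).fetchone()

    src_count, src_total = profile(source)
    tgt_count, tgt_total = profile(target)
    return {
        "counts": (src_count, tgt_count),
        "totals": (src_total, tgt_total),
        "counts_match": src_count == tgt_count,
        "totals_match": math.isclose(src_total, tgt_total, abs_tol=1e-9),
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (order_id INTEGER, amount REAL);
    CREATE TABLE fact_sales (order_id INTEGER, amount REAL);
    INSERT INTO stg_sales VALUES (1, 25.0), (2, 40.0), (3, 15.5);
    -- Hypothetical ETL defect: order 3 was dropped during the load.
    INSERT INTO fact_sales VALUES (1, 25.0), (2, 40.0);
""")

result = reconcile(conn, "stg_sales", "fact_sales", "amount")
print(result)
```

Running the same helper across every ETL step, fact, and dimension gives a uniform first check at each point before any deeper exploration.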

Test Results
Greater than 99.99995% data accuracy
Testing less than 20% of development effort
Common scripts, common understanding
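To give a sense of scale for the accuracy figure, the sketch below works through the arithmetic. The slides do not say how accuracy was measured, so the formula (correct values divided by values checked) and the sample counts are assumptions made for illustration. `Fraction` is used so the comparison against the threshold is exact.

```python
from fractions import Fraction

TARGET_PERCENT = Fraction("99.99995")  # the figure quoted on the slide


def accuracy_percent(values_checked, values_wrong):
    """Assumed definition: percentage of checked values that were correct."""
    if values_checked <= 0:
        raise ValueError("values_checked must be positive")
    return 100 * (1 - Fraction(values_wrong, values_checked))


def max_error_rate(target_percent=TARGET_PERCENT):
    """Error fraction at which accuracy equals the target exactly."""
    return 1 - target_percent / 100


# 99.99995% corresponds to an error rate of 1 in 2,000,000.
print(max_error_rate())

# Hypothetical counts: 4 wrong values out of 10 million beats the target,
# while 5 wrong values only equals it.
print(accuracy_percent(10_000_000, 4) > TARGET_PERCENT)
print(accuracy_percent(10_000_000, 5) > TARGET_PERCENT)
```

Under this assumed definition, exceeding the quoted figure means fewer than one incorrect value per two million values checked.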

Root Cause Analysis


Defects classified by root cause

Cause                           Defect %
Development Standards Issues    23%
Implementation Errors           22%
ETL Errors                      21%
Database Issues                 13%
Design Issues                   9%
Other Issues                    12%

Defect Root Causes


Cause                           Cause Breakdown
Development standards issues    Naming conventions
                                Design standards
                                Documentation standards
                                Metadata
Implementation errors           Primary/foreign key problems
                                Inconsistent field lengths
                                Field types
                                Bad data
                                Missing data
ETL errors                      Counts off
                                Totals off
                                Failed calculations
                                Failed conversions
                                Unpopulated fields

Defect Root Causes - continued


Cause                           Cause Breakdown
Database errors                 Performance
                                Indexes
                                Partitions
                                Tablespace
Design issues                   Missing fields
                                Extra fields
                                Missing dimensions
                                Mapping problems
All other issues                Miscellaneous

Conclusions
A value-based approach focused our test efforts on finding more serious problems sooner
Applying agile principles allowed us to minimize wasted time and effort
Testing identified development process changes that had the greatest impact on data quality
New regulatory requirements mean that the ability to test is now a design issue

Summary: Contrasting Test Styles

Old Approach                                          New Approach
Focus on tool: database, data warehouse               Focus on value: data usage in business context
Focus on process: tables, views, stored procedures    Focus on outcome: stars/dimensions/facts
Test plans                                            Test backlogs
Test cases                                            Test targets
Detailed scripts for instructions                     Light scripts as guides for exploration
No special emphasis on team communication             Team communication is vital
