
ETL Testing Perspective

Agenda

Introduction to ETL
What is ETL?
ETL Tools
ETL Testing
Testing Phases
Testing Types
Benefits of ETL Testing
Challenges in ETL Testing
Things to verify during ETL Testing
Live Demo
Questions

Introduction to ETL

We all know the importance of data in today's world; in other words, data is the backbone of any organization. Most organizations run their businesses on collected data, which they use for strategic decision-making.

As the volume of data grows and the number of data sources increases, organizations need a convenient way to integrate data across different applications that use various data sources. The ETL process was developed to provide seamless integration between different data sources and the applications that use them, so that data flows are more accurate and efficient. ETL has made this integration easy and has proved to be a reliable process for transforming data based on business rules.

Introduction to ETL Contd

ETL stands for extract, transform, and load. It can consolidate an organization's scattered data and handle data coming from different departments.
ETL can transform not only data from different departments but also data from entirely different sources. For example, an organization may run its business on different environments such as SAP and Oracle Apps. ETL can take the data from these two source systems, integrate it into a single format, and load it into the target tables. The need for ETL arises from the fact that in modern computing, business data resides in multiple locations and in many incompatible formats.

What is ETL

ETL is the physical process of extracting data from a source system, transforming the data to the desired state, and loading it into a database.
EXTRACT:- The first step in the ETL process is extracting the data from various sources. Each source system may store its data in a completely different format from the rest. The sources are usually flat files or RDBMSs, but almost any data store can be used as a source for an ETL process.

What is ETL

TRANSFORM:- Once the data has been extracted and converted into the expected format, it is time for the next step in the ETL process: transforming the data according to a set of business rules. The transformation may include operations such as filtering, sorting, aggregating, joining, cleaning, validating, and generating calculated data based on existing values.
LOAD:- The final ETL step involves loading the transformed data into the target, which might be a database or a data warehouse.
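
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the CSV content, the customers table, and the rule that rows with a non-numeric amount are dropped are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source with inconsistent formatting.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,20
3,carol,not_a_number
"""

def extract(text):
    """Extract: read raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean names and keep only rows with a numeric amount."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumed business rule: drop rows with an invalid amount
        cleaned.append((int(row["id"]), row["name"].strip().title(), amount))
    return cleaned

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl():
    """Run extract, transform, and load against an in-memory target database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    return conn.execute(
        "SELECT id, name, amount FROM customers ORDER BY id"
    ).fetchall()
```

Calling run_etl() returns the rows that reached the target, which is exactly what an ETL tester compares against the source and the business rules.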

Sample ETL Process Flow

ETL Tools

Many of the biggest software vendors produce ETL tools, including IBM, Oracle, and Microsoft. Below is a list of ETL tools:

Informatica PowerCenter
Microsoft SQL Server Integration Services (SSIS)
Ab Initio
Hyperion
Business Objects Data Integrator (BODI)
Oracle Warehouse Builder
IBM WebSphere DataStage

In some cases, the ETL process is invoked directly through command-line scripts on a UNIX box, without any tool.

ETL Testing

Testing is an important phase in the project life cycle; a structured, well-defined testing scope ensures a smooth transition of the project. An organization gains real confidence once the ETL processes are verified and validated by an independent group of experts. During ETL application testing we test for the following:

Data Transformation: Ensures that all data is transformed correctly according to business rules and design specifications.
Data Completeness: Ensures that all expected data is loaded without any data loss or truncation.
Data Quality: Ensures that the ETL application correctly rejects, substitutes default values for, corrects, or ignores invalid data, and reports it.
Performance and Scalability: Ensures that data loads and queries perform within expected time frames and that the technical architecture is scalable.
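
As a rough illustration of the data quality point above, the sketch below shows one way an ETL step might reject or default invalid records, and what a tester can then assert. The field names (id, country), the default value, and the rule that a missing id causes a reject are hypothetical, chosen only for the example.

```python
DEFAULT_COUNTRY = "UNKNOWN"  # hypothetical default substituted for blank values

def apply_quality_rules(records):
    """Split records into loaded rows and rejected rows with a reason."""
    loaded, rejected = [], []
    for rec in records:
        if rec.get("id") is None:
            # Assumed rule: a record without a key cannot be loaded.
            rejected.append((rec, "missing id"))
            continue
        row = dict(rec)
        if not row.get("country"):
            # Assumed rule: a blank country is replaced by a default.
            row["country"] = DEFAULT_COUNTRY
        loaded.append(row)
    return loaded, rejected

records = [
    {"id": 1, "country": "IN"},
    {"id": 2, "country": ""},
    {"id": None, "country": "US"},
]
loaded, rejected = apply_quality_rules(records)
```

A tester would check that every source record ends up either loaded or rejected (nothing is silently lost), that defaults appear only where the rule allows, and that each reject carries a reason.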

ETL Testing Phases

ETL testing is no different from other testing and comprises the following phases:

Business and requirement understanding
Test estimation
Test planning
Test case creation
Test bed and environment set-up
Receiving the data file from the developer
Test case execution
Comparing the expected results with the actual results by testing the business rules
Test closure

ETL Testing Process Flow

Testing Types

Broadly speaking, ETL testing is done by writing SQL queries against the source and target databases. These SQL queries are included in the test steps of each test case. Below are some of the methodologies that can be considered to perform effective ETL testing:

Application Testing
Data-Centric Testing
Technical Testing
Business Testing
Reconciliation

Application Testing

Application testing is the base and is important irrespective of which other testing we are going to perform.

Data-Centric Testing

Data-centric testing revolves around testing the quality of the data. Its objective is to ensure that valid and correct data is in the system. The following are a couple of situations that call for data-centric testing:

ETL Processes/Data Movement: When you apply ETL processes to a source database and transform and load the data into the target database.
System Migration/Upgrade: When you migrate from one database to another, or upgrade an existing system on which the database is currently running.

Technical Testing

Technical testing ensures that the data is moved, copied, or loaded from the source system to the target system correctly and completely. It is performed by comparing the target data against the source data. The following checks can be performed under technical testing:

Checksum Comparison: A checksum approach is used to discover errors. A checksum can be computed on the source and target databases in any number of ways, such as counting the number of rows or summing the values of a column. The checksum calculated on the source database is then compared against the checksum calculated on the target database.
Domain Comparison: The domain list in the target database is compared against the corresponding domain list in the source database.
Multi-Value Comparison: Similar to the domain list comparison, multi-value comparison compares the whole record, or the critical columns in a record, against the corresponding values in the source system.
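
The three comparisons above can be sketched with plain SQL run from Python. This is a minimal sketch using in-memory SQLite databases; the sales table, its columns, and the sample rows are assumptions for illustration, and a real test would point the same queries at the actual source and target systems.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory database holding a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

def checksum(conn):
    """Checksum comparison: row count plus the sum of a numeric column."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales"
    ).fetchone()

def domain(conn):
    """Domain comparison: the set of distinct values in a column."""
    return {row[0] for row in conn.execute("SELECT DISTINCT region FROM sales")}

def multi_value_diff(source, target):
    """Multi-value comparison: source records that are missing from the target."""
    query = "SELECT id, region, amount FROM sales"
    return set(source.execute(query)) - set(target.execute(query))

# The target is deliberately missing one row so that every check reports a difference.
source = make_db([(1, "EAST", 100), (2, "WEST", 250), (3, "NORTH", 75)])
target = make_db([(1, "EAST", 100), (2, "WEST", 250)])

checksums_match = checksum(source) == checksum(target)
missing_domains = domain(source) - domain(target)
missing_records = multi_value_diff(source, target)
```

The checksum is the cheapest check and only says that something differs; the domain and multi-value comparisons narrow down what is missing.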

Business Testing

Business testing is done to ensure that the data fulfills the requirements of the business. Even when data has been moved, copied, or loaded completely and accurately, and technical testing reports no issues, the system may still contain invalid data. To ensure high-quality data, the data is evaluated against the business rules.

Reconciliation
Reconciliation ensures that the data in the target system is in agreement with the overall system requirements. The following are a couple of examples of how reconciliation helps in achieving high-quality data:

Internal reconciliation: The data is compared within the system against the corresponding data set. For example, shipments should always be less than or equal to orders. If shipments ever exceed orders, the data is invalid.
External reconciliation: The data in the system is compared against its counterpart in other systems. For example, the number of employees in a module or an application can never be more than the number of employees in the HR Employee database, because the HR Employee database is the master database that keeps a record of all employees. If the number of employees anywhere in the system exceeds the number in the HR Employee database, the data is invalid.
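
Both reconciliation examples can be expressed as small checks. The sketch below is illustrative only: the function names and the dictionary and list inputs are hypothetical, and the external check compares employee identifiers rather than only counts, which is a slightly stricter variant of the rule described above.

```python
def internal_reconciliation(orders, shipments):
    """Return the order ids whose shipped quantity exceeds the ordered quantity."""
    return sorted(
        order_id
        for order_id, shipped in shipments.items()
        if shipped > orders.get(order_id, 0)
    )

def external_reconciliation(module_employee_ids, hr_employee_ids):
    """Return employee ids found in the module but not in the HR master data."""
    return sorted(set(module_employee_ids) - set(hr_employee_ids))

# Hypothetical data: order B shipped more than was ordered,
# and employee 9 does not exist in the HR master list.
invalid_orders = internal_reconciliation({"A": 10, "B": 5}, {"A": 10, "B": 7})
unknown_employees = external_reconciliation([1, 2, 9], [1, 2, 3])
```

An empty result from either function means the data reconciles; any returned id points to a record to investigate.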

Key Benefits of ETL Testing

Assure your business information is accurate, consistent, and reliable
Independently validate information upstream, downstream, and within ETL
Minimize the risk of data loss
Data security
Data accuracy
Reporting efficiency

Challenges in ETL Testing

The following challenges are observed in ETL testing:

Inconsistent and redundant data
Loss of data during the ETL process
Non-availability of a comprehensive test bed
Volume and complexity of data

Things to verify during ETL Testing

All transformation logic works as designed from source to target
All source data that is expected to be loaded into the target actually gets loaded
All fields are loaded with their full contents, i.e. no data field is truncated during transformation
Data completeness: source counts match target counts
Granularity of data is as per specifications
Error logs and audit tables are generated and populated properly
Notifications to IT and/or business are generated in the proper format
NULL values have been populated where expected
Rejects have occurred where expected, and a log of rejects is created with sufficient detail
Error recovery methods work
No duplicates are loaded
Auditing is done properly
Data integrity constraints are properly taken care of
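
Several items on this checklist translate directly into SQL queries. The sketch below shows count, duplicate, truncation, and NULL checks against in-memory SQLite tables; the src and tgt tables and their rows are made up for the example, with a duplicate and a truncated name planted in the target so that the checks have something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER, name TEXT);
CREATE TABLE tgt (id INTEGER, name TEXT);
INSERT INTO src VALUES (1, 'Alexander'), (2, 'Bo'), (3, 'Chandrasekhar');
INSERT INTO tgt VALUES (1, 'Alexander'), (2, 'Bo'), (2, 'Bo'), (3, 'Chandrase');
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

# Data completeness: source and target row counts should match.
counts_match = scalar("SELECT COUNT(*) FROM src") == scalar("SELECT COUNT(*) FROM tgt")

# No duplicates: any id appearing more than once in the target is a defect.
duplicate_ids = [row[0] for row in conn.execute(
    "SELECT id FROM tgt GROUP BY id HAVING COUNT(*) > 1 ORDER BY id")]

# No truncation: a target value shorter than its source value is suspect.
truncated_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT s.id FROM src s JOIN tgt t ON s.id = t.id "
    "WHERE LENGTH(t.name) < LENGTH(s.name) ORDER BY s.id")]

# NULL handling: count NULLs in a column that is expected to be populated.
unexpected_nulls = scalar("SELECT COUNT(*) FROM tgt WHERE name IS NULL")
```

In a test case, each of these values would be compared with its expected result, for example an empty duplicate list and a NULL count of zero.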

Live Demo

To provide a basic understanding of how we do ETL testing in practice, let's see a live demo of testing an ETL process that integrates two systems. This demo will help us understand how we run an ETL job, verify the data in different data sources, and perform cross-application validation.

Questions?

