
ETLQA/Tester Datawarehouse QA/Tester should have

About Data Warehouse

Introduction
This document describes the data warehouse testing process and its test coverage areas. It explains why data warehouse applications need testing and walks through the steps of the testing process.

About Data Warehouse
A data warehouse is the main repository of an organization's historical data. It holds the data behind management's decision support system. The main reason to use a data warehouse is that a data analyst can run complex queries and analysis (data mining) on the information in it without slowing down the operational systems.

Data Warehouse definition

Subject-oriented: Data warehouses are designed to help you analyse data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented. The data is organized so that all the data elements relating to the same real-world event or object are linked together.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. The database contains data from most or all of an organization's operational applications, and that data is made consistent.

Time-variant: Changes to the data in the database are tracked and recorded so that reports can show how data changed over time. In order to discover trends in business, analysts need large amounts of data. A data warehouse's focus on change over time is what is meant by the term time variant.

Non-volatile: Data in the database is never overwritten or deleted. Once committed, the data is static and read-only, but retained for future reporting. Once entered into the warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyse what has occurred.

Validating the Report data
Once the ETLs are tested for count and data verification, the data shown on the reports is of utmost importance. The QA team should verify the reported data against the source data for consistency and accuracy.

1. Verify Report data with source - Although the data in a data warehouse is stored at an aggregate level compared to the source systems, the QA team should verify the granular data stored in the data warehouse against the available source data.
2. Field level data verification - The QA team must understand the linkages for the fields displayed in the report, and should trace them back and compare them with the source systems.
3. Creating SQLs - Create SQL queries to fetch and verify the data from source and target. Sometimes it is not possible to reproduce in SQL the complex transformations done in ETL. In such a case the data can be exported to a file and the calculations performed there.

Scenarios to be covered in Integration Testing
Integration testing covers end-to-end testing for the DWH. The tests include the following:

1. Count Validation - Verify record counts with DWH back-end/reporting queries against source and target as an initial check.
2. Source Isolation - Validation after isolating the driving sources.
3. Dimensional Analysis - Data integrity between the various source tables and relationships.
4. Statistical Analysis - Validation for various calculations.
5. Data Quality Validation - Check for missing data, negatives and consistency. Field-by-field data verification can be done to check the consistency of source and target data.
6. Granularity - Validate at the lowest granular level possible (the lowest in the hierarchy; e.g. for Country-City-Street, start with test cases on street).
7. Other validations - Graphs, slice/dice, meaningfulness, accuracy.
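As a minimal sketch of the count validation and report-versus-source checks described above, the following Python script uses the standard sqlite3 module with an in-memory database. The table names src_sales and dwh_sales_by_city, their columns, and the sample rows are assumptions made for illustration; a real warehouse would use its own schema and database driver.

```python
import sqlite3

# Hypothetical schema: src_sales is the granular source table and
# dwh_sales_by_city is the aggregated warehouse table (both assumed).
SETUP = """
CREATE TABLE src_sales (id INTEGER PRIMARY KEY, city TEXT, amount INTEGER);
CREATE TABLE dwh_sales_by_city (city TEXT PRIMARY KEY, total_amount INTEGER);
INSERT INTO src_sales VALUES (1, 'Pune', 100), (2, 'Pune', 50), (3, 'Delhi', 70);
INSERT INTO dwh_sales_by_city VALUES ('Pune', 150), ('Delhi', 60);
"""


def count_check(conn):
    """Initial check: distinct source cities vs. rows in the aggregated target."""
    src = conn.execute("SELECT COUNT(DISTINCT city) FROM src_sales").fetchone()[0]
    tgt = conn.execute("SELECT COUNT(*) FROM dwh_sales_by_city").fetchone()[0]
    return src, tgt


def total_mismatches(conn):
    """Re-aggregate the source and list cities whose warehouse total differs."""
    query = """
        SELECT s.city, s.total, t.total_amount
        FROM (SELECT city, SUM(amount) AS total
              FROM src_sales GROUP BY city) AS s
        LEFT JOIN dwh_sales_by_city AS t ON t.city = s.city
        WHERE t.total_amount IS NULL OR t.total_amount <> s.total
        ORDER BY s.city
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
counts = count_check(conn)
mismatches = total_mismatches(conn)
print(counts, mismatches)
```

In this sample the counts agree (2 and 2) while the Delhi total does not, which is why a count match is treated only as an initial check before data-level verification.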

Tuesday, February 19, 2008
ETL Testing with Informatica and Oracle

ETL should not be confused with a data creation process. It never creates new data. If a list of a hundred employees is being loaded, one more employee cannot be added to make it a hundred and one. Likewise, if a customer's last name is absent, an arbitrary last name cannot be substituted.

Data warehouses are not OLTP systems. Calculations should not be duplicated in both the source system and the data warehouse, because the numbers will be very difficult to match during QA. Also, the process in the source system can change in the future, which would leave the two sets of data out of sync.

ETL cannot change the meaning of data. For example, the sex values M and F in the source system may be stored as flags 1 and 2 respectively in the data warehouse. This is acceptable because it does not change the business meaning of the data. It only changes the representation of the data.

What is ETL?
Extract, transform and load (ETL) software reads data from its source, cleans it up and formats it uniformly, and then writes it to the target repository to be exploited.

What is the purpose of ETL?
After extraction, the data is transformed, or modified, depending on the specific business logic involved, so that it can be sent to the target repository. There are a variety of ways to perform the transformation, and the work involved varies. The data may require reformatting only, but most ETL operations also involve cleansing the data to remove duplicates and enforce consistency. Part of what the software does is examine individual data fields and apply rules to consistently convert the contents to the form required by the target repository or application.

There are several levels of testing that can be performed during data warehouse testing. Some examples:

1. Constraint testing: During constraint testing, the objective is to validate unique constraints, primary keys, foreign keys, indexes, and relationships.
The test script should include these validation points. Some ETL processes can be developed to validate constraints during the loading of the warehouse. If the decision is made to add constraint validation to the ETL process, the ETL code must validate all business rules and relational data requirements. Depending solely on automated constraint testing is risky: when the setup is not done correctly, or is not maintained as requirements keep changing, the validation can become incorrect and nullify the tests.

2. Source to target counts: The objective of the count test scripts is to determine whether the record counts in the source match the record counts in the target. Some ETL processes are capable of capturing record count information such as records read, records written, records in error, etc. If the ETL process being used can capture that level of detail and create a list of the counts, allow it to do so. This will save time during the validation process.

3. Source to target data validation: Perform source-to-target field-to-field validation. This piece of the testing cycle is the most labor intensive and requires the most thorough analysis of the data. Normally, once the counts match between source and target, you can move on to field-level verification, where the source column data is compared with the target column data.

4. Error processing: Understanding why a script might fail during data validation may confirm, through process validation, that the ETL process is working. During process validation the testing team should identify additional data cleansing needs, as well as consistent error patterns that could possibly be averted by modifying the ETL code.

ETL Tester responsibilities:

- Execute back-end, data-driven test efforts with a focus on data transformations between various systems and a data warehouse.
- Test all new and existing ETL data warehouse components.
- Test ETL software using Informatica.
- Test ETLs and flat file data transfers without relying on a GUI layer.
- Provide technical suggestions and guidance to QA management for improving the QA software testing methodology for data warehouse testing.
- Identify issues on complex testing projects and recommend solutions based on the data gathered.
- Identify, troubleshoot, and propose solutions to potential issues.
- Analyze, interpret, and approve requirements and design specifications.
- Analyze technical specifications, create test plans, design test harnesses and test cases, and execute the test cases.
- Work as a team player, very closely with other data warehouse testers and with the development team.
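The source-to-target field-to-field validation discussed in this section could be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed method: the column names emp_id, last_name, sex and sex_flag, the SEX_CODE mapping (taken from the M/F to 1/2 example in the text), and the sample rows are all hypothetical.

```python
# Assumed business rule from the text: sex M/F in the source is stored
# as flag 1/2 in the warehouse. Column names are hypothetical.
SEX_CODE = {"M": 1, "F": 2}


def expected_row(src):
    """Apply the documented transformation to one source row."""
    return {
        "emp_id": src["emp_id"],
        "last_name": src["last_name"],
        "sex_flag": SEX_CODE[src["sex"]],
    }


def compare(source_rows, target_rows, key="emp_id"):
    """Return (key, column, expected, actual) for every difference found."""
    target_by_key = {row[key]: row for row in target_rows}
    source_keys = {row[key] for row in source_rows}
    differences = []
    for src in source_rows:
        want = expected_row(src)
        got = target_by_key.get(want[key])
        if got is None:
            differences.append((want[key], "<row>", "present", "missing"))
            continue
        for column, value in want.items():
            if got.get(column) != value:
                differences.append((want[key], column, value, got.get(column)))
    # ETL never creates new data, so target rows without a source are errors.
    for extra in sorted(set(target_by_key) - source_keys):
        differences.append((extra, "<row>", "absent", "present"))
    return differences


source = [
    {"emp_id": 1, "last_name": "Rao", "sex": "M"},
    {"emp_id": 2, "last_name": "Iyer", "sex": "F"},
    {"emp_id": 3, "last_name": "Khan", "sex": "M"},
]
target = [
    {"emp_id": 1, "last_name": "Rao", "sex_flag": 1},
    {"emp_id": 2, "last_name": "Iyer", "sex_flag": 1},
    {"emp_id": 4, "last_name": "Das", "sex_flag": 2},
]
differences = compare(source, target)
for diff in differences:
    print(diff)
```

Computing the expected target row from the source, rather than comparing raw values, lets a representation change such as M to 1 pass while a wrong flag, a missing row, or an invented row is reported.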

ETL Testing / Data Warehouse Testing Tips, Techniques, Process and Challenges
This is a guest post by Vishal Chhaperia.

Today let me take a moment to introduce my fellow testers to one of the most in-demand and upcoming skills: ETL (Extract, Transform, and Load) testing. This article gives an overview of ETL testing and of what we do to test an ETL process.

Independent Verification and Validation is gaining considerable market potential, and many companies now see it as a prospective business gain. Customers are offered a range of service offerings, spread across many areas based on technology, process and solutions. ETL or data warehouse testing is one of the offerings that is developing rapidly and successfully.

Why do organizations need Data Warehouse?

Organizations with organized IT practices are looking to reach the next level of technology transformation. They are trying to become more operational with data that is easy to interoperate. Data is the most important part of any organization, whether it is everyday data or historical data. Data is the backbone of any report, and reports are the baseline on which all vital management decisions are taken.

Most companies are taking a step forward by constructing a data warehouse to store and monitor real-time data as well as historical data. Crafting an efficient data warehouse is not an easy job. Many organizations have distributed departments with different applications running on distributed technology. An ETL tool is employed to integrate the different data sources from different departments. The ETL tool works as an integrator: it extracts data from the different sources, transforms it into the preferred format based on the business transformation rules, and loads it into a cohesive database known as the Data Warehouse.

A well planned, well defined and effective testing scope helps ensure a smooth transition of the project to production. A business gains real confidence once the ETL processes are verified and validated by an independent group of experts to make sure that the data warehouse is concrete and robust.

ETL or data warehouse testing is categorized into four different engagements, irrespective of the technology or ETL tools used:

- New Data Warehouse Testing: A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and the new data warehouse is built and verified with the help of ETL tools.
- Migration Testing: In this type of project the customer has an existing DW and ETL performing the job, but is looking to adopt a new tool in order to improve efficiency.
- Change Request: In this type of project new data is added from different sources to an existing DW. The customer might also need to change an existing business rule or integrate a new rule.
- Report Testing: Reports are the end result of any data warehouse and the basic purpose for which a DW is built. Reports must be tested by validating the layout, the data in the report, and the calculations.

ETL Testing Techniques:


1) Verify that data is transformed correctly according to the various business requirements and rules.
2) Make sure that all projected data is loaded into the data warehouse without any data loss or truncation.
3) Make sure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
4) Make sure that data is loaded into the data warehouse within the prescribed and expected time frames, to confirm improved performance and scalability.

Apart from these 4 main ETL testing methods, other testing methods such as integration testing and user acceptance testing are also carried out to make sure everything is smooth and reliable.
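Techniques 2 and 3 above can be illustrated with a small audit that accounts for every source record after a load. The sketch below assumes the tester can obtain three lists of rows (source, loaded, rejected) keyed by an id column; the function name audit_load, the row layout, and the sample data are hypothetical and not part of any ETL tool.

```python
def audit_load(source_rows, loaded_rows, rejected_rows, key="id"):
    """Account for every source row: loaded, rejected, lost, or truncated."""
    loaded_by_key = {row[key]: row for row in loaded_rows}
    rejected_keys = {row[key] for row in rejected_rows}

    # Data loss: a source row that is neither loaded nor reported as rejected.
    lost = [row[key] for row in source_rows
            if row[key] not in loaded_by_key and row[key] not in rejected_keys]
    # A row should not be both loaded and rejected.
    double_counted = sorted(set(loaded_by_key) & rejected_keys)

    # Truncation: the loaded text is a strict prefix of the source text.
    truncated = []
    for src in source_rows:
        loaded = loaded_by_key.get(src[key])
        if loaded is None:
            continue
        for column, value in src.items():
            got = loaded.get(column)
            if (isinstance(value, str) and isinstance(got, str)
                    and got != value and value.startswith(got)):
                truncated.append((src[key], column))
    return {"lost": lost, "double_counted": double_counted,
            "truncated": truncated}


source = [
    {"id": 1, "name": "Alexander"},
    {"id": 2, "name": "Bo"},
    {"id": 3, "name": ""},          # invalid: empty name, expected to be rejected
    {"id": 4, "name": "Catherine"},
    {"id": 5, "name": "Dev"},
]
loaded = [
    {"id": 1, "name": "Alexander"},
    {"id": 2, "name": "Bo"},
    {"id": 4, "name": "Catheri"},   # cut short by a too-narrow target column
]
rejected = [{"id": 3, "reason": "name is empty"}]

report = audit_load(source, loaded, rejected)
print(report)
```

Here row 5 is reported as lost because it appears neither in the target nor in the reject log, and row 4 is reported as truncated.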

ETL Testing Process:


Like any other testing that falls under Independent Verification and Validation, ETL testing goes through the same phases:

- Business and requirement understanding
- Validating
- Test estimation
- Test planning based on the inputs from test estimation and business requirements
- Designing test cases and test scenarios from all the available inputs
- Once all the test cases are ready and approved, the testing team performs pre-execution checks and prepares test data
- Execution is then performed until the exit criteria are met
- Upon successful completion, a summary report is prepared and the closure process is carried out

It is necessary to define a test strategy, mutually accepted by the stakeholders, before starting actual testing. A well defined test strategy makes sure that the correct approach is followed to meet the testing goals. ETL testing might require the testing team to write SQL statements extensively, or to tailor the SQL provided by the development team. In either case the testing team must be aware of the results they are trying to get from those SQL statements.

Difference between Database and Data Warehouse Testing

There is a popular misunderstanding that database testing and data warehouse testing are similar, while in fact the two take different directions.

- Database testing is done on a smaller scale of data, normally with OLTP (online transaction processing) databases, while data warehouse testing is done with large volumes of data involving OLAP (online analytical processing) databases.
- In database testing, data is normally injected consistently from uniform sources, while in data warehouse testing most of the data comes from different kinds of data sources that are inconsistent with one another.
- We generally perform only CRUD (create, read, update and delete) operations in database testing, while in data warehouse testing we use read-only (SELECT) operations.
- Normalized databases are used in database testing, while denormalized databases are used in data warehouse testing.

There are a number of universal verifications that have to be carried out for any kind of data warehouse testing. Below is the list of checks that are treated as essential for validation in ETL testing:

- Verify that data transformation from source to destination works as expected
- Verify that expected data is added in the target system
- Verify that all DB fields and field data are loaded without any truncation
- Verify data checksum for record count match
- Verify that proper error logs with all details are generated for rejected data
- Verify NULL value fields
- Verify that duplicate data is not loaded
- Verify data integrity
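A few of the checks in the list above (NULL value fields, duplicate data, and a checksum comparison) might be sketched as follows, using Python's standard sqlite3 and hashlib modules. The tables src_customer and tgt_customer, their columns, and the helper function names are assumptions for illustration. Table and column names are interpolated directly into the SQL, so the helpers are only suitable for trusted names chosen by the tester.

```python
import hashlib
import sqlite3


def null_count(conn, table, column):
    """Number of rows where the given column is NULL."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def duplicate_keys(conn, table, column):
    """Values of the given column that occur more than once, with counts."""
    sql = (f"SELECT {column}, COUNT(*) FROM {table} "
           f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}")
    return conn.execute(sql).fetchall()


def table_checksum(conn, table, columns):
    """SHA-256 over all rows, read in a fixed order so results are comparable."""
    column_list = ", ".join(columns)
    digest = hashlib.sha256()
    sql = f"SELECT {column_list} FROM {table} ORDER BY {column_list}"
    for row in conn.execute(sql):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, email TEXT);
CREATE TABLE tgt_customer (cust_id INTEGER, email TEXT);
INSERT INTO src_customer VALUES
    (1, 'a@example.com'), (2, 'b@example.com'), (3, NULL);
INSERT INTO tgt_customer VALUES
    (1, 'a@example.com'), (2, 'b@example.com'),
    (2, 'b@example.com'), (3, NULL);
""")

columns = ["cust_id", "email"]
nulls = null_count(conn, "tgt_customer", "email")
duplicates = duplicate_keys(conn, "tgt_customer", "cust_id")
checksums_match = (table_checksum(conn, "src_customer", columns)
                   == table_checksum(conn, "tgt_customer", columns))
print(nulls, duplicates, checksums_match)
```

The checksum only signals that the two tables differ; the duplicate and NULL queries then help locate which rows are responsible.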

ETL Testing Challenges:


ETL testing is quite different from conventional testing. We faced many challenges while performing data warehouse testing. Here are a few of the ETL testing challenges I experienced on my project:

- Incompatible and duplicate data.
- Loss of data during the ETL process.
- Unavailability of an inclusive test bed.
- Testers have no privileges to execute ETL jobs on their own.
- The volume and complexity of the data are very high.
- Faults in business processes and procedures.
- Trouble acquiring and building test data.
- Missing business flow information.

Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that the business information is exact, consistent and reliable. It also minimizes the risk of data loss in production.

Hope these tips will help ensure that your ETL process is accurate and that the data warehouse built by it is a competitive advantage for your business.

This is a guest post by Vishal Chhaperia, who works in a test management role at an MNC. He has extensive experience in managing multi-technology QA projects, processes and teams.

Have you worked on ETL testing? Please share your ETL/DW testing tips and challenges below.
