
Business value Assurance /

Advanced DWH (Testing)


Table Of Contents
1. Challenges faced by the testing team in real-time scenario.
2. Challenges faced by the team in different phases of STLC.
3. What tools are available & used for testing DWH at different
stages.
4. Any automation tool available for DWH.
5. Any tool available and used to ensure data quality.
6. How it is ensured that the data sample selected ensures
completeness.
7. How is data reconciliation done.
8. How to test bulk data.
9. Some information on performance tool and how the result is
analyzed.

Challenges faced by the testing
team in real-time scenario.
Challenges Faced:
Lack of Skilled testers
Results:
Resulted in incomplete, insufficient and
inadequate testing, so a lot of effort was
spent on finding and reporting bugs.
Challenges Faced:
Lack of availability of standard test data
/ datasets during testing
Results:
Led to insufficient test coverage.
Challenges Faced:
The team members had insufficient
knowledge of the domain standards
Results:
Resulted in inadequate testing.
Challenges Faced:
Poor understanding of requirements and
Miscommunication or no communication with the end-
users during testing/development cycles
Results:
There were no specifics of what the application should or
shouldn't do (the application's requirements), which led to
poor quality of testing.
Challenges Faced:
Not recording non-reproducible defects
Results:
Testers often came across bugs during random /
exploratory testing that appeared only on specific
configurations and were non-reproducible. This made
testing extremely tedious and time consuming, as
there were often random hangs in the product.
Challenges Faced:
Tedious manual verification and testing the complete
application
Results:
Even though this led developers to display a specific
interpretation of results, verification had to be done on a
wide range of datasets and was repetitive work. Testing
each and every combination was also challenging.
Challenges Faced:
Interdependencies of components in the Software
Results:
Since the software was complex with different
components, changes in one part of the software often
caused breaks in other parts. The team was under pressure
to handle current functionality changes, checks of
previously working functionality, and bug tracking.
Challenges Faced:
Testing always under time constraints
Results:
Often there was slippage in other phases of the project, which
reduced the time for testing because the end date was committed to
the customer. It was also observed that testers tended to focus
on task completion rather than on test coverage and quality of
work. Testing was taken up as the last activity in the project
life cycle, and there was always pressure to squeeze it into a
short time.
Challenges Faced:
Test systems inadequacy & lack of dedicated resources for test team.
Underestimating testing effort in project estimates
Results:
Testing time was affected by the lack of dedicated test systems for the
test team; testers were assigned to test multiple modules, and developers
were eventually moved onto testing work.
Test engineers were forced to work at odd hours/weekends, as the limited
resources were controlled by the development team and test engineers were
given a lower priority during allocation of resources.
The testing team was not involved during the scoping phase, and its
effort was typically underestimated. This led to lower quality of testing, as
sufficient effort could not be put in.
Challenges Faced:
The test team is not involved in the entire life cycle
Results:
Test engineers were involved late in the life cycle. This limited
their contribution to black box testing only. The project team did not
use the services of the test team for the unit and integration
testing phases. Because testers were involved only in the testing
phase, they took time to understand all the
requirements of the product, were overloaded, and finally
were forced to work many late hours.
Challenges Faced:
Problems coping with attrition
Results:
A few key employees left the company after very short
tenures. Management found it hard to
cope with the attrition rate. New testers brought into the project
required project training from the beginning, and because this
was a complex project that was difficult to understand,
the release date was delayed.
Challenges Faced:
Hard or subtle bugs remained unnoticed
Results:
Since there was a lack of skilled testers and
domain expertise, some testers concentrated
more on finding easy bugs that did not require
deep understanding.
Challenges Faced:
Lack of relationship with the developers & no
documentation accompanying releases provided to
test team
Results:
This was a big challenge. There was no proper documentation
accompanying releases provided to the test team. The
test engineers were not aware of the known issues, main
features to be tested, etc. Hence a lot of effort was
wasted.
Challenges Faced:
Problems coping with scope creep and
changes to the functionality.
Results:
The implementation date was delayed because of a lot of
rework. The dependencies among parts
of the project, together with the frequent changes to be
incorporated, resulted in many bugs in the software.
Though automated testing has a lot of benefits, it
also has some associated challenges:
i. Selection of Test Tool
ii. Customization of Tool
iii. Selection of Automation Level
iv. Development and Verification of Script
v. Implementation of Test Management System
Challenges faced by the team
in different phases of STLC.
Testing the complete application:
Is it possible? I think it is impossible. There are millions of
test combinations. It's not possible to test each and
every combination in either manual or
automated testing. If you try all these combinations
you will never release the product.
Misunderstanding of company processes:
Sometimes you just don't pay proper attention to what
the company-defined processes are and what purposes
they serve. There is a myth among testers
that they should only follow company processes,
even when these processes are not applicable to their
current testing scenario. This results in incomplete
and inappropriate application testing.
Relationship with developers:
Big challenge. It requires a very skilled tester to handle
this relationship positively while still completing the
work in the tester's way. There are simply hundreds of
excuses developers or testers can make when they
do not agree on some points. For this the tester also
requires good communication, troubleshooting and
analytical skills.
Regression testing:
As the project expands, the regression
testing work simply becomes uncontrolled. There is pressure
to handle current functionality changes, checks of
previously working functionality, and bug tracking.
Testing always under time constraint:
"Hey tester, we want to ship this product by this
weekend, are you ready for completion?" When this
order comes from the boss, the tester simply focuses on task
completion and not on test coverage and quality
of work. There is a huge list of tasks that you need to
complete within the specified time. This includes writing,
executing, automating and reviewing the test cases.
Which tests to execute first?
Then how will you decide which test cases
should be executed and with what priority? Which
tests are more important than others? This requires good
experience of working under pressure.
Understanding the requirements:
Sometimes testers are responsible for
communicating with customers to understand the
requirements. What if the tester fails to understand the
requirements? Will the tester be able to test the
application properly? Definitely not! Testers require
good listening and understanding capabilities.
Decision to stop the testing:
When to stop testing? A very difficult decision.
It requires sound judgment of the testing processes and the
importance of each process. It also requires the ability to
make decisions on the fly.
One test team under multiple projects:
It is challenging to keep track of each task.
There are communication challenges. This often results in
the failure of one or both projects.
Reuse of Test scripts:
Application development methods are changing
rapidly, making it difficult to manage the test tools
and test scripts. Test script migration or reuse is an
essential but difficult task.
Testers focusing on finding easy bugs:
If an organization rewards testers based on the number
of bugs (a very bad way to judge testers'
performance), then some testers concentrate only on
finding easy bugs that don't require deep
understanding and testing. Hard or subtle bugs
remain unnoticed in such a testing approach.
To cope with attrition:
Increasing salaries and benefits lead many
employees to leave the company after very short
tenures. Management faces hard problems in
coping with the attrition rate. The challenges: new testers
require project training from the beginning, complex
projects are difficult to understand, and shipping
dates are delayed.
Different types of testing are required
throughout the life cycle of a DWH
implementation.
So we have different challenges to face
during the different phases of STLC.
ETL (Business Functionality, Data Quality, Performance)
During the ETL phase of DWH implementation, data
quality testing is of utmost importance. Any defect
slippage in this phase will be very costly to rectify later.
Functional testing needs to be carried out to validate the
transformation logic.
Data Load (Parameters, Settings, Validation)
During the setup of Data Load functionality, specific
testing on the load module is carried out. The
Parameters and Settings for data load are tested here.
Initial Data Load (Performance, Data Quality)
Initial Data Load is when the underlying databases are
loaded for the first time. Performance testing is of
significance here. Data quality, once tested and signed
off during the ETL testing phase, is re-tested here.
E2E Business Testing (UI & Interface Testing)
Once the initial data load is done, the Data warehouse
is ready for an end-to-end functional validation. UI
testing and Interface testing are carried out during this
phase.
Maintenance / Data Feeds (Regression)
Data from the operational database should be fed into
the data warehouse periodically. During such periodic
updates, regression testing should be executed. This
ensures the new data updates have not broken any
existing functionality. Periodic updates are required to
ensure temporal consistency.
What tools are available and
used for testing DWH at different
stages?
ETL software can help you in automating
such process of data loading from
Operational environment to Data
Warehouse environment.
Create pairs of SQL queries (QueryPairs)
and reusable queries (Query Snippets) to
embed in queries.
Execute Scenarios that compare Source
databases and / or files to Target data
warehouses.
Agents execute your queries and return the results to the
QuerySurge server for reporting and analysis.

Analyze and drill down into your results and identify bad data and
data defects with QuerySurge's reporting.
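
The source-to-target comparison described above can be sketched without any particular tool. The following is a minimal illustration using Python's built-in sqlite3 module; it is not QuerySurge's implementation. The tables `src_customer` and `tgt_customer`, their columns, and the helper `compare_query_pair` are hypothetical names chosen for the example.

```python
import sqlite3

def compare_query_pair(conn, source_sql, target_sql):
    """Run a pair of queries and return (rows only in source, rows only in target)."""
    source_rows = set(conn.execute(source_sql).fetchall())
    target_rows = set(conn.execute(target_sql).fetchall())
    return sorted(source_rows - target_rows), sorted(target_rows - source_rows)

# Hypothetical source and target tables held in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, name TEXT);
    CREATE TABLE tgt_customer (id INTEGER, name TEXT);
    INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO tgt_customer VALUES (1, 'Ann'), (2, 'Bobby');
""")

missing_in_target, unexpected_in_target = compare_query_pair(
    conn,
    "SELECT id, name FROM src_customer",
    "SELECT id, name FROM tgt_customer",
)
print(missing_in_target)     # source rows the target lacks or holds with different values
print(unexpected_in_target)  # rows that exist only in the target
```

Because the results are compared as sets, this sketch does not notice duplicated rows; a row-count check would be needed alongside it.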


Issue: Missing Data
Description: Data that does not make it into the target database
Possible Causes:
Invalid or incorrect lookup table in the transformation logic
Bad data from the source database (needs cleansing)
Invalid joins
Example(s): The lookup table should contain a field value of "High",
which maps to "Critical". However, the source data field contains
"Hig" (missing the "h") and fails the lookup, resulting in the target
data field containing null. If this occurs on a key field, a possible
join would be missed and the entire row could fall out.
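
A failed lookup of the kind described in this example can be found before loading by listing source values that have no entry in the lookup table. This is a minimal sketch using Python's sqlite3 module; the tables `src_ticket` and `priority_lookup` and the 'Low' to 'Minor' mapping are hypothetical, added only so the example has more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_ticket (id INTEGER, priority TEXT);
    CREATE TABLE priority_lookup (source_value TEXT, target_value TEXT);
    INSERT INTO src_ticket VALUES (1, 'High'), (2, 'Hig'), (3, 'Low');
    INSERT INTO priority_lookup VALUES ('High', 'Critical'), ('Low', 'Minor');
""")

# A LEFT JOIN keeps every source row; rows with no lookup match have NULL on the right.
unmatched = conn.execute("""
    SELECT s.id, s.priority
    FROM src_ticket AS s
    LEFT JOIN priority_lookup AS l ON l.source_value = s.priority
    WHERE l.source_value IS NULL
    ORDER BY s.id
""").fetchall()
print(unmatched)  # source rows whose value would fail the lookup
```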
Issue: Truncation of Data
Description: Data being lost by truncation of the data field
Possible Causes:
Invalid field lengths on target database
Transformation logic not taking into account field lengths from
source
Example(s):
Source field value "New Mexico City" is being truncated to "New
Mexico C" since the target data field did not have the correct
length to capture the entire value.
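
One way to look for truncation is to join source and target on a key and flag target values that are a shorter prefix of the source value. The sketch below assumes hypothetical tables `src_city` and `tgt_city` that share an `id` key; it uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_city (id INTEGER, city TEXT);
    CREATE TABLE tgt_city (id INTEGER, city TEXT);
    INSERT INTO src_city VALUES (1, 'New Mexico City'), (2, 'Austin');
    INSERT INTO tgt_city VALUES (1, 'New Mexico C'), (2, 'Austin');
""")

# A target value is treated as truncated when it is shorter than the source
# value and equals the leading characters of the source value.
truncated = conn.execute("""
    SELECT s.id, s.city, t.city
    FROM src_city AS s
    JOIN tgt_city AS t ON t.id = s.id
    WHERE LENGTH(t.city) < LENGTH(s.city)
      AND SUBSTR(s.city, 1, LENGTH(t.city)) = t.city
    ORDER BY s.id
""").fetchall()
print(truncated)
```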
Issue: Data Type Mismatch
Description: Data types not set up correctly on target database
Possible Causes: Source data field not configured correctly
Example(s): Source data field was required to be a date;
however, when initially configured, it was set up as a VarChar.
Issue:
Null Translation
Description:
Null source values not being transformed to correct target values
Possible Causes:
Development team did not include the null translation in the
transformation logic
Example(s):
A null source data field was supposed to be transformed to
"None" in the target data field. However, the logic was not
implemented, resulting in the target data field containing null
values.
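
A null translation rule like this one can be checked by selecting rows where the source is null but the target does not hold the agreed replacement. This is a minimal sketch with Python's sqlite3 module; the tables `src_case` and `tgt_case` and the `severity` column are hypothetical, and the replacement value 'None' is taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_case (id INTEGER, severity TEXT);
    CREATE TABLE tgt_case (id INTEGER, severity TEXT);
    INSERT INTO src_case VALUES (1, 'High'), (2, NULL), (3, NULL);
    INSERT INTO tgt_case VALUES (1, 'High'), (2, 'None'), (3, NULL);
""")

# Rows where the source is NULL but the target is not the string 'None'.
untranslated = conn.execute("""
    SELECT s.id
    FROM src_case AS s
    JOIN tgt_case AS t ON t.id = s.id
    WHERE s.severity IS NULL
      AND (t.severity IS NULL OR t.severity <> 'None')
    ORDER BY s.id
""").fetchall()
print(untranslated)
```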

Issue:
Wrong Translation
Description:
Opposite of the Null Translation error. Field should be null but is
populated with a non-null value or field should be populated but
with wrong value
Possible Causes:
Development team incorrectly translated the source field for
certain values
Example(s):
Ex. 1) Target field should only be populated when the source
field contains certain values, otherwise should be set to null
Ex. 2) Target field should be "Odd" if the source value is an odd
number but target field is "Even" (This is a very basic example)

Issue:
Misplaced Data
Description:
Source data fields not being transformed to the correct target
data field
Possible Causes:
Development team inadvertently mapped the source data field to
the wrong target data field
Example(s):
A source data field was supposed to be transformed to target
data field Last_Name. However, the development team
inadvertently mapped the source data field to First_Name

Issue:
Extra Records
Description:
Records which should not be in the ETL are included in the ETL
Possible Causes:
Development team did not include filter in their code
Example(s):
If a case has the deleted field populated, the case and any data
related to the case should not be in any ETL
Issue:
Not Enough Records
Description:
Records which should be in the ETL are not included in the ETL
Possible Causes:
Development team had a filter in their code which should not
have been there
Example(s):
If a case was in a certain state, it should be ETL'd over to the
data warehouse but not the data mart

Issue:
Transformation Logic Errors/Holes
Description:
Testing sometimes can lead to finding holes in the transformation logic or
realizing the logic is unclear
Possible Causes:
Development team did not take into account special cases. For example
international cities that contain special language specific characters might
need to be dealt with in the ETL code
Example(s):
Ex. 1) Most cases may fall into a certain branch of logic for a
transformation, but a small subset of cases (sometimes with unusual data)
may not fall into any branch. How the tester's code and the developer's
code handle these cases could be different (and possibly both end up
being wrong), and the logic is changed to accommodate the cases.
Ex. 2) Tester and developer have different interpretations of the transformation
logic, which results in different values. This will lead to the logic
being rewritten to become clearer
Issue:
Simple/Small Errors
Description:
Capitalization, spacing and other small errors
Possible Causes:
Development team did not add an additional space after a
comma for populating the target field.
Example(s):
Product names on a case should be separated by a comma and
then a space but target field only has it separated by a comma

Issue:
Sequence Generator
Description:
Ensuring that the sequence numbers of reports are in the correct
order is very important when processing follow-up reports or
responding to an audit
Possible Causes:
Development team did not configure the sequence generator
correctly, resulting in records with a duplicate sequence number
Example(s):
Duplicate records in the sales report were doubling up several
sales transactions, which skewed the report significantly

Issue:
Undocumented Requirements
Description:
Finding requirements that are understood but are not actually
documented anywhere
Possible Causes:
Several members of the development team did not
understand the "understood" undocumented requirements.
Example(s):
There was a restriction in the where clause that limited how
certain reports were brought over. It was used in mappings that were
understood to be necessary, but was not actually in the
requirements.
Occasionally it turns out that the understood requirements are not
what the business wanted.
Issue:
Duplicate Records
Description:
Duplicate records are two or more records that contain the same
data
Possible Causes:
Development team did not add the appropriate code to filter out
duplicate records
Example(s):
Duplicate records in the sales report were doubling up several
sales transactions, which skewed the report significantly
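
Duplicate records as defined above (two or more records containing the same data) are commonly detected by grouping on every column and keeping groups with more than one row. The sketch below uses Python's sqlite3 module with a hypothetical `tgt_sales` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt_sales (order_id INTEGER, product TEXT, amount REAL);
    INSERT INTO tgt_sales VALUES
        (100, 'Widget', 25.0),
        (101, 'Gadget', 40.0),
        (100, 'Widget', 25.0);
""")

# Group on all columns; any group with more than one row is a duplicate set.
duplicates = conn.execute("""
    SELECT order_id, product, amount, COUNT(*) AS copies
    FROM tgt_sales
    GROUP BY order_id, product, amount
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)
```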

Issue:
Numeric Field Precision
Description:
Numbers that are not formatted to the correct number of decimal
places or not rounded per specifications
Possible Causes:
Development team rounded the numbers to the wrong number of
decimal places
Example(s):
The sales data did not contain the correct precision and all sales
were being rounded to the whole dollar
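
A precision check of this kind can be sketched with Python's decimal module. The function name `precision_mismatches`, the sample values, the two-decimal requirement and the half-up rounding rule are all assumptions for illustration; the real rule comes from the project's specification. Values are passed as strings so that the stored formatting is preserved, and the sketch assumes every source key also exists in the target.

```python
from decimal import Decimal, ROUND_HALF_UP

def precision_mismatches(source, target, places=2):
    """Return {key: (expected, actual)} where the target value is not the
    source value rounded to `places` decimals, or is stored with a
    different number of decimal places."""
    quantum = Decimal(10) ** -places  # Decimal('0.01') when places is 2
    mismatches = {}
    for key, raw in source.items():
        expected = Decimal(raw).quantize(quantum, rounding=ROUND_HALF_UP)
        actual = Decimal(target[key])
        wrong_value = actual != expected
        wrong_format = actual.as_tuple().exponent != -places
        if wrong_value or wrong_format:
            mismatches[key] = (str(expected), target[key])
    return mismatches

# Hypothetical sales amounts: S1 was rounded to the whole dollar in the target.
source_sales = {"S1": "19.99", "S2": "5.10"}
target_sales = {"S1": "20", "S2": "5.10"}
mismatched = precision_mismatches(source_sales, target_sales)
print(mismatched)
```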

Issue:
Rejected Rows
Description:
Data rows that get rejected due to data issues
Possible Causes:
Development team did not take into account data conditions that
could break the ETL for a particular row
Example(s):
Missing data rows on the sales table caused major issues with
the end of year sales report
Any tool available and used to
ensure data quality.
WizSoft- WizRule
Vality- Integrity
Prism Solutions, Inc.- Prism Quality Manager
Objective:
Is your data complete and valid?
Tool:
WizSoft- WizRule, Vality- Integrity
Features:
Data examination- determines quality of data, patterns
within it, and number of different fields used.
Objective:
Does your data comply with your business rules? (Do you
have missing values, illegal values, inconsistent values,
invalid relationships?)
Tool:
Prism Solutions, Inc.- Prism Quality Manager
WizSoft - WizRule
Vality- Integrity
Features:
Compare to business rules and assess data for
consistency and completeness against rules.
Objective:
Are you using sources that comply with your
business rules?
Tool:
WizSoft- WizRule, Vality- Integrity
Features:
Data reengineering- examining the data to determine
what the business rules are.
Trillium Software- Parser
i.d. Centric- DataRight
Trillium Software- GeoCoder
i.d. Centric- ACE, Clear I.D. Library
Group 1- NADIS
Trillium Software- Matcher
Innovative Systems- Match
i.d. Centric-Match/Consolidation
Group 1- Merge/Purge Plus
Innovative Systems- Corp-Match

Objective: Does your data need to be broken up
between source and data warehouse?

Tool: Trillium Software- Parser
i.d. Centric- DataRight

Features: Data parsing (elementizing)- context and
destination of each component of each field.
Objective: Does your data have abbreviations that
should be changed to ensure consistency?

Tool: Trillium Software- Parser
i.d. Centric- DataRight

Features: Data standardizing- converting data
elements to forms that are standard throughout the
DW.
Objective: Is your data correct?

Tool: Trillium Software- Parser
Trillium Software- GeoCoder
i.d. Centric- ACE, Clear I.D.
Library
Group 1- NADIS

Features: Data correction and verification- matches
data against known lists (addresses, product lists,
customer lists)
Objective: Is there redundancy in your data?

Tool: Trillium Software- Matcher
Innovative Systems- Match
i.d. Centric-Match/Consolidation
Group 1- Merge/Purge Plus.

Features: Record matching- determines whether two
records represent data on the same object.
Objective: Are there multiple versions of company
names in your database?

Tool: Innovative Systems- Corp-Match

Features: Record matching- based on user specified
fields such as tax ID
Objective: Is your data consistent prior to entering data
warehouse?

Tool: Vality- Integrity
i.d. Centric-Match/Consolidation

Features: Transform data- 1 for male, 2 for female
becomes M & F- ensures consistent mapping
between source systems and data
warehouse
Objective: Do you have information in free form fields
that differs between databases?

Tool: Vality- Integrity

Features: Data reengineering- examining the data to
determine what the business rules are.
Objective: Do you have multiple individuals in the same
household that need to be grouped together?

Tool: i.d. Centric-Match/Consolidation
Trillium Software- Matcher

Features: Householding- combining individual records
that have same address.
Objective: Does your data contain atypical words-
such as industry specific words, ethnic or hyphenated
names?

Tool: i.d. Centric- ACE, Clear I.D.

Features:
Data parsing combined with data verification-
comparison to industry specific lists.
Enterprise / Integrator by Carleton.
Semio - SemioMap


Objective: Do you have multiple formats to be
accessed- relational dbs, flat files, etc.?

Tool: Enterprise/Integrator by Carleton.

Features: Access the data then map it to the dw
schema.
Objective: Do you have free form text that needs to be
indexed, classified, other?

Tool: Semio- SemioMap

Features: Text mining- extracts meaning and
relevance from large amounts of information
Objective: Have the rules established during the data
cleansing steps been reflected in the metadata?

Tool: Vality- Integrity

Features: Documenting- documenting the results of
the data cleansing steps in the metadata.
Objective: Is data Y2K compliant?
Tool: Enterprise/Integrator by Carleton.
Features: Data verification within a migration tool.

How it is ensured that the data
sample selected ensures
completeness.
By verifying the data with the help of a migration tool.
How is data reconciliation
done?
If the DDL that the data architect has produced somehow
does not match the DDL that has already been defined to
the DBMS, then there must be a reconciliation before
any other design and development ensues.
Many data warehouses are built on an n-tier
architecture with multiple data extraction and data
insertion jobs between two consecutive tiers. As it
happens, the nature of the data changes as it passes
from one tier to the next. Data reconciliation is
the method of reconciling, or tying up, the data between
any two consecutive tiers (layers).
Master Data Reconciliation
Master data reconciliation is the method of reconciling
only the master data between source and target.

Common examples of master data reconciliation
Total count of rows, for example:
Total number of customers in source and target
Total number of products in source and target, etc.

Total count of rows based on a condition, for example:
Total number of active customers
Total number of inactive customers, etc.
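
The row-count checks listed above can be sketched as a small set of count queries run against both sides. This example uses Python's sqlite3 module; the tables `src_customer` and `tgt_customer`, the `status` column and the helper `reconcile_counts` are hypothetical.

```python
import sqlite3

# Each check is a COUNT(*) template applied to the source and the target table.
CHECKS = {
    "total customers": "SELECT COUNT(*) FROM {table}",
    "active customers": "SELECT COUNT(*) FROM {table} WHERE status = 'active'",
    "inactive customers": "SELECT COUNT(*) FROM {table} WHERE status = 'inactive'",
}

def reconcile_counts(conn, source_table, target_table):
    """Return {check name: (source count, target count, counts match)}."""
    report = {}
    for name, template in CHECKS.items():
        src = conn.execute(template.format(table=source_table)).fetchone()[0]
        tgt = conn.execute(template.format(table=target_table)).fetchone()[0]
        report[name] = (src, tgt, src == tgt)
    return report

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, status TEXT);
    CREATE TABLE tgt_customer (id INTEGER, status TEXT);
    INSERT INTO src_customer VALUES (1, 'active'), (2, 'active'), (3, 'inactive');
    INSERT INTO tgt_customer VALUES (1, 'active'), (3, 'inactive');
""")

count_report = reconcile_counts(conn, "src_customer", "tgt_customer")
for name, (src, tgt, ok) in count_report.items():
    print(name, src, tgt, "OK" if ok else "MISMATCH")
```

The table names are formatted into the SQL directly, which is acceptable only because they are fixed by the tester and never come from user input.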

Transactional Data Reconciliation
Sales quantity, revenue, tax amount, service usage etc. are
examples of transactional data. Transactional data form the
very base of BI reports, so any mismatch in transactional data
can directly affect the reliability of the report and of the
whole BI system in general. That is why a reconciliation
mechanism must be in place in order to detect such a
discrepancy beforehand (meaning, before the data reach
the final business users)

Some example measures used for transactional data
reconciliation
Sum of total revenue calculated from source and target
Sum of total products sold calculated from source and target, etc.
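
The two measures above (total revenue and total products sold) can be compared with aggregate queries on each side. The sketch below uses Python's sqlite3 module; the tables `src_sales` and `tgt_sales` and the helper `measure_totals` are hypothetical, and revenue is stored in integer cents here so the comparison avoids floating-point rounding.

```python
import sqlite3

def measure_totals(conn, table):
    """Return (total quantity sold, total revenue in cents) for one table."""
    return conn.execute(
        f"SELECT COALESCE(SUM(quantity), 0), COALESCE(SUM(revenue_cents), 0) FROM {table}"
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_sales (id INTEGER, quantity INTEGER, revenue_cents INTEGER);
    CREATE TABLE tgt_sales (id INTEGER, quantity INTEGER, revenue_cents INTEGER);
    INSERT INTO src_sales VALUES (1, 2, 1050), (2, 1, 2025), (3, 4, 500);
    INSERT INTO tgt_sales VALUES (1, 2, 1050), (2, 1, 2025);
""")

src_qty, src_rev = measure_totals(conn, "src_sales")
tgt_qty, tgt_rev = measure_totals(conn, "tgt_sales")
quantity_gap = src_qty - tgt_qty
revenue_gap = src_rev - tgt_rev
print("quantity gap:", quantity_gap, "revenue gap (cents):", revenue_gap)
```

A non-zero gap only says that the tiers disagree; finding the offending rows needs a row-level comparison.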

Automated Data Reconciliation
For large warehouse systems, it is often convenient to
automate the data reconciliation process by
making this an integral part of data loading. This
can be done by maintaining separate loading
metadata tables and populating those tables with
reconciliation queries. The existing reporting
architecture of the warehouse can then be used to
generate and publish reconciliation reports at the
end of the loading. Such automated reconciliation
will keep all the stakeholders informed about the
trustworthiness of the reports.
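
The metadata-driven approach described above might look like the following sketch: one table holds the reconciliation queries, and a step run at the end of loading executes them and stores the outcome for reporting. It uses Python's sqlite3 module; the table names `recon_check` and `recon_result`, the sample `src_orders` and `tgt_orders` tables, and the function `run_reconciliation` are assumptions for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250);
    INSERT INTO tgt_orders VALUES (1, 100), (2, 250);

    -- Loading metadata: one row per reconciliation check.
    CREATE TABLE recon_check (name TEXT, source_sql TEXT, target_sql TEXT);
    -- Results written at the end of each load, ready for reporting.
    CREATE TABLE recon_result (name TEXT, source_value, target_value, matched INTEGER);

    INSERT INTO recon_check VALUES
        ('order count', 'SELECT COUNT(*) FROM src_orders', 'SELECT COUNT(*) FROM tgt_orders'),
        ('order amount', 'SELECT SUM(amount) FROM src_orders', 'SELECT SUM(amount) FROM tgt_orders');
""")

def run_reconciliation(conn):
    """Execute every stored check and record source value, target value and match flag."""
    checks = conn.execute(
        "SELECT name, source_sql, target_sql FROM recon_check"
    ).fetchall()
    for name, source_sql, target_sql in checks:
        src = conn.execute(source_sql).fetchone()[0]
        tgt = conn.execute(target_sql).fetchone()[0]
        conn.execute(
            "INSERT INTO recon_result VALUES (?, ?, ?, ?)",
            (name, src, tgt, int(src == tgt)),
        )
    conn.commit()

run_reconciliation(conn)
recon_rows = conn.execute(
    "SELECT name, source_value, target_value, matched FROM recon_result ORDER BY name"
).fetchall()
print(recon_rows)
```

Keeping the checks in a table means a new check is added by inserting a row rather than by changing the load job.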

How to test bulk data?
Using automation tools.
Open source load testing tool: It is a Java platform
application. It is mainly considered a performance
testing tool, and it can also be integrated with the test
plan. In addition to the load test plan, you can also
create a functional test plan. This tool can place load
on a server or network to check
its performance and analyze how it works under different
conditions. It is of great use in testing the
performance of resources such as Servlets, Perl
scripts and Java objects.
Some information on performance tool and how
the result is analyzed.
Load and performance testing software: This is a
tool used for measuring and analyzing the
performance of a website. The performance and the
end result can be evaluated using this tool, and
further steps can then be taken. This helps you
improve and optimize the performance of your
web application. The tool analyzes the performance of
the web application by increasing the traffic to the
website, so that performance under heavy load can
be determined. It is available in two
languages: English and French.
One of the key attractive features of this testing tool is
that it can create and handle thousands of users at
the same time. This tool enables you to gather all the
required information with respect to performance
and infrastructure.
LoadRunner comprises different tools, namely
Virtual User Generator, Controller, Load Generator
and Analysis.
Open source stress testing tool: This tool works
effectively when it is integrated with the functional testing
tool soapUI. It allows you to create, configure and
update your tests while the application is being tested. It
also gives the user a visual aid with a drag-and-drop
experience. This is not a static performance tool. The
advanced analysis and report generation features allow
you to examine the actual performance by pumping in
new data even while the application is being tested. You
need not restart LoadUI each
time you modify or change the application. It
is updated automatically in the interface.
Load testing and stress testing tool for web
applications: To find the bottlenecks of a website,
it is necessary to examine its strengths and weaknesses. There are
many performance testing tools available for measuring
the performance of a web application.
WebLoad is one such tool used for load testing and
stress testing. It can be used for load testing
internet applications built with technologies such as Ajax, Adobe Flex,
Oracle Forms and more. It is widely used
in environments where there is a high demand for
maximum load testing.
WAPT refers to the Web Application Performance tool. Such
tools measure and analyze the performance
and output of web applications or web-related interfaces.
They help us measure the performance of web
services, web applications or any other web interfaces.
With this tool you have the advantage of testing
web application performance in different environments
and under different load conditions. WAPT provides detailed
information about the virtual users and their output
during load testing. WAPT can test a web
application for compatibility with browsers and operating
systems. It is also used for testing compatibility with
Windows applications in certain cases.
It is a desktop-based advanced HTTP load testing
tool. The web browser can be used to record
scripts, which makes recording easy. Using the
GUI you can modify the basic script with dynamic
variables to validate responses. With control over
network bandwidth you can simulate a large virtual user
base for your application stress tests. After the test is
executed, an HTML report is generated for analysis.
It is a load testing tool which is mainly used for cloud-based
services. It also helps in website optimization
and in improving the working of any web application. The
tool generates traffic to the website by simulating users
so as to find the stress and maximum load it can handle.
LoadImpact comprises two main parts: the load
testing tool and the page analyzer. The load testing can
be divided into three types: Fixed, Ramp up and
Timeout. The page analyzer works like a browser
and gives information about the working and
statistics of the website. This load
testing tool was developed by Gatorhole AB. It is a freemium
service, which means that it can be used for free and
is also available at a premium price.
It is an automated performance testing tool which can
be used for a web application or a server-based
application where a process of input and
output is involved. The tool creates a demo of the
original transaction process between the user and the
web service. At the end, all the statistical
information is gathered and analyzed to
increase efficiency. Any leakage in the website or
the server can be identified and rectified immediately
with the help of this tool. This tool can be the best
option in building an effective and error-free cloud
computing service.
It is an automated testing tool which can be employed
for testing the performance of web sites, web
applications or other objects. Many developers
and testers make use of this tool to find
bottlenecks in their web applications and rectify them
accordingly. The tool comes with a built-in
editor which allows users to edit the testing
criteria according to their needs. The Testing
Anywhere tool involves 5 simple steps to create a
test: object recorder, advanced web
recorder, SMART test recorder, image recognition
and editor with 385+ comments.
Thanks
Prepared by Mr. Prashanth B S
Software Testing Corporate Trainer
On behalf of ISQT International

ISQT - Process & Consulting Services Private Limited
732, 1st Floor, 12th Main, 3rd Block, Rajajinagar,
Bangalore - 560 010, INDIA
Phone: + 91- 80 - 23012501-15
Fax: + 91 80 23142425
www.isqtinternational.com
email: contact@isqtinternational.com
