
Data Sourcing into Enterprise Data Warehouse

Across many organizations, BI practice is moving out of departmental silos and becoming an enterprise-wide
initiative, in order to maximize information value and optimize platform investments.
In the deluge of high-powered concepts such as big data and self-service BI, a foundational element is
not given the attention it deserves when projects kick off: a comprehensive understanding of source
data characteristics and behavior. Data quality, profiling and similar processes are steps in the right
direction, but without a good grasp of the source systems and the in-flowing data, the Data Warehouse
becomes a superstructure built on a weak foundation.
The thoughts below cover technical and project management practices that help ensure we get the right
data from the right sources.
With growing reliance on intelligence systems in the enterprise, it is imperative that the data and the
DW team meet user expectations and earn user trust in order to reach a data-driven state. This is one of
the most important factors in increasing business intelligence usage in the end-user community.
This article discusses some common observations and thoughts based on past project experience
working on a multitude of source areas such as web clicks, MDM data (Customer, Product), online orders
and merchandising data.
The observations from project execution and support can be broken into two major categories.
Common Technical Gotchas
1. Hardcoding the Extract part of ETL by relying on source system program codes and specific field
values. Combined with poor documentation and change management, this tight coupling means that
changes to source systems go undetected by downstream systems.
2. Data mismatches caused by downstream interface reads running in parallel with updates/inserts
from ad-hoc data maintenance on the source systems.
3. Inadequate communication and partial understanding of lifecycle events, which leads to either
storing too many undesired statuses or filtering out important ones.
4. Trying to replicate calculations and compound metrics done at source systems. Over time this
becomes unmanageable and difficult to keep in step as source systems change.
5. Difficulty in audits/balancing, since no balancing checkpoints are designed into the source system
for other systems to synchronize and validate data against.
6. Not handling rare scenarios such as back postings and other special handling of data.
7. Missing low-frequency business processes, such as year-end reclasses and monthly stock
close-outs. This causes improper performance benchmarking and other data impacts.
8. Underestimating the effort and time needed for historical loads/conversions.
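As an illustration of the balancing checkpoints mentioned in item 5, here is a minimal Python sketch. It
assumes that each extract batch can be summarized by a row count and a control total over a hypothetical
amount field, and that the same summary can be recomputed on the warehouse side after the load. The names
Checkpoint, summarize and balance are invented for this example and do not come from any particular tool.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Checkpoint:
    """Control totals for one extract batch (hypothetical structure)."""
    row_count: int
    amount_total: Decimal


def summarize(rows):
    """Build a checkpoint from an iterable of dicts that carry an 'amount' key."""
    count = 0
    total = Decimal("0")
    for row in rows:
        count += 1
        # Decimal avoids float rounding noise when comparing monetary totals.
        total += Decimal(str(row["amount"]))
    return Checkpoint(count, total)


def balance(source, target):
    """Return a list of mismatch descriptions; an empty list means balanced."""
    issues = []
    if source.row_count != target.row_count:
        issues.append(
            f"row count: source={source.row_count} target={target.row_count}"
        )
    if source.amount_total != target.amount_total:
        issues.append(
            f"amount total: source={source.amount_total} target={target.amount_total}"
        )
    return issues


# Assumed sample data: the source published two orders, the warehouse loaded one.
source_rows = [
    {"order_id": 1, "amount": "10.50"},
    {"order_id": 2, "amount": "4.25"},
]
loaded_rows = [{"order_id": 1, "amount": "10.50"}]

issues = balance(summarize(source_rows), summarize(loaded_rows))
for issue in issues:
    print(issue)
```

In practice the source-side checkpoint would ideally be published by the source system itself along with
each extract, so that the warehouse is comparing against an agreed figure and not against its own re-read
of the source.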
Top Project/Program Management Ahas
1. Not planning for source system SMEs to support data extracts.
2. Not creating sponsorship at relevant levels to help prioritize DW needs when source systems face
multiple equal priorities.
3. Not getting to an agreed timeline early enough in the project.
