(CS/IT) Part I
Paper IV
Data Warehousing and Mining
Text Books:
1. Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley.
2. W. H. Inmon, Building the Data Warehouse, Wiley Dreamtech.
3. Ralph Kimball, The Data Warehouse Toolkit, John Wiley.
4. Ralph Kimball, The Data Warehouse Lifecycle Toolkit, John Wiley.
Girish Tere, Lecturer (CS), TCSC 1
3/16/2014
Understand the desperate need for strategic information
Recognize the information crisis at every enterprise
Distinguish between operational and informational systems
Review past attempts to provide strategic information
The solution: data warehousing
Introduction
What is your role in IT?
What is your IT experience?
Applications that run the business: What do they do? What do they provide?
What do executives require?
Where is strategic information required?
Organizations Using DW
Retail
Manufacturing
Financial
Utilities
Airlines
Government
Who needs strategic information in an enterprise?
What is strategic information?
Examples of Business Objectives
Retain the present customer base
Increase the customer base by 15% over the next 5 years
Gain market share by 10% in the next 3 years
Improve product quality levels in the top five product groups
Enhance the customer service level in shipments
Bring three new products to market in 2 years
Increase sales by 15% in the North East Division
What is SI (strategic information)?
Is it for running the day-to-day operation of the business?
Characteristics of SI
Integrated: must have a single, enterprise-wide view.
Data Integrity: information must be accurate and must conform to business rules.
Accessible: easily accessible with intuitive access paths, and responsive for analysis.
Credible: every business factor must have a unique value.
Timely: information must be available within the stipulated time period.
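The Data Integrity and Credible characteristics can be checked mechanically once data from several systems is brought together. The following is a minimal Python sketch, assuming hypothetical records from two source applications ("billing" and "crm") and one hypothetical business rule (a credit limit must not be negative). It flags rows that break the rule, and business factors that carry more than one value across sources.

```python
from collections import defaultdict

# Hypothetical records pulled from two operational applications.
# Each record: (source system, customer id, attribute, value)
records = [
    ("billing", "C001", "credit_limit", 5000),
    ("crm",     "C001", "credit_limit", 5000),
    ("billing", "C002", "credit_limit", 7000),
    ("crm",     "C002", "credit_limit", 6500),   # conflicts with billing
    ("billing", "C003", "credit_limit", -100),   # breaks the rule below
]

def violates_business_rule(attribute, value):
    """Hypothetical business rule: a credit limit must never be negative."""
    return attribute == "credit_limit" and value < 0

def integrity_violations(rows):
    """Data integrity: return the rows that break a business rule."""
    return [row for row in rows if violates_business_rule(row[2], row[3])]

def conflicting_factors(rows):
    """Credibility: return (customer, attribute) pairs with more than one distinct value."""
    values = defaultdict(set)
    for _source, customer, attribute, value in rows:
        values[(customer, attribute)].add(value)
    return sorted(key for key, found in values.items() if len(found) > 1)

print(integrity_violations(records))  # [('billing', 'C003', 'credit_limit', -100)]
print(conflicting_factors(records))   # [('C002', 'credit_limit')]
```

In a real warehouse these checks would run in the data staging area, before the data is made available for analysis.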
How much data is stored and available? Where is all this data? On which platforms? On one PC or across the network?
Facts:
Organizations have lots of data.
IT resources and systems are not effective at using this data as SI.
Real Problem
Most companies face an information crisis not because they lack sufficient data, but because the available data is not readily usable for strategic decision making.
Why is this so?
We need information integrated from all systems.
Operational data is event-driven.
Operational data is not directly suitable for review from different viewpoints.
Technology Trends
Names of the computer department in a company: DP, MIS, IS, IT
Phenomenal growth of IT in areas like
The user asks a question and gets the results.
This interactive process continues.
Why is providing SI feasible now?
What opportunities are available to companies from the possible use of SI?
What threats and risks result from a lack of SI in companies?
Some Opportunities
SI required for Reliance (telecommunication industry)
SI required for ICICI Bank
SI required for Mediclaim companies
SI required for Apna Bazar
A community-based pharmacy company
Some Risks
A car rental company (fleet management)
A multinational company supplying systems and components to the automobile industry (inconsistent data)
Example: A Chennai branch is not …
You have to gather the data from multiple applications and start from scratch.
To understand why IT failed to provide SI in the past, we need to consider how IT has been attempting to do this all these years.
Past DSSs
Ad hoc reports
Special extract programs
Small applications
Information centers
Decision-support systems (DSS)
Executive information systems (EIS) (only programmed screens and reports were available)
Figure 1.4: IT receives too many ad hoc requests, resulting in a large overload.
Requests keep changing.
Users ask for more and more reports.
Users have to depend on IT to provide the information.
You need a very flexible and conducive environment for providing information for making strategic decisions. IT has been unable to provide such an environment.
Operational vs DSS
What is the basic reason for the failure of all the previous attempts by IT to provide SI?
Do we need different types of systems?
Watching the wheels of business turn:
Show me the top-selling products
Show me the problem regions
Tell me why (drill down)
Let me see other data (drill across)
Show me the highest margins
Alert me when a district sells below target
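Several of these informational requests reduce to grouping and summing sales data along different dimensions. A minimal Python sketch over a hypothetical in-memory list of sales facts; the regions, districts, products, and target are invented for illustration, and a real DW would run equivalent queries against far larger tables.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, district, product, units sold)
SALES = [
    ("North", "N1", "Soap", 120),
    ("North", "N1", "Tea", 80),
    ("North", "N2", "Soap", 40),
    ("South", "S1", "Tea", 200),
    ("South", "S2", "Soap", 30),
]

def total_by(rows, key_index):
    """Sum units sold, grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)

def top_selling_products(rows, n=1):
    """'Show me the top-selling products'."""
    totals = total_by(rows, 2)
    return sorted(totals, key=totals.get, reverse=True)[:n]

def drill_down(rows, region):
    """'Tell me why': break one region's total down by district."""
    return total_by([row for row in rows if row[0] == region], 1)

def below_target(rows, target):
    """'Alert me when a district sells below target'."""
    return sorted(d for d, units in total_by(rows, 1).items() if units < target)

print(top_selling_products(SALES))   # ['Tea']
print(drill_down(SALES, "North"))    # {'N1': 200, 'N2': 40}
print(below_target(SALES, 100))      # ['N2', 'S2']
```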
We need different types of systems:
that serve different purposes
whose scopes are different
whose data content is different
where the data usage patterns are different
where the data access types are different
Characteristic: Operational | Informational
Data content: Current values | Archived, derived, summarized
Data structure: Optimized for transactions | Optimized for complex queries
Access frequency: High | Medium to low
Access type: Read, update, delete | Read
Usage: Predictable, repetitive | Ad hoc, random, heuristic
Response time: Milliseconds | Many seconds
Users: Large numbers | Relatively small numbers
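The first rows of this comparison (current values updated in place versus archived, read-only history) can be made concrete. A minimal Python sketch, assuming a hypothetical account-balance example: the operational side overwrites a balance on every transaction, while the informational side appends periodic snapshots and only reads them to produce summaries.

```python
from datetime import date

# Operational side: only the current value is kept, updated in place.
account_balances = {"A100": 500}

def post_transaction(account, amount):
    """OLTP-style update: the previous balance is overwritten."""
    account_balances[account] += amount

# Informational side: periodic snapshots are appended, never modified.
balance_history = []  # list of (snapshot date, account, balance)

def take_snapshot(as_of):
    """Periodic load: append the current balances as new history rows."""
    for account, balance in account_balances.items():
        balance_history.append((as_of, account, balance))

def average_balance(account):
    """Read-only, summarized query across the stored history."""
    values = [b for _, acct, b in balance_history if acct == account]
    return sum(values) / len(values)

take_snapshot(date(2014, 3, 1))
post_transaction("A100", 250)
take_snapshot(date(2014, 3, 2))
post_transaction("A100", -150)
take_snapshot(date(2014, 3, 3))

print(account_balances["A100"])             # 600: only the latest value survives
print(round(average_balance("A100"), 2))    # 616.67, the mean of 500, 750 and 600
```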
We need different types of DSS to provide SI.
Information required for strategic decision making is not available in operational systems.
A new environment is required for analysis, discerning trends and monitoring performance.
Database designed for analytical tasks
Data from multiple applications
Easy to use and conducive to long interactive sessions by users
Read-intensive data usage
Direct interaction with the system by users, without help from IT staff
Content updated periodically and stable
Content includes current and historical data
Ability for users to run queries and get results online
Ability for users to create reports
Running simple queries and reports against current and historical data
Ability to perform what-if analysis
Ability to query, analyze, and query again, continuing this process as many times as required
Recognize historical trends and past mistakes, and apply these lessons to improve future results
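What-if analysis replays historical data under an assumed change. A minimal Python sketch, assuming hypothetical monthly sales history and a deliberately simple model in which revenue scales independently with price and volume; this linear model is an illustrative assumption, not a forecasting method.

```python
# Hypothetical monthly history: (month, units sold, unit price)
HISTORY = [
    ("Jan", 1000, 20.0),
    ("Feb", 1100, 20.0),
    ("Mar", 900, 22.0),
]

def revenue(history):
    """Actual revenue over the stored history."""
    return sum(units * price for _, units, price in history)

def what_if(history, price_change_pct, volume_change_pct):
    """Replay the history assuming every price and every volume had changed by the given percentages."""
    price_factor = 1 + price_change_pct / 100
    volume_factor = 1 + volume_change_pct / 100
    return sum(units * volume_factor * price * price_factor
               for _, units, price in history)

print(revenue(HISTORY))                         # 61800.0
print(round(what_if(HISTORY, 10, -5), 2))       # 64581.0: prices up 10%, volume down 5%
```

The analyst can then change the assumptions and ask again, which is the ask-answer-ask-again pattern the DW is meant to support.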
BI at DW
The needed environment is the DW.
It is kept separate from the systems environment supporting day-to-day operations.
The DW contains BI.
[Figure: Data transformation into the Data Warehouse]
Example of BI at DW
A DW containing units of sales stored along business dimensions
Important: the data staging area
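Storing units of sales along business dimensions is commonly modeled as a fact table surrounded by dimension tables (a star schema, in Kimball's terms). A minimal sketch using Python's built-in `sqlite3` module; the table names, column names, and data are hypothetical. The same fact table is summarized along two different dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, month TEXT);
-- Fact table: units of sales, keyed by the three business dimensions.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    time_key    INTEGER REFERENCES time_dim(time_key),
    units_sold  INTEGER
);
INSERT INTO product_dim VALUES (1, 'Soap'), (2, 'Tea');
INSERT INTO store_dim   VALUES (1, 'North'), (2, 'South');
INSERT INTO time_dim    VALUES (1, '2014-01'), (2, '2014-02');
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 120), (2, 1, 1, 80), (1, 2, 2, 30), (2, 2, 2, 200), (2, 1, 2, 60);
""")

# Units sold by region and product (store and product dimensions).
region_product = conn.execute("""
    SELECT s.region, p.product_name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN store_dim s   ON s.store_key = f.store_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY s.region, p.product_name
    ORDER BY s.region, p.product_name
""").fetchall()

# Units sold by month (time dimension) from the same fact table.
by_month = conn.execute("""
    SELECT t.month, SUM(f.units_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    GROUP BY t.month
    ORDER BY t.month
""").fetchall()

print(region_product)  # [('North', 'Soap', 120), ('North', 'Tea', 140), ('South', 'Soap', 30), ('South', 'Tea', 200)]
print(by_month)        # [('2014-01', 200), ('2014-02', 290)]
```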
Provides an integrated and total view of the enterprise
Makes the enterprise's current and historical information easily available for decision making
Makes decision-support transactions possible without burdening operational systems
Renders the organization's information consistently
Presents a flexible and interactive source of strategic information
DW concept
The DW concept is not to generate fresh data.
It is to make use of the large volumes of existing data and transform them into forms suitable for providing SI.
Take all the data you already have in the organization, clean and transform it, and then use it to provide SI.
It is a user-centric and user-driven environment
An ideal environment for data analysis and decision support
Constantly changing, flexible and interactive
Useful for the ask-answer-ask-again pattern
Provides the ability to discover answers to complex, unpredictable questions
Data extraction
Loading the data
Transforming the data
Storing the data
Providing the user interface (UI)
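The extraction, transformation, loading, and storage functions above can be sketched end to end. A minimal Python sketch, assuming two hypothetical source applications (an order-processing app and a billing app) that record the same kind of data in different formats. This sketch transforms before loading (the classic extract-transform-load ordering), which is one common arrangement; the user-interface function is not covered.

```python
# Hypothetical source 1: order-processing app, DD/MM/YYYY dates, amounts as text.
ORDERS_APP = [
    {"cust": " c001 ", "date": "05/03/2014", "amount": "1,200"},
    {"cust": "C002", "date": "06/03/2014", "amount": "800"},
]
# Hypothetical source 2: billing app, ISO dates, numeric amounts.
BILLING_APP = [
    {"customer_id": "C001", "txn_date": "2014-03-07", "amt": 450.0},
]

def extract():
    """Extraction: pull raw rows from each source, tagged with their origin."""
    for row in ORDERS_APP:
        yield "orders", row
    for row in BILLING_APP:
        yield "billing", row

def transform(source, row):
    """Transformation: clean each row and map it onto one common layout."""
    if source == "orders":
        day, month, year = row["date"].split("/")
        return {
            "customer_id": row["cust"].strip().upper(),
            "date": f"{year}-{month}-{day}",
            "amount": float(row["amount"].replace(",", "")),
        }
    return {
        "customer_id": row["customer_id"],
        "date": row["txn_date"],
        "amount": float(row["amt"]),
    }

def load(records, warehouse):
    """Loading and storage: append the cleaned records to the warehouse store."""
    warehouse.extend(records)

warehouse = []
load((transform(source, row) for source, row in extract()), warehouse)
for record in warehouse:
    print(record)
# {'customer_id': 'C001', 'date': '2014-03-05', 'amount': 1200.0}
# {'customer_id': 'C002', 'date': '2014-03-06', 'amount': 800.0}
# {'customer_id': 'C001', 'date': '2014-03-07', 'amount': 450.0}
```

After the transformation step, records from both applications share one customer-ID format, one date format, and numeric amounts, so they can be analyzed together.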
Technologies used in DW
Data Quality
Data Modeling
Data Acquisition
Data Management
Metadata Management
Analysis
Applications
Development Tools
Storage Management
Administration
Match the columns:
1. information crisis
2. SI
3. operational systems
4. information center
5. DW
6. order processing
7. EIS
8. data staging area
9. extract programs
10. IT

A. OLTP application
B. produce ad hoc reports
C. explosive growth
D. despite lots of data
E. data cleaned and transformed
F. users go to get information
G. used for decision making
H. environment, not product
I. for day-to-day operations
J. simple, easy to use
Class Test
1. What do you mean by SI? For a commercial bank, name five types of strategic objectives.
2. Do you agree that a typical retail store collects huge volumes of data through its operational systems? Name three types of transaction data likely to be collected by a retail store in large volumes during its daily operations.
3. Why did all the past attempts by IT to provide SI fail? List three concrete reasons and explain.
4. Differentiate between operational systems and informational systems.
5. List the characteristics of the computing environment needed to provide SI.
6. What types of processing take place in a DW?
7. A DW is an environment, not a product. Discuss.
9. You are the IT Director of a nationwide insurance company. Write a memo to the VP explaining the types of opportunities that can be realized with …
For an airline company, how can SI increase the number of frequent flyers? Discuss, giving specific details.