
Introduction

Over the last three decades, many organizations have generated a large amount of data in the form of files and databases. To process this data, we have the database technology available that supports query languages like SQL. The problem with SQL is that it is a structured language that assumes the user is aware of the database schema.
By:- Ritesh M. Chawda (5th

Introduction [contd]

So, as a solution for this, nowadays we have data warehouses that let us view the same information along multiple dimensions. Today we have a large amount of data, and finding useful information in it is a difficult task. For that, too, we have a great solution: DATA MINING. As the term connotes, data mining refers to the mining or discovery of new information in terms of patterns or rules from vast amounts of data.

WHAT IS A DATA WAREHOUSE?


A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context. [Barry Devlin]

A PRODUCER WANTS TO KNOW.


o Which are our lowest/highest margin customers?
o What is the most effective distribution channel?
o Who are my customers and what products are they buying?
o What product promotions have the biggest impact on revenue?
o What impact will new products/services have on revenue and margins?
o Which customers are most likely to go to the competition?


DATA, DATA EVERYWHERE YET ...

o I can't find the data I need
   data is scattered over the network
   many versions, subtle differences
o I can't get the data I need
   need an expert to get the data
o I can't understand the data I found
   available data poorly documented
o I can't use the data I found
   results are unexpected
   data needs to be transformed from one form to other

WHAT IS DATA WAREHOUSING?


A process of transforming data into information and making it available to users in a timely enough manner to make a difference
[Forrester Research, April 1996]

Data Mining Vs. Data Warehousing

Data Mining is not well integrated with database management systems. Data warehouses provide storage, functionality and responsiveness to queries beyond the capabilities of transaction-oriented databases. The goal of a data warehouse is to support decision making with data, so data mining can be used in conjunction with a data warehouse to help with certain types of decisions.


Data Mining Vs. Data Warehousing [contd]

To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data. Data mining helps in extracting meaningful new patterns that cannot necessarily be found by merely querying or processing data or metadata in the data warehouse. So, data mining applications and tools should be designed to facilitate their use in conjunction with data warehouses.

DATA MINING WORKS WITH WAREHOUSE DATA


o Data Warehousing provides the Enterprise with a memory
o Data Mining provides the Enterprise with intelligence



DATA WAREHOUSE
A data warehouse is a

o subject-oriented
o integrated
o time-varying
o non-volatile

collection of data that is used primarily in organizational decision making.

-- Bill Inmon, Building the Data Warehouse 1996


Data Warehouses

Data Warehouse is a store of integrated data from multiple sources, processed for storage in a multidimensional model.
[Figure: data cube with three dimensions: Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (W, S, N) and Month (1 to 7)]

Data Warehouse ( contd )

The multidimensional data model is a good fit for OLAP and Decision Support Systems. Compared with transactional databases, Data Warehouses are non-volatile. They are optimized for data retrieval, not for routine transaction processing. The distinguishing characteristic of Data Warehouses is that they are mainly intended for decision support applications.
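The multidimensional model can be pictured as a cube of cells, each addressed by one value per dimension. The following is only a minimal sketch, assuming a tiny made-up sales cube keyed by (product, region, month); the numbers and the helper names `roll_up` and `slice_cube` are hypothetical and not part of the original slides.

```python
from collections import defaultdict

# Hypothetical fact data: (product, region, month) -> units sold.
# The values are made up purely for illustration.
CUBE = {
    ("Juice", "N", 1): 10,
    ("Juice", "S", 1): 4,
    ("Cola", "N", 1): 7,
    ("Cola", "N", 2): 3,
    ("Milk", "W", 2): 5,
}

DIMENSIONS = ("product", "region", "month")


def roll_up(cube, keep):
    """Aggregate the cube over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for cell, units in cube.items():
        totals[tuple(cell[i] for i in idx)] += units
    return dict(totals)


def slice_cube(cube, dimension, value):
    """Keep only the cells whose `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return {cell: units for cell, units in cube.items() if cell[i] == value}


# Total units per product, and monthly totals for the North region only.
by_product = roll_up(CUBE, ["product"])
north_by_month = roll_up(slice_cube(CUBE, "region", "N"), ["month"])
```

Rolling up and slicing are the kind of retrieval-only operations a warehouse is optimized for; nothing here updates the stored cells.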

DATA WAREHOUSE ARCHITECTURE


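[Figure: source systems (Relational Databases, ERP Systems, Purchased Data, Legacy Data) pass through Extraction and Cleansing and an Optimized Loader into the Data Warehouse Engine, which serves Analyze and Query and is supported by a Metadata Repository]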


Data Warehouses ( contd )

There are basically three types of Data Warehouses:

o Enterprise-wide Data Warehouse: They are huge projects requiring massive investments of time & resources.
o Virtual Data Warehouse: They provide views of operational databases that are materialized for efficient access.
o Data Marts: They are generally targeted to a subset of the organization.

DATA WAREHOUSE AND DATA MARTS


o Data Mart: OLAP, lightly summarized, departmentally structured
o Data Warehouse: organizationally structured, atomic, detailed data


CHARACTERISTICS OF THE DEPARTMENTAL DATA MART


o OLAP
o Small
o Flexible
o Customized by Department
o Source is departmentally structured data warehouse



How Data is Stored in Data Warehouse?


Applications supported by Data Warehouse

There are several general applications supported by Data Warehouse:

o OLAP
o DSS
o Data Mining


OLAP ( On-Line Analytical Processing )

OLAP stands for On-Line Analytical Processing. It is a term used to describe the analysis of complex data from the data warehouse. OLAP tools use distributed computing capabilities for analyses that require more storage and processing power than can be economically and efficiently located on an individual desktop.


WHAT IS OLAP?

o Online Analytical Processing: coined by E. F. Codd in a 1994 paper contracted by Arbor Software
o Generally synonymous with earlier terms such as Decision Support, Business Intelligence, Executive Information System
o OLAP = Multidimensional Database
o MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express)
o ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent)

TYPICAL OLAP QUERIES

o Write a multi-table join to compare sales for each product line YTD this year vs. last year.
o Repeat the above process to find the top 5 product contributors to margin.
o Repeat the above process to find the sales of a product line to new vs. existing customers.
o Repeat the above process to find the customers that have had negative sales growth.
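The first query in the list can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `product` and `sales` tables, their columns, the sample rows and the 30 June cutoff are all hypothetical, and a real warehouse would use its own schema and SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema with a handful of made-up rows.
# The November sale falls after the cutoff, so it is excluded from YTD.
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_line TEXT);
    CREATE TABLE sales (product_id INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO product VALUES (1, 'Beverages'), (2, 'Toiletries');
    INSERT INTO sales VALUES
        (1, '2023-02-10', 100.0),
        (1, '2024-02-12', 150.0),
        (2, '2023-03-01', 80.0),
        (2, '2024-03-05', 60.0),
        (2, '2024-11-20', 999.0);
""")

# Year-to-date sales per product line, this year vs. last year.
# The years and the month-day cutoff are passed in as named parameters.
YTD_QUERY = """
    SELECT p.product_line,
           SUM(CASE WHEN strftime('%Y', s.sale_date) = :this_year
                    THEN s.amount ELSE 0 END) AS ytd_this_year,
           SUM(CASE WHEN strftime('%Y', s.sale_date) = :last_year
                    THEN s.amount ELSE 0 END) AS ytd_last_year
    FROM sales AS s
    JOIN product AS p ON p.product_id = s.product_id
    WHERE strftime('%m-%d', s.sale_date) <= :cutoff
    GROUP BY p.product_line
    ORDER BY p.product_line
"""

rows = conn.execute(
    YTD_QUERY, {"this_year": "2024", "last_year": "2023", "cutoff": "06-30"}
).fetchall()

# Product lines whose YTD sales shrank compared with last year.
shrinking = [line for line, this_year, last_year in rows if this_year < last_year]
```

The later queries in the list follow the same join-then-aggregate shape with a different grouping column or filter, which is why the slide describes them as repeating the process.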

DATA WAREHOUSE FOR DECISION SUPPORT & OLAP


Putting information technology to help the knowledge worker make faster and better decisions:

o Which of my customers are most likely to go to the competition?
o What product promotions have the biggest impact on revenue?
o How did the share price of software companies correlate with profits over last 10 years?

DSS ( Decision Support System )

It supports an organization's leading decision makers with higher-level data for complex and important decisions.

o Used to manage and control business
o Data is historical or point-in-time
o Optimized for inquiry rather than update
o Use of the system is loosely defined and can be ad hoc
o Used by managers and end-users to understand the business and make judgments


WE WANT TO KNOW ...

o Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
o Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
o If I raise the price of my product by Rs. 2, what is the effect on my ROI?
o If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
o If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
o Which of my customers are likely to be the most loyal?

Data Mining helps extract such information

Goal of Data Mining


The goals of data mining are as below:

1. Prediction:

Data Mining can show how certain attributes within the data will behave in the future. Examples of predictive data mining include the analysis of buying transactions to predict what consumers will buy under certain discounts and how much sales volume a store would generate in a given period.

2. Identification:

Data patterns can be used to identify the existence of an item, an event or an activity.


Goal of Data Mining ( contd )

3. Classification:

Data Mining can partition the data so that different classes or categories can be identified based on combinations of parameters.

4. Optimization:

One eventual goal of data mining may be to optimize the use of limited resources such as time, space, money or materials and to maximize output variables such as sales or profits under a given set of constraints.
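As one hedged illustration of the prediction goal, the sketch below fits a straight line relating discount to units sold and extrapolates to a discount that has not been tried yet. The data points and the helper `fit_line` are hypothetical; real predictive mining would normally use richer models and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: discount (%) offered vs. units sold.
discounts = [0, 5, 10, 15]
units_sold = [100, 120, 140, 160]

slope, intercept = fit_line(discounts, units_sold)

# Predicted sales volume at a 20% discount, assuming the trend continues.
predicted_at_20 = slope * 20 + intercept
```

The extrapolation is only as trustworthy as the assumption that the past relationship holds outside the observed range.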

Data Mining Techniques

The analytical techniques that are used in data mining are:

1. Artificial Neural Networks:

It is a technique derived from artificial intelligence research.

2. Decision Trees:

It is a tree-shaped structure that represents sets of decisions. These decisions generate rules for the classification of a data set.
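To make the link between a tree of decisions and classification rules concrete, here is a small sketch. The tree is hand-built for a hypothetical credit decision (the attributes, thresholds and labels are invented for illustration); it is not learned from data, which a real decision-tree algorithm would do.

```python
# A hypothetical hand-built tree. Internal nodes test one attribute against
# a threshold; leaves carry the class label.
TREE = {
    "attribute": "income",
    "threshold": 30000,
    "below": {"label": "reject"},
    "at_or_above": {
        "attribute": "late_payments",
        "threshold": 2,
        "below": {"label": "approve"},
        "at_or_above": {"label": "review"},
    },
}


def classify(node, record):
    """Walk from the root to a leaf, following one branch per decision."""
    while "label" not in node:
        is_below = record[node["attribute"]] < node["threshold"]
        node = node["below"] if is_below else node["at_or_above"]
    return node["label"]


def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path into one if-then rule."""
    if "label" in node:
        return [(" and ".join(conditions) or "always", node["label"])]
    attr, thr = node["attribute"], node["threshold"]
    return (
        extract_rules(node["below"], conditions + (f"{attr} < {thr}",))
        + extract_rules(node["at_or_above"], conditions + (f"{attr} >= {thr}",))
    )


rules = extract_rules(TREE)
decision = classify(TREE, {"income": 45000, "late_payments": 0})
```

Each entry in `rules` pairs a condition with a class, so the three leaves of this tree yield three rules.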


Data Mining Techniques ( contd )


3. Genetic Algorithm:

Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

4. Rule Induction:

The extraction of useful if-then rules from data based on statistical significance.
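The three processes named for the genetic algorithm (combination, mutation, natural selection) can be sketched on a toy problem. Everything below is an assumption made for illustration: the bit-string encoding, the parameter values, the helper `evolve`, and the "count the 1 bits" objective. It is not a tuned or production optimizer.

```python
import random


def evolve(fitness, length=12, population_size=20, generations=40, seed=0):
    """Tiny genetic algorithm over bit strings.

    Returns (best individual of the initial population, best of the final one).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    initial_best = max(population, key=fitness)

    for _ in range(generations):
        # Natural selection: keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Elitism: carry the current best forward unchanged.
        children = [parents[0][:]]
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Genetic combination: one-point crossover.
            cut = rng.randint(1, length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip one random bit with probability 0.2.
            if rng.random() < 0.2:
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            children.append(child)
        population = children

    return initial_best, max(population, key=fitness)


# Toy objective: the fitness of a bit string is the number of 1 bits.
first, best = evolve(sum)
```

Because the best individual is always copied into the next generation, the best fitness can never get worse from one generation to the next; how close it gets to the optimum depends on the parameters.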


How it Works?

Data Mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Generally, we can find four types of relationships:

1. Classes:

Stored data is used to locate data in predetermined groups.

2. Clusters:

Data items are grouped according to logical relationships or consumer preferences.
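Unlike classes, clusters are not predetermined; the grouping comes out of the data. A minimal sketch of that idea follows, using plain k-means on a single numeric attribute. The spend figures, the starting centers and the helper `k_means_1d` are hypothetical, and k-means is only one of many clustering methods, chosen here for brevity.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on one numeric attribute, starting from given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer; two groups are visible by eye.
spend = [12, 15, 14, 90, 95, 100]
centers, clusters = k_means_1d(spend, centers=[0, 50])
```

Here the low spenders and the high spenders end up in separate groups without anyone having defined those groups in advance.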

How it Works? ( contd )


3. Associations:

Data can be mined to identify associations. Associations can be very useful in business terms: they can reveal hidden patterns lying in the stored data.

4. Sequential Patterns:

Data is mined to anticipate behavior patterns and trends.
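Associations are commonly scored by support (how often the items occur together) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below applies those two measures to single-item rules; the baskets, thresholds and helper names are hypothetical, and it deliberately skips the pruning that real association-mining algorithms use on large data.

```python
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
BASKETS = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread"},
    {"butter"},
    {"milk"},
]


def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in BASKETS)


def association_rules(min_support, min_confidence):
    """Single-item rules (lhs -> rhs) that clear both thresholds."""
    items = sorted(set().union(*BASKETS))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            together = count({lhs, rhs})
            support = together / len(BASKETS)
            confidence = together / count({lhs})
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


rules = association_rules(min_support=0.5, min_confidence=0.8)
```

With these baskets, every basket containing bread also contains milk, so "bread -> milk" passes; the reverse rule has confidence 0.75 and is filtered out by the 0.8 threshold.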


Applications

o Retail/Marketing
o Insurance & Health Care
o Transportation
o Medicine
o E-commerce
o Scientific, Engineering data mining
o Web Data Mining


Conclusion

During the next 10 years, no business will be without the use of Data Mining & Data Warehousing.

Data Mining is beginning to contribute research advances of its own, by providing scalable extensions and advances to work in associations, ensemble learning, graphical models, techniques for on-line discovery, and algorithms for the exploration of massive and distributed data sets.

ANY QUESTIONS???

THANK YOU
