4 Data Warehousing Data Mining

Data Warehousing and Data Mining
Course: Basics of Management Information Systems

BBA
Symbiosis Centre for Management Studies Noida
Dr. Tarun Kumar Singhal
Preface of Data Warehousing
Many organizations have amassed vast amounts of data that
employees use to unlock valuable secrets to enable the
organization to compete successfully.
Some organizations do this extremely well, but others are
quite ineffective.
To use analytic tools to improve organizational decision-making,
a foundational data architecture and enterprise
architecture must be in place to facilitate effective decision
analysis.
Enabling decision analysis through access to all relevant
information is known as business intelligence.
Business intelligence includes data warehousing, online
analytical processing, data mining, and visualization and
multidimensionality.

Introduction of Data Warehousing
Data warehousing is the process of constructing and using a
data warehouse.
A data warehouse is constructed by integrating data from
multiple heterogeneous sources to support analytical
reporting, structured and/or ad hoc queries, and decision
making.
Data warehousing involves data cleaning, data integration,
and data consolidation.

Characteristics of Data Warehousing
Subject-oriented
Integrated
Time-variant (time series)
Non-volatile
Summarized
Not normalized
Sources
Metadata

Functions of Data Warehousing
Data Extraction − Involves gathering data from multiple
heterogeneous sources.
Data Cleaning − Involves finding and correcting the errors
in data.
Data Transformation − Involves converting the data from
legacy format to warehouse format.
Data Loading − Involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.
Refreshing − Involves updating from data sources to
warehouse.
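
The five functions above can be sketched as a small pipeline. This is only an illustrative sketch, not a real warehouse loader: the source rows, field names, and function names (`extract`, `clean`, `transform`, `load`, `refresh`) are all hypothetical, and the "warehouse" is just an in-memory dictionary.

```python
# Hypothetical source records; field names and values are illustrative only.
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2023-01-15"},
    {"customer": "Bob", "amount": "n/a", "date": "2023-01-16"},  # unparseable amount
]
billing_rows = [
    {"customer": "alice", "amount": "79.50", "date": "2023-02-01"},
]


def extract(*sources):
    """Extraction: gather rows from multiple heterogeneous sources."""
    for source in sources:
        yield from source


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        yield row


def transform(rows):
    """Transformation: convert the source format to the warehouse format."""
    for row in rows:
        yield {
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
            "month": row["date"][:7],
        }


def load(rows, warehouse):
    """Loading: sort rows and summarize them per customer and month."""
    for row in sorted(rows, key=lambda r: (r["customer"], r["month"])):
        key = (row["customer"], row["month"])
        warehouse[key] = warehouse.get(key, 0.0) + row["amount"]
    return warehouse


def refresh(warehouse, *sources):
    """Refreshing: push source rows into the warehouse.

    Assumes the sources contain only rows not loaded before.
    """
    return load(transform(clean(extract(*sources))), warehouse)


warehouse = refresh({}, crm_rows, billing_rows)
print(warehouse)
```

In this sketch the row with the unparseable amount is discarded during cleaning, and the two spellings of the same customer are unified during transformation.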

Types of Data Warehousing
1. Enterprise Data Warehouse:
Enterprise Data Warehouse is a centralized warehouse. It provides
decision support service across the enterprise. It offers a unified approach
for organizing and representing data.
2. Operational Data Store:
An Operational Data Store (ODS) is used when the data warehouse cannot
support the organization's reporting needs. In an ODS, the data store is
refreshed in real time. Hence, it is widely preferred for routine activities
such as storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a
particular line of business, such as sales or finance. In an
independent data mart, data can be collected directly from sources.
Components of Data Warehousing
Load manager:
The load manager is also called the front component. It performs all the
operations associated with the extraction and loading of data into the
warehouse. These operations include transformations to prepare the data
for entry into the data warehouse.
Warehouse Manager:
The warehouse manager performs operations associated with the management
of the data in the warehouse. These include analyzing data to
ensure consistency, creating indexes and views, generating
denormalizations and aggregations, transforming and merging source
data, and archiving and backing up data.

Query Manager:
The query manager is also known as the backend component. It performs all the
operations related to the management of user queries. It directs
queries to the appropriate tables and schedules the execution of
queries.
End-user access tools:
These are categorized into five groups: 1. Data reporting tools 2.
Query tools 3. Application development tools 4. EIS tools 5. OLAP tools
and data mining tools.

Online Analytical Processing (OLAP)
OLAP is a category of software that allows users to analyze
information from multiple database systems at the same time.
OLAP is a powerful technology for data discovery, including capabilities for
limitless report viewing, complex analytical calculations, and predictive
“what if” scenario (budget, forecast) planning.
OLAP performs multidimensional analysis of business data and provides
the capability for complex calculations, trend analysis, and sophisticated
data modeling. It is the foundation for many kinds of business applications
for Business Performance Management, Planning, Budgeting, Forecasting,
Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and
Data Warehouse Reporting.
OLAP enables end-users to perform ad hoc analysis of data in multiple
dimensions, thereby providing the insight and understanding they need for
better decision making.
Advantages of OLAP
The more data a company can access about a specific activity, the more
likely that the plan to improve that activity will be effective. All businesses
collect data using many different systems, and the challenge remains: how
to get all the data together to create accurate, reliable, fast information
about the business. A company that can take advantage of its data and turn it into
shared knowledge, accurately and quickly, will be better positioned
to make successful business decisions and rise above the competition.
OLAP technology has been defined as the ability to achieve “fast access
to shared multidimensional information.” Given OLAP technology’s
ability to create very fast aggregations and calculations of underlying data
sets, one can understand its usefulness in helping business leaders make
better, quicker “informed” decisions.

Basic Operations of OLAP
Four types of analytical operations in OLAP are:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)

Roll-up
Roll-up is also known as "consolidation" or "aggregation." The roll-up
operation can be performed in two ways:
Reducing dimensions
Climbing up a concept hierarchy. A concept hierarchy is a system
of grouping things based on their order or level.
Example
In this example, the locations New Jersey and Los Angeles are rolled up into
the country USA.
The sales figures of New Jersey and Los Angeles are 440 and 1560
respectively. They become 2000 after roll-up.
In this aggregation process, the location hierarchy moves up from city to
country.
When roll-up is performed by dimension reduction, one or more dimensions
are removed.
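
The roll-up example can be reproduced in a few lines. This is a minimal sketch that uses the two sales figures from the slide; the dictionary layout and the `roll_up` helper are assumptions made for illustration, not part of any OLAP product.

```python
from collections import defaultdict

# Sales at the lower level of the location hierarchy (figures from the example).
sales_by_city = {"New Jersey": 440, "Los Angeles": 1560}

# Concept hierarchy for the location dimension: lower level -> country.
city_to_country = {"New Jersey": "USA", "Los Angeles": "USA"}


def roll_up(sales, hierarchy):
    """Aggregate a measure one level up the given concept hierarchy."""
    totals = defaultdict(int)
    for member, amount in sales.items():
        totals[hierarchy[member]] += amount
    return dict(totals)


sales_by_country = roll_up(sales_by_city, city_to_country)
print(sales_by_country)  # {'USA': 2000}
```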
Drill-down
In drill-down, data is fragmented into smaller parts. It is the opposite of the
roll-up process. It can be done via:
Moving down the concept hierarchy
Increasing a dimension
Example
Quarter Q1 is drilled down to months January, February, and March.
Corresponding sales are also divided.
In this example, dimension “months” is added.
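
Drill-down can be sketched as the reverse lookup in a time hierarchy. The monthly figures below are hypothetical (the slide gives no monthly numbers), and `drill_down` is an illustrative helper name.

```python
# Hypothetical monthly sales; the source does not give monthly figures.
monthly_sales = {"January": 150, "February": 100, "March": 150, "April": 200}

# Concept hierarchy for the time dimension: month -> quarter.
month_to_quarter = {
    "January": "Q1",
    "February": "Q1",
    "March": "Q1",
    "April": "Q2",
}


def drill_down(quarter, sales, hierarchy):
    """Return the month-level detail that makes up one quarter."""
    return {
        month: amount
        for month, amount in sales.items()
        if hierarchy[month] == quarter
    }


q1_detail = drill_down("Q1", monthly_sales, month_to_quarter)
print(q1_detail)                # month-level view of Q1
print(sum(q1_detail.values()))  # rolling the detail back up gives the Q1 total
```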

Slice
Here, one dimension is selected, and a new sub-cube is created.
The following example explains how the slice operation is performed:
The Time dimension is sliced with Q1 as the filter.
A new sub-cube is created.
Dice
This operation is similar to a slice. The difference is that in dice you select two
or more dimensions, which results in the creation of a sub-cube.

Pivot
In pivot, you rotate the data axes to provide an alternative presentation of
the data.
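
Slice, dice, and pivot can be illustrated on a tiny cube stored as a list of cells. This is a sketch under stated assumptions: the cube contents, the dimension names (`time`, `location`, `item`), and the helper functions are hypothetical and chosen only to show the three operations.

```python
# Hypothetical cube: three dimensions (time, location, item), one measure (sales).
cube = [
    {"time": "Q1", "location": "Delhi", "item": "Mobile", "sales": 100},
    {"time": "Q1", "location": "Noida", "item": "Modem", "sales": 50},
    {"time": "Q2", "location": "Delhi", "item": "Mobile", "sales": 120},
    {"time": "Q2", "location": "Noida", "item": "Mobile", "sales": 80},
]


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [cell for cell in cube if cell[dimension] == value]


def dice_cube(cube, criteria):
    """Dice: restrict two or more dimensions to sets of allowed values."""
    return [
        cell
        for cell in cube
        if all(cell[dim] in allowed for dim, allowed in criteria.items())
    ]


def pivot(cube, row_dim, col_dim):
    """Pivot: lay out total sales as a row_dim by col_dim table."""
    table = {}
    for cell in cube:
        row = table.setdefault(cell[row_dim], {})
        row[cell[col_dim]] = row.get(cell[col_dim], 0) + cell["sales"]
    return table


q1 = slice_cube(cube, "time", "Q1")
delhi_mobile = dice_cube(cube, {"location": {"Delhi"}, "item": {"Mobile"}})
by_time = pivot(cube, "time", "location")
by_location = pivot(cube, "location", "time")  # same data, axes rotated
```

The two pivot tables hold the same totals; only the roles of the row and column axes are swapped.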

Data Mining
Data mining is the process of sorting through large data sets
to identify patterns and establish relationships to solve
problems through data analysis. Data mining tools allow
enterprises to predict future trends.
In data mining, association rules are created by analyzing data
for frequent if/then patterns, then using the support and
confidence criteria to locate the most important
relationships within the data. Support is how frequently the
items appear in the database, while confidence is the proportion
of times the if/then statements are found to be true.
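
Support and confidence can be computed directly from a list of transactions. The sketch below uses hypothetical market-basket data; the function names are illustrative, and real association-rule miners add a search over candidate itemsets that is not shown here.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent (rule: if antecedent then consequent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


s = support({"bread", "milk"}, transactions)       # 2 of 4 baskets
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
print(s, c)
```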

Other data mining parameters include Sequence or Path
Analysis, Classification, Clustering and Forecasting.
Sequence or Path Analysis parameters look for patterns where
one event leads to another later event.
A Sequence is an ordered list of sets of items, and it is a
common type of data structure found in many databases.
A Classification parameter looks for new patterns, and might
result in a change in the way the data is organized.
Classification algorithms predict variables based on other
factors within the database.

Data Mining Tools and Techniques
Data mining techniques are used in many research areas,
including mathematics, cybernetics, genetics and
marketing. Data mining techniques are a means to
drive efficiencies and predict customer behavior; used
correctly, they let a business set itself apart from its competition
through predictive analysis.
Web mining, a type of data mining used in customer
relationship management, integrates information gathered by
traditional data mining methods and techniques over the web.
Web mining aims to understand customer behavior and to
evaluate how effective a particular website is.

Other data mining techniques include network approaches
based on multitask learning for classifying patterns, ensuring
parallel and scalable execution of data mining algorithms, the
mining of large databases, the handling of relational and
complex data types, and machine learning.
Machine learning is a type of data mining technique in which
algorithms learn from data in order to make predictions.

Benefits of Data Mining
In general, the benefits of data mining come from the ability to
uncover hidden patterns and relationships in data that can be used
to make predictions that impact businesses.
Specific data mining benefits vary depending on the goal and the
industry. Sales and marketing departments can mine customer data
to improve lead conversion rates or to create one-to-one
marketing campaigns. Data mining information on historical sales
patterns and customer behaviors can be used to build prediction
models for future sales, new products and services.
Companies in the financial industry use data mining tools to build
risk models and detect fraud. The manufacturing industry uses data
mining tools to improve product safety, identify quality issues,
manage the supply chain and improve operations.

OLTP
OLTP (online transaction processing) is a class of software
programs capable of supporting transaction-oriented applications
on the Internet.
Typically, OLTP systems are used for order entry, financial
transactions, customer relationship management (CRM) and retail
sales. Such systems have a large number of users who conduct short
transactions. Database queries are usually simple, require
sub-second response times and return relatively few records.
An important attribute of an OLTP system is its ability to maintain
concurrency. To avoid single points of failure, OLTP systems are
often decentralized.
OLTP vs OLAP
We can divide IT systems into transactional (OLTP) and analytical
(OLAP). OLTP systems provide source data to data warehouses,
whereas OLAP systems help to analyze it.

OLTP (On-line Transaction Processing) is characterized by a
large number of short on-line transactions (INSERT, UPDATE,
DELETE). The main emphasis for OLTP systems is on very fast
query processing and on maintaining data integrity in multi-access
environments; effectiveness is measured by the number of
transactions per second. An OLTP database holds detailed and
current data.
OLAP (On-line Analytical Processing) is characterized by a
relatively low volume of transactions. Queries are often very
complex and involve aggregations. For OLAP systems, response
time is the effectiveness measure. OLAP applications are widely used
by Data Mining techniques. An OLAP database holds aggregated
and historical data.
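
The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the values are hypothetical; a real deployment would keep the transactional database and the warehouse in separate systems, whereas this sketch runs both query styles against one in-memory table purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, quarter TEXT, amount REAL)"
)

# OLTP-style work: many short transactions, each touching a few rows.
insert_sql = "INSERT INTO orders (region, quarter, amount) VALUES (?, ?, ?)"
for row in [("North", "Q1", 100.0), ("North", "Q2", 150.0), ("South", "Q1", 200.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(insert_sql, row)

with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (120.0, 1))

# OLAP-style work: one query that aggregates over many rows.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(olap_rows)

conn.close()
```

The inserts and the update each run as their own short transaction, while the final query reads the whole table to produce per-region totals.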
