
Data Warehousing and Data Mining

Course: Basics of Management Information Systems


BBA
Symbiosis Centre for Management Studies Noida
Dr. Tarun Kumar Singhal 1
Preface of Data Warehousing

Many organizations have amassed vast amounts of data that
employees use to unlock valuable secrets to enable the
organization to compete successfully.

Some organizations do this extremely well, but others are
quite ineffective.

To use analytic tools to improve organizational decision-making,
a foundational data architecture and enterprise
architecture must be in place to facilitate effective decision
analysis.
Preface of Data Warehousing

Enabling decision analysis through access to all relevant
information is known as business intelligence.

Business intelligence includes data warehousing, online
analytical processing, data mining, and visualization and
multidimensionality.



Introduction of Data Warehousing

Data warehousing is the process of constructing and using a
data warehouse.

A data warehouse is constructed by integrating data from
multiple heterogeneous sources. It supports analytical
reporting, structured and/or ad hoc queries, and decision
making.

Data warehousing involves data cleaning, data integration,
and data consolidation.



Characteristics of Data Warehousing

Subject-oriented
Integrated
Time-variant (time series)
Non-volatile
Summarized
Not normalized
Sources
Metadata



Functions of Data Warehousing

Data Extraction: Involves gathering data from multiple
heterogeneous sources.
Data Cleaning: Involves finding and correcting errors
in the data.
Data Transformation: Involves converting the data from
legacy format to warehouse format.
Data Loading: Involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.
Refreshing: Involves updating the warehouse from the
data sources.
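The functions listed above can be sketched as a small pipeline. This is a minimal illustration only: the source records, field names, and function names below are hypothetical and are not taken from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical records from two heterogeneous source systems (illustrative data).
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2023-01-15"},
    {"customer": "Bob", "amount": "n/a", "date": "2023-01-16"},  # unparseable amount
]
erp_rows = [
    {"cust_name": "alice", "total": 80.0, "day": "2023-02-01"},
]


def extract():
    """Extraction: gather rows from both sources into one common tuple shape."""
    rows = [(r["customer"], r["amount"], r["date"]) for r in crm_rows]
    rows += [(r["cust_name"], r["total"], r["day"]) for r in erp_rows]
    return rows


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    cleaned = []
    for name, amount, date in rows:
        try:
            cleaned.append((name, float(amount), date))
        except (TypeError, ValueError):
            continue
    return cleaned


def transform(rows):
    """Transformation: normalize names and convert rows to the warehouse format."""
    return [
        {"customer": name.strip().title(), "amount": amount, "month": date[:7]}
        for name, amount, date in rows
    ]


def load(rows, warehouse):
    """Loading: sort the rows, then summarize amounts per (customer, month)."""
    for row in sorted(rows, key=lambda r: (r["customer"], r["month"])):
        warehouse[(row["customer"], row["month"])] += row["amount"]
    return warehouse


warehouse = load(transform(clean(extract())), defaultdict(float))
print(dict(warehouse))
```

Refreshing would correspond to running the same pipeline again when the sources change; a real warehouse would also check integrity and build indices and partitions during loading, which this sketch omits.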



Types of Data Warehousing
1. Enterprise Data Warehouse:
An enterprise data warehouse is a centralized warehouse. It provides
decision support services across the enterprise and offers a unified
approach for organizing and representing data.

2. Operational Data Store:
An operational data store (ODS) is used when the data warehouse cannot
support the organization's reporting needs. In an ODS, data is
refreshed in real time. Hence, it is widely preferred for routine activities
such as storing employee records.

3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a
particular line of business, such as sales or finance. In an
independent data mart, data can be collected directly from sources.
Components of Data Warehousing
Load manager:
The load manager is also called the front component. It performs all the
operations associated with the extraction and loading of data into the
warehouse. These operations include transformations to prepare the data
for entering into the data warehouse.

Warehouse Manager:
The warehouse manager performs operations associated with the management
of the data in the warehouse. These include analysis of data to
ensure consistency, creation of indexes and views, generation of
denormalizations and aggregations, transformation and merging of source
data, and archiving and backing up data.



Components of Data Warehousing
Query Manager:
The query manager is also known as the backend component. It performs all the
operations related to the management of user queries. It directs queries to
the appropriate tables and schedules the execution of queries.

End-user access tools:
These are categorized into five groups: 1. Data reporting tools 2.
Query tools 3. Application development tools 4. EIS tools 5. OLAP tools
and data mining tools.



Online Analytical Processing (OLAP)
OLAP is a category of software that allows users to analyze
information from multiple database systems at the same time.
OLAP is a powerful technology for data discovery, including capabilities for
limitless report viewing, complex analytical calculations, and predictive
“what if” scenario (budget, forecast) planning.
OLAP performs multidimensional analysis of business data and provides
the capability for complex calculations, trend analysis, and sophisticated
data modeling. It is the foundation for many kinds of business applications
for Business Performance Management, Planning, Budgeting, Forecasting,
Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and
Data Warehouse Reporting.
OLAP enables end-users to perform ad hoc analysis of data in multiple
dimensions, thereby providing the insight and understanding they need for
better decision making.
Advantages of OLAP
The more data a company can access about a specific activity, the more
likely it is that the plan to improve that activity will be effective. All
businesses collect data using many different systems, and the challenge
remains: how to get all the data together to create accurate, reliable, fast
information about the business. A company that can take advantage of its data
and turn it into shared knowledge, accurately and quickly, will be better
positioned to make successful business decisions and rise above the competition.

OLAP technology has been defined as the ability to achieve “fast access
to shared multidimensional information.” Given OLAP technology’s
ability to create very fast aggregations and calculations of underlying data
sets, one can understand its usefulness in helping business leaders make
better, quicker “informed” decisions.



Basic Operations of OLAP
Four types of analytical operations in OLAP are:

Roll-up

Drill-down

Slice and dice

Pivot (rotate)



Roll-up
Roll-up is also known as "consolidation" or "aggregation." The roll-up
operation can be performed in two ways:
Reducing dimensions
Climbing up a concept hierarchy. A concept hierarchy is a system
of grouping things based on their order or level.

Example
In this example, the city-level entries New Jersey and Los Angeles are rolled
up into the country USA.
The sales figures for New Jersey and Los Angeles are 440 and 1560,
respectively. They become 2000 after roll-up.
In this aggregation process, the location hierarchy moves up from city to
country.
When roll-up is performed by dimension reduction, one or more dimensions are
removed.
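The roll-up in this example can be sketched in a few lines. The sales figures are the ones quoted above; the dictionary layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Sales figures from the example, keyed by (country, city-level entry).
sales_by_city = {
    ("USA", "New Jersey"): 440,
    ("USA", "Los Angeles"): 1560,
}


def roll_up_to_country(city_sales):
    """Climb the location hierarchy from city to country by summing sales."""
    totals = defaultdict(int)
    for (country, _city), amount in city_sales.items():
        totals[country] += amount
    return dict(totals)


sales_by_country = roll_up_to_country(sales_by_city)
print(sales_by_country)  # {'USA': 2000}
```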
Drill-down
In drill-down, data is fragmented into smaller parts. It is the opposite of the
roll-up process. It can be done by:
Moving down the concept hierarchy
Increasing a dimension

Example
Quarter Q1 is drilled down to the months January, February, and March.
The corresponding sales are also divided.
In this example, the dimension “months” is added.
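A drill-down from quarter to month might look like the following sketch. The slides do not give the monthly figures, so the numbers below are hypothetical, as are the data layout and the function name.

```python
# Hypothetical monthly sales keyed by (quarter, month); not taken from the slides.
monthly_sales = {
    ("Q1", "January"): 100,
    ("Q1", "February"): 150,
    ("Q1", "March"): 200,
    ("Q2", "April"): 120,
}


def drill_down(quarter, sales):
    """Move down the time hierarchy: return the month-level figures for one quarter."""
    return {month: amount for (q, month), amount in sales.items() if q == quarter}


q1_by_month = drill_down("Q1", monthly_sales)
q1_total = sum(q1_by_month.values())
print(q1_by_month)
print(q1_total)
```

Summing the monthly figures gives back the quarterly total, which is why drill-down is the inverse of roll-up.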



Slice
Here, one dimension is selected, and a new sub-cube is created.

The following example explains how the slice operation is performed:

The dimension Time is sliced with Q1 as the filter.
A new sub-cube is created altogether.

Dice
This operation is similar to a slice. The difference is that in dice you select
two or more dimensions, which results in the creation of a sub-cube.
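Slice and dice can both be viewed as filters over the cube's fact rows. The sketch below assumes a cube stored as a list of rows with three dimensions; the rows, values, and function names are hypothetical.

```python
# Hypothetical fact rows with three dimensions (time, location, item) and one measure.
cube = [
    {"time": "Q1", "location": "Noida", "item": "Mobile", "sales": 300},
    {"time": "Q1", "location": "Pune", "item": "Modem", "sales": 120},
    {"time": "Q2", "location": "Noida", "item": "Mobile", "sales": 350},
    {"time": "Q2", "location": "Pune", "item": "Mobile", "sales": 90},
    {"time": "Q3", "location": "Noida", "item": "Modem", "sales": 200},
]


def slice_cube(rows, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [row for row in rows if row[dimension] == value]


def dice_cube(rows, criteria):
    """Dice: restrict two or more dimensions; criteria maps dimension -> allowed values."""
    return [
        row for row in rows
        if all(row[dim] in allowed for dim, allowed in criteria.items())
    ]


q1_slice = slice_cube(cube, "time", "Q1")
noida_h1 = dice_cube(cube, {"time": {"Q1", "Q2"}, "location": {"Noida"}})
print(len(q1_slice), sum(row["sales"] for row in noida_h1))
```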



Pivot
In pivot, you rotate the data axes to provide an alternative presentation of
the data.
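As a minimal sketch, a pivot on a two-dimensional view swaps rows and columns. The table contents and the function name are hypothetical.

```python
# Hypothetical view: rows are quarters, columns are locations.
table = {
    "Q1": {"Noida": 300, "Pune": 120},
    "Q2": {"Noida": 350, "Pune": 90},
}


def pivot(view):
    """Rotate the axes: the column keys become row keys and vice versa."""
    rotated = {}
    for row_key, columns in view.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_location = pivot(table)
print(by_location["Noida"])
```

The values are unchanged; only the orientation differs, so pivoting twice returns the original view.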



Data Mining
Data mining is the process of sorting through large data sets
to identify patterns and establish relationships to solve
problems through data analysis. Data mining tools allow
enterprises to predict future trends.

In data mining, association rules are created by analyzing data
for frequent if/then patterns, then using the support and
confidence criteria to locate the most important
relationships within the data. Support is how frequently the
items appear in the database, while confidence is the proportion
of times the if/then statement is found to be true.
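Support and confidence can be computed directly from a list of transactions. The sketch below uses the common definitions (support as the fraction of transactions containing an itemset, confidence as the support of the whole rule divided by the support of the "if" part); the transactions and function names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule "if bread then butter" has support 0.5 (two of four transactions) and confidence of about 0.67 (two of the three transactions that contain bread).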



Data Mining
Other data mining parameters include Sequence or Path
Analysis, Classification, Clustering and Forecasting.
Sequence or Path Analysis parameters look for patterns where
one event leads to another later event.

A Sequence is an ordered list of sets of items, and it is a
common type of data structure found in many databases. A
Classification parameter looks for new patterns, and might
result in a change in the way the data is organized.
Classification algorithms predict variables based on other
factors within the database.
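One simple way to look for "one event leads to another later event" is to count, across sequences, how often an event is followed later by another. This is only an illustrative sketch: the page-visit sequences and the function name are hypothetical, and real sequence mining algorithms are considerably more involved.

```python
from collections import Counter
from itertools import combinations

# Hypothetical page-visit sequences, one list per visitor.
sequences = [
    ["home", "product", "cart", "checkout"],
    ["home", "product", "home"],
    ["product", "cart"],
]


def count_followed_by(seqs):
    """Count in how many sequences each event is followed later by another event."""
    counts = Counter()
    for seq in seqs:
        seen = set()
        # combinations() keeps positional order, so `earlier` always precedes `later`.
        for earlier, later in combinations(seq, 2):
            pair = (earlier, later)
            if earlier != later and pair not in seen:
                seen.add(pair)
                counts[pair] += 1
    return counts


pair_counts = count_followed_by(sequences)
print(pair_counts[("product", "cart")])
```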



Data Mining Tools and Techniques
Data mining techniques are used in many research areas,
including mathematics, cybernetics, genetics and
marketing. Data mining techniques are a means to
drive efficiencies and predict customer behavior. Used
correctly, they let a business set itself apart from its competition
through the use of predictive analysis.

Web mining, a type of data mining used in customer
relationship management, integrates information gathered by
traditional data mining methods and techniques over the web.
Web mining aims to understand customer behavior and to
evaluate how effective a particular website is.





Data Mining Tools and Techniques
Other data mining techniques include network approaches
based on multitask learning for classifying patterns, ensuring
parallel and scalable execution of data mining algorithms, the
mining of large databases, the handling of relational and
complex data types, and machine learning.

Machine learning is a type of data mining technique that uses
algorithms to learn from data and make predictions.



Benefits of Data Mining
In general, the benefits of data mining come from the ability to
uncover hidden patterns and relationships in data that can be used
to make predictions that impact businesses.
Specific data mining benefits vary depending on the goal and the
industry. Sales and marketing departments can mine customer data
to improve lead conversion rates or to create one-to-one
marketing campaigns. Data mining information on historical sales
patterns and customer behaviors can be used to build prediction
models for future sales, new products and services.
Companies in the financial industry use data mining tools to build
risk models and detect fraud. The manufacturing industry uses data
mining tools to improve product safety, identify quality issues,
manage the supply chain and improve operations.



OLTP
OLTP (online transaction processing) is a class of software
programs capable of supporting transaction-oriented applications
on the Internet.

Typically, OLTP systems are used for order entry, financial
transactions, customer relationship management (CRM) and retail
sales. Such systems have a large number of users who conduct short
transactions. Database queries are usually simple, require sub-second
response times and return relatively few records.

An important attribute of an OLTP system is its ability to maintain
concurrency. To avoid single points of failure, OLTP systems are
often decentralized.
OLTP vs OLAP
We can divide IT systems into transactional (OLTP) and analytical
(OLAP). OLTP systems provide source data to data warehouses,
whereas OLAP systems help to analyze it.



OLTP vs OLAP
OLTP (On-line Transaction Processing) is characterized by a
large number of short on-line transactions (INSERT, UPDATE,
DELETE). The main emphasis for OLTP systems is on very fast
query processing and on maintaining data integrity in multi-access
environments. Effectiveness is measured by the number of
transactions per second. An OLTP database holds detailed and
current data.

OLAP (On-line Analytical Processing) is characterized by a
relatively low volume of transactions. Queries are often very
complex and involve aggregations. For OLAP systems, response
time is the effectiveness measure. OLAP applications are widely used
by data mining techniques. An OLAP database holds aggregated
and historical data.
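The contrast between the two workloads can be illustrated with Python's built-in sqlite3 module. This is a sketch under simplifying assumptions: the orders table and its contents are hypothetical, and a single in-memory database stands in for what would normally be separate transactional and analytical systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many short transactions, each touching only a few rows.
# Each `with conn:` block commits on success and rolls back on error.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 100.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("South", 250.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 50.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (120.0, 1))

# OLAP-style work: a single query that aggregates over many rows.
totals = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
)
print(totals)
conn.close()
```

The inserts and the update are the detailed, current data that an OLTP system maintains; the grouped sum is the kind of aggregated view an OLAP query produces.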