
Chapter 20: Data Analysis

Database System Concepts, 6th Ed.


©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Decision-Support Systems: Overview
 Data analysis tasks are simplified by specialized tools and SQL
extensions
 Example tasks
 For each product category and each region, what were the total
sales in the last quarter, and how do they compare with the
same quarter last year?
 As above, for each product category and each customer
category
 Statistical analysis packages (e.g., S++) can be interfaced with
databases
 Statistical analysis is a large field, but not covered here
 Data mining seeks to discover knowledge automatically in the form of
statistical rules and patterns from large databases.
 A data warehouse archives information gathered from multiple
sources, and stores it under a unified schema, at a single site.
 Important for large businesses that generate data from multiple
divisions, possibly at multiple sites
 Data may also be purchased externally
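
The first example task above can be sketched as a single SQL aggregate query, run here through Python's sqlite3 module. The sales table, its columns, and every row are hypothetical and exist only to make the example runnable; a real warehouse schema would differ.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        category TEXT, region TEXT, year INTEGER, quarter INTEGER, amount REAL
    );
    INSERT INTO sales VALUES
        ('clothing', 'east', 2017, 4, 100.0),
        ('clothing', 'east', 2017, 4,  50.0),
        ('clothing', 'east', 2016, 4, 120.0),
        ('clothing', 'west', 2017, 4,  80.0),
        ('clothing', 'west', 2016, 4,  60.0),
        ('toys',     'east', 2017, 4,  30.0),
        ('toys',     'east', 2016, 4,  45.0);
""")

def quarter_comparison(conn, year, quarter):
    """Total sales per (category, region) for one quarter and for the
    same quarter one year earlier."""
    query = """
        SELECT category, region,
               SUM(CASE WHEN year = :year     THEN amount ELSE 0 END) AS this_year,
               SUM(CASE WHEN year = :year - 1 THEN amount ELSE 0 END) AS last_year
        FROM sales
        WHERE quarter = :quarter AND year IN (:year, :year - 1)
        GROUP BY category, region
        ORDER BY category, region
    """
    return conn.execute(query, {"year": year, "quarter": quarter}).fetchall()

rows = quarter_comparison(conn, 2017, 4)
for category, region, this_year, last_year in rows:
    # The last column is the change relative to the same quarter last year.
    print(category, region, this_year, last_year, this_year - last_year)
```

Replacing region with a customer category column in the GROUP BY clause would give the second example task.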

Database System Concepts - 6th Edition 20.2 ©Silberschatz, Korth and Sudarshan
Data Warehousing
 Data sources often store only current data, not historical data
 Corporate decision making requires a unified view of all organizational
data, including historical data
 A data warehouse is a repository (archive) of information gathered
from multiple sources, stored under a unified schema, at a single site
 Greatly simplifies querying, permits study of historical trends
 Shifts decision support query load away from transaction
processing systems

Database System Concepts - 6th Edition 20.3 ©Silberschatz, Korth and Sudarshan
Data Warehouse vs. Operational DBMS
 OLTP (on-line transaction processing)
 Major task of traditional relational DBMS
 Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
 OLAP (on-line analytical processing)
 Major task of data warehouse system
 Data analysis and decision making
 Distinct features (OLTP vs. OLAP):
 Data contents: current, detailed vs. historical, consolidated
 Database design: ER, normalized, application-oriented design vs. star schema, subject-oriented design
 View: current, local vs. evolutionary, integrated
 Access patterns: update vs. read-only but complex queries

April 8, 2018 Data Mining: Concepts and Techniques
OLTP vs. OLAP
                    OLTP                              OLAP
users               clerk, IT professional            knowledge worker
function            day to day operations             decision support
DB design           application-oriented              subject-oriented
data                current, up-to-date;              historical; summarized,
                    detailed, flat relational;        multidimensional;
                    isolated                          integrated, consolidated
usage               repetitive                        ad-hoc
access              read/write;                       lots of scans
                    index/hash on prim. key
unit of work        short, simple transaction         complex query
# records accessed  tens                              millions
# users             thousands                         hundreds
DB size             100MB-GB                          100GB-TB
metric              transaction throughput            query throughput, response

Why Separate Data Warehouse?
 High performance for both systems
 DBMS: tuned for OLTP (access methods, indexing, concurrency control,
recovery)
 Warehouse: tuned for OLAP (complex OLAP queries, multidimensional
view, consolidation)
 Different functions and different data:
 Missing data: decision support requires historical data, which operational
DBs do not typically maintain
 Data consolidation: decision support requires consolidation (aggregation,
summarization) of data from heterogeneous sources
 Data quality: different sources typically use inconsistent data
representations, codes, and formats, which have to be reconciled
 Note: A growing number of systems perform OLAP analysis
directly on relational databases
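
The data quality and data consolidation points above can be illustrated with a minimal sketch that maps records from two sources into one unified schema and then aggregates them. The two source layouts, the region code table, and all field names are assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical records from two operational sources that disagree on
# field names, date formats, currency units, and region codes.
source_a = [{"cust": "C1", "region": "N. America", "date": "2018-04-08", "usd": 12.5}]
source_b = [{"customer_id": "C2", "reg_code": "NA", "sold_on": "08/04/2018", "cents": 990}]

# Assumed code table: every source spelling maps to one canonical code.
REGION_CODES = {"N. America": "NA", "NA": "NA", "Europe": "EU", "EU": "EU"}

def from_source_a(rec):
    """Convert a source A record to the unified warehouse schema."""
    return {
        "customer_id": rec["cust"],
        "region": REGION_CODES[rec["region"]],
        "sale_date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
        "amount_cents": round(rec["usd"] * 100),  # dollars to cents
    }

def from_source_b(rec):
    """Convert a source B record to the unified warehouse schema."""
    return {
        "customer_id": rec["customer_id"],
        "region": REGION_CODES[rec["reg_code"]],
        "sale_date": datetime.strptime(rec["sold_on"], "%d/%m/%Y").date(),
        "amount_cents": rec["cents"],
    }

def load_warehouse(a_records, b_records):
    """Reconcile both sources into one list of unified rows."""
    return [from_source_a(r) for r in a_records] + [from_source_b(r) for r in b_records]

def total_by_region(rows):
    """Consolidation step: aggregate the reconciled rows per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount_cents"]
    return totals

warehouse = load_warehouse(source_a, source_b)
print(total_by_region(warehouse))
```

Once both sources share one representation, the aggregate is a plain group-by; without the reconciliation step the two records would land in different region groups.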
From Tables to Data Cubes
 A data warehouse is based on a multidimensional data model which views
data in the form of a data cube
 A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions
 Dimension tables, such as item (item_name, brand, type), or time(day,
week, month, quarter, year)
 Fact table contains measures (such as dollars_sold) and keys to each of
the related dimension tables
 In data warehousing literature, an n-D base cube is called a base cuboid.
The topmost 0-D cuboid, which holds the highest level of summarization, is
called the apex cuboid. The lattice of cuboids forms a data cube.
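
As a rough sketch of the cuboid terminology above, the following code computes one group-by per subset of the dimensions, so n dimensions yield 2^n cuboids. The fact rows and the choice of three dimensions are made up for illustration; real systems materialize only some cuboids.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("item", "time", "location")

# Hypothetical fact rows: one dollars_sold measure per (item, time, location).
facts = [
    {"item": "TV", "time": "Q1", "location": "Canada", "dollars_sold": 10},
    {"item": "TV", "time": "Q2", "location": "Canada", "dollars_sold": 20},
    {"item": "PC", "time": "Q1", "location": "Mexico", "dollars_sold": 5},
    {"item": "PC", "time": "Q1", "location": "Canada", "dollars_sold": 7},
]

def cuboid(rows, dims):
    """Sum dollars_sold grouped by the given subset of dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["dollars_sold"]
    return dict(totals)

def data_cube(rows, dimensions):
    """Build the full lattice: one cuboid per subset of the dimensions."""
    return {
        dims: cuboid(rows, dims)
        for k in range(len(dimensions) + 1)
        for dims in combinations(dimensions, k)
    }

cube = data_cube(facts, DIMENSIONS)
apex = cube[()]          # 0-D apex cuboid: the grand total, {(): 42}
base = cube[DIMENSIONS]  # 3-D base cuboid: no summarization at all
print(len(cube), apex)
```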

A Concept Hierarchy: Dimension (location)

 all: all
 region: Europe, ..., North_America
 country: Germany, ..., Spain, Canada, ..., Mexico
 city: Frankfurt, ..., Vancouver, ..., Toronto
 office: L. Chan, ..., M. Wind
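
A concept hierarchy like the one above can be represented as a set of parent links. The sketch below includes only the members named on the slide, omits the office level, and uses made-up sales figures; the function names are illustrative.

```python
# Levels from most detailed to most general (office level omitted).
LEVELS = ("city", "country", "region", "all")

# Parent links for the members named in the location hierarchy.
PARENT = {
    "Frankfurt": "Germany",
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Germany": "Europe",
    "Spain": "Europe",
    "Canada": "North_America",
    "Mexico": "North_America",
    "Europe": "all",
    "North_America": "all",
}

def generalize(city, level):
    """Climb from a city to its ancestor at the requested level."""
    value = city
    for _ in range(LEVELS.index(level)):
        value = PARENT[value]
    return value

def roll_up(sales_by_city, level):
    """Aggregate city-level totals to a higher level of the hierarchy."""
    totals = {}
    for city, amount in sales_by_city.items():
        key = generalize(city, level)
        totals[key] = totals.get(key, 0) + amount
    return totals

sales_by_city = {"Frankfurt": 4, "Vancouver": 3, "Toronto": 5}  # made-up numbers
print(roll_up(sales_by_city, "country"))
print(roll_up(sales_by_city, "region"))
```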

Data Warehouse: A Multi-Tiered Architecture

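[Figure: four tiers, left to right]
 Data sources: operational DBs and other sources
 Data storage: data is extracted, transformed, loaded, and refreshed into the data warehouse and data marts, alongside metadata and a monitor & integrator
 OLAP engine: OLAP server, served from the data storage tier
 Front-end tools: analysis, query, reports, data mining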


Multidimensional Data

 Sales volume as a function of product, month, and region

Dimensions: Product, Location, Time

Hierarchical summarization paths
 Product: Industry > Category > Product
 Location: Region > Country > City > Office
 Time: Year > Quarter > Month / Week > Day
A Sample Data Cube

[Figure: 3-D sales data cube]
 Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
 Product: TV, PC, VCR, sum
 Country: U.S.A., Canada, Mexico, sum
 Highlighted cell: total annual sales of TV in U.S.A.

Cube: A Lattice of Cuboids

 0-D (apex) cuboid: all
 1-D cuboids: time; item; location; supplier
 2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
 3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
 4-D (base) cuboid: (time, item, location, supplier)
Conceptual Modeling of Data Warehouses

 Modeling data warehouses: dimensions & measures


 Star schema: A fact table in the middle connected to a
set of dimension tables
 Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized into a
set of smaller dimension tables, forming a shape
similar to a snowflake
 Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
Example of Star Schema
 Sales Fact Table
 Keys: time_key, item_key, branch_key, location_key
 Measures: units_sold, dollars_sold, avg_sales
 Dimension tables
 time (time_key, day, day_of_the_week, month, quarter, year)
 item (item_key, item_name, brand, type, supplier_type)
 branch (branch_key, branch_name, branch_type)
 location (location_key, street, city, state_or_province, country)
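
The star schema above could be declared and queried roughly as follows, using SQLite through Python's sqlite3 module. Column types, the sample rows, and the function name are assumptions; the avg_sales measure is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables, with the attributes listed in the star schema.
    CREATE TABLE time (
        time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
        month INTEGER, quarter INTEGER, year INTEGER);
    CREATE TABLE item (
        item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
        type TEXT, supplier_type TEXT);
    CREATE TABLE branch (
        branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
    CREATE TABLE location (
        location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
        state_or_province TEXT, country TEXT);

    -- Fact table: one foreign key per dimension, plus the measures.
    CREATE TABLE sales (
        time_key     INTEGER REFERENCES time(time_key),
        item_key     INTEGER REFERENCES item(item_key),
        branch_key   INTEGER REFERENCES branch(branch_key),
        location_key INTEGER REFERENCES location(location_key),
        units_sold   INTEGER,
        dollars_sold REAL);

    -- Made-up rows, for illustration only.
    INSERT INTO time VALUES (1, 15, 'Mon', 1, 1, 2018), (2, 20, 'Fri', 4, 2, 2018);
    INSERT INTO item VALUES (1, 'TV-A', 'BrandX', 'TV', 'wholesale'),
                            (2, 'PC-B', 'BrandY', 'PC', 'retail');
    INSERT INTO branch VALUES (1, 'B1', 'outlet');
    INSERT INTO location VALUES (1, '1 Main St', 'Toronto', 'Ontario', 'Canada'),
                                (2, '2 High St', 'Vancouver', 'British Columbia', 'Canada');
    INSERT INTO sales VALUES (1, 1, 1, 1, 2, 800.0),
                             (1, 2, 1, 2, 1, 600.0),
                             (2, 1, 1, 1, 1, 400.0);
""")

def dollars_by_country_quarter_type(conn):
    """Star join: the fact table joined to three of its dimension tables."""
    query = """
        SELECT l.country, t.quarter, i.type, SUM(s.dollars_sold)
        FROM sales AS s
        JOIN time     AS t ON t.time_key = s.time_key
        JOIN item     AS i ON i.item_key = s.item_key
        JOIN location AS l ON l.location_key = s.location_key
        GROUP BY l.country, t.quarter, i.type
        ORDER BY l.country, t.quarter, i.type
    """
    return conn.execute(query).fetchall()

star_join = dollars_by_country_quarter_type(conn)
print(star_join)
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.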
Example of Snowflake Schema
 Sales Fact Table
 Keys: time_key, item_key, branch_key, location_key
 Measures: units_sold, dollars_sold, avg_sales
 Dimension tables
 time (time_key, day, day_of_the_week, month, quarter, year)
 item (item_key, item_name, brand, type, supplier_key)
 supplier (supplier_key, supplier_type)
 branch (branch_key, branch_name, branch_type)
 location (location_key, street, city_key)
 city (city_key, city, state_or_province, country)
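
To make the normalization in the snowflake schema concrete, the sketch below stores the city attributes once in a city table and rebuilds the flat, star-style location rows with a view. The view name location_star and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake form: city attributes are split out of location.
    CREATE TABLE city (
        city_key INTEGER PRIMARY KEY, city TEXT,
        state_or_province TEXT, country TEXT);
    CREATE TABLE location (
        location_key INTEGER PRIMARY KEY, street TEXT,
        city_key INTEGER REFERENCES city(city_key));

    -- Made-up rows: two locations in the same city.
    INSERT INTO city VALUES (1, 'Toronto', 'Ontario', 'Canada');
    INSERT INTO location VALUES (10, '1 Main St', 1), (11, '2 King St', 1);

    -- Hypothetical view that flattens the snowflake back into the
    -- denormalized location dimension of the star schema.
    CREATE VIEW location_star AS
        SELECT l.location_key, l.street, c.city, c.state_or_province, c.country
        FROM location AS l
        JOIN city AS c ON c.city_key = l.city_key;
""")

star_rows = conn.execute(
    "SELECT * FROM location_star ORDER BY location_key"
).fetchall()
city_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
print(star_rows, city_count)
```

The trade-off is visible even at this size: the snowflake form stores the city attributes once, while queries that need the country pay for an extra join.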
Example of Fact Constellation
 Sales Fact Table
 Keys: time_key, item_key, branch_key, location_key
 Measures: units_sold, dollars_sold, avg_sales
 Shipping Fact Table
 Keys: time_key, item_key, shipper_key, from_location, to_location
 Measures: dollars_cost, units_shipped
 Dimension tables
 time (time_key, day, day_of_the_week, month, quarter, year)
 item (item_key, item_name, brand, type, supplier_type)
 branch (branch_key, branch_name, branch_type)
 location (location_key, street, city, province_or_state, country)
 shipper (shipper_key, shipper_name, location_key, shipper_type)
Typical OLAP Operations
 Roll up (drill-up): summarize data
 by climbing up hierarchy or by dimension reduction
 Drill down (roll down): reverse of roll-up
 from higher level summary to lower level summary or detailed data,
or introducing new dimensions
 Slice and dice: project and select
 Pivot (rotate):
 reorient the cube, visualization, 3D to series of 2D planes
 Other operations
 drill across: involving (across) more than one fact table
 drill through: through the bottom level of the cube to its back-end
relational tables (using SQL)
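
The core operations above can be sketched on a tiny in-memory cube stored as a dictionary from coordinate tuples to a measure. The cell values and function names are made up for illustration, and roll-up is shown only in its dimension-reduction form.

```python
from collections import defaultdict

# Hypothetical cells: (item, quarter, country) -> dollars_sold.
DIMS = ("item", "quarter", "country")
cells = {
    ("TV", "Q1", "Canada"): 10,
    ("TV", "Q2", "Canada"): 20,
    ("PC", "Q1", "Canada"): 7,
    ("PC", "Q1", "Mexico"): 5,
}

def roll_up(cube, dims, drop):
    """Roll up by dimension reduction: sum out one dimension."""
    i = dims.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out), dims[:i] + dims[i + 1:]

def slice_(cube, dims, dim, value):
    """Slice: fix one dimension to a single value and drop that axis."""
    i = dims.index(dim)
    out = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}
    return out, dims[:i] + dims[i + 1:]

def dice(cube, dims, allowed):
    """Dice: select a sub-cube by restricting several dimensions."""
    def keep(key):
        return all(key[dims.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}

def pivot(cube, dims, new_order):
    """Pivot: reorder the axes without changing any cell value."""
    idx = [dims.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}, tuple(new_order)

rolled, rolled_dims = roll_up(cells, DIMS, "quarter")
q1, q1_dims = slice_(cells, DIMS, "quarter", "Q1")
tv_only = dice(cells, DIMS, {"item": {"TV"}, "quarter": {"Q1", "Q2"}})
rotated, rotated_dims = pivot(cells, DIMS, ("country", "item", "quarter"))
print(rolled, q1, tv_only, rotated_dims)
```

Drill-down has no function of its own here because it cannot be computed from a summary: it means returning to the more detailed cells that the roll-up was computed from.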

Fig. 3.10: Typical OLAP Operations

End of Chapter
