
Data Warehouse and Data Mining
Course Mind Map
General Competency
• Students are able to analyze data by applying
data mining techniques to make business
decisions in a company (C4)
Introduction to Data
Warehousing
Sessions 1 & 2
Specific Competency
• Students are able to explain the concept of a
data warehouse, its characteristics, data
warehouse architecture, and the relationship and
trends between data warehousing and business (C2)
Topics
1. Definition of Data Warehouse
2. Key Characteristics of a Data Warehouse
3. Data Warehouse Architecture
4. Data Warehouse and Business
5. Trends in Data Warehousing
1. Definition of Data Warehouse
1.1 Quick Question

What is a database?

Business
Data
Goals
1.1 Quick Question (Cont..)

Can a database
answer
questions like
these?
1.2 The Question
• What is the cost of staff to break into a new
line of business?
• What are the travel routes of my competition’s
inventory?
• At what velocity is my competitor moving
toward a common goal?
• How will a transaction on a certain date be
affected by currency exchange rates?
1.2 The Question (Cont..)
• Is a foreign labor source likely to produce a
higher quality product?
• By product and location, how can we regain a
lost customer base?
• Which skill and staff levels are most likely to
accept the voluntary layoff package?
1.3 What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained
separately from the organization’s operational
database
– Supports information processing by providing a solid
platform of consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in support of
management’s decision-making process.” (W. H. Inmon)
• Data warehousing:
– The process of constructing and using data warehouses
1.3 What is a Data Warehouse? (Cont..)
• A data warehouse is a pool of data organized in a
format that enables users to interpret data and
convert it into useful information to gain knowledge
from this interpretation.
• It is a single place that contains complete and
consistent data from multiple sources.
• Data warehousing is the act of a business person
extracting business value from the data stored in the
data warehouse.
2. Key Characteristics of a
Data Warehouse
2.1 Data Warehouse—Subject-Oriented
• Organized around major subjects, such as
customer, product, sales.
• Focusing on the modeling and analysis of data
for decision makers, not on daily operations or
transaction processing.
• Provide a simple and concise view around
particular subject issues by excluding data that
are not useful in the decision support process.
2.1 Data Warehouse Subject-Oriented (Cont..)
• DW organized around major subjects
– Insurance company: customer, premium,
claim
• Conventional database organized around
applications
– Insurance company: auto, health, life
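The contrast above can be sketched in a few lines of Python. This is only an illustration: the three policy lists, the field names, and the `by_customer` helper are hypothetical and do not come from any real system. It regroups records held per application (auto, health, life) around the customer subject.

```python
from collections import defaultdict

# Hypothetical records as they might sit in three application-oriented systems.
auto_policies = [{"customer": "C1", "premium": 500, "claim": 0}]
health_policies = [
    {"customer": "C1", "premium": 300, "claim": 120},
    {"customer": "C2", "premium": 250, "claim": 0},
]
life_policies = [{"customer": "C2", "premium": 400, "claim": 0}]


def by_customer(**applications):
    """Regroup application-oriented records around the customer subject."""
    subject_view = defaultdict(lambda: {"premium": 0, "claim": 0, "lines": []})
    for line_of_business, records in applications.items():
        for rec in records:
            entry = subject_view[rec["customer"]]
            entry["premium"] += rec["premium"]
            entry["claim"] += rec["claim"]
            entry["lines"].append(line_of_business)
    return dict(subject_view)


customer_view = by_customer(
    auto=auto_policies, health=health_policies, life=life_policies
)
```

Each entry in `customer_view` now answers a question about one customer across all lines of business, which the separate application stores cannot do directly.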
2.2 Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction
records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources
• E.g., Hotel price: currency, tax, breakfast covered,
etc.
– When data is moved to the warehouse, it is converted.
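The hotel price example can be sketched as a small conversion step. Everything here is an assumption for illustration: the exchange rates, the field names, and the target convention (price in USD, tax included, breakfast as a boolean) are made up, and real rates would come from a reference table.

```python
# Hypothetical exchange rates to USD (illustrative values only).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


def to_warehouse_row(source_row):
    """Convert one source record to the assumed warehouse convention:
    price in USD, tax included, breakfast as an explicit boolean."""
    price = source_row["price"] * RATES_TO_USD[source_row["currency"]]
    if not source_row["tax_included"]:
        price *= 1 + source_row["tax_rate"]
    return {
        "hotel": source_row["hotel"].strip().title(),  # consistent naming
        "price_usd_incl_tax": round(price, 2),
        "breakfast_included": bool(source_row["breakfast"]),
    }


# Two sources describing the same room with different conventions.
source_a = {"hotel": "grand plaza ", "price": 100.0, "currency": "EUR",
            "tax_included": False, "tax_rate": 0.10, "breakfast": 1}
source_b = {"hotel": "Grand Plaza", "price": 121.0, "currency": "USD",
            "tax_included": True, "tax_rate": 0.10, "breakfast": True}

rows = [to_warehouse_row(source_a), to_warehouse_row(source_b)]
```

After conversion the two rows use the same hotel name, currency, tax treatment, and breakfast encoding, so their prices can be compared directly.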
2.3 Data Warehouse—Time Variant
• The time horizon for the data warehouse is
significantly longer than that of operational systems.
– Operational database: current value data.
– Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not
contain “time element”.
2.3 Data Warehouse—Time Variant
(Cont..)
Operational DB
• key may or may not have an element of time
• time horizon 60-90 days

Data Warehouse
• key contains an element of time
• time horizon 5-10 years
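A minimal sketch of the difference in key structure, assuming a hypothetical account balance example (the names `operational`, `warehouse`, `record_balance`, and `balance_as_of` are invented for illustration). The operational store is keyed by account only, while the warehouse key also carries a snapshot date.

```python
from datetime import date

# Operational store: keyed by account only, holds just the current value.
operational = {}
# Warehouse store: the key includes a time element, so history is kept.
warehouse = {}


def record_balance(account_id, balance, snapshot_date):
    operational[account_id] = balance                 # overwrites previous value
    warehouse[(account_id, snapshot_date)] = balance  # adds a new dated row


def balance_as_of(account_id, day):
    """Return the most recent warehoused balance on or before `day`, else None."""
    dated = [
        (d, b) for (acc, d), b in warehouse.items()
        if acc == account_id and d <= day
    ]
    return max(dated)[1] if dated else None


record_balance("A-1", 100, date(2023, 1, 31))
record_balance("A-1", 150, date(2023, 2, 28))
```

Only the warehouse can answer a historical question such as `balance_as_of("A-1", date(2023, 2, 1))`; the operational store has already overwritten the January value.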
2.4 Data Warehouse—Non-Volatile
• A physically separate store of data transformed
from the operational environment.
• Operational update of data does not occur in the
data warehouse environment.
– Does not require transaction processing,
recovery, and concurrency control mechanisms
– Requires only two operations in data
accessing:
• initial loading of data and access of data.
2.4 Data Warehouse Non-Volatile
(Cont..)
Operational DB
• Operations: Insert, Change, Delete, Select
• Data can be updated
• One record at a time

Data Warehouse
• Operations: Load, Access
• Data is not updated (snapshot)
• Loaded usually en masse
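The two operations of a non-volatile store can be sketched as an append-only table. The class `WarehouseTable` and its method names are hypothetical and chosen only to mirror "load" and "access"; it deliberately offers no update or delete.

```python
class WarehouseTable:
    """Append-only table: supports bulk load and read access, nothing else."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Bulk load: rows are copied in as immutable tuples.
        self._rows.extend(tuple(r) for r in rows)

    def access(self, predicate=lambda row: True):
        # Read-only access: returns a new list of the matching rows.
        return [r for r in self._rows if predicate(r)]


sales = WarehouseTable()
sales.load([("2023-01", "widget", 10), ("2023-02", "widget", 12)])  # initial load
sales.load([("2023-03", "widget", 9)])                               # later batch
february = sales.access(lambda r: r[0] == "2023-02")
```

Because rows are never changed in place, this sketch needs no transaction, recovery, or concurrency control logic, which matches the point made in the bullets above.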
3. Data Warehouse
Architecture
Fig 1. Core elements of the Kimball DW/BI architecture

Source: Kimball, Ralph and Ross, Margy, 2013


Fig 1. Simplified illustration of the independent
data mart “architecture.”
Source: Kimball, Ralph and Ross, Margy, 2013
4. Data Warehouse and
Business
Relationship Between Data and Business

Data → Information → Knowledge → Decision

QUANTITY → QUALITY
• The DW/BI system must make information
easily accessible.
• The DW/BI system must present information
consistently.
• The DW/BI system must adapt to change.
• The DW/BI system must present information
in a timely way.
• The DW/BI system must be a secure bastion
that protects the information assets.
• The DW/BI system must serve as the
authoritative and trustworthy foundation for
improved decision making.
• The business community must accept the
DW/BI system to deem it successful.
5. Trends in Data
Warehousing
• Several current trends in data warehousing
are unstructured data, search,
service-oriented architecture, and
real-time data warehousing.
5.1 Unstructured data
• Unstructured data does not have a data
structure such as rows and columns, a
tree-like structure, or classes and types.
• Examples of unstructured data are
documents, images (photos, diagrams, and
pictures), audio (songs, speeches, and
sounds), video (films, animations), streaming
data, text, e-mails, and Internet web sites.
5.2 Search
• How do we get the information out? The answer is by
searching. To get information out of unstructured
data, especially text data such as documents, e-mails,
and web pages, you do a search.
• Like on the Internet, the search engine has already
crawled the data warehouse and indexed the
unstructured data.
• People collect huge amounts of unstructured
data.
• Search is easier to use, even for structured data.
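One common way a search engine indexes text is an inverted index, which maps each word to the documents that contain it. The sketch below is a simplified illustration, not how any particular product works; the document ids and texts are made up.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical unstructured items: an e-mail, a document, a web page.
documents = {
    "mail-1": "Quarterly sales report attached",
    "doc-7": "Sales forecast for next quarter",
    "web-3": "Company picnic photos",
}
index = build_index(documents)
hits = search(index, "sales report")
```

The index is built once, when the data is crawled, so each later query is a handful of set lookups rather than a scan of every document.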
5.3 Service-oriented Architecture
• SOA is a method of building an application using a
number of smaller, independent components that
talk to each other by offering and consuming their
services.
• A data warehouse system consists of many
components: source systems, ETL systems, a data
quality mechanism, a metadata system, audit and
control systems, a BI portal, a reporting application,
OLAP/analytic applications, data mining
applications, and the database system itself.
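The idea of components offering and consuming services can be sketched in a single process. In a real SOA the components would talk over a network protocol; here a hypothetical `ServiceRegistry` stands in for that, and the service name and validation rule are invented for illustration.

```python
class ServiceRegistry:
    """Toy stand-in for a service bus: components offer and consume by name."""

    def __init__(self):
        self._services = {}

    def offer(self, name, func):
        self._services[name] = func

    def consume(self, name, *args, **kwargs):
        return self._services[name](*args, **kwargs)


registry = ServiceRegistry()

# The data quality component offers a validation service (hypothetical rule).
registry.offer("data_quality.is_valid", lambda row: row.get("amount", 0) >= 0)


# The ETL component consumes that service without knowing how it is implemented.
def etl_load(rows):
    return [r for r in rows if registry.consume("data_quality.is_valid", r)]


loaded = etl_load([{"amount": 5}, {"amount": -1}])
```

The ETL code depends only on the service name, so the data quality rule can be replaced without touching the ETL component.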
5.4 Real-time Data Warehousing
• A few years ago, a data warehouse was usually
updated every day or every week.
• A real-time data warehouse is a data
warehouse that is updated (by the ETL) the
moment the transaction happens in the
source system.
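The difference between periodic and real-time loading can be sketched as follows. The `SourceSystem` class, the listener mechanism, and `nightly_batch_load` are assumptions made for illustration; a real ETL tool would use its own change capture mechanism.

```python
class SourceSystem:
    """Toy source system that notifies listeners when a transaction happens."""

    def __init__(self):
        self.transactions = []
        self.listeners = []

    def record(self, txn):
        self.transactions.append(txn)
        for listener in self.listeners:
            listener(txn)  # real-time path: push the transaction immediately


batch_warehouse = []     # updated only when the batch job runs
realtime_warehouse = []  # updated the moment each transaction happens

source = SourceSystem()
source.listeners.append(realtime_warehouse.append)


def nightly_batch_load():
    # Batch path: copy everything not yet loaded since the previous run.
    already_loaded = len(batch_warehouse)
    batch_warehouse.extend(source.transactions[already_loaded:])


source.record({"id": 1, "amount": 20})
source.record({"id": 2, "amount": 35})
rows_before_batch = (len(batch_warehouse), len(realtime_warehouse))
nightly_batch_load()
```

Before the batch job runs, only the real-time warehouse has the two new transactions; the batch warehouse catches up at the next scheduled load.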
Summary
• A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data in support of management’s
decision-making process.
• The key characteristics of a data warehouse are
that it is subject-oriented, integrated,
time-variant, and nonvolatile.
• Most data warehouses are used for business
intelligence to enhance CRM and for data mining.
THANK YOU
