
Paper Presentation

DATA WAREHOUSING


AND
DATA MINING

PRESENTED BY
N.Mukesh Kumar M.Jayalaxmi
03511A0522 03511A0515

DEPARTMENT OF COMPUTER SCIENCE


(C.S.E-3/4)

SISTAM COLLEGE OF ENGINEERING

SRIKAKULAM

Introduction
Online analytical processing (OLAP) and data mining represent some of the latest trends in
computing environments and IT applications for large-scale processing and analysis of data.
Data warehouses, along with OLAP tools, are increasingly being developed to analyze
historical data and identify past patterns or trends that may be useful in forecasting the future.

What Is Data Warehousing?

A data warehouse is an integrated store of information collected from other systems that
becomes the foundation for decision support and data analysis. Although there are many types of
data warehouses, based on different design methodologies and philosophical approaches, they all
have these common traits:

• Information is organized around the major subjects of the enterprise (for example, customers,
products, sales, or vendors), reflecting a data-driven design. Raw data is gathered from
nonintegrated operational and legacy applications, cleansed, and then summarized and
presented in a way that makes sense to end users.
• Based on feedback from end users and discoveries in the data warehouse, the data warehouse
architecture will change over time, reflecting the iterative nature of the process.
• The data warehousing process is inherently complex and, as a result, is costly and time-
consuming.

Why Build a Data Warehouse?

Business strategy requires answers to questions about the current business and its future
direction. This means that decisions must be taken quickly and correctly using all the available data.
As data volumes increase continuously and data must be processed ever faster, the need arises for
data warehousing technology that can organize and maintain large volumes of data and analyze them
within a few seconds, in the manner and depth required. Conventional information systems did not
succeed in meeting these requirements. Conventional systems and data warehouses address two
different activity domains: OLTP and OLAP.
In addition, a conventional information system is not capable of analyzing a large number of
past transactions or data records.

The cost of processing the data also increases as the volume of data increases. As a result,
analysts find data marts extremely useful for fast and easy analysis. Because data flows into a data
mart from the data warehouse, the department that owns the mart can easily customize the data.
Data marts are typically implemented using one of two OLAP approaches:
1. Multidimensional OLAP (MOLAP)
2. Relational OLAP (ROLAP)

Data Warehousing Framework

The goal of the Data Warehousing Framework is to simplify the design, implementation, and
management of data warehousing solutions. This framework has been designed to provide:
• Open architecture that is easily integrated with and extended by third-party vendors.
• Heterogeneous data import, export, validation, and cleansing services with optional data
lineage.
• Integrated metadata for data warehouse design, data extraction/transformation, server
management, and end-user analysis tools.
• Core management services for scheduling, storage management, performance monitoring,
alerts/events, and notification.

Determining Business, User, and Technical Requirements


Before a data warehouse can be built, a detailed project and implementation plan should be
written. The project and implementation plan includes:
• Building a business case.
• Gathering user requirements.
• Determining the technical requirements.
• Defining standard reports required by users.
• Analyzing client application tools being used.

Building the business case involves determining the business needs solved by the project, the
costs of the project, and the return on the investment.
The user requirements determine:
• Data requirements (level of granularity).
• Operational systems within the enterprise containing the data.
• Business rules followed by the data.
• Queries required to provide the users with data.

The technical requirements may involve determining:


• Hardware architecture and infrastructure.
• Backup and recovery mechanisms.
• Security guidelines.
• Methods of loading and transforming data from operational systems to the data warehouse.

Data Warehousing Framework Components

Building the data warehouse requires a set of components for describing the logical and
physical design of the data sources and their destinations in the enterprise data warehouse or data
mart.
To conform to definitions laid out during the design stage, operational data must pass
through a cleansing and transformation stage before being placed in the enterprise data warehouse or
data mart. This data staging process can be many levels deep, especially with enterprise data
warehousing architectures, but is necessarily simplified in this illustration.


End-user tools, including desktop productivity products, specialized analysis products, and
custom programs, are used to gain access to information in the data warehouse. Ideally, user access is
through a directory facility that enables end-user searches for appropriate and relevant data to resolve
questions and that provides a layer of security between the end users and the data warehouse systems.
Finally, a variety of components can come into play for the management of the data
warehousing environment, such as for scheduling repeated tasks and managing multiserver networks.
OLE DB provides standardized, high-performance access to a wide variety of data, and
allows for integration of multiple data types, such as:
• Mainframe indexed sequential access method/virtual storage access method (ISAM/VSAM)
and hierarchical databases
• E-mail and file system stores
• Text, graphical, and geographical data
• Custom business objects
Microsoft Repository provides an integrated metadata repository that is shared by the various
components used in the data warehousing process. Shared metadata allows for the transparent
integration of multiple products from a variety of vendors, without the need for specialized interfaces
between each of the products.

Data Warehousing Architecture


Many methodologies have been proposed to simplify the information technology efforts
required to support the data warehousing process on an ongoing basis. This has led to debates about the
best architecture for delivering data warehouses in organizations.

Two basic types of data warehouse architecture exist: enterprise data warehouses and data
marts.

The enterprise data warehouse contains enterprise-wide information integrated from multiple
operational data sources for consolidated data analysis. Typically, it is composed of several subject
areas, such as customers, products, and sales, and is used for both tactical and strategic decision
making. The enterprise data warehouse contains both detailed point-in-time data and summarized
information, and can range in size from 50 gigabytes (GB) to more than 1 terabyte. Enterprise data
warehouses can be very expensive and time-consuming to build and manage. They are usually created
from the top down by centralized information services organizations.
Enterprise data warehouses and data marts are constructed and maintained through the same
iterative process described earlier. Furthermore, both approaches share a similar set of technological
components.

Data Marts

A data mart is typically defined as a subset of the contents of a data warehouse, stored within
its own database. A data mart tends to contain data focused at the department level, or on a specific
business area. The data can exist at both the detail and summary levels. The data mart can be
populated with data taken directly from operational sources, similar to a data warehouse, or data taken
from the data warehouse itself. Because the volume of data in a data mart is less than that in a data
warehouse, query processing is often faster.

Characteristics of a data mart include:


• Quicker and simpler implementation.
• Lower implementation cost.
• Needs of a specific business unit or function met.
• Protection of sensitive information stored elsewhere in the data warehouse.
• Faster response times due to lower volumes of data.
• Distribution of data marts to user organizations.
• Built from the bottom upward.

Building a Data Warehouse from Data Marts

Data warehouses can be built using a top-down or bottom-up approach. Top-down describes
the process of building a data warehouse for the entire organization, containing data from multiple,
heterogeneous, operational sources. The bottom-up approach describes the process of building data
marts for departments, or specific business areas, and then joining them to provide the data for the
entire organization. Building a data warehouse from the bottom-up, by implementing data marts, is
often simpler because it is less ambitious.
A common approach to using data marts and a data warehouse involves storing all detail data
within the data warehouse, and summarized versions within data marts. Each data mart contains
summarized data per functional split within the business, such as sales region or product group, further
reducing the data volume per data mart.
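
The split between detail data in the warehouse and summaries in the data marts can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the method of any particular product. The table names (sales_detail, mart_north), the columns, and the sample rows are assumptions invented for the example.

```python
import sqlite3

# Hypothetical detail rows held in the warehouse: (region, product_group, amount).
DETAIL_ROWS = [
    ("north", "hardware", 120.0),
    ("north", "hardware", 80.0),
    ("north", "software", 50.0),
    ("south", "hardware", 200.0),
    ("south", "software", 75.0),
]


def build_warehouse(conn):
    """Load all detail rows into a single warehouse table."""
    conn.execute(
        "CREATE TABLE sales_detail (region TEXT, product_group TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", DETAIL_ROWS)


def build_region_mart(conn, region):
    """Create a summarized data mart table for one sales region."""
    # The region name comes from our own fixed data, never from user input.
    table = "mart_" + region
    conn.execute(
        f"CREATE TABLE {table} "
        "(product_group TEXT, total_amount REAL, sale_count INTEGER)"
    )
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT product_group, SUM(amount), COUNT(*) "
        "FROM sales_detail WHERE region = ? GROUP BY product_group",
        (region,),
    )
    return table


def read_mart(conn, table):
    """Return the mart rows ordered by product group."""
    return conn.execute(
        f"SELECT product_group, total_amount, sale_count FROM {table} "
        "ORDER BY product_group"
    ).fetchall()
```

Calling build_warehouse on an in-memory connection and then build_region_mart(conn, "north") leaves three detail rows for the north region in the warehouse but only two summary rows in the mart, which is the reduction in volume described above.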

Data Mart Considerations

Data marts can be useful additions or alternatives to the data warehouse, but issues to
consider before implementation include:
• Additional hardware and software.
• Time required to populate each data mart regularly.
• Consistency with other data marts and the data warehouse.
• Network access (if each data mart is located in a different geographical region).

OLAP
Online analytical processing (OLAP) systems, unlike conventional OLTP systems, are capable
of analyzing online a large number of transactions or data records (ranging from megabytes to
terabytes). This type of data is usually multidimensional, and multidimensionality is the key
driver for OLAP technology, which is central to data warehousing.
Multidimensional data is not handled well by a conventional SQL-based DBMS. For
complex real-world problems, the data is usually multidimensional in nature.
SQL cannot handle such data effectively, even if one manages to put the data into normalized
tables in a conventional relational database.
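
To make the idea of multidimensional data concrete, the sketch below stores a small sales cube keyed by three dimensions and answers a slice and a roll-up. The dimension names (product, region, month), the figures, and the helper names slice_cube and roll_up are assumptions for illustration only. Real OLAP servers use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical cube: each cell is addressed by (product, region, month)
# and holds the number of units sold.
DIMENSIONS = ("product", "region", "month")
CUBE = {
    ("laptop", "north", "jan"): 10,
    ("laptop", "north", "feb"): 12,
    ("laptop", "south", "jan"): 7,
    ("phone", "north", "jan"): 20,
    ("phone", "south", "feb"): 15,
}


def slice_cube(cube, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    positions = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return {
        key: units
        for key, units in cube.items()
        if all(key[pos] == value for pos, value in positions.items())
    }


def roll_up(cube, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[tuple(key[pos] for pos in positions)] += units
    return dict(totals)
```

For example, roll_up(CUBE, ["product"]) totals units per product across all regions and months, while roll_up(slice_cube(CUBE, month="jan"), ["region"]) first fixes the month and then totals per region.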

Integrated OLAP Analytical Capabilities

OLAP is an increasingly popular technology that can dramatically improve business analysis.
Historically, OLAP has been characterized by expensive tools, difficult implementation, and inflexible
deployment. OLAP Services is a new, fully featured OLAP capability provided as a component of
SQL Server 7.0. OLAP Services includes a middle-tier server that allows users to perform
sophisticated analysis on large volumes of data with exceptional results. OLAP Services also includes
a client-side cache and calculation engine called Microsoft PivotTable Service, which helps improve
performance and reduce network traffic. PivotTable Service allows end users to conduct analyses
while disconnected from the network.
OLAP Services is a middle-tier OLAP server that simplifies user navigation and helps
improve performance for queries against information in the data warehouse.
OLAP is a key component of data warehousing, and OLAP Services provides essential
functionality for a wide array of applications ranging from reporting to advanced decision support.
OLAP functionality within SQL Server 7.0 helps make multidimensional analysis much more
affordable and bring the benefits of OLAP to a wider audience, from smaller organizations to groups
and individuals within larger corporations. Coupled with the wide variety of tools and applications
supporting OLAP applications through Microsoft OLE DB for OLAP, OLAP Services helps increase
the number of organizations that have access to sophisticated analytical tools and can help reduce the
costs of data warehousing.
For more information about Microsoft SQL Server OLAP Services, see SQL Server Books
Online.

Data Warehousing and OLAP

Data Transformation Services (DTS) can function independently of SQL Server and can be used
as a stand-alone tool to transfer data from Oracle to any other ODBC or OLE DB-compliant database.
Accordingly, DTS can extract data from operational databases for inclusion in a data warehouse or
data mart for query and analysis.
In the illustration, the transaction data resides on an IBM DB2 transaction server. A package
is created using DTS to transfer and clean the data from the DB2 transaction server and to move it into
the data warehouse or data mart. In this example, the relational database server is SQL Server 7.0, and
the data warehouse is using OLAP Services to provide analytical capabilities. Client programs (such
as Excel) access the OLAP Services server using the OLE DB for OLAP interface, which is exposed
through a client-side component called Microsoft PivotTable service. Client programs using
PivotTable service can manipulate data in the OLAP server and can even change individual cells.

Data Warehousing Components

A data warehouse always consists of a number of components, including:


• Operational data sources.
• Design/development tools.
• Data extraction and transformation tools.
• Database management system (DBMS).
• Data access and analysis tools.
• System management tools.

Several years ago, Microsoft recognized the need for a set of technologies that would integrate
these components. This led to the creation of the Microsoft Data Warehousing Framework, a roadmap
not only for the development of Microsoft products such as SQL Server 7.0, but also for the
technologies necessary to integrate products from other vendors.

Data Warehousing Characteristics

A data warehouse can assist decision support and online analytical processing (OLAP)
applications because it provides data that is:
• Consolidated and consistent.
• Subject-oriented.
• Historical.

Consolidated and Consistent Data


A data warehouse consolidates operational data from a variety of sources with consistent
naming conventions, measurements, physical attributes, and semantics.
For example, in many organizations, applications can often use similar data in different
formats: dates can be stored in Julian or Gregorian format; true/false data can be represented as
one/zero, on/off, true/false, or positive/negative.
Data should be stored in the data warehouse in a single, acceptable format agreed to by
business analysts, despite variations in the external operational sources. This allows data from across
the organization, such as legacy data on mainframes, data in spreadsheets, or even data from the
Internet, to be consolidated in the data warehouse, and effectively cross-referenced, giving the analysts
a better understanding of the business.
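
The conversion to a single agreed format might look like the following sketch. The accepted spellings of true/false, the choice of YYYY-MM-DD as the warehouse date format, and the reading of "Julian format" as a year plus day-of-year string (YYYYDDD) are assumptions for this example, not rules taken from any specific system.

```python
from datetime import datetime

# Assumed source encodings of true/false, compared case-insensitively.
TRUE_VALUES = {"1", "one", "on", "true", "positive"}
FALSE_VALUES = {"0", "zero", "off", "false", "negative"}


def to_bool(raw):
    """Map the different source encodings of true/false onto a Python bool."""
    text = str(raw).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognized true/false value: {raw!r}")


def to_iso_date(raw):
    """Convert a YYYYDDD (day-of-year) or YYYY-MM-DD string to YYYY-MM-DD."""
    text = str(raw).strip()
    if len(text) == 7 and text.isdigit():
        parsed = datetime.strptime(text, "%Y%j")  # %j is the day of the year
    else:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    return parsed.date().isoformat()
```

Values that match none of the known encodings raise an error rather than being guessed, so inconsistent source data is caught during loading instead of surfacing later in analysis.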

Subject-oriented Data

Operational data sources across an organization tend to hold a large amount of data about a
variety of business-related functions, such as customer records, product information, and so on.
However, most of this information is also interspersed with data that has no relevance to business or
executive reporting, and is organized in a way that makes querying the data awkward. The data
warehouse organizes only the key business information from operational sources so that it is available
for business analysis.

Historical Data

Data in OLTP systems correctly represents the current value at any moment in time. For
example, an order-entry application always shows the current value of stock inventory; it does not
show the inventory at some time in the past. Querying the stock inventory a moment later may return a
different response. However, data stored in a data warehouse is accurate as of some past point in time
because the data stored represents historical information.
The data stored in a data warehouse typically represents data over a long period of time;
perhaps up to ten years or more. OLTP systems often contain only current data, because maintaining
large volumes of data used to represent ten years of information in an OLTP system can affect
performance. In effect, the data warehouse stores snapshots of a business’s operational data generated
over a long period of time. It is accurate for a specific moment in time and cannot change. This
contrasts with an OLTP system where data is always accurate and can be updated when necessary.
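
The contrast between a current-value OLTP record and warehouse snapshots can be sketched like this. The class names OltpInventory and WarehouseInventory, and the use of YYYY-MM-DD strings as snapshot dates, are hypothetical choices for illustration.

```python
import bisect


class OltpInventory:
    """Holds only the current stock level; each update overwrites the last."""

    def __init__(self):
        self.stock = {}

    def update(self, item, quantity):
        self.stock[item] = quantity


class WarehouseInventory:
    """Keeps dated snapshots that are appended once and never modified."""

    def __init__(self):
        self._dates = []      # snapshot dates, kept in ascending order
        self._snapshots = []  # stock dictionaries, parallel to _dates

    def load_snapshot(self, date, oltp):
        if self._dates and date <= self._dates[-1]:
            raise ValueError("snapshots must be loaded in date order")
        self._dates.append(date)
        # Copy the stock so later OLTP updates do not alter the snapshot.
        self._snapshots.append(dict(oltp.stock))

    def as_of(self, date, item):
        """Return the level recorded in the latest snapshot on or before date."""
        index = bisect.bisect_right(self._dates, date) - 1
        if index < 0:
            return None  # no snapshot exists that early
        return self._snapshots[index].get(item)
```

After several updates the OLTP object can report only the latest quantity, whereas as_of can still answer what the stock level was at any earlier snapshot date.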

Designing and Building a Data Warehouse and OLAP System


The steps required to build a data warehouse include:
• Determining business, user, and technical requirements.
• Designing and building the database.
• Extracting and loading data into the data warehouse.
• Designing and processing aggregations using OLAP tools.
• Querying and maintaining the data warehouse and OLAP databases.

Data Granularity

A significant difference between an OLTP or operational system and a data warehouse is the
granularity of the data stored. An operational system typically stores data at the lowest level of
granularity: the maximum level of detail. However, because the data warehouse contains data
representing a long period in time, simply storing all detail data from an operational system can result
in an overworked system that takes too long to query.
A data warehouse typically stores data in different levels of granularity or summarization, depending
on the data requirements of the business. The different levels of summarization in order of increasing
granularity are:
• Current operational data
• Historical operational data
• Aggregated data
• Metadata

Current and historical operational data are taken, unmodified, directly from operational systems.

Aggregated, or summary, data is a filtered version of the current operational data.

Metadata does not contain any operational data, but is used to document the way the data
warehouse is constructed.
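
Moving from detail to summarized data can be sketched as a roll-up along the time dimension. The sample transactions and the function name summarize are assumptions. The point is only that each coarser level holds fewer rows than the one below it.

```python
from collections import defaultdict

# Hypothetical detail rows: (date as YYYY-MM-DD, amount), one per transaction.
TRANSACTIONS = [
    ("2023-01-05", 100.0),
    ("2023-01-05", 40.0),
    ("2023-01-20", 60.0),
    ("2023-02-11", 25.0),
    ("2024-02-11", 75.0),
]

# Number of leading characters of the date string that identify each level.
LEVELS = {"day": 10, "month": 7, "year": 4}


def summarize(transactions, level):
    """Total the amounts per day, month, or year."""
    width = LEVELS[level]
    totals = defaultdict(float)
    for date, amount in transactions:
        totals[date[:width]] += amount
    return dict(totals)
```

With the sample data, the five detail rows shrink to four daily totals, three monthly totals, and two yearly totals, so a query that needs only yearly figures has far less data to read.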

Process Flow Within A Data Warehouse


The processes are:
1. Extract and load data.
2. Clean and transform data into a form that can cope with large data volumes.
3. Back up and archive data.
4. Manage queries and direct them to appropriate data source.
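
The four processes above can be sketched as a small pipeline. Everything here is hypothetical: the record layout, the rule that rows with a blank customer name are dropped during cleaning, and the rule that queries needing only totals are directed to a pre-computed summary.

```python
import copy


def extract_and_load(source_rows):
    """Step 1: copy raw rows from the source into a staging list."""
    return [dict(row) for row in source_rows]


def clean_and_transform(staged):
    """Step 2: drop unusable rows and normalize the remaining fields."""
    cleaned = []
    for row in staged:
        name = row.get("customer", "").strip()
        if not name:
            continue  # assumed rule: rows without a customer are rejected
        cleaned.append({"customer": name.title(), "amount": float(row["amount"])})
    return cleaned


def back_up(detail):
    """Step 3: keep an independent copy that later changes cannot affect."""
    return copy.deepcopy(detail)


def build_summary(detail):
    """Pre-compute the total amount per customer for fast answers."""
    summary = {}
    for row in detail:
        summary[row["customer"]] = summary.get(row["customer"], 0.0) + row["amount"]
    return summary


def run_query(detail, summary, customer, need_detail=False):
    """Step 4: direct the query to the detail rows or to the summary."""
    if need_detail:
        return [row for row in detail if row["customer"] == customer]
    return summary.get(customer, 0.0)
```

Directing total-only queries to the summary avoids scanning the detail rows, which is one way a query manager can pick the appropriate data source.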

Conclusion

Data warehousing with OLAP technology is useful for handling large amounts of data. It
is also useful for analyzing past data for forecasting, for graphical analysis, and for handling
multidimensional data. It enables faster analysis than conventional databases designed for OLTP.

