
QIS COLLEGE OF ENGINEERING & TECHNOLOGY

ONGOLE-523272.

DATA MINING AND DATA WAREHOUSE

Dept. of Computer Science Engineering

PRESENTED BY:

P.NagaBhrmaChari (05491A0534), III-B.Tech, CSE
E-Mail: nbc_style@yahoo.com, Mobile: 9290296819

N.Mahesh (05491A0532), III-B.Tech, CSE
E-Mail: mahesh532_qis@yahoo.com, Mobile: 9393020972

ABSTRACT:
A data warehouse is a repository of integrated information, available for queries and
analysis. Data and information are extracted from non-homogeneous sources as they are
generated and are processed by process managers (load/warehouse/query). This makes it much
easier and more efficient to run queries over data that originally came from different sources. It
also enables people to make informed decisions.
Data mining draws on the data warehouse, revealing patterns of information in
historical data, whether customer data or any other data, in ways that were never thought
possible. It combines techniques such as statistical analysis, data visualization, induction and neural
networks. Data mining systems improve an organization’s effectiveness, efficiency and value by
increasing the usefulness of the knowledge the organization possesses.
Data warehousing & Data Mining- A View:
Data warehousing- Definition: Data warehouses are built to support large data volumes
(above 100 GB of data) cost-effectively. The underlying store can be a relational database,
multidimensional database, flat file, hierarchical database, object database, etc.
Data warehouse- goals:
The fundamental goal is to give users appropriate access to a homogenized and
comprehensive view of the organization. The warehouse also supports forecasting, planning and
decision-making processes. Additional goals are to achieve information consistency and to provide
security and adaptability.
Data warehouse-process flow:
The process flow is represented as follows:

1 Extract and load the data: Data extraction involves extracting the data from source systems
and making it available to the data warehouse, whereas data load takes the extracted data and
loads it into the data warehouse.
2 Clean and transform data: This step performs consistency checks on the loaded data and then
structures it for query performance and for minimizing operational costs.
3 Back up and archive data: The data is backed up regularly, and older data is removed
from the system in a format that allows it to be quickly restored if required.
4 Query management: This step manages queries and speeds them up by directing them to the
most effective data source; it also monitors the actual query profiles.
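The four steps above can be illustrated with a small Python sketch. It is a minimal example under stated assumptions, not a production ETL tool: the two source systems (CRM_CSV and BILLING_ROWS), the table names and the consistency rules are all hypothetical, and an in-memory SQLite database stands in for the warehouse. Unlike step 2 above, this sketch runs the consistency checks just before the final load rather than on already-loaded data, and backup and archiving are omitted.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: an operational system exporting CSV text.
CRM_CSV = """customer_id,name,city
1, alice ,ongole
2,BOB,Guntur
"""

# Hypothetical source 2: a second system with different field names
# (a "non-homogeneous" source).
BILLING_ROWS = [
    {"cust": 2, "amount": "150.00"},
    {"cust": 1, "amount": "99.50"},
    {"cust": 1, "amount": "not-a-number"},  # bad row, rejected by the checks
]


def extract():
    """Extract: pull raw records out of each source system."""
    customers = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return customers, list(BILLING_ROWS)


def clean_and_transform(customers, billing):
    """Clean/transform: consistency checks plus one common format."""
    dim = [(int(c["customer_id"]), c["name"].strip().title(), c["city"].strip().title())
           for c in customers]
    known_ids = {row[0] for row in dim}
    facts = []
    for b in billing:
        try:
            amount = float(b["amount"])
        except ValueError:
            continue  # drop rows that fail the numeric check
        if b["cust"] in known_ids:  # referential consistency check
            facts.append((b["cust"], amount))
    return dim, facts


def load(conn, dim, facts):
    """Load: write the integrated rows into warehouse tables."""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", dim)
    conn.executemany("INSERT INTO sales VALUES (?, ?)", facts)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the warehouse
load(conn, *clean_and_transform(*extract()))

# Query step: one SQL query now spans data that came from two sources.
totals = conn.execute(
    "SELECT c.name, SUM(s.amount) FROM customer c JOIN sales s "
    "ON s.customer_id = c.id GROUP BY c.name ORDER BY c.name"
).fetchall()
print(totals)
```

Once the data is integrated, a single query answers a question (total sales per customer) that would otherwise need both source systems.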

Data warehouse architecture:


The architecture is made up of a number of interconnected parts:
1 Operational database/External database layer: Operational systems process data to
support critical operational needs. To do that, operational databases have been historically
created to provide an efficient processing structure for a relatively small number of well-
defined business transactions.
2 Information Access Layer: This is the layer that the end-user deals with directly. In
particular, it represents the tools that the end-user normally uses day to day.
e.g.: Excel, Lotus 1-2-3, etc.
3 Data Access Layer: The Data Access Layer of the Data Warehouse Architecture is involved
with allowing the Information Access Layer to talk to the Operational Layer.
4 Data Directory (Meta-data) Layer: Meta-data is data about data within the enterprise.
For example, the record description in a COBOL program is meta-data.

Figure – Data Warehouse Architecture


5 Process Management Layer: The Process Management Layer is involved in scheduling the
various tasks that must be accomplished to build and maintain the data warehouse and data
directory.
6 Application Messaging Layer: The Application Messaging Layer has to do with transporting
information around the enterprise computing network. Application messaging is also referred
to as “middleware”, but it can involve more than just networking protocols.
7 Data Warehouse (physical) layer: The (core) data warehouse is where the actual data used
primarily for informational purposes resides. In a physical data warehouse, copies (in some cases
many copies) of operational and/or external data are actually stored in a form that is easy to
access.
8 Data Staging Layer: Data staging is also called copy management or replication
management, but in fact it includes all of the processes necessary to select, edit, summarize,
combine and load data warehouse and information access data from operational and/or
external databases.
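The roles of the Data Directory (meta-data) layer and the Data Staging layer can be sketched together in Python. Here a record description, similar in spirit to the one a COBOL program declares, is stored as ordinary data and used to parse fixed-width records, which are then selected, edited and summarized before loading. The layout, the field names and the min_balance rule are hypothetical, chosen only for illustration.

```python
# Hypothetical meta-data: a record description stored as data,
# as (field name, start offset, length, type).
CUSTOMER_LAYOUT = [
    ("cust_id", 0, 5, int),
    ("name", 5, 10, str),
    ("balance", 15, 8, float),
]


def parse_record(line, layout):
    """Use the meta-data to turn one fixed-width record into a dict."""
    record = {}
    for name, start, length, cast in layout:
        raw = line[start:start + length].strip()
        record[name] = cast(raw)
    return record


def stage(lines, layout, min_balance=0.0):
    """Staging: select, edit and summarize records before loading."""
    rows = [parse_record(line, layout) for line in lines]
    selected = [r for r in rows if r["balance"] >= min_balance]  # select
    for r in selected:
        r["name"] = r["name"].title()  # edit into a consistent format
    summary = {"count": len(selected),
               "total_balance": sum(r["balance"] for r in selected)}  # summarize
    return selected, summary


# Two fixed-width records from a hypothetical operational file.
source = [
    "00001ALICE     00120.50",
    "00002BOB       -0010.00",
]
rows, summary = stage(source, CUSTOMER_LAYOUT)
print(rows)
print(summary)
```

Because the layout lives in data rather than in code, a change to the record format means editing the meta-data, not the parsing logic.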

Data Warehouse-applications: Role of data warehouse in various application areas.


1. Marketing solutions: Marketing database, customer loyalty scheme & profiling, etc.
2. Retail: sales analysis, shrinkage analysis, promotion analysis, space planning.
3. Insurance: Product profitability analysis, orphan analysis.
4. Telephone companies: individual tariffing through call analysis, network analysis.
5. Retail banking: customer profitability analysis, customer scoring/loan decision.
Data Warehouse-Future developments:
Data warehousing is such a new field that it is difficult to estimate what new
developments are likely to most affect it. Clearly, the development of parallel DB servers with
improved query engines is likely to be one of the most important. Another new development is
the data warehouse that can mix traditional numeric data with text and multimedia. The
availability of improved tools for data visualization (business intelligence) will allow users to see
things that could not be seen before.
Data mining – definition:
Data mining, “the extraction of hidden predictive information from large databases”, is a
powerful new technology with great potential to help companies focus on the most important
information in their data warehouses. Data mining tools predict future trends and behaviors,
allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective
analyses offered by data mining move beyond the
analyses of past events provided by retrospective tools typical of decision support systems.

Data mining – Supporting Technologies:


Data mining can now be applied in the business field because it is supported by three
technologies:
Massive data collection: Databases are growing at unprecedented rates and can be larger than
expected.
Powerful multiprocessor computers: The need for improved computational engines can now be
met.
Data mining algorithms: They have been implemented as mature, reliable, understandable tools.
Data Mining:
Scope: Given databases of sufficient size and quality, data mining technology can generate new
business opportunities by providing these capabilities:
Automated prediction of trends and behaviors: A typical example of a predictive problem is
targeted marketing.
Automated discovery of previously unknown patterns: Data mining tools sweep through
databases and identify previously hidden patterns in one step. An example is the analysis of retail
sales data.
Data Mining-Algorithms: The most common data mining algorithms in use today fall into
two groups, based on when the technique was developed and when it became ready to be used.
1. Classical Techniques: Statistics, neighborhoods and clustering that have been used for
decades.
Statistics: These are data driven and are used to discover patterns and build predictive models.
(a) Histograms: One of the best ways to summarize data is to provide a histogram of the data.
Example: a histogram of customer ages can show that the majority of customers are over the age of 50.

Figure – depicts customers of different ages.
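A histogram like the one described above can be computed by counting values per age band. This is a minimal Python sketch; the ages and the 10-year band width are hypothetical sample values, not the data behind the figure.

```python
from collections import Counter


def age_histogram(ages, bin_width=10):
    """Count how many customers fall into each age band."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return {f"{lo}-{lo + bin_width - 1}": counts[lo] for lo in sorted(counts)}


# Hypothetical sample of customer ages (not data from this paper).
ages = [23, 35, 52, 58, 61, 67, 54, 49, 71, 56]
hist = age_histogram(ages)
for band, n in hist.items():
    print(f"{band:>6} {'#' * n}")  # one '#' per customer, a text histogram

share_over_50 = sum(a > 50 for a in ages) / len(ages)
print(f"share over 50: {share_over_50:.0%}")
```

The printed bars make the dominant age band visible at a glance, which is exactly the summarizing role the text gives to histograms.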


(b) Linear regression: In statistics, prediction is usually synonymous with regression of
some form. The simplest form of regression is simple linear regression, which contains just one
predictor and a prediction. The relationship between the two can be mapped on a two-dimensional
space.
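Simple linear regression with one predictor is usually fitted by ordinary least squares, which chooses the line y = a + b*x that minimizes the squared prediction errors. The slope is b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and the intercept is a = mean_y - b * mean_x. The Python sketch below implements those two formulas directly; the spend and sales figures are made up so that the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b


# Hypothetical data: advertising spend (predictor) vs. sales (prediction).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = fit_line(spend, sales)
print(a, b)         # 1.0 2.0
print(a + b * 6.0)  # 13.0, the predicted sales for spend = 6
```

The fitted line is the "mapping on a two-dimensional space" mentioned above: each new predictor value is turned into a prediction by reading it off the line.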

Clustering: It is the method by which similar records are grouped together. Usually this is done to
give the end user a high-level view of what is going on in the database. There are two main
types: hierarchical and non-hierarchical clustering.
Hierarchical Clustering: The hierarchy of clusters is usually viewed as a tree where the smallest
clusters merge together to create the next highest level of clusters, and so on.

Figure – Hierarchy of clusters; elongated clusters
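The bottom-up merging described for hierarchical clustering can be sketched as agglomerative clustering. This Python version assumes one-dimensional numeric data and single linkage (the distance between two clusters is the distance between their closest members); both are illustrative choices, and real tools offer other linkage rules. The naive pairwise search is fine for a handful of points but would be too slow for a large database.

```python
def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Single linkage: cluster distance = distance of the closest pair of members.
    Returns the final clusters and the merge history (the levels of the tree).
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    history = []
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(x - y) for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        history.append(merged)
        clusters.pop(j)  # pop the larger index first so index i stays valid
        clusters.pop(i)
        clusters.append(merged)
    return sorted(clusters), history


# Hypothetical 1-D data, e.g. customer ages.
ages = [21, 22, 25, 47, 50, 52, 80]
final, history = agglomerative(ages, 3)
print(final)    # [[21, 22, 25], [47, 50, 52], [80]]
print(history)  # the merges, from the smallest clusters upward
```

Stopping at a different num_clusters corresponds to cutting the cluster tree at a different level.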


2. Next-Generation Techniques: These include techniques such as trees, networks and rules
that have only been widely used since the early 1980’s.
Neural Networks: Neural networks consist of a number of neurons that are
interconnected, often in complex ways, and then organized into layers. Neurons are very simple
processing units that compute a linear combination of a number of inputs and then perform a
simple mathematical process on the result to produce an output.
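The behavior of a single neuron, a linear combination of its inputs followed by a simple function, can be written in a few lines of Python. The text does not name that function; this sketch assumes the logistic sigmoid, a common choice, and uses hand-picked weights for a tiny two-layer network rather than weights learned from data.

```python
import math


def neuron(inputs, weights, bias):
    """Linear combination of the inputs, then a sigmoid 'squashing' step."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical weights for a tiny 2-input -> 2-hidden -> 1-output network.
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = layer(hidden, [[1.0, 1.0]], [-1.0])
print(hidden, output)
```

Each layer's outputs become the next layer's inputs; training (not shown) would adjust the weights and biases so that the final output matches known examples.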
Data Mining - Working Procedure: While large-scale information technology has been
evolving separate transaction and analytical systems, data mining provides the link between the
two. Data mining software analyzes relationships and patterns in stored transaction data based on
open-ended user queries. Generally, any of four types of relationships are sought:

• Classes: Stored data is used to locate data in predetermined groups. For example, a
restaurant chain could mine customer purchase data to determine when customers visit
and what they typically order. This information could be used to increase traffic by
having daily specials.

• Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or consumer
affinities.

• Associations: Data can be mined to identify associations. The beer-diaper example, in which
two seemingly unrelated products turn out to be frequently bought together, is an example of
associative mining.

• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood that a customer will buy
hiking shoes, based on that customer's earlier purchases.
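Association mining of the beer-diaper kind is commonly measured with two numbers: support (the fraction of baskets that contain both items) and confidence (of the baskets containing the left-hand item, the fraction that also contain the right-hand item). The Python sketch below computes both for item pairs only; it is a simplified illustration rather than a full algorithm such as Apriori, and the baskets and the min_support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (not real sales data).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"chips", "milk"},
]


def pair_rules(baskets, min_support=0.4):
    """Find rules A -> B between item pairs, with support and confidence."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(baskets)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Support filters out rare combinations, while confidence tells the direction of the rule: here every beer buyer also bought diapers, but not every diaper buyer bought beer.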

Data Mining – Architecture:


To best apply these advanced techniques, they must be fully integrated with the data
warehouse as well as with flexible, interactive business analysis tools. Many data mining tools
currently operate outside of the warehouse, requiring extra steps for extracting, importing and
analyzing the data. Furthermore, when new insights require operational implementation,
integration with the warehouse simplifies the application of results from data mining. The
resulting analytical data warehouse can be applied to improve business processes throughout the
organization. The following figure illustrates an architecture for advanced analysis in a large data
warehouse.
The ideal starting point is a data warehouse containing a combination of internal data
tracking all customer contact coupled with external market data about competitor activity.
Background information on potential customers also provides an excellent basis for prospecting.
This warehouse can be implemented in a variety of relational database systems: Sybase, Oracle,
Redbrick and so on.
An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user
business model to be applied when navigating the data warehouse. The multidimensional
structures allow the user to analyze the data as they want to view their business. The Data Mining
Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused
business analysis directly into this infrastructure. As the warehouse grows with new decisions and
results, the organization can continually mine the best practices and apply them to future
decisions.
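The way an OLAP server lets users view data along business dimensions can be imitated with two basic operations: roll-up (aggregating over chosen dimensions) and slice (fixing one dimension to a single value). This Python sketch uses hypothetical facts and dimension names; a real OLAP server would work on precomputed multidimensional structures or SQL GROUP BY queries rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, revenue).
facts = [
    ("South", "Q1", "Loans", 120),
    ("South", "Q2", "Loans", 150),
    ("North", "Q1", "Loans", 90),
    ("North", "Q1", "Cards", 60),
    ("South", "Q1", "Cards", 40),
]
DIMENSIONS = ("region", "quarter", "product")


def roll_up(rows, by):
    """Aggregate revenue over the chosen dimensions (one 'view' of the cube)."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # revenue is the last field
    return dict(sorted(totals.items()))


def slice_cube(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


print(roll_up(facts, ["region"]))
print(roll_up(facts, ["region", "quarter"]))
print(roll_up(slice_cube(facts, "product", "Loans"), ["quarter"]))
```

Each call answers the same data from a different business perspective (by region, by region and quarter, loans only), which is the navigation the OLAP layer provides before data mining results are embedded alongside it.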
Data mining – Applications:
Some successful application areas:
1. A pharmaceutical company can analyze its recent sales and can determine which
marketing activities will have the greatest impact in future.
2. A credit card company can use a small test mailing to identify the attributes of customers
who are likely to respond.
3. A diversified transportation company can apply data mining to identify the best prospects.
4. A large consumer package goods company can apply data mining to improve its sales
process to retailers.

Conclusions- Data Warehousing & Data Mining:


All large organizations already have data warehouses, but they are just not managing
them. To get the most out of this period, the data warehouse planners and developers must
have a clear idea of what they are looking for and then choose strategies and methods that will
improve the performance and flexibility.
There is a growing gap between increasingly powerful storage and retrieval systems and
users’ ability to analyze the data effectively. As seen, both relational and OLAP technologies are
used for navigating massive data warehouses. Quantifiable business benefits have been proven
through the integration of data mining with current information systems, and new products are on
the horizon that will bring this integration to an even wider audience of users.

