8.1 Introduction
So far, the previous units have discussed data warehousing. From this unit
onward, we introduce data mining and knowledge discovery from data. In the
previous semesters you studied database management systems. These units
present data mining from a database perspective, where emphasis is placed
on basic data mining concepts and techniques for uncovering interesting
data patterns hidden in large data sets. Data mining is the process of
analyzing data from different perspectives and summarizing it into useful
information: information that can be used to increase revenue, cut costs,
or both. The implementation methods discussed are particularly oriented
toward the development of scalable and efficient data mining tools. In this
unit, you will learn how data mining is part of the natural evolution of
database technology, how it is defined, and why it is important. In addition
to studying data mining technologies, you will also read about data mining
software tools.
Objectives:
After studying this unit, you should be able to:
explain the basics of data mining.
describe the relationship between data mining and various business
intelligence tools such as data warehousing, OLAP and statistics.
discuss data mining technologies.
list the data mining software available in the market.
although the concept itself has been around for years. Data warehousing
represents an ideal vision of maintaining a central repository of all
organizational data. A data warehouse is an enabled relational database
system designed to support very large databases (VLDB) at a significantly
higher level of performance and manageability. A data warehouse is an
environment, not a product. It is an architectural construct that provides
information that is hard to access or present in traditional operational
data stores.
Any organization, or any system in general, holds a wealth of data that is
maintained and stored, but the inability to discover valuable, often
previously unknown information hidden in that data prevents it from
transforming the data into knowledge or wisdom.
To satisfy these requirements, the following steps need to be considered:
1. Capture and integrate both the internal and external data into a
comprehensive view.
2. Mine the integrated data for information.
3. Organize and present the information and knowledge in ways that
expedite complex decision making.
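The activities listed above (integrating internal and external data, mining the integrated view, and presenting the result) can be illustrated with a minimal Python sketch. All data, field names and function names here (`internal_sales`, `external_demographics`, `integrate`, `mine`, `present`) are hypothetical and chosen only for illustration; the "mining" shown is a simple aggregation, not a full data mining technique.

```python
from collections import defaultdict

# Hypothetical internal records (for example, from an operational sales system).
internal_sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 15.0},
]

# Hypothetical external records (for example, purchased demographic data).
external_demographics = {
    1: {"region": "North"},
    2: {"region": "South"},
    3: {"region": "North"},
}


def integrate(sales, demographics):
    """Join internal and external data into one comprehensive view."""
    view = []
    for row in sales:
        extra = demographics.get(row["customer_id"], {"region": "Unknown"})
        view.append({**row, **extra})
    return view


def mine(view):
    """Derive simple information from the view: total spend per region."""
    totals = defaultdict(float)
    for row in view:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def present(totals):
    """Organize the result for a decision maker, highest spend first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{region}: {total:.2f}" for region, total in ranked]


report = present(mine(integrate(internal_sales, external_demographics)))
for line in report:
    print(line)
```

Keeping the three activities in separate functions mirrors the separation in the text: the integrated view can be reused by other mining routines without repeating the integration work.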
Data mining has different applications in industry. Some of the industries
that use it are:
Banking
Insurance
Credit marketing
Telecommunications
Pharmaceuticals
Bioinformatics
Some of the applications in the above-mentioned industries include:
Identifying new customers
Predicting customer buying habits
Confirming suitable loan applicants
Revealing fraud
Relationship marketing
Managing equity portfolios
Diagnosing medical problems
Inventory management
Conducting certain aspects of Marketing
Customer segmentation
Web site design and promotion.
8.10 Summary
Data mining is concerned with finding hidden relationships present in
business data to allow businesses to make predictions for future use.
Data mining is a multidisciplinary field drawing on work from statistics,
database technology, artificial intelligence, pattern recognition, machine
learning, information theory, knowledge acquisition, information retrieval,
high-performance computing, and data visualization.
Data mining comprises many techniques, such as classification, clustering
and association. Data mining is a process, and its successful application
requires data preprocessing (dimensionality reduction, cleaning,
noise/outlier removal), postprocessing (understandability, summary,
presentation), and a good understanding of the problem domain together
with domain expertise. Data mining is also referred to as knowledge
discovery in databases (KDD), and it is a step in the KDD process. OLAP
stands for Online Analytical Processing; OLAP and data mining can
complement each other.
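As an illustration of the association technique named above, the following minimal Python sketch computes two measures commonly used to evaluate an association rule: support (the fraction of transactions containing all items of the rule) and confidence (the fraction of transactions containing the antecedent that also contain the consequent). These measures are not defined in this unit and are introduced here as an assumption; the transactions and the function names `support` and `confidence` are hypothetical.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    matches = sum(1 for t in transactions if itemset <= t)
    return matches / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / antecedent_support


# Evaluate the rule "bread -> milk" on the toy data.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On this toy data, bread and milk appear together in two of four transactions, and bread appears in three, so the rule holds in two of the three transactions that contain bread.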
8.12 Answers
Self Assessment Questions
1. Historical
2. Meta data
3. Data
4. Data warehouse and data mining
5. Knowledge discovery
6. Decision support
7. Multiple
8. True
9. Genetic algorithms
10. MineSet is software that provides tools for searching, sorting, filtering
and drilling down, enabling previously complex data models to be viewed
intuitively through real-time 3-D graphical representation.
Terminal Questions
1. Data mining is the process of analyzing data from different perspectives
and summarizing it into useful information: information that can be used
to increase revenue, cut costs, or both. Refer to sections 8.2 and 8.8.
2. Online Analytical Processing (OLAP) is a technology that is used to
create decision support software. Refer to section 8.6.
3. Data mining is a multidisciplinary field drawing on work from statistics,
database technology, artificial intelligence, pattern recognition, machine
learning, information theory, knowledge acquisition, information retrieval,
high-performance computing, and data visualization, whereas data
warehousing is defined as a process of centralized data management
and retrieval. Refer to section 8.4.
4. Artificial neural networks, decision trees, rule induction, etc. Refer to
section 8.8.
5. Data mining is also referred to as knowledge discovery in databases
(KDD). Refer to section 8.6.
6. i) Classification
ii) Clustering
iii) Association
7. Data preprocessing involves dimensionality reduction, cleaning, and
noise/outlier removal. Refer to section 8.4.