Importance in IT Industry
JOCELYN B. BARBOSA, CoE, MSIT
IT Faculty
Data, data everywhere...
The Information Age (also known as the Computer Age, Digital Age, or New
Media Age) is a period in human history characterized by the shift from
traditional industrialization to an economy based on information
computerization.
Data: stored representations of meaningful
objects and events
Structured: numbers, text, dates
Unstructured: images, video, documents
Control, find, access
To collect, store and process:
Personal details
Emails
Videos
Social networks
Documents
Contacts
Instant messages
There has been a huge increase in the
amount of data being stored in databases
over the past twenty years.
What is Data mining?
Data mining refers to extracting or
mining knowledge from large amounts of
data. - Jiawei Han, Micheline Kamber
What is Data mining?
Searching through large amounts of data for
correlations, sequences, and trends.
Sales data: Cluster, Classify, Sequence, Inference
Example: 70% of customers who purchase comforters later purchase curtains.
What is Data Mining?
History
Knowledge Discovery in Databases workshops started in 1989
Now a conference under the auspices of ACM SIGKDD
IEEE conference series starting 2001
Key founders / technology contributors:
Usama Fayyad, JPL (then Microsoft, now has his own company,
Digimine)
Gregory Piatetsky-Shapiro (then GTE, now his own data mining
consulting company, Knowledge Stream Partners)
Rakesh Agrawal (IBM Research)
The term data mining has been around since at least 1983
-- as a pejorative term in the statistics community
Knowledge Discovery in Databases: Process
Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Patterns -> Interpretation/Evaluation
adapted from:
U. Fayyad et al. (1995), From Knowledge Discovery to Data
Mining: An Overview, Advances in Knowledge Discovery and Data
Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
Data mining focuses on extracting information from a large data set and
transforming it into an easily interpretable structure for further use.
Why Data mining?
Example of Data mining in IT:
Management has a strong interest in solving the
problem because the response time of the site directly
affects online sales. The IT department has already
looked into the log files with the help of a mining tool.
The pattern behind this problem is determined by a combination
of country code, operating system, and the version of the
browser that the customers use.
Internet speed and cookie settings also lead to
longer loading times for the site. This is how data
mining helps in identifying the problem, so that
solving it becomes quicker.
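A minimal sketch of the kind of log analysis described in this example, assuming the log has already been parsed into records. The record layout, the sample values, the 5-second threshold, and the function name slow_segments are all hypothetical and do not come from the mining tool mentioned above. The idea is to group page-load times by (country code, operating system, browser version) and report the combinations whose average load time is unusually high.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parsed log records:
# (country_code, operating_system, browser_version, load_seconds)
LOG_RECORDS = [
    ("PH", "Windows", "Browser 8", 9.5),
    ("PH", "Windows", "Browser 8", 8.5),
    ("PH", "Linux", "Browser 11", 2.0),
    ("US", "Windows", "Browser 8", 2.5),
    ("US", "macOS", "Browser 11", 1.5),
    ("US", "macOS", "Browser 11", 2.5),
]


def slow_segments(records, threshold_seconds):
    """Return (segment, mean load time) pairs above the threshold, slowest first."""
    times_by_segment = defaultdict(list)
    for country, os_name, browser, seconds in records:
        times_by_segment[(country, os_name, browser)].append(seconds)

    # Average load time for every combination that appears in the log.
    averages = {seg: mean(times) for seg, times in times_by_segment.items()}

    slow = [(seg, avg) for seg, avg in averages.items() if avg > threshold_seconds]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for segment, avg in slow_segments(LOG_RECORDS, threshold_seconds=5.0):
        print(segment, avg)
```

With the sample records above, only one combination exceeds the threshold, which is the kind of pattern the IT department would then investigate further.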
What Can Data Mining Do?
Predictive Data mining
The objective of predictive tasks is to use the
values of some variables to predict the values
of other variables.
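As one simple illustration of a predictive task, the sketch below fits a straight line by ordinary least squares and uses it to predict one variable from another. Least squares is chosen here only as a familiar example, not as the method the slides have in mind, and the variable names and sample numbers (site visits and orders) are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict y for a new x using a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical history: thousands of site visits vs. hundreds of orders.
    visits = [1, 2, 3, 4]
    orders = [3, 5, 7, 9]
    model = fit_line(visits, orders)
    print(model, predict(model, 5))
```

The known variable (visits) plays the role of the predictor and the other variable (orders) is the one being predicted.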
Classification
Classification is used to map data into predefined groups.
Find ways to separate data items into pre-defined groups
We know X and Y belong together, find other things in same group
Requires training data: Data items where group is known
Uses:
Profiling
Technologies:
Generate decision trees (results are human understandable)
Neural Nets
Example: Route documents to most likely interested parties
English or non-English?
Domestic or Foreign?
Training Data -> tool produces -> classifier -> Groups
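The flow from training data to a classifier that assigns groups can be sketched with the smallest possible decision tree, a single split sometimes called a decision stump. This is only an illustration under stated assumptions: the two numeric features, the labels, and the names train_stump and classify are hypothetical, and a real tool would grow a deeper tree.

```python
# Hypothetical training data: each row is (feature_0, feature_1),
# and each label is the known group (0 or 1) for that row.
TRAINING_ROWS = [(25, 1), (30, 3), (45, 2), (50, 4), (60, 1), (65, 3)]
TRAINING_LABELS = [0, 0, 0, 1, 1, 1]


def classify(model, row):
    """Assign group 1 when the chosen feature exceeds the threshold, else group 0."""
    feature, threshold = model
    return 1 if row[feature] > threshold else 0


def train_stump(rows, labels):
    """Pick the (feature, threshold) split with the fewest training errors.

    Only splits of the form "value > threshold means group 1" are tried,
    so this is a deliberately limited one-level decision tree.
    """
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds lie midway between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            model = (feature, (low + high) / 2)
            errors = sum(classify(model, r) != y for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, model)
    return best[1]


if __name__ == "__main__":
    model = train_stump(TRAINING_ROWS, TRAINING_LABELS)
    print(model, classify(model, (55, 2)), classify(model, (28, 4)))
```

The learned rule is a single readable comparison, which reflects why decision trees are described as human understandable.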
Classification Example
Clustering
Most customers who bought the nappies also bought the beer.
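A statement like this one is usually quantified with two measures: support (how often the items appear together) and confidence (how often buyers of the first item also bought the second). The sketch below computes both for a small set of hypothetical baskets. The basket contents and the function names are illustrative assumptions, not data from the slides.

```python
# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"nappies", "beer"},
    {"nappies", "beer", "milk"},
    {"nappies", "bread"},
    {"nappies", "beer", "bread"},
    {"milk", "bread"},
]


def count_containing(baskets, items):
    """Count the baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in baskets if wanted <= basket)


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return count_containing(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = count_containing(baskets, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never applies in this data
    both = count_containing(baskets, set(antecedent) | set(consequent))
    return both / with_antecedent


if __name__ == "__main__":
    print(support(BASKETS, {"nappies", "beer"}))
    print(confidence(BASKETS, {"nappies"}, {"beer"}))
```

In this sample, "most customers" corresponds to a confidence above one half for the rule nappies -> beer.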
Descriptive Data mining
Application Area of Data mining
Marketing / Retail:
Application Area of Data mining
(contd)
In business,
data mining is useful for discovering patterns and
relationships in data to help make better decisions.
Application Area of Data mining
(contd)
Data Mining application in Medical Image Classification or Medical
Diagnosis
Application Area of Data mining
(contd)
Example: Facial Paralysis Assessment
(FP classification and grading)
Facial movement: whistling
Facial movement: screwing of nose/snarl
Calculation of the distance ratio and the iris area ratio during snarl
activity or screwing of nose
Classification
[Figure: training data items labeled 0 or 1, plus one unlabeled item (?). Training Data -> tool produces -> classifier -> Groups]
Advantages of Data Mining
In Marketing / Retail:
Finance / Banking
By building a model from historical
customer loan data, bank officials
and financial institutions can distinguish
good loans from bad ones.
Advantages of Data Mining
Manufacturing
Applied to operational engineering data,
data mining can detect faulty
equipment and determine optimal control
parameters.
Governments
Data mining helps in building patterns that can detect
money laundering or criminal activities.
Specific uses of data mining include:
Direct marketing - Direct marketing identifies which
prospects should be included to obtain the highest
response rate.
Interactive marketing - It is useful for predicting what
each user on a Web site is most likely interested in seeing.
Market basket analysis - It helps to understand what
products or services are commonly purchased together.
Trend analysis - Trend analysis identifies the difference
between a typical customer this month and last.
Disadvantages of data mining
Privacy Issues
Information might be collected and used in unethical
ways, which can potentially cause a lot of trouble.
Security issues
Security is the biggest concern in data mining.
Businesses hold information about their
employees, including personal and
financial information. There is a chance that
hackers will misuse this data and cause
serious trouble for the organization and its
employees.
In Data mining,