Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas
2003, Prentice-Hall
Chapter 3 - 1
Making Accurate Predictions with Data Mining
Although the literature contains statements such as "data mining will allow us to predict who will buy a particular product," such certainty runs against human nature. In situations where data mining is used to predict response to a marketing campaign, only about 5% of the people selected as likely respondents actually do respond.
Making Accurate Predictions with Data Mining (cont.)
Although the accuracy of predicting individual behavior is low, it is better than it seems, since direct marketing efforts often have hit rates of only about 1% without data mining.
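The comparison between the two hit rates can be made concrete with a little arithmetic. The sketch below uses the approximate 5% and 1% figures quoted above; the mailing size of 10,000 is a made-up number for illustration only.

```python
def lift(targeted_rate: float, baseline_rate: float) -> float:
    """Ratio of the hit rate with data mining to the hit rate without it."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return targeted_rate / baseline_rate


# Approximate hit rates quoted in the text.
RATE_WITH_MINING = 0.05
RATE_WITHOUT_MINING = 0.01

# Hypothetical campaign size (an assumption, not from the text).
mailing_size = 10_000

improvement = lift(RATE_WITH_MINING, RATE_WITHOUT_MINING)
responders_with_mining = round(mailing_size * RATE_WITH_MINING)
responders_without = round(mailing_size * RATE_WITHOUT_MINING)

print(f"Improvement factor: {improvement:.1f}x")
print(f"Responders per {mailing_size} pieces: "
      f"{responders_with_mining} with mining, {responders_without} without")
```

Even though 95% of the selected people still do not respond, the same mailing budget reaches roughly five times as many respondents under these figures.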
Multidimensional view
Transparent to user
Accessible
Consistent reporting
Client-server architecture
Generic dimensionality
Dynamic sparse matrix handling
Multiuser support
Cross-dimensional ops
Intuitive manipulation
Flexible reporting
Unlimited dimension and aggregation
OLAP as Implemented
To date, it does not appear that any implementation exists that satisfies all 12 rules. Some people argue it might not even be possible to attain all of them. More recently, the term OLAP has come to represent the broad category of software technology that enables multidimensional analysis of enterprise data.
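As a rough illustration of what "multidimensional analysis" means in practice, the sketch below totals a sales measure along whichever dimensions the user picks. The fact rows, the dimension names (product, region, quarter), and the function name aggregate are all hypothetical; this is not how any particular OLAP product is implemented.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure.
facts = [
    {"product": "Widget", "region": "East", "quarter": "Q1", "sales": 120},
    {"product": "Widget", "region": "West", "quarter": "Q1", "sales": 80},
    {"product": "Gadget", "region": "East", "quarter": "Q1", "sales": 200},
    {"product": "Gadget", "region": "West", "quarter": "Q2", "sales": 50},
    {"product": "Widget", "region": "East", "quarter": "Q2", "sales": 100},
]


def aggregate(rows, dimensions, measure="sales"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_region = aggregate(facts, ["region"])
by_product_region = aggregate(facts, ["product", "region"])
grand_total = aggregate(facts, [])  # no dimensions: one overall total

print(by_region)
print(by_product_region)
print(grand_total)
```

Changing the list of dimensions changes the view of the same data, which is the basic idea behind rolling up and drilling down.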
[Figure: chart of Sales by Product and Region]
Classification methods
The goal is to discover rules that define whether an item belongs to a particular subset or class of data. For example, if we are trying to determine which households will respond to a direct mail campaign, we will want rules that separate the probables from the not probables. These IF-THEN rules often are portrayed in a tree-like structure.
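A minimal sketch of how such IF-THEN rules form a tree, using the direct mail example. The attributes (prior_orders, owns_home, income) and the thresholds are invented for illustration; in practice they would be discovered from the data rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical attributes, not taken from the text.
    prior_orders: int
    owns_home: bool
    income: float


def classify(h: Household) -> str:
    """Walk a small hand-written tree of IF-THEN rules."""
    # Root split: past buyers are treated as likely respondents.
    if h.prior_orders >= 2:
        return "probable"
    # Second level: home ownership.
    if h.owns_home:
        # Third level: income threshold (an arbitrary illustrative value).
        if h.income >= 60_000:
            return "probable"
        return "not probable"
    return "not probable"


households = [
    Household(prior_orders=3, owns_home=False, income=20_000),
    Household(prior_orders=0, owns_home=True, income=75_000),
    Household(prior_orders=1, owns_home=True, income=40_000),
    Household(prior_orders=0, owns_home=False, income=90_000),
]
for h in households:
    print(h, "->", classify(h))
```

Each path from the root to a return statement reads as one rule, for example: IF prior_orders < 2 AND owns_home AND income >= 60,000 THEN probable.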
Association Methods
These techniques search all transactions from a system for patterns of occurrence. A common method is market basket analysis, in which the sets of products purchased by thousands of consumers are examined. Results are then portrayed as percentages; for example, 30% of the people who buy steaks also buy charcoal.
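The percentage in a statement like "30% of the people who buy steaks also buy charcoal" can be computed directly from the transactions. The sketch below assumes each transaction is a set of product names; the baskets are fabricated so that the result matches the 30% figure in the example.

```python
def percent_also_buying(baskets, antecedent, consequent):
    """Percentage of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return 100.0 * both / len(with_antecedent)


# Hypothetical transactions: 10 baskets with steak, 3 of which include charcoal,
# plus 5 baskets without steak.
baskets = (
    [{"steak", "charcoal"}] * 3
    + [{"steak", "potatoes"}] * 7
    + [{"milk", "bread"}] * 5
)

pct = percent_also_buying(baskets, "steak", "charcoal")
print(f"{pct:.0f}% of steak buyers also buy charcoal")
```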
Sequencing Methods
These methods are applied to time series data in an attempt to find hidden trends. If found, these can be useful predictors of future events. For example, customer groups that tend to purchase products tied in with hit movies would be targeted with promotional campaigns timed to release dates.
Clustering Techniques
Clustering techniques attempt to create partitions in the data according to some distance metric. The clusters formed are data grouped together simply by their similarity to their neighbors. By examining the characteristics of each cluster, it may be possible to establish rules for classification.
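One common way to partition data by a distance metric is k-means, sketched below with Euclidean distance. The text does not name a specific algorithm, so k-means is an assumption here, as are the sample points and the starting centroids.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers, using Euclidean distance."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

for center, members in zip(final_centroids, clusters):
    print(center, members)
```

Examining the members of each resulting cluster (for example, their average values) is the step that may suggest classification rules.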
Data Mining Technologies (cont.)
Decision trees: these technologies are conceptually simple and have gained in popularity as better tree-growing software has been introduced. Because of the way they are used, they are perhaps better called classification trees.
The Knowledge Discovery Search Process
Table 3-2 contains a more detailed outline of the process, but the major steps are:
Define the business problem and obtain the data to study it.
Use data mining software to model the problem.
Mine the data to search for patterns of interest.
the mining results and refine them by respecifying the model. Once validated, make the model available to other users of the DW.
Once these are created, they are treated as tables in the database so they can be viewed and joined by other users.
Web mining
Web mining is a special case of text mining where the mining occurs over a website. It enhances the website with intelligent behavior, such as suggesting related links or recommending new products. It allows you to unobtrusively learn the interests of visitors and modify their user profiles in real time. It also allows you to match resources to the interests of the visitor.
To make effective use of a rule, three numeric measures about that rule must be considered: (1) support, (2) confidence, and (3) lift.
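The slide does not define the three measures, so the sketch below uses their usual definitions for a rule "A IMPLIES B": support is the share of all baskets containing both A and B, confidence is the share of baskets with A that also contain B, and lift is confidence divided by the overall share of baskets containing B. The function name and the sample baskets are hypothetical.

```python
def rule_measures(baskets, lhs, rhs):
    """Return (support, confidence, lift) for the rule `lhs IMPLIES rhs`."""
    n = len(baskets)
    n_lhs = sum(1 for b in baskets if lhs in b)
    n_rhs = sum(1 for b in baskets if rhs in b)
    n_both = sum(1 for b in baskets if lhs in b and rhs in b)
    if n == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("rule cannot be evaluated on these baskets")
    support = n_both / n            # P(lhs and rhs)
    confidence = n_both / n_lhs     # P(rhs | lhs)
    lift = confidence / (n_rhs / n) # P(rhs | lhs) / P(rhs)
    return support, confidence, lift


# Hypothetical baskets, for illustration only.
baskets = [
    {"peppers", "bananas"},
    {"peppers", "bananas"},
    {"peppers"},
    {"bananas"},
    {"milk"},
]

support, confidence, lift = rule_measures(baskets, "peppers", "bananas")
print(f"support={support:.2%} confidence={confidence:.2%} lift={lift:.2f}")
```

A lift above 1 means the left-hand item makes the right-hand item more likely than it is across all baskets; a lift near 1 means the rule adds little beyond how popular the right-hand item already is.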
An Example
Rule                             Lift   Support (%)   Confidence (%)
Green Peppers IMPLIES Bananas    1.37    3.77         85.96
Red Peppers IMPLIES Bananas      1.43    8.58         89.47
Yellow Peppers IMPLIES Bananas   1.17   22.12         73.09
The confidence suggests people buying any kind of pepper also buy bananas. Green peppers sell in about the same quantities as red or yellow, but are not as predictive.
           Pizza   Milk   Cola   Chips   Pretzels
Pizza        2       1      2      0        0
Milk         1       3      1      1        1
Cola         2       1      3      0        1
Chips        0       1      0      1        0
Pretzels     0       1      1      0        2
Pizza and Cola sell together more often than any other combo; a cross-marketing opportunity? Milk sells well with everything; people probably come here specifically to buy it.
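Co-occurrence counts like these can be tallied by visiting every pair of items in every basket. The five baskets below are an assumption: they are not given on the slide, and were chosen only because they yield the same counts as the Pizza, Milk, Cola, Chips, Pretzels table.

```python
from itertools import combinations

ITEMS = ["Pizza", "Milk", "Cola", "Chips", "Pretzels"]

# Hypothetical baskets (not from the source) that reproduce the table's counts.
baskets = [
    {"Pizza", "Milk", "Cola"},
    {"Pizza", "Cola"},
    {"Milk", "Chips"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]


def cooccurrence(baskets, items):
    """counts[a][b] = number of baskets containing both a and b.

    The diagonal counts[a][a] is the number of baskets containing a.
    """
    counts = {a: {b: 0 for b in items} for a in items}
    for basket in baskets:
        present = [i for i in items if i in basket]
        for a in present:
            for b in present:
                counts[a][b] += 1
    return counts


counts = cooccurrence(baskets, ITEMS)

# The strongest pairing is the largest off-diagonal count.
best_pair = max(combinations(ITEMS, 2), key=lambda pair: counts[pair[0]][pair[1]])

for item in ITEMS:
    print(f"{item:<9}", [counts[item][other] for other in ITEMS])
print("Most frequent pair:", best_pair)
```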
This might allow us to see what patterns new customers have versus old customers.
Taxonomies
The presence of items not purchased very frequently is an obstacle to a good market basket analysis. One way to deal with this is to eliminate products that occur with a frequency less than some threshold. A better idea would be to try to form groups of products that fall below the threshold. For example, four flavors of popsicle might occur 9% of the time all together, but no more than 3% individually.
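The grouping idea can be sketched as follows: measure how often each item appears, then replace any item below the threshold with its parent category from a taxonomy. The 100 baskets, the flavor names, the 5% threshold, and the taxonomy mapping are all hypothetical, chosen to mirror the popsicle example (3% or less individually, 9% together).

```python
from collections import Counter


def item_frequency(baskets):
    """Fraction of baskets in which each item appears."""
    counts = Counter(item for basket in baskets for item in set(basket))
    return {item: c / len(baskets) for item, c in counts.items()}


def roll_up(baskets, taxonomy, threshold):
    """Replace items rarer than `threshold` with their parent category, if any."""
    freq = item_frequency(baskets)
    return [
        {taxonomy.get(item, item) if freq[item] < threshold else item
         for item in basket}
        for basket in baskets
    ]


# Hypothetical data: 100 baskets, 9 of which contain one popsicle flavor each.
baskets = (
    [{"cherry popsicle", "milk"}] * 3
    + [{"grape popsicle", "milk"}] * 2
    + [{"orange popsicle"}] * 2
    + [{"lime popsicle"}] * 2
    + [{"milk", "bread"}] * 91
)
taxonomy = {flavor: "popsicle" for flavor in
            ("cherry popsicle", "grape popsicle", "orange popsicle", "lime popsicle")}

before = item_frequency(baskets)
after = item_frequency(roll_up(baskets, taxonomy, threshold=0.05))

print("cherry popsicle alone:", before["cherry popsicle"])
print("popsicle (grouped):   ", after["popsicle"])
```

Unlike simply dropping rare products, the grouped item stays in the analysis and is now frequent enough to take part in rules.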
Visual Presentation
For any kind of high-dimensional data set, displaying predictive relationships is a challenge. The picture on the previous slide uses 3-D graphics to portray the weather balloon data in text Table 11-4. We learn very little from just examining the numbers. Shading is used to represent relative degrees of thunderstorm activity, with the darkest regions indicating the heaviest activity.
A Bit of History
An early effort used sequences of two-dimensional graphs to add depth. Current virtual reality programs allow the user to step through a data set. Try going to a realtor's website and taking a tour of a house up for sale.
Note: On the live map, clicking on an area allows the user to drill down and see results for smaller areas.
Siftware -- Continued
Oracle: A large suite of connectivity products allows transparent access to mainframe databases. Some major customers include John Alden Insurance, ShopKo Stores and Pacific Bell.
Informix: Associated Grocers uses Informix data warehousing products at the heart of its three-tier client-server system.
Siftware -- Continued
Sybase: Sybase Warehouse WORKS is an integrated system designed around the four key functions in data warehousing.
Silicon Graphics: Data mining software is mated to 3-D visualization tools to allow users to "fly" through data.
IBM provides a number of decision support tools in its Information Warehouse Solutions.