Ashish Mangalampalli
Advisor: Dr. Vikram Pudi
Centre for Data Engineering, International Institute of Information Technology (IIIT), Hyderabad
Outline
Introduction
Crisp and Fuzzy Associative Classification: Pre-Processing and Mining
FACISME: Fuzzy Adaptation of ACME (Maximum Entropy Associative Classifier)
Simple and Effective Associative Classifier (SEAC)
Fuzzy Simple and Effective Associative Classifier (FSEAC)
Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
Associative Classifier for Ad-targeting
Conclusions
Introduction
Associative classification
Mines huge amounts of data
Integrates Association Rule Mining (ARM) with classification
Rules of the form: A = a, B = b, C = c ⇒ X = x
Frequent itemsets capture dominant relationships between items/features
Statistically significant associations make the classification framework robust
Low-frequency patterns (noise) are eliminated during ARM
Rules are very transparent and easily understood
Unlike the black-box approach of popular classifiers such as SVMs and Artificial Neural Networks
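As a toy illustration of the idea (not the thesis algorithms such as FAR-Miner or SEAC), a brute-force miner of Classification Association Rules over a tiny labeled dataset; all thresholds and data here are made up:

```python
from itertools import combinations
from collections import Counter

def mine_cars(records, labels, min_support=0.4, min_confidence=0.6):
    """Mine CARs of the form {A=a, B=b} => class from a labeled dataset."""
    n = len(records)
    cars = []
    # candidate antecedent items: all attribute=value pairs seen in the data
    items = sorted({item for r in records for item in r.items()})
    for size in (1, 2):
        for antecedent in combinations(items, size):
            covered = [i for i, r in enumerate(records)
                       if all(r.get(k) == v for k, v in antecedent)]
            support = len(covered) / n
            if support < min_support:
                continue  # low-frequency (noisy) patterns are eliminated
            class_counts = Counter(labels[i] for i in covered)
            cls, hits = class_counts.most_common(1)[0]
            confidence = hits / len(covered)
            if confidence >= min_confidence:
                cars.append((dict(antecedent), cls, support, confidence))
    return cars

records = [{"A": 1, "B": 1}, {"A": 1, "B": 2}, {"A": 1, "B": 1}, {"A": 2, "B": 2}]
labels = ["x", "y", "x", "y"]
for rule in mine_cars(records, labels):
    print(rule)
```

Each surviving rule carries its support and confidence, which is what makes the resulting classifier transparent: every prediction can be traced back to explicit, statistically supported rules.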
Most real-life datasets contain binary and numerical attributes
Sharp partitioning transforms numerical attributes into binary ones, e.g. Income = [100K and above]
Introduces uncertainty, especially at partition boundaries
Small changes in intervals can lead to misleading results
Gives rise to polysemy and synonymy
Intervals generally have no clear semantics associated with them
Example partitions: up to 20K, 20K-100K, 100K and above
Income = 50K falls in the second partition; so would Income = 99K, though the two incomes are hardly similar
Fuzzy logic
Used to convert numerical attributes to fuzzy attributes (e.g. Income = High)
Maintains the integrity of the information conveyed by numerical attributes
Attribute values belong to partitions with some membership degree in the interval [0, 1]
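A minimal sketch of such fuzzification using hand-picked triangular partitions; the thesis derives partitions automatically (FPrep), so these partition boundaries are illustrative assumptions only:

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical fuzzy partitions for Income (in K), not from the thesis
partitions = {
    "Low":    (-1, 0, 60),
    "Medium": (20, 60, 100),
    "High":   (60, 100, 10**6),
}

def fuzzify(income):
    return {name: round(triangular(income, *abc), 3)
            for name, abc in partitions.items()}

# Unlike crisp 20K-100K binning, 50K and 99K no longer look identical:
print(fuzzify(50))  # mostly Medium, partly Low
print(fuzzify(99))  # almost entirely High
```

The membership degrees vary smoothly across partition boundaries, so a small change in Income produces a small change in the fuzzy representation instead of a jump between bins.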
Fuzzy pre-processing
Converts a crisp dataset (binary and numerical attributes) into a fuzzy dataset (binary and fuzzy attributes)
The FPrep algorithm is used
Fuzzy ARM
Fuzzy Apriori is the most popular fuzzy ARM algorithm
Many efficient crisp ARM algorithms exist, like ARMOR and FP-Growth
Web-scale datasets mandate similarly efficient fuzzy algorithms
Algorithms used:
FAR-Miner for normal transactional datasets
FAR-HD for high-dimensional datasets
In other associative classifiers, CARs are processed using additional (greedy) algorithms like FOIL and PRM
This adds running-time overhead and makes the process more complex
Exhaustive approach
Rule pruning and ranking take care of the huge volume and redundancy of rules
Global rule-mining and training
Local rule-mining and training
Provides better accuracy and representation/coverage
Pre-processing to generate the fuzzy dataset (for fuzzy associative classifiers) using FPrep
Classification Association Rules (CARs) mining using FAR-Miner or FAR-HD
CARs pruning and classifier training using SEAC or FSEAC
Rule ranking and application (scoring) techniques
Direct mining of CARs; faster and simpler training
CARs used directly through effective pruning and sorting
Pruning and rule-ranking based on Information Gain (IG) and rule length, applied in a two-phased manner
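A rule's information gain can be sketched as the reduction in class entropy obtained by splitting the data on whether a record matches the rule body; the exact IG formulation used in SEAC may differ, and the dataset below is made up:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def rule_information_gain(records, labels, antecedent):
    """IG of the split 'matches the rule body' vs 'does not match'."""
    def match(r):
        return all(r.get(k) == v for k, v in antecedent.items())
    matched = [y for r, y in zip(records, labels) if match(r)]
    rest = [y for r, y in zip(records, labels) if not match(r)]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (matched, rest) if p)
    return entropy(labels) - conditional

# A rule that perfectly separates the classes recovers the full entropy:
records = [{"B": 1}, {"B": 1}, {"B": 2}, {"B": 2}]
labels = ["x", "x", "y", "y"]
print(rule_information_gain(records, labels, {"B": 1}))  # -> 1.0
```

Ranking rules by this quantity (with rule length as a secondary criterion) favours rules whose antecedents are genuinely informative about the class.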
SEAC - Example
Example dataset and ruleset (tables omitted)
Scoring example, unlabeled record with B = 2, C = 2:
Class X = 1: matching rules 16, 17, 19 (IG = 0.534)
Class X = 2: matching rules 13, 14, 20 (IG = 0.657), the higher score, so X = 2 is predicted
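The scoring step can be sketched as follows, assuming the score for each class is the aggregate IG of its matching rules (the slide's aggregate numbers are reused as single-rule stand-ins; the exact SEAC scoring formula may differ):

```python
def score(record, ruleset):
    """ruleset: list of (antecedent dict, class, ig).
    Sum the IG of every rule whose body matches the record, per class;
    the class with the highest total wins."""
    totals = {}
    for antecedent, cls, ig in ruleset:
        if all(record.get(k) == v for k, v in antecedent.items()):
            totals[cls] = totals.get(cls, 0.0) + ig
    return max(totals, key=totals.get) if totals else None

# Aggregate IG values lifted from the slide's example
ruleset = [({"B": 2}, 1, 0.534), ({"C": 2}, 2, 0.657)]
print(score({"B": 2, "C": 2}, ruleset))  # -> 2
```

Because both classes have matching rules, the prediction comes down to which class's rules carry more information gain in total, here class 2.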
CARs pruned based on Fuzzy Information Gain (FIG) and rule length; no sorting required
Scoring rules applied taking the fuzzy membership values of the matched attributes into account
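One plausible way membership values can enter the scoring, sketched here as an assumption (the thesis may weight rules differently): each matching rule's FIG is weighted by the record's degree of match with the rule body, using minimum as the fuzzy AND.

```python
def fuzzy_match_degree(record, antecedent):
    """Degree to which a fuzzy record matches a rule body.
    record: {fuzzy attribute: membership in [0, 1]}
    Uses min over the body's attributes, a common fuzzy AND."""
    degrees = [record.get(attr, 0.0) for attr in antecedent]
    return min(degrees) if degrees else 0.0

def fuzzy_score(record, ruleset):
    """Assumed FSEAC-style scoring: weight each matching rule's FIG by the
    record's membership degree in the rule body; highest class total wins."""
    totals = {}
    for antecedent, cls, fig in ruleset:
        d = fuzzy_match_degree(record, antecedent)
        if d > 0.0:
            totals[cls] = totals.get(cls, 0.0) + fig * d
    return max(totals, key=totals.get) if totals else None

# Hypothetical fuzzy record and CARs (names and numbers are illustrative)
record = {"Income=High": 0.8, "Age=Young": 0.3}
ruleset = [(("Income=High",), "buy", 0.6), (("Age=Young",), "skip", 0.9)]
print(fuzzy_score(record, ruleset))  # -> buy
```

The higher-FIG rule does not automatically win: a rule matched only weakly (low membership) contributes proportionally less, which is exactly what crisp scoring cannot express.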
FSEAC - Example
Example dataset and ruleset (tables omitted)
Experimental evaluation
SEAC
12 classifiers (associative and non-associative)
14 UCI ML datasets
100-5000 records per dataset
2-10 classes per dataset
Up to 20 features per dataset
10-fold cross-validation
FSEAC
17 classifiers (associative and non-associative; fuzzy and crisp)
23 UCI ML datasets
100-5000 records per dataset
2-10 classes per dataset
Up to 60 features per dataset
10-fold cross-validation
Speeded-Up Robust Features (SURF): interest-point detector and descriptor for images
Fuzzy clusters are used, as opposed to the hard clustering used in Bag-of-Words
CN = U \ CP, i.e. the negative class is the complement of the positive class
Other AC algorithms use third-party algorithms for rule generation from frequent itemsets
Top-k rules are used for scoring and classification
ICPR 2010
I-FAC
Fuzzy C-Means (FCM) applied to derive clusters
Clusters (with membership values) used to generate the dataset for mining
ARM generates Classification Association Rules (CARs) associated with the positive class
CARs are pruned and sorted using:
the Fuzzy Information Gain (FIG) of each rule
the length of each rule, i.e. the number of attributes in it
The fuzzy nature helps avoid polysemy and synonymy
Only the positive class is used for training
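The FCM membership computation at the heart of this step can be sketched for fixed cluster centers (a full FCM run also iterates the center updates; the 1-D centers below are hypothetical):

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each center:
    u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), with fuzzifier m > 1."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:  # x coincides with a center: crisp membership there
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in dists) for dj in dists]

centers = [0.0, 10.0]            # hypothetical 1-D cluster centers
u = fcm_memberships(2.5, centers)
print(u)  # mostly the first cluster, but with soft membership in both
```

A SURF descriptor near a cluster boundary thus contributes to several visual words at once, which is what lets I-FAC sidestep the polysemy and synonymy of hard Bag-of-Words assignment.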
Experimental evaluation
38 visual concepts, e.g. car, sky, clouds, water, building, sea, face
First 10K images of the MIR Flickr dataset
AUC values reported for each concept
Display-ad targeting is currently done using methods that rely on publisher-defined segments, like Behavioral Targeting (BT)
A look-alike model is trained to identify similar users
Similarity is based on historical user behavior
The model is iteratively rebuilt as more users are added
The advertiser supplies a seed list of users
Complements publisher-defined segments such as BT
Gives advertisers control over the audience definition
Given a list of target users (e.g., people who clicked or converted on a particular category or ad campaign), find other similar users.
WWW 2011
Feature-pairs are modelled as AC rules
Only rules for the positive (converting) class are used
Works well in tail campaigns
F-LLR(f) = P(f) · log(P(f | conv) / P(f | non-conv))
Rules sorted in descending order of F-LLR
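The F-LLR ranking can be sketched directly from the slide's formula; the feature-pair names and probabilities below are made up for illustration:

```python
from math import log

def f_llr(p_f, p_f_conv, p_f_nonconv):
    """F-LLR = P(f) * log(P(f | conv) / P(f | non-conv)).
    Positive when the feature-pair is more likely among converters,
    scaled by how common the feature-pair is overall."""
    return p_f * log(p_f_conv / p_f_nonconv)

# Hypothetical feature-pair rules with made-up probabilities
rules = {
    ("sports", "age_25_34"): f_llr(0.10, 0.30, 0.05),
    ("autos", "male"):       f_llr(0.20, 0.10, 0.08),
    ("news", "evening"):     f_llr(0.05, 0.02, 0.04),
}
ranked = sorted(rules, key=rules.get, reverse=True)  # descending F-LLR
print(ranked)
```

Feature-pairs over-represented among non-converters get a negative F-LLR and fall to the bottom of the ranking, so only conversion-indicative pairs survive as top rules.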
Performance Study
300K records per dataset; one record per user
Training window: 14 days
Scoring window: 7 days

Lift (AUC) over each baseline:
Baseline            Lift (AUC)
Random Targeting    11%, 2%
Linear SVM          -12%, -40%
GBDT                -6%, -14%
Conclusions
References
Ashish Mangalampalli, Adwait Ratnaparkhi, Andrew O. Hatch, Abraham Bagherjeiran, Rajesh Parekh, and Vikram Pudi. A Feature-Pair-based Associative Classification Approach to Look-alike Modeling for Conversion-Oriented User-Targeting in Tail Campaigns. In International World Wide Web Conference (WWW), 2011.
Ashish Mangalampalli, Vineet Chaoji, and Subhajit Sanyal. I-FAC: Efficient Fuzzy Associative Classifier for Object Classes in Images. In International Conference on Pattern Recognition (ICPR), 2010.
Ashish Mangalampalli and Vikram Pudi. FPrep: Fuzzy Clustering Driven Efficient Automated Pre-processing for Fuzzy Association Rule Mining. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2010.
Ashish Mangalampalli and Vikram Pudi. FACISME: Fuzzy Associative Classification Using Iterative Scaling and Maximum Entropy. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2010.
Ashish Mangalampalli and Vikram Pudi. Fuzzy Association Rule Mining Algorithm for Fast and Efficient Performance on Very Large Datasets. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2009.