Aggregation
• Combining two or more attributes (or objects) into a single attribute (or object)
• Purpose
– Data reduction: reduce the number of attributes or objects
– Change of scale: cities aggregated into regions, states, countries, etc.
– More “stable” data: aggregated data tends to have less variability
Aggregation
[Figure: variation of precipitation in Australia, illustrating that aggregated (yearly) precipitation is less variable than monthly precipitation]
Motivation for Aggregation
• 1. Smaller datasets resulting from data reduction require less memory and processing time.
• 2. Aggregation can act as a change of scope or scale by providing a high-level view of the data instead of a low-level view.
• For example, aggregating over store locations and months gives a monthly, per-store view rather than a daily, per-item view.
• DISADVANTAGE of Aggregation: potential loss of interesting details (e.g., aggregating over months loses information about which day of the week has the highest sales).
Data Reduction: 2. Sampling
• Sampling is the main technique employed for data
selection.
– It is often used for both the preliminary investigation
of the data and the final data analysis.
• Statisticians sample because obtaining the entire set of
data of interest is too expensive or time consuming.
• Sampling is used in data mining because processing the
entire set of data of interest is too expensive or time
consuming.
Types of Sampling
• Simple Random Sampling
– There is an equal probability of selecting any particular item
• Sampling without replacement
– As each item is selected, it is removed from the population
• Sampling with replacement
– Objects are not removed from the population as they are
selected for the sample.
• In sampling with replacement, the same object can be picked more than once (see the sketch below)
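To make the two variants concrete, here is a minimal sketch in Python using only the standard library; the population of 100 object IDs is a made-up example.

```python
import random

population = list(range(100))  # hypothetical population of 100 object IDs

# Simple random sampling WITHOUT replacement: each selected item is
# removed from the population, so no object can appear twice.
sample_without = random.sample(population, k=10)

# Simple random sampling WITH replacement: objects are not removed,
# so the same object can be picked more than once.
sample_with = random.choices(population, k=10)

print(sample_without)
print(sample_with)
```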
• Stratified Sampling:
– Used when the population contains different types of objects with wide variety.
– The entire population is divided into strata (pre-specified groups).
– Random samples are then drawn from each stratum (see the sketch below).
• Progressive/Adaptive Sampling:
– Starts with a small sample and keeps increasing the sample size until a sufficient size is obtained.
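A rough sketch of stratified sampling in Python; the `group_of` callable and the toy class labels are illustrative assumptions, and this variant draws an equal number of objects per stratum (a proportional variant would size each draw by stratum size).

```python
import random
from collections import defaultdict

def stratified_sample(objects, group_of, n_per_stratum):
    """Split the population into pre-specified strata, then draw a
    simple random sample from each stratum."""
    strata = defaultdict(list)
    for obj in objects:
        strata[group_of(obj)].append(obj)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical data: (object id, class label), with a rare class
data = [(i, "rare" if i % 10 == 0 else "common") for i in range(100)]
print(stratified_sample(data, group_of=lambda o: o[1], n_per_stratum=5))
```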
3. Dimensionality Reduction
• Purpose/Benefit:
– Many data mining algorithms work better in low dimensions.
– Helps eliminate irrelevant features and reduce noise.
– Avoids the curse of dimensionality.
– A more understandable data mining model can be obtained because there are fewer attributes.
– Reduces the amount of time and memory required by data mining algorithms.
– Allows data to be more easily visualized.
Curse of Dimensionality
• Data becomes increasingly sparse in the space it occupies as dimensionality increases (see the sketch below).
• In clustering, the concepts of density and distance between points become less meaningful.
• This produces poor-quality clusters or poor classification results.
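The sparsity effect can be observed numerically. The sketch below is illustrative (uniform random data): it measures the relative contrast between the farthest and nearest neighbor of a query point, which shrinks as dimensionality grows, so "near" and "far" become almost indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Relative contrast (max distance - min distance) / min distance for a
# random query point against 500 random points, at growing dimensionality.
for d in (2, 10, 100, 1000):
    points = rng.random((500, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={d:5d}  relative contrast={contrast:.3f}")
```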
Techniques for Dimensionality Reduction:
PCA & SVD
• PCA is a linear algebra technique for continuous attributes.
• It looks for combinations of attributes to find new attributes (principal components) that are:
– linear combinations of the original attributes,
– orthogonal to each other, and
– capture the largest amount of variation in the data.
• SVD (Singular Value Decomposition)
– Similar and related to PCA.
Dimensionality Reduction : PCA
• Goal is to find a projection that captures the
largest amount of variation in data
[Figure: two-dimensional data points plotted on axes x1 and x2; the direction of greatest variation is the first principal component]
Dimensionality Reduction : PCA
• Find the eigenvectors of the covariance matrix
• The eigenvectors define the new space (see the sketch below)
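A minimal NumPy sketch of PCA along these lines; the 2-D Gaussian data is a made-up example. The steps mirror the slide: compute the covariance matrix, take its eigenvectors, and project onto the top components.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3, 2], [2, 2]], size=200)  # toy data

# Center the data and compute the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigenvectors of the covariance matrix define the new space;
# eigenvalues give the variance captured along each eigenvector.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first principal component (2-D -> 1-D).
X_reduced = Xc @ eigvecs[:, :1]
print("fraction of variance captured:", eigvals[0] / eigvals.sum())
```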
4. Feature Subset Selection
• Another way to reduce dimensionality of data
• Redundant features
– duplicate much or all of the information contained in one or
more other attributes
– Example: purchase price of a product and the amount of sales
tax paid
• Irrelevant features
– contain no information that is useful for the data mining task at
hand
– Example: students' ID is often irrelevant to the task of
predicting students' GPA
Techniques for FSS:
• Brute-force approaches
• Embedded approaches
• Filter approaches
• Wrapper approaches
• Brute-force Approaches:
– Try all possible feature subsets as input to the data mining algorithm.
• Embedded Approaches:
– Feature selection occurs automatically as part of the data mining algorithm.
– The algorithm itself decides which attributes to use and which to leave out.
– Algorithms for constructing decision tree classifiers often operate in this manner.
• Filter Approaches:
– Features are selected before the data mining algorithm is run.
– These approaches are independent of the data mining task.
– Example: select attributes whose pairwise correlation is as low as possible (see the sketch below).
• Wrapper Approaches:
– Use the data mining algorithm as a black box to find the best subset of attributes.
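A small sketch of the low-pairwise-correlation filter mentioned above; the greedy keep/drop rule and the 0.9 threshold are illustrative choices, not a standard prescription.

```python
import numpy as np

def filter_by_correlation(X, threshold=0.9):
    """Keep a feature only if its absolute correlation with every
    already-kept feature is below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(2)
a = rng.random(100)
X = np.column_stack([a,                               # feature 0
                     2 * a + 0.01 * rng.random(100),  # near-duplicate of 0
                     rng.random(100)])                # independent feature
print(filter_by_correlation(X))                       # keeps [0, 2], drops the duplicate
```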
Architecture of FSS
• Four steps in FSS:
– 1. A search strategy that generates new subsets of features
– 2. A measure for evaluating a subset
– 3. A stopping criterion
– 4. A validation procedure
[Flowchart: Attributes → Search Strategy → Subset of attributes → Evaluation → Stopping criterion; if not done, loop back to Search Strategy; if done, pass the Selected attributes to the Validation Procedure]
[Figure: flowcharts of wrapper approaches vs. filter approaches]
• The number of subsets is usually very large, so it is impractical to examine them all; hence some stopping criterion must be employed.
• The stopping criterion can be:
– based on the number of iterations;
– that the value of the subset evaluation measure is optimal or exceeds some threshold;
– that a subset of the desired size has been obtained (the sketch below combines all three).
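The sketch below ties the search strategy and the stopping criteria together as a greedy forward search; the `evaluate` callable stands in for the black-box evaluation measure (e.g., cross-validated accuracy of the data mining algorithm), and the toy measure at the end is purely illustrative.

```python
def forward_selection(features, evaluate, max_iters=10, min_gain=1e-3, max_size=3):
    """Greedy search strategy: add the single best feature per iteration.
    Stops on any of the three criteria from the slide: iteration budget,
    insufficient gain in the evaluation measure, or desired subset size."""
    selected, best_score = [], float("-inf")
    for _ in range(max_iters):                       # SC 1: number of iterations
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        score, feat = max((evaluate(selected + [f]), f) for f in candidates)
        if score - best_score < min_gain:            # SC 2: evaluation threshold
            break
        selected.append(feat)
        best_score = score
        if len(selected) >= max_size:                # SC 3: desired size reached
            break
    return selected

# Toy evaluation measure (hypothetical): reward subsets containing 'a' and 'b'.
toy_eval = lambda subset: len(set(subset) & {"a", "b"}) - 0.01 * len(subset)
print(forward_selection(["a", "b", "c", "d"], toy_eval))  # -> ['b', 'a']
```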
5. Feature Creation
• Create new attributes that can capture the
important information in a data set much more
efficiently than the original attributes
Methodologies for Feature Creation:
• Feature extraction
• Mapping data to a new space
• Feature construction
• Feature Extraction: the creation of a new set of features from the original raw data (e.g., extracting higher-level features such as edges from the raw pixels of an image).
Mapping Data to a New Space
• A different view of the data can reveal interesting and important features.
• Time series data often contains periodic patterns.
– A single periodic pattern without much noise is easily detectable.
– There may instead be multiple periodic patterns with a good amount of noise.
– Such patterns are usually detected by applying a Fourier transform (FT) or wavelet transform (WT); see the sketch after the figure below.
Mapping Data to a New Space
[Figure: a time series and its Fourier transform; in the frequency domain the periodic patterns appear as distinct peaks]
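A brief NumPy sketch of detecting periodic patterns with the Fourier transform; the two frequencies (7 Hz and 17 Hz) and the noise level are made-up values.

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)           # 1 s sampled at 1000 Hz
signal = (np.sin(2 * np.pi * 7 * t)                   # 7 Hz component
          + 0.5 * np.sin(2 * np.pi * 17 * t)          # 17 Hz component
          + 0.3 * np.random.default_rng(3).normal(size=t.size))  # noise

# In the frequency domain (the new space), the hidden periodic
# patterns show up as two clear peaks.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
print(sorted(freqs[np.argsort(spectrum)[-2:]]))       # ~[7.0, 17.0]
```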
Feature Construction
• Example
– A dataset consists of information about antique items, such as mass and volume.
– Suppose these items are made of materials such as wood, clay, bronze, or silver.
– DM task: classify the objects with respect to the material they are made of.
– The constructed feature density = mass/volume provides a more accurate classification than mass or volume alone (see the sketch below).
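A tiny illustration of this feature construction; the mass and volume values are invented for the example.

```python
# Construct a density attribute from mass (g) and volume (cm^3).
artifacts = [
    {"mass": 10.5, "volume": 1.0},   # density ~10.5 -> silver-like
    {"mass": 8.8,  "volume": 1.0},   # density ~8.8  -> bronze-like
    {"mass": 2.1,  "volume": 3.0},   # density ~0.7  -> wood-like
]
for item in artifacts:
    item["density"] = item["mass"] / item["volume"]  # the new, more informative attribute
    print(item)
```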
6. Binarization
• Both continuous and discrete attributes may be transformed to binary attributes (binarization).
• If an attribute has m categorical values, first assign each original value to an integer in the interval [0, m−1].
• Then n = ⌈log2 m⌉ binary digits are required to represent these integers.
• For example, a categorical attribute with 5 values {awful, poor, OK, good, great} would require n = ⌈log2 5⌉ = 3 binary attributes (log2 5 ≈ 2.32).
Table 2.5: Conversion of a categorical attribute to 3 binary attributes

Categorical value   Integer value   x1   x2   x3
awful               0               0    0    0
poor                1               0    0    1
OK                  2               0    1    0
good                3               0    1    1
great               4               1    0    0
Table 2.6: Conversion of a categorical attribute to 5 binary attributes (one per value)

Categorical value   Integer value   x1   x2   x3   x4   x5
awful               0               1    0    0    0    0
poor                1               0    1    0    0    0
OK                  2               0    0    1    0    0
good                3               0    0    0    1    0
great               4               0    0    0    0    1

Note: if the number of resulting attributes is large, the number of attributes should first be reduced.
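Both encodings are easy to reproduce; a short Python sketch of the two tables above:

```python
import math

values = ["awful", "poor", "OK", "good", "great"]
m = len(values)
n = math.ceil(math.log2(m))          # n = ceil(log2 5) = 3 binary digits

# Table 2.5 style: integer code -> n-bit binary representation.
for i, v in enumerate(values):
    print(v, i, list(format(i, f"0{n}b")))

# Table 2.6 style: one asymmetric binary attribute per categorical value.
for i, v in enumerate(values):
    print(v, i, [1 if j == i else 0 for j in range(m)])
```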
7. Variable Transformation
• A variable transformation is applied to all the values of an attribute (variable).
• Two types of attribute transformations:
– Simple functional transformations:
• A simple mathematical function is applied to each value individually.
• If x is a variable, such functions include x^k, log x, e^x, sin x, and |x|.
– Normalization:
• The goal is to make an entire set of values have a particular property (see the sketch below).
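A short NumPy sketch contrasting the two kinds of transformation; the sample values are arbitrary, and standardization (zero mean, unit standard deviation) is used as the example normalization property.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # arbitrary values

# Simple functional transformation: a function applied to each value.
log_x = np.log(x)

# Normalization (standardization): transform the entire set of values
# so it has a particular property, here mean 0 and std 1.
z = (x - x.mean()) / x.std()
print(z.mean(), z.std())   # ~0.0 and 1.0
```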