
Data Reduction: 1. Aggregation
• Combining two or more attributes (or objects) into a
single attribute (or object)
• Purpose
– Data reduction: reduce the number of attributes or objects
– Change of scale: cities aggregated into regions, states, countries, etc.
– More “stable” data: aggregated data tends to have less variability

Aggregation
[Figure: Variation of Precipitation in Australia – histograms of the standard deviation of average monthly precipitation and of the standard deviation of average yearly precipitation.]

Motivation for Aggregation
• 1. Smaller datasets resulting from data reduction
require less memory and processing time.
• 2. Aggregation can act as a change of scope/scale by providing a high-level view of the data instead of a low-level view.
• For example, aggregating store transactions over days and items gives a monthly, per-store view rather than a daily, per-item view of the data (see the sketch below).
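A minimal sketch of this kind of aggregation in Python, assuming pandas is available; the transactions table and its column names (store, item, date, sales) are illustrative, not from the lecture:

```python
# Aggregate daily, per-item transactions into a monthly, per-store view.
import pandas as pd

transactions = pd.DataFrame({
    "store": ["S1", "S1", "S2", "S2"],
    "item":  ["pen", "book", "pen", "book"],
    "date":  pd.to_datetime(["2009-03-01", "2009-03-15",
                             "2009-03-02", "2009-04-02"]),
    "sales": [120.0, 80.0, 60.0, 200.0],
})

# Group over (store, month): items and individual days are aggregated away.
monthly_per_store = (
    transactions
    .groupby(["store", transactions["date"].dt.to_period("M")])["sales"]
    .sum()
)
print(monthly_per_store)
```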

• DISADVANTAGE of Aggregation
– May lose potentially interesting details in the data.
– Ex.: aggregating over months loses information about which day of the week has the highest sales.

Data Reduction: 2. Sampling
• Sampling is the main technique employed for data
selection.
– It is often used for both the preliminary investigation
of the data and the final data analysis.
• Statisticians sample because obtaining the entire set of
data of interest is too expensive or time consuming.
• Sampling is used in data mining because processing the
entire set of data of interest is too expensive or time
consuming.

Types of Sampling
• Simple Random Sampling
– There is an equal probability of selecting any particular item
• Sampling without replacement
– As each item is selected, it is removed from the population
• Sampling with replacement
– Objects are not removed from the population as they are
selected for the sample.
• In sampling with replacement, the same object can be picked more than once (see the sketch below)
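The two variants can be illustrated with NumPy; the population of 100 labelled objects and the sample size of 10 are illustrative assumptions:

```python
# Simple random sampling with and without replacement.
import numpy as np

rng = np.random.default_rng(seed=0)
population = np.arange(100)                        # 100 objects labelled 0..99

# Without replacement: each selected object is removed, so no repeats.
sample_without = rng.choice(population, size=10, replace=False)

# With replacement: the same object can be picked more than once.
sample_with = rng.choice(population, size=10, replace=True)

print(sample_without)   # 10 distinct objects
print(sample_with)      # duplicates are possible
```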

• Stratified Sampling:
– Used when the population consists of different types of objects with wide variation.
– The entire population is divided into strata (pre-specified groups).
– Random samples are then drawn from each group (see the sketch below).
• Progressive/Adaptive Sampling:
– Starts with a small sample and keeps increasing the sample size until a sample of sufficient size is obtained.
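A minimal sketch of stratified sampling with pandas; the "group" column defining the strata and the 20% sampling fraction are illustrative assumptions:

```python
# Stratified sampling: draw the same fraction from every stratum,
# so the rare group "B" is not missed.
import pandas as pd

df = pd.DataFrame({
    "group": ["A"] * 90 + ["B"] * 10,   # two strata, one of them rare
    "value": range(100),
})

stratified = df.groupby("group", group_keys=False).sample(frac=0.2, random_state=0)
print(stratified["group"].value_counts())   # about 18 from "A", 2 from "B"
```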

3. Dimensionality Reduction
• Purpose/Benefit:
– DM algorithms work much better in low dimensions.
– Helps to eliminate irrelevant features or reduce noise.
– Avoids the curse of dimensionality.
– A more understandable DM model can be obtained because there are fewer attributes.
– Reduces the amount of time and memory required by data mining algorithms.
– Allows the data to be more easily visualized.
Curse of Dimensionality
• Data becomes increasingly sparse in the space it occupies as its dimensionality increases.
• In clustering, the concepts of density and distance between points become less meaningful.
• This produces poor-quality clusters or poor classification results.

Techniques for Dimensionality Reduction:
– Principal Component Analysis (PCA)
– Singular Value Decomposition (SVD)
– Others: supervised and non-linear techniques

PCA & SVD
• A linear algebra technique for continuous attributes
• Looks for combinations of attributes to find new attributes (principal components) that:
– 1. are linear combinations of the original attributes,
– 2. are orthogonal to each other, and
– 3. define a projection that captures the maximum variability in the data.
• SVD
– Similar and closely related to PCA.

Dimensionality Reduction : PCA
• Goal is to find a projection that captures the
largest amount of variation in data
[Figure: two-dimensional data plotted on axes x1 and x2, illustrating the direction of largest variation.]

Dimensionality Reduction : PCA
• Find the eigenvectors of the covariance matrix
• The eigenvectors define the new space (see the sketch below)
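A minimal sketch of this procedure with NumPy, on randomly generated data (the choice of 5 attributes and of keeping 2 components is illustrative):

```python
# PCA: eigenvectors of the covariance matrix define the new space;
# projecting onto the top-k eigenvectors reduces the dimensionality.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 objects, 5 continuous attributes

X_centered = X - X.mean(axis=0)          # PCA works on mean-centered data
cov = np.cov(X_centered, rowvar=False)   # 5 x 5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
components = eigvecs[:, order[:2]]       # keep the top 2 principal components

X_reduced = X_centered @ components      # project the data into the new 2-D space
print(X_reduced.shape)                   # (200, 2)
```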

4. Feature Subset Selection
• Another way to reduce dimensionality of data

• Redundant features
– duplicate much or all of the information contained in one or
more other attributes
– Example: purchase price of a product and the amount of sales
tax paid

• Irrelevant features
– contain no information that is useful for the data mining task at
hand
– Example: students' ID is often irrelevant to the task of
predicting students' GPA

Techniques for FSS:
– Brute-force approaches
– Embedded approaches
– Filter approaches
– Wrapper approaches
• Brute-force Approaches:
– Try all possible feature subsets as input to the data mining algorithm.
• Embedded Approaches:
– Feature selection occurs automatically as part of the DM algorithm.
– The DM algorithm itself decides which attributes to use and which to leave out.
– Algorithms for constructing decision tree classifiers often operate in this manner.

• Filter Approaches:
– Features are selected before the data mining algorithm is run.
– These approaches are independent of the DM task.
– Example: select attributes so that their pairwise correlation is as low as possible (see the sketch below).
• Wrapper Approaches:
– Use the data mining algorithm as a black box to find the best subset of attributes.
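A minimal sketch of a correlation-based filter, assuming NumPy and pandas; the attribute names (price, sales_tax, weight) echo the earlier redundancy example, and the 0.95 threshold is an illustrative assumption:

```python
# Filter approach: drop one attribute from every pair whose absolute pairwise
# correlation exceeds a threshold, before any DM algorithm is run.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
price = rng.uniform(10, 100, size=200)
df = pd.DataFrame({
    "price": price,
    "sales_tax": 0.08 * price,               # redundant: duplicates "price"
    "weight": rng.uniform(1, 5, size=200),   # independent attribute
})

corr = df.corr().abs()
threshold = 0.95
keep = []
for col in df.columns:
    # Keep the attribute only if it is not highly correlated with any kept one.
    if all(corr.loc[col, other] < threshold for other in keep):
        keep.append(col)

print(keep)   # "sales_tax" is filtered out as redundant
```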

Architecture of FSS
• Four steps in FSS:
– 1. A search strategy that generates new subsets of features
– 2. A measure for evaluating a subset
– 3. A stopping criterion
– 4. A validation procedure

[Flowchart: the search strategy generates a subset of attributes; the evaluation step scores it against the stopping criterion; if not done, control returns to the search strategy, and when done the selected attributes are passed to the validation procedure.]
Wrapper Approaches:
1. Subset evaluation uses the DM algorithm.
2. The evaluation procedure consists of actually running the DM application.

Filter Approaches:
1. Subset evaluation is independent of the DM algorithm.
2. The evaluation procedure attempts to predict how well the DM algorithm will perform for that particular set of attributes.

• The number of subsets is usually very large, so it is impractical to examine them all; some stopping criterion must therefore be employed.

• The stopping criterion can be:
• dependent on the number of iterations;
• that the value of the subset evaluation measure is optimal or exceeds some threshold; or
• that a subset of the desired size has been obtained.

• Once feature subset selection is done, the results of the target DM algorithm on the selected features are validated (see the sketch below).
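The four-step architecture can be sketched as a greedy forward-selection wrapper; scikit-learn is assumed to be available, the data is synthetic, and the target subset size of 2 is an illustrative stopping criterion:

```python
# Wrapper-style FSS loop: greedy forward search strategy, cross-validation
# accuracy as the subset evaluation measure, desired size as stopping criterion.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # only attributes 0 and 1 matter

clf = KNeighborsClassifier(n_neighbors=5)
selected, remaining = [], list(range(X.shape[1]))
target_size = 2                             # stopping criterion: desired subset size

while remaining and len(selected) < target_size:
    # Search strategy: try adding each remaining attribute to the current subset.
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best = max(scores, key=scores.get)      # evaluation measure: CV accuracy
    selected.append(best)
    remaining.remove(best)

print("selected attributes:", selected)     # the validation step would re-run
                                            # the DM algorithm on this subset
```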

5. Feature Creation
• Create new attributes that can capture the
important information in a data set much more
efficiently than the original attributes

Methodologies for Feature Creation:
– Feature extraction
– Mapping data to a new space
– Feature construction

• Feature Extraction
– The creation of a new set of features from the original raw data.
– Highly domain-specific: feature extraction techniques developed for one field are often not applicable to other fields.
– Whenever DM is applied to a relatively new field, new feature extraction methods have to be developed.

Mapping Data to a New Space
• A different view of the data can reveal interesting and important features.
• Time series data often contains periodic patterns.
– A single periodic pattern without much noise is easily detectable.
– There may instead be multiple periodic patterns buried in a good amount of noise.
– Such patterns are usually detected by applying a Fourier transform (FT) or wavelet transform (WT), as sketched below.
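A minimal sketch of such a mapping with NumPy's FFT; the two frequencies (7 Hz and 17 Hz) and the noise level are illustrative assumptions:

```python
# The two underlying frequencies of a noisy time series are hard to see in
# the time domain but show up as peaks in the Fourier power spectrum.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000, endpoint=False)           # 1 second, 1000 samples
signal = np.sin(2 * np.pi * 7 * t) + np.sin(2 * np.pi * 17 * t)
noisy = signal + rng.normal(scale=1.0, size=t.size)

spectrum = np.abs(np.fft.rfft(noisy)) ** 2             # power spectrum
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])

# The two largest peaks (ignoring the DC term) sit near 7 Hz and 17 Hz.
top = freqs[np.argsort(spectrum[1:])[-2:] + 1]
print(sorted(top))
```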

Mapping Data to a New Space
• Fourier transform
• Wavelet transform

[Figure: three panels – two sine waves, the noisy time series formed from them, and the resulting power spectrum (power vs. frequency).]

Feature Construction
• Example (see the sketch below)
– A dataset consisting of information about antique items, such as mass and volume.
– Suppose these items are made of wood, clay, bronze, silver, etc.
– DM task: classify the objects with respect to the material they are made of.
– The constructed feature density = mass/volume provides an accurate classification.
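A minimal sketch of this feature construction with pandas; the masses and volumes below are illustrative numbers chosen to give roughly the densities of wood, clay, bronze, and silver:

```python
# Feature construction: density = mass / volume separates the materials far
# better than mass or volume alone.
import pandas as pd

artifacts = pd.DataFrame({
    "mass_g":     [350.0, 900.0, 4450.0, 5250.0],
    "volume_cm3": [500.0, 450.0, 500.0, 500.0],
    "material":   ["wood", "clay", "bronze", "silver"],
})

# Roughly 0.7 (wood), 2.0 (clay), 8.9 (bronze), 10.5 (silver)
artifacts["density_g_cm3"] = artifacts["mass_g"] / artifacts["volume_cm3"]
print(artifacts)
```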

6. Binarization
• Both continuous and discrete attributes may be transformed into binary attributes: binarization.
• If an attribute has m categorical values, first assign each original value an integer in the interval [0, m-1].
• Then n = ⌈log₂ m⌉ binary digits are required to represent these integer values.
• For example, a categorical attribute with 5 values {awful, poor, OK, good, great} requires n = ⌈log₂ 5⌉ = ⌈2.32⌉ = 3 binary attributes.
Table 2.5: Conversion of a categorical attribute to 3 binary attributes

Categorical value   Integer value   x1   x2   x3
awful               0               0    0    0
poor                1               0    0    1
OK                  2               0    1    0
good                3               0    1    1
great               4               1    0    0

• For association analysis, asymmetric binary attributes may be required, where only the presence of the attribute (value = 1) is important.
• In such situations, one binary attribute for each categorical value is needed, as shown in the next table:

Table 2.6: Conversion of a categorical attribute to 5 binary attributes

Categorical value   Integer value   x1   x2   x3   x4   x5
awful               0               1    0    0    0    0
poor                1               0    1    0    0    0
OK                  2               0    0    1    0    0
good                3               0    0    0    1    0
great               4               0    0    0    0    1

Note: if the number of resulting attributes is large, the number of attributes should first be reduced.

Symmetric binary attribute: both states (0 and 1) are equally important and carry equal weight.
– Ex.: “gender” can be male or female.
Asymmetric binary attribute: the outcomes of the states are not equally important.
– Ex.: a “positive” or “negative” outcome of a disease test (see the sketch below).
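A minimal sketch of binarization with pandas; the attribute name "quality" and the sample values are illustrative assumptions:

```python
# Binarization: map the 5 ordinal values to integers 0..4, then encode them
# either with ceil(log2(5)) = 3 binary attributes (Table 2.5) or, for
# association analysis, with one asymmetric binary attribute per value (Table 2.6).
import math
import pandas as pd

values = ["awful", "poor", "OK", "good", "great"]
quality = pd.Series(["good", "awful", "great", "OK"], name="quality")

codes = quality.map({v: i for i, v in enumerate(values)})   # integers 0..4
n = math.ceil(math.log2(len(values)))                       # 3 binary digits

# 3-bit encoding: x1 is the most significant bit, x3 the least significant.
bits = pd.DataFrame({f"x{i + 1}": (codes // 2 ** (n - 1 - i)) % 2 for i in range(n)})

# One asymmetric binary attribute per value (only values present get a column).
one_hot = pd.get_dummies(quality)
print(bits)
print(one_hot)
```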
6. Discretization
• The transformation of a continuous attribute into a categorical attribute is called discretization.
• It is a two-step process:
• Deciding the number of categories (HOW?)
– The values of the continuous attribute are sorted.
– They are partitioned into n intervals by specifying n-1 split points, e.g. {(x0, x1], (x1, x2], …, (xn-1, xn]}.
• Determining how to map the continuous values to categorical values (VERY SIMPLE)
– All values in one interval are mapped to the same categorical value (see the sketch below).
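A minimal sketch of discretization with pandas; the "income" attribute, the choice of n = 4 equal-width intervals, and the labels are illustrative assumptions:

```python
# Discretization: partition a continuous attribute into n intervals
# (n - 1 split points) and map every value in an interval to the same label.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
income = pd.Series(rng.uniform(0, 100_000, size=10), name="income")

n = 4
income_level = pd.cut(income, bins=n,        # 3 equal-width split points chosen
                      labels=["low", "medium", "high", "very_high"])
print(pd.concat([income.round(0), income_level.rename("income_level")], axis=1))
```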

7. Variable Transformation
• Refers to a transformation applied to the values of an attribute (variable).
• Two types of attribute transformation:
– Simple functional transformation:
• A simple mathematical function is applied to each value individually.
• If x is a variable, such functions can be x^k, log x, e^x, sin x, or |x|.
– Normalization:
• Its goal is to make an entire set of values have a particular property.
• If x̄ is the mean of an attribute's values and s_x is their standard deviation, then x' = (x − x̄) / s_x creates a new variable with mean 0 and standard deviation 1 (see the sketch below).
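A minimal sketch of this normalization with NumPy; the five values are illustrative:

```python
# Standardization: subtract the mean and divide by the standard deviation,
# so the new variable has mean 0 and standard deviation 1.
import numpy as np

x = np.array([12.0, 15.0, 9.0, 22.0, 17.0])
x_std = (x - x.mean()) / x.std()        # x' = (x - x_bar) / s_x

print(round(x_std.mean(), 10), x_std.std())   # approximately 0.0 and 1.0
```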

