

Unit-V

Pattern Recognition: Introduction, Design principles of a
pattern recognition system, Statistical pattern recognition,
Parameter estimation methods - Principal Component
Analysis (PCA) and Linear Discriminant Analysis (LDA),
Classification techniques - Nearest Neighbor (NN) rule,
Bayes classifier, Support Vector Machine (SVM), K-means
clustering.




PATTERN RECOGNITION: Introduction

Recognition = Re + Cognition

COGNITION: To become acquainted with, to come to know; the act or the
process of knowing an entity (the process of knowing).

RECOGNITION: The knowledge or feeling that the present object has
been met before (the process of knowing again).

PR is the study of ideas and algorithms that provide computers with
a perceptual capability to put abstract objects, or patterns, into
categories in a simple and reliable way.

PATTERN: A pattern is a set of objects, phenomena or concepts
where the elements of the set are similar to one another in certain
ways/aspects. Patterns are described by certain quantities,
qualities, traits, notable features and so on.



[Figure: remote-sensing examples - cloud patterns, forest and
cultivated land, coal mine detection, natural gas detection]

Examples of applications:

Optical Character Recognition (OCR) - Handwritten: sorting letters by
postal code, input device for PDAs. Printed texts: reading machines
for blind people, digitalization of text documents.

Biometrics - Face recognition, verification, retrieval. Fingerprint
recognition. Speech recognition.

Diagnostic systems - Medical diagnosis: X-Ray, EKG analysis. Machine
diagnostics, waste detection.

Military applications - Automated Target Recognition (ATR). Image
segmentation and analysis (recognition from aerial or satellite
photographs).

PATTERN RECOGNITION & CLASSIFICATION PROCESS

Step-1: Stimuli produced by objects are perceived by sensory
devices. The attributes and their relations are used to characterize
an object in the form of a pattern vector X. The range of
characteristic attribute values is known as the measurement
space M.

Step-2: A subset of attributes whose values provide cohesive
object grouping or clustering, consistent with some goals
associated with the object classification, is selected. The range of
this subset of attribute values is known as the feature space F.

Step-3: Using the selected attribute values, object or class
characterization models are learned by forming generalized
prototype descriptions, classification rules or decision functions.
The range of the decision function values or classification rules is
known as the decision space D.

Step-4: Recognition of familiar objects is achieved by applying the
rules learned in Step 3, comparing and matching object features
with the stored models. A minimal sketch of this pipeline is given
below.
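The following is a minimal, hypothetical sketch of Steps 1-4, not taken from the slides: the raw measurements, the two assumed features (mean and standard deviation), and the nearest-prototype decision rule are all invented purely for illustration.

```python
# Hedged sketch of the 4-step process: measurements -> features ->
# learned prototypes -> recognition by matching. Data are synthetic.
import numpy as np

def extract_features(measurement):
    """Steps 1-2: map a raw measurement vector X (measurement space M)
    to a smaller feature vector (feature space F)."""
    x = np.asarray(measurement, dtype=float)
    return np.array([x.mean(), x.std()])   # assumed, illustrative features

def learn_prototypes(samples, labels):
    """Step 3: learn one generalized prototype (mean feature vector)
    per class; these act as the stored models."""
    feats = np.array([extract_features(s) for s in samples])
    return {c: feats[np.array(labels) == c].mean(axis=0)
            for c in set(labels)}

def recognize(measurement, prototypes):
    """Step 4: assign the class whose stored prototype is closest
    (a minimum-distance decision rule over the decision space D)."""
    f = extract_features(measurement)
    return min(prototypes, key=lambda c: np.linalg.norm(f - prototypes[c]))

# toy usage with made-up 1-D signals for two classes
rng = np.random.default_rng(0)
samples = [rng.normal(0, 1, 50) for _ in range(10)] + \
          [rng.normal(5, 2, 50) for _ in range(10)]
labels = ["A"] * 10 + ["B"] * 10
protos = learn_prototypes(samples, labels)
print(recognize(rng.normal(5, 2, 50), protos))   # expected: "B"
```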

PATTERN RECOGNITION APPLICATIONS

Medical diagnosis
Life form analysis
Sonar detection
Radar detection
Image processing
Process control
Information Management systems
Aerial photo interpretation.
Weather prediction
Sensing of life on remote planets.
Behavior analysis
Character recognition
Speech and Speaker recognition etc.

Methodology of PR consists of the following:

1. We observe patterns.
2. We study the relationships between the various patterns.
3. We study the relationships between patterns and ourselves and thus
arrive at situations.
4. We study the changes in situations and come to know about the
events.
5. We study events and thus understand the law behind the events.
6. Using the law, we can predict future events.

EXAMPLE: Astrology/Palmistry

According to this methodology, it consists of the following:

1. We observe the different planets/lines on the hand.
2. We study the relationships between the planets/lines.
3. We study the relations between the positions of planets/lines and
situations in life, and arrive at events.
4. We study the events and understand the law behind the events.
5. Using the law, we can predict the future of a person.

EXAMPLE: DC Machines

According to this methodology, it consists of the following:

1. We observe the patterns like magnetic poles, conductors, core
and so on.
2. We study the relationships between poles, conductors, etc.
3. We study the relationships between patterns and arrive at
voltage, current, etc.
4. We study changes of situation and arrive at events, like rotating
the conductor and a voltage being induced in it because it cuts the
lines of flux.
TYPES OF PATTERNS

1. SPATIAL PATTERNS - These patterns are located in space.
Eg: characters in character recognition
* images of ground covers in remote sensing
* images in medical diagnosis

2. TEMPORAL PATTERNS - These are distributed in time.
Eg: radar signals, speech signals, sonar signals, etc.

3. ABSTRACT PATTERNS - Here the patterns are distributed
neither in space nor in time.
Eg: classification of people based on psychological tests
* medical diagnosis based on medical history and other
medical tests
* classification of people based on the language they speak



APPROACH TO PATTERN RECOGNITION

1. Statistical or decision-theoretic or discriminant approach.
2. Syntactic or grammatical or structural approach.

1. Statistical Pattern Recognition
The data is reduced to vectors of numbers, and statistical
techniques are used for the tasks to be performed.

2. Structural Pattern Recognition
The data is converted to a discrete structure (such as a
grammar or a graph), and the techniques are related to
computer science subjects (such as parsing and graph
matching).
STATISTICAL APPROACH

Patterns → Transducer → Feature Extraction and Feature Selection → Learning → Classification → Results

Fig 1.1: Block diagram representation of the statistical approach

Transducer: It is used for making measurements of the various
attributes of the pattern.
Feature Extractor: From the measurements, it extracts the features
that are required for describing and classifying the pattern.
Feature Selector: Depending on the problem, the feature selector
selects the minimum number of features that are sufficient to
classify the pattern.
STATISTICAL APPROACH
There are two feature selection methods.

1. Transformation Method:
Here we reduce the features by considering linear or nonlinear
combinations of the original features. This is also called the
aggregation method.
Eg: let us assume we originally have four features f1, f2, f3, f4.
One way of reducing them to two features is
f5 = f1 + f2
f6 = f3 + f4
2. Subsetting or Filtering Method:
Here we select a subset of the original features.
Eg: original features are f1, f2, f3, f4.
We can select a subset like
f5 = f1 and f6 = f3 (see the sketch below).

Learning: It is the process of determining useful parameters that are
required for classifying the patterns efficiently.
Classifying: Here the patterns are assigned to different classes using a
suitable classification method.
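A small, hedged numpy illustration of the two feature-selection methods, using the made-up features f1..f4 from the example above:

```python
# Illustration of transformation vs. subsetting on toy feature vectors.
import numpy as np

# each row is one pattern with original features [f1, f2, f3, f4]
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 0.5, 3.5]])

# 1. Transformation (aggregation): f5 = f1 + f2, f6 = f3 + f4
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])
X_transformed = X @ T        # shape (n_patterns, 2)

# 2. Subsetting (filtering): keep f1 and f3 only
X_subset = X[:, [0, 2]]      # shape (n_patterns, 2)

print(X_transformed)
print(X_subset)
```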
PRINCIPAL COMPONENT
ANALYSIS (PCA)



Why Principal Component Analysis?

Motive:
Find bases which have high variance in the data.
Encode the data with a small number of such bases, with low mean
squared error (MSE).



Derivation of PCs

Assume that the data are zero-mean: E[x] = 0.

Project x onto a unit vector q:
a = qᵀx = xᵀq,  with ||q|| = (qᵀq)^(1/2) = 1

Variance of the projection:
σ² = E[a²] − (E[a])² = E[a²] = E[(qᵀx)(xᵀq)] = qᵀE[xxᵀ]q = qᵀRq

Find the q's maximizing this!
The principal components q can be obtained by eigenvector
decomposition of R, e.g. via SVD:

R = QΛQᵀ,  Q = [q1, q2, ..., qj, ..., qm],  Λ = diag[λ1, λ2, ..., λj, ..., λm]

Rqj = λj qj,  j = 1, 2, ..., m,  i.e.  Rq = λq
Dimensionality Reduction
(1/2)

We can ignore the components of lesser significance.

You do lose some information, but if the eigenvalues are small, you
don't lose much:
* n dimensions in the original data
* calculate n eigenvectors and eigenvalues
* choose only the first p eigenvectors, based on their eigenvalues
* the final data set has only p dimensions
(A small sketch of this selection step is given below.)
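Continuing the earlier PCA sketch (same assumed variables X, eigvals, eigvecs), this is one way to keep only the first p eigenvectors and project the data onto them:

```python
# continuation of the PCA sketch: keep the p most significant components
p = 1                                   # chosen from the eigenvalue spectrum
Q_p = eigvecs[:, :p]                    # first p eigenvectors (columns)
X_reduced = X @ Q_p                     # final data set with only p dimensions
print(X_reduced.shape)                  # (1000, p)
explained = eigvals[:p].sum() / eigvals.sum()
print(f"variance retained: {explained:.2%}")
```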
Dimensionality Reduction
(2/2)

[Scree plot: variance captured as a function of the number of dimensions retained]


Reconstruction from
PCs

[Figure: reconstructions of an image from q = 1, 2, 4, 8, 16, 32, 64 and 100
principal components, compared with the original image]


LINEAR DISCRIMINANT
ANALYSIS (LDA)



Limitations of PCA

Are the maximal-variance dimensions necessarily the relevant
dimensions for preservation (i.e., for discriminating between the classes)?



Linear Discriminant Analysis (1/6)

What is the goal of LDA?

Perform dimensionality reduction while preserving as much of the
class-discriminatory information as possible.
Seek directions along which the classes are best separated.
Take into consideration not only the scatter within classes but also
the scatter between classes.
For example, in face recognition LDA is more capable of
distinguishing image variation due to identity from variation due to
other sources such as illumination and expression.



Linear Discriminant
Analysis (2/6)

Within-class scatter matrix:
Sw = Σ(i=1..c) Σ(j=1..ni) (Yj − Mi)(Yj − Mi)ᵀ

Between-class scatter matrix:
Sb = Σ(i=1..c) (Mi − M)(Mi − M)ᵀ

where Yj are the samples, Mi is the mean of class i, M is the overall
mean, c is the number of classes, and ni is the number of samples in
class i.

Projection matrix U:  y = Uᵀx

LDA computes a transformation that maximizes the between-class
scatter while minimizing the within-class scatter:

max |S̃b| / |S̃w| = max |UᵀSbU| / |UᵀSwU|   (products of eigenvalues!)

which leads to the eigenvalue problem  Sw⁻¹Sb U = UΛ,
where S̃b, S̃w are the scatter matrices of the projected data y.
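A hedged numpy sketch of these definitions (synthetic two-class data; all names are illustrative): build Sw and Sb from the class means and solve the eigenproblem Sw⁻¹Sb U = UΛ.

```python
# Hedged sketch of LDA via the scatter matrices defined above.
import numpy as np

rng = np.random.default_rng(1)
# synthetic 2-D data, two classes
Y = np.vstack([rng.normal([0, 0], 1.0, size=(50, 2)),
               rng.normal([4, 1], 1.0, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)

M = Y.mean(axis=0)                       # overall mean
Sw = np.zeros((2, 2))                    # within-class scatter
Sb = np.zeros((2, 2))                    # between-class scatter
for c in np.unique(labels):
    Yc = Y[labels == c]
    Mc = Yc.mean(axis=0)
    Sw += (Yc - Mc).T @ (Yc - Mc)
    d = (Mc - M).reshape(-1, 1)
    Sb += d @ d.T                        # some texts weight this by the class size ni

eigvals, U = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
u = np.real(U[:, np.argmax(np.real(eigvals))])   # most discriminative direction
y_proj = Y @ u                           # projected data y = uᵀx
print(u)
```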
Linear Discriminant Analysis (3/6)

Does Sw⁻¹ always exist?

If Sw is non-singular, we can obtain a conventional eigenvalue
problem by writing:
Sw⁻¹Sb U = UΛ

In practice, Sw is often singular, since the data are image vectors
with large dimensionality while the size of the data set is much
smaller (M << N).

Note: Since Sb has at most rank C−1, the maximum number of
eigenvectors with non-zero eigenvalues is C−1 (i.e., the maximum
dimensionality of the sub-space is C−1).



Linear Discriminant
Analysis (4/6)
Does Sw⁻¹ always exist? (cont.)

To alleviate this problem, we can use PCA first:

1) PCA is first applied to the data set to reduce its
dimensionality.

2) LDA is then applied to find the most discriminative directions
(a sketch of this two-stage pipeline follows):
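A hedged scikit-learn sketch of the PCA-then-LDA idea (assuming scikit-learn is available; the dataset and component counts are illustrative, not from the slides):

```python
# Hedged sketch: PCA to reduce dimensionality, then LDA for discrimination.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)      # 64-dimensional image vectors, 10 classes

pca_lda = make_pipeline(
    PCA(n_components=40),                # 1) reduce dimensionality so Sw is non-singular
    LinearDiscriminantAnalysis(n_components=9),  # 2) at most C-1 = 9 discriminant directions
)
Z = pca_lda.fit_transform(X, y)          # most discriminative low-dimensional features
print(Z.shape)                           # (n_samples, 9)
```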



Linear Discriminant Analysis (5/6)

[Figure: comparison of PCA and LDA projections of the same data]

D. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image
Retrieval", IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 18, no. 8, pp. 831-836, 1996.
Linear Discriminant
Analysis (6/6)
Factors unrelated to classification:
MEF (most expressive feature) vectors show the tendency of PCA to
capture major variations in the training set, such as lighting direction.
MDF (most discriminating feature) vectors discount those factors
unrelated to classification.

D. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image
Retrieval", IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 18, no. 8, pp. 831-836, 1996.
PCA vs LDA vs ICA: A short
Review

PCA: proper for dimension reduction.
LDA: proper for pattern classification if the number of training
samples of each class is large.
ICA: proper for blind source separation, or for classification using
independent components (ICs) when the class id of the training data
is not available.

Is LDA always better than PCA?

There has been a tendency in the computer vision community to
prefer LDA over PCA, mainly because LDA deals directly with
discrimination between classes while PCA does not pay attention
to the underlying class structure.
However, when the training set is small, PCA can outperform LDA.
When the number of samples is large and representative for each
class, LDA outperforms PCA.
SVM SUPPORT VECTOR
MACHINE

A classification method for both linear and nonlinear data.
It uses a nonlinear mapping to transform the original training data
into a higher dimension.
In the new dimension, it searches for the linear optimal separating
hyperplane (i.e., decision boundary).
With an appropriate nonlinear mapping to a sufficiently high
dimension, data from two classes can always be separated by a
hyperplane.
SVM finds this hyperplane using support vectors (essential training
tuples) and margins (defined by the support vectors). A hedged
example of a kernel SVM is given below.
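A minimal scikit-learn sketch of this idea (assuming scikit-learn; the dataset and parameters are illustrative): an RBF-kernel SVC implicitly maps the data into a higher-dimensional space and finds a separating hyperplane there.

```python
# Hedged sketch: nonlinear (kernel) SVM on data that is not linearly
# separable in the original 2-D space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # implicit nonlinear mapping
clf.fit(X, y)
print(clf.score(X, y))                # close to 1.0 on this toy data
print(clf.support_vectors_.shape)     # the support vectors defining the margin
```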



SVM HISTORY &
APPLICATION

Vapnik and colleagues (1992) - groundwork from Vapnik &
Chervonenkis' statistical learning theory of the 1960s.
Features: training can be slow, but accuracy is high owing to the
ability to model complex nonlinear decision boundaries (margin
maximization).
Used both for classification and prediction.
Applications: handwritten digit recognition, object recognition,
speaker identification, benchmark time-series prediction tests.



SVM GENERAL PHILOSOPHY

[Figure: two separating hyperplanes for the same data, one with a
small margin and one with a large margin; the support vectors lie
on the margin boundaries]


SVM MARGINS & SUPPORT
VECTORS



SVM When Data Is Linearly
Separable

Let the data D be (X1, y1), ..., (X|D|, y|D|), where Xi is a training
tuple and yi is its associated class label.
There are infinitely many lines (hyperplanes) separating the two
classes, but we want to find the best one (the one that minimizes
classification error on unseen data).
SVM searches for the hyperplane with the largest margin, i.e., the
maximum marginal hyperplane (MMH).
SVM Linearly Separable

A separating hyperplane can be written as
W · X + b = 0
where W = {w1, w2, ..., wn} is a weight vector and b a scalar (bias).

For 2-D data it can be written as
w0 + w1 x1 + w2 x2 = 0

The hyperplanes defining the sides of the margin:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1

Any training tuples that fall on the hyperplanes H1 or H2 (i.e., the
sides defining the margin) are support vectors.

This becomes a constrained (convex) quadratic optimization
problem: a quadratic objective function with linear constraints,
solved by quadratic programming (QP) using Lagrangian multipliers.
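A hedged scikit-learn sketch of the linear case (toy data; names are illustrative): fit a linear SVM, read off W and b, and check that the support vectors lie on the margin hyperplanes H1/H2.

```python
# Hedged sketch: linear SVM, its hyperplane W·X + b = 0, and its support vectors.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=7)

clf = SVC(kernel="linear", C=1e3)     # large C approximates a hard margin
clf.fit(X, y)

W, b = clf.coef_[0], clf.intercept_[0]          # hyperplane W·X + b = 0
margins = X[clf.support_] @ W + b               # support vectors lie on the margin
print(np.round(margins, 2))                     # approximately ±1 (H1 and H2)
                                                # when the classes are separable
```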



SVM Why Is SVM Effective on High Dimensional
Data?

The complexity of the trained classifier is characterized by the
number of support vectors rather than by the dimensionality of the
data.
The support vectors are the essential or critical training examples;
they lie closest to the decision boundary (MMH).
If all other training examples were removed and training repeated,
the same separating hyperplane would be found.
The number of support vectors found can be used to compute an
(upper) bound on the expected error rate of the SVM classifier,
which is independent of the data dimensionality.
Thus, an SVM with a small number of support vectors can have
good generalization, even when the dimensionality of the data is
high.
K NEAREST NEIGHBOUR ALGORITHM

1. All instances correspond to points in the n-D space.
2. The nearest neighbors are defined in terms of Euclidean distance,
dist(X1, X2).
3. The target function could be discrete- or real-valued.
4. For discrete-valued targets, k-NN returns the most common value
among the k training examples nearest to xq (see the sketch after
the figure below).
5. Voronoi diagram: the decision surface induced by 1-NN for a
typical set of training examples:

[Figure: Voronoi decision surface induced by 1-NN around a query point xq,
with positive (+) and negative (−) training examples]
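A minimal sketch of the discrete-valued k-NN rule described above (pure numpy, synthetic data; names are illustrative):

```python
# Hedged sketch of the discrete-valued k-NN rule.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, xq, k=3):
    """Return the most common label among the k training examples
    nearest to the query xq (Euclidean distance)."""
    dists = np.linalg.norm(X_train - xq, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy usage
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array(["-", "-", "-", "+", "+", "+"])
print(knn_predict(X_train, y_train, np.array([5.2, 5.1]), k=3))   # "+"
```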
Discussion on the k-NN Algorithm

k-NN for real-valued prediction for a given unknown tuple:
returns the mean value of the k nearest neighbors.

Distance-weighted nearest-neighbor algorithm:
weight the contribution of each of the k neighbors according to
its distance to the query xq, giving greater weight to closer
neighbors (a sketch is given below).

k-NN is robust to noisy data because it averages over the k nearest
neighbors.

Curse of dimensionality: the distance between neighbors can be
dominated by irrelevant attributes. To overcome this, stretch the
axes or eliminate the least relevant attributes.
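A hedged sketch of the distance-weighted variant for real-valued prediction, with weights 1/d² as one common choice (data and names are illustrative):

```python
# Hedged sketch: distance-weighted k-NN for real-valued prediction.
import numpy as np

def knn_regress_weighted(X_train, y_train, xq, k=3, eps=1e-12):
    """Weighted mean of the k nearest targets, weight = 1 / distance^2."""
    dists = np.linalg.norm(X_train - xq, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] ** 2 + eps)   # closer neighbors weigh more
    return np.sum(w * y_train[nearest]) / np.sum(w)

# toy usage with real-valued targets
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_real = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 4.9])
print(knn_regress_weighted(X_train, y_real, np.array([5.2, 5.1]), k=3))
```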



