
Linear Methods for Classification

Assoc. Prof. Dr. Abdulhamit Subasi

By Hakan

Machine Learning Presentation

Introduction

Basic setup of a classification problem.
Understanding the Bayes classification rule.
Understanding the classification approach by linear regression of an indicator matrix.
Understanding the phenomenon of masking.


Setup for Supervised Learning


Training data: {(x1, g1), (x2, g2), ..., (xN, gN)}.
The feature vector X = (X1, X2, ..., Xp), where each variable Xj is quantitative.
The response variable G is categorical and takes values in {1, 2, ..., K}.
Form a predictor G(x) to predict G based on X.

Setup for Supervised Learning

G(x) divides the input space (the feature vector space) into a collection of regions, each labeled by one class (see the figure).


Linear Methods

Decision boundaries are linear: linear methods for classification. Two-class problem: the decision boundary between the two classes is a hyperplane in the feature vector space. A hyperplane in the p-dimensional input space is the set shown below.
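In the usual notation, with coefficients β0 and β = (β1, ..., βp), this set can be written as

\[
L = \{\, x \in \mathbb{R}^{p} : \beta_0 + \beta^{T}x = 0 \,\}.
\]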

Linear Methods

The two regions separated by a hyperplane:
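In the notation above, these are the two half-spaces

\[
\{\, x : \beta_0 + \beta^{T}x > 0 \,\}
\quad\text{and}\quad
\{\, x : \beta_0 + \beta^{T}x < 0 \,\}.
\]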

More than two classes: the decision boundary between any pair of classes k and l is a hyperplane. How do you choose the hyperplane?

Linear Methods

Example methods for deciding the hyperplane:


Linear regression of an indicator matrix
Linear discriminant analysis
Logistic regression
Rosenblatt's perceptron learning algorithm

Note: Linear decision boundaries are not necessarily ...

The Bayes Classification Rule
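In its standard form, with class-conditional densities f_k(x) and prior probabilities π_k, the Bayes classification rule assigns x to the most probable class given X = x:

\[
\hat{G}(x) \;=\; \arg\max_{k}\, \Pr(G = k \mid X = x)
          \;=\; \arg\max_{k}\, \pi_k\, f_k(x).
\]

No other rule achieves a lower expected misclassification (0-1 loss) rate; this minimum is the Bayes rate.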


Linear Regression of an Indicator Matrix


g    Y1  Y2  Y3  Y4
1     1   0   0   0
3     0   0   1   0
2     0   1   0   0
4     0   0   0   1
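As a concrete illustration, the sketch below (with illustrative variable names and toy data) builds the indicator matrix Y, fits one least-squares regression per indicator column, and classifies a new point by the largest fitted value; note that the fitted values sum to one, as verified on the following slides.

import numpy as np

# Toy training data: N points in p = 2 dimensions, labels in {1, 2, 3, 4}
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
g = rng.integers(1, 5, size=40)          # class labels 1..4
K = 4

# Indicator response matrix Y (N x K): Y[i, k-1] = 1 if g[i] == k
Y = np.zeros((len(g), K))
Y[np.arange(len(g)), g - 1] = 1.0

# Augment X with a column of ones and fit B_hat by least squares
Xa = np.column_stack([np.ones(len(g)), X])
B_hat, *_ = np.linalg.lstsq(Xa, Y, rcond=None)

# Classify a new point: the largest fitted indicator value wins
x_new = np.array([1.0, 0.5, -0.2])       # (1, x1, x2)
f_hat = x_new @ B_hat
print(f_hat, f_hat.sum())                 # fitted values sum to 1
print("predicted class:", np.argmax(f_hat) + 1)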


Linear Regression Fit to the Class Indicator Variables

Verification that the fitted values sum to one over the classes.
The fit is f(x)^T = (1, x^T) B with B = (X^T X)^{-1} X^T Y, where X here denotes the augmented input matrix with a leading column of ones.
We want to prove that sum_k f_k(x) = 1 for every x, which is equivalent to proving (1, x^T) B 1_K = 1.   (Eq. 1)
Notice that Y 1_K = 1_N, since each row of the indicator matrix contains exactly one 1.   (Eq. 2)

Linear Regression Fit to the Class Indicator Variables


And the augmented X has 1_N as its first column, so X e1 = 1_N with e1 = (1, 0, ..., 0)^T.
From Eq. 2 we can see that B 1_K = (X^T X)^{-1} X^T Y 1_K = (X^T X)^{-1} X^T 1_N = (X^T X)^{-1} X^T X e1 = e1.

Linear Regression Fit to the Class Indicator Variables

Eq. 1 becomes (1, x^T) e1 = 1, which is true for any x.



Linear discriminant analysis


f_k(x): the density of X in class G = k; π_k: the prior probability of class k.
If each class density is Gaussian and the classes have a common covariance matrix, the log-ratio log[ Pr(G = k | X = x) / Pr(G = l | X = x) ] is linear in x, so the decision boundaries are linear.
This yields linear discriminant functions δ_k(x) and the classification rule: assign x to the class with the largest δ_k(x) (see below).
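With Gaussian class densities sharing a common covariance matrix Σ, the discriminant functions and the classification rule take the standard form

\[
\delta_k(x) = x^{T}\Sigma^{-1}\mu_k \;-\; \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k \;+\; \log \pi_k,
\qquad
\hat{G}(x) = \arg\max_{k}\, \delta_k(x),
\]

where μ_k is the class mean; in practice Σ, μ_k and π_k are replaced by their sample estimates.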

Remarks

With 2 classes, linear discriminant analysis coincides with classification by linear least squares (the LDA direction is proportional to the least-squares coefficient vector).
With more than 2 classes, LDA avoids the masking problems of the regression approach.
If the classes do not share a common covariance matrix, quadratic discriminant analysis (QDA) is used instead.

Regularized discriminant analysis (RDA)

A compromise between linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), obtained by regularizing the covariance matrices.
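The regularized class covariance is a convex combination of the per-class (QDA) estimate and the pooled (LDA) estimate:

\[
\hat{\Sigma}_k(\alpha) \;=\; \alpha\,\hat{\Sigma}_k \;+\; (1-\alpha)\,\hat{\Sigma},
\qquad \alpha \in [0, 1].
\]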
Here Σ is the pooled covariance matrix used in LDA, and α is determined by cross-validation.

Computations

Simplified by diagonalisation of covariance matrices

(eigen-decomposition). Algorithm:

Sphere the data X using the eigendecomposition of the common covariance matrix (see the sketch below).
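A small numpy sketch of the sphering step (function and variable names are illustrative): the pooled covariance is eigendecomposed as U D U^T, and the data are mapped to X* = X U D^{-1/2}, so that the common covariance in the transformed space is the identity.

import numpy as np

def sphere(X, g):
    """Sphere X using the eigendecomposition of the pooled within-class covariance."""
    classes = np.unique(g)
    Xc = np.vstack([X[g == k] - X[g == k].mean(axis=0) for k in classes])
    sigma = Xc.T @ Xc / (len(X) - len(classes))   # pooled covariance estimate
    evals, U = np.linalg.eigh(sigma)              # sigma = U diag(evals) U^T
    W = U / np.sqrt(evals)                        # columns scaled by D^{-1/2}
    return X @ W                                  # sphered data X*

# After sphering, LDA amounts to classifying to the nearest class centroid
# in the transformed space, after adjusting for the class priors log(pi_k).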

Reduced-rank linear discriminant analysis

Fisher: find the linear combination Z = a^T X such that the between-class variance is maximized relative to the within-class variance.

This means maximizing the Rayleigh quotient a^T B a / a^T W a, where B is the between-class covariance and W is the within-class covariance (a sketch of the computation follows).
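A brief sketch (assuming scipy is available; names are illustrative) of how the discriminant directions can be computed: the maximizer of the Rayleigh quotient is the leading eigenvector of the generalized eigenproblem B a = λ W a.

import numpy as np
from scipy.linalg import eigh

def fisher_directions(X, g):
    """Discriminant directions maximizing between- over within-class variance."""
    classes = np.unique(g)
    mu = X.mean(axis=0)
    B = np.zeros((X.shape[1], X.shape[1]))        # between-class covariance
    W = np.zeros_like(B)                          # within-class covariance (must be positive definite)
    for k in classes:
        Xk = X[g == k]
        dk = (Xk.mean(axis=0) - mu)[:, None]
        B += len(Xk) * (dk @ dk.T)
        W += (Xk - Xk.mean(axis=0)).T @ (Xk - Xk.mean(axis=0))
    # Generalized eigenproblem B a = lambda W a; sort by decreasing eigenvalue
    lam, A = eigh(B, W)
    return A[:, ::-1], lam[::-1]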



Logistic regression

model specified by K-1 log-odds or logit transformations :
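In standard notation, with class K used as the reference class:

\[
\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)}
\;=\; \beta_{k0} + \beta_k^{T}x,
\qquad k = 1, \dots, K-1.
\]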


Fitting logistic regression model

Usually fitted by maximum likelihood (the Newton-Raphson algorithm is used to solve the score equations).
Example: K = 2 (two classes). Encode the two-class response g_i as a 0/1 variable y_i (y_i = 1 when g_i = 1, y_i = 0 when g_i = 2) and write the log-likelihood:
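With x_i including the constant term 1, the two-class log-likelihood can be written as

\[
\ell(\beta) \;=\; \sum_{i=1}^{N}\Bigl\{ y_i\,\beta^{T}x_i \;-\; \log\bigl(1 + e^{\beta^{T}x_i}\bigr) \Bigr\},
\]

and setting its derivatives to zero gives the score equations solved by Newton-Raphson.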

Example: South African heart disease


There is correlation among the predictors, which leads to surprising results: some variables are not included in the fitted logistic model.

Quadratic approximations and inference


The Newton-Raphson step is a weighted least-squares fit with weights w_i = p_i(1 - p_i), i.e. iteratively reweighted least squares (IRLS).
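A compact sketch of the Newton-Raphson / IRLS fit for the two-class case (illustrative names; y coded 0/1): each iteration solves a weighted least-squares problem with weights p_i(1 - p_i).

import numpy as np

def logistic_newton(X, y, n_iter=25):
    """Two-class logistic regression by Newton-Raphson / IRLS. y in {0, 1}."""
    Xa = np.column_stack([np.ones(len(y)), X])    # add intercept
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ beta))      # current probabilities
        w = p * (1.0 - p)                         # IRLS weights
        # Newton step: beta <- beta + (X^T W X)^{-1} X^T (y - p)
        H = Xa.T @ (Xa * w[:, None])
        grad = Xa.T @ (y - p)
        beta = beta + np.linalg.solve(H, grad)
    return beta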


Differences between LDA and logistic regression

The models have the same linear form, but the coefficients are estimated differently. Logistic regression is more general and makes fewer assumptions (it leaves the marginal density of X arbitrary), so it is more robust; in practice, however, the two methods give very similar results.


Separating hyperplanes

Perceptrons are classifiers of the form G(x) = sign(β0 + β^T x).
The hyperplane (affine set) L is defined by the equation f(x) = β0 + β^T x = 0.


Properties

β/||β|| is the unit vector normal to the surface L. For any point x0 in L, β^T x0 = -β0, and the signed distance of any point x to L is given by the expression below.
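In the notation above, with f(x) = β0 + β^T x:

\[
\beta^{*} = \frac{\beta}{\lVert \beta \rVert} \ \text{is normal to } L,
\qquad
d(x, L) \;=\; \frac{1}{\lVert \beta \rVert}\bigl(\beta^{T}x + \beta_0\bigr)
        \;=\; \frac{f(x)}{\lVert f'(x) \rVert}.
\]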

Rosenblatt's perceptron learning algorithm

Tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary: if a point with y_i = 1 is misclassified then x_i^T β + β0 < 0, and if a point with y_i = -1 is misclassified then x_i^T β + β0 > 0.
The goal is therefore to minimize D(β, β0) = - sum_{i in M} y_i (x_i^T β + β0), where M is the index set of misclassified points.
The algorithm uses stochastic gradient descent to minimize this piecewise linear criterion (see the sketch below).
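A minimal sketch of the algorithm (illustrative names; labels coded -1/+1, learning rate rho): each misclassified point produces a stochastic gradient step on (β, β0).

import numpy as np

def perceptron(X, y, rho=1.0, n_epochs=100):
    """Rosenblatt's perceptron; y must be coded as -1 / +1."""
    beta = np.zeros(X.shape[1])
    beta0 = 0.0
    for _ in range(n_epochs):
        n_mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ beta + beta0) <= 0:     # misclassified
                beta += rho * yi * xi             # stochastic gradient step
                beta0 += rho * yi
                n_mistakes += 1
        if n_mistakes == 0:                       # converged: data separated
            break
    return beta, beta0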

Optimal separating hyperplanes

Find the hyperplane that separates the two classes and maximizes the margin, i.e. the distance from the hyperplane to the closest training point of either class. Advantages over Rosenblatt's algorithm:

the solution is unique
better classification performance on test data

Figure: the least-squares solution and two solutions found by the perceptron algorithm with different random starts.
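In practice the optimal separating hyperplane can be computed as a hard-margin linear support vector machine; the sketch below (assuming scikit-learn is installed, with toy data and illustrative names) approximates the hard margin with a very large cost parameter C.

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clouds (illustrative toy data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# Hard-margin behaviour is approximated with a very large C
svm = SVC(kernel="linear", C=1e6).fit(X, y)
beta, beta0 = svm.coef_[0], svm.intercept_[0]
print("margin width:", 2.0 / np.linalg.norm(beta))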

Resources

Céline Bugli, "The Elements of Statistical Learning" (Hastie, Tibshirani & Friedman)


Thank you

