
Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality
reduction technique in the pre-processing step for pattern-classification and
machine learning applications. The goal is to project a dataset onto a lower-
dimensional space with good class-separability in order to avoid over-fitting
("curse of dimensionality") and also reduce computational costs.
So, in a nutshell, the goal of an LDA is often to project a feature space (a
dataset of n-dimensional samples) onto a smaller subspace k (where k ≤ n−1)
while maintaining the class-discriminatory information.
In general, dimensionality reduction not only helps reduce computational
costs for a given classification task, but it can also be helpful in avoiding
overfitting by minimizing the error in parameter estimation ("curse of
dimensionality").
Listed below are the 5 general steps for performing a linear discriminant
analysis.

1. Compute the d-dimensional mean vectors for the different classes from the
dataset.
2. Compute the scatter matrices (between-class and within-class scatter
matrix).
3. Compute the eigenvectors (e1, e2, ..., ed) and corresponding
eigenvalues (λ1, λ2, ..., λd) for the scatter matrices.
4. Sort the eigenvectors by decreasing eigenvalues and
choose the k eigenvectors with the largest eigenvalues to form
a d×k dimensional matrix W (where every column represents an
eigenvector).
5. Use this d×k eigenvector matrix to transform the samples onto the
new subspace. This can be summarized by the mathematical
equation Y = X × W (where X is an n×d-dimensional matrix
representing the n samples, and Y is the transformed n×k-
dimensional matrix of samples in the new subspace).
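
A minimal NumPy sketch of these five steps is given below. The toy arrays X and y are made-up placeholders, the variable names (mean_vectors, S_W, S_B, W) are purely illustrative, and the eigenproblem is solved for the common formulation inv(S_W) · S_B; treat this as a sketch of the procedure, not a reference implementation.

```python
import numpy as np

# Toy data: n = 10 samples with d = 3 features, two classes (made-up values)
X = np.array([[4.0, 2.0, 0.5], [2.0, 4.0, 1.0], [2.0, 3.0, 1.5],
              [3.0, 6.0, 2.0], [4.0, 4.0, 2.5], [9.0, 10.0, 3.0],
              [6.0, 8.0, 3.5], [9.0, 5.0, 4.0], [8.0, 7.0, 4.5],
              [10.0, 8.0, 5.0]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
n, d = X.shape
classes = np.unique(y)
k = 1  # number of discriminants to keep (k <= number of classes - 1)

# Step 1: d-dimensional mean vector for each class
mean_vectors = {c: X[y == c].mean(axis=0) for c in classes}

# Step 2: within-class (S_W) and between-class (S_B) scatter matrices
overall_mean = X.mean(axis=0)
S_W = np.zeros((d, d))
S_B = np.zeros((d, d))
for c in classes:
    Xc = X[y == c]
    diff = Xc - mean_vectors[c]
    S_W += diff.T @ diff
    mean_diff = (mean_vectors[c] - overall_mean).reshape(d, 1)
    S_B += Xc.shape[0] * (mean_diff @ mean_diff.T)

# Step 3: eigenvectors and eigenvalues of inv(S_W) @ S_B
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# Step 4: sort by decreasing eigenvalue, keep the top k columns -> d x k matrix W
order = np.argsort(eig_vals.real)[::-1]
W = eig_vecs[:, order[:k]].real

# Step 5: project the samples onto the new subspace, Y = X W (n x k)
Y = X @ W
print(Y.shape)  # (10, 1)
```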
Data Rescaling
Your preprocessed data may contain attributes with a mixture of scales for various
quantities, such as dollars, kilograms and sales volume.

Many machine learning methods expect, or are more effective when, the data
attributes have the same scale. Two popular data scaling methods are normalization
and standardization.

Data Normalization
Normalization refers to rescaling real-valued numeric attributes into the range 0 to
1.

It is useful to scale the input attributes for a model that relies on the magnitude of
values, such as distance measures used in k-nearest neighbors and in the
preparation of coefficients in regression.
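
As a quick illustration, min-max normalization could be done with scikit-learn's MinMaxScaler or with a one-line NumPy expression; the small two-column array below is a made-up example of attributes on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up attribute values on different scales (e.g. dollars vs. kilograms)
X = np.array([[50000.0, 70.0],
              [62000.0, 85.0],
              [48000.0, 60.0]])

# Rescale each attribute (column) into the range 0 to 1
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)
print(X_norm)

# Equivalent plain-NumPy formula: (x - min) / (max - min), applied per column
X_norm_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```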

Data Standardization
Standardization refers to shifting the distribution of each attribute to have a mean of
zero and a standard deviation of one (unit variance).

It is useful to standardize attributes for a model that relies on the distribution of
attributes, such as Gaussian processes.
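
A comparable sketch for standardization, assuming the same kind of made-up data and using scikit-learn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up attribute values on different scales
X = np.array([[50000.0, 70.0],
              [62000.0, 85.0],
              [48000.0, 60.0]])

# Shift each attribute to mean 0 and standard deviation 1
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # approximately [1, 1]
```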

Decision Tree
Decision trees are commonly learned by recursively splitting the set of training
instances into subsets based on the instances' values for the explanatory
variables.
Memorizing the training set is called over-fitting. A program that memorizes
its observations may not perform its task well, as it may have memorized relations
and structures that are noise or coincidence.
Balancing memorization and generalization, or over-fitting and under-fitting, is
a problem common to many machine learning algorithms.
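
One common way to manage this balance in practice is to limit how deep the recursive splitting can go. The sketch below is only an illustration: it contrasts an unconstrained scikit-learn tree with a depth-limited one on the Iris dataset, which is a stand-in chosen for convenience rather than a dataset from this text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until leaves are pure and can memorize noise
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting max_depth stops the recursive splitting early and reduces over-fitting
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("unconstrained:", deep_tree.score(X_train, y_train), deep_tree.score(X_test, y_test))
print("depth-limited:", shallow_tree.score(X_train, y_train), shallow_tree.score(X_test, y_test))
```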

Agenda for Today's Session


What is classification?
Types of Classification.
Classification Use case
What is Decision Tree?
Terminologies associated with a Decision Tree.
Visualizing a Decision Tree
Writing a Decision Tree classifier from scratch in Python using the CART
algorithm.
What is Classification?
Classification is the process of dividing a dataset into different categories or
groups by adding labels.
Note - It adds each data point to a particular labeled group on the basis of some
condition.
Types of Classification
 Decision Tree
 Random Forest
 Naive Bayes
 KNN

Decision Tree
 Graphical representation of all the possible solutions to a decision
 Decisions are based on some conditions
 Decisions made can be easily explained
Random Forest
 Builds multiple decision trees and merges them together
 More accurate and stable prediction
 Random decision forests correct for decision trees' habit of overfitting to their
training set.
 Trained with the "bagging" method
Naive Bayes
 Classification technique based on Bayes' Theorem
 Assumes that the presence of a particular feature in a class is unrelated to the
presence of any other feature

K-Nearest Neighbors
 Stores all the available cases and classifies new cases based on a similarity measure
 The "K" in the KNN algorithm is the number of nearest neighbors we wish to take a vote from.
What is a Decision Tree?
"A decision tree is a graphical representation of all the possible solutions to a
decision based on certain conditions"

Dataset
This is how our dataset looks!
Decision Tree Terminology

CART Algorithm
Which attribute from the following dataset should you pick first?

Answer: Determine the attribute that best classifies the training data.
But how do we choose the best attribute?
or
How does a tree decide where to split?

Entropy
How will you decide which attribute is the best?
The attribute with the highest information gain is considered the best.
Next question - what is information gain?
What is entropy?
 Defines the randomness in the data
 Entropy is just a metric which measures the impurity or randomness in the data
 Computing entropy is the first step in solving a decision tree
What is information gain?
 Measures the reduction in entropy
 Decides which attribute should be selected as the decision node
If S is our total collection, then:
Information Gain = Entropy(S) - [(Weighted Avg) × Entropy(each feature)]
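
A small NumPy sketch of both quantities is given below; the ten yes/no labels and the single categorical feature are made up for illustration, and the helper functions follow the standard definition Entropy(S) = -Σ p_i log2(p_i) together with the information-gain formula above.

```python
import numpy as np

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Entropy of the whole set minus the weighted average entropy of each split."""
    total = entropy(labels)
    weighted = 0.0
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return total - weighted

# Made-up example: 10 yes/no labels and one categorical feature with values a/b
y = np.array(["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"])
feature = np.array(["a", "a", "b", "b", "a", "b", "a", "a", "b", "a"])

print(entropy(y))                    # impurity of the full collection S
print(information_gain(y, feature))  # reduction in entropy from splitting on this feature
```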
Why should we prune?
