
Discriminant Analysis

• Discriminant analysis (DA) is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature.
• It discriminates between two or more mutually exclusive and exhaustive groups on the basis of a set of explanatory variables.
Types of DA
• Linear DA: used when the criterion / dependent variable has two categories, e.g. adopters and non-adopters.
• Multiple DA: used when three or more categories are involved, e.g. SHG1, SHG2, SHG3.
Similarities and Differences
                                          ANOVA         REGRESSION    DISCRIMINANT ANALYSIS

Similarities
1. Number of dependent variables          One           One           One
2. Number of independent variables        Multiple      Multiple      Multiple

Differences
1. Nature of the dependent variable       Metric        Metric        Categorical
2. Nature of the independent variables    Categorical   Metric        Metric
Assumptions

1. Sample size (n)
• Group sizes of the dependent variable should not be grossly different (e.g. 80:20). The sample size should be at least five times the number of independent variables.
2. Normal distribution
• Each independent variable is normally distributed.
3. Homogeneity of variances / covariances
• All variables have linear and homoscedastic relationships, i.e. the variances and covariances of the predictors are similar across the groups.
4. Outliers
• Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.
5. Non-multicollinearity
• There should be no multicollinearity among the independent variables.
6. Mutually exclusive
• The groups must be mutually exclusive, with every subject or case belonging to only one group.
7. Classification
• Each case is correctly allocated to its dependent category in the initial classification.
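
The sample-size assumption can be turned into a quick screening check. The sketch below is only an illustration: the function name `check_sample_size` and its parameters are hypothetical, the 80:20 split is treated as the point at which groups count as grossly unequal, and the "five times" rule is read as applying to the total sample size.

```python
def check_sample_size(group_sizes, n_predictors, max_share=0.8, cases_per_predictor=5):
    """Screen a design against the two sample-size rules of thumb.

    max_share and cases_per_predictor are assumptions taken from the
    80:20 and "five times" guidance; adjust them to your own standard.
    """
    total = sum(group_sizes)
    largest_share = max(group_sizes) / total
    return {
        # Groups are flagged as unbalanced once one group reaches 80% of cases.
        "balanced": largest_share < max_share,
        # Total cases should be at least five per independent variable.
        "enough_cases": total >= cases_per_predictor * n_predictors,
    }


print(check_sample_size([50, 50], n_predictors=3))
print(check_sample_size([80, 20], n_predictors=3))
```

A failed check is a warning to inspect the design, not a formal test.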
Discriminant Analysis Model
The discriminant analysis model involves linear combinations of
the following form:

D = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk

where
D = discriminant score
b0 = constant term
b1 ... bk = discriminant coefficients or weights
X1 ... Xk = predictor or independent variables
• The coefficients, or weights (b), are estimated so that the groups differ as much as possible on the values of the discriminant function.
• Discriminant analysis creates an equation that minimizes the possibility of misclassifying cases into their respective groups or categories.
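
As a minimal sketch of how the model equation is applied once the coefficients are known, the function below computes D for one case. The function name and the coefficient values in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def discriminant_score(constant, weights, values):
    """Return D = b0 + b1*X1 + ... + bk*Xk for a single case."""
    if len(weights) != len(values):
        raise ValueError("need one weight per predictor value")
    return constant + sum(b * x for b, x in zip(weights, values))


# Hypothetical coefficients: b0 = -1.0, b1 = 0.5, b2 = 0.25
d = discriminant_score(-1.0, [0.5, 0.25], [4, 8])
print(d)  # -1.0 + 0.5*4 + 0.25*8 = 3.0
```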
Hypothesis
• Discriminant analysis tests the following hypotheses:

H0: The group means of a set of independent variables for two or more groups are equal.

against

H1: The group means for two or more groups are not equal.

• The set of means for a group is referred to as its centroid.


Statistics Associated with Discriminant Analysis
• Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups.
• It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.
• The canonical correlation is the multiple correlation between the predictors and the discriminant function.
• Centroid. The centroid is the mean value of the discriminant scores for a particular group.
• There are as many centroids as there are groups, as there is one for each group. The means for a group on all the functions are the group centroids.
• Classification matrix. Sometimes also called confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
• Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of variables, when the variables are in the original units of measurement.
• F values and their significance. These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
• Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
• Eigenvalue. For each discriminant function, the eigenvalue is the ratio of between-group to within-group sums of squares. Large eigenvalues imply superior functions.
• Pooled within-group correlation matrix. The pooled within-group correlation matrix is computed by averaging the separate covariance matrices for all the groups.
• Standardized discriminant function coefficients. The standardized discriminant function coefficients are the discriminant function coefficients used as the multipliers when the variables have been standardized to a mean of 0 and a variance of 1.
• Structure correlations. Also referred to as discriminant loadings, the structure correlations represent the simple correlations between the predictors and the discriminant function.
• Group means and group standard deviations. These are computed for each predictor for each group.
• Wilks' lambda. Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1.
• Large values of λ (near 1) indicate that group means do not seem to be different. Small values of λ (near 0) indicate that the group means seem to be different. For a single discriminant function, λ = 1 − R², where R is the canonical correlation.
• It is used to measure how well each function separates cases into groups. It also indicates the significance of the discriminant function and provides the proportion of total variability not explained.
Linear discriminant analysis: hypothetical example

Groups based on             Quality   Accessibility   Price
adoption intention          (x1)      (x2)            (x3)

Group A: would adopt
Person 1                    8         9               6
Person 2                    6         7               5
Person 3                    10        6               3
Person 4                    9         4               4
Person 5                    4         8               2

Group B: would not adopt
Person 6                    5         4               7
Person 7                    3         7               2
Person 8                    4         5               5
Person 9                    2         4               3
Person 10                   2         2               2
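
Before fitting a discriminant function, each predictor in the table can be screened on its own. The sketch below computes Wilks' λ per predictor (within-group sum of squares divided by total sum of squares) and the one-way ANOVA F value, following the definitions given under the associated statistics. The helper names are illustrative and are not taken from any statistics package.

```python
# Example data from the table: (x1 quality, x2 accessibility, x3 price)
group_a = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]  # would adopt
group_b = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]   # would not adopt


def mean(values):
    return sum(values) / len(values)


def sum_sq(values, center):
    """Sum of squared deviations of values around center."""
    return sum((v - center) ** 2 for v in values)


def univariate_stats(a, b):
    """Return (Wilks' lambda, F) for one predictor measured in two groups."""
    within = sum_sq(a, mean(a)) + sum_sq(b, mean(b))
    total = sum_sq(a + b, mean(a + b))
    between = total - within
    df_between = 1                    # number of groups - 1
    df_within = len(a) + len(b) - 2   # number of cases - number of groups
    wilks = within / total
    f_value = (between / df_between) / (within / df_within)
    return wilks, f_value


results = {}
for j, name in enumerate(["x1", "x2", "x3"]):
    a = [row[j] for row in group_a]
    b = [row[j] for row in group_b]
    results[name] = univariate_stats(a, b)
    print(name, "lambda = %.3f" % results[name][0], "F = %.2f" % results[name][1])
```

On these data, quality (x1) separates the groups most clearly (λ about 0.405), accessibility (x2) less so (about 0.660), and price (x3) hardly at all (about 0.997).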
[Figure: QUALITY (X1) plotted against PERSON for adopters and non-adopters, with the mis-classification zone marked]
[Figure: ACCESSIBILITY (X2) plotted against PERSON for adopters and non-adopters, with the mis-classification zone marked]
[Figure: PRICE (X3) plotted against PERSON for adopters and non-adopters, with the mis-classification zone marked]
Output:

Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          3.315        100             100            0.877

Test of functions   Wilks' lambda   Chi-square   d.f.   Sig.
1                   0.232           9.504        3      0.023

Standardised canonical discriminant function coefficients

        Function 1
x1       1.110
x2       0.709
x3      -0.564

The discriminant function can be written as
Zi = 1.110x1 + 0.709x2 - 0.564x3

Note: a larger eigenvalue and a smaller Wilks' lambda are preferred.
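
The figures in this output can be reproduced from the ten cases in the example table. The sketch below is one way to do it in plain Python, assuming the standard two-group Fisher solution and Bartlett's chi-square approximation. The helper names (`column_means`, `sscp`, `solve`) are illustrative, and the procedure is not necessarily the one the software behind the output used.

```python
import math

group_a = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]  # would adopt
group_b = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]   # would not adopt


def column_means(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def sscp(rows, means):
    """Sums of squares and cross-products of deviations from the group means."""
    k = len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(k)] for i in range(k)]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


n_a, n_b = len(group_a), len(group_b)
n, k, g = n_a + n_b, 3, 2   # cases, predictors, groups
mean_a, mean_b = column_means(group_a), column_means(group_b)
w_a, w_b = sscp(group_a, mean_a), sscp(group_b, mean_b)
within = [[w_a[i][j] + w_b[i][j] for j in range(k)] for i in range(k)]  # pooled within-group SSCP
diff = [ma - mb for ma, mb in zip(mean_a, mean_b)]

direction = solve(within, diff)                     # Fisher direction W^-1 (mean_a - mean_b)
d2 = sum(d * w for d, w in zip(diff, direction))    # diff' W^-1 diff
eigenvalue = n_a * n_b / n * d2                     # between / within ratio on the function
canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))
wilks = 1 / (1 + eigenvalue)
chi_square = -(n - 1 - (k + g) / 2) * math.log(wilks)   # Bartlett approximation, d.f. = k * (g - 1)

# Rescale so the scores have pooled within-group variance 1, then multiply
# each weight by the pooled within-group standard deviation of its predictor.
scale = math.sqrt(d2 / (n - g))
standardized = [w / scale * math.sqrt(within[j][j] / (n - g))
                for j, w in enumerate(direction)]

print(round(eigenvalue, 3), round(canonical_r, 3), round(wilks, 3), round(chi_square, 3))
print([round(c, 3) for c in standardized])
```

Under these assumptions the results agree with the reported eigenvalue, canonical correlation, Wilks' lambda, chi-square and standardised coefficients to rounding.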



Predicting group membership:

• Group centroids are calculated as 10.77 and 4.52 by taking the mean of the discriminant scores of each group. The cut-off score is the average of the two centroids = 7.65.
• A person's group membership (adopting / non-adopting) can then be predicted by comparing his or her discriminant score with the cut-off score.
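
The prediction step can be sketched as follows, applying the reported function Zi = 1.110x1 + 0.709x2 - 0.564x3 to the values in the example table. Two things here are assumptions rather than part of the original: the rule that scores above the cut-off are classed as adopters, and the use of the simple midpoint, which suits this example because the two groups are equal in size. The function and variable names are illustrative.

```python
group_a = [(8, 9, 6), (6, 7, 5), (10, 6, 3), (9, 4, 4), (4, 8, 2)]  # would adopt
group_b = [(5, 4, 7), (3, 7, 2), (4, 5, 5), (2, 4, 3), (2, 2, 2)]   # would not adopt
weights = [1.110, 0.709, -0.564]   # reported coefficients for x1, x2, x3


def score(row):
    """Discriminant score Z = 1.110*x1 + 0.709*x2 - 0.564*x3."""
    return sum(w * x for w, x in zip(weights, row))


def mean(values):
    return sum(values) / len(values)


centroid_a = mean([score(r) for r in group_a])
centroid_b = mean([score(r) for r in group_b])
cutoff = (centroid_a + centroid_b) / 2   # midpoint of the two centroids


def predict(row):
    """Assumed rule: scores above the cut-off are classed as adopters."""
    return "adopt" if score(row) > cutoff else "not adopt"


# Classification matrix: outer key = actual group, inner key = predicted group.
labels = ["adopt", "not adopt"]
matrix = {actual: {predicted: 0 for predicted in labels} for actual in labels}
for actual, rows in (("adopt", group_a), ("not adopt", group_b)):
    for row in rows:
        matrix[actual][predict(row)] += 1

hit_ratio = (matrix["adopt"]["adopt"] + matrix["not adopt"]["not adopt"]) / 10
print(round(centroid_a, 2), round(centroid_b, 2), round(cutoff, 2))
print(matrix, hit_ratio)
```

The centroids come out within 0.01 of the stated 10.77 and 4.52, a rounding difference, and the cut-off matches 7.65. All ten cases in this small example fall on the correct side of the cut-off.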
Multiple discriminant analysis

• When we need to discriminate among more than two groups, we use multiple discriminant analysis.
• This technique requires fitting up to g − 1 discriminant functions, where g is the number of groups (the number of functions cannot exceed the number of predictors).
• The assumptions remain the same for this type too.
• The best discriminant function (D) is chosen by comparing the functions.
