Discriminant Analysis

Discriminant analysis is a statistical procedure that
classifies cases (observations or respondents) into the
separate categories to which they belong, on the basis of
a set of characteristic independent variables called
predictors or discriminant variables.
The target variable (the one determining allocation into
groups) is qualitative (nominal or ordinal), while
the characteristics are measured by quantitative
variables.
Two-group discriminant analysis (DA) looks at the
discrimination between two groups.
Multiple discriminant analysis (MDA) allows for
classification into three or more groups.
Objective
Development of discriminant functions, or linear
combinations of the predictor or independent
variables, which will best discriminate between the
categories of the criterion or dependent variable
(groups).
Examination of whether significant differences exist
among the groups, in terms of the predictor variables.
Determination of which predictor variables contribute
to most of the intergroup differences.
Classification of cases to one of the groups based on
the values of the predictor variables.
Evaluation of the accuracy of classification.
When the criterion variable has two categories, the technique is
known as two-group discriminant analysis.
When three or more categories are involved, the technique is
referred to as multiple discriminant analysis.
The main distinction is that, in the two-group case, it is possible
to derive only one discriminant function. In multiple
discriminant analysis, more than one function may be
computed. In general, with G groups and k predictors, it is
possible to estimate up to the smaller of G - 1, or k, discriminant
functions.
The first function has the highest ratio of between-groups to
within-groups sum of squares. The second function,
uncorrelated with the first, has the second highest ratio, and so
on. However, not all the functions may be statistically
significant.
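The bound on the number of discriminant functions can be written as a one-line rule. The sketch below is only an illustration; the function name `max_discriminant_functions` is hypothetical and not part of any statistics package.

```python
def max_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Upper bound on the number of discriminant functions: min(G - 1, k)."""
    if n_groups < 2 or n_predictors < 1:
        raise ValueError("need at least two groups and one predictor")
    return min(n_groups - 1, n_predictors)


# Two groups and six predictors allow a single function.
two_group = max_discriminant_functions(2, 6)
# Three groups and five predictors allow at most two functions.
three_group = max_discriminant_functions(3, 5)
print(two_group, three_group)
```

This is only an upper bound: as noted above, some of the functions may not be statistically significant.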
Geometric Interpretation
[Figure: cases from groups G1 and G2 plotted against predictors X1 and X2, with the discriminant axis D separating the two groups.]
Applications
DA is especially useful for understanding the differences
and factors that lead consumers to make different
choices, allowing marketers to develop
strategies that take proper account of the role of
the predictors.
Examples
Determinants of customer loyalty
Shopper profiling and segmentation
Determinants of purchase and non-purchase
Applications
Whether a bank should offer a loan to a new
customer?
Which customer is likely to buy?
Identify patients who may be at high risk for
problems after surgery
Assumptions
Multivariate normality of the independent
variables.
Equal variance and covariance for the
groups.
Box's M test: a test of homogeneity of the
variance-covariance matrices.
A non-significant test result supports the use of
discriminant analysis.
Discriminant Analysis Model
The discriminant analysis model involves linear combinations of
the following form:
$$D = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$
Where:
D = discriminant score
b's = discriminant coefficients or weights
X's = predictors or independent variables
The coefficients, or weights (b), are estimated so that the groups differ as
much as possible on the values of the discriminant function.
This occurs when the ratio of between-group sum of squares to within-group
sum of squares for the discriminant scores is at a maximum.
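For two groups, the weights that maximize this ratio are proportional to W^-1 (m1 - m2), where W is the pooled within-group sums-of-squares-and-cross-products matrix and m1, m2 are the group mean vectors (Fisher's solution). The sketch below applies this to two predictors only, income and household size, taken from the analysis sample tabulated in this document. The helper names are hypothetical, and the weights are not rescaled the way SPSS scales its canonical coefficients, so only the ratio is comparable.

```python
# (income in $000, household size) for the 30 analysis-sample families.
visited = [(50.2, 3), (70.3, 4), (62.9, 6), (48.5, 5), (52.7, 4), (75.0, 5),
           (46.2, 3), (57.0, 6), (64.1, 4), (68.1, 5), (73.4, 5), (71.9, 4),
           (56.2, 6), (49.3, 3), (62.0, 2)]
not_visited = [(32.1, 3), (36.2, 2), (43.2, 2), (50.4, 4), (44.1, 3), (38.3, 2),
               (55.0, 2), (46.1, 3), (35.0, 5), (37.3, 4), (41.8, 3), (57.0, 2),
               (33.4, 2), (37.5, 3), (41.3, 2)]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def sscp(rows, centre):
    """2x2 sums-of-squares-and-cross-products matrix about `centre`."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - centre[0], r[1] - centre[1])
        for i in range(2):
            for j in range(2):
                m[i][j] += d[i] * d[j]
    return m


m1, m2 = mean_vector(visited), mean_vector(not_visited)
s1, s2 = sscp(visited, m1), sscp(not_visited, m2)
W = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]  # pooled within-group SSCP

# Fisher's two-group weights: b proportional to W^-1 (m1 - m2).
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det, -W[0][1] / det], [-W[1][0] / det, W[0][0] / det]]
diff = [m1[0] - m2[0], m1[1] - m2[1]]
b = [W_inv[i][0] * diff[0] + W_inv[i][1] * diff[1] for i in range(2)]


def score(row):
    """Discriminant score without a constant term."""
    return b[0] * row[0] + b[1] * row[1]


def ss_ratio(g1, g2):
    """Between-group / within-group sum of squares of the discriminant scores."""
    d1, d2 = [score(r) for r in g1], [score(r) for r in g2]
    c1, c2 = sum(d1) / len(d1), sum(d2) / len(d2)
    grand = (sum(d1) + sum(d2)) / (len(d1) + len(d2))
    between = len(d1) * (c1 - grand) ** 2 + len(d2) * (c2 - grand) ** 2
    within = sum((d - c1) ** 2 for d in d1) + sum((d - c2) ** 2 for d in d2)
    return between / within


eigenvalue = ss_ratio(visited, not_visited)
wilks_lambda = 1.0 / (1.0 + eigenvalue)
print(round(eigenvalue, 3), round(wilks_lambda, 3))
```

The resulting Wilks' lambda can be compared with the value SPSS reports at step 2 of the stepwise run later in this document, where the same two predictors are in the model.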
Canonical correlation. Canonical correlation measures the
extent of association between the discriminant scores and the
groups. It is a measure of association between the single
discriminant function and the set of variables that define the
group membership.
Centroid. The centroid is the mean value of the discriminant
scores for a particular group. There are as many centroids as
there are groups, one for each group. The means
for a group on all the functions are the group centroids.
Classification matrix. Sometimes also called prediction matrix,
the classification matrix contains the number of correctly
classified and misclassified cases.
Statistics Associated with Discriminant Analysis
Discriminant function coefficients. The discriminant function
coefficients (unstandardized) are the multipliers of variables,
when the variables are in the original units of measurement. They
are used to compute the discriminant score.
Discriminant scores. The unstandardized coefficients are
multiplied by the values of the variables. These products are
summed and added to the constant term to obtain the
discriminant scores.
Eigenvalue. For each discriminant function, the Eigenvalue is
the ratio of between-group to within-group sums of squares.
Large Eigenvalues imply superior functions.
F values and their significance. These are calculated from a
one-way ANOVA, with the grouping variable serving as the
categorical independent variable. Each predictor, in turn,
serves as the metric dependent variable in the ANOVA.
Group means and group standard deviations. These are
computed for each predictor for each group.
Pooled within-group correlation matrix. The pooled within-
group correlation matrix is computed by averaging the separate
covariance matrices for all the groups.
Standardized discriminant function coefficients. The standardized
discriminant function coefficients are the discriminant function coefficients
used as the multipliers when the variables have been standardized
to a mean of 0 and a variance of 1. They are used to rank the
variables by their relative contribution.
Structure matrix (correlations). Also referred to as discriminant loadings,
the structure correlations represent the simple correlations between the
predictors and the discriminant function.
Total correlation matrix. If the cases are treated as if they were from a
single sample and the correlations computed, a total correlation matrix is
obtained.
Statistics Associated with Discriminant Analysis
Wilks' λ
Sometimes also called the U statistic, Wilks' λ
for each predictor is the ratio of the within-
group sum of squares to the total sum of
squares. Its value varies between 0 and 1.
Large values of λ (near 1) indicate that group
means do not seem to be different. Small
values of λ (near 0) indicate that the group
means seem to be different. A predictor with a small λ
adds predictive power to the discriminant function.
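As an illustration, the univariate Wilks' λ and the one-way ANOVA F ratio for a single predictor can be computed directly from the definition. The sketch below uses the annual family income column of the analysis sample tabulated in this document; the function name `wilks_lambda_and_f` is hypothetical.

```python
# Annual family income ($000) from the analysis sample, by resort-visit group.
income_visited = [50.2, 70.3, 62.9, 48.5, 52.7, 75.0, 46.2, 57.0,
                  64.1, 68.1, 73.4, 71.9, 56.2, 49.3, 62.0]
income_not_visited = [32.1, 36.2, 43.2, 50.4, 44.1, 38.3, 55.0, 46.1,
                      35.0, 37.3, 41.8, 57.0, 33.4, 37.5, 41.3]


def wilks_lambda_and_f(groups):
    """Return (Wilks' lambda, one-way ANOVA F) for one predictor."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Within-group sum of squares: deviations from each group's own mean.
    within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    # Total sum of squares: deviations from the grand mean.
    total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    between = total - within
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    lam = within / total
    f_ratio = (between / df_between) / (within / df_within)
    return lam, f_ratio


lam, f_ratio = wilks_lambda_and_f([income_visited, income_not_visited])
print(round(lam, 3), round(f_ratio, 3))
```

The two values can be checked against the income row of the Tests of Equality of Group Means output reported for this example.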
Conducting Discriminant Analysis
1. Formulate the Problem
2. Estimate the Discriminant Function Coefficients
3. Determine the Significance of the Discriminant Function
4. Interpret the Results
5. Assess Validity of Discriminant Analysis
Formulate the Problem
Identify the objectives, the criterion variable, and the
independent variables.
The criterion variable must consist of two or more mutually
exclusive and collectively exhaustive categories.
The predictor variables should be selected based on a
theoretical model or previous research, or the experience of the
researcher.
One part of the sample, called the estimation or analysis
sample, is used for estimation of the discriminant function.
The other part, called the holdout or validation sample, is
reserved for validating the discriminant function.
Often the distribution of the number of cases in the analysis and
validation samples follows the distribution in the total sample.
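One way to keep the group proportions the same in both parts is a stratified random split. The sketch below is only an illustration: `stratified_split` is a hypothetical helper, and the random allocation it produces is not the split used for the example in this document, although the group sizes (21 cases per group, 30 analysis and 12 holdout cases in total) mirror it.

```python
import random


def stratified_split(labels, holdout_fraction, seed=0):
    """Split case indices into analysis and holdout sets, group by group."""
    rng = random.Random(seed)
    by_group = {}
    for index, label in enumerate(labels):
        by_group.setdefault(label, []).append(index)
    analysis, holdout = [], []
    for indices in by_group.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_fraction)
        holdout.extend(indices[:n_holdout])
        analysis.extend(indices[n_holdout:])
    return sorted(analysis), sorted(holdout)


labels = [1] * 21 + [2] * 21  # 21 visitors and 21 non-visitors
analysis_idx, holdout_idx = stratified_split(labels, holdout_fraction=2 / 7)
print(len(analysis_idx), len(holdout_idx))
```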
Example
The objective is to determine the salient characteristics of
families that visited a vacation resort during the
last two years.
Resort visit is coded 1 = visited, 2 = not visited.
Information on Resort Visits: Analysis Sample (Table 18.2)

No.  Resort  Annual    Attitude  Importance   Household  Age of     Amount Spent
     Visit   Family    Toward    Attached to  Size       Head of    on Family
             Income    Travel    Family                  Household  Vacation
             ($000)              Vacation
 1   1       50.2      5         8            3          43         M (2)
 2   1       70.3      6         7            4          61         H (3)
 3   1       62.9      7         5            6          52         H (3)
 4   1       48.5      7         5            5          36         L (1)
 5   1       52.7      6         6            4          55         H (3)
 6   1       75.0      8         7            5          68         H (3)
 7   1       46.2      5         3            3          62         M (2)
 8   1       57.0      2         4            6          51         M (2)
 9   1       64.1      7         5            4          57         H (3)
10   1       68.1      7         6            5          45         H (3)
11   1       73.4      6         7            5          44         H (3)
12   1       71.9      5         8            4          64         H (3)
13   1       56.2      1         8            6          54         M (2)
14   1       49.3      4         2            3          56         H (3)
15   1       62.0      5         6            2          58         H (3)
16   2       32.1      5         4            3          58         L (1)
17   2       36.2      4         3            2          55         L (1)
18   2       43.2      2         5            2          57         M (2)
19   2       50.4      5         2            4          37         M (2)
20   2       44.1      6         6            3          42         M (2)
21   2       38.3      6         6            2          45         L (1)
22   2       55.0      1         2            2          57         M (2)
23   2       46.1      3         5            3          51         L (1)
24   2       35.0      6         4            5          64         L (1)
25   2       37.3      2         7            4          54         L (1)
26   2       41.8      5         1            3          56         M (2)
27   2       57.0      8         3            2          36         M (2)
28   2       33.4      6         8            2          50         L (1)
29   2       37.5      3         2            3          48         L (1)
30   2       41.3      3         3            2          42         L (1)
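Information on Resort Visits: Holdout Sample

No.  Resort  Annual    Attitude  Importance   Household  Age of     Amount Spent
     Visit   Family    Toward    Attached to  Size       Head of    on Family
             Income    Travel    Family                  Household  Vacation
             ($000)              Vacation
 1   1       50.8      4         7            3          45         M (2)
 2   1       63.6      7         4            7          55         H (3)
 3   1       54.0      6         7            4          58         M (2)
 4   1       45.0      5         4            3          60         M (2)
 5   1       68.0      6         6            6          46         H (3)
 6   1       62.1      5         6            3          56         H (3)
 7   2       35.0      4         3            4          54         L (1)
 8   2       49.6      5         3            5          39         L (1)
 9   2       39.4      6         5            3          44         H (3)
10   2       37.0      2         6            5          51         L (1)
11   2       54.5      7         3            3          37         M (2)
12   2       38.2      2         2            3          49         L (1)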
Estimate the Discriminant Function Coefficients
The direct method involves estimating the discriminant
function so that all the predictors are included
simultaneously.
In stepwise discriminant analysis, the predictor
variables are entered sequentially, based on their ability
to discriminate among groups.
Direct method

Box's M Test Results
Box's M        28.744
F   Approx.    1.048
    df1        21
    df2        2.884E3
    Sig.       .400
Tests null hypothesis of equal population covariance matrices.
Group Statistics

Resort visit  Variable                                Mean     Std. Deviation
1             annual family income                    60.5200  9.83065
              attitude towards travel                 5.4000   1.91982
              importance attached to family vacation  5.8000   1.82052
              household size                          4.3333   1.23443
              age of head of household                53.7333  8.77062
              amount spent on family vacation         2.6000   .63246
2             annual family income                    41.9133  7.55115
              attitude towards travel                 4.3333   1.95180
              importance attached to family vacation  4.0667   2.05171
              household size                          2.8000   .94112
              age of head of household                50.1333  8.27101
              amount spent on family vacation         1.4000   .50709
Tests of Equality of Group Means

Variable                                Wilks' Lambda  F       df1  df2  Sig.
annual family income                    .453           33.796  1    28   .000
attitude towards travel                 .925           2.277   1    28   .143
importance attached to family vacation  .824           5.990   1    28   .021
household size                          .657           14.636  1    28   .001
age of head of household                .954           1.338   1    28   .257
amount spent on family vacation         .460           32.870  1    28   .000
Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         2.310 (a)   100.0          100.0         .835
a. First 1 canonical discriminant functions were used in the analysis.
Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .302           29.925      6   .000
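Overall Model Fit
Interpretation of overall fit
Squaring the canonical correlation gives the proportion of
variation in the dependent variable that is
accounted for by the model. Here .835 squared is about .70,
so the model accounts for roughly 70% of the variation.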
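With a single discriminant function, the eigenvalue, the canonical correlation and Wilks' lambda are tied together by simple identities, so any one of them gives the other two. The sketch below uses the eigenvalue of 2.310 from the SPSS output above; these identities hold for the one-function case only.

```python
import math

eigenvalue = 2.310  # from the Eigenvalues table of the direct-method run

# Canonical correlation: share of the between-group sum of squares in the total.
canonical_correlation = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Wilks' lambda for one function: share of the within-group sum of squares.
wilks_lambda = 1.0 / (1.0 + eigenvalue)

# Proportion of variation in group membership accounted for by the model.
explained = canonical_correlation ** 2

print(round(canonical_correlation, 3), round(wilks_lambda, 3), round(explained, 3))
```

Note that the explained proportion and Wilks' lambda sum to 1 in this case.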
Standardized Canonical Discriminant Function Coefficients

Variable                                Function 1
annual family income                    .130
attitude towards travel                 -.006
importance attached to family vacation  .339
household size                          .560
age of head of household                .028
amount spent on family vacation         .748
Canonical Discriminant Function Coefficients (unstandardized)

Variable                                Function 1
attitude towards travel                 -.003
importance attached to family vacation  .175
household size                          .510
age of head of household                .003
amount spent on family vacation         1.304
(Constant)                              -6.212
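The two sets of coefficients are linked: a standardized coefficient equals the unstandardized coefficient multiplied by the pooled within-group standard deviation of that predictor. The sketch below checks this for three predictors, using the group standard deviations and coefficients reported in this document; `pooled_sd` is a hypothetical helper name, and small discrepancies are expected because the printed coefficients are rounded to three decimals.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group standard deviation for two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


# name: (unstandardized coefficient, SD in group 1, SD in group 2)
reported = {
    "importance attached to family vacation": (0.175, 1.82052, 2.05171),
    "household size": (0.510, 1.23443, 0.94112),
    "amount spent on family vacation": (1.304, 0.63246, 0.50709),
}

standardized = {
    name: b * pooled_sd(sd1, 15, sd2, 15)
    for name, (b, sd1, sd2) in reported.items()
}

for name, value in standardized.items():
    print(f"{name}: {value:.3f}")
```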
Structure Matrix

Variable                                Function 1
annual family income                    .723
amount spent on family vacation         .713
household size                          .476
importance attached to family vacation  .304
attitude towards travel                 .188
age of head of household                .144
Discriminant Loadings
The discriminant loadings in the structure matrix give the simple correlation of each
independent variable with the discriminant score, that is, the variance each predictor shares with the function.
Functions at Group Centroids

Resort visit  Function 1
1             1.468
2             -1.468
Unstandardized canonical discriminant functions evaluated at group means
Group means of discriminant functions
Group centroids measure the relative position of each group on the discriminant
functions, scaled so that the overall average is zero. They are used to find the cutting score.
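A minimal sketch of how a cutting score could be derived from the two centroids follows. It assumes the common textbook convention, which this document does not spell out: the cutting score is the midpoint of the centroids when the groups are the same size, and a size-weighted value otherwise. The function names are hypothetical.

```python
def cutting_score(centroid_a, n_a, centroid_b, n_b):
    """Size-weighted cutting score; equals the midpoint when n_a == n_b."""
    return (n_a * centroid_b + n_b * centroid_a) / (n_a + n_b)


def classify(score, cut, high_group=1, low_group=2):
    """Assign a case to the group whose centroid lies on its side of the cut."""
    return high_group if score > cut else low_group


# Centroids and group sizes reported for the resort example.
cut = cutting_score(1.468, 15, -1.468, 15)
print(cut, classify(0.9, cut), classify(-0.3, cut))
```

With equal group sizes and centroids of 1.468 and -1.468 the cutting score is zero, so a positive discriminant score points to group 1 (visited) and a negative one to group 2.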
Classification Statistics
Prior Probabilities for Groups

                     Cases Used in Analysis
Resort visit  Prior  Unweighted  Weighted
1             .500   15          15.000
2             .500   15          15.000
Total         1.000  30          30.000
Classification Function Coefficients

                                        Resort visit
Variable                                1        2
annual family income                    .801     .757
attitude towards travel                 1.606    1.614
importance attached to family vacation  .796     .282
household size                          3.044    1.545
age of head of household                .869     .860
amount spent on family vacation         -2.692   -6.522
(Constant)                              -58.014  -39.771
Fisher's linear discriminant functions
Once the discriminant functions are determined and the
groups are differentiated, the utility of these
functions can be examined via their ability to
correctly classify each data point into its a priori
group. Classification functions are derived from
the linear discriminant functions to achieve this
purpose. Different classification functions are
used, and equations exist that are best suited for
equal or unequal samples in each group.
Fisher's Linear Discriminant Function
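In practice a case is classified by evaluating one classification function per group and choosing the group with the largest value. The sketch below uses the classification function coefficients reported in this document; the dictionary keys and function names are hypothetical, and the two test cases are rows 6 and 30 of the analysis sample, with amount spent coded L = 1, M = 2, H = 3.

```python
COEFFICIENTS = {
    1: {"income": 0.801, "attitude": 1.606, "importance": 0.796,
        "household_size": 3.044, "age": 0.869, "amount_spent": -2.692,
        "constant": -58.014},
    2: {"income": 0.757, "attitude": 1.614, "importance": 0.282,
        "household_size": 1.545, "age": 0.860, "amount_spent": -6.522,
        "constant": -39.771},
}


def classification_scores(case):
    """Evaluate each group's classification function for one case."""
    scores = {}
    for group, coef in COEFFICIENTS.items():
        total = coef["constant"]
        for name, value in case.items():
            total += coef[name] * value
        scores[group] = total
    return scores


def classify(case):
    """Assign the case to the group with the highest classification score."""
    scores = classification_scores(case)
    return max(scores, key=scores.get)


case_6 = {"income": 75.0, "attitude": 8, "importance": 7,
          "household_size": 5, "age": 68, "amount_spent": 3}
case_30 = {"income": 41.3, "attitude": 3, "importance": 3,
           "household_size": 2, "age": 42, "amount_spent": 1}
print(classify(case_6), classify(case_30))
```

With equal prior probabilities, as in this example, no adjustment for priors is needed; unequal priors would add a term to each function, which is not shown here.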
Classification Results (b, c)

                                       Predicted Group Membership
                        Resort visit   1        2        Total
Original             Count   1         13       2        15
                             2         0        15       15
                     %       1         86.7     13.3     100.0
                             2         .0       100.0    100.0
Cross-validated (a)  Count   1         11       4        15
                             2         2        13       15
                     %       1         73.3     26.7     100.0
                             2         13.3     86.7     100.0
a. Cross validation is done only for those cases in the analysis.
In cross validation, each case is classified by the functions
derived from all cases other than that case.
b. 93.3% of original grouped cases correctly classified.
c. 80.0% of cross-validated grouped cases correctly
classified.
Determine the Significance of Discriminant Function
The null hypothesis that, in the population, the means of all
discriminant functions in all groups are equal can be statistically
tested.
In SPSS this test is based on Wilks' λ. If several functions are
tested simultaneously (as in the case of multiple discriminant
analysis), the Wilks' λ statistic is the product of the univariate λ for
each function. The significance level is estimated based on a chi-
square transformation of the statistic.
If the null hypothesis is rejected, indicating significant
discrimination, one can proceed to interpret the results.
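The chi-square transformation is not written out in this document. A commonly used form is Bartlett's approximation, chi-square = -[N - 1 - (k + G)/2] ln(λ) with k(G - 1) degrees of freedom, and the sketch below assumes that form. The function name is hypothetical; the inputs are the eigenvalue, sample size, and numbers of predictors and groups from the direct-method run reported above.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda (assumed form)."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    chi_square = -multiplier * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi_square, df


# One function with eigenvalue 2.310, so lambda = 1 / (1 + eigenvalue).
wilks = 1.0 / (1.0 + 2.310)
chi_square, df = bartlett_chi_square(wilks, n_cases=30, n_predictors=6, n_groups=2)
print(round(chi_square, 2), df)
```

The result can be compared with the chi-square and df shown in the Wilks' Lambda table of the SPSS output.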
Interpret the Results
The interpretation of the discriminant weights, or coefficients, is similar to that in
multiple regression analysis.
Given the multicollinearity in the predictor variables, there is no unambiguous
measure of the relative importance of the predictors in discriminating between
the groups.
With this caveat in mind, we can obtain some idea of the relative importance of
the variables by examining the absolute magnitude of the standardized
discriminant function coefficients.
Some idea of the relative importance of the predictors can also be obtained by
examining the structure correlations, also called canonical loadings or
discriminant loadings. These simple correlations between each predictor and the
discriminant function represent the variance that the predictor shares with the
function.
Another aid to interpreting discriminant analysis results is to develop a
characteristic profile for each group by describing each group in terms of the
group means for the predictor variables.
Assess Validity of Discriminant Analysis
Many computer programs, such as SPSS, offer a leave-one-out cross-
validation option.
The discriminant weights, estimated by using the analysis sample, are
multiplied by the values of the predictor variables in the holdout
sample to generate discriminant scores for the cases in the holdout
sample. The cases are then assigned to groups based on their
discriminant scores and an appropriate decision rule. The hit ratio, or
the percentage of cases correctly classified, can then be determined
by summing the diagonal elements and dividing by the total number
of cases.
It is helpful to compare the percentage of cases correctly classified
by discriminant analysis to the percentage that would be obtained
by chance. Classification accuracy achieved by discriminant analysis
should be at least 25% greater than that obtained by chance.
Maximum chance criterion: based on the size of the largest
group. With equal group proportions in the two-group case, it is 0.5.
Proportional chance criterion (assumption: the costs of
misclassification are equal): calculate
$$C_p = p^2 + (1 - p)^2$$
where p is the proportion of cases in the total sample that belong to the first group.
Note: The hit ratio (classification accuracy level) should exceed the
selected comparison standard by 25%. For example, for the proportional
chance criterion, classification accuracy should cross the
threshold of 1.25 × C_p.
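The hit ratio and both chance criteria are simple to compute from a classification matrix. The sketch below uses the original and cross-validated classification counts reported for the resort example; the function names are hypothetical, and the proportional chance criterion is written as a sum of squared group proportions, which equals p² + (1 - p)² for two groups.

```python
def hit_ratio(matrix):
    """Share of cases on the diagonal of a classification matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def maximum_chance(group_sizes):
    """Proportion of cases in the largest group."""
    return max(group_sizes) / sum(group_sizes)


def proportional_chance(group_sizes):
    """Sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


original = [[13, 2], [0, 15]]          # rows: actual group, columns: predicted
cross_validated = [[11, 4], [2, 13]]
sizes = [15, 15]

threshold = 1.25 * proportional_chance(sizes)
print(round(hit_ratio(original), 3), hit_ratio(cross_validated),
      maximum_chance(sizes), threshold)
```

Both hit ratios exceed the 1.25 × C_p threshold in this example.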

New Observation
Mahalanobis distances (obtained from the Method
dialogue box) are used to analyse cases: the Mahalanobis
distance is the distance between a case and the centroid
of each group of the dependent variable.
A new case can therefore be compared with an
existing set of cases. It has one distance
for each group and can be classified as
belonging to the group for which its distance is
smallest. Mahalanobis distance is measured in
standard deviation units from the centroid, so a case that is more
than 1.96 Mahalanobis distance units from a
centroid has a less than 5% chance of belonging to that
group.
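A minimal sketch of nearest-centroid classification by Mahalanobis distance follows. It works in the space of two predictors only (income and household size from the analysis sample in this document) and uses the pooled within-group covariance matrix; the helper names are hypothetical, and SPSS may work in the discriminant-score space rather than directly on the predictors.

```python
# (income in $000, household size) for the 30 analysis-sample families.
visited = [(50.2, 3), (70.3, 4), (62.9, 6), (48.5, 5), (52.7, 4), (75.0, 5),
           (46.2, 3), (57.0, 6), (64.1, 4), (68.1, 5), (73.4, 5), (71.9, 4),
           (56.2, 6), (49.3, 3), (62.0, 2)]
not_visited = [(32.1, 3), (36.2, 2), (43.2, 2), (50.4, 4), (44.1, 3), (38.3, 2),
               (55.0, 2), (46.1, 3), (35.0, 5), (37.3, 4), (41.8, 3), (57.0, 2),
               (33.4, 2), (37.5, 3), (41.3, 2)]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(groups):
    """Pooled within-group 2x2 covariance matrix (divisor N - number of groups)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = mean_vector(g)
        for r in g:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def mahalanobis_sq(x, centre, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    d0, d1 = x[0] - centre[0], x[1] - centre[1]
    return (cov[1][1] * d0 * d0 - 2 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det


centroids = {1: mean_vector(visited), 2: mean_vector(not_visited)}
cov = pooled_covariance([visited, not_visited])


def classify(case):
    """Assign the case to the group with the nearest centroid."""
    return min(centroids, key=lambda g: mahalanobis_sq(case, centroids[g], cov))


between = mahalanobis_sq(centroids[1], centroids[2], cov)
print(round(between, 3), classify((68.0, 6)), classify((35.0, 4)))
```

The squared distance between the two group centroids computed this way can be compared with the Min. D Squared value SPSS reports when income and household size are both in the model.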
Method: Stepwise
Stepwise Discriminant Analysis
Stepwise discriminant analysis is analogous to stepwise multiple
regression in that the predictors are entered sequentially based
on their ability to discriminate between the groups.
An F ratio is calculated for each predictor by conducting a
univariate analysis of variance in which the groups are treated
as the categorical variable and the predictor as the criterion
variable.
The predictor with the highest F ratio is the first to be selected
for inclusion in the discriminant function, if it meets certain
significance and tolerance criteria.
A second predictor is added based on the highest adjusted or
partial F ratio, taking into account the predictor already
selected.
Each predictor selected is tested for retention based on its
association with other predictors selected.
The process of selection and retention is continued until all
predictors meeting the significance criteria for inclusion and
retention have been entered in the discriminant function.
The selection of the stepwise procedure is based on the
optimizing criterion adopted. The Mahalanobis procedure is
based on maximizing a generalized measure of the distance
between the two closest groups.
The order in which the variables were selected also indicates
their importance in discriminating between the groups.
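The first step of the procedure can be sketched with the univariate F ratios alone. The values below are the F ratios reported for the resort example in this document, and 3.84 is the minimum F to enter from its SPSS output; the function name is hypothetical. Later steps require partial F ratios that are recomputed after each entry, which this sketch does not attempt.

```python
F_TO_ENTER = 3.84  # minimum F to enter, as set in the SPSS run for the example

univariate_f = {
    "annual family income": 33.796,
    "attitude towards travel": 2.277,
    "importance attached to family vacation": 5.990,
    "household size": 14.636,
    "age of head of household": 1.338,
    "amount spent on family vacation": 32.870,
}


def first_predictor(f_values, f_to_enter):
    """Pick the predictor with the highest F, if it clears the entry threshold."""
    name, f_value = max(f_values.items(), key=lambda item: item[1])
    return name if f_value >= f_to_enter else None


# Predictors that could enter at all on the univariate criterion.
candidates = [name for name, f in univariate_f.items() if f >= F_TO_ENTER]

print(first_predictor(univariate_f, F_TO_ENTER), len(candidates))
```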
Stepwise Discriminant Analysis
Variables Entered/Removed (a, b, c, d)

                                                               Min. D Squared
Step  Entered                          Removed               Statistic  Between Groups  Exact F  df1  df2     Sig.
1     annual family income                                   4.506      1.00 and 2.00   33.796   1    28.000  3.027E-6
2     household size                                         5.978      1.00 and 2.00   21.616   2    27.000  2.484E-6
3     amount spent on family vacation                        7.681      1.00 and 2.00   17.832   3    26.000  1.711E-6
4                                      annual family income  7.452      1.00 and 2.00   26.947   2    27.000  3.686E-7
At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is
entered.
a. Maximum number of steps is 12.
b. Minimum partial F to enter is 3.84.
c. Maximum partial F to remove is 2.71.
d. F level, tolerance, or VIN insufficient for further computation.
Variables in the Analysis

Step  Variable                         Tolerance  F to Remove  Min. D Squared  Between Groups
1     annual family income             1.000      33.796
2     annual family income             .992       19.123       1.952           1.00 and 2.00
      household size                   .992       4.823        4.506           1.00 and 2.00
3     annual family income             .489       .533         7.452           1.00 and 2.00
      household size                   .901       6.955        5.272           1.00 and 2.00
      amount spent on family vacation  .480       4.561        5.978           1.00 and 2.00
4     household size                   .975       10.211       4.383           1.00 and 2.00
      amount spent on family vacation  .975       26.124       1.952           1.00 and 2.00
Variables Not in the Analysis

Step  Variable                                Tolerance  Min. Tolerance  F to Enter  Min. D Squared  Between Groups
0     annual family income                    1.000      1.000           33.796      4.506           1.00 and 2.00
      attitude towards travel                 1.000      1.000           2.277       .304            1.00 and 2.00
      importance attached to family vacation  1.000      1.000           5.990       .799            1.00 and 2.00
      household size                          1.000      1.000           14.636      1.952           1.00 and 2.00
      age of head of household                1.000      1.000           1.338       .178            1.00 and 2.00
      amount spent on family vacation         1.000      1.000           32.870      4.383           1.00 and 2.00
1     attitude towards travel                 .961       .961            .059        4.524           1.00 and 2.00
      importance attached to family vacation  .992       .992            1.617       5.000           1.00 and 2.00
      household size                          .992       .992            4.823       5.978           1.00 and 2.00
      age of head of household                1.000      1.000           .672        4.711           1.00 and 2.00
      amount spent on family vacation         .529       .529            2.511       5.272           1.00 and 2.00
2     importance attached to family vacation  .988       .985            1.054       6.371           1.00 and 2.00
      age of head of household                .998       .990            .680        6.232           1.00 and 2.00
      amount spent on family vacation         .480       .480            4.561       7.681           1.00 and 2.00
3     importance attached to family vacation  .931       .453            2.050       8.617           1.00 and 2.00
      age of head of household                .930       .448            .063        7.710           1.00 and 2.00
4     annual family income                    .489       .480            .533        7.681           1.00 and 2.00
      attitude towards travel                 .957       .933            .020        7.461           1.00 and 2.00
Wilks' Lambda

      Number of                                 Exact F
Step  Variables  Lambda  df1  df2  df3  Statistic  df1  df2     Sig.
1     1          .453    1    1    28   33.796     1    28.000  .000
2     2          .384    2    1    28   21.616     2    27.000  .000
3     3          .327    3    1    28   17.832     3    26.000  .000
4     2          .334    2    1    28   26.947     2    27.000  .000
Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         1.996 (a)   100.0          100.0         .816
a. First 1 canonical discriminant functions were used in the analysis.
Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .334           29.627      2   .000
Variable                         Function 1
household size                   .650
amount spent on family vacation  .870
Multiple Discriminant Analysis (MDA)
Households are classified into three categories of
vacation spending: high, medium and low.
The question of interest is whether the
households that spend high, medium or low
amounts on their vacations can be
differentiated in terms of annual income,
attitude towards travel, importance attached
to family vacations, household size and age of
the head of household.
Group Statistics

Amount spent on
family vacation  Variable                                Mean     Std. Deviation
1                Resort visit                            1.9000   .31623
                 annual family income                    38.5700  5.29718
                 importance attached to family vacation  4.7000   1.88856
                 age of head of household                50.3000  8.09732
2                Resort visit                            1.6000   .51640
                 importance attached to family vacation  4.2000   2.48551
3                Resort visit                            1.0000   .00000
                 importance attached to family vacation  5.9000   1.66333
Total            Resort visit                            1.5000   .50855
                 importance attached to family vacation  4.9333   2.09981
Tests of Equality of Group Means

Variable                                Wilks' Lambda  F       df1  df2  Sig.
Resort visit                            .440           17.182  2    27   .000
annual family income                    .262           37.997  2    27   .000
attitude towards travel                 .788           3.634   2    27   .040
importance attached to family vacation  .881           1.830   2    27   .180
household size                          .874           1.944   2    27   .163
age of head of household                .882           1.804   2    27   .184
All-Groups Scattergram
[Figure: cases from groups 1, 2 and 3 plotted by discriminant score. Across: Function 1; Down: Function 2. * indicates a group centroid.]
Territorial Map
[Figure: territorial map showing the regions assigned to groups 1, 2 and 3. Across: Function 1; Down: Function 2. * indicates a group centroid.]
