
DISCRIMINANT ANALYSIS

Dealing with associative data, the marketing researcher may encounter cases where the criterion variable is CATEGORICAL while the predictor variables involve interval-scaled data. For example, one may wish to predict whether sales potential in a given marketing territory will be good or bad based on certain measurements regarding the territory's personal disposable income, population density, number of retail outlets, etc.

What is Discriminant Analysis
The concept of partitioning sums of squares
Assumptions
Stepwise discriminant analysis with Wilks' lambda
Testing the goodness-of-fit of the model
Determining the significance of predictor variables
A 2-group discriminant problem
A multi-group discriminant problem

Discriminant Model
Z = a + W1X1 + W2X2 + ... + WkXk

Z = discriminant score, a number used to predict group membership of a case
a = discriminant constant
Wk = discriminant weight or coefficient, a measure of the extent to which variable Xk discriminates among the groups of the DV
Xk = an IV or predictor variable; can be metric or nonmetric
DA uses OLS to estimate the values of the parameters a and Wk that minimize the Within Group SS.
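To make the model concrete, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis on hypothetical territory data (variable meanings and all numbers are invented for illustration; sklearn estimates the function from within- and between-group covariances rather than an explicit OLS routine, but it yields an equivalent linear composite Z up to scaling):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical territories: X1 = personal disposable income, X2 = population
# density, X3 = number of retail outlets; y = 1 (good) / 0 (bad) potential.
rng = np.random.default_rng(0)
X_bad = rng.normal([20, 150, 30], [4, 30, 8], size=(40, 3))
X_good = rng.normal([28, 180, 45], [4, 30, 8], size=(40, 3))
X = np.vstack([X_bad, X_good])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)

print("a  =", lda.intercept_)                 # discriminant constant
print("Wk =", lda.coef_)                      # discriminant weights
print("Z  =", lda.decision_function(X[:3]))   # discriminant scores
print("groups:", lda.predict(X[:3]))          # predicted group membership
```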

Some more applications (Two Classes)


How do consumers who are loyal to my brand differ in their demographic profiles from those who are not loyal?
How do respondents who show high interest differ in their readership levels of certain magazines from those who show low interest?
How do homeowners who select a variable-rate mortgage differ in their demographic profiles, mortgage shopping behavior and attitudes, and preferences for mortgage features from homeowners selecting a conventional fixed-rate mortgage?
How do doctors, lawyers, and bankers differ in terms of their preference ratings of eight different luxury automobiles?
Are significant demographic differences observed among purchasers of Ceat, Goodyear, MRF, Dunlop, and Bridgestone tires?
Do long-distance, local, and quasi-local/long-distance geographically mobile users differ in individual household demographic and economic characteristics?

Dependency Technique
Dependent variable is nonmetric
Independent variables can be metric and/or nonmetric
Used to predict or explain a nonmetric dependent variable with two or more a priori categories

Assumptions
Cases should be independent.
Predictors should have a multivariate normal distribution.
Within-group variance-covariance matrices should be equal.
Group membership is assumed to be mutually exclusive and collectively exhaustive.
DA is most effective when group membership is a truly categorical variable. If group membership is based on values of a continuous variable (e.g., high IQ vs. low IQ), consider using linear regression instead.
Absence of outliers.
The sample is large enough (n > 30) for each predictor.

Predicting a Nonmetric Dependent Variable


Two approaches:

Logistic Regression: with a dummy-coded DV; limited to a binary nonmetric dependent variable; makes relatively few restrictive assumptions.

Discriminant Analysis: with a nonmetric dependent variable with 2 or more groups; not limited to a binary nonmetric dependent variable; makes several restrictive assumptions.

Partition of SS in DA

In Linear Regression, the Total SS, $\sum (Y - \bar{Y})^2$, is partitioned into Regression SS plus Residual SS:

$\sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2$

Goal: estimate parameters that minimize the Residual SS.

In Discriminant Analysis

The Total SS, $\sum_i (Z_i - \bar{Z})^2$, is partitioned into Between-Groups SS plus Within-Groups SS:

$\sum_i (Z_i - \bar{Z})^2 = \sum_j n_j (\bar{Z}_j - \bar{Z})^2 + \sum_{ij} (Z_{ij} - \bar{Z}_j)^2$

i = an individual case, j = group j
$Z_i$ = individual discriminant score
$\bar{Z}$ = grand mean of the discriminant scores
$\bar{Z}_j$ = mean discriminant score for group j

Goal: estimate parameters that minimize the Within-Groups SS.
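The identity TSS = BSS + WSS can be verified numerically. A small sketch in plain NumPy, applied here to the discriminant scores of the worked cereal example that appears later in this deck:

```python
import numpy as np

def partition_ss(z, g):
    """Split the total SS of discriminant scores z into between-groups
    and within-groups components, given group labels g."""
    z, g = np.asarray(z, float), np.asarray(g)
    tss = ((z - z.mean()) ** 2).sum()
    bss = sum((g == j).sum() * (z[g == j].mean() - z.mean()) ** 2
              for j in np.unique(g))
    wss = sum(((z[g == j] - z[g == j].mean()) ** 2).sum()
              for j in np.unique(g))
    return tss, bss, wss           # tss == bss + wss, up to rounding

z = [0.148, 0.809, 0.735, 1.250, 1.176,    # dislikers
     1.691, 2.353, 2.279, 2.794, 2.721]    # likers
print(partition_ss(z, [0] * 5 + [1] * 5))  # roughly (7.50, 5.96, 1.54)
```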

Steps in DA Process
1. Specify the dependent and the predictor variables
2. Test the model's assumptions a priori
3. Determine the method for selection and the criteria for entering the predictor variables into the model
4. Estimate the parameters of the model
5. Determine the goodness-of-fit of the model and examine the residuals
6. Determine the significance of the predictors
7. Test the assumptions ex post facto
8. Validate the results

Are the Var-Cov Matrices Homogeneous?


Box's M test

H0: the variance/covariance matrices of the two groups are the same in the population.

Box's M = 0.361, approximate F = 0.116, p = 0.951


Conclusion: the null hypothesis of homogeneity of the variance/covariance matrices in the population cannot be rejected.
Log Determinants

TYPE_SEN               Rank   Log Determinant
.00                    2      3.076
1.00                   2      2.988
Pooled within-groups   2      3.040

The ranks and natural logarithms of the determinants printed are those of the group covariance matrices.

Test Results

Box's M   F Approx.   df1   df2       Sig.
.361      .116        3     1476249   .951

Tests the null hypothesis of equal population covariance matrices.
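Neither SciPy nor scikit-learn ships Box's M, but the statistic is short enough to sketch from its definition. The version below uses the chi-square approximation, whereas SPSS reports an F approximation, so the numbers can differ slightly:

```python
import numpy as np
from scipy.stats import chi2

def box_m(groups):
    """Box's M test for equal group covariance matrices.
    `groups` is a list of (n_j, p) data arrays, one per group."""
    p = groups[0].shape[1]
    ns = np.array([grp.shape[0] for grp in groups])
    g = len(groups)
    covs = [np.cov(grp, rowvar=False) for grp in groups]        # S_j
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (ns.sum() - g)
    M = (ns.sum() - g) * np.log(np.linalg.det(pooled)) \
        - sum((n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))
    # Box's (1949) chi-square approximation
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))) * \
        (np.sum(1.0 / (ns - 1)) - 1.0 / (ns.sum() - g))
    df = p * (p + 1) * (g - 1) / 2
    return M, M * (1 - c), chi2.sf(M * (1 - c), df)
```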

How will predictors be entered in DA?

SPSS offers two methods for building a discriminant model: entering all the variables simultaneously, or the stepwise method using the Wilks' lambda criterion.

$\Lambda$ = WSS / TSS

Step 1: Compute four one-way ANOVAs with TYPE_SEN as the IV and each of the four predictor variables as the DVs.
Step 2: Identify the predictor variable that has the lowest significant $\Lambda$ and enter it into the model (P_in = 0.05).

Variables in the Analysis

Step   Variable   Tolerance   Sig. of F to Remove   Wilks' Lambda
1      SER_INDX   1.000       .000
2      SER_INDX   .864        .000                  .983
2      DR_SCORE   .864        .019                  .832

Step 3: Estimate the parameters of the resulting discriminant model

Step 4: Of the variables not in the model, select the predictor that has the lowest significant $\Lambda$ and enter it into the model. Determine if the addition of the variable was significant, then check whether the predictor(s) previously entered are still significant (P_out default = 0.10).

Step 5: Repeat Step 4 until all the predictor variables are entered into the model or until none of the variables outside the model has a significant $\Lambda$.
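Step 1 of this procedure amounts to computing Λ = WSS/TSS for each predictor on its own (equivalent to a one-way ANOVA) and entering the predictor with the smallest significant Λ. A sketch, assuming X is a cases-by-predictors array and y holds the group codes:

```python
import numpy as np
from scipy.stats import f_oneway

def wilks_per_predictor(X, y):
    """One-variable Wilks' lambda (WSS/TSS) and ANOVA p-value per predictor."""
    results = []
    for k in range(X.shape[1]):
        x = X[:, k]
        groups = [x[y == j] for j in np.unique(y)]
        wss = sum(((grp - grp.mean()) ** 2).sum() for grp in groups)
        tss = ((x - x.mean()) ** 2).sum()
        results.append((k, wss / tss, f_oneway(*groups).pvalue))
    return results

# Enter the predictor with the smallest lambda whose p-value beats P_in:
# best = min((r for r in wilks_per_predictor(X, y) if r[2] < 0.05),
#            key=lambda r: r[1])
```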

Significance of change by entering a variable

$F = \dfrac{1 - \Lambda_k/\Lambda_{k-1}}{\Lambda_k/\Lambda_{k-1}} \times \dfrac{N - g - 1}{g - 1}$

where:
$\Lambda$ = WSS / TSS of the function ($\Lambda_k$ is its value after entering the k-th variable)
N = total sample size
g = number of DV groups
df = (g − 1) and (N − g − 1)

Measurement of Goodness of Fit of the Model
Eigenvalue (λ)
The canonical correlation (η)
Wilks' lambda (Λ)
Classification table hit ratio
t-test of the hit ratio
Maximum chance criterion
Proportional chance criterion
Press's Q statistic
Casewise plot of the predictions

What is an Eigenvalue in DA?

The eigenvalue of a discriminant function is the ratio of between-groups to within-groups SS:

$\lambda = \text{BSS}/\text{WSS} = \sum_j n_j(\bar{Z}_j - \bar{Z})^2 \big/ \sum_{ij}(Z_{ij} - \bar{Z}_j)^2$

If λ = 0.00, the model has no discriminatory power, since BSS = 0.0.
The larger the value of λ, the greater the discriminatory power of the model.

The canonical correlation: $\eta = \sqrt{\lambda/(1+\lambda)} = \sqrt{\text{BSS}/\text{TSS}}$ = the correlation of the predictor(s) with the discriminant scores.
$\eta^2$ = coefficient of determination; $1 - \eta^2$ = coefficient of non-determination.

The Wilks' Λ for the discriminant model: $\Lambda = 1 - \eta^2 = 1/(1+\lambda) = \text{WSS}/\text{TSS}$

$\chi^2 = -\left[(n - 1) - \tfrac{1}{2}(m + k)\right]\ln\Lambda$, with df = m(k − 1), where m = number of predictors and k = number of groups.
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .305(a)      100.0           100.0          .483

a. First 1 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .766            17.837       2    .000
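The relations among the eigenvalue, canonical correlation, Wilks' Λ, and chi-square can be checked against the output above (eigenvalue .305, n = 70 cases, m = 2 predictors entered, k = 2 groups):

```python
import numpy as np

eig, n, m, k = 0.305, 70, 2, 2                     # from the SPSS output above
canon_r = np.sqrt(eig / (1 + eig))                 # 0.483
wilks = 1 / (1 + eig)                              # 0.766
chi_sq = -((n - 1) - (m + k) / 2) * np.log(wilks)  # ~17.8
df = m * (k - 1)                                   # 2
print(canon_r, wilks, chi_sq, df)
```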

How Well Does the Model Predict?

Classification Results(a)

                        Predicted Group Membership
          TYPE_SEN      .00      1.00      Total
Original  Count   .00   27       10        37
                  1.00  14       19        33
          %       .00   73.0     27.0      100.0
                  1.00  42.4     57.6      100.0

a. 65.7% of original grouped cases correctly classified.


Overall hit ratio = 65.7%
Correctly classified Class 0 = 73.0%
Correctly classified Class 1 = 57.6%
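The hit ratio can be judged against the chance criteria and Press's Q listed earlier, all computable directly from the classification table:

```python
n0, n1 = 37, 33                  # group sizes from the table
correct, K = 27 + 19, 2          # total hits, number of groups
N = n0 + n1

hit_ratio = correct / N                          # 0.657
max_chance = max(n0, n1) / N                     # 0.529
prop_chance = (n0 / N)**2 + (n1 / N)**2          # 0.502
press_q = (N - correct * K)**2 / (N * (K - 1))   # 6.91 > 3.84 (chi-sq, df=1)
print(hit_ratio, max_chance, prop_chance, press_q)
```

Since the 65.7% hit ratio exceeds both chance criteria and Press's Q exceeds the .05 critical chi-square value, the model classifies better than chance.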

How can a Cutting Score be calculated to sort cases into groups?
When n0 = n1: Zcutting = (Z̄0 + Z̄1) / 2, where Z̄j = mean discriminant score for group j.
When n0 ≠ n1: Zcutting = (n0 Z̄1 + n1 Z̄0) / (n0 + n1), which pulls the cut toward the centroid of the smaller group.
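A small sketch of both cases (the unequal-n form follows Hair et al.'s weighting, in which each centroid is weighted by the size of the opposite group):

```python
def cutting_score(z0_mean, z1_mean, n0, n1):
    """Cutting score between two group centroids on the discriminant axis."""
    if n0 == n1:
        return (z0_mean + z1_mean) / 2
    return (n0 * z1_mean + n1 * z0_mean) / (n0 + n1)   # weighted form

# Worked cereal example later in this deck (two groups of 5):
print(cutting_score(0.824, 2.368, 5, 5))   # 1.596; scores above => "like"
```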

Casewise Statistics (first 10 cases; highest-group columns, then second-highest-group columns, then the Function 1 discriminant score)

Case  Actual  Predicted  P(D>d|G=g)  df  P(G=g|D=d)  Sq. Mahalanobis  2nd Group  P(G=g|D=d)  Sq. Mahalanobis  Score
1     0       0          .081        1   .932        3.040            1          .068        8.031            -2.258
2     0       0          .157        1   .905        2.000            1          .095        6.273            -1.928
3     0       0          .048        1   .946        3.914            1          .054        9.418            -2.493
4     0       0          .203        1   .891        1.621            1          .109        5.588            -1.787
5     0       0          .238        1   .880        1.390            1          .120        5.151            -1.693
6     0       0          .704        1   .755        .144             1          .245        2.162            -.894
7     0       0          .081        1   .932        3.040            1          .068        8.031            -2.258
8     0       0          .395        1   .837        .722             1          .163        3.765            -1.364
9     0       0          .635        1   .773        .225             1          .227        2.448            -.988
10    1       0**        .238        1   .880        1.390            1          .120        5.151            -1.693

**. Misclassified case

Is Each Predictor in the Model Significant?

This is assessed by conducting a MANOVA with the grouping variable as the IV and the discriminant predictors as the DVs. The MANOVA SS are used to compute $\Lambda$ = WSS / TSS.
H0: the discriminant coefficient = 0.

Variables in the Analysis

Step   Variable   Tolerance   Sig. of F to Remove   Wilks' Lambda
1      SER_INDX   1.000       .000
2      SER_INDX   .864        .000                  .983
2      DR_SCORE   .864        .019                  .832

Relative Impact of Predictor Variables

Two ways:
Compare the standardized discriminant weights, i.e., the coefficients.
Compare the structure coefficients, also called the discriminant loadings.

Canonical Discriminant Function Coefficients

Variable     Function 1
DR_SCORE     -.235
SER_INDX     .564
(Constant)   -.706

Unstandardized coefficients.

Standardized Canonical Discriminant Function Coefficients

Variable     Function 1
DR_SCORE     -.625
SER_INDX     1.044

Structure Coefficient

A structure coefficient, or discriminant loading, is the correlation between a predictor variable and the discriminant scores. The higher the absolute value of the coefficient, the greater the discriminatory impact of the predictor variable on the DV.

Structure Matrix

Variable      Function 1
SER_INDX      .814
DR_SCORE      -.240
SKL_INDX(a)   -.194
AGE_FIRS(a)   -.185

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions; variables ordered by absolute size of correlation within function.
a. This variable was not used in the analysis.
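Because a loading is just a correlation with the discriminant scores, it can be sketched in a couple of lines (this simple total-sample version will differ somewhat from SPSS, which pools the correlations within groups; the data matrix X and score vector z are assumed given):

```python
import numpy as np

def structure_loadings(X, z):
    """Correlation of each predictor column of X with discriminant scores z."""
    return np.array([np.corrcoef(X[:, k], z)[0, 1] for k in range(X.shape[1])])
```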

A Numerical Example
Suppose that a marketing researcher is interested in consumers' attitudes toward nutritional additives in ready-to-eat cereals. Specifically, a set of written concept descriptions of a children's cereal is prepared in which two characteristics of the cereal are varied:
X1: the amount of protein (in grams) per 2-ounce serving, and
X2: the percentage of minimum daily requirements of vitamin D per 2-ounce serving.
Y: consumers are simply asked to classify each description as "dislike" or "like."

Person   Evaluation   X1   X2
1        Dislike      2    4
2        Dislike      3    2
3        Dislike      4    5
4        Dislike      5    4
5        Dislike      6    7
6        Like         7    6
7        Like         8    4
8        Like         9    7
9        Like         10   6
10       Like         11   9

Mean Dislike: X1 = 4, X2 = 4.4
Mean Like: X1 = 9, X2 = 6.4
Grand mean: X1 = 6.5, X2 = 5.4

Scatter Diagram

The two groups are much more widely separated on X1 than on X2. If we are forced to choose just one of the axes, X1 is better than X2. However, X2 still provides information, so could some linear composite of both X1 and X2 do better than X1 alone?

Z = k1 X1 + k2 X2, where k1 and k2 are weights.

How to Define Variability

There are several ways. One is the ratio of the variability of the two group means on the composite around the grand mean to the pooled variability of individual cases around their respective group means.

Computing the discriminant weights gives
z = 0.368 X1 - 0.147 X2

Discriminant scores for the centroids of the two groups:
Mean (Z-dislike) = 0.368(4) - 0.147(4.4) = 0.824
Mean (Z-like) = 0.368(9) - 0.147(6.4) = 2.368

Discriminant Criterion

As stated earlier, the discriminant function is chosen to maximize the ratio of between-group to within-group variability.

Dislikers: Person 1-5, scores 0.148, 0.809, 0.735, 1.250, 1.176; mean = 0.824
Likers: Person 6-10, scores 1.691, 2.353, 2.279, 2.794, 2.721; mean = 2.368
Grand mean = 1.596

Between-group variability = 5(0.824 - 1.596)^2 + 5(2.368 - 1.596)^2 = 5.96
Within-group variability = (0.148 - 0.824)^2 + ... + (2.721 - 2.368)^2 = 1.544
C = 5.96 / 1.544 = 3.86

We can now see whether using X1 alone, while ignoring X2, can produce a higher discriminant ratio:
Between-group variability = 5(4 - 6.5)^2 + 5(9 - 6.5)^2 = 62.5
Pooled within-group variability = (2 - 4)^2 + ... + (11 - 9)^2 = 20
C = 62.5 / 20 = 3.125, which is of course less than the optimum of 3.86.
In the optimal function Z = 0.368 X1 - 0.147 X2, X2 receives a negative weight even though likers score higher on X2 on average. More important, X2 is highly correlated with X1 and accounts for some of the error variance in X1; X2 serves as a suppressor variable.
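Both criterion values can be reproduced directly from the table above:

```python
import numpy as np

X1 = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11], float)
X2 = np.array([4, 2, 5, 4, 7, 6, 4, 7, 6, 9], float)
grp = np.array([0] * 5 + [1] * 5)           # 0 = dislike, 1 = like

def criterion(z, g):
    """Between-group SS of the composite means over pooled within-group SS."""
    bss = sum((g == j).sum() * (z[g == j].mean() - z.mean()) ** 2
              for j in (0, 1))
    wss = sum(((z[g == j] - z[g == j].mean()) ** 2).sum() for j in (0, 1))
    return bss / wss

print(criterion(0.368 * X1 - 0.147 * X2, grp))  # ~3.86, the optimal composite
print(criterion(X1, grp))                       # 3.125, X1 alone
```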

Is the Function Statistically Significant?

While the discriminant function classifies the ten cases perfectly, we still have not tested whether the group centroids differ significantly. This is analogous to testing R^2 in regression analysis, and is done via the Mahalanobis squared distance; a test of group variability can also be carried out with an F test. The canonical correlation coefficient is also used to relate the discriminant function to the groups. Residual discrimination is measured by Wilks' lambda: a lambda near zero indicates high discrimination, while a lambda close to one indicates high similarity between the groups.

Multiple Discriminant Analysis

Here, more than one discriminant function may be computed. In general, with G groups and m predictors, we can find the lesser of G - 1 or m discriminant functions. Not all functions are statistically significant. The first accounts for the highest proportion of the among-to-within-group variability, the second for the next highest, and so on.
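A sketch of the multi-group case with hypothetical data (three groups and four predictors, so at most min(3 - 1, 4) = 2 functions; scikit-learn reports each function's share of the between-group variability):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
centers = ([0, 0, 0, 0], [2, 1, 0, 0], [0, 2, 1, 0])   # invented group means
X = np.vstack([rng.normal(c, 1.0, size=(30, 4)) for c in centers])
y = np.repeat([0, 1, 2], 30)

lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print(lda.transform(X).shape)           # (90, 2): one column per function
print(lda.explained_variance_ratio_)    # first function dominates, then second
```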

Application of SPSS
At stage one, F ratios are computed for each predictor; that is, a series of univariate ANOVAs across the groups is performed for each predictor separately. The variable with the largest F is entered first, and discrimination is effected with respect to this variable only. A second variable is then added on the basis of the largest adjusted F value, conditioned on the variable already entered. Each variable entered is then tested for retention on the basis of its association with the other predictors. The process continues until all variables that pass the significance levels for inclusion and retention have been entered.

At the conclusion of the stepwise procedure, a summary of the predictors entered or removed and the associated Wilks' lambdas is presented, along with the classification results for all cases. For each canonical discriminant function computed, the output shows its associated canonical correlation and Wilks' lambda, and the standardized coefficients are calculated.

Thank you.
