
Multiple Discriminant Analysis


The dependent variable has more than two values.
The amount spent on a family vacation can be high, medium or low, so this is a three-group discriminant analysis.
The question of interest is whether households that spend high, medium or low amounts on their vacations can be differentiated in terms of:

Annual family income
Attitude towards travel
Importance attached to family vacation
Household size
Age of the head of the household

Group Means (Amount: 1 = low, 2 = medium, 3 = high spenders)

Amount   Income   Travel   Vacation   Hsize   Age
1        38.57    4.5      4.7        3.1     50.3
2        50.11    4        4.2        3.4     49.5
3        64.97    6.1      5.9        4.2     56
Total    51.22    4.87     4.93       3.57    51.93

Group Standard Deviations

Amount   Income   Travel   Vacation   Hsize   Age
1        5.3      1.72     1.89       1.2     8.1
2        6        2.36     2.49       1.51    9.25
3        8.61     1.2      1.66       1.14    7.6
Total    12.8     1.98     2.1        1.33    8.57

The group means indicate that income appears to differentiate the three groups more widely than any other variable. There is some differentiation on travel and vacation, with group 3 being fairly high on both. Groups 1 and 2 are very close on household size and age. Age has a large standard deviation relative to the separation between the groups.

Pooled within-groups correlation matrix

           Income    Travel    Vacation   Hsize      Age
Income     1
Travel     0.0512    1
Vacation   0.3068    0.036     1
Hsize      0.3805    0.005     0.2208     1
Age        -0.209    -0.34     -0.01326   -0.02512   1

There is some correlation between Hsize and Income, and between Vacation and Income. Age has some negative correlation with Travel. But these correlations are not very high and hence are not a concern.

Wilks' Lambda and univariate F ratio with 2 and 27 degrees of freedom

Variable   Wilks' Lambda   F      Significance
Income     0.26            38     0.00
Travel     0.79            3.63   0.04
Vacation   0.88            1.83   0.18
Hsize      0.87            1.94   0.16
Age        0.88            1.8    0.18

The univariate F ratios indicate that when the predictors are considered individually, only income and travel are significant in differentiating between the three groups.
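
For a single predictor, Wilks' Lambda is the ratio of the within-groups to the total sum of squares, so the univariate F ratio can be recovered from it as F = ((1 - Lambda) / Lambda) * (df2 / df1). The following is a minimal Python sketch of that relationship, assuming the rounded Lambda values from the table and the 2 and 27 degrees of freedom stated there. The function name is illustrative, and because Lambda is rounded to two decimals the results only approximate the reported F values.

```python
def univariate_f(wilks_lambda, df_between, df_within):
    """F ratio implied by a one-variable Wilks' Lambda (SS_within / SS_total)."""
    return (1.0 - wilks_lambda) / wilks_lambda * (df_within / df_between)


# Rounded Wilks' Lambda values taken from the table above.
lambdas = {
    "Income": 0.26,
    "Travel": 0.79,
    "Vacation": 0.88,
    "Hsize": 0.87,
    "Age": 0.88,
}

# Degrees of freedom as stated in the table title: 2 and 27.
f_ratios = {name: univariate_f(lam, 2, 27) for name, lam in lambdas.items()}

for name, f_value in f_ratios.items():
    print(f"{name:<9}{f_value:7.2f}")
```

Income stands far apart from the other predictors in this calculation, which is the same pattern the table shows.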

Number of discriminant functions

In multiple discriminant analysis, if there are G groups, G-1 discriminant functions can be estimated if the number of predictors is larger than this quantity.
Thus, with G groups and k predictors, it is possible to estimate up to the smaller of G-1 or k discriminant functions.
The first function has the highest ratio of between-groups to within-groups sum of squares.
The second function, uncorrelated with the first, has the second highest ratio, and so on.
Not all of the functions need be statistically significant.

Canonical Discriminant Functions

Function   Eigenvalue   Percent of Variance   Cumulative Percent   Canonical Correlation
1          3.82         93.93                 93.93                0.89
2          0.25         6.07                  100                  0.45

Since there are G = 3 groups and k = 5 predictor variables, the number of discriminant functions is min(G-1, k) = min(2, 5) = 2.
The eigenvalue associated with the first function is 3.82, and it accounts for 93.93% of the explained variance.
Because its eigenvalue is much larger than that of function 2 (0.25), function 1 is likely to be the superior discriminator.
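
The quantities in the Canonical Discriminant Functions table are all tied to the eigenvalues: the percent of variance is each eigenvalue's share of their sum, and the canonical correlation is sqrt(eigenvalue / (1 + eigenvalue)). A small Python sketch under those standard definitions, using the rounded eigenvalues 3.82 and 0.25 (so the percentages differ slightly from the table); the function and key names are illustrative.

```python
import math


def summarize_functions(eigenvalues, n_groups, n_predictors):
    """Return the maximum number of functions and per-function summaries."""
    max_functions = min(n_groups - 1, n_predictors)
    total = sum(eigenvalues)
    rows = []
    for index, eigenvalue in enumerate(eigenvalues, start=1):
        rows.append({
            "function": index,
            "pct_variance": 100.0 * eigenvalue / total,
            "canonical_corr": math.sqrt(eigenvalue / (1.0 + eigenvalue)),
        })
    return max_functions, rows


max_functions, rows = summarize_functions([3.82, 0.25], n_groups=3, n_predictors=5)

print("functions:", max_functions)
for row in rows:
    print(row["function"], round(row["pct_variance"], 2), round(row["canonical_corr"], 2))
```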

After Function   Wilks' Lambda   Chi-square   DF   Sig.
0                0.17            44.83        10   0.00
1                0.8             5.52         4    0.24

The row "After Function 0" tests the significance of the two functions together, whereas the row "After Function 1" tests function 2 alone, after function 1 has been removed.

Thus, the two functions together significantly differentiate between the three groups. However, when the first function is removed, the second function is not significant at the 0.05 level. Therefore, the second function does not contribute significantly to the group differences
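
The Wilks' Lambda and chi-square values in this test can be approximately reproduced from the eigenvalues. The sketch below assumes the usual Bartlett approximation, chi-square = -(n - 1 - (k + G) / 2) * ln(Lambda), with n = 30 households in the analysis sample, k = 5 predictors and G = 3 groups. Whether the software used exactly this formula is an assumption. The p-value helper uses a closed form that holds only for even degrees of freedom, which covers both tests here (10 and 4). All function names are illustrative.

```python
import math


def residual_wilks(eigenvalues, functions_removed):
    """Wilks' Lambda for the functions remaining after removing the first few."""
    wilks = 1.0
    for eigenvalue in eigenvalues[functions_removed:]:
        wilks /= 1.0 + eigenvalue
    return wilks


def bartlett_chi_square(wilks, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' Lambda."""
    return -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(wilks)


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("closed form is valid only for even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


eigenvalues = [3.82, 0.25]
n_cases, n_predictors, n_groups = 30, 5, 3

results = []
for removed in range(len(eigenvalues)):
    wilks = residual_wilks(eigenvalues, removed)
    chi_square = bartlett_chi_square(wilks, n_cases, n_predictors, n_groups)
    df = (n_predictors - removed) * (n_groups - removed - 1)
    p_value = chi2_upper_tail_even_df(chi_square, df)
    results.append({"after": removed, "wilks": wilks, "chi_square": chi_square,
                    "df": df, "p": p_value})
    print(removed, round(wilks, 2), round(chi_square, 2), df, round(p_value, 2))
```

Because the eigenvalues are rounded, the chi-square values come out slightly different from the table (about 44.9 and 5.6 rather than 44.83 and 5.52), but the conclusion is the same: only the first test is significant at the 0.05 level.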

Standardised Canonical Discriminant Function Coefficients

           Func1      Func2
Income     1.0474     -0.42076
Travel     0.33991    0.76851
Vacation   -0.14198   0.53354
Hsize      -0.16317   0.12932
Age        0.49474    0.52447

Pooled within-groups correlations (structure matrix)

           Func1      Func2
Income     0.85556    -0.27833
Hsize      0.19319    0.07749
Vacation   0.21935    0.58829
Travel     0.14899    0.45362
Age        0.16576    0.34079

The standardised coefficients indicate a large coefficient for income on func1, whereas travel, vacation and age have large coefficients on func2.

Similarly, the structure correlations indicate that income and hsize correlate more highly with func1 than with func2, while vacation, travel and age correlate more highly with func2 than with func1.

Group Centroids

Group   Func1      Func2
1       -2.041     0.41847
2       -0.40479   -0.65867
3       2.44578    0.2402

Group 3 has the highest value on function 1, and since function 1 is primarily associated with income and hsize, group 3 will have people with higher income and larger household size. Group 1 is highest on function 2 and group 2 is lowest, so this function separates these two groups. Since function 2 is primarily associated with travel, vacation and age, group 1 will be higher than group 2 on these variables.

Unstandardised Canonical Discriminant Function Coefficients

           Func1       Func2
Income     0.15427     -0.06197
Travel     0.18680     0.42234
Vacation   -0.06952    0.26127
Hsize      -0.12653    0.10028
Age        0.05928     0.06284
Constant   -11.09442   -3.79160

Thus the two equations are:
Func1 = -11.09442 + 0.15427*Income + 0.18680*Travel - 0.06952*Vacation - 0.12653*Hsize + 0.05928*Age
Func2 = -3.79160 - 0.06197*Income + 0.42234*Travel + 0.26127*Vacation + 0.10028*Hsize + 0.06284*Age
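
The two functions can be applied to any household's predictor values to obtain its discriminant scores. As a rough check, the Python sketch below evaluates them at the group means from the Group Means table; the results should land close to the reported group centroids (for group 1, roughly -2.04 on function 1 and 0.42 on function 2). The dictionary and function names are illustrative and not part of the SPSS output.

```python
# Unstandardised coefficients from the table above.
FUNC1 = {"Constant": -11.09442, "Income": 0.15427, "Travel": 0.18680,
         "Vacation": -0.06952, "Hsize": -0.12653, "Age": 0.05928}
FUNC2 = {"Constant": -3.79160, "Income": -0.06197, "Travel": 0.42234,
         "Vacation": 0.26127, "Hsize": 0.10028, "Age": 0.06284}

# Group means of the predictors (group 1 = low, 2 = medium, 3 = high spenders).
GROUP_MEANS = {
    1: {"Income": 38.57, "Travel": 4.5, "Vacation": 4.7, "Hsize": 3.1, "Age": 50.3},
    2: {"Income": 50.11, "Travel": 4.0, "Vacation": 4.2, "Hsize": 3.4, "Age": 49.5},
    3: {"Income": 64.97, "Travel": 6.1, "Vacation": 5.9, "Hsize": 4.2, "Age": 56.0},
}


def discriminant_score(coefficients, household):
    """Constant plus the weighted sum of the household's predictor values."""
    score = coefficients["Constant"]
    for name, value in household.items():
        score += coefficients[name] * value
    return score


scores_at_means = {
    group: (discriminant_score(FUNC1, means), discriminant_score(FUNC2, means))
    for group, means in GROUP_MEANS.items()
}

for group, (score1, score2) in scores_at_means.items():
    print(group, round(score1, 3), round(score2, 3))
```

The same `discriminant_score` call works for an individual household: pass a dictionary with that household's five predictor values instead of the group means.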

Classification results: analysis sample (hit ratio 86.70%)

                          Predicted Group Membership
             Amount       1      2      3      Total
Original
  Count      1            9      1      0      10
             2            1      9      0      10
             3            0      2      8      10
             Total        10     12     8      30
  %          1            90     10     0
             2            10     90     0
             3            0      20     80

Classification results: holdout sample (hit ratio 75%)

                          Predicted Group Membership
             Amount       1      2      3      Total
Original
  Count      1            3      1      0      4
             2            0      3      1      4
             3            1      0      3      4
             Total        4      4      4      12
  %          1            75     25     0
             2            0      75     25
             3            25     0      75

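The three groups are of equal size, so by chance one would expect a hit ratio of 1/3 = 33.3%. The hit ratios of 86.7% for the analysis sample and 75% for the holdout sample are a large improvement over chance, which supports the validity of the discriminant analysis.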
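
The hit ratio is the share of cases on the diagonal of the classification table. A short Python sketch, using the counts as read from the two classification tables above (rows are actual groups, columns are predicted groups) and assuming equal group sizes for the chance benchmark, as stated; the variable names are illustrative.

```python
def hit_ratio(confusion):
    """Proportion of cases whose predicted group equals the actual group."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows: actual group 1, 2, 3. Columns: predicted group 1, 2, 3.
analysis_counts = [[9, 1, 0],
                   [1, 9, 0],
                   [0, 2, 8]]
holdout_counts = [[3, 1, 0],
                  [0, 3, 1],
                  [1, 0, 3]]

analysis_hit = hit_ratio(analysis_counts)
holdout_hit = hit_ratio(holdout_counts)
chance_hit = 1 / len(analysis_counts)  # equal-sized groups assumed

print(f"analysis {analysis_hit:.1%}, holdout {holdout_hit:.1%}, chance {chance_hit:.1%}")
```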

Example1

A recent survey asked business people about the concern of hiring and maintaining employees during the current harsh economic environment.
If an organisation wants to retain its employees, it must learn why people leave their jobs and why others stay and are satisfied with their jobs.
Discriminant analysis was used to determine what factors explained the differences between salespeople who left a large computer manufacturing company and those who stayed.

Example2

Independent variables were:
Company rating
Job security
Seven job satisfaction dimensions
Four role-conflict dimensions
Four role-ambiguity dimensions
Nine measures of sales performance

The dependent variable was dichotomous: those who stayed and those who left.
The canonical correlation, an index of discrimination (R = 0.4572), was significant (p = 0.0180).
Results indicated that the variables discriminated between those who left and those who stayed.

Discriminant Analysis Results

Variable               Coefficients   Standardised Coefficients   Structure Correlations
1 Work                 0.0903         0.391                       0.5446
2 Promotion            0.0288         0.1515                      0.5044
3 Job Security         0.1567         0.1384                      0.4958
4 Customer Relations   0.0086         0.1751                      0.4906
5 Company Rating       0.4059         0.324                       0.4824

Remaining variables (values not shown): 6 Working with others, 7 Overall performance, 8 Time-territory management, 9 Sales produced, 10 Presentation skill, 11 Technical Information, 12 Pay-benefits, 13 Quota achieved, 14 Management, 15 Information collection, 16 Family, 17 Sales manager, 18 Coworker, 19 Customer, 20 Family, 21 Job, 22 Job, 23 Customer, 24 Sales manager, 25 Sales manager, 26 Customer

Characteristic profile1

In the example, based on the structure correlations, Promotion was identified as the second most important variable. However, judging by the standardised discriminant function coefficients, Promotion is not the second most important variable.
The anomaly arises because of multicollinearity.
In such cases, develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.

Characteristic profile2
                   Promotion   Company Rating
Those who stayed   4.5         4
Those who left     2.3         3.83
Overall            3.42        3.92

Clearly, promotion discriminates between the two groups more strongly than company rating does. Those who stayed with the company are satisfied with the promotions.

Discriminant Analysis using SPSS

Analyse>Classify>Discriminant

Select Analyse from the SPSS menu bar.
Click Classify and then Discriminant.
Move the criterion variable into the Grouping Variable box.
    Taken Vacation in the 1st example; Amt spent on vacation in the 2nd example.
Click Define Range.
    1st example: enter the range 1 to 2 (1 = taken vacation in last 2 years, 2 = the rest).
    2nd example: enter the range 1 to 3 (1 = low spenders, 2 = medium spenders, 3 = high spenders).
Move the predictor variables into the Independents box.
    Move Income, Travel, Vacation, Hsize and Age into the Independents box.
Select Enter Independents Together (default option).
Click Statistics. In the pop-up window, in the Descriptives box check Means and Univariate ANOVAs. In the Matrices box check Within-Groups Correlation. Click Continue.
Click Classify. In the Display box check Summary Table. In the Use Covariance Matrix box check Within-Groups. Click Continue.
Click OK.

Classroom Problem1

Data on Nike was obtained from 45 respondents. Which of the independent variables discriminate between the 2 types of users of Nike?

Dependent variable

2 types of users of Nike


1- Not so Heavy users 2- Heavy users

Independent variables

Gender

1 Females 2 Males

Awareness, Attitude, Preference, Intention and Loyalty

All of these are measured on a 7-point scale, where 1 = very unfavorable and 7 = very favorable

Classroom Problem2

Data on Nike was obtained from 45 respondents. Which of the independent variables discriminate between the 3 types of users of Nike?

Dependent variable

3 types of users of Nike

1- Light users 2- Medium users 3- Heavy users

Independent variables

Gender

1 Females 2 Males

Awareness, Attitude, Preference, Intention and Loyalty

All of these are measured on a 7-point scale, where 1 = very unfavorable and 7 = very favorable

Thank you
