
Example of Two-group Discriminant Analysis

Suppose we want to analyze the salient characteristics of families that have
visited a vacation resort during the last two years. Data were obtained from a
pretest sample of 42 households, of which 30 were included in the analysis
sample. The grouping variable (VISIT) is coded 1 for households that visited a
resort during the last two years and 2 for those that did not. Data were also
obtained on annual family income (INCOME), attitude towards travel (TRAVEL,
measured on a 9-point scale), importance attached to family vacation
(VACATION, measured on a 9-point scale), household size (HSIZE), and age of
the head of the household (AGE).

RESULTS AND DISCUSSION


1. Estimation of Discriminant Function Coefficients

Group Statistics

                                                                  Valid N (listwise)
Resort Visit                                Mean  Std. Deviation  Unweighted  Weighted
Visited
  Annual Family Income                    60.520          9.8307          15    15.000
  Attitude Towards Travel                  5.400          1.9198          15    15.000
  Importance attached to Family Vacation   5.800          1.8205          15    15.000
  Household size                           4.333          1.2344          15    15.000
  Age of Head of Household                53.733          8.7706          15    15.000
Did not visit
  Annual Family Income                    41.913          7.5511          15    15.000
  Attitude Towards Travel                  4.333          1.9518          15    15.000
  Importance attached to Family Vacation   4.067          2.0517          15    15.000
  Household size                           2.800           .9411          15    15.000
  Age of Head of Household                50.133          8.2710          15    15.000
Total
  Annual Family Income                    51.217         12.7952          30    30.000
  Attitude Towards Travel                  4.867          1.9780          30    30.000
  Importance attached to Family Vacation   4.933          2.0998          30    30.000
  Household size                           3.567          1.3309          30    30.000
  Age of Head of Household                51.933          8.5740          30    30.000

The group means suggest that the two groups are more widely separated on income
than on any other variable. There also appears to be more separation on the
importance attached to family vacation than on attitude towards travel. The
difference between the two groups in the age of the head of the household is
small, and the standard deviation of this variable is large.

The pooled within-groups correlation matrix indicates low correlations between
the independent variables, so multicollinearity is unlikely to be a problem.

Tests of Equality of Group Means

                                          Wilks' Lambda         F   df1   df2    Sig.
Annual Family Income                               .453    33.796     1    28    .000
Attitude Towards Travel                            .925     2.277     1    28    .143
Importance attached to Family Vacation             .824     5.990     1    28    .021
Household size                                     .657    14.636     1    28    .001
Age of Head of Household                           .954     1.338     1    28    .257

The significance of the univariate F-ratios indicates that, when the independent
variables are considered individually, only income, importance attached to family
vacation, and household size significantly differentiate between those who
visited a resort and those who did not.
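
The univariate F-ratios can be recovered from the group means and standard
deviations alone. The sketch below assumes two groups of 15 households each and
a one-way ANOVA F with 1 and 28 degrees of freedom; the function and variable
names are illustrative and do not come from the SPSS output. Its results should
agree with the reported F and Wilks' lambda values up to rounding of the
published means and standard deviations.

```python
# Group summary statistics copied from the Group Statistics table:
# variable -> (mean visited, sd visited, mean did not visit, sd did not visit)
GROUP_STATS = {
    "INCOME":   (60.520, 9.8307, 41.913, 7.5511),
    "TRAVEL":   (5.400, 1.9198, 4.333, 1.9518),
    "VACATION": (5.800, 1.8205, 4.067, 2.0517),
    "HSIZE":    (4.333, 1.2344, 2.800, 0.9411),
    "AGE":      (53.733, 8.7706, 50.133, 8.2710),
}
N_PER_GROUP = 15  # households in each group (assumed equal, as in the table)


def univariate_f(m1, s1, m2, s2, n=N_PER_GROUP):
    """One-way ANOVA F-ratio for two groups of equal size n."""
    grand_mean = (m1 + m2) / 2
    # Between-groups sum of squares; with two groups df1 = 1, so MS = SS.
    ss_between = n * (m1 - grand_mean) ** 2 + n * (m2 - grand_mean) ** 2
    # Pooled within-groups mean square, df2 = 2n - 2 = 28.
    ms_within = ((n - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (2 * n - 2)
    return ss_between / ms_within


def wilks_lambda(f, df1=1, df2=2 * N_PER_GROUP - 2):
    """Univariate Wilks' lambda (within SS / total SS) implied by F."""
    return 1.0 / (1.0 + f * df1 / df2)


for name, stats in GROUP_STATS.items():
    f = univariate_f(*stats)
    print(f"{name:9s} F = {f:7.3f}   Wilks' lambda = {wilks_lambda(f):.3f}")
```

A larger F corresponds to a smaller Wilks' lambda, which is why income, with the
largest F, also has the smallest lambda in the table.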

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1            1.786(a)           100.0          100.0                    .801

a. First 1 canonical discriminant functions were used in the analysis.

Because there are two groups, only one discriminant function is estimated. The
eigenvalue associated with this function is 1.786, and the canonical correlation
is 0.801. The square of this correlation, (0.801)^2 = 0.64, indicates that
64 percent of the variance in the dependent variable is explained or accounted
for by this model.
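
The canonical correlation and the eigenvalue carry the same information. As a
minimal sketch, the standard identity r = sqrt(eigenvalue / (1 + eigenvalue))
reproduces the reported correlation from the reported eigenvalue; the names
used here are illustrative only.

```python
import math

EIGENVALUE = 1.786  # taken from the Eigenvalues table


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant-function eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


r = canonical_correlation(EIGENVALUE)
print(f"canonical correlation = {r:.3f}")
print(f"squared correlation   = {r ** 2:.2f}")
```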

2. Determine the Significance of the Discriminant Function

The null hypothesis is given below:
H0: In the population, the means of all discriminant functions in all groups are
equal.
This test is based on the value of Wilks' lambda.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                              .359       26.130    5   .000

The Wilks' lambda associated with the function is 0.359, which transforms to a
chi-square value of 26.130 with 5 degrees of freedom. This is significant at the
5% level of significance. Hence the null hypothesis is rejected.
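
The transformation from Wilks' lambda to chi-square is not spelled out in the
output. The sketch below assumes Bartlett's approximation,
chi-square = -(N - 1 - (p + g) / 2) * ln(lambda), with N = 30 cases, p = 5
predictors and g = 2 groups, and derives lambda from the eigenvalue as
1 / (1 + eigenvalue). The critical value is the tabulated upper 5% point of the
chi-square distribution with 5 degrees of freedom; all names are illustrative.

```python
import math

N_CASES = 30        # households in the analysis sample
N_PREDICTORS = 5    # INCOME, TRAVEL, VACATION, HSIZE, AGE
N_GROUPS = 2        # visited / did not visit
EIGENVALUE = 1.786  # from the Eigenvalues table

# Tabulated upper 5% critical value of chi-square with 5 degrees of freedom.
CHI2_CRIT_05_DF5 = 11.070

# With a single discriminant function, lambda = 1 / (1 + eigenvalue).
wilks = 1.0 / (1.0 + EIGENVALUE)

# Bartlett's approximation (an assumption about how the output was computed).
multiplier = N_CASES - 1 - (N_PREDICTORS + N_GROUPS) / 2
chi_square = -multiplier * math.log(wilks)
df = N_PREDICTORS * (N_GROUPS - 1)

print(f"Wilks' lambda = {wilks:.3f}")
print(f"chi-square    = {chi_square:.3f} on {df} df")
print("reject H0 at 5%:", chi_square > CHI2_CRIT_05_DF5)
```

Small differences in the third decimal place relative to the reported
chi-square come from using the rounded eigenvalue.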

3. Interpretation of Results

Standardized Canonical Discriminant Function Coefficients

                                          Function 1
Annual Family Income                            .743
Attitude Towards Travel                         .096
Importance attached to Family Vacation          .233
Household size                                  .469
Age of Head of Household                        .209

An examination of the standardized discriminant function coefficients shows that
income is the most important predictor in discriminating between the groups,
followed by household size and importance attached to family vacation.
The same observation is obtained from an examination of the structure
correlations. These simple correlations between the predictors and the
discriminant function are listed in order of magnitude.

Structure Matrix

                                          Function 1
Annual Family Income                            .822
Household size                                  .541
Importance attached to Family Vacation          .346
Attitude Towards Travel                         .213
Age of Head of Household                        .164

Pooled within-groups correlations between discriminating variables and
standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
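
The agreement between the two sets of coefficients can be checked by ranking
the predictors on each. This is a small sketch using the values from the two
tables; the short variable names follow the codes given in the introduction,
and the helper function is illustrative.

```python
# Standardized canonical discriminant function coefficients.
STANDARDIZED = {
    "INCOME": 0.743, "TRAVEL": 0.096, "VACATION": 0.233,
    "HSIZE": 0.469, "AGE": 0.209,
}
# Structure correlations (pooled within-groups).
STRUCTURE = {
    "INCOME": 0.822, "HSIZE": 0.541, "VACATION": 0.346,
    "TRAVEL": 0.213, "AGE": 0.164,
}


def rank_by_magnitude(coefficients):
    """Return predictor names sorted by absolute coefficient, largest first."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]),
                  reverse=True)


std_rank = rank_by_magnitude(STANDARDIZED)
struct_rank = rank_by_magnitude(STRUCTURE)
print("standardized coefficients:", std_rank)
print("structure correlations:   ", struct_rank)
print("top three agree:", std_rank[:3] == struct_rank[:3])
```

Both rankings put income, household size and importance of vacation first; only
the two weakest predictors, attitude towards travel and age, change places.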

The group centroids, giving the value of the discriminant function evaluated at the
group means, are shown below:

Functions at Group Centroids

Resort Visit      Function 1
Visited                1.291
Did not visit         -1.291

Unstandardized canonical discriminant functions evaluated at group means

Group 1, the households that visited a resort, has a positive centroid (1.291),
whereas group 2 has an equal negative value (-1.291). The signs of the
coefficients associated with all the predictors are positive. This suggests that
higher family income, larger household size, more importance attached to family
vacation, a more favourable attitude towards travel, and older heads of
households are more likely to result in the family visiting the resort.
It would be reasonable to develop a profile of the two groups in terms of the
three variables that seem to be most important: income, household size, and
importance of vacation.
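
The group centroids also show how a discriminant score would be turned into a
group assignment. The sketch below assumes equal group sizes and equal prior
probabilities, so the cutting score is the midpoint of the two centroids. It
also checks the centroids against the eigenvalue, under the assumption that the
unstandardized function is scaled to unit pooled within-group variance. The
function names and the example scores are hypothetical.

```python
CENTROIDS = {"Visited": 1.291, "Did not visit": -1.291}
GROUP_SIZES = {"Visited": 15, "Did not visit": 15}
REPORTED_EIGENVALUE = 1.786


def midpoint_cutoff(centroids):
    """Cutting score halfway between the centroids (equal-size groups)."""
    return sum(centroids.values()) / len(centroids)


def classify(score, centroids=CENTROIDS):
    """Assign a discriminant score to the group with the nearer centroid."""
    return min(centroids, key=lambda group: abs(score - centroids[group]))


def eigenvalue_from_centroids(centroids, sizes):
    """Between SS / within SS of the scores, assuming unit within variance."""
    n_total = sum(sizes.values())
    grand_mean = sum(sizes[g] * centroids[g] for g in centroids) / n_total
    ss_between = sum(sizes[g] * (centroids[g] - grand_mean) ** 2
                     for g in centroids)
    ss_within = n_total - len(centroids)  # (N - g) * 1.0 under the assumption
    return ss_between / ss_within


print("cutting score:", midpoint_cutoff(CENTROIDS))
print("score  0.4 ->", classify(0.4))    # hypothetical household score
print("score -0.2 ->", classify(-0.2))   # hypothetical household score
print("implied eigenvalue:",
      round(eigenvalue_from_centroids(CENTROIDS, GROUP_SIZES), 3))
```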

4. Validity of Discriminant Analysis

The hit ratio, or the percentage of cases correctly classified, can be determined
by summing the diagonal elements of the classification table and dividing by the
total number of cases.

Classification Results(b,c)

                                              Predicted Group Membership
                            Resort Visit      Visited    Did not visit     Total
Original            Count   Visited                12                3        15
                            Did not visit           0               15        15
                    %       Visited              80.0             20.0     100.0
                            Did not visit          .0            100.0     100.0
Cross-validated(a)  Count   Visited                11                4        15
                            Did not visit           2               13        15
                    %       Visited              73.3             26.7     100.0
                            Did not visit        13.3             86.7     100.0

a. Cross validation is done only for those cases in the analysis. In cross
   validation, each case is classified by the functions derived from all cases
   other than that case.
b. 90.0% of original grouped cases correctly classified.
c. 80.0% of cross-validated grouped cases correctly classified.

In the table above, the hit ratio, or the percentage of cases correctly
classified, is 0.90 or 90%. Leave-one-out cross-validation correctly classifies
0.80 or 80% of cases. Because each case is then classified by a function
estimated without it, the 80% figure is the more realistic estimate of how well
the function would classify new households.
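
The hit-ratio calculation can be sketched directly from the classification
counts. The diagonal counts and row totals come from the classification table;
the off-diagonal counts used here are inferred from the row percentages and the
row totals of 15, and the function name is illustrative.

```python
# Rows are actual groups, columns are predicted groups,
# both in the order (Visited, Did not visit).
ORIGINAL = [
    [12, 3],   # 80.0% / 20.0% of 15 households that visited
    [0, 15],   # 0.0% / 100.0% of 15 households that did not visit
]
CROSS_VALIDATED = [
    [11, 4],   # 73.3% / 26.7%
    [2, 13],   # 13.3% / 86.7%
]


def hit_ratio(table):
    """Sum of the diagonal counts divided by the total number of cases."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


print(f"original hit ratio:        {hit_ratio(ORIGINAL):.2f}")
print(f"cross-validated hit ratio: {hit_ratio(CROSS_VALIDATED):.2f}")
```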
