Multivariate Analysis
Assignment 2 Group Based Assignment
January 2015 Presentation
SUBMITTED BY:
TAN KWEE HUAT Z1210897
NEO SIOK TIN B1173511
LEE GI SIANG B1370676
TUTORIAL GROUP: T01
Table of Contents
Question 1a
Question 1b
Question 1c
Question 2
Question 3
Question 4
Question 5
Reference
Question 1a
(i)
Recoded Payment
Mode of payment (D1) was recoded by combining Credit Card and Debit Card into a single category, Cards, with a coded value of 2. Check and Other were similarly combined into a single category, Others, with a coded value of 3.
Initial and Recoded Value for Mode of Payment (D1)
Payment Method | Coded Value | Recoded Value
Cash | 1 | 1
Credit Card | 2 | 2
Debit Card | 3 | 2
Check | 4 | 3
Other | 5 | 3

Frequencies of Recoded Payment
Coded Value | N-Rows
1 (Cash) | 1277
2 (Cards) | 140
3 (Others) | 33
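The recoding rule above amounts to a lookup table from the original D1 codes to the new values. The sketch below is illustrative only: the recoding in this report was done in JMP, and the names `PAYMENT_RECODE` and `recode_payment`, as well as the sample responses, are hypothetical.

```python
from collections import Counter

# Original D1 codes -> recoded values, following the rule described above:
# 1 Cash -> 1; 2 Credit Card, 3 Debit Card -> 2 (Cards); 4 Check, 5 Other -> 3 (Others)
PAYMENT_RECODE = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3}
PAYMENT_LABELS = {1: "Cash", 2: "Cards", 3: "Others"}


def recode_payment(codes):
    """Map each original D1 code to its recoded value."""
    return [PAYMENT_RECODE[code] for code in codes]


# Hypothetical responses for illustration (not the survey data)
responses = [1, 1, 2, 3, 4, 5, 1]
counts = Counter(recode_payment(responses))
for value in sorted(counts):
    print(value, PAYMENT_LABELS[value], counts[value])
```

Applied to the full survey, the same mapping would produce the frequencies in the table above.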
Recoded Income
Family annual household income (D6) was recoded by combining Under $25,000 and $25,000 but under $50,000 into a single category with a coded value of 1. $50,000 but under $75,000 and $75,000 but under $100,000 were similarly combined and coded 2. Finally, $100,000 but under $150,000 was recoded to 3, while $150,000 but under $200,000 and $200,000 or more were recoded to 4.
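Initial and Recoded Value for Family Annual Household Income (D6)
Household Income | Coded Value | Recoded Value
Under $25,000 | 1 | 1
$25,000 but under $50,000 | 2 | 1
$50,000 but under $75,000 | 3 | 2
$75,000 but under $100,000 | 4 | 2
$100,000 but under $150,000 | 5 | 3
$150,000 but under $200,000 | 6 | 4
$200,000 or more | 7 | 4

Frequencies of Recoded Income
Coded Value | N-Rows
1 (Below $50,000) | 636
2 ($50,000 but under $100,000) | 531
3 ($100,000 but under $150,000) | 108
4 ($150,000 or more) | 175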
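The income recode can be sketched the same way. This assumes, as the table above suggests, that the original D6 codes 1 to 7 follow the income brackets in ascending order; `INCOME_RECODE` and `recode_income` are hypothetical names, and codes outside 1 to 7 are treated as missing here.

```python
# Original D6 codes (assumed 1-7 in ascending income order) -> recoded value
INCOME_RECODE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4}
RECODED_INCOME_LABELS = {
    1: "Below $50,000",
    2: "$50,000 but under $100,000",
    3: "$100,000 but under $150,000",
    4: "$150,000 or more",
}


def recode_income(code):
    """Return the recoded income value, or None for a code outside 1-7 (treated as missing)."""
    return INCOME_RECODE.get(code)


# Frequencies of the recoded variable, as reported in the table above
recoded_counts = {1: 636, 2: 531, 3: 108, 4: 175}
total = sum(recoded_counts.values())
for value, n in recoded_counts.items():
    print(f"{RECODED_INCOME_LABELS[value]}: {n} ({100 * n / total:.2f}%)")
```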
Question 1b
Distributions
Household Size
Summary Statistics
Mean | 2.9310345
Std Dev | 1.4965116
Std Err Mean | 0.0393003
Upper 95% Mean | 3.0081261
Lower 95% Mean | 2.8539429
N | 1450
Fitted Normal
Parameter Estimates
Type | Parameter | Estimate | Lower 95% | Upper 95%
Location | μ | 2.9310345 | 2.8539429 | 3.0081261
Dispersion | σ | 1.4965116 | 1.4439586 | 1.5530635
-2log(Likelihood) = 5283.01841640364
Goodness-of-Fit Test
Shapiro-Wilk W Test
W | Prob<W
0.890802 | <.0001*
Note: Ho = the data is from the Normal distribution. Small p-values reject Ho.
The above analysis presents the distribution of the new variable Household Size, obtained by combining the responses in D3A, together with a normal distribution fitted in JMP and a goodness-of-fit test for normality.
As the summary statistics show, the mean is 2.931 and the standard deviation is 1.497. The 95% confidence interval for the mean is 2.854 to 3.008, and the sample size is 1450. Because the sample size does not exceed 2000, JMP automatically selects the Shapiro-Wilk W test to assess normality. The p-value (Prob<W) is <0.0001; assuming a 0.05 level of significance, the null hypothesis that the data come from a normal distribution is rejected, since the p-value is below 0.05. In light of this, we conclude that Household Size is not normally distributed.
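The summary statistics above can be cross-checked with a short calculation. This is a sketch under two assumptions: that JMP's 95% limits for the mean are t-based, and that its -2log(Likelihood) uses the maximum-likelihood variance (divisor n). Under those assumptions the printed values should agree with the output to rounding.

```python
import math

from scipy import stats

# Summary statistics for Household Size reported by JMP above
n = 1450
mean = 2.9310345
sd = 1.4965116

# Standard error of the mean and a t-based 95% confidence interval
se = sd / math.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * se, mean + t_crit * se

# -2 log-likelihood of a normal fit, using the maximum-likelihood
# variance estimate (divisor n rather than n - 1)
var_mle = sd ** 2 * (n - 1) / n
neg2_loglik = n * (math.log(2 * math.pi * var_mle) + 1)

print(f"SE = {se:.7f}, 95% CI = ({lower:.7f}, {upper:.7f})")
print(f"-2 log-likelihood = {neg2_loglik:.3f}")
```

The raw household-size responses are not reproduced in this report, so the Shapiro-Wilk test itself (for example `scipy.stats.shapiro` applied to the raw data) is not re-run here.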
Question 1c
Contingency Analysis of Frequency of Visits (Q3A_39) By Recoded Income
[Figure: Mosaic Plot]
Contingency Table
Recoded Income by Frequency of Visits (Q3A_39)
Cell entries: Count / Total % / Col % / Row %
Recoded Income | Within the past 4 weeks | More than 4 weeks to within the past 3 months | More than 3 months ago | Never | Total
Below $50,000 | 264 / 20.75 / 41.77 / 47.14 | 150 / 11.79 / 47.62 / 26.79 | 140 / 11.01 / 44.87 / 25.00 | 6 / 0.47 / 46.15 / 1.07 | 560 / 44.03
$50,000 but under $100,000 | 249 / 19.58 / 39.40 / 53.78 | 103 / 8.10 / 32.70 / 22.25 | 107 / 8.41 / 34.29 / 23.11 | 4 / 0.31 / 30.77 / 0.86 | 463 / 36.40
$100,000 but under $150,000 | 48 / 3.77 / 7.59 / 52.17 | 25 / 1.97 / 7.94 / 27.17 | 17 / 1.34 / 5.45 / 18.48 | 2 / 0.16 / 15.38 / 2.17 | 92 / 7.23
$150,000 or more | 71 / 5.58 / 11.23 / 45.22 | 37 / 2.91 / 11.75 / 23.57 | 48 / 3.77 / 15.38 / 30.57 | 1 / 0.08 / 7.69 / 0.64 | 157 / 12.34
Total | 632 / 49.69 | 315 / 24.76 | 312 / 24.53 | 13 / 1.02 | 1272
Tests
N | DF | -LogLike | RSquare (U)
1272 | 9 | 5.4280272 | 0.0039
Test | ChiSquare | Prob>ChiSq
Likelihood Ratio | 10.856 | 0.2857
Pearson | 11.142 | 0.2661
The above analysis presents a two-way cross-tabulation between the frequency of visits to Wendy's (Q3A_39) and recoded income. The JMP output shows that the most common frequency of visits is "Within the past 4 weeks", reported by 49.69% of respondents. Respondents earning below $50,000 and those earning $50,000 but under $100,000 make up the bulk of this group, accounting for 20.75% and 19.58% of the total sample respectively. Nonetheless, the association between the two variables is not significant at α = 0.05, since the Pearson chi-square p-value is 0.2661, which is greater than 0.05. As a result, the null hypothesis of no association cannot be rejected, and we conclude that there is no significant relationship between visit frequency and income.
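As a cross-check, both test statistics can be recomputed from the cell counts with SciPy's `chi2_contingency`. This is a sketch that assumes the counts transcribed below match the table; passing `lambda_="log-likelihood"` selects the likelihood-ratio (G) statistic instead of Pearson's. Under that assumption the results should agree with JMP's Pearson 11.142 (p = 0.2661) and likelihood ratio 10.856 (p = 0.2857) to rounding.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from the table above. Rows: Below $50,000; $50,000 but under $100,000;
# $100,000 but under $150,000; $150,000 or more. Columns: within the past 4 weeks;
# more than 4 weeks to within 3 months; more than 3 months ago; never.
observed = np.array([
    [264, 150, 140, 6],
    [249, 103, 107, 4],
    [48, 25, 17, 2],
    [71, 37, 48, 1],
])

pearson, p_pearson, dof, expected = chi2_contingency(observed, correction=False)
g_stat, p_g, _, _ = chi2_contingency(observed, correction=False, lambda_="log-likelihood")

print(f"Pearson = {pearson:.3f}, p = {p_pearson:.4f}, DF = {dof}")
print(f"Likelihood ratio = {g_stat:.3f}, p = {p_g:.4f}")
```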
Contingency Analysis of Frequency of Visits (Q3A_39) By Employment Status (D7)
[Figure: Mosaic Plot]
Contingency Table
Employment Status (D7) By Frequency of Visits (Q3A_39)
Cell entries: Count / Total % / Col % / Row %
Employment Status | Within the past 4 weeks | More than 4 weeks to within the past 3 months | More than 3 months ago | Never | Total
Full-time | 401 / 31.53 / 63.45 / 53.83 | 182 / 14.31 / 57.78 / 24.43 | 157 / 12.34 / 50.32 / 21.07 | 5 / 0.39 / 38.46 / 0.67 | 745 / 58.57
Part-time | 52 / 4.09 / 8.23 / 42.28 | 39 / 3.07 / 12.38 / 31.71 | 29 / 2.28 / 9.29 / 23.58 | 3 / 0.24 / 23.08 / 2.44 | 123 / 9.67
Retired | 4 / 0.31 / 0.63 / 44.44 | 3 / 0.24 / 0.95 / 33.33 | 2 / 0.16 / 0.64 / 22.22 | 0 / 0.00 / 0.00 / 0.00 | 9 / 0.71
Student | 97 / 7.63 / 15.35 / 44.09 | 53 / 4.17 / 16.83 / 24.09 | 68 / 5.35 / 21.79 / 30.91 | 2 / 0.16 / 15.38 / 0.91 | 220 / 17.30
Homemaker | 54 / 4.25 / 8.54 / 46.15 | 28 / 2.20 / 8.89 / 23.93 | 34 / 2.67 / 10.90 / 29.06 | 1 / 0.08 / 7.69 / 0.85 | 117 / 9.20
Unemployed | 21 / 1.65 / 3.32 / 46.67 | 7 / 0.55 / 2.22 / 15.56 | 15 / 1.18 / 4.81 / 33.33 | 2 / 0.16 / 15.38 / 4.44 | 45 / 3.54
[label missing] | 3 / 0.24 / 0.47 / 23.08 | 3 / 0.24 / 0.95 / 23.08 | 7 / 0.55 / 2.24 / 53.85 | 0 / 0.00 / 0.00 / 0.00 | 13 / 1.02
Total | 632 / 49.69 | 315 / 24.76 | 312 / 24.53 | 13 / 1.02 | 1272
Tests
N | DF | -LogLike | RSquare (U)
1272 | 18 | 15.683112 | 0.0114
Test | ChiSquare | Prob>ChiSq
Likelihood Ratio | 31.366 | 0.0261*
Pearson | 34.971 | 0.0095*
Warning: 20% of cells have expected count less than 5, ChiSquare suspect.
The above analysis is similar to the previous one in that most respondents had visited Wendy's within the past 4 weeks. It also reveals that those employed full-time form the largest group of respondents (58.57%) and have the highest share of visits within the past 4 weeks of any group (53.83%), followed by students, who form the second largest group (17.30%). The associated Pearson p-value is 0.0095, which is lower than 0.05, so the relationship between the two variables is significant at α = 0.05. As a result, we can conclude that there is a significant relationship between visit frequency and employment status, although JMP's warning that at least 20% of the cells have expected counts below 5 means this result should be interpreted with some caution.
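The Pearson statistic for the employment-status table can be recomputed from the cell counts with SciPy's `chi2_contingency`, and the expected counts it returns show how many cells fall below 5, which is the condition behind JMP's warning. This sketch assumes the transcribed counts below are correct, with rows in the same order as the table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from the employment-status table above, rows in table order
# (Full-time first, the unlabelled row last). Columns: within the past 4 weeks;
# more than 4 weeks to within 3 months; more than 3 months ago; never.
observed = np.array([
    [401, 182, 157, 5],
    [52, 39, 29, 3],
    [4, 3, 2, 0],
    [97, 53, 68, 2],
    [54, 28, 34, 1],
    [21, 7, 15, 2],
    [3, 3, 7, 0],
])

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
n_sparse = int((expected < 5).sum())  # cells with expected count below 5

print(f"Pearson chi-square = {stat:.3f}, DF = {dof}, p = {p_value:.4f}")
print(f"cells with expected count < 5: {n_sparse} of {expected.size}")
```

Here 11 of the 28 cells (about 39%) have expected counts below 5, mostly in the Never column and the two smallest rows, which is why the chi-square result is flagged as suspect.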
To sum up, the findings suggest that the dependent variable, frequency of visits to Wendy's (Q3A_39), is related to employment status (D7) but not to recoded income. The most regular visitors are mainly those employed full-time, who form the largest group (58.57%) of the sample, followed by students, who account for 17.30%. Viewed in this light, Wendy's may wish to channel the bulk of its marketing efforts towards appealing to those in full-time employment. This could take the form of promotional campaigns such as differential pricing based on time of purchase, bundle discounts or loyalty programmes that create value for this group of consumers. Similar marketing attention should also be extended to part-timers: while they constitute only 9.67% of the sample, their requirements closely match those of full-time employees. To increase spending by students, who form the second largest group of restaurant goers, Wendy's may wish to use price segmentation strategies such as student pricing or promotional buddy meals to capture a greater share of the student market. Finally, the restaurant should also revisit its menu, with a focus on healthier food offerings, and tailor its marketing and communication strategy to appeal to the other segments.
Question 2
The objective is to examine the relationship between frequency of visits to Wendy's and level of education (D5) via cross-tabulation, and then to introduce recoded income into the analysis as a control variable. First, the relationship between frequency of visits (Q8_39) and level of education (D5) is tested; the results are shown below.
Contingency Analysis of Frequency of Visits (Q8_39) By Level of Education (D5)
[Figure: Mosaic Plot]
Contingency Table
Level of Education (D5) By Frequency of Visits (Q8_39)
Cell entries: Count / Total % / Col % / Row %
Level of Education | More Often | About the Same | Less Often | Total
Some high school or less | 7 / 0.74 / 2.51 / 43.75 | 9 / 0.95 / 1.71 / 56.25 | 0 / 0.00 / 0.00 / 0.00 | 16 / 1.69
[label missing] | 39 / 4.12 / 13.98 / 27.66 | 75 / 7.92 / 14.26 / 53.19 | 27 / 2.85 / 19.01 / 19.15 | 141 / 14.89
Some college | 111 / 11.72 / 39.78 / 29.84 | 206 / 21.75 / 39.16 / 55.38 | 55 / 5.81 / 38.73 / 14.78 | 372 / 39.28
Completed college | 85 / 8.98 / 30.47 / 26.90 | 181 / 19.11 / 34.41 / 57.28 | 50 / 5.28 / 35.21 / 15.82 | 316 / 33.37
Post graduate | 37 / 3.91 / 13.26 / 36.63 | 55 / 5.81 / 10.46 / 54.46 | 9 / 0.95 / 6.34 / 8.91 | 101 / 10.67
6 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 1 / 0.11 / 0.70 / 100.00 | 1 / 0.11
Total | 279 / 29.46 | 526 / 55.54 | 142 / 14.99 | 947
Tests
N | DF | -LogLike | RSquare (U)
947 | 10 | 8.4984320 | 0.0092
Test | ChiSquare | Prob>ChiSq
Likelihood Ratio | 16.997 | 0.0744
Pearson | 16.426 | 0.0881
Warning: 20% of cells have expected count less than 5, ChiSquare suspect.
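The Pearson statistic and the reason for JMP's warning on the education table can be checked from the counts. This sketch assumes the transcribed counts below are correct, with rows in the order of the table; it lists how many cells have an expected count below 5.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from the education table above, rows in table order (the last row is
# the single respondent coded 6). Columns: More Often, About the Same, Less Often.
observed = np.array([
    [7, 9, 0],
    [39, 75, 27],
    [111, 206, 55],
    [85, 181, 50],
    [37, 55, 9],
    [0, 0, 1],
])

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
n_sparse = int((expected < 5).sum())
share_sparse = n_sparse / expected.size

print(f"Pearson chi-square = {stat:.3f}, DF = {dof}, p = {p_value:.4f}")
print(f"cells with expected count < 5: {n_sparse} of {expected.size} ({share_sparse:.0%})")
```

Five of the 18 cells (about 28%) are sparse, all in the first and last rows, so the warning is driven by those two small education groups.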
The contingency table above reveals that the most common frequency of visits is "About the same", reported by 55.54% of respondents. Within this group, respondents with some college education (21.75% of the total sample) and those who completed college (19.11%) form the largest segments. However, the test result also shows that the association between frequency of visits and level of education is not statistically significant at α = 0.05, as the Pearson chi-square p-value is 0.0881, which is greater than 0.05. At this juncture, we conclude that there is no significant relationship between frequency of visits and level of education. However, JMP also warns that at least 20% of the cells have an expected count of less than 5, suggesting that the data are too sparse for a conclusive assessment. In light of this, a third variable, recoded income, is introduced to clarify the initial association. The results of the Cochran-Mantel-Haenszel (CMH) test are shown below.
Cochran-Mantel-Haenszel Tests
Stratified by Recoded Income
CMH Test | ChiSquare | DF | Prob>ChiSq
Correlation of Scores | 0.1597 | 1 | 0.6894
Row Score by Col Categories | 11.3743 | 5 | 0.0444*
Col Score by Row Categories | 0.4417 | 2 | 0.8019
General Assoc. of Categories | 14.7884 | 10 | 0.1400
The Cochran-Mantel-Haenszel tests show that, after stratifying by recoded income, the p-values for both the correlation of scores (0.6894) and the general association of categories (0.1400) are well above 5%, again indicating a lack of association between frequency of visits and level of education once income is controlled for. The only exception is the Row Score by Col Categories statistic (p = 0.0444), which falls just below 0.05. Overall, this largely reinforces the results of the previous test.
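Each CMH p-value is the upper-tail probability of a chi-square distribution with the stated DF, so the four results can be reproduced from the statistics alone. This is a sketch using the values in the CMH table; the names `cmh_tests` and `ALPHA` are illustrative.

```python
from scipy import stats

# CMH statistics and DF as reported in the output above (stratified by recoded income)
cmh_tests = {
    "Correlation of Scores": (0.1597, 1),
    "Row Score by Col Categories": (11.3743, 5),
    "Col Score by Row Categories": (0.4417, 2),
    "General Assoc. of Categories": (14.7884, 10),
}

ALPHA = 0.05
p_values = {}
for name, (statistic, df) in cmh_tests.items():
    p_values[name] = stats.chi2.sf(statistic, df)  # upper-tail chi-square probability
    verdict = "significant" if p_values[name] < ALPHA else "not significant"
    print(f"{name}: p = {p_values[name]:.4f} ({verdict} at alpha = {ALPHA})")
```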
In summary, introducing recoded income as a third variable reinforces the initial finding of no association between frequency of visits and level of education. This lack of association suggests that Wendy's may wish to consider alternative independent variables, such as household age groups or employment status, which may correlate better with the dependent variable of frequency of visits and thus provide greater insight into the relationship.
Question 3
A two-way ANOVA was conducted to find out whether level of education (D5) and employment status (D7) have any effect on Wendy's restaurant rating (Q9_39). Level of education (D5) was set to ordinal because its categories are ordered, while employment status (D7) was left nominal because its codes serve only as labels for identification and classification (Malhotra, 2015).
[Figure: Actual by Predicted Plot]
Summary of Fit
RSquare | 0.041532
RSquare Adj | 0.010141
Root Mean Square Error | 1.703514
Mean of Response | 7.559662
Observations (or Sum Wgts) | 947
Analysis of Variance
Source | DF
Model | 30
Error | 916
C. Total | 946
Effect Tests
Source | DF | Sum of Squares | F Ratio | Prob > F | Note
level of education (D5) | 2 | 5.321 | 0.9169 | 0.4001 | LostDFs
employment status (D7) | 3 | 19.104 | 2.1944 | 0.0872 | LostDFs
level of education (D5)*employment status (D7) | 19 | 88.357881 | 1.6025 | 0.0490* | LostDFs
[Least Squares Means Tables for level of education (D5), employment status (D7) and level of education (D5)*employment status (D7): several cells are NonEstimable]
Measurement of Results
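The sum of squares for the model is approximately 115.18 on 30 DF, as implied by RSquare (0.041532) and the root mean square error (1.703514). The sums of squares for level of education (5.321), employment status (19.104) and the interaction effect (88.358) add up to only about 112.78, because in this unbalanced design, with some empty cells, the partial sums of squares reported in the Effect Tests are not additive.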
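The model sum of squares and F ratio follow directly from the Summary of Fit, since R² = SSM / (SSM + SSE) and SSE = DF(error) × RMSE². This sketch uses only the reported values; small differences from JMP reflect rounding of R² and RMSE.

```python
# Values reported in the Summary of Fit and Analysis of Variance above
r2 = 0.041532
rmse = 1.703514
df_model, df_error = 30, 916

mse = rmse ** 2                   # mean square error
sse = df_error * mse              # error sum of squares
ssm = r2 / (1 - r2) * sse         # since R^2 = SSM / (SSM + SSE)
f_model = (ssm / df_model) / mse  # overall F ratio

# Sum of the three effect sums of squares quoted in the text
effect_ss_total = 5.321 + 19.104 + 88.358

print(f"SSM = {ssm:.2f}, F = {f_model:.3f}, sum of effect SS = {effect_ss_total:.3f}")
```

The recomputed F of about 1.323 matches the reported model test, while the three effect sums of squares total about 112.78, which is below the model sum of squares.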
Significance of Model
The F statistic for the overall model is 1.323, with an associated significance of 0.1160, which is not significant at the 0.05 level. We therefore cannot reject the null hypothesis, and conclude that the model is not statistically significant and that level of education (D5) and employment status (D7) together do not have a significant impact on Wendy's restaurant rating.
For the interaction, the test statistic for the significance of the interactive effects is 1.6025 with a
significance of 0.0490. This is significant at the 0.05 level. This means that the two factors do interact
significantly with each other.
Individually, the test statistic for the significance of the main effects of level of education (D5) is
computed as 0.9169 while employment status (D7) is calculated as 2.1944. Their associated
significances are given as 0.4001 and 0.0872 respectively, which are not significant at the 0.05 level.
With knowledge of this, we can conclude that separately, these two variables do not have any significant
influence on the restaurant rating.
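Each effect test uses F = (SS / DF) / MSE, so the DF behind each reported F ratio can be recovered from its sum of squares, and the significance levels can be reproduced with the F distribution. This is a sketch using the values quoted above; the implied DFs (2, 3 and 19) are derived here, not read from a printed column.

```python
from scipy import stats

mse = 1.703514 ** 2   # error mean square (RMSE squared)
df_error = 916

# (sum of squares, F ratio) for each effect, as reported in the text
effects = {
    "level of education (D5)": (5.321, 0.9169),
    "employment status (D7)": (19.104, 2.1944),
    "interaction": (88.357881, 1.6025),
}

results = {}
for name, (ss, f_ratio) in effects.items():
    # F = (SS / DF) / MSE, so DF = SS / (F * MSE)
    implied_df = round(ss / (f_ratio * mse))
    p_value = stats.f.sf(f_ratio, implied_df, df_error)
    results[name] = (implied_df, p_value)
    print(f"{name}: DF = {implied_df}, p = {p_value:.4f}")
```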
Finally, the last column of the Effect Tests table flags LostDFs for all three effects, meaning that each effect's DF is smaller than its number of parameters (Nparm). Accordingly, not all the parameters associated with each effect are testable (JMP, 2015).
Results Interpretation
The above results show that, on their own, neither level of education (D5) nor employment status (D7) has a direct impact on Wendy's restaurant rating. Nevertheless, the two variables do exert some influence on the rating in combination, as they interact significantly with each other.
To conclude, level of education and employment status yield little insight into Wendy's restaurant rating. Other variables such as Cleanliness, Service Quality, Nutritional Value and Value for Money may have been better choices, as they are widely recognised for providing insight into the strategies and actions needed at different levels to improve a restaurant's rating. In this sense, the restaurant may wish to revisit its choice of variables and consider these alternatives when examining what drives its rating.
Question 4
A multiple regression analysis was conducted to examine the effect on the frequency of eating fast-food
(S3A) in terms of the ratings on the psychographic statements (q14_1, q14_2, q14_3, q14_4, q14_5,
q14_6, and q14_7) and demographic information on household size. The results are as follows.
Response S3A
Summary of Fit
RSquare | 0.01085
RSquare Adj | 0.005358
Root Mean Square Error | 6.559442
Mean of Response | 7.728276
Observations (or Sum Wgts) | 1450
Lack Of Fit
Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F
Lack Of Fit | 1104 | 48881.261 | 44.2765 | 1.1373 | 0.0765
Pure Error | 337 | 13119.599 | 38.9306 | |
Total Error | 1441 | 62000.860 | | |
Parameter Estimates
Term | Estimate | Std Error | t Ratio | p-value | VIF
Intercept | 7.7367977 | 0.484205 | 15.98 | <.0001* | .
q14_1 | -0.007885 | 0.066239 | -0.12 | 0.9053 | 8.5483785
q14_2 | -0.01696 | 0.048754 | -0.35 | 0.7280 | 5.6492182
q14_3 | -0.003541 | 0.031466 | -0.11 | 0.9104 | 2.9799786
q14_4 | 0.0394514 | 0.040536 | 0.97 | 0.3306 | 3.9212688
q14_5 | -0.009271 | 0.003984 | -2.33 | 0.0201* | 1.1686583
q14_6 | 0.0113858 | 0.024474 | 0.47 | 0.6418 | 2.2913446
q14_7 | -0.049935 | 0.02911 | -1.72 | 0.0865 | 2.2000752
Household Size | 0.1612185 | 0.124194 | 1.30 | 0.1945 | 1.163316
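Using the parameter estimates above, the fitted regression equation is

$$\hat{Y} = 7.7367977 - 0.007885\,X_1 - 0.01696\,X_2 - 0.003541\,X_3 + 0.0394514\,X_4 - 0.009271\,X_5 + 0.0113858\,X_6 - 0.049935\,X_7 + 0.1612185\,X_8$$

where Ŷ is the predicted frequency of eating fast-food (S3A), X1 to X7 are the ratings on q14_1 to q14_7, and X8 is household size. The slopes for q14_4, q14_6 and household size are positive, so a 1 unit increase in each is associated with an increase in Ŷ of 0.0394514, 0.0113858 and 0.1612185 respectively, holding all other independent variables constant. The reverse is true for q14_1, q14_2, q14_3, q14_5 and q14_7, whose slopes are negative: a 1 unit increase in each is associated with a decrease in Ŷ of 0.007885, 0.01696, 0.003541, 0.009271 and 0.049935 respectively, holding all other independent variables constant.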
Model Fit
According to the summary of fit above, the model is not a good predictor, as shown by R² = 0.01085 and adjusted R² = 0.005358. R² and adjusted R² indicate the proportion of variability in the frequency of eating fast-food (S3A) that can be accounted for by the independent variables. Based on the adjusted R², which is the more appropriate measure because it adjusts for the sample size and the number of independent variables, the independent variables account for only 0.5358% of the variance in the frequency of eating fast-food. The model is therefore unable to offer a reasonable explanation of the relationship between Y and the Xk.
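Adjusted R² penalises R² for the number of predictors k relative to the sample size n. This sketch reproduces the reported values for the eight-predictor model and for the single-predictor stepwise model; `adjusted_r2` is an illustrative helper name.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


full_model = adjusted_r2(0.01085, n=1450, k=8)      # seven q14 items + household size
stepwise_model = adjusted_r2(0.006675, n=1450, k=1)  # q14_5 only
print(f"full: {full_model:.6f}, stepwise: {stepwise_model:.6f}")
```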
Model Test
The F statistic tests the overall significance of the regression model. From the analysis of variance, the F value is 1.9758, with an associated p-value of 0.0461. This is significant at the 0.05 level since 0.0461 < 0.05. Accordingly, we can reject the null hypothesis of no association and infer that at least one of the independent variables is a significant predictor of the frequency of eating fast-food (S3A).
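The overall F test can be recomputed from R² alone, using F = (R² / k) / ((1 − R²) / (n − k − 1)). This is a sketch; because the reported R² values are rounded, the recomputed F values are approximate (roughly 1.976 for the eight-predictor model and 9.73 for the stepwise model).

```python
from scipy import stats


def overall_f_test(r2, n, k):
    """Overall regression F test computed from R^2, n observations and k predictors."""
    df_model, df_error = k, n - k - 1
    f_ratio = (r2 / df_model) / ((1 - r2) / df_error)
    return f_ratio, stats.f.sf(f_ratio, df_model, df_error)


f_full, p_full = overall_f_test(0.01085, n=1450, k=8)
f_step, p_step = overall_f_test(0.006675, n=1450, k=1)
print(f"full model: F = {f_full:.4f}, p = {p_full:.4f}")
print(f"stepwise model: F = {f_step:.4f}, p = {p_step:.4f}")
```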
Testing of Individual Regression Coefficients
Based on the parameter estimates above, only q14_5, with a p-value of 0.0201, is significant at the α = 0.05 level. In other words, only q14_5 should be retained, as it is useful in explaining the variation in the frequency of eating fast-food (S3A).
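Each coefficient's t ratio is its estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the error DF (1450 − 8 − 1 = 1441 for the eight-predictor model). This sketch checks q14_5 and q14_7 against the parameter estimates table; `coefficient_test` is an illustrative name.

```python
from scipy import stats


def coefficient_test(estimate, std_error, df_error):
    """t ratio and two-sided p-value for a single regression coefficient."""
    t_ratio = estimate / std_error
    p_value = 2 * stats.t.sf(abs(t_ratio), df_error)
    return t_ratio, p_value


# Error DF for the eight-predictor model: n - k - 1 = 1450 - 8 - 1
DF_ERROR = 1441
t_q14_5, p_q14_5 = coefficient_test(-0.009271, 0.003984, DF_ERROR)
t_q14_7, p_q14_7 = coefficient_test(-0.049935, 0.02911, DF_ERROR)
print(f"q14_5: t = {t_q14_5:.2f}, p = {p_q14_5:.4f}")
print(f"q14_7: t = {t_q14_7:.2f}, p = {p_q14_7:.4f}")
```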
Residual Analysis
The residual plot does not show any trend or pattern that would indicate a relationship between the predictors and the residuals. This suggests that the error variance is roughly constant, which supports relying on the regression results.
Multicollinearity Check
Based on the VIF statistics in the parameter estimates above, no variable has a VIF above 10 (the highest is 8.548, for q14_1). Hence, there are no severe inter-correlations among the independent variables.
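A VIF can be read back as the R² obtained by regressing one predictor on all the others, since VIF = 1 / (1 − R_j²). This sketch converts the reported VIFs and applies the rule-of-thumb threshold of 10 used in the text.

```python
# VIFs from the parameter estimates table of the eight-predictor model
vif = {
    "q14_1": 8.5483785, "q14_2": 5.6492182, "q14_3": 2.9799786, "q14_4": 3.9212688,
    "q14_5": 1.1686583, "q14_6": 2.2913446, "q14_7": 2.2000752, "Household Size": 1.163316,
}

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others
r2_j = {name: 1 - 1 / value for name, value in vif.items()}

THRESHOLD = 10  # rule-of-thumb cut-off used in the text
flagged = [name for name, value in vif.items() if value > THRESHOLD]
for name, r2 in sorted(r2_j.items(), key=lambda item: -item[1]):
    print(f"{name}: R_j^2 = {r2:.3f}")
print("VIF above 10:", flagged or "none")
```

Although no VIF exceeds 10, q14_1 (R_j² of about 0.883) and q14_2 (about 0.823) share a large part of their variance with the other items.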
Implications
Based on these preliminary findings, the frequency of eating fast-food appears to be significantly related only to respondents' rating of the amount of fat in the foods their kids eat at fast-food restaurants (q14_5).
Improved Regression Model Using Stepwise Method
From the above analysis, it appears that only q14_5 is useful in explaining the variation in the frequency
of eating fast-food. In the following analysis, stepwise method will be employed to confirm the above
findings since it is designed to select the best subset of the predictors that accounts for most of the
variation in the dependent variable (Malhotra, 2015).
Response S3A
[Figure: Regression Plot]
Summary of Fit
RSquare | 0.006675
RSquare Adj | 0.005989
Root Mean Square Error | 6.557361
Mean of Response | 7.728276
Observations (or Sum Wgts) | 1450
Lack Of Fit
Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F
Lack Of Fit | 4 | 92.960 | 23.2400 | 0.5398 | 0.7065
Pure Error | 1444 | 62169.575 | 43.0537 | |
Total Error | 1448 | 62262.534 | | |
Parameter Estimates
Term | Estimate | Std Error | t Ratio | p-value | VIF
Intercept | 8.1939532 | 0.227904 | 35.95 | <.0001* | .
q14_5 | -0.011493 | 0.003684 | -3.12 | 0.0018* | 1
The improved model, based on the stepwise method, retains only q14_5 to explain the frequency of eating fast-food (S3A). The fitted regression equation is

$$\hat{Y} = 8.1939532 - 0.011493\,X_1$$

where Ŷ is the predicted frequency of eating fast-food (S3A) and X1 is the rating on q14_5. The intercept indicates that Ŷ would be 8.194 when X1 is 0.
Further, as the slope for q14_5 is negative, any increase in X1 results in a corresponding decrease in Ŷ: for every 1 unit increase in q14_5, the predicted frequency of eating fast-food decreases by 0.011493.
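The fitted stepwise equation can be used directly for prediction. This is a minimal sketch using the two reported coefficients; `predict_s3a` is an illustrative name, and the q14_5 scores in the loop are hypothetical values chosen only to show the size of the slope, not values taken from the survey.

```python
# Coefficients from the stepwise parameter estimates above
INTERCEPT = 8.1939532
SLOPE_Q14_5 = -0.011493


def predict_s3a(q14_5_score):
    """Predicted frequency of eating fast-food (S3A) for a given q14_5 rating."""
    return INTERCEPT + SLOPE_Q14_5 * q14_5_score


# Hypothetical q14_5 scores, chosen only to illustrate the slope
for score in (0, 10, 20):
    print(score, round(predict_s3a(score), 6))
```

Each 10-point increase in q14_5 lowers the predicted S3A by about 0.115, which illustrates how small the effect is in practical terms despite being statistically significant.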
Model Fit
According to the summary of fit above, the model is still not a good predictor, as shown by R² = 0.006675 and adjusted R² = 0.005989. While its adjusted R² is slightly better than that of the preceding model, it shows that q14_5 accounts for only 0.5989% of the variance in the frequency of eating fast-food. Hence, the model is unable to offer a reasonable explanation of the relationship between Y and X1.
Model Test
The F statistic tests the overall significance of the regression model. From the analysis of variance, the F value is 9.7306, with an associated p-value of 0.0018. This is significant at the 0.05 level since 0.0018 < 0.05. Accordingly, we can reject the null hypothesis of no association and infer that q14_5 is a significant predictor of the frequency of eating fast-food (S3A).
Testing of Individual Regression Coefficients
Based on the parameter estimates above, q14_5, with a p-value of 0.0018, is significant at the α = 0.05 level. In other words, q14_5 is useful in explaining the variation in the frequency of eating fast-food (S3A).
Residual Analysis
Not applicable
Multicollinearity Check
Not applicable, as the model contains only one predictor (VIF = 1).
Conclusion
In summary, of the 8 independent variables considered, only q14_5, "I consider the amount of fat in the foods my kids eat at fast-food restaurants", has a significant effect on the dependent variable, frequency of eating fast-food (S3A). In this regard, Wendy's may need to offer healthier food options, paying particular attention to the fat content of the food served to kids dining at its restaurants, to address this concern.
Question 5
This report contains our findings and highlights possible managerial implications based on the data from the survey Wendy's conducted to understand its customers.
Of the 1450 respondents surveyed, 1277 favour paying for their food in cash, 140 use credit/debit cards,
while the remaining 33 prefer other modes of payment.
In terms of annual household income, 636 households earn below $50,000, 531 earn $50,000 but under $100,000, 108 earn $100,000 but under $150,000, and the remaining 175 earn $150,000 or more.
Household size does not follow a normal distribution. Nonetheless, its mean (2.93) is estimated precisely, as the 95% confidence interval for the mean (2.854 to 3.008) is narrow.
The two-way cross-tabulation between the frequency of visits to Wendy's and recoded income did not suggest any meaningful association. In relation to employment status, however, those employed full-time form the bulk of patrons (58.57%), followed by students, who form the second largest group (17.30%). As such, management should channel its marketing efforts towards appealing to these two segments.
The two-way cross-tabulation between the frequency of visits to Wendy's and the level of education of respondents, with income introduced as a control variable, yielded weak associations. Consequently, Wendy's should consider alternative independent variables that may correlate better with the dependent variable and provide greater insight into the relationship.
Results from the two-way ANOVA reveal that neither level of education nor employment status on its own has a significant effect on Wendy's restaurant rating, although their interaction is significant; management should therefore consider other variables that may be more appropriate.
The improved model using the stepwise approach suggests that management should concentrate on reducing the fat content of the food served to kids dining at its restaurants.
Overall, the above analysis suggests some valuable insights that management can act upon. Notwithstanding, Wendy's should regularly seek refreshed data to better understand its customers' behaviour and gain a competitive edge over its competitors.
Reference
1. JMP. (2015). The Factor Models. SAS Institute Inc. Retrieved March 31, 2015, from http://www.jmp.com/support/help/The_Factor_Models.shtml
2. Malhotra, N. K. (2015). Marketing Research: An Applied Orientation (6th ed.). Singapore: Pearson Education South Asia Pte Ltd.