Case 2

Case 2: Cross Tabulation
1. For each of the 11 variables, examine its nature (metric, interval, ordinal, nominal).
Variable         Nature
Salary           Metric (continuous)
Size_Family      Nominal
Education        Nominal
Region           Ordinal
Lifestyle        Ordinal
Cars             Nominal
Credit           Ordinal
Stataion_Wagon   Ordinal
Foriegn_Car      Ordinal
Van              Ordinal
Other            Ordinal
Table 1: Nature of Variables

2. Carry out the data cleaning steps, identify the possible outlier or extreme values and remove them if necessary.
The data were inspected: there are no missing values in this data set, and no extreme values were observed for any variable.
Case Processing Summary
                                               Cases
                                               Valid            Missing          Total
                                               N     Percent    N    Percent     N     Percent
Size of the Family                             100   100.0%     0    0.0%        100   100.0%
Years of Education of the head of the family   100   100.0%     0    0.0%        100   100.0%
Area (Northern or Southern)                    100   100.0%     0    0.0%        100   100.0%
Life Style                                     100   100.0%     0    0.0%        100   100.0%
Number of the cars in possession               100   100.0%     0    0.0%        100   100.0%
Buy cars on credit                             100   100.0%     0    0.0%        100   100.0%
Have a station wagon                           100   100.0%     0    0.0%        100   100.0%
Have a foreign economic car?                   100   100.0%     0    0.0%        100   100.0%
Have a Van?                                    100   100.0%     0    0.0%        100   100.0%
Have another type of a car?                    100   100.0%     0    0.0%        100   100.0%
3. Examine the distribution of income. Is income normally distributed? If not, which
transformation would you recommend to make it normal?
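To check whether income is normally distributed, the descriptive statistics (skewness, kurtosis) and the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality were used. A one-sample t-test was also run; note that it tests whether the mean differs from a test value (here 0), not whether the data are normal.
Interpretations
There are 100 respondents. Income has a mean of 43105 and a standard deviation of 37918.172 (Table 2). The 95% confidence interval for the mean runs from 35581.21 to 50628.79, and the one-sample t-test gives t = 11.368 with df = 99 and Sig. = .000, which only shows that mean income is different from zero. The evidence against normality is in Table 3: both the Kolmogorov-Smirnov test (statistic .201) and the Shapiro-Wilk test (statistic .675) have Sig. = .000, well below .05. The distribution is also strongly right-skewed (skewness 3.843) and heavy-tailed (kurtosis 22.614), with the mean well above the median (35400). Income of the family is therefore not normally distributed.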
                       Cases
                       Valid            Missing          Total
                       N     Percent    N    Percent     N     Percent
Income of the family   100   100.0%     0    0.0%        100   100.0%
Table 1: Case Processing Summary
                                                       Statistic        Std. Error
Income of the family   Mean                            43105.00         3791.817
                       95% Confidence    Lower Bound   35581.21
                       Interval for Mean Upper Bound   50628.79
                       5% Trimmed Mean                 38484.44
                       Median                          35400.00
                       Variance                        1437787752.525
                       Std. Deviation                  37918.172
                       Minimum                         0
                       Maximum                         304200
                       Range                           304200
                       Interquartile Range             37350
                       Skewness                        3.843            .241
                       Kurtosis                        22.614           .478
Table 2: Descriptives
                       Kolmogorov-Smirnov            Shapiro-Wilk
                       Statistic   df    Sig.        Statistic   df    Sig.
Income of the family   .201        100   .000        .675        100   .000
Table 3: Tests of Normality
4. After preparing the data, apply the cross tabulation and chi-square test of independence.
To describe the relationship between two categorical variables, we use a special type of table
called a cross-tabulation. This type of table is also known as a Crosstab. In a crosstab, the
categories of one variable determine the rows of the table, and the categories of the other variable
determine the columns.
A cross-tabulation and chi-square test of independence was applied to Area and each of the other variables, with the following hypotheses.
H0: There is no relation between Area and the other variable.
H1: There is a relation between the two variables.
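The chi-square statistic behind each of the tests below compares the observed counts O with the counts expected under H0, E = (row total × column total) / N, summed as Σ(O − E)²/E with (r − 1)(c − 1) degrees of freedom. The sketch below is a minimal pure-Python version, assuming the crosstab is given as a list of rows of counts; the p-value comes from the series for the regularized incomplete gamma function rather than from a statistics library. As an illustration it uses the Have a Van? × Area counts reported in Question 7.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def pearson_chi_square(table):
    """Pearson chi-square test of independence for an r x c table of counts.
    Returns (statistic, degrees of freedom, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Have a Van? (rows 1, 2) by Area (columns 1, 2), counts from Question 7.
van_by_area = [[57, 23], [3, 17]]
stat, df, p = pearson_chi_square(van_by_area)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.6f}")
```

For those counts the statistic is 21.094 with df = 1, far above the .05 critical value of 3.841, which is consistent with the association between having a van and area reported in Question 7.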
a) Area (Northern or Southern) * Income of the family
In the table below, the p-value (.558) is greater than the significance level of .05, so H0 is not rejected. The likelihood ratio test gives .027, but all 178 cells have an expected count below 5, so neither test is reliable for this table.
Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             85.417(a)   88   .558
Likelihood Ratio               115.194     88   .027
Linear-by-Linear Association   .074        1    .786
N of Valid Cases               100
a. 178 cells (100.0%) have expected count less than 5. The minimum expected count is .40.
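The footnote under each chi-square table matters: the chi-square approximation is generally considered unreliable when many cells have expected counts below 5. The sketch below reproduces the footnote's three numbers for any table of counts, assuming the conventional threshold of 5. It is tried on the Size of the Family × foreign economic car counts from Question 8.

```python
def expected_count_summary(table, threshold=5.0):
    """Return (cells below threshold, percent of cells, minimum expected count),
    the three numbers in the SPSS chi-square footnote."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # expected count of each cell under independence: row total * column total / N
    expected = [r * c / n for r in row_totals for c in col_totals]
    below = sum(1 for e in expected if e < threshold)
    return below, 100.0 * below / len(expected), min(expected)

# Size of the Family (rows 2 to 11) by Have a foreign economic car? (codes 1, 2),
# counts from Question 8.
foreign_by_size = [[17, 2], [28, 2], [28, 1], [3, 2], [6, 1],
                   [3, 0], [2, 1], [2, 0], [0, 1], [0, 1]]
cells, percent, minimum = expected_count_summary(foreign_by_size)
print(f"{cells} cells ({percent:.1f}%) have expected count less than 5; "
      f"minimum expected count is {minimum:.2f}")
```

For that table it gives 16 cells (80.0%) and a minimum expected count of .11, matching the SPSS footnote in Question 8.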
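b) Area (Northern or Southern) * Size of the Family
The Pearson chi-square p-value (.882, as listed in the summary table for Question 5) is greater than .05, so H0 is not rejected.
Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Linear-by-Linear Association   .000    1    1.000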
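Several tables in this section report only the linear-by-linear association. It treats the category codes as scores and equals M² = (N − 1)r², where r is the Pearson correlation between the row and column scores over all N cases, with 1 degree of freedom. A minimal sketch, assuming the codes themselves (family sizes, codes 1 and 2) are used as scores:

```python
import math

def linear_by_linear(table, row_scores, col_scores):
    """Linear-by-linear association M^2 = (N - 1) * r^2 and its df = 1 p-value.
    r is the Pearson correlation between row and column scores over all cases."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, row in zip(row_scores, table):
        for y, count in zip(col_scores, row):
            n += count
            sx += count * x
            sy += count * y
            sxx += count * x * x
            syy += count * y * y
            sxy += count * x * y
    cov = sxy - sx * sy / n      # corrected cross-product
    var_x = sxx - sx * sx / n    # corrected sums of squares
    var_y = syy - sy * sy / n
    m2 = (n - 1) * cov * cov / (var_x * var_y)
    return m2, math.erfc(math.sqrt(m2 / 2))  # upper tail of chi-square, df = 1

# Size of the Family (2 to 11) by Have a foreign economic car? (1, 2), Question 8.
foreign_by_size = [[17, 2], [28, 2], [28, 1], [3, 2], [6, 1],
                   [3, 0], [2, 1], [2, 0], [0, 1], [0, 1]]
m2, p = linear_by_linear(foreign_by_size, range(2, 12), [1, 2])
print(f"M^2 = {m2:.3f}, p = {p:.3f}")
```

With the Question 8 counts this returns about 7.011 with p ≈ .008, the values SPSS reports there. For a 2 × 2 table the statistic is simply (N − 1)/N times the Pearson chi-square, which is why 4.209 and 4.167 appear together in part (d).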
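c) Area (Northern or Southern) * Years of Education of the head of the family
The Pearson chi-square p-value (.498, as listed in the summary table for Question 5) is greater than .05, so H0 is not rejected.
Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Linear-by-Linear Association   .764    1    .382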
d) Area (Northern or Southern) * Life Style
In the table below, the Pearson p-value (.040) is smaller than the significance level of .05, so H0 is rejected: life style is related to area. The evidence is borderline, since the continuity-corrected test (.065) and Fisher's exact test (.064) are just above .05.
Chi-Square Tests
                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             4.209(a)   1    .040
Continuity Correction(b)       3.409      1    .065
Fisher's Exact Test                                          .064         .032
Linear-by-Linear Association   4.167      1    .041
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 18.00.
b. Computed only for a 2x2 table
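For 2 × 2 tables SPSS adds the continuity-corrected (Yates) chi-square and Fisher's exact test; in the Life Style table the uncorrected test gives .040 but the corrected one .065. The sketch below computes both from the four cell counts [[a, b], [c, d]]. Assumptions: the Yates statistic uses the closed form N(|ad − bc| − N/2)²/(r1·r2·c1·c2), with the correction not allowed to go below zero, and the two-sided Fisher p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one (the usual definition; SPSS's numerical tolerance may differ).

```python
import math

def yates_chi_square(table):
    """Continuity-corrected chi-square for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)  # never overshoot past zero
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # probability that the top-left cell equals x, given all margins
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_observed = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-7)))

# have a station wagon (rows) by Area (columns), counts from Question 11.
wagon_by_area = [[48, 33], [12, 7]]
print(f"Yates chi-square = {yates_chi_square(wagon_by_area):.3f}")
print(f"Fisher two-sided p = {fisher_exact_two_sided(wagon_by_area):.3f}")
```

The Yates value for the station wagon counts is about .003, the Continuity Correction reported in part (g).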
e) Area (Northern or Southern) * Number of the cars in possession
In the table below, the p-value (.819) is greater than .05, so H0 is not rejected.
Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .400    2    .819
Likelihood Ratio               .402    2    .818
Linear-by-Linear Association   .111    1    .739
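f) Area (Northern or Southern) * Buy cars on credit
The Pearson chi-square p-value (.181, as listed in the summary table for Question 5) is greater than .05, so H0 is not rejected.
Chi-Square Tests
                               Value   df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                            (2-sided)     (2-sided)    (1-sided)
Continuity Correction(b)       1.240   1    .265
Linear-by-Linear Association   1.768   1    .184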
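g) Area (Northern or Southern) * Have a station wagon
The Pearson chi-square p-value (.755, as listed in the summary table for Question 5) is greater than .05, so H0 is not rejected.
Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Continuity Correction(b)       .003    1    .959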
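h) Area (Northern or Southern) * Have a foreign economic car?
The Pearson chi-square p-value (.117, as listed in the summary table for Question 5) is greater than .05, so H0 is not rejected.
Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Linear-by-Linear Association   2.427   1    .119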
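i) Area (Northern or Southern) * Have a Van?
The Pearson chi-square p-value (.000, as listed in the summary table for Question 5) is less than .05, so H0 is rejected: having a van is related to area.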
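j) Area (Northern or Southern) * Have another type of a car?
The Pearson chi-square p-value (.002, as listed in the summary table for Question 5) is less than .05, so H0 is rejected: having another type of car is related to area.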

5. After analyzing the data, try to make a simplified table to present all the results to the managers (professionals).
Variable         Pearson Chi-Square Sig. (Area * variable)
Size_Family      0.882
Education        0.498
lifestyle        0.040
Cars             0.819
credit           0.181
Stataion_Wagon   0.755
Foriegn_Car      0.117
van              0.000
Other            0.002
Salary           0.558
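For managers, the decision at the 5% level can be attached to each value. A small sketch that applies the rule used throughout this case (reject H0 when p < .05) to the values in the summary table; lifestyle is taken as .040, the value reported in part (d).

```python
ALPHA = 0.05  # significance level used throughout the case

# Pearson chi-square Sig. (Area * variable) from the Question 5 table;
# lifestyle uses .040, the value reported in part (d).
p_values = {
    "Size_Family": 0.882, "Education": 0.498, "lifestyle": 0.040,
    "Cars": 0.819, "credit": 0.181, "Stataion_Wagon": 0.755,
    "Foriegn_Car": 0.117, "van": 0.000, "Other": 0.002, "Salary": 0.558,
}

def decision(p, alpha=ALPHA):
    """Plain-language verdict for one test of independence with Area."""
    return "related to Area" if p < alpha else "no evidence of relation"

for variable, p in p_values.items():
    print(f"{variable:<15} {p:.3f}  {decision(p)}")
```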
6. Does the number of cars possessed depend more on the income of the family or the size of the family? Does there exist an interaction between these two factors?
The dependence of the number of cars possessed on income and on size of the family was analyzed by regression analysis.
First, a regression analysis was done between car possession and income of the family, and the following results were observed.
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .245(a)   .060       .050                .477
a. Predictors: (Constant), Income of the family
b. Dependent Variable: Number of the cars in possession

ANOVA(a)
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   1.422            1    1.422         6.254   .014(b)
    Residual     22.288           98   .227
    Total        23.710           99
a. Dependent Variable: Number of the cars in possession
b. Predictors: (Constant), Income of the family

Coefficients(a)
                              95.0% Confidence Interval for B
Model                         Lower Bound   Upper Bound
1   (Constant)                .990          1.277
    Income of the family      .000          .000

Residuals Statistics(a)
                       Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value        1.13      2.10      1.27   .120             100
Residual               -1.095    1.553     .000   .474             100
Std. Predicted Value   -1.137    6.886     .000   1.000            100
Std. Residual          -2.297    3.257     .000   .995             100
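The Model Summary figures follow directly from the ANOVA sums of squares: R² = SS_regression / SS_total, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and F = (SS_regression / k) / (SS_residual / (n − k − 1)) for k predictors and n cases. A small sketch that recomputes them, which puts the income and size regressions of this question on the same footing:

```python
def regression_fit_summary(ss_regression, ss_residual, n, k):
    """R^2, adjusted R^2 and overall F from the ANOVA sums of squares of a
    regression with k predictors fitted to n cases."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r2, adj_r2, f

# ANOVA sums from the two regressions of Number of the cars in possession.
for label, ss_reg, ss_res in [("income", 1.422, 22.288), ("size", 8.161, 15.549)]:
    r2, adj, f = regression_fit_summary(ss_reg, ss_res, n=100, k=1)
    print(f"{label:<7} R2 = {r2:.3f}  adj. R2 = {adj:.3f}  F = {f:.3f}")
```

For income this gives R² ≈ .060 and F ≈ 6.25; for size of the family R² ≈ .344 and F ≈ 51.44, in line with the SPSS tables (small differences come from rounding of the printed sums of squares).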
Next, a regression analysis was done between car possession and size of the family, and the following results were observed.
Variables Entered/Removed(a)
Model   Variables Entered       Variables Removed   Method
1       Size of the Family(b)   .                   Enter
a. Dependent Variable: Number of the cars in possession
b. All requested variables entered.

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .587(a)   .344       .338                .398
a. Predictors: (Constant), Size of the Family
b. Dependent Variable: Number of the cars in possession

ANOVA(a)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   8.161            1    8.161         51.438   .000(b)
    Residual     15.549           98   .159
    Total        23.710           99
b. Predictors: (Constant), Size of the Family

Coefficients(a)
                            95.0% Confidence Interval for B
Model                       Lower Bound   Upper Bound
1   (Constant)              .481          .851
    Size of the Family      .111          .195

Residuals Statistics(a)
                       Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value        .97       2.35      1.27   .287             100
Residual               -.737     1.569     .000   .396             100
Std. Predicted Value   -1.039    3.756     .000   1.000            100
Std. Residual          -1.849    3.940     .000   .995             100
Size of the family explains much more of the variation in the number of cars (R Square = .344, F = 51.438, Sig. = .000) than income of the family does (R Square = .060, F = 6.254, Sig. = .014), so the number of cars possessed depends more on the size of the family. These two separate regressions do not test an interaction between the two factors.
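The interaction part of Question 6 needs a model containing both factors and their product, cars = b0 + b1·income + b2·size + b3·(income × size), followed by a test of b3 (in SPSS, by computing the product as a new variable and entering all three terms in Linear Regression). The sketch below only estimates the coefficients by least squares in pure Python; testing b3 also needs its standard error, which is not computed here. The data are hypothetical, generated with a known interaction so the fit can be checked, and income is rescaled to units of 10,000 to keep the normal equations well conditioned.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (m[i][p] - sum(m[i][j] * b[j] for j in range(i + 1, p))) / m[i][i]
    return b

def interaction_design(income, size):
    """Design matrix rows: intercept, income, size and the income x size product."""
    return [[1.0, x1, x2, x1 * x2] for x1, x2 in zip(income, size)]

# Hypothetical data: income in units of 10,000, family size, and a response
# generated with a known interaction (b3 = 0.05) so the fit can be checked.
income = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5]
size = [2, 3, 4, 5, 3, 2, 5, 4]
cars = [1 + 0.2 * a + 0.1 * s + 0.05 * a * s for a, s in zip(income, size)]
b0, b_income, b_size, b_interaction = ols(interaction_design(income, size), cars)
print(f"interaction coefficient: {b_interaction:.3f}")
```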
7. Does there exist an association between the fact of having a Van and the life style? What happens to this association when the area is considered? Finally, is the possession of a Van related to the area or to the life style?
Does there exist an association between the fact of having a Van and the life style?
To find out the relation between having a van and life style, a cross-tabulation analysis was done, and the following results were observed. The p-value is greater than 0.05, so there is no association between having a van and life style.
Case Processing Summary
                           Cases
                           Valid            Missing          Total
                           N     Percent    N    Percent     N     Percent
Have a Van? * Life Style   100   100.0%     0    0.0%        100   100.0%

Have a Van? * Life Style Crosstabulation
                     Life Style        Total
                     1        2
Have a Van?   1      46       34       80
              2      9        11       20
Total                55       45       100

Chi-Square Tests
Value   df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
What happens to this association when the area is considered?
To find out the relation between having a van and area, a cross-tabulation analysis was done, and the following results were observed. The p-value is less than 0.05, so there is an association between having a van and area.

Case Processing Summary
                                            Cases
                                            Valid            Missing          Total
                                            N     Percent    N    Percent     N     Percent
Have a Van? * Area (Northern or Southern)   100   100.0%     0    0.0%        100   100.0%

Have a Van? * Area (Northern or Southern) Crosstabulation
Count
                     Area (Northern or Southern)   Total
                     1        2
Have a Van?   1      57       23                   80
              2      3        17                   20
Total                60       40                   100

Chi-Square Tests
Value   df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Finally, the possession of a Van is related to the area, not to the life style.
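Question 7 asks what happens to the Van × Life Style association once area is taken into account. The usual way to answer that is a layered crosstab (Area as the layer variable in SPSS Crosstabs), which repeats the Van × Life Style test inside each area separately. A minimal sketch, assuming the raw data are available as hypothetical (area, van, lifestyle) records coded 1 and 2; the case records themselves are not reproduced in this document.

```python
import math
from collections import Counter

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square and df = 1 p-value for the 2 x 2 table [[a, b], [c, d]].
    Raises ZeroDivisionError if any row or column total is zero."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))

def layered_van_by_lifestyle(records):
    """Van x Life Style test repeated within each Area (the SPSS 'layer' idea).
    `records` is a hypothetical list of (area, van, lifestyle) codes, 1 or 2."""
    results = {}
    for area in sorted({area for area, _, _ in records}):
        counts = Counter((van, life) for a, van, life in records if a == area)
        results[area] = chi_square_2x2(counts[(1, 1)], counts[(1, 2)],
                                       counts[(2, 1)], counts[(2, 2)])
    return results

# Hypothetical records, for illustration only (not the case data).
example = ([(1, 1, 1)] * 30 + [(1, 1, 2)] * 20 + [(1, 2, 1)] * 5 + [(1, 2, 2)] * 5
           + [(2, 1, 1)] * 10 + [(2, 1, 2)] * 20 + [(2, 2, 1)] * 4 + [(2, 2, 2)] * 6)
for area, (stat, p) in layered_van_by_lifestyle(example).items():
    print(f"Area {area}: chi-square = {stat:.3f}, p = {p:.3f}")
```

If the association within each area differs clearly from the one in the combined table, area is changing the Van × Life Style relation and should be reported alongside it.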

8. Does the possession of economic foreign cars depend on the size of the family?
The possession of economic foreign cars depends on the size of the family, as shown by the chi-square tests below: the linear-by-linear association (value 7.011, df 1) has Sig. .008, which is less than .05.
Size of the Family * Have a foreign economic car? Crosstabulation
Count
                            Have a foreign economic car?   Total
                            1         2
Size of the Family   2      17        2                    19
                     3      28        2                    30
                     4      28        1                    29
                     5      3         2                    5
                     6      6         1                    7
                     7      3         0                    3
                     8      2         1                    3
                     9      2         0                    2
                     10     0         1                    1
                     11     0         1                    1
Total                       89        11                   100

Chi-Square Tests
                               Value   df   Asymp. Sig. (2-sided)
Linear-by-Linear Association   7.011   1    .008
N of Valid Cases               100
a. 16 cells (80.0%) have expected count less than 5. The minimum expected count is .11.

9. Does the possession of a station wagon have any association with the size of the family? Does this conclusion remain when one adds the effect of the income?
H0: There is no association between having a station wagon and size of family.
H1: There is an association between the variables.
In the table below, the p-value (.000) is less than the significance level of .05, so H0 is rejected and H1 is accepted. Because 14 cells (70.0%) have expected counts below 5, this result should be read with caution.
Chi-Square Tests
Value   df   Asymp. Sig. (2-sided)
a. 14 cells (70.0%) have expected count less than 5. The minimum expected count is .19.
When the effect of income is added, the regression analysis shows that income does not change the association between having a station wagon and size of family.
Descriptive Statistics
                          Mean       Std. Deviation   N
have a station wagon      1.19       .394             100
Size of the Family        3.95       1.877            100
Income of the family      43105.00   37918.172        100

ANOVA(a)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   5.862            2    2.931         29.841   .000(b)
    Residual     9.528            97   .098
    Total        15.390           99
a. Dependent Variable: have a station wagon
b. Predictors: (Constant), Income of the family, Size of the Family
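Whether adding income changes the picture can be checked formally with a nested-model (R² change) F-test, which compares the residual sum of squares of the model with size of the family alone against the model with size and income: F = [(SSE_reduced − SSE_full) / q] / [SSE_full / (n − k − 1)], with q added predictors and k predictors in the full model. Only the full model is reported above (residual sum of squares 9.528, df 97), so the size-only residual sum of squares in the sketch is a hypothetical placeholder. The same test applies in Question 11 when size of the family is added to area.

```python
def r_squared_change_f(sse_reduced, sse_full, n, k_full, q):
    """Nested-model F statistic for adding q predictors; df = (q, n - k_full - 1)."""
    numerator = (sse_reduced - sse_full) / q
    denominator = sse_full / (n - k_full - 1)
    return numerator / denominator

SSE_FULL = 9.528       # size + income model, from the ANOVA table above
SSE_SIZE_ONLY = 9.60   # hypothetical placeholder: not reported in this case
f = r_squared_change_f(SSE_SIZE_ONLY, SSE_FULL, n=100, k_full=2, q=1)
print(f"F change = {f:.3f} on (1, 97) df")
```

An F change below the 5% critical value of F(1, 97), about 3.94, would support the conclusion that income adds nothing beyond family size.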
10. Can the use of a credit for the purchase of a car be explained by the level of education? What happens if one considers simultaneously the effect of the income?
H0: There is no relationship between use of credit and level of education.
H1: There is a relationship between the variables.
The chi-square test gives a p-value greater than .05, so H0 is not rejected: there is no relationship between these two variables.
Chi-Square Tests
Value   df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
When the effect of income is added, the following results were observed. The regression of buying cars on credit on income and years of education is not significant (F = 1.873, Sig. = .159 > .05), so the conclusion remains the same.
ANOVA(a)
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   .781             2    .391          1.873   .159(b)
    Residual     20.219           97   .208
    Total        21.000           99
a. Dependent Variable: buy cars on credit
b. Predictors: (Constant), Income of the family, Years of Education of the head of the family
11. Is the possession of a station wagon affected by the area in which the family lives? Is this conclusion upheld when one adds the size of the family?
A cross-tabulation and chi-square test was applied with the following hypotheses.
H0: There is no relation between Area and having a station wagon.
H1: There is a relation between the variables.
Area (Northern or Southern) * Having a station wagon
The Pearson chi-square p-value (.755, as listed in the summary table for Question 5) is greater than .05, so H0 is not rejected: station wagon possession is not related to area.
Case Processing Summary
                                                     Cases
                                                     Valid            Missing          Total
                                                     N     Percent    N    Percent     N     Percent
have a station wagon * Area (Northern or Southern)   100   100.0%     0    0.0%        100   100.0%

Chi-Square Tests
Value   df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
When the size of the family is added, a regression analysis gives the following results. According to the table below, the possession of a station wagon depends on the area and the size of the family taken together (F = 29.402, Sig. = .000).
ANOVA(a)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   5.809            2    2.904         29.402   .000(b)
    Residual     9.581            97   .099
    Total        15.390           99
a. Dependent Variable: have a station wagon
b. Predictors: (Constant), Area (Northern or Southern), Size of the Family

have a station wagon * Area (Northern or Southern) Crosstabulation
Count
                            Area (Northern or Southern)   Total
                            1        2
have a station wagon   1    48       33                   81
                       2    12       7                    19
Total                       60       40                   100
