
The Chi-Square Test

Relationships between two or more nominal or ordinal variables can be measured by the Chi-Square Test for a one-sample case, the Chi-Square Test for two-sample cases, the tetrachoric correlation, the phi correlation, the rank-biserial correlation, and the point-biserial correlation.

The Chi-Square Test is used to determine whether a significant association exists between two nominal variables. It is of two types: the Chi-Square Test in a One Sample Case and the Chi-Square Test in Two Sample Cases. It is a non-parametric test and applies only to relationships between variables with nominal data. The Chi-Square Test has the general equation shown below:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where:

$\chi^2$ = the Chi-Square value

$f_o$ = the actually observed frequencies

$f_e$ = the expected frequencies

The Chi-Square Test in a One Sample Case

The Chi-Square Test in a One Sample Case is used to determine whether a significant difference exists between an observed frequency distribution and an expected frequency distribution. It is commonly used to compare observed data with the data we would expect to obtain under a specific hypothesis. For example, suppose that, according to Mendel's law, you expected 10 out of 20 offspring from a cross to be male, and the actual observed number was 8 males. You might then want to know about the goodness of fit between the observed and the expected values. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The Chi-Square Test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and the observed results.

For example, the University Student Council conducted a survey asking, "Would you like to have a torch parade during the eve of the University Day Celebration?" In this case, the investigator would like to determine whether or not the observed frequencies differ significantly from the expected frequencies. The responses are tabulated below.

                     Favor    Undecided    Not in Favor    Total
fo                     28         8             15           51
fe                     17        17             17           51
fo - fe                11        -9             -2
(fo - fe)^2           121        81              4
(fo - fe)^2 / fe     7.12      4.76           0.24

Sum of (fo - fe)^2 / fe = 12.12

Calculated χ² Value      Tabular χ² Value
                           .05       .01
      12.12                5.99      9.21

p < .01, significant at .01 alpha; d.f. = k - 1 = 3 - 1 = 2

Since the calculated Chi-Square value of 12.12 is greater than the tabular value of 9.21 at the .01 level of probability, the null hypothesis that students do not differ in their responses is rejected, and the alternative hypothesis that students differ in their responses is accepted. This means that the responses were not evenly divided among the three options; most students (28 of 51) were agreeable to holding a torch parade during the eve of the University Day Celebration.
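
As a minimal sketch of the one-sample computation above, the following Python snippet reproduces the torch-parade example. The function name `chi_square_one_sample` is an illustrative choice, not something from the text, and the equal split of expected frequencies (51 / 3 = 17 per category) is assumed from the tabulated data. The tabular values are copied from the decision table rather than computed.

```python
def chi_square_one_sample(observed, expected=None):
    """Return the sum of (fo - fe)^2 / fe over one row of frequencies.

    If `expected` is omitted, the total is split equally across the
    categories (an assumption that matches the survey example: 17 each).
    """
    if expected is None:
        expected = [sum(observed) / len(observed)] * len(observed)
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Observed responses: Favor, Undecided, Not in Favor
observed = [28, 8, 15]
chi2 = chi_square_one_sample(observed)
df = len(observed) - 1  # d.f. = k - 1

# Tabular values for d.f. = 2, copied from the decision table in the text
tabular_05, tabular_01 = 5.99, 9.21

print(f"chi-square = {chi2:.2f}, d.f. = {df}")
if chi2 > tabular_01:
    print("reject the null hypothesis at .01 alpha")
elif chi2 > tabular_05:
    print("reject the null hypothesis at .05 alpha")
else:
    print("do not reject the null hypothesis")
```

Passing an explicit `expected` list covers cases where the hypothesis does not predict equal frequencies, such as the 10-of-20 offspring example.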

The Chi-Square Test in Two Sample Cases

The Chi-Square Test for two sample cases is used to test for a significant association, or a significant difference, between two variables that each have two or more categories. The data for both variables should be nominal.

Constraints in Using The Chi-Square Test


The statistician or the researcher must be cautious in using the Chi-Square Test. In testing the independence or relatedness of variables using the Chi-Square Test in two sample cases, the following precautions must be considered.

1 When the degree of freedom is equal to one (df = 1) and the contingency table has a cell frequency of less than 5, apply Yates' correction for continuity. (Fisher's Exact Test is a separate procedure that is also used for small 2 x 2 tables.) Yates' correction formula is shown:

$$\chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}$$

2 When there is a zero cell frequency, whether or not the degree of freedom is one or a cell frequency is less than five (5), the standard Chi-Square Test and Yates' correction can no longer be employed; instead, the Kolmogorov-Smirnov Test for one-sample or two-sample cases is used.

The design of a Chi-Square Test in two sample cases depends on the number of categories of each variable. In the example shown below, teachers' annual salary rate has three categories and their educational attainment has three categories, so it is called a 3 x 3 Chi-Square Test in two sample cases.

                                        Teachers' Educational Attainment
Annual Salary Rate                      Bachelor's   Master's   Doctoral   Total
High (PhP 120,000 and above)                 5          40         45        90
Average (PhP 72,000 to PhP 119,999)         10          20         45        75
Low (below PhP 72,000)                      20           8          8        36
Total                                       35          68         98       201

Steps in the process of computing the calculated Chi-Square value:

Step 1. Examine the cell frequencies in the table and determine whether there is a zero cell frequency, whether the degree of freedom is equal to one, or whether any cell frequency is less than five. This identifies the appropriate tool for analyzing the data, based on the constraints on when to use a Chi-Square Test.

Step 2. By inspection of the contingency table, the degree of freedom is greater than one, there is no zero cell frequency, and no cell frequency is less than 5, so the standard Chi-Square formula will be used.

Step 3. Designate the cells with letter symbols: bachelor's degree with high salary rate is cell a, master's degree with high salary rate is cell b, and so on.

Step 4. Compute the expected cell frequency by using the equation below:

$$f_e = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

For example, in cell a the row total is 90, the column total is 35, and the grand total is 201. The expected frequencies for all cells are:

Cell a = (90)(35) / 201 = 15.67
Cell b = (90)(68) / 201 = 30.45
Cell c = (90)(98) / 201 = 43.88
Cell d = (75)(35) / 201 = 13.06
Cell e = (75)(68) / 201 = 25.37
Cell f = (75)(98) / 201 = 36.57
Cell g = (36)(35) / 201 = 6.27
Cell h = (36)(68) / 201 = 12.18
Cell i = (36)(98) / 201 = 17.55
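
The expected-frequency step can be sketched in a few lines of Python. This is only an illustration under the assumption that the contingency table is stored as a list of rows; the variable names are hypothetical and not part of the text.

```python
# Observed frequencies from the salary-by-attainment table.
# Rows: High, Average, Low; columns: Bachelor's, Master's, Doctoral.
observed = [
    [5, 40, 45],
    [10, 20, 45],
    [20, 8, 8],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# fe = (row total)(column total) / grand total, for every cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

labels = "abcdefghi"  # cell letters, read row by row
for i, row in enumerate(expected):
    for j, fe in enumerate(row):
        cell = labels[len(col_totals) * i + j]
        print(f"Cell {cell} = ({row_totals[i]})({col_totals[j]}) / {grand_total} = {fe:.2f}")
```

Deriving the totals from the observed cells, rather than typing them in, also acts as a check that the margins of the table are consistent.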

Step 5. Construct a table with the following columns: the cells, the observed frequencies (fo), the expected frequencies (fe), the difference between the observed and the expected frequencies (fo - fe), the square of that difference (fo - fe)^2, and the squared difference divided by the expected frequency, (fo - fe)^2 / fe.

Step 6. Interpret the results by following the steps in testing a research hypothesis in Chapter 5.

Cells    fo    fe    (fo - fe)    (fo - fe)^2    (fo - fe)^2 / fe
A         5    16      -11           121              7.56
B        40    30       10           100              3.33
C        45    44        1             1              0.02
D        10    13       -3             9              0.69
E        20    25       -5            25              1.00
F        45    37        8            64              1.73
G        20     6       14           196             32.67
H         8    12       -4            16              1.33
I         8    18      -10           100              5.56

Sum of (fo - fe)^2 / fe = 53.89

(The expected frequencies in this table are rounded to whole numbers.)
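
To make the arithmetic in the table easier to check, here is a hedged Python sketch. It assumes the whole-number expected frequencies used in the table and, for comparison, the two-decimal values from Step 4; the function and variable names are illustrative only. Summing the per-cell values after rounding each to two decimals reproduces the 53.89 reported in the text, while summing the unrounded terms gives a value that differs slightly in the second decimal.

```python
def chi_square_terms(observed, expected):
    """Return the per-cell contributions (fo - fe)^2 / fe."""
    return [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]


observed = [5, 40, 45, 10, 20, 45, 20, 8, 8]             # cells A..I
expected_whole = [16, 30, 44, 13, 25, 37, 6, 12, 18]     # as used in the table
expected_2dp = [15.67, 30.45, 43.88, 13.06, 25.37,
                36.57, 6.27, 12.18, 17.55]               # from Step 4

terms = chi_square_terms(observed, expected_whole)

# Sum of the terms after rounding each to two decimals, as in the table
chi2_text = sum(round(t, 2) for t in terms)
# Sum of the unrounded terms, still with whole-number expected frequencies
chi2_whole = sum(terms)
# Sum using the two-decimal expected frequencies from Step 4
chi2_2dp = sum(chi_square_terms(observed, expected_2dp))

print(f"table method:             {chi2_text:.2f}")
print(f"unrounded terms:          {chi2_whole:.2f}")
print(f"two-decimal expected fe:  {chi2_2dp:.2f}")
```

Using the less heavily rounded expected frequencies gives a somewhat smaller value, but it remains far above the tabular values for this table, so the decision is unchanged.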


Using the variables in the given data, follow the analysis by practicing the seven (7) steps in testing a research hypothesis:

1 Statement of the problem: Is there a significant association between the teachers' annual salary rate and their educational attainment? Or, are teachers' salary rates associated with their educational attainment?
2 Statement of the research hypotheses:
Null Hypothesis (Ho): There is no significant association between the teachers' annual salary rate and their educational attainment. Or, salary rates are not associated with educational attainment.
Alternative Hypothesis (H1): There is a significant association between the teachers' annual salary rate and their educational attainment. Or, salary rates are associated with educational attainment.
3 Determine the level of measurement of the variables:
Annual Salary Rate: 1 High (PhP 120,000.00 and above)
2 Average (PhP 72,000.00 to PhP 119,999.00)
3 Low (below PhP 72,000.00)
Educational Attainment: 1 Bachelor's degree
2 Master's degree
3 Doctoral degree
Variable 1 (annual salary rate): the level of measurement is nominal.
Variable 2 (educational attainment): the level of measurement is also nominal.
The statistical tool to be used in analyzing the association is the Chi-Square Test in Two Sample Cases.
4 Set the level of significance: The level of significance will be set at 0.05 alpha under a two-tailed test.
5 Test the hypothesis: Test the research hypothesis by computing the calculated Chi-Square value, which is the sum of (fo - fe)^2 / fe over all cells:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 53.89$$

6 Make a decision on the null hypothesis by using the table below:

Calculated χ² Value      Tabular χ² Value
                           0.05      0.01
      53.89                9.49     13.28

p < .01, significant at .01 alpha; d.f. = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4

The calculated Chi-Square value of 53.89 is greater than the tabular Chi-Square value of 13.28 at the 1% level of probability. The null hypothesis that there is no significant association between the teachers' annual salary rate and their educational attainment is therefore rejected, and the alternative hypothesis is accepted. This means that there is a significant association between the teachers' annual salary rate and their educational attainment.

Another method of making a decision on the null hypothesis is to use a distribution curve with specified areas of acceptance and areas of rejection of the null hypothesis.

For example, if the calculated Chi-Square value is 5.42, this value does not exceed the tabular value of 9.49 at .05 alpha, so it falls in the area of acceptance. The null hypothesis that there is no significant association between the teachers' annual salary rate and their educational attainment is thus accepted. This means that teachers' annual salary is independent of their educational attainment.

However, in the worked example, the calculated value of 53.89 falls in the area of rejection. In this case, the null hypothesis that there is no significant association between the teachers' annual salary rate and their educational attainment is rejected. There is a significant association between the teachers' annual salary rate and their educational attainment; that is, teachers with higher educational attainment tend to have a higher annual salary rate.
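
The decision rule applied in the two examples above can be sketched as a small Python helper. The function name `decide` and its return strings are hypothetical; the tabular values (9.49 and 13.28) are the ones quoted in the text for this 3 x 3 table and are not computed by the code.

```python
def decide(calculated, tabular_05, tabular_01):
    """Compare a calculated value with the tabular values at .05 and .01 alpha."""
    if calculated > tabular_01:
        return "reject Ho: p < .01, significant at .01 alpha"
    if calculated > tabular_05:
        return "reject Ho: p < .05, significant at .05 alpha"
    return "accept Ho: not significant at .05 alpha"


# d.f. = (rows - 1)(columns - 1) for a contingency table
rows, columns = 3, 3
df = (rows - 1) * (columns - 1)

# Tabular values for this table, copied from the text
tabular_05, tabular_01 = 9.49, 13.28

for calculated in (5.42, 53.89):
    print(f"d.f. = {df}, calculated = {calculated}: {decide(calculated, tabular_05, tabular_01)}")
```

The same helper works for any of the decision tables in this chapter once the matching pair of tabular values is supplied.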

7 Interpretation: There are two methods of interpreting whether or not the calculated value is significant: the statistical interpretation and the psychological interpretation. In statistical interpretation, there are two levels to observe: the process of accepting or rejecting the null hypothesis and the process of accepting or rejecting the alternative hypothesis. In psychological interpretation, there are also two levels: the process of surmising what the results or findings mean and the process of supporting your findings. This second level draws on your review of related literature.

For statistical interpretation:

Association between teachers' annual salary rate and their educational attainment. The calculated Chi-Square value of 53.89 is greater than the tabular Chi-Square value of 13.28 at the 1% level of probability. The null hypothesis that there is no significant association between the teachers' annual salary rate and their educational attainment is therefore rejected (χ² = 53.89, p < .01). This means that there is a significant association between the teachers' annual salary rate and their educational attainment.

For psychological interpretation:

The significant association between the teachers' annual salary rate and their educational attainment shows that as the educational attainment of teachers rises from bachelor's degree to master's degree to doctoral degree, their annual salary also rises. As stated by Cruz (2002), the only factor that can improve the socioeconomic status of teachers is to go back to school, obtain a higher degree, and study the advances of information technology in teaching strategy within their own line of specialization, so that they can innovate and contribute to the economic growth and success of the country, educating students and nurturing talents in the service of the country and society.

Differentiating the use of Chi-Square in determining significant differences and significant association between variables. The data shown in the contingency table below will guide you in understanding the use of the Chi-Square Test as a statistical tool for determining both differences and association between variables with nominal data.
Data: The responses of mothers and their sons on preferences in children's clothing yielded the following results:

                          Mother
Son          Favor   Dislikes   Sometimes   Total
Favor          40        6         35         81
Dislikes        6        8          9         23
Sometimes       7       10          7         24
Total          53       24         51        128

Determining the calculated Chi-Square value:

Cells    fo    fe    (fo - fe)    (fo - fe)^2    (fo - fe)^2 / fe
A        40    34       6            36              1.06
B         6    15      -9            81              5.40
C        35    32       3             9              0.28
D         6    10      -4            16              1.60
E         8     4       4            16              4.00
F         9     9       0             0              0.00
G         7    10      -3             9              0.90
H        10     5       5            25              5.00
I         7    10      -3             9              0.90

Sum of (fo - fe)^2 / fe = 19.14

Making a decision on the null hypothesis reveals that:

Calculated χ² Value      Tabular χ² Value
                           0.05      0.01
      19.14                9.49     13.28

p < .01, significant at .01 alpha
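
As a cross-check on this example, the sketch below recomputes the marginal totals and the Chi-Square value in Python. It assumes the observed frequencies listed for cells A to I in the computation table (rows taken as the son's response, columns as the mother's) and the whole-number expected frequencies implied by that table's difference columns; all variable names are illustrative.

```python
# Observed frequencies for cells A..I, read row by row.
# Rows: son (Favor, Dislikes, Sometimes); columns: mother (same order).
observed = [
    [40, 6, 35],
    [6, 8, 9],
    [7, 10, 7],
]

# Whole-number expected frequencies implied by the computation table
expected_text = [
    [34, 15, 32],
    [10, 4, 9],
    [10, 5, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Per-cell (fo - fe)^2 / fe, each rounded to two decimals as in the table
terms = [
    round((fo - fe) ** 2 / fe, 2)
    for fo_row, fe_row in zip(observed, expected_text)
    for fo, fe in zip(fo_row, fe_row)
]
chi2 = sum(terms)

print("row totals:   ", row_totals)
print("column totals:", col_totals)
print(f"chi-square = {chi2:.2f}")
```

Checking that the row and column totals both add up to the grand total of 128 is a quick way to catch a mistyped cell before computing the statistic.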

If the purpose is to determine a significant difference between the responses of mothers and their sons about clothing preferences, the null hypothesis that there is no significant difference between mothers and their sons as to clothing preferences is rejected. This means that the responses of mothers differ significantly from those of their sons. In other words, the mother's clothing preferences do not match her son's: the son does not like the clothing his mother prefers for him.

But if the purpose is to determine a significant association between mothers' and their sons' responses about clothing preferences, the aim is to determine the dependency between the two sets of responses. Since there is a significant difference between the responses of mothers and their sons, this shows that for the mother's clothing preferences to conform to her son's, she first has to ask or consult her son on whether her preferences are also agreeable or acceptable to him. That is to say, the responses depend on one another if mother and son are to agree on the same clothing.

Tetrachoric correlation

The tetrachoric correlation (r) is used when both variables, X and Y, are nominal with artificial dichotomies.

Artificial dichotomies refer to the unnatural or artificially created two categories of a variable, such as length of service or length of teaching experience. Such a variable has no fixed categories. Its categorization depends on the data obtained from a certain group, and it varies from one group to another.

For example, in a study with a group of teachers as subjects, one of the variables to be considered is their length of teaching experience. Length of teaching experience is a nominal variable with artificial dichotomies since the process of categorizing the variable is arbitrary. Suppose the survey shows that the least number of years of teaching experience among the teachers under investigation is 5 and the greatest is 35. The range is 30; dividing it by 2, the desired number of categories, gives 15, so you can categorize the variable into two levels: 1 (short, 15 years and below) and 2 (long, above 15 years). The interval may vary depending on the range between those who are new in the service and those who have been longer in the service.

The other nominal variable with artificial dichotomies is the teaching performance of teachers in state universities and colleges. Teaching performance has no fixed categorization because it also varies from one teacher to another, so its categorization is also arbitrary. In other words, if the categorization of a variable is arbitrary, the variable is nominal but with artificial dichotomies.

For example: Is the length of teaching experience of teachers associated with their teaching performance?


Categorization of variables:

Variable 1: Teaching experience        Variable 2: Teaching performance
1 - Short (15 years and below)         1 - Average and below
2 - Long (above 15 years)              2 - Above average

Data for computing the tetrachoric coefficient of correlation:

                                   Teaching performance
Teaching experience                1 - average and below   2 - above average   Total
1 - short (15 years and below)              20                    40             60
2 - long (above 15 years)                   30                    25             55
Total                                       50                    65            115

Designate cell a = 20, cell b = 40, cell c = 30, and cell d = 25.

Use the given formula in determining the tetrachoric coefficient of correlation:

$$r = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{bc}{ad}}}\right)$$

where: r = the tetrachoric correlation coefficient

bc = the product of the frequencies in cells b and c

ad = the product of the frequencies in cells a and d
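
The formula can be sketched in Python as follows. This is an illustration only: the function name `tetrachoric_cos` is hypothetical, and the square root over bc/ad is an assumption inferred from the worked value of 70.60 degrees in the text. Note that some other presentations of this cosine approximation place ad/bc under the root instead, which reverses the sign of r; the sketch follows the form used here.

```python
import math


def tetrachoric_cos(a, b, c, d):
    """Cosine approximation as written in this text:
    r = cos(180 degrees / (1 + sqrt(bc / ad))).

    Returns (r, angle_in_degrees).
    """
    angle_degrees = 180.0 / (1.0 + math.sqrt((b * c) / (a * d)))
    r = math.cos(math.radians(angle_degrees))  # math.cos expects radians
    return r, angle_degrees


# Cell frequencies from the teaching experience by performance table
r, angle = tetrachoric_cos(a=20, b=40, c=30, d=25)
print(f"angle = {angle:.2f} degrees, r = {r:.2f}")
```

The conversion to radians matters: calling `math.cos` directly on the angle in degrees would give a different and incorrect value.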

Computing the value:

$$r = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{(40)(30)}{(20)(25)}}}\right) = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{1200}{500}}}\right) = \cos 70.60^\circ = 0.33$$

The value r = 0.33 indicates a low correlation: a definite but small relationship. But if the 115 cases are sample data taken randomly, you can test the significance of the computed r value by using the t-test formula given previously:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.33\sqrt{\frac{115 - 2}{1 - (0.33)^2}} = 0.33\,(11.26) = 3.72$$

Subject this value to hypothesis testing: Is the teaching experience of teachers associated with their teaching performance?

Calculated t-value       Tabular t-value
                           0.05      0.01
      3.72                 1.98      2.62

p < .01, significant at .01 alpha

Since the calculated t-value of 3.72 is higher than the tabular value of 2.62 at 0.01 alpha, the null hypothesis that there is no significant relationship between the teaching experience of teachers and their teaching performance is rejected. This means that the researcher is 99% confident that there is a significant relationship between length of teaching experience and teaching performance. The direction of the relationship should be read from the contingency table: 40 of the 60 teachers with short experience (67%) were rated above average, compared with 25 of the 55 teachers with long experience (45%).
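
The significance test on r can be sketched in Python as below. The function name `t_from_r` is an illustrative choice, and the tabular t-values are copied from the text rather than computed.

```python
import math


def t_from_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for a correlation r from n cases."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.33, 115
t_value = t_from_r(r, n)

# Tabular t-values at .05 and .01 alpha, as given in the text
tabular_05, tabular_01 = 1.98, 2.62

print(f"t = {t_value:.2f}")
print("significant at .01 alpha" if abs(t_value) > tabular_01 else
      "significant at .05 alpha" if abs(t_value) > tabular_05 else
      "not significant")
```

The comparison uses the absolute value of t so that the same check applies whether the computed correlation is positive or negative.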

The Chi-Square Test with Yates' correction. In testing the significance of the same data with a non-parametric test, the researcher can use the Chi-Square Test with Yates' correction (Fisher's Exact Test is a separate alternative for 2 x 2 tables). The computation with Yates' correction is shown:

Cells    fo    fe    (fo - fe)    (|fo - fe| - 0.5)^2 / fe
A        20    26      -6              1.16
B        40    34       6              0.89
C        30    24       6              1.26
D        25    31      -6              0.98

$$\chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e} = 4.29$$

Making a decision on the null hypothesis:

Calculated χ² Value      Tabular χ² Value
                           .05       .01
      4.29                 3.84      6.64

p < .05, significant at .05 alpha; df = (c - 1)(r - 1) = (2 - 1)(2 - 1) = 1
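
A minimal Python sketch of the corrected computation follows. It assumes the correction term is 0.5, which is what reproduces the per-cell values 1.16, 0.89, 1.26, and 0.98 in the table, and it uses the whole-number expected frequencies from the text. The function name `chi_square_yates` is hypothetical.

```python
def chi_square_yates(observed, expected):
    """Sum of (|fo - fe| - 0.5)^2 / fe over the cells of a 2 x 2 table."""
    return sum((abs(fo - fe) - 0.5) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [20, 40, 30, 25]        # cells A, B, C, D
expected_text = [26, 34, 24, 31]   # whole-number fe as used in the text

chi2_text = chi_square_yates(observed, expected_text)

# For comparison: expected frequencies computed from the marginal totals
row_totals = [observed[0] + observed[1], observed[2] + observed[3]]
col_totals = [observed[0] + observed[2], observed[1] + observed[3]]
n = sum(observed)
expected_exact = [r * c / n for r in row_totals for c in col_totals]
chi2_exact = chi_square_yates(observed, expected_exact)

print(f"with whole-number fe: {chi2_text:.2f}")
print(f"with unrounded fe:    {chi2_exact:.2f}")
```

Both versions fall between the tabular values of 3.84 and 6.64, so the decision at .05 alpha does not depend on the rounding of the expected frequencies.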

Using non-parametric testing, the decision on the null hypothesis is the same as with parametric testing. The only difference is that the researcher's confidence is lower than with parametric testing. For example, using the tetrachoric coefficient of correlation as a parametric test, the value is significant at 0.01 alpha, meaning the researcher is 99% confident that there is a significant correlation. But using the non-parametric Chi-Square Test, the calculated value is significant only at 0.05 alpha, which gives the researcher a confidence of about 95% that there is a significant correlation. This is the reason why parametric testing is always recommended: it has higher statistical power.
