Categorical Variable
A categorical variable is one whose measurement scale consists of a set of categories. Categorical variables represent data that can be divided into groups.
Example: race, sex, age group, educational level, smoking status, etc.
Data
Data are measurements of characteristics of an object or subject. There are two broad types of data: qualitative data and quantitative data. For categorical data, cell counts typically arise under multinomial sampling or Poisson sampling.
Measurement Scale
Nominal
Categorical variables whose levels do not have a natural ordering are called nominal. For nominal variables, the order in which the categories are listed is irrelevant to the statistical analysis.
Example: religious affiliation (Catholic, Jewish, Protestant, other), mode of transportation (automobile, bus, subway, bicycle, other), choice of residence (house, apartment, condominium, other), race, gender, and marital status.
Ordinal
Many categorical variables do have ordered levels; such variables are called ordinal. Ordinal variables have clearly ordered categories, but the absolute distances between categories are unknown.
Example: size of automobile (subcompact, compact, mid-size, large), social class (upper, middle, lower), attitude toward legalization of abortion (strongly disapprove, disapprove, approve, strongly approve), appraisal of a company's inventory level (too low, about right, too high), etc.
Interval
An interval variable is one that does have numerical distances between any two levels of the scale.
Example: blood pressure level, functional life length of a television set, length of prison term, income, and age.
Contingency Table
Let X and Y denote two categorical response variables, X having I levels and Y having J levels. When we classify subjects on both variables, there are IJ possible combinations of classifications. The responses (X, Y) of a subject randomly chosen from some population have a probability distribution. We display this distribution in a rectangular table having I rows for the categories of X and J columns for the categories of Y. The cells of the table represent the IJ possible outcomes; their probabilities are \pi_{ij}, where \pi_{ij} denotes the probability that (X, Y) falls in the cell in row i and column j. When the cells contain frequency counts of outcomes, the table is called a contingency table, a term introduced by Karl Pearson (1904). Another name is cross-classification table.
A contingency table having I rows and J columns is referred to as an I-by-J (I \times J) table.
The observed counts are laid out as follows, where n_{i+} denotes the i-th row total and n_{+j} the j-th column total:

              Y = 1     Y = 2     ...     Y = J     Total
  X = 1       n_{11}    n_{12}    ...     n_{1J}    n_{1+}
  X = 2       n_{21}    n_{22}    ...     n_{2J}    n_{2+}
  ...
  X = I       n_{I1}    n_{I2}    ...     n_{IJ}    n_{I+}
  Total       n_{+1}    n_{+2}    ...     n_{+J}    n
Let p_{ij} denote the proportion of the total sample falling in cell (i, j); that is,

p_{ij} = n_{ij} / n, where n = \sum_i \sum_j n_{ij}

is the total sample size. The set \{p_{ij}\} is the sample joint distribution. The marginal distributions are the row totals and column totals obtained by summing the joint proportions; these are denoted by

p_{i+} = n_{i+} / n and p_{+j} = n_{+j} / n,

and they satisfy

\sum_i p_{i+} = \sum_j p_{+j} = \sum_i \sum_j p_{ij} = 1.

Similar notation will be used for population proportions, with the Greek letter \pi in place of p. For instance, population conditional, joint, and marginal probabilities are related by

\pi_{j|i} = \pi_{ij} / \pi_{i+},

and they satisfy \sum_j \pi_{j|i} = 1 for i = 1, 2, ..., I. The following table illustrates the notation for the 2 \times 2 case.
Table: Notation for joint, conditional, and marginal distributions (2 \times 2 case).

             Column 1                 Column 2                 Total
  Row 1      \pi_{11} (\pi_{1|1})     \pi_{12} (\pi_{2|1})     \pi_{1+} (1.0)
  Row 2      \pi_{21} (\pi_{1|2})     \pi_{22} (\pi_{2|2})     \pi_{2+} (1.0)
  Total      \pi_{+1}                 \pi_{+2}                 1.0
Independence
When both variables are response variables, we can describe the association using their joint distribution, the conditional distribution of Y given X, or the conditional distribution of X given Y. The conditional distribution of Y given X is related to the joint distribution by \pi_{j|i} = \pi_{ij} / \pi_{i+}. The variables are statistically independent if all joint probabilities equal the product of their marginal probabilities, that is, if

\pi_{ij} = \pi_{i+} \pi_{+j} for i = 1, 2, ..., I and j = 1, 2, ..., J.   (1)

When X and Y are independent, using equation (1),

\pi_{j|i} = \pi_{ij} / \pi_{i+} = \pi_{i+} \pi_{+j} / \pi_{i+} = \pi_{+j} for i = 1, 2, ..., I,

i.e., each conditional distribution of Y is identical to the marginal distribution of Y. Thus, two variables are independent when the probability of column response j is the same in each row, for j = 1, 2, ..., J. When Y is a response and X is an explanatory variable, the condition

\pi_{j|1} = \pi_{j|2} = ... = \pi_{j|I} for all j

provides a more natural definition of independence.
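As a quick numerical illustration of condition (1), independence can be checked cell by cell: every joint probability must equal the product of its two marginals. The following sketch uses invented joint distributions, not data from the text.

```python
# Check condition (1): pi_ij = pi_{i+} * pi_{+j} for every cell.

def is_independent(joint, tol=1e-12):
    """Return True if every cell equals the product of its marginals."""
    I, J = len(joint), len(joint[0])
    row = [sum(joint[i]) for i in range(I)]                        # pi_{i+}
    col = [sum(joint[i][j] for i in range(I)) for j in range(J)]   # pi_{+j}
    return all(abs(joint[i][j] - row[i] * col[j]) < tol
               for i in range(I) for j in range(J))

# Independent table: rows sum to 0.4 and 0.6, columns to 0.3 and 0.7,
# and each cell is the product of its marginals.
indep = [[0.12, 0.28], [0.18, 0.42]]
# Dependent table: cell (1,1) exceeds the product of its marginals.
dep = [[0.25, 0.15], [0.05, 0.55]]

print(is_independent(indep))  # True
print(is_independent(dep))    # False
```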
We use similar notation for sample distributions, with the letter p in place of \pi. For instance, \{p_{ij}\} denotes the sample joint distribution in a contingency table. The cell frequencies are denoted by \{n_{ij}\}, with n = \sum_i \sum_j n_{ij} the total sample size, so that

p_{ij} = n_{ij} / n and p_{j|i} = p_{ij} / p_{i+} = n_{ij} / n_{i+}, where n_{i+} = n p_{i+} = \sum_j n_{ij}.

Thus the proportion of subjects in row i who made response j is p_{j|i} = n_{ij} / n_{i+}.
For a binary response, \{\pi_{1|i}, \pi_{2|i}\} is the conditional distribution in row i. We can compare two rows, say h and i, using the difference of proportions, \pi_{1|h} - \pi_{1|i}. Comparison on response 2 is equivalent to comparison on response 1, since

\pi_{2|h} - \pi_{2|i} = (1 - \pi_{1|h}) - (1 - \pi_{1|i}) = \pi_{1|i} - \pi_{1|h}.

The difference of proportions falls between -1 and +1. It equals zero when rows h and i have identical conditional distributions. The response Y is statistically independent of the row classification when \pi_{1|h} - \pi_{1|i} = 0 for all pairs of rows h and i.
For I \times J contingency tables, we can compare the conditional probabilities of response j for rows h and i using the difference \pi_{j|h} - \pi_{j|i}. The variables are independent when this difference equals zero for all pairs of rows h and i and all possible responses j; equivalently, when the (I - 1)(J - 1) differences

\pi_{j|i} - \pi_{j|I} = 0 for i = 1, 2, ..., I - 1 and j = 1, 2, ..., J - 1.
When both variables are responses and there is a joint distribution \{\pi_{ij}\}, the comparison of proportions within rows h and i satisfies

\pi_{1|h} - \pi_{1|i} = \pi_{h1}/\pi_{h+} - \pi_{i1}/\pi_{i+},

which for a 2 \times 2 table is \pi_{11}/\pi_{1+} - \pi_{21}/\pi_{2+}. We can also compare columns in terms of the proportion of row-1 responses, using the difference of within-column proportions

P(row 1 | col. 1) - P(row 1 | col. 2) = \pi_{11}/\pi_{+1} - \pi_{12}/\pi_{+2}.

This does not usually give the same value as the difference of within-row proportions.
To test H_0: \pi_{1|1} - \pi_{1|2} = 0, note that under independent binomial sampling within rows, n_{11} \sim \mathrm{Bin}(n_{1+}, \pi_{1|1}) and n_{21} \sim \mathrm{Bin}(n_{2+}, \pi_{1|2}), so that

\mathrm{Var}(p_{1|1}) = \pi_{1|1}\pi_{2|1}/n_{1+} and \mathrm{Var}(p_{1|2}) = \pi_{1|2}\pi_{2|2}/n_{2+}.

For large n, under H_0,

Z = \frac{p_{1|1} - p_{1|2}}{\sqrt{\pi_{1|1}\pi_{2|1}/n_{1+} + \pi_{1|2}\pi_{2|2}/n_{2+}}}

is approximately standard normal, with the unknown probabilities replaced by their sample estimates.
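The Z statistic above can be sketched in a few lines. The counts below are invented for illustration; the variance uses the sample estimates of the conditional proportions, as in the formula above.

```python
# Large-sample Z statistic for H0: pi_{1|1} = pi_{1|2}.
from math import sqrt

def z_diff_proportions(n11, n1p, n21, n2p):
    """Z for comparing the proportion of response 1 in rows 1 and 2.

    n11, n21: response-1 counts in rows 1 and 2
    n1p, n2p: row totals n_{1+} and n_{2+}
    """
    p1 = n11 / n1p                                    # p_{1|1}
    p2 = n21 / n2p                                    # p_{1|2}
    var = p1 * (1 - p1) / n1p + p2 * (1 - p2) / n2p   # estimated variance
    return (p1 - p2) / sqrt(var)

# Hypothetical 2x2 counts: 30/100 in row 1 versus 45/100 in row 2.
z = z_diff_proportions(30, 100, 45, 100)
print(round(z, 3))
```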
Relative Risk
A difference in proportions of fixed size may have greater importance when both proportions are close to 0 or 1 than when they are near the middle of the range. For instance, suppose we compare two drugs in terms of the proportion of subjects who suffer bad side effects. The difference between 0.010 and 0.001 may be more noteworthy than the difference between 0.410 and 0.401. In such cases, the ratio of proportions is also a useful descriptive measure.
For 2 \times 2 tables, the relative risk is the ratio

\frac{\pi_{1|1}}{\pi_{1|2}} = \frac{\pi_{11}/\pi_{1+}}{\pi_{21}/\pi_{2+}} = \frac{\pi_{11}\pi_{2+}}{\pi_{1+}\pi_{21}}.

The ratio can be any non-negative number. A relative risk of 1 corresponds to independence. Comparison on the second response gives a different relative risk,

\frac{\pi_{2|1}}{\pi_{2|2}} = \frac{1 - \pi_{1|1}}{1 - \pi_{1|2}}.

Note: The relative risk and the difference of proportions are affected by the interchange of rows and columns.
Odds Ratio
In a 2 \times 2 contingency table, within row 1 the odds that the response is in column 1 instead of column 2 are defined to be

\Omega_1 = \pi_{1|1} / \pi_{2|1},

and within row 2 the corresponding definition is \Omega_2 = \pi_{1|2} / \pi_{2|2}. In terms of joint probabilities, \Omega_i = \pi_{i1} / \pi_{i2}, i = 1, 2. Each \Omega_i is non-negative, with value greater than 1 when response 1 is more likely than response 2. For example, when \Omega_1 = 4, in the first row response 1 is 4 times as likely as response 2. The within-row conditional distributions are identical, and thus the variables are independent, if and only if \Omega_1 = \Omega_2.
The ratio of the odds, \theta = \Omega_1 / \Omega_2, is called the odds ratio. From the definition of the odds using joint probabilities,

\theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}, \theta \ge 0.

It is also called the cross-product ratio, since it equals the ratio of the products \pi_{11}\pi_{22} and \pi_{12}\pi_{21} of probabilities from diagonally opposite cells. The odds ratio can equal any non-negative number. When all cell probabilities are positive, independence of X and Y is equivalent to \theta = 1. When \theta > 1, subjects in row 1 are more likely to make the first response than are subjects in row 2; that is, \pi_{1|1} > \pi_{1|2}. For instance, when \theta = 4, the odds of the first response are four times higher in row 1 than in row 2. This does not mean that the probability \pi_{1|1} is four times higher than \pi_{1|2}; that is the interpretation of a relative risk of 4.0. When 0 < \theta < 1, the first response is less likely in row 1 than in row 2; that is, \pi_{1|1} < \pi_{1|2}. When one cell has zero probability, \theta equals 0 or \infty.
Under H_0: \theta = 1, we have \log\theta = 0. For large samples, the sample log odds ratio \log\hat\theta has estimated variance

\widehat{\mathrm{Var}}(\log\hat\theta) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}},

so that

Z = \frac{\log\hat\theta - 0}{\sqrt{\widehat{\mathrm{Var}}(\log\hat\theta)}} \sim N(0, 1) under H_0.

Confidence Interval
L.L. = \log\hat\theta - Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\log\hat\theta)}
U.L. = \log\hat\theta + Z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\log\hat\theta)}

Original Units
L.L. = e^{L.L.}, U.L. = e^{U.L.}
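The steps above (odds ratio, variance of the log odds ratio, Wald interval on the log scale, back-transformation) can be sketched as follows. The 2 \times 2 counts used here are illustrative (the same counts appear in a worked example later in these notes).

```python
# Sample odds ratio with a 95% Wald confidence interval built on the
# log scale and transformed back to the original units.
from math import exp, log, sqrt

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    theta = (n11 * n22) / (n12 * n21)             # cross-product ratio
    se = sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)      # SE of log(theta-hat)
    lo = log(theta) - z * se
    hi = log(theta) + z * se
    return theta, exp(lo), exp(hi)                # back to original units

theta, lo, hi = odds_ratio_ci(136, 104, 224, 36)
print(round(theta, 3), round(lo, 3), round(hi, 3))
```

Since the interval excludes 1, these counts would indicate an association between the row and column classifications.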
Properties
- The value of \theta does not change if both cell frequencies within any row are multiplied by a non-zero constant, or if both cell frequencies within any column are multiplied by a non-zero constant.
- Two values of \theta represent the same level of association, but in opposite directions, when one value is the inverse of the other. For instance, when \theta = 0.25, the odds of the first response are 0.25 times as high in row 1 as in row 2, or equivalently 1/0.25 = 4.0 times as high in row 2 as in row 1. If the order of the rows is reversed, or if the order of the columns is reversed, the new value of \theta is simply the inverse of the original value.
- The odds ratio does not change value when the orientation of the table is reversed so that the rows become the columns and the columns become the rows. Therefore, it is unnecessary to identify one classification as the response variable in order to calculate \theta.
- Values of \theta farther from 1.0 in a given direction represent stronger levels of association.
- It is sometimes more convenient to use \log\theta. Independence corresponds to \log\theta = 0. The log odds ratio is symmetric about this value: reversal of rows or of columns results only in a change of its sign. Two values of \log\theta that are the same except for sign, such as \log 4 = 1.39 and \log 0.25 = -1.39, represent the same level of association.
- An implication of the multiplicative invariance property is that the sample odds ratio estimates the same characteristic even when we select disproportionately large or small samples from marginal categories of a variable. For instance, suppose a study investigates the association between vaccination and catching a certain strain of flu. For a retrospective design, the sample odds ratio estimates the same characteristic whether we randomly sample (1) 100 people who got the flu and 100 people who did not, or (2) 150 people who got the flu and 50 people who did not, in each case classifying subjects on whether they took the vaccine. In fact, the odds ratio is equally valid for retrospective, prospective, or cross-sectional sampling designs. We would estimate the same characteristic if (3) we randomly sampled 100 people who took the vaccine and 100 people who did not, and then classified them on whether they got the flu, or (4) we randomly sampled 200 people and classified them on whether they took the vaccine and whether they got the flu.
Since \pi_{2|1} = 1 - \pi_{1|1} and \pi_{2|2} = 1 - \pi_{1|2}, we have

Odds Ratio = \frac{\pi_{1|1}\,\pi_{2|2}}{\pi_{1|2}\,\pi_{2|1}} = \frac{\pi_{1|1}(1 - \pi_{1|2})}{\pi_{1|2}(1 - \pi_{1|1})},

so that

Odds Ratio = Relative Risk \times \frac{1 - \pi_{1|2}}{1 - \pi_{1|1}}.
Their magnitudes are similar whenever the probability of response 1 is close to zero for both groups. When the sampling design is retrospective, it is possible to construct conditional distributions within levels of the fixed response, but it is usually not possible to estimate the probability of the outcome of interest, or to compute the difference of proportions or relative risk for that outcome.
We can compute the odds ratio, however, since it is determined by the conditional distributions in either direction. When the probability of the outcome of interest is very small, the population odds ratio and relative risk take similar values; thus, we can use the sample odds ratio to provide a rough indication of the relative risk.
Concordance and Discordance
Consider a pair of observations, one in cell (i, j) and the other in cell (h, k). The pair is concordant if the observation ranking higher on X also ranks higher on Y, and discordant if the observation ranking higher on X ranks lower on Y. The probabilities of concordance and discordance are

\Pi_c = 2 \sum_i \sum_j \pi_{ij} \left( \sum_{h>i} \sum_{k>j} \pi_{hk} \right)

and

\Pi_d = 2 \sum_i \sum_j \pi_{ij} \left( \sum_{h>i} \sum_{k<j} \pi_{hk} \right).

When \Pi_c - \Pi_d > 0 the variables are positively associated, and when \Pi_c - \Pi_d < 0 they are negatively associated.
Example of Job Satisfaction
We illustrate concordance and discordance using Table 2.4, taken from the 1984 General Social Survey of the National Data Program in the United States, as quoted by Norusis (1988). The variables are income and job satisfaction. Income has levels less than $6000 (denoted <6), between $6000 and $15,000 (6-15), between $15,000 and $25,000 (15-25), and over $25,000 (>25). Job satisfaction has levels very dissatisfied (VD), little dissatisfied (LD), moderately satisfied (MS), and very satisfied (VS). We treat VS as the high end of the job satisfaction scale.
Table: Cross-Classification of Job Satisfaction by Income

                            Job Satisfaction
  Income (US$)        VD      LD      MS      VS
  <6000               20      24      80      82
  6000-15,000         22      38     104     125
  15,000-25,000       13      28      81     113
  >25,000              7      18      54      92
Consider a pair of subjects, one of whom is classified in the cell (<6, VD) and the other in the cell (6-15, LD). This pair is concordant, so these two cells contribute 20 \times 38 = 760 concordant pairs. The 20 subjects in the cell (<6, VD) are also part of a concordant pair when matched with each of the other (104 + 125 + 28 + 81 + 113 + 18 + 54 + 92) subjects ranked higher on both variables. Similarly, the 24 subjects in the cell (<6, LD) are part of concordant pairs when matched with the (104 + 125 + 81 + 113 + 54 + 92) subjects ranked higher on both variables.
The total number of concordant pairs, denoted by C, is

C = 20(38 + 104 + 125 + 28 + 81 + 113 + 18 + 54 + 92) + 24(104 + 125 + 81 + 113 + 54 + 92) + 22(28 + 81 + 113 + 18 + 54 + 92) + 38(81 + 113 + 54 + 92) + ... = 109,520,

and the analogous count of discordant pairs is D = 84,915. In this example, C > D suggests a tendency for low income to occur with low job satisfaction and high income with high job satisfaction.
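The cell-by-cell counting described above can be automated. The sketch below counts concordant and discordant pairs for the job satisfaction table, pairing each cell with every cell in a higher row.

```python
# Count concordant (C) and discordant (D) pairs for an ordinal I x J table
# (rows ordered low -> high on X, columns low -> high on Y).

def concordant_discordant(t):
    I, J = len(t), len(t[0])
    C = D = 0
    for i in range(I):
        for j in range(J):
            for h in range(i + 1, I):          # cells ranked higher on X
                for k in range(J):
                    if k > j:                  # also higher on Y: concordant
                        C += t[i][j] * t[h][k]
                    elif k < j:                # lower on Y: discordant
                        D += t[i][j] * t[h][k]
    return C, D

# Job satisfaction table (income rows, satisfaction columns).
table = [[20, 24, 80, 82],
         [22, 38, 104, 125],
         [13, 28, 81, 113],
         [7, 18, 54, 92]]
C, D = concordant_discordant(table)
print(C, D)   # 109520 84915
```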
Gamma
Given that a pair is untied on both variables, the probability of concordance is \Pi_c/(\Pi_c + \Pi_d) and the probability of discordance is \Pi_d/(\Pi_c + \Pi_d). The difference between these probabilities,

\gamma = \frac{\Pi_c - \Pi_d}{\Pi_c + \Pi_d},

is called gamma. Its range is -1 \le \gamma \le 1. Whereas the absolute value of the correlation is 1 only when the relationship between X and Y is perfectly linear, only monotonicity is required for |\gamma| = 1: \gamma = 1 if \Pi_d = 0, and \gamma = -1 if \Pi_c = 0. The perfect-association value |\gamma| = 1 can occur even when the relationship is not strictly monotone. If \gamma = 1, for instance, then for observations (X_a, Y_a) and (X_b, Y_b) on a pair of subjects a and b having X_a < X_b, it follows that Y_a \le Y_b but not necessarily that Y_a < Y_b. Independence implies \gamma = 0, but the converse is not true.
Yule's Coefficient
For 2 \times 2 tables, we define

Q = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{\pi_{11}\pi_{22} + \pi_{12}\pi_{21}}.

This measure, which Yule (1900, 1912) introduced and called Q in honor of the Belgian statistician Quetelet, is now referred to as Yule's Q. The range of Q is -1 \le Q \le 1.
Since \theta = \pi_{11}\pi_{22}/(\pi_{12}\pi_{21}), dividing the numerator and denominator by \pi_{12}\pi_{21} gives

Q = \frac{\dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} - 1}{\dfrac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} + 1} = \frac{\theta - 1}{\theta + 1}, with -1 \le Q \le 1.

For 2 \times 2 tables gamma equals Q, so gamma is a strictly monotone transformation of \theta from the [0, \infty) scale onto the [-1, +1] scale.
Gamma ignores pairs that are tied on either variable. An alternative standardizes C - D by the numbers of pairs untied on each variable:

Kendall's tau-b = \frac{C - D}{\sqrt{\left(\dfrac{n(n-1)}{2} - T_x\right)\left(\dfrac{n(n-1)}{2} - T_y\right)}},

where

T_x = \sum_i \frac{n_{i+}(n_{i+} - 1)}{2} and T_y = \sum_j \frac{n_{+j}(n_{+j} - 1)}{2}

are the numbers of pairs tied on X and on Y, respectively. This index of ordinal association is called Kendall's tau-b. Tau-b tends to be less sensitive than gamma to the choice of response categories.
A related measure is Somers' d,

d = \frac{C - D}{\dfrac{n(n-1)}{2} - T_x},

where d indicates the difference between the proportions of concordant and discordant pairs, out of those pairs untied on X.
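The three ordinal measures can be computed together for the job satisfaction table; the sketch below reuses the pair-counting idea from the concordance example.

```python
# Gamma, Kendall's tau-b, and Somers' d for an ordinal contingency table.
from math import sqrt

def ordinal_measures(t):
    I, J = len(t), len(t[0])
    C = D = 0
    for i in range(I):
        for j in range(J):
            for h in range(i + 1, I):
                C += t[i][j] * sum(t[h][k] for k in range(j + 1, J))
                D += t[i][j] * sum(t[h][k] for k in range(j))
    n = sum(map(sum, t))
    Tx = sum(sum(r) * (sum(r) - 1) // 2 for r in t)                # ties on X
    cols = [sum(t[i][j] for i in range(I)) for j in range(J)]
    Ty = sum(c * (c - 1) // 2 for c in cols)                       # ties on Y
    pairs = n * (n - 1) // 2
    gamma = (C - D) / (C + D)
    tau_b = (C - D) / sqrt((pairs - Tx) * (pairs - Ty))
    somers_d = (C - D) / (pairs - Tx)
    return gamma, tau_b, somers_d

table = [[20, 24, 80, 82], [22, 38, 104, 125],
         [13, 28, 81, 113], [7, 18, 54, 92]]
g, tb, d = ordinal_measures(table)
print(round(g, 3), round(tb, 3), round(d, 3))
```

As expected, tau-b is smaller in magnitude than gamma here, since it also counts the tied pairs in its denominator.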
Proportional Reduction
The most interpretable indices for nominal variables have the same structure as R-squared (the coefficient of determination) for interval variables. R-squared, and the more general intraclass correlation coefficient and correlation ratio, describe the proportional reduction in variance from the marginal distribution to the conditional distributions of the response.
Let V(Y) denote a measure of variation for the marginal distribution \{\pi_{+1}, ..., \pi_{+J}\} of Y, and let V(Y|i) denote this measure computed for the conditional distribution \{\pi_{1|i}, ..., \pi_{J|i}\} of Y at the i-th setting of an explanatory variable X. A proportional reduction in variation measure has the form

\frac{V(Y) - E[V(Y|X)]}{V(Y)},

where E[V(Y|X)] is the expectation of the conditional variation taken with respect to the distribution of X; when X is discrete, E[V(Y|X)] = \sum_i \pi_{i+} V(Y|i). If the measure equals 1, the association between X and Y is strong, and if it equals 0, the association between X and Y is null.
One choice of variation measure is

V(Y) = \sum_j \pi_{+j}(1 - \pi_{+j}) = 1 - \sum_j \pi_{+j}^2.

This is the probability that two independent observations from the marginal distribution of Y fall in different categories. The variation takes its minimum value of zero when \pi_{+j} = 1 for some j, and its maximum value of (J - 1)/J when \pi_{+j} = 1/J for all j. The corresponding conditional variation in row i is

V(Y|i) = \sum_j \pi_{j|i}(1 - \pi_{j|i}) = 1 - \sum_j \pi_{j|i}^2.
For an I \times J contingency table with joint probabilities \{\pi_{ij}\}, the average conditional variation is

E[V(Y|X)] = \sum_i \pi_{i+} V(Y|i) = \sum_i \pi_{i+}\left(1 - \sum_j \pi_{j|i}^2\right) = 1 - \sum_i \sum_j \frac{\pi_{ij}^2}{\pi_{i+}}.

The proportional reduction in variation is then

\tau = \frac{V(Y) - E[V(Y|X)]}{V(Y)} = \frac{\sum_i \sum_j \pi_{ij}^2/\pi_{i+} - \sum_j \pi_{+j}^2}{1 - \sum_j \pi_{+j}^2},

also called the concentration coefficient. A large value of \tau represents a strong association.
Another choice of variation measure is the entropy, V(Y) = -\sum_j \pi_{+j} \log \pi_{+j}, with conditional version V(Y|i) = -\sum_j \pi_{j|i} \log \pi_{j|i}.
Now,

E[V(Y|X)] = \sum_i \pi_{i+} V(Y|i) = -\sum_i \pi_{i+} \sum_j \pi_{j|i} \log \pi_{j|i} = -\sum_i \sum_j \pi_{ij} \log\frac{\pi_{ij}}{\pi_{i+}}.

The proportional reduction in variation is

U = \frac{V(Y) - E[V(Y|X)]}{V(Y)} = \frac{-\sum_j \pi_{+j}\log\pi_{+j} + \sum_i \sum_j \pi_{ij}\log(\pi_{ij}/\pi_{i+})}{-\sum_j \pi_{+j}\log\pi_{+j}} = \frac{\sum_i \sum_j \pi_{ij} \log\dfrac{\pi_{ij}}{\pi_{i+}\pi_{+j}}}{-\sum_j \pi_{+j}\log\pi_{+j}},

called the uncertainty coefficient. If U = 0, then X and Y are independent; U = 1 indicates that there is no conditional variation, in the sense that for each i, \pi_{j|i} = 1 for some j.
The variation measure used in \tau is called the Gini concentration, and the variation measure used in U is the entropy.
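The uncertainty coefficient can be sketched directly from its final form above; the convention 0 \log 0 = 0 is handled by skipping zero cells. Both test tables are invented illustrations.

```python
# Uncertainty coefficient U from a joint probability table.
from math import log

def uncertainty(joint):
    I, J = len(joint), len(joint[0])
    row = [sum(joint[i]) for i in range(I)]                        # pi_{i+}
    col = [sum(joint[i][j] for i in range(I)) for j in range(J)]   # pi_{+j}
    vy = -sum(p * log(p) for p in col if p > 0)                    # entropy of Y
    num = sum(joint[i][j] * log(joint[i][j] / (row[i] * col[j]))
              for i in range(I) for j in range(J) if joint[i][j] > 0)
    return num / vy

# Perfect dependence: each row concentrates on one column, so U = 1.
print(round(uncertainty([[0.5, 0.0], [0.0, 0.5]]), 6))
# Independence: every cell is the product of its marginals, so U = 0.
print(round(uncertainty([[0.25, 0.25], [0.25, 0.25]]), 6))
```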
A difficulty with these measures is in determining how large a value constitutes a strong association. When the response variable has several possible categorizations, these measures tend to take smaller values as the number of categories increases. For instance, the Gini variation is the probability that two independent observations occur in different categories. Often this probability approaches 1.0 for both the conditional and marginal distributions as the number of response categories grows larger, in which case \tau decreases toward 0.
Lambda Coefficient
Goodman and Kruskal (1954) proposed an alternative measure, lambda, for nominal variables. It uses the variation measures

V(Y) = 1 - \max_j \pi_{+j} and V(Y|i) = 1 - \max_j \pi_{j|i},

so that

\lambda = \frac{V(Y) - E[V(Y|X)]}{V(Y)} = \frac{(1 - \max_j \pi_{+j}) - \sum_i \pi_{i+}(1 - \max_j \pi_{j|i})}{1 - \max_j \pi_{+j}} = \frac{\sum_i \max_j \pi_{ij} - \max_j \pi_{+j}}{1 - \max_j \pi_{+j}}.

Lambda is the proportional reduction in the probability of an error in predicting Y obtained by using the row classification.
Comments
\lambda = 0 occurs when the row classification provides no help in predicting Y (independence implies \lambda = 0, though \lambda = 0 does not imply independence), while \lambda = 1 indicates complete dependency: each row's conditional distribution is concentrated in a single category.
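Lambda's final closed form above is a one-liner from the joint table. The two test tables are invented: one with perfect prediction, one where the rows carry no predictive information.

```python
# Goodman and Kruskal's lambda from a joint probability table, using
# lambda = (sum_i max_j pi_ij - max_j pi_{+j}) / (1 - max_j pi_{+j}).

def gk_lambda(joint):
    I, J = len(joint), len(joint[0])
    col = [sum(joint[i][j] for i in range(I)) for j in range(J)]   # pi_{+j}
    num = sum(max(row) for row in joint) - max(col)
    return num / (1 - max(col))

# Perfect prediction: knowing the row pins down the column.
print(round(gk_lambda([[0.5, 0.0], [0.0, 0.5]]), 6))
# No predictive help: both rows favor the same column.
print(round(gk_lambda([[0.3, 0.2], [0.3, 0.2]]), 6))
```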
Example 2.10
Describe the association in Table 2.11, based on a survey conducted in 1965 of a probability sample of high school seniors and their parents.
Table 2.11: Party Identification of Students and Parents

                                Student Party Identification
  Parent Party Identification   Democrat   Independent   Republican   Total
  Democrat                         604         245            67        916
  Independent                      130         235            76        441
  Republican                        63         180           252        495
  Total                            797         660           395       1852
The marginal distribution of student party identification is

p_{+1} = 797/1852, p_{+2} = 660/1852, p_{+3} = 395/1852,

so the marginal Gini variation is

V(Y) = p_{+1}(1 - p_{+1}) + p_{+2}(1 - p_{+2}) + p_{+3}(1 - p_{+3}) = 0.2451 + 0.2294 + 0.1678 = 0.6423.

The conditional variations within the parent categories are

V(Y | X = D) = \frac{604}{916}\left(1 - \frac{604}{916}\right) + \frac{245}{916}\left(1 - \frac{245}{916}\right) + \frac{67}{916}\left(1 - \frac{67}{916}\right) = 0.2246 + 0.1959 + 0.0678 = 0.4883,

V(Y | X = I) = \frac{130}{441}\left(1 - \frac{130}{441}\right) + \frac{235}{441}\left(1 - \frac{235}{441}\right) + \frac{76}{441}\left(1 - \frac{76}{441}\right) = 0.2079 + 0.2489 + 0.1426 = 0.5994,

V(Y | X = R) = \frac{63}{495}\left(1 - \frac{63}{495}\right) + \frac{180}{495}\left(1 - \frac{180}{495}\right) + \frac{252}{495}\left(1 - \frac{252}{495}\right) = 0.1111 + 0.2314 + 0.2499 = 0.5924.

Weighting by the row proportions p_{i+} gives

E[V(Y|X)] = \frac{916}{1852}(0.4883) + \frac{441}{1852}(0.5994) + \frac{495}{1852}(0.5924) = 0.5426,

so that

\tau = \frac{V(Y) - E[V(Y|X)]}{V(Y)} = \frac{0.6423 - 0.5426}{0.6423} = 0.155.

There is only a modest association between the party identifications of high school seniors and their parents.
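The arithmetic of Example 2.10 can be reproduced from the raw counts; the sketch below computes the concentration coefficient \tau directly from the table.

```python
# Proportional reduction in Gini variation (concentration coefficient tau)
# for the party identification table of Example 2.10.

def concentration_tau(t):
    I, J = len(t), len(t[0])
    n = sum(map(sum, t))
    col = [sum(t[i][j] for i in range(I)) for j in range(J)]
    vy = 1 - sum((c / n) ** 2 for c in col)            # marginal Gini variation
    # E[V(Y|X)] = 1 - sum_ij n_ij^2 / (n * n_{i+})
    ev = 1 - sum(t[i][j] ** 2 / (n * sum(t[i]))
                 for i in range(I) for j in range(J))
    return (vy - ev) / vy

table = [[604, 245, 67], [130, 235, 76], [63, 180, 252]]
print(round(concentration_tau(table), 3))   # 0.155
```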
Example
A sample of 500 respondents was selected in a large metropolitan area to study various questions concerning consumer behavior. The resulting contingency table is given below:

            Enjoys Shopping for Clothing
  Sex          Yes     No
  Male         136     104
  Female       224      36

a) Find the sample joint distribution.
b) Find the conditional distributions of enjoyment given sex, and check whether the two variables are independent.
c) What is the probability that a female was not enjoying shopping for clothing?
d) Compute the relative risk and the odds ratio.
Solution
i) The sample joint distribution is

p_{11} = 136/500 = 0.272, p_{12} = 104/500 = 0.208, p_{21} = 224/500 = 0.448, and p_{22} = 36/500 = 0.072.

ii) The conditional distributions of enjoyment given sex are

p_{1|1} = n_{11}/n_{1+} = 136/240 = 0.567, p_{2|1} = n_{12}/n_{1+} = 104/240 = 0.433,
p_{1|2} = n_{21}/n_{2+} = 224/260 = 0.862, p_{2|2} = n_{22}/n_{2+} = 36/260 = 0.138.

Here n_{1+} = 240, n_{+1} = 360, and n = 500, so

p_{1+} = 240/500 = 0.48 and p_{+1} = 360/500 = 0.72,

and p_{1+} p_{+1} = 0.48 \times 0.72 = 0.346 \ne 0.272 = p_{11}. Hence the independence condition p_{11} = p_{1+} p_{+1} fails, so we may conclude that enjoyment of shopping for clothing depends on sex.

iii) The probability that a female was not enjoying shopping for clothing is

p_{2|2} = 36/260 = 0.138.

iv) The proportion enjoying shopping for clothing is 136/240 = 0.567 for males and 224/260 = 0.862 for females. The sample relative risk is

0.567/0.862 = 0.658,

so the proportion enjoying shopping for clothing was 0.658 times as high for males as for females. The sample odds ratio is

\frac{136 \times 36}{224 \times 104} = 0.210.
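All four parts of the solution follow from the four cell counts, as this short sketch shows.

```python
# Verifying the shopping example: joint and conditional proportions,
# relative risk, and odds ratio for the 2x2 sex-by-enjoyment table.

n11, n12, n21, n22 = 136, 104, 224, 36      # male yes/no, female yes/no
n = n11 + n12 + n21 + n22                   # total sample size, 500

p11 = n11 / n                               # joint proportion, male & yes
p_yes_male = n11 / (n11 + n12)              # p_{1|1} = 136/240
p_yes_female = n21 / (n21 + n22)            # p_{1|2} = 224/260
p_no_female = n22 / (n21 + n22)             # p_{2|2} = 36/260

relative_risk = p_yes_male / p_yes_female   # comparison on "yes"
odds_ratio = (n11 * n22) / (n12 * n21)      # cross-product ratio

print(round(p11, 3), round(p_no_female, 3),
      round(relative_risk, 3), round(odds_ratio, 3))
```

Both the relative risk (0.658) and the odds ratio (0.210) fall below 1, consistent with the conclusion that enjoyment of shopping for clothing is more likely for females.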