
RVUC, FACULTY OF HEALTH SCIENCES

BIOSTATISTICS SOLVED PROBLEMS FOR PUBLIC HEALTH STUDENTS



BASIC PROBABILITY AND PROBABILITY DISTRIBUTION
Example 1
To calculate the probability that two independent events A and B both happen, multiply their
probabilities. For example, if you have two identical packs of cards (pack A and pack B), what is the
probability of drawing the ace of spades from both packs?
Solution
Formula: P(A and B) = P(A) x P(B)
P(pack A) = 1 card from a pack of 52 cards = 1/52 = 0.0192
P(pack B) = 1 card from a pack of 52 cards = 1/52 = 0.0192
P(A) x P(B) = 0.0192 x 0.0192 = 0.00037
Example 2
A study investigated the effect of prolonged exposure to bright light on retinal damage in premature
infants. Eighteen of 21 premature infants exposed to bright light developed retinopathy, while 21 of 39
premature infants exposed to reduced light levels developed retinopathy. For this sample, the
probability of developing retinopathy is:
Solution
P(Retinopathy) = No. of infants with retinopathy / Total No. of infants
= (18 + 21)/(21 + 39) = 39/60 = 0.65
Example 3
The following data are the results of electrocardiograms (ECGs) and radionuclide angiocardiograms (RAs)
for 19 patients with post-traumatic myocardial contusions. A + indicates abnormal results and a -
indicates normal results.
1. Calculate the probability that both the ECG and the RA are abnormal.
2. Calculate the probability that either the ECG or the RA is abnormal.

Solutions
1. P(ECG abnormal and RA abnormal) = 7/19 = 0.37
2. P(ECG abnormal or RA abnormal) = P(ECG abnormal) + P(RA abnormal) - P(both ECG and RA
abnormal) = 17/19 + 9/19 - 7/19 = 19/19 = 1
NB: We cannot calculate the above probability by simply adding the number of patients with abnormal
ECGs to the number with abnormal RAs, i.e. (17 + 9)/19 = 1.37, which exceeds 1.
The problem is that the 7 patients whose ECGs and RAs are both abnormal are counted twice.
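The addition rule used above can be checked with exact fractions. This is a minimal sketch using Python's standard library; the function and variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

# Counts quoted in the solution of Example 3 (19 patients in total).
n_total = 19
n_ecg_abnormal = 17
n_ra_abnormal = 9
n_both_abnormal = 7


def prob_union(n_a, n_b, n_both, n):
    """P(A or B) = P(A) + P(B) - P(A and B), computed from counts."""
    return Fraction(n_a, n) + Fraction(n_b, n) - Fraction(n_both, n)


p_either = prob_union(n_ecg_abnormal, n_ra_abnormal, n_both_abnormal, n_total)
# Naive sum that double-counts the patients with both results abnormal.
p_naive = Fraction(n_ecg_abnormal + n_ra_abnormal, n_total)

print("P(ECG or RA abnormal) =", p_either)
print("Naive sum =", float(p_naive))
```

The naive sum comes out larger than 1, which is the signal that the overlap has been counted twice.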

Example 3.1: For the retinopathy data, the conditional probability of retinopathy, given the level of
light exposure, is
P(Retinopathy | exposure to bright light) = No. of infants with retinopathy exposed to bright light /
No. of infants exposed to bright light
= 18/21 = 0.86
P(Retinopathy | exposure to reduced light) = No. of infants with retinopathy exposed to reduced light /
No. of infants exposed to reduced light
= 21/39 = 0.54
The conditional probabilities suggest that premature infants exposed to bright light have a higher risk of
retinopathy than premature infants exposed to reduced light.
Example 4

1. What is the probability that a randomly picked person is male?
2. What is the probability that a randomly picked person has used cocaine more than 100 times?
3. Given that the selected person is male, what is the probability that he has used cocaine more than
100 times?
4. Given that the person has used cocaine fewer than 100 times, what is the probability that the person
is female?
5. What is the probability that a randomly picked person is male and has used cocaine more than 100
times?
Solutions
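1. Pr(m) = Total adult males / Total adult cocaine users = 75/111 = 0.68.
2. Pr(c>100) = Adults who used cocaine more than 100 times / Total adult cocaine users = 34/111 = 0.31.
3. Pr(c>100 | m) = 25/75 = 0.33.
4. Pr(f | c<100) = (7+20)/(111-34) = 27/77 = 0.35, taking 7 + 20 = 27 as the females who used cocaine
fewer than 100 times and 111 - 34 = 77 as all such users. (Dividing by the 36 females instead gives
27/36 = 0.75, which is the reverse conditional probability, Pr(c<100 | f).)
5. Pr(m and c>100) = Pr(m) x Pr(c>100 | m) = 75/111 x 25/75 = 25/111 = 0.23.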

Example 5
In 1932 the Stanford-Binet IQ test was roughly normally distributed with μ = 100 and σ = 15. Over time
IQs have increased (better nutrition, or more experience taking tests?), so the average IQ for
present-day American children taking the 1932 test would be 120, but with the same σ. "Very Superior"
is an IQ above 130.
(a) What % of 1932 children were very superior?
(b) What % of present-day children would be very superior on the 1932 test?
Solution
Let X be the 1932 IQ scores and let Y be the scores of present-day children on the 1932 test.
X ~ N(100, 15) and Y ~ N(120, 15), where the second parameter is the standard deviation.
(a) P(X > 130) = P(Z > (130 - 100)/15) = P(Z > 2.0) = 0.0228 (from the Z table).
2.28% of 1932 children were very superior.
(b) P(Y > 130) = P(Z > (130 - 120)/15) = P(Z > 0.67) = 0.2514.
25.14% of present-day children are very superior.
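The two tail areas can also be computed without a Z table. This is a minimal sketch assuming Python 3.8+ and its `statistics.NormalDist` class; the helper name `upper_tail` is illustrative. Because it does not round Z to 0.67, the second answer comes out slightly different from the table value.

```python
from statistics import NormalDist


def upper_tail(x, mu, sigma):
    """P(X > x) for X normally distributed with mean mu and SD sigma."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


p_1932 = upper_tail(130, mu=100, sigma=15)   # children tested in 1932
p_today = upper_tail(130, mu=120, sigma=15)  # present-day children, 1932 test

print(f"1932:        {100 * p_1932:.2f}%")
print(f"present day: {100 * p_today:.2f}%")
```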
Example 6
Data collected on systolic blood pressure in normal healthy individuals are normally distributed with
μ = 120 and σ = 10 mm Hg.
1) What proportion of normal healthy individuals have a systolic blood pressure above 130 mm Hg?
2) What proportion of normal healthy individuals have a systolic blood pressure between 100 and 140
mm Hg?
3) What level of systolic blood pressure cuts off the lower 95% of normal healthy individuals?
Solutions
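A minimal sketch of how the three quantities can be computed, assuming Python 3.8+ and `statistics.NormalDist`; the variable names are illustrative. Question 3 is read here as asking for the 95th percentile.

```python
from statistics import NormalDist

# Systolic blood pressure model stated in Example 6.
sbp = NormalDist(mu=120, sigma=10)

# 1) P(X > 130), i.e. P(Z > 1)
p_above_130 = 1 - sbp.cdf(130)

# 2) P(100 < X < 140), i.e. P(-2 < Z < 2)
p_100_to_140 = sbp.cdf(140) - sbp.cdf(100)

# 3) Value below which 95% of individuals fall (the 95th percentile)
cutoff_95 = sbp.inv_cdf(0.95)

print(f"P(X > 130)       = {p_above_130:.4f}")
print(f"P(100 < X < 140) = {p_100_to_140:.4f}")
print(f"95% cut-off      = {cutoff_95:.1f} mm Hg")
```

With Z-table values the same steps give roughly 0.16, 0.95 and 120 + 1.645 x 10 = 136.45 mm Hg.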


Example 7
Each child born to a particular set of parents has a probability of 0.25 of having blood type O. If
these parents have 5 children, what is the probability that:
a) Exactly two of them have blood type O
b) At most 2 have blood type O
c) At least 4 have blood type O
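These are binomial probabilities with n = 5 and p = 0.25. A minimal sketch, assuming the children's blood types are independent and using only Python's `math.comb`; the function name is illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.25

p_exactly_2 = binom_pmf(2, n, p)
p_at_most_2 = sum(binom_pmf(k, n, p) for k in range(0, 3))      # k = 0, 1, 2
p_at_least_4 = sum(binom_pmf(k, n, p) for k in range(4, n + 1))  # k = 4, 5

print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X >= 4) = {p_at_least_4:.4f}")
```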
ESTIMATION
Example 1
A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular
muscle in a certain group of individuals. He assumes that strength scores are approximately normally
distributed with a variance of 144. A sample of 15 subjects who participated in the experiment yielded a
mean of 84.3.

Solution: Given: α = 0.01, $Z_{\alpha/2} = Z_{0.005} = 2.58$, $\bar{x} = 84.3$, n = 15, $\sigma = \sqrt{144} = 12$
So, the 99% CI for μ is $84.3 \pm 2.58\,(12/\sqrt{15}) = 84.3 \pm 8.0 = (76.3,\ 92.3)$
We are 99% confident that the population mean is between 76.3 and 92.3.
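The interval can be reproduced with a short function. This is a minimal sketch assuming a known population standard deviation and Python's `statistics.NormalDist` for the critical value; `z_interval` is an illustrative name. It uses the unrounded critical value (about 2.576 rather than 2.58), so the limits agree with the hand calculation to one decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population SD is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = z_interval(sample_mean=84.3, sigma=12, n=15, confidence=0.99)
print(f"99% CI: ({low:.1f}, {high:.1f})")
```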
Exercise 2
A study of hypoxemia during the immediate postoperative period reported the fractions of ideal weight
for 11 patients who became severely hypoxemic during transfer to the recovery room. The mean is 1.51
and the standard deviation is 0.33. Estimate the 95% C.I. for the population mean fraction of ideal
weight, where the population consists of hypoxemic patients similar to those in the study (the data are
normally distributed; use α = 0.05).
Solutions
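$t_{\alpha/2,\,n-1} = t_{0.025,\,10} = 2.2281$
$\bar{x} \pm t_{\alpha/2,\,n-1}\,(S/\sqrt{n}) = 1.51 \pm 2.2281\,(0.33/\sqrt{11}) \approx 1.51 \pm 0.22$

We are 95% sure that the population mean lies between 1.289 and 1.731.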
Example 3




Example 4
The serum progesterone levels for 29 women with ectopic pregnancies and 20 women with early
intrauterine pregnancies are obtained. The data are normally distributed, with mean 5.6 and standard
deviation 3.6 for the women with ectopic pregnancies, and mean 30.9 and standard deviation 6.9 for the
women with early intrauterine pregnancies. Calculate a 95% confidence interval for μ1 - μ2.
Solutions
Example: A study of gonadal dysfunction in diabetic men was conducted. For the first group, men with
primary organic impotence, a sample of 11 subjects was selected, with mean total testosterone value
$\bar{x}_1 = 524.0$ and $S_1 = 135.8$. For the second group, men with primary psychogenic impotence,
the sample is 7 men with mean total testosterone value $\bar{x}_2 = 701.1$ and $S_2 = 154.4$. Assume
that the data are normally distributed and calculate a 99% CI for $\mu_1 - \mu_2$.

Solution: $\frac{S_1^2}{S_2^2} = \frac{(135.8)^2}{(154.4)^2} = 0.77 \approx 0.8$, and $0.5 < 0.8 < 2$


We assume that the population variances are equal.

NB. If the ratio of the sample variances is not within the range 0.5 to 2, we can't use the pooled
variance procedure, because the population variances cannot be assumed equal. Instead, use the
separate variance procedure given in the following section.
$S_p = \sqrt{\frac{10(18441.6) + 6(23839)}{16}} = 143.1$

$\alpha = 0.01$, $t_{\alpha/2} = t_{0.005,\,16} = 2.921$

$(524.0 - 701.1) \pm 2.921\sqrt{\frac{(143.1)^2}{11} + \frac{(143.1)^2}{7}} = (-379.2,\ 25.0)$

We are 99% sure that μ1 - μ2 is between -379.2 and 25.0. Because the interval includes zero, the two
population means could be equal, so there is no significant difference.
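The pooled-variance interval can be sketched in a few lines. This assumes equal population variances, as in the solution, and takes the critical value 2.921 from the t table because Python's standard library has no t distribution; the function names are illustrative.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent samples."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 using the pooled SD and a tabulated t value."""
    sp = pooled_sd(s1, n1, s2, n2)
    half_width = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


T_CRIT_99_DF16 = 2.921  # t(0.005, 16), read from a t table

sp = pooled_sd(135.8, 11, 154.4, 7)
low, high = pooled_t_interval(524.0, 135.8, 11, 701.1, 154.4, 7, T_CRIT_99_DF16)
print(f"Sp = {sp:.1f}, 99% CI = ({low:.1f}, {high:.1f})")
```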


Example 5

Example 6
Two hundred patients suffering from a certain disease were randomly divided into two equal groups. Of
the first group, who received the standard treatment, 78 recovered within three days. Out of the other
100, who were treated by a new method, 90 recovered within three days. The physician wished to
estimate the true difference in the proportions who would recover within three days.
The estimate of the difference in the population proportions is: p1 - p2 = 0.78 - 0.90 = -0.12
The 95% C.I. is
$(p_1 - p_2) \pm 1.96\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} = -0.12 \pm 1.96\sqrt{\frac{(0.78)(0.22)}{100} + \frac{(0.90)(0.10)}{100}} = -0.12 \pm 0.10$
We are 95% sure that the difference is between -0.22 and -0.02. Note that the negative signs merely
reflect the fact that better results were obtained by using the new treatment.
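A minimal sketch of the interval for a difference between two independent proportions, assuming the usual large-sample normal approximation and Python's `statistics.NormalDist`; `diff_proportion_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Standard treatment: 78 of 100 recovered; new method: 90 of 100 recovered.
low, high = diff_proportion_interval(78, 100, 90, 100)
print(f"95% CI for p1 - p2: ({low:.2f}, {high:.2f})")
```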
Example 7
A study on the effect of low-calorie intake on abnormal pulmonary physiology in patients with chronic
hypercapneic respiratory failure.

We can be 90% sure that the interval from 8 to 19 mm Hg contains the actual mean increase in arterial
oxygen tension for patients after a weight reduction program.

Example: In a study of dental health practice, 123 of the 300 adults interviewed said that they
regularly had a dental check-up twice a year. What is the 95% CI for the population proportion, P,
from which the sample is drawn?

p = 123/300 = 0.41, a point estimator of P; α = 0.05, $Z_{\alpha/2} = Z_{0.025} = 1.96$

The 95% CI for P is therefore $0.41 \pm 1.96\sqrt{\frac{(0.41)(0.59)}{300}} = 0.41 \pm 0.056 = (0.35,\ 0.47)$.
We are 95% confident that the population proportion is between 35% and 47%.
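The single-proportion interval follows the same pattern. A minimal sketch, assuming the large-sample normal approximation and Python's `statistics.NormalDist`; `proportion_interval` is an illustrative name. The unrounded limits are about 0.354 and 0.466.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) CI for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_interval(123, 300)
print(f"95% CI for P: ({low:.3f}, {high:.3f})")
```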

SAMPLE SIZE DETERMINATION
Example 1

A) p = 0.26, d = 0.03, Z = 1.96 (i.e., for a 95% C.I.)
Example

Example 2
What sample size do we need to estimate the prevalence of TB among residents of a town such that the
error of estimation is within 5% of its actual parameter with 95% confidence?
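A minimal sketch of the usual single-proportion sample size formula, n = Z² p(1 - p)/d², rounded up. Two assumptions are made here that the question does not state: the 5% error is treated as an absolute margin d = 0.05, and p = 0.5 is used because no prior prevalence estimate is given (this choice gives the largest n). The function name is illustrative.

```python
from math import ceil


def sample_size_proportion(p, d, z=1.96):
    """Smallest n with margin of error d for a proportion near p."""
    return ceil(z**2 * p * (1 - p) / d**2)


# TB prevalence: no prior estimate, so p = 0.5 is assumed (most conservative).
n_tb = sample_size_proportion(p=0.5, d=0.05)

# Values listed under Example 1, part A: p = 0.26, d = 0.03, Z = 1.96.
n_part_a = sample_size_proportion(p=0.26, d=0.03)

print("TB prevalence survey:", n_tb)
print("Example 1, part A:   ", n_part_a)
```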

HYPOTHESIS TESTING
Example 1
Researchers are interested in the mean level of some enzyme in a certain population. They want to
know whether they can conclude that the mean enzyme level in this population is different from 25.
Solution
Step 1: State the hypotheses: $H_0: \mu = 25$; $H_A: \mu \neq 25$
Step 2: They collect a sample of size 10 from a normally distributed population with a known
variance, $\sigma^2 = 45$, and the calculated sample mean is $\bar{x} = 22$.
Step 3: Select the appropriate test statistic. The assumptions given are:
Testing a hypothesis about a population mean
The population is normally distributed
The population variance is known
So, the Z-statistic is appropriate.
Step 4: Level of significance: α = 0.05
Step 5: Critical values: ±1.96 (two-sided test at α = 0.05)
Step 6: Perform the calculation
$\bar{x} = 22$, $\mu_0 = 25$, $\sigma = \sqrt{45}$, n = 10
$Z = \frac{22 - 25}{\sqrt{45/10}} = -1.41$

Step 7: Since -1.41 falls in the acceptance region (between -1.96 and 1.96), we do not reject the null
hypothesis. There is not enough evidence to conclude that the mean enzyme level in the population is
different from 25.
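The test can be sketched as follows, assuming a known population variance and Python's `statistics.NormalDist`; the function name is illustrative. The two-sided P value is included as a cross-check on the critical-value decision.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, variance, n):
    """Z statistic for H0: mu = mu0 when the population variance is known."""
    return (sample_mean - mu0) / sqrt(variance / n)


z = one_sample_z(sample_mean=22, mu0=25, variance=45, n=10)
z_crit = NormalDist().inv_cdf(0.975)          # two-sided test, alpha = 0.05
p_two_sided = 2 * NormalDist().cdf(-abs(z))
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, P = {p_two_sided:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```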

Example 2
Serum Amylase level determination was made on a sample of 15 apparently healthy subjects. The
sample yielded the mean of 96 units/100 ml and a standard deviation of 35 units /100 ml. The variance
of the population was unknown. We want to know whether we can conclude that the mean of the
population is different from 120 units/100 ml.
Solutions
Step 1 and 2: Define the H0 and H1.
$H_0: \mu = 120$
$H_1: \mu \neq 120$
Step 3: Decide the appropriate test statistic: t test
Step 4 and 5: Decide the level of significance and critical value.
α value of 0.05.
t value for α/2 of 0.025 at df of 14: 2.145
Step 6: Obtain the value of the test statistic.
$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{96 - 120}{35/\sqrt{15}} = -2.66$

Step 7: Make a decision and interpret it.
We reject the null hypothesis because:
The calculated test statistic, -2.66, is in the rejection region (|t| > 2.145).
The corresponding tail area is less than the α/2 value of 0.025, so the two-sided P value is less than
0.05.
Example 3
A researcher wants to check whether the systolic blood pressure (SBP) among males is different from
that among females. In a sample of 50 males the mean SBP was 100 mm Hg with a standard deviation of
5 mm Hg. In a sample of 60 females, the mean SBP was 104 mm Hg with a standard deviation of 10 mm Hg.
Is there a significant difference between the two means?
Solutions
Step 1 and 2: Define the H0 and H1.
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Step 3: Decide the appropriate test statistic: Z test
Step 4 and 5: Decide the level of significance and critical value:
α value of 0.05.
1.96 is the critical value.
Step 6: Obtain the value of the test statistic:
$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{100 - 104}{\sqrt{\frac{5^2}{50} + \frac{10^2}{60}}} = \frac{-4}{\sqrt{0.5 + 1.67}} = \frac{-4}{1.47} = -2.72$

Step 7: Make a decision and interpret it.
We reject H0 and accept H1 (at the 95% confidence level) because:
The calculated test statistic, -2.72, is in the rejection region (|Z| > 1.96).
The corresponding tail area is less than the α/2 value of 0.025.
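A minimal sketch of the two-sample Z test, assuming the samples are large enough for the sample standard deviations to stand in for the population values, and using Python's `statistics.NormalDist`; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for H0: mu1 = mu2 with two large independent samples."""
    return (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)


z = two_sample_z(100, 5, 50, 104, 10, 60)     # males vs. females
p_two_sided = 2 * NormalDist().cdf(-abs(z))
reject_h0 = abs(z) > 1.96                     # alpha = 0.05, two-sided

print(f"Z = {z:.2f}, two-sided P = {p_two_sided:.4f}, reject H0: {reject_h0}")
```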
Example 4
Serum amylase determination was made on a sample of 15 apparently healthy subjects and 21
hospitalized subjects. Among healthy subjects, the mean was 96 units/100 ml with a standard deviation
of 35 units/100 ml. Among hospitalized patients, the mean was 120 units/100 ml with a standard
deviation of 40 units/100 ml. Is there a significant difference between the two mean values?
Solutions
Step 1 and 2: Define the H0 and H1.
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Step 3: Decide the appropriate test statistic: t test
Step 4 and 5: Decide the level of significance and critical value.
α value of 0.01.
t value for α/2 of 0.005 at df of 34: 2.728
Step 6: Obtain the value of the test statistic.
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(14)(35)^2 + (20)(40)^2}{34} = 1445.6$, so $S_p \approx 38$
$t = \frac{96 - 120}{\sqrt{\frac{38^2}{15} + \frac{38^2}{21}}} = \frac{-24}{\sqrt{96.3 + 68.8}} = -1.87$
Step 7: Make a decision and interpret it.
We do not reject the null hypothesis (at the 99% confidence level) because:
The calculated test statistic, -1.87, is in the acceptance region (|t| < 2.728).
The corresponding tail area is greater than the α/2 value of 0.005.
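The pooled-variance t statistic can be recomputed directly from the summary figures. This is a minimal sketch with illustrative function names; the critical value 2.728 is taken from the t table because Python's standard library has no t distribution. Working from the unrounded pooled variance gives a statistic of about -1.87.

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance of two independent samples."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic for H0: mu1 = mu2 assuming equal population variances."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return (mean1 - mean2) / sqrt(sp2 / n1 + sp2 / n2)


T_CRIT_99_DF34 = 2.728  # t(0.005, 34), read from a t table

sp2 = pooled_variance(35, 15, 40, 21)
t = pooled_t(96, 35, 15, 120, 40, 21)
reject_h0 = abs(t) > T_CRIT_99_DF34

print(f"Sp^2 = {sp2:.1f}, t = {t:.2f}, reject H0: {reject_h0}")
```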

Example 5
A random sample of 10 young men was taken, and the pulse rate (PR) was measured before and after taking
a cup of coffee. The results are given below. Does coffee have any effect on the heart rate? (Perform
the hypothesis test at the 95% confidence level.)
Subject   PR before   PR after   Difference
1         68          74         +6
2         64          68         +4
3         52          60         +8
4         76          72         -4
5         78          76         -2
6         62          68         +6
7         66          72         +6
8         76          76          0
9         78          80         +2
10        60          64         +4
Mean      68          71         +3

H0: Coffee intake has no effect on PR
H1: Coffee intake has an effect on PR
Test statistic: paired t test
Critical value: 2.262 (α = 0.05, df = 9)
First calculate the SD of the differences, then the test statistic:
$S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = 3.92$
$t = \frac{\bar{d}}{S_d/\sqrt{n}} = \frac{3}{3.92/\sqrt{10}} = 2.4$
Reject the null hypothesis (at the 95% confidence level), since 2.4 > 2.262.
Coffee intake has an effect on PR.
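The paired t statistic can be recomputed from the raw pulse rates in the table. A minimal sketch using Python's `statistics` module; the variable names are illustrative, and the critical value 2.262 is taken from the t table.

```python
from math import sqrt
from statistics import mean, stdev

# Pulse rates from the table, in subject order 1..10.
pr_before = [68, 64, 52, 76, 78, 62, 66, 76, 78, 60]
pr_after = [74, 68, 60, 72, 76, 68, 72, 76, 80, 64]

# Paired differences (after minus before), one per subject.
diffs = [after - before for before, after in zip(pr_before, pr_after)]

mean_d = mean(diffs)
sd_d = stdev(diffs)                 # sample SD, divisor n - 1
t = mean_d / (sd_d / sqrt(len(diffs)))

T_CRIT_95_DF9 = 2.262               # t(0.025, 9), read from a t table
reject_h0 = abs(t) > T_CRIT_95_DF9

print(f"mean d = {mean_d}, SD = {sd_d:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```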


Example 6
A survey was conducted to determine the prevalence of protein energy malnutrition (PEM) in a rural
kebele. Of 300 under-five children assessed, 123 were stunted. Can we conclude that the prevalence of
PEM in the population is 50%?
Test of Hypothesis About a Single Population Proportion
Step 1 and 2: Define the H0 and HA.
$H_0: P = 0.5$
$H_A: P \neq 0.5$
Step 3: Appropriate test statistic: Z statistic
Step 4 and 5: Decide the level of significance and the corresponding critical value:
Let's take an α value of 0.1. Hence 1.645 is the critical value.
Step 6: Calculate Z.
$Z = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} = \frac{0.41 - 0.5}{\sqrt{\frac{0.5(0.5)}{300}}} = \frac{-0.09}{\sqrt{0.25/300}} = -3.12$

Step 7: Make a decision and interpret it.
At the 90% confidence level we reject the null hypothesis that P = 0.5.
The calculated test statistic, -3.12, is in the rejection region (|Z| > 1.645).
The corresponding tail area is less than the α/2 value of 0.05.
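A minimal sketch of the single-proportion Z test, assuming the large-sample normal approximation with the standard error computed under H0; the function name is illustrative. Without intermediate rounding the statistic is about -3.12.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """Z statistic for H0: P = p0, using the standard error under H0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = one_proportion_z(successes=123, n=300, p0=0.5)
z_crit = NormalDist().inv_cdf(0.95)           # two-sided test, alpha = 0.10
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```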
Example 7
The prevalence of malaria in two malaria-endemic kebeles, X and Y, was compared. In kebele X, 15 of
120 samples were positive. In kebele Y, 20 of 100 samples were positive. Is there any significant
difference between the prevalence of malaria in kebeles X and Y?
Testing of Hypothesis, Two Population Proportions
Step 1 and 2: Define the H0 and HA:
$H_0: P_1 = P_2$
$H_A: P_1 \neq P_2$
Step 3: Decide the appropriate test statistic: Z statistic
Step 4 and 5: Decide the α value and the critical value:
Let's take an α value of 0.05. Hence 1.96 is the critical value.
Step 6: Obtain the value of the test statistic:
First calculate the proportions and the pooled proportion:
p1 = 15/120 = 0.125, p2 = 20/100 = 0.2
$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{120(0.125) + 100(0.2)}{120 + 100} = \frac{15 + 20}{220} = 0.159$
Then we calculate the test statistic:
$Z = \frac{0.125 - 0.2}{\sqrt{0.159(1 - 0.159)\left(\frac{1}{120} + \frac{1}{100}\right)}} = \frac{-0.075}{\sqrt{(0.1337)(0.0183)}} = -1.51$
Step 7: Make a decision and interpret it.
At the 95% confidence level we do not reject H0: P1 = P2, because -1.51 is in the acceptance region
(|Z| < 1.96).
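A minimal sketch of the two-proportion Z test with a pooled proportion, assuming large independent samples; the function name is illustrative.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled


# Kebele X: 15 positive of 120; kebele Y: 20 positive of 100.
z, pooled = two_proportion_z(15, 120, 20, 100)
reject_h0 = abs(z) > 1.96                     # alpha = 0.05, two-sided

print(f"pooled P = {pooled:.3f}, Z = {z:.2f}, reject H0: {reject_h0}")
```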


Example 8
A researcher is interested in assessing the effect of literacy on family planning (FP) use.
Accordingly, he collected data and tabulated the findings as follows. Can we say there is an
association between educational status and family planning use?
FP use    Educational status
          Illiterate   Literate   Total
Yes       63           49         112
No        15           33         48
Total     78           82         160
Step 1 and 2: Define the H0 and H1:
H0: There is no association between literacy and family planning use
H1: There is an association between literacy and family planning use
Step 3: Decide the appropriate test statistic: χ² test.
Step 4 and 5: Decide α and the corresponding critical value:
Let's take an α value of 0.01.
At df of 1 = (2-1)(2-1), the critical value is 6.635.
The acceptance area is 0 to 6.635; the rejection area is χ² > 6.635.
Step 6: Obtain the value of the test statistic:
First the expected frequencies should be calculated:
Expected frequency for cell a: 78 x 112/160 = 54.6
Expected frequency for cell b: 82 x 112/160 = 57.4
Expected frequency for cell c: 78 x 48/160 = 23.4
Expected frequency for cell d: 82 x 48/160 = 24.6
NB: The assumptions of the χ² test are fulfilled (every expected frequency is at least 5).
Then we calculate the chi-square statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$
$\chi^2 = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} = 1.29 + 1.23 + 3.02 + 2.87 = 8.41$
Step 7: Make a decision and interpret it.
At the 99% confidence level we reject H0 and accept HA that the two variables are associated, for the
following reasons:
The calculated test statistic, 8.41, is in the rejection area.
The corresponding P value for 8.41 (between 0.005 and 0.002) is less than the α value of 0.01.
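The expected frequencies and the chi-square statistic can be computed from the observed table alone. This is a minimal sketch in plain Python with an illustrative function name. The P value line relies on the fact that, for 1 degree of freedom only, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def chi_square(observed):
    """Pearson chi-square statistic and expected counts for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


# Rows: FP use yes / no; columns: illiterate / literate.
observed = [[63, 49], [15, 33]]
chi2, expected = chi_square(observed)
p_value = erfc(sqrt(chi2 / 2))      # valid for df = 1 only

print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```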


Example 9
Suppose that in a cross-sectional study of the factors affecting the utilization of antenatal clinics
you found that 64% of the women who lived within 10 kilometers of the clinic came for antenatal care,
compared to only 47% of those who lived more than 10 kilometers away. This suggests that antenatal
care (ANC) is used more often by women who live close to the clinics. The complete results are
presented in the following table:

Distance from ANC   Used ANC   Did not use ANC   Total
Less than 10 km     51 (64%)   29 (36%)          80 (100%)
More than 10 km     35 (47%)   40 (53%)          75 (100%)
Total               86         69                155

From the table we conclude that there seems to be a difference in the use of antenatal care between
those who live close to and those who live far from the clinic (64% versus 47%). We now want to know
if this observed difference is statistically significant or not.

Step 1 and 2: Define the H0 and H1:
H0: There is no association between distance from the clinic and ANC use
H1: There is an association between distance from the clinic and ANC use
Step 3: Decide the appropriate test statistic: χ² test.
Step 4 and 5: Decide α and the corresponding critical value:
Let's take an α value of 0.05.
At df of 1 = (2-1)(2-1), the critical value is 3.84.
The acceptance area is 0 to 3.84; the rejection area is χ² > 3.84.
Step 6: Obtain the value of the test statistic:
First calculate the expected frequencies for each cell:
E1 = 86 x 80/155 = 44.4    E2 = 69 x 80/155 = 35.6
E3 = 86 x 75/155 = 41.6    E4 = 69 x 75/155 = 33.4
Then calculate the chi-square statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$
$\chi^2 = \frac{(51 - 44.4)^2}{44.4} + \frac{(29 - 35.6)^2}{35.6} + \frac{(35 - 41.6)^2}{41.6} + \frac{(40 - 33.4)^2}{33.4} = 0.98 + 1.22 + 1.05 + 1.30 = 4.55$
Step 7: Make a decision and interpret it.
The calculated test statistic, 4.55, is in the rejection area, which means that the P value is smaller
than 0.05.
At the 95% confidence level we reject H0 and accept HA that there is an association between distance
from the clinic and ANC utilization.
We can now conclude that the women living within a distance of 10 km from the clinic utilize antenatal
care significantly more often than the women living more than 10 km away.
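For a 2 x 2 table the same statistic can be obtained without computing expected frequencies, using the algebraically equivalent shortcut χ² = N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]. This shortcut is not the method used in the solution; it is shown here as a cross-check, with illustrative names. Because it avoids rounding the expected frequencies to one decimal place, it gives about 4.57 rather than 4.55, and the conclusion is unchanged.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: less than 10 km / more than 10 km; columns: used ANC / did not use ANC.
chi2 = chi_square_2x2(51, 29, 35, 40)
reject_h0 = chi2 > 3.84             # critical value for df = 1, alpha = 0.05

print(f"chi-square = {chi2:.2f}, reject H0: {reject_h0}")
```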
