Science                                    Statistics
1. Idea or Question                        1. Stats model / hypothesis
2. Collect data and make observations      2. Study design
3. Describe data                           3. Descriptive statistics
Point Estimation
For example, estimating Relative Risks
Hypothesis Testing
For example, testing if Relative Risk = 1
Types of Data
Categorical (qualitative)
Nominal scale - no natural order
gender, marital status, race
Ordinal scale
severity scale, good/better/best
Types of Data
Numerical (quantitative)
Discrete - (few) integer values
number of children in a family
Continuous - measure to arbitrary
precision
blood pressure, weight
Dependent versus Independent
Variables
These terms developed out of an
experimental research paradigm
Dependent Variable       Independent Variable     Method
Categorical (Discrete)   Categorical (Discrete)   Relative Risk (C.I.)
                                                  Odds Ratio (C.I.)
                                                  Chi-square test
                                                  Test of proportions
Dependent Variable       Independent Variable     Method
Continuous               Continuous               Linear regression
                                                  Correlations
[Figure: a point estimate bracketed by a 95% confidence interval, nested inside a wider 99% confidence interval]
Measures of association for
categorical dependent
variables
and
categorical independent
variables
Test of Proportions and Chi-Square
test are used in related but different
situations
H0: $p_1 = p_2$
HA: $p_1 \neq p_2$
A statistic useful for this comparison is the
difference in the observed, or sample,
proportions
$\hat{p}_1 - \hat{p}_2$
$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$
Estimator for $(p_1 - p_2)$:
$\hat{p}_1 - \hat{p}_2 \pm 1.96 \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_0(1 - \hat{p}_0)}{n_1} + \frac{\hat{p}_0(1 - \hat{p}_0)}{n_2}}}$
versus
Number with reduced BP: 65 (of 100) and 38 (of 50)
H0: The proportion of men whose blood pressure was reduced is the same as the proportion of women whose blood pressure was reduced.
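Point estimate for $p_1 - p_2$ = .76 - .65 = .11
Standard deviation of $(\hat{p}_1 - \hat{p}_2)$ = $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{.76 \times .24}{50} + \frac{.65 \times .35}{100}} = .0770$
$Z = \frac{p_1 - p_2}{\text{standard deviation}} = \frac{.76 - .65}{.0770} = \frac{.11}{.0770} = 1.429$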
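95% confidence interval for $p_1 - p_2$: $.11 \pm 1.96(.0770) = (-0.041, 0.261)$
Conclusion: the interval contains 0 and Z is below 1.96, so H0 is not rejected at the 5% level.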
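The blood pressure example can be checked with a few lines of Python. This is a minimal sketch, not the slides' own code: the counts 38 of 50 and 65 of 100 are inferred from the stated proportions (.76 and .65) and sample sizes, and the unpooled standard error is used because that is what the worked example computes.

```python
import math

# Counts inferred from the worked example: 38 of 50 and 65 of 100 had reduced BP.
x1, n1 = 38, 50
x2, n2 = 65, 100

p1, p2 = x1 / n1, x2 / n2      # sample proportions: 0.76 and 0.65
diff = p1 - p2                 # point estimate of p1 - p2

# Unpooled standard error, matching the worked example.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = diff / se                                  # test statistic for H0: p1 = p2
ci = (diff - 1.96 * se, diff + 1.96 * se)      # 95% confidence interval

print(f"diff={diff:.2f}  se={se:.4f}  z={z:.3f}  ci=({ci[0]:.3f}, {ci[1]:.3f})")
```

An interval that contains 0 and a |Z| below 1.96 point to the same decision at the 5% level.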
The hypothesis is
H0: factors are independent ($p_{ij} = p_{i\cdot}\, p_{\cdot j}$)
HA: factors are not independent
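Chi-Square Test of Independence
$\chi^2_{(df)} = \sum \frac{(O - E)^2}{E}$
or
$\chi^2 = \sum_{i,j=1}^{k} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})}$, where $E(n_{ij}) = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}$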
HA: An association between smoking and diabetes.
Using: $\chi^2 = \sum_{i,j=1}^{k} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})}$
1. Calculate the expected values:
               Diabetic             Not Diabetic         Total
Smoking        75*70/100 = 52.5     75*30/100 = 22.5     75
Not smoking    25*70/100 = 17.5     25*30/100 = 7.5      25
Total          70                   30                   100
2. Add up the squared differences (Obs - Exp), each divided by its expected value
$\chi^2$ = (50 - 75*70/100)^2 / (75*70/100) +
(25 - 75*30/100)^2 / (75*30/100) +
(20 - 25*70/100)^2 / (25*70/100) +
(5 - 25*30/100)^2 / (25*30/100) = 1.59
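The two steps above can be sketched in Python. The observed counts (50, 25, 20, 5) are read off the calculation itself; the degrees-of-freedom line uses the standard (rows - 1)(columns - 1) rule, which the slides do not state, so treat it as an added assumption.

```python
# Observed 2x2 table (rows: smoking, not smoking; columns: diabetic, not diabetic).
observed = [[50, 25],
            [20, 5]]

row_totals = [sum(row) for row in observed]          # [75, 25]
col_totals = [sum(col) for col in zip(*observed)]    # [70, 30]
n = sum(row_totals)                                  # 100

# Step 1: expected count under independence = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Step 2: sum of (observed - expected)^2 / expected over all four cells.
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# Assumption: the usual (rows - 1) * (columns - 1) degrees of freedom.
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_square:.2f} on {df} df")
```

For reference, the 5% critical value of the chi-square distribution with 1 degree of freedom is 3.84, so a statistic of about 1.59 would not lead to rejecting independence at that level.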
[Figure: chi-square distribution, with the axis origin 0 and a percentile labelled "p, k"]
$OR = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}$
HA: An association between smoking and diabetes.
$OR = \frac{50 \times 5}{20 \times 25} = 0.5$
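As a quick check that the odds-based definition and the cross-product shortcut agree, here is a small Python sketch using the smoking and diabetes counts. The variable names a, b, c, d are illustrative labels for the four cells, not notation from the slides.

```python
# 2x2 counts from the smoking/diabetes example.
a, b = 50, 25   # smoking: diabetic, not diabetic
c, d = 20, 5    # not smoking: diabetic, not diabetic

p1 = a / (a + b)   # proportion diabetic among smokers
p2 = c / (c + d)   # proportion diabetic among non-smokers

odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))   # definition via odds
cross_product = (a * d) / (b * c)                # shortcut: (50*5)/(25*20)

print(odds_ratio, cross_product)
```

Both expressions give an odds ratio of 0.5, up to floating-point rounding.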
Measures of association for
continuous dependent
variables
and
categorical independent
variables
Normal Distribution
A common probability model for continuous
data
Can be used to characterize the Binomial or
Poisson under certain circumstances
Bell-shaped curve
takes values between $-\infty$ and $+\infty$
symmetric about mean
mean=median=mode
Examples
birthweights, height, weight
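To illustrate the point about characterizing the Binomial, the sketch below compares an exact binomial probability with its normal approximation. The parameters (n = 100, p = 0.5, k = 55) are hypothetical, and the 0.5 continuity correction is a common convention rather than something the slides specify. The approximation is generally considered reasonable when n is large and neither np nor n(1 - p) is small.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5    # hypothetical binomial parameters
k = 55             # we want P(X <= k)

# Exact binomial probability P(X <= k).
exact = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation with the same mean and standard deviation,
# evaluated at k + 0.5 (continuity correction).
normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = normal.cdf(k + 0.5)

print(f"exact={exact:.4f}  normal approx={approx:.4f}")
```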
The arithmetic mean is the most common
measure of the central location of a
sample.
$\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$
The sample variance: $s^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (X_j - \bar{X})^2$
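A minimal Python sketch of the arithmetic mean, together with the sample variance that divides by n - 1 (the form the t-test uses for each group). The birthweight values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical birthweights in kg (illustrative values only).
weights = [3.2, 3.5, 2.9, 3.8, 3.1]

n = len(weights)
mean = sum(weights) / n                                       # (1/n) * sum of X_j
variance = sum((x - mean) ** 2 for x in weights) / (n - 1)    # divides by n - 1

# The standard library computes the same quantities.
assert abs(mean - statistics.mean(weights)) < 1e-12
assert abs(variance - statistics.variance(weights)) < 1e-12

print(f"mean={mean:.2f}  variance={variance:.3f}")
```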
The T-Test
Tests for the equality of means in 2 groups
Null Hypothesis:
The two sample means are equal
H0: $\bar{X}_1 - \bar{X}_2 = 0$ or $\bar{X}_1 = \bar{X}_2$
Alternate Hypothesis:
The two sample means are different
HA: $\bar{X}_1 - \bar{X}_2 \neq 0$ or $\bar{X}_1 \neq \bar{X}_2$
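Test Statistic: $t_{(df)} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
where $s_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$
and $s_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$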
$t = \frac{1.26 - 0.78}{\sqrt{\frac{(.32)^2}{7} + \frac{(.32)^2}{7}}} = 2.806$
$df = n_1 + n_2 - 2 = 7 + 7 - 2 = 12$
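The arithmetic of this t-test example can be reproduced from the summary statistics alone. This is a sketch under the assumption that .32 is each group's sample standard deviation and 7 is each group's size, as the formula layout suggests.

```python
import math

# Summary statistics from the worked example.
mean1, mean2 = 1.26, 0.78
s1, s2 = 0.32, 0.32      # assumed: sample standard deviations of the two groups
n1, n2 = 7, 7

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
t_stat = (mean1 - mean2) / se
df = n1 + n2 - 2

print(f"t = {t_stat:.3f} on {df} df")
```

For comparison, the two-sided 5% critical value of the t distribution with 12 degrees of freedom is 2.179.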
What is the p-value?
[Figure: upper percentile of the t distribution; the upper-tail area lies to the right of the percentile, on an axis marked at 0]
Sample mean $= \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
$\bar{X} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}$
$.53 \pm 2.571 \frac{.0559}{\sqrt{6}}$ or $.53 \pm .059$
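The interval can be recomputed in Python as a check. This sketch assumes the garbled numbers read as a sample mean of .53, a sample standard deviation of .0559 and n = 6; the multiplier 2.571 is the t quantile for a two-sided 95% interval with 5 degrees of freedom, which is consistent with n = 6.

```python
import math

mean = 0.53        # sample mean
s = 0.0559         # assumed: sample standard deviation
n = 6              # assumed: sample size
t_crit = 2.571     # t quantile for alpha/2 = 0.025 with n - 1 = 5 df

half_width = t_crit * s / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

print(f"{mean} +/- {half_width:.3f} -> ({ci[0]:.3f}, {ci[1]:.3f})")
```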
1. Identify H0 and HA
2. Identify a test statistic
3. Determine a significance level, $\alpha = 0.05$ or $\alpha = 0.01$
4. Critical value determines rejection /
acceptance region
5. p-value
6. Interpret the result
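Steps 3 to 6 can be sketched for a two-sided z test using only the Python standard library. The function name `two_sided_z_test` and the input Z of 1.43 are illustrative choices (the value is close to the Z in the blood pressure example); a t test would need the t distribution instead, which the standard library does not provide.

```python
from statistics import NormalDist

def two_sided_z_test(z, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a two-sided z test."""
    std_normal = NormalDist()                        # mean 0, standard deviation 1
    critical = std_normal.inv_cdf(1 - alpha / 2)     # step 4: critical value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # step 5: p-value
    return critical, p_value, abs(z) > critical      # step 6: decision

# Step 3: significance level alpha = 0.05; illustrative test statistic Z = 1.43.
critical, p_value, reject = two_sided_z_test(1.43, alpha=0.05)
print(f"critical={critical:.2f}  p={p_value:.3f}  reject H0: {reject}")
```

Comparing |Z| with the critical value and comparing the p-value with the significance level always give the same decision; the p-value additionally shows how strong the evidence is.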