SCATTER DIAGRAM
The simplest way to assess the relationship between two
quantitative variables is to draw a scatter diagram.
CORRELATION COEFFICIENT
The correlation coefficient is an index of the degree of
association between two variables. It can also be used for
comparing the degree of association in different groups
When high values of one variable occur with low values of the other
(and vice versa), we say that there is a negative correlation.
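The sign convention can be checked numerically. The following is a minimal sketch in Python; the helper `pearson_r` and both small data sets are made up for this illustration and do not come from the original material.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: y falls as x rises, so r should come out negative.
x = [1, 2, 3, 4, 5]
y_falling = [10, 8, 7, 4, 2]
# Hypothetical values: y rises with x, so r should come out positive.
y_rising = [2, 4, 7, 8, 10]

r_negative = pearson_r(x, y_falling)
r_positive = pearson_r(x, y_rising)
print(f"falling: r = {r_negative:.2f}, rising: r = {r_positive:.2f}")
```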
A NOTE OF CAUTION
Correlation coefficient is purely a measure of degree of
association and does not provide any evidence of
a cause-effect relationship
It is valid only in the range of values studied
Extrapolation of the association may not always be valid
Spurious correlation:
The production of steel in the UK and the population of India
over the last 25 years may be highly correlated
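To see how easily a spurious correlation can arise, the sketch below correlates two unrelated series that both happen to grow over 25 time points. All numbers are invented for illustration (they are not real steel or population figures), and `pearson_r` is a hypothetical helper defined in the block.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


years = range(25)
# Made-up series: each is a steady upward trend plus a small deterministic wiggle.
# Neither series influences the other; they only share a trend over time.
series_a = [100 + 3 * t + ((7 * t) % 5 - 2) for t in years]
series_b = [500 + 10 * t + ((3 * t) % 7 - 3) for t in years]

r = pearson_r(series_a, series_b)
print(f"r between two unrelated trending series = {r:.3f}")
```

A high value of r here reflects only the shared time trend, which is why it says nothing about cause and effect.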
PROPERTY OF
CORRELATION COEFFICIENT
The correlation coefficient is unaffected by adding or subtracting
a constant, or by multiplying or dividing by a positive constant,
applied to all the values of X and Y
Example: if r between X & Y = 0.7,
then r between 5X & 2Y = 0.7 as well
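This property can be verified numerically. The sketch below uses the seven (X, Y) pairs from the worked computation in this document (where r is about 0.98, not the 0.7 of the example above); `pearson_r` is a hypothetical helper, and the shifts of +5 and -2 are arbitrary choices for illustration. It also shows that a negative multiplier on one variable reverses the sign of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [8, 3, 4, 10, 6, 7, 11]
y = [12, 9, 10, 15, 11, 12, 15]

r_original = pearson_r(x, y)
# Add / subtract constants (arbitrary illustrative shifts).
r_shifted = pearson_r([v + 5 for v in x], [v - 2 for v in y])
# Multiply by positive constants: 5X and 2Y.
r_scaled = pearson_r([5 * v for v in x], [2 * v for v in y])
# Multiply X alone by a negative constant: the sign of r flips.
r_flipped = pearson_r([-1 * v for v in x], y)

print(f"{r_original:.4f} {r_shifted:.4f} {r_scaled:.4f} {r_flipped:.4f}")
```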
COMPUTATION OF THE
CORRELATION COEFFICIENT
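        X     Y    (X - X̄)   (Y - Ȳ)   (X - X̄)(Y - Ȳ)
        8    12       1         0             0
        3     9      -4        -3            12
        4    10      -3        -2             6
       10    15       3         3             9
        6    11      -1        -1             1
        7    12       0         0             0
       11    15       4         3            12
Sum    49    84       0         0            40

n = 7
$\bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7$
$\bar{y} = \frac{\sum y}{n} = \frac{84}{7} = 12$
$\mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} = \frac{40}{6} = 6.67$
$r = \frac{\mathrm{Cov}(x, y)}{\mathrm{S.d.}(x) \, \mathrm{S.d.}(y)} = \frac{6.67}{2.94 \times 2.31} = 0.98$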
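The computation above can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module (whose `stdev` is the sample standard deviation with an n - 1 divisor, matching the covariance divisor used here); the variable names are chosen for this example.

```python
from statistics import mean, stdev

x = [8, 3, 4, 10, 6, 7, 11]
y = [12, 9, 10, 15, 11, 12, 15]
n = len(x)

mean_x, mean_y = mean(x), mean(y)

# Sum of cross-products of deviations: the last column of the table.
sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Sample covariance uses the divisor (n - 1).
covariance = sum_products / (n - 1)

# Sample standard deviations (also with divisor n - 1).
sd_x, sd_y = stdev(x), stdev(y)

r = covariance / (sd_x * sd_y)
print(f"cov = {covariance:.2f}, sd_x = {sd_x:.2f}, sd_y = {sd_y:.2f}, r = {r:.2f}")
```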
UNIVARIATE REGRESSION
Regression: a method of describing the relationship
between two variables
Age (X)    Sys BP (Y)
  45          150
  48          153
  46          148
  45          150
  46          147
  48          153
  46          149
  55          159
  51          157
  56          160
  53          158
  60          165
  53          157
  54          158
  49          154
REGRESSION MODEL
We can perform a regression of BP on age,
to derive a straight line that gives an estimated value of BP
for any given age.
The general equation of a linear regression line is
Y = a + bX + e
Where,
a = Intercept
b = Regression coefficient
e = Statistical error
CALCULATIONS
The intercept a and the regression coefficient b are estimated
from the observed values of Age (X) and BP (Y) by the least squares method
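The slides do not spell out the estimating formulas. For reference, the standard least squares estimates for a straight line, written in the same symbols, are as follows (stated here as the textbook result, not as something taken from the original slides):

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2},
\qquad
a = \bar{Y} - b \, \bar{X}
```

These are the values of a and b that minimise the sum of squared errors $\sum e^2 = \sum (Y - a - bX)^2$.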
Test statistic: $t = \frac{b - 0}{SE(b)}$ .......(1)
Where,
$SE(b) = \sqrt{\frac{\sum (Y - \bar{Y})^2 - b^2 \sum (X - \bar{X})^2}{(n - 2) \sum (X - \bar{X})^2}}$
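As a worked check, the sketch below fits the line to the 15 age and systolic BP pairs listed earlier and evaluates the test statistic. It assumes the usual least squares estimates b = Sxy / Sxx and a = mean(Y) - b * mean(X), which the slides do not write out, and it assumes the two data columns pair up in the order listed. Variable names are chosen for this example.

```python
from math import sqrt

age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sys_bp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]
n = len(age)

mean_x = sum(age) / n
mean_y = sum(sys_bp) / n

# Sums of squares and cross-products of deviations from the means.
sxx = sum((x - mean_x) ** 2 for x in age)
syy = sum((y - mean_y) ** 2 for y in sys_bp)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, sys_bp))

# Least squares estimates (assumed standard formulas).
b = sxy / sxx
a = mean_y - b * mean_x

# Standard error of b: residual sum of squares over (n - 2) * Sxx.
se_b = sqrt((syy - b ** 2 * sxx) / ((n - 2) * sxx))

# Test statistic for the null hypothesis that the true slope is 0.
t = (b - 0) / se_b

# Proportion of the variation in Y explained by the line.
r_squared = (b ** 2 * sxx) / syy

print(f"a = {a:.2f}, b = {b:.2f}, SE(b) = {se_b:.2f}, t = {t:.2f}, R2 = {r_squared:.3f}")
```

Small differences from published figures in the second decimal place can arise from rounding at intermediate steps.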
ASSUMPTIONS
1. The relation between the two variables should be linear
PRECAUTIONS
1. Adequate sample size should be ensured
2. Prediction should be made within the range of the
observed values. No extrapolation should be attempted
3. The equation Y = a + bX should not be used
to predict X for a given Y
4. Model adequacy should be verified
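Precaution 3 can be illustrated with the age and systolic BP data listed earlier. The sketch below compares two ways of estimating age from a given BP: algebraically inverting the fitted line Y = a + bX, and fitting a separate regression of X on Y. It assumes the standard least squares formulas; the target BP of 160 is an arbitrary value inside the observed range, and all variable names are chosen for this example.

```python
age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sys_bp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]
n = len(age)

mean_x = sum(age) / n
mean_y = sum(sys_bp) / n
sxx = sum((x - mean_x) ** 2 for x in age)
syy = sum((y - mean_y) ** 2 for y in sys_bp)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, sys_bp))

# Regression of Y (BP) on X (age): minimises vertical errors.
b_y_on_x = sxy / sxx
a_y_on_x = mean_y - b_y_on_x * mean_x

# Regression of X (age) on Y (BP): minimises horizontal errors.
b_x_on_y = sxy / syy
a_x_on_y = mean_x - b_x_on_y * mean_y

target_bp = 160  # arbitrary illustrative value within the observed BP range

# Inverting the Y-on-X line versus using the X-on-Y line.
age_by_inversion = (target_bp - a_y_on_x) / b_y_on_x
age_by_x_on_y = a_x_on_y + b_x_on_y * target_bp

print(f"inverted Y-on-X line: {age_by_inversion:.2f} years")
print(f"X-on-Y regression:    {age_by_x_on_y:.2f} years")
```

The two answers differ because the two lines coincide only when the correlation is perfect; the weaker the correlation, the larger the gap.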
---------------------------------------------------------------
Ind. variable    Reg. coeff. b    SE(b)     t        P-value
---------------------------------------------------------------
Age                  1.08         0.08     14.16    < 0.0001
Constant           100.34
---------------------------------------------------------------
R² = 93.99% ≈ 94%
Systolic BP = 100.34 + 1.08 × Age
95% CI for b = b ± 1.96 × SE(b) = 1.08 ± 1.96 × 0.08
= (0.92, 1.24)
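The interval arithmetic is short enough to check directly. This minimal sketch uses the rounded values reported above (b = 1.08, SE(b) = 0.08) and the normal multiplier 1.96; variable names are chosen for this example.

```python
b = 1.08      # regression coefficient for age, as reported
se_b = 0.08   # its standard error, as reported
z = 1.96      # normal multiplier for a 95% interval

margin = z * se_b
ci_lower = b - margin
ci_upper = b + margin

print(f"95% CI for b = ({ci_lower:.2f}, {ci_upper:.2f})")
```

Because the interval excludes 0, it agrees with the small P-value for age. With only 15 observations, a multiplier from the t distribution with n - 2 = 13 degrees of freedom (about 2.16) would give a slightly wider interval.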
INTERPRETATIONS
1. b = 1.08: a one-year increase in age is associated with an average
increase of 1.08 mm Hg in Sys. BP
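The interpretation of b can be made concrete by evaluating the fitted line at two ages one year apart. In this sketch the function name `predict_sys_bp` and the range check are illustrative choices; the limits 45 and 60 are the smallest and largest ages in the data, in keeping with the precaution against extrapolation.

```python
INTERCEPT = 100.34   # constant from the fitted model
SLOPE = 1.08         # regression coefficient for age
AGE_MIN, AGE_MAX = 45, 60  # observed age range in the data


def predict_sys_bp(age):
    """Estimated systolic BP (mm Hg) for an age within the observed range."""
    if not AGE_MIN <= age <= AGE_MAX:
        raise ValueError(f"age {age} is outside the observed range; no extrapolation")
    return INTERCEPT + SLOPE * age


bp_at_50 = predict_sys_bp(50)
bp_at_51 = predict_sys_bp(51)
difference = bp_at_51 - bp_at_50

print(f"age 50: {bp_at_50:.2f}, age 51: {bp_at_51:.2f}, difference: {difference:.2f}")
```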
0.147 & 1.024 are regression coefficients for ht. and wt.
They indicate the increase in PEmax for
LOGISTIC REGRESSION
Response variable - Presence or absence of some condition
We predict a transformation of the response variable
instead of the actual value of the variable
Data: Hypertension, Smoking (X1), Obesity (X2) & Snoring (X3)
Which of the factors are predictors of hypertension?
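The slides do not name the transformation; in logistic regression it is conventionally the logit (log odds) of the probability of the condition, and the sketch below assumes that. The function names are chosen for this example, and no coefficients are estimated here: the usage line passes all-zero placeholder coefficients purely to show the mechanics, not as results for the hypertension data.

```python
from math import exp, log


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return log(p / (1 - p))


def inverse_logit(z):
    """Map a value on the log-odds scale back to a probability."""
    return 1 / (1 + exp(-z))


def predicted_probability(intercept, coefficients, predictors):
    """Probability implied by a linear predictor on the log-odds scale."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return inverse_logit(z)


# Placeholder (all-zero) coefficients for smoking, obesity and snoring:
# with no effect from any predictor, the log odds are 0.
p = predicted_probability(0.0, [0.0, 0.0, 0.0], [1, 0, 1])
print(f"p = {p:.2f}")
```

Under this model a coefficient describes the change in log odds for a one-unit change in its predictor, so factors whose coefficients differ clearly from 0 are the candidate predictors of hypertension.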