
REGRESSION AND CORRELATION: APPLIED
Vitri Widyaningsih

Public Health Department


Faculty of Medicine
Sebelas Maret University
What we are going to learn
Regression (Linear and Logistic)
Linear regression
When to use linear regression
Requirements for regression
Conducting regression with SPSS
Interpreting the result

Logistic regression
When to use logistic regression
Requirements for regression
Conducting regression with SPSS
Interpreting the result

Correlation
When to use correlation
Requirements for correlation
Conducting correlation with SPSS
Interpreting the result
Variable
Variable Scale: Categorical, or Numeric/Continuous


REGRESSION
Typical problem to address
What is the effect of weight circumference on femur BMD
level? Use linear regression:
Continuous dependent variable
Continuous independent variable

What is the effect of parathyroid hormone on the
incidence of osteoporosis? Use logistic regression:
Categorical dependent variable
Continuous independent variable
Types of Regression Models

Regression Models
1 Explanatory Variable: Simple (Linear or Non-Linear)
2+ Explanatory Variables: Multiple (Linear or Non-Linear)
LINEAR REGRESSION
Linear Regression Models
Relationship between one dependent variable and
explanatory variable(s)

Equation Used
Numerical (Continuous) Dependent (Response) Variable
1 or More Numerical or Categorical Independent (Explanatory) Variables

Used Mainly for Prediction & Estimation


Linear Regression Model

Relationship Between Variables Is a Linear Function

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

$Y_i$: Dependent (Response) Variable (e.g., CD+ c.)
$X_i$: Independent (Explanatory) Variable (e.g., Years s. serocon.)
$\beta_0$: Population Y-Intercept
$\beta_1$: Population Slope
$\varepsilon_i$: Random Error
Linear Equations

$Y = mX + b$
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
[Figure: straight line plotted on X and Y axes]
Linear Regression Assumptions
Linear association

Normal Distribution of Error


Mean of Distribution of Error Is 0

Homogeneity of Variance (Homoscedasticity) of Error

Errors Are Independent


ESTIMATING PARAMETERS:
LEAST SQUARES METHOD
Scatter plot
1. Plot of All (Xi, Yi) Pairs
2. Suggests How Well Model Will Fit

[Figure: scatter plot of Y against X, both axes from 0 to 60]
Thinking Challenge
How would you draw a line through the
points? How do you determine which line
fits best?

[Figure: scatter plot of Y against X (both axes from 0 to 60), repeated with different candidate lines: slope changed with intercept unchanged; slope unchanged with intercept changed; both slope and intercept changed]
Least Squares
1. Best Fit Means the Differences Between Actual Y Values
and Predicted Y Values Are a Minimum. But Positive
Differences Offset Negative Ones, So Square the Errors!

2. LS Minimizes the Sum of the Squared Differences (SSE)

$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
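The cancellation problem and the squared-error remedy can be sketched numerically. The data and the three candidate lines below are hypothetical, chosen only for illustration; all three lines have raw errors that sum to zero (up to rounding), yet their SSE values differ.

```python
# Hypothetical data for illustration only; not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def raw_error_sum(intercept, slope, xs, ys):
    """Sum of (actual - predicted); positive and negative errors cancel."""
    return sum(yi - (intercept + slope * xi) for xi, yi in zip(xs, ys))


def sse(intercept, slope, xs, ys):
    """Sum of squared errors: the quantity least squares minimizes."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


# Three candidate lines as (intercept, slope) pairs.
candidates = {
    "flat line through mean of y": (4.0, 0.0),
    "steeper line": (1.0, 1.0),
    "least squares line": (2.2, 0.6),
}

for name, (b0, b1) in candidates.items():
    print(f"{name}: raw sum = {raw_error_sum(b0, b1, x, y):.2f}, "
          f"SSE = {sse(b0, b1, x, y):.2f}")

# The candidate with the smallest SSE is the best fit of the three.
best = min(candidates, key=lambda name: sse(*candidates[name], x, y))
print("smallest SSE:", best)
```

The raw sum cannot distinguish the lines, which is why the squared differences are used instead.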
Least Squares Graphically
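LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$

Observed point: $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$
Fitted line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

[Figure: four data points with residuals $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$ drawn as vertical distances to the fitted line, Y against X]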
Coefficient Equations
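Prediction Equation
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

Sample Slope
$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

Sample Y-intercept
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$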
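The sample slope and intercept formulas can be applied directly. This is a minimal sketch on hypothetical data; the function name `least_squares` is an illustrative choice, not something from the slides or from SPSS.

```python
# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def least_squares(xs, ys):
    """Return (intercept, slope) using SS_xy / SS_xx and y_bar - slope * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

For these data SS_xy = 6 and SS_xx = 10, so the slope is 0.6 and the intercept is 4 - 0.6 * 3 = 2.2.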
Best Linear Unbiased Estimator (BLUE)
$\hat{\beta}$ is an unbiased estimator of $\beta$ and
has the minimum variance among all unbiased
linear estimators
Interpretation of Coefficients
$\hat{\beta}_1$ (Slope)
Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit
Increase in X
If $\hat{\beta}_1$ = 2, then Y Is Expected to Increase by 2 for Each 1 Unit
Increase in X

$\hat{\beta}_0$ (Y-Intercept)
Average Value of Y When X = 0
If $\hat{\beta}_0$ = 4, then Average Y Is Expected to Be 4 When X Is 0
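The two interpretations can be checked with the slide's own example values (intercept 4, slope 2). The `predict` helper is a hypothetical name used only for this sketch.

```python
def predict(x, intercept=4.0, slope=2.0):
    """Predicted Y from a fitted line; the defaults are the slide's example values."""
    return intercept + slope * x


# Intercept: the predicted Y when X is 0.
at_zero = predict(0)

# Slope: the change in predicted Y for a 1 unit increase in X (here from 5 to 6).
change_per_unit = predict(6) - predict(5)

print("predicted Y at X = 0:", at_zero)
print("change in predicted Y per unit of X:", change_per_unit)
```

The change per unit is the same whichever starting value of X is used, because the model is linear.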
EXAMPLE
CHECKING THE ASSUMPTIONS
Linearity
Make a scatter plot:
Graphs → Legacy Dialogs → Scatter/Dot → Simple
Enter the variables you are going to plot on the X and Y axes
Analyze the data points for linearity
Normal Distribution of Error
Make a new variable of residuals:
Analyze → Regression → Linear → Save → check the residuals option
Enter the variables you are going to analyze
Run normality diagnostics for the errors
(standardized residuals)
Mean of Distribution of Error = 0
Homoscedasticity
Run a scatter plot of the standardized residuals (Y) against the
standardized predicted values (X) and inspect the plot
The spread of the points should look constant
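Outside SPSS, the quantities used in these checks can be sketched by hand. The data are hypothetical, and the standardization shown (residual divided by the standard error of the estimate) is one common definition that is assumed here; it has not been checked against SPSS's own computation.

```python
import math

# Hypothetical data; 2.2 and 0.6 are the least squares estimates for these data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pred for yi, pred in zip(y, predicted)]

n = len(x)
# With an intercept in the model, least squares residuals average to zero.
mean_residual = sum(residuals) / n

# Standard error of the estimate: sqrt(SSE / (n - 2)) for one predictor.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
std_residuals = [e / s for e in residuals]

# Standardized predicted values: z-scores of the predictions.
mean_pred = sum(predicted) / n
sd_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted) / (n - 1))
std_predicted = [(p - mean_pred) / sd_pred for p in predicted]

# Pairs to plot: standardized predicted value (X) against standardized residual (Y).
for zp, zr in zip(std_predicted, std_residuals):
    print(f"{zp:6.2f} {zr:6.2f}")
```

A roughly even vertical spread of these pairs across the horizontal axis is what the homoscedasticity check looks for.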
CONDUCTING AND INTERPRETING REGRESSION
Conducting Linear Regression in SPSS
Result
Result Interpretation 1: R Squared
Proportion of Variation in Y Explained by All X Variables
Taken Together

$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{TotalSS - ResSS}{TotalSS} = 1 - \frac{ResSS}{TotalSS}$
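As a worked instance of the R squared formula, the sketch below computes TotalSS and ResSS for a hypothetical data set and its least squares line. The variable names are illustrative.

```python
# Hypothetical data; 2.2 and 0.6 are the least squares estimates for these data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

y_bar = sum(y) / len(y)

# Total variation of Y around its mean.
total_ss = sum((yi - y_bar) ** 2 for yi in y)

# Residual (unexplained) variation around the fitted line.
res_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - res_ss / total_ss
print(f"TotalSS = {total_ss:.2f}, ResSS = {res_ss:.2f}, R squared = {r_squared:.2f}")
```

Here TotalSS is 6 and ResSS is 2.4, so 60% of the variation in Y is explained by X.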
Result Interpretation 2: ANOVA Test
Hypotheses:
H0: No influence of independent variable(s) on the dependent
variable
H1: There is an influence of independent variable(s) on the
dependent variable
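The test statistic behind these hypotheses is the F ratio of explained to unexplained variation, each divided by its degrees of freedom. A minimal sketch on hypothetical data follows; the p value is not computed here because it requires the F distribution, which statistical software supplies.

```python
# Hypothetical data; 2.2 and 0.6 are the least squares estimates for these data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

n = len(y)
k = 1  # number of independent variables in the model

y_bar = sum(y) / n
total_ss = sum((yi - y_bar) ** 2 for yi in y)
res_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
reg_ss = total_ss - res_ss  # variation explained by the regression

# F = mean square regression / mean square residual
f_stat = (reg_ss / k) / (res_ss / (n - k - 1))
print(f"F = {f_stat:.2f} with {k} and {n - k - 1} degrees of freedom")
```

A large F relative to the F distribution with these degrees of freedom is evidence against H0.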
Result Interpretation 3: T Test
Test the effect of each independent variable on the dependent
variable
[Figure: SPSS coefficients table with two callouts: "Shows the coefficient for each variable" and "Shows the significance of each variable"]
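The t statistic for a coefficient is the estimate divided by its standard error. The sketch below computes it for the slope of a simple regression on hypothetical data; variable names are illustrative, and the significance value itself would come from the t distribution with n - 2 degrees of freedom.

```python
import math

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# Residual variance estimate: SSE / (n - 2) for one predictor.
res_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
mse = res_ss / (n - 2)

# Standard error of the slope and the resulting t statistic.
se_slope = math.sqrt(mse / ss_xx)
t_stat = slope / se_slope
print(f"slope = {slope:.2f}, SE = {se_slope:.4f}, t = {t_stat:.4f}")
```

With a single independent variable, the square of this t statistic equals the ANOVA F statistic for the same model.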
LOGISTIC REGRESSION
Logistic Regression Models
Relationship between one dependent variable and
explanatory variable(s)

Equation Used
Categorical (Binary) Dependent (Response) Variable
1 or More Numerical or Categorical Independent (Explanatory) Variables

Used Mainly for Prediction & Estimation


Rationale

[Figure: LP model versus logit model, vertical axis from 0 to 1]

Logistic Regression


$\text{logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$
where $\pi$ is the probability of the event.

The logit (log-odds) transformation (link function)
makes the relationship between the response and
exposure linear
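The logit link and its inverse can be sketched in a few lines. The coefficient values below are hypothetical; the point is that the linear predictor can take any real value while the implied probability stays between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Probability implied by a log-odds value z."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -2.0, 0.5

for x in (0, 2, 4, 6, 8):
    log_odds = beta0 + beta1 * x        # linear in x
    prob = inverse_logit(log_odds)      # always between 0 and 1
    print(f"x = {x}: log-odds = {log_odds:+.1f}, probability = {prob:.3f}")
```

The log-odds rise by the same amount (beta1) for each unit of x, while the probability follows an S-shaped curve.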
Logistic Regression Logit Link
The meaning of $\pi / (1 - \pi)$ = odds of event: probability of the event relative to no event.

Interpretation of $\beta_0$ (intercept), when $x_1 = 0$:
The baseline log(odds), when the independent risk factor is at level 0
(often the control group)

Interpretation of $\beta_1$, or the slope:
Change in log(odds) per unit increase of the independent risk factor $x_1$
$\beta_1$ = log(OR)

Example
$\beta_1 = \log[\text{odds}(x_1 = 1)] - \log[\text{odds}(x_1 = 0)] = \log(OR)$ of $x_1$
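For a single risk factor coded 0/1, the intercept and slope can be read off a 2x2 table. The counts below are hypothetical and serve only to show that the difference in log(odds) equals log(OR).

```python
import math

# Hypothetical 2x2 table: counts of events and non-events by exposure level.
exposed_event, exposed_no_event = 30, 70        # x1 = 1
unexposed_event, unexposed_no_event = 10, 90    # x1 = 0 (baseline)

odds_exposed = exposed_event / exposed_no_event
odds_unexposed = unexposed_event / unexposed_no_event

# Intercept: baseline log(odds) when x1 = 0.
beta0 = math.log(odds_unexposed)

# Slope: change in log(odds) from x1 = 0 to x1 = 1.
beta1 = math.log(odds_exposed) - math.log(odds_unexposed)

odds_ratio = odds_exposed / odds_unexposed
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")
print(f"OR = {odds_ratio:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

Exponentiating the slope therefore gives the odds ratio for the risk factor.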
LOGISTIC REGRESSION IN SPSS
Check for coding
Result
Result Interpretation 1: Model Fit
Percentage Correct
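"Percentage correct" is the share of cases whose predicted category matches the observed one. The sketch below uses hypothetical outcomes and predicted probabilities, and assumes a cut value of 0.5 for turning a probability into a predicted category.

```python
# Hypothetical observed outcomes (1 = event) and model-predicted probabilities.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_prob = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1]

CUT_VALUE = 0.5  # assumed classification threshold

# Classify each case as an event when its probability reaches the cut value.
predicted = [1 if p >= CUT_VALUE else 0 for p in predicted_prob]

n_correct = sum(1 for obs, pred in zip(observed, predicted) if obs == pred)
percentage_correct = 100 * n_correct / len(observed)
print(f"{n_correct} of {len(observed)} correct = {percentage_correct:.1f}%")
```

Changing the cut value changes which cases count as correct, so the figure should be read together with the threshold used.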
Result Interpretation 2: Variable(s)
CORRELATIONS
Correlation
Relationship between one dependent variable and
explanatory variable (both continuous)

Assumption for the parametric test:
Data are normally distributed
Correlation
1. Answers "How Strong Is the Linear Relationship
Between 2 Variables?"
2. Coefficient of Correlation Used
Population Correlation Coefficient Denoted $\rho$ (Rho)
Values Range from -1 to +1
Measures Degree of Association
3. Used Mainly for Understanding
Pearson Correlation
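CHECKING ASSUMPTION
Are the data normally distributed?
Yes: parametric test (Pearson correlation)
No: transform the data, then check again:
  Normally distributed after transformation: parametric test (Pearson correlation)
  Still not normally distributed: non-parametric test (Spearman correlation)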
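When the flow ends at the non-parametric branch, Spearman correlation can be computed as the Pearson coefficient of the ranks. This is a minimal sketch on hypothetical data; the helper names `ranks`, `pearson`, and `spearman` are illustrative, and tied values receive their average rank.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    ss_xx = sum((a - x_bar) ** 2 for a in xs)
    ss_yy = sum((b - y_bar) ** 2 for b in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: Y increases with X, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```

Because Spearman depends only on the ordering of the values, it reaches 1 for any strictly increasing relationship, while Pearson stays below 1 unless the relationship is exactly linear.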
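Sample Coefficient of Correlation
Pearson Product Moment Coefficient of Correlation between
x and y:

$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$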
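The Pearson formula can be evaluated directly from the three sums of squares. The data below are hypothetical and the function name `pearson_r` is illustrative.

```python
import math

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(xs, ys):
    """Pearson r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    ss_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(x, y)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
```

For these data SS_xy = 6, SS_xx = 10, and SS_yy = 6. In simple linear regression, the square of r equals R squared for the same pair of variables.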
Coefficient of Correlation Values

-1.0: Perfect Negative Correlation
0: No Correlation
+1.0: Perfect Positive Correlation

Scale: -1.0, -.5, 0, +.5, +1.0
From 0 toward -1.0: increasing degree of negative correlation
From 0 toward +1.0: increasing degree of positive correlation
Coefficient of Correlation Examples
[Figure: four scatter plots of Y against X showing r = 1, r = -1, r = .89, and r = 0]
Test of Coefficient of Correlation
1. Shows If There Is a Linear Relationship Between 2
Numerical Variables
2. Same Conclusion as Testing Population Slope $\beta_1$
3. Hypotheses
H0: $\rho = 0$ (No Correlation)
Ha: $\rho \neq 0$ (Correlation)
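The claim that testing the correlation gives the same conclusion as testing the slope can be checked numerically: the two t statistics coincide. This sketch uses hypothetical data and the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2); the p value would come from the t distribution with n - 2 degrees of freedom.

```python
import math

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

# t statistic for H0: rho = 0
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t statistic for H0: beta1 = 0 in the simple regression of y on x
slope = ss_xy / ss_xx
res_ss = ss_yy - slope * ss_xy  # residual sum of squares for the fitted line
se_slope = math.sqrt(res_ss / (n - 2) / ss_xx)
t_from_slope = slope / se_slope

print(f"t from r = {t_from_r:.4f}, t from slope = {t_from_slope:.4f}")
```

Both statistics are compared with the same t distribution, so the two tests always lead to the same decision.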
CONDUCTING AND INTERPRETING CORRELATION
Test your knowledge
What are the differences between correlation and
regression?
What are the similarities between correlation and
regression?
What are the requirements for correlation?
What are the requirements for regression?
What is the correlation coefficient?
What does the p value mean for a correlation coefficient?
What is represented by $\beta_0$ and $\beta_1$ in linear regression?
What is represented by $\beta_0$ and $\beta_1$ in logistic regression?
References
THANK YOU
