
The Statistical Imagination

Chapter 14: Bivariate Correlation and Regression
Part 1: Concepts and Calculations
2008 McGraw-Hill

Simple Linear or Bivariate Correlation and Regression
A correlation is a systematic change in the
scores of two interval/ratio variables
Analysis of the relationship between a
single independent variable (X) and a
single dependent variable (Y) is called
simple linear or bivariate (two-variable)
correlation and regression analysis

The Idea Behind Linear Analysis


Simple linear correlation and regression
analysis uses the formula for a straight line
to improve best estimates of an
interval/ratio dependent variable (Y) for all
values of an interval/ratio independent
variable (X)
Linear means straight line


Scatterplots
A scatterplot is a two-dimensional grid of the
coordinates of two interval/ratio variables, X
and Y
On the horizontal axis we place values of the
independent variable X
On the vertical axis we place values of the
dependent variable Y
A coordinate is a point on a scatterplot where
the values of X and Y are plotted for a case

Linear Regression Formula


A linear regression formula is the formula for
a straight line
Simple linear, bivariate correlation and
regression statistics apply only to
scatterplots with coordinates in a linear,
cigar-shaped pattern
The formula for a straight line to estimate Y
is: $\hat{Y} = a + bX$
$\hat{Y}$ is the best estimate of Y for a chosen
value of X

The Regression Line on the Scatterplot
The regression line is the best-fitting straight line
plotted through the X,Y-coordinates of a
scatterplot
The line is produced by plugging values of X into
the linear regression formula and solving for $\hat{Y}$
The benefit of regression analysis is improving
estimates of Y in a population by using points on
the regression line as predicted values instead of
simply reporting the mean of Y
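
To make this benefit concrete, the sketch below compares the total squared error of two estimates of Y: the mean of Y for every case, and the point on the regression line for each case. The scores are made up for illustration, and the slope and intercept use the standard least-squares formulas, which are an assumption here because this slide does not state them.

```python
# Hypothetical scores for five cases (illustrative only, not from the text).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Least-squares slope and intercept (standard formulas, assumed here).
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_sq_x = sum((x - mean_x) ** 2 for x in X)
b = sum_products / sum_sq_x
a = mean_y - b * mean_x

# Total squared error when every case is estimated by the mean of Y.
error_using_mean = sum((y - mean_y) ** 2 for y in Y)

# Total squared error when each case is estimated by its point on the line.
error_using_line = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

print(error_using_mean, error_using_line)
```

For these scores the regression line leaves less squared error than the mean, which is the sense in which it gives better estimates of Y.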


Positive and
Negative Correlations
A positive correlation is an upward sloping
pattern in a scatterplot where an increase in
X is related to an increase in Y
A negative correlation is a downward sloping
pattern where an increase in X is related to a
decrease in Y
When the pattern lacks an elongated, sloped
cigar shape, there is no correlation: an
increase in X is unrelated to the scores of Y

[Figure: A Positive Correlation. Scatterplot with Dependent Variable, Y on the vertical axis and Independent Variable, X on the horizontal axis. A positive relationship: as X increases, Y increases.]

[Figure: A Negative Correlation. Scatterplot with Dependent Variable, Y on the vertical axis and Independent Variable, X on the horizontal axis. A negative relationship: as X increases, Y decreases.]

[Figure: No Correlation. Scatterplot with Dependent Variable, Y on the vertical axis and Independent Variable, X on the horizontal axis. No relationship: as X increases, Y falls randomly.]

Computing Correlation and Regression Statistics
To calculate correlation and regression
statistics, set up a spreadsheet to obtain
the following sums:
n
$\sum X$
$\sum Y$
$\sum (X - \bar{X})$
$\sum (Y - \bar{Y})$
$\sum (X - \bar{X})(Y - \bar{Y})$
$\sum (X - \bar{X})^2$
$\sum (Y - \bar{Y})^2$
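
The spreadsheet can be sketched in a few lines of Python. The scores below are hypothetical, chosen only so the arithmetic is easy to follow, and the variable names are illustrative rather than taken from the text.

```python
# Hypothetical scores for five cases (illustrative only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
mean_x, mean_y = sum_x / n, sum_y / n

# One "spreadsheet row" per case: X, Y, the two deviation scores,
# their product, and their squares.
rows = [
    (x, y,
     x - mean_x, y - mean_y,
     (x - mean_x) * (y - mean_y),
     (x - mean_x) ** 2, (y - mean_y) ** 2)
    for x, y in zip(X, Y)
]

# Column totals. The two deviation columns sum to zero (up to rounding),
# which is a handy check on the arithmetic.
sum_dev_x = sum(row[2] for row in rows)
sum_dev_y = sum(row[3] for row in rows)
sum_products = sum(row[4] for row in rows)
sum_sq_x = sum(row[5] for row in rows)
sum_sq_y = sum(row[6] for row in rows)

print(n, sum_x, sum_y, sum_products, sum_sq_x, sum_sq_y)
```

The last three totals are the ones that feed the correlation and regression formulas.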

Pearson's r Correlation Coefficient
Pearson's r is a widely used correlation
coefficient that measures the tightness of fit
of X,Y-coordinates around the regression
line of a scatterplot
Computed values of Pearson's r can range
from -1 to +1
The larger the absolute value of r, the
tighter the fit of X,Y-coordinates around the
regression line
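
As a sketch of the calculation, Pearson's r can be computed from the deviation sums as $r = \sum (X - \bar{X})(Y - \bar{Y}) / \sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}$. This is the standard definitional formula and is assumed here, since the slide itself does not print it. The function name `pearson_r` and the scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores (standard definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical scores (illustrative only).
r_loose = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # points scattered around the line
r_tight = pearson_r([1, 2, 3], [2, 4, 6])              # points exactly on a line

print(round(r_loose, 3), r_tight)
```

The coordinates that fall exactly on a straight line give the largest possible value, while the scattered coordinates give a smaller one.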

The Sign of Pearson's r

When the regression line slopes upward,
we have a positive correlation. Pearson's r
will be positive up to a value of +1
When the regression line slopes downward,
we have a negative correlation. Pearson's
r will be negative down to a value of -1
When the regression line is flat, we have no
correlation and Pearson's r = 0
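
The three cases can be checked numerically. The sketch below uses three small hypothetical data sets, one for each pattern, and the standard deviation-score formula for r (an assumption, as the slides do not state it).

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores (standard definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


X = [1, 2, 3, 4, 5]

# Hypothetical Y scores, one list per pattern (illustrative only).
r_upward = pearson_r(X, [2, 4, 5, 4, 5])    # Y tends to rise with X
r_downward = pearson_r(X, [5, 4, 5, 4, 2])  # Y tends to fall with X
r_flat = pearson_r(X, [2, 4, 3, 4, 2])      # no linear trend in Y

print(r_upward > 0, r_downward < 0, r_flat == 0)
```

Note that the flat case here still has variation in Y. If every Y score were identical, the denominator would be zero and r would be undefined rather than 0.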

Regression Statistics: $\hat{Y}$
Let us define the coefficients and symbols
of the regression line formula, $\hat{Y} = a + bX$
$\hat{Y}$ = the predicted Y (an estimate of the
dependent variable Y computed for a given
value of the independent variable X)
Recall that the objective of correlation and
regression is to use the regression line to
make better estimates of Y

Regression Statistics: The Slope, b
b = slope of the regression line (called the
regression coefficient)
b conveys slope in the sense of going up
or down a hill. It answers the question:
How far does the line rise for every one-unit
run of X? (See Figure 14-6 in the text)
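
A minimal sketch of the slope calculation follows, assuming the standard least-squares formula $b = \sum (X - \bar{X})(Y - \bar{Y}) / \sum (X - \bar{X})^2$, which this slide does not print. The function name `slope` and the scores are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope b: rise in predicted Y per one-unit run of X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    return sum_products / sum_sq_x


# Hypothetical scores (illustrative only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

b = slope(X, Y)
print(b)
```

With these scores the predicted Y rises by 0.6 for each one-unit increase in X.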

Regression Statistics: The Y-intercept, a
a = Y-intercept, the point at which the
regression line intersects the Y-axis
when X = 0
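To compute a, calculate the slope, b,
and the means of X and Y. Because the
regression line passes through the point
$(\bar{X}, \bar{Y})$, substitute them into the
regression equation, $\bar{Y} = a + b\bar{X}$,
and solve for a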
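
Solving the regression equation at the means gives $a = \bar{Y} - b\bar{X}$. The sketch below applies that rearrangement to hypothetical values; the function name `intercept` and the numbers are illustrative assumptions.

```python
def intercept(mean_x, mean_y, b):
    """Y-intercept a, found by solving mean_y = a + b * mean_x for a."""
    return mean_y - b * mean_x


# Hypothetical values (illustrative only): means of X and Y and a slope.
mean_x, mean_y, b = 3.0, 4.0, 0.6

a = intercept(mean_x, mean_y, b)

# Check: plugging the mean of X into the line returns the mean of Y.
predicted_at_mean = a + b * mean_x

print(round(a, 2), round(predicted_at_mean, 2))
```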

Plotting the Regression Line


To plot the regression line on the
scatterplot, use the regression equation to
calculate a few values of $\hat{Y}$
Do this by inserting chosen values of X and
solving for $\hat{Y}$ in the regression equation
$\hat{Y} = a + bX$
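
As a sketch of this step, the code below turns a few chosen values of X into coordinates on the regression line. The coefficients are hypothetical, and the code only produces the coordinates; drawing them is left to whatever plotting tool is at hand.

```python
# Hypothetical regression coefficients (illustrative only).
a = 2.2   # Y-intercept
b = 0.6   # slope

# Chosen values of X; any two points fix a straight line, and a third
# serves as a check that all of them fall on the same line.
chosen_x = [0, 3, 5]

# Each coordinate pairs a chosen X with its predicted Y.
line_points = [(x, a + b * x) for x in chosen_x]

for x, y_hat in line_points:
    print(x, round(y_hat, 2))
```

Connecting these coordinates on the scatterplot draws the regression line.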


Statistical Follies: Do Not Forget to Observe the Scatterplot
The linear regression equation applies
only when the pattern of coordinates is
linear
The presence of outlier coordinates can
cause the attenuation (weakening) or
inflation of the Pearson's r correlation
coefficient
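
The effect of a single outlier can be illustrated with hypothetical scores. In the sketch below, one outlier weakens a perfect linear pattern, and another creates a strong correlation in a pattern that otherwise has none. The data and the function name `pearson_r` are illustrative assumptions, and r is computed with the standard deviation-score formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores (standard definitional formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)


# Attenuation: a perfectly linear pattern plus one outlier at (6, 0).
r_linear = pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
r_attenuated = pearson_r([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 0])

# Inflation: a pattern with no correlation plus one outlier at (20, 20).
r_none = pearson_r([1, 2, 3, 4, 5], [2, 4, 3, 4, 2])
r_inflated = pearson_r([1, 2, 3, 4, 5, 20], [2, 4, 3, 4, 2, 20])

print(round(r_linear, 3), round(r_attenuated, 3))
print(round(r_none, 3), round(r_inflated, 3))
```

In both cases the change in r comes from one coordinate, which is visible at a glance on the scatterplot but hidden in the coefficient alone.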
