
CORRELATION AND REGRESSION ANALYSIS

BIVARIATE DATA

The main objective of correlation analysis is to take a collection of paired sample data (sometimes called bivariate data) and determine whether there appears to be a relationship between two variables. In statistics, such a relationship is referred to as correlation. A correlation exists between two variables when one of them is related to the other in some way.

ASSUMPTIONS
The sample of paired (x, y) data is a random sample.
The pairs of (x, y) data have a bivariate normal distribution.

Definition:
A scatterplot (or scatter diagram) is a graph in
which the paired (x, y) sample data are plotted with a
horizontal x axis and a vertical y axis. Each
individual (x, y) pair is plotted as a single point.

The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in a sample. (The linear correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson (1857-1936), who originally developed it.)

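Formula:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^{2} - (\sum x)^{2}}\,\sqrt{n\sum y^{2} - (\sum y)^{2}}} $$
where n is the number of pairs of sample data.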
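As a minimal sketch (plain Python, no libraries; the function name `pearson_r` and the sample data are hypothetical), the computational formula for r, with numerator nΣxy - (Σx)(Σy) and denominator √(nΣx² - (Σx)²)·√(nΣy² - (Σy)²), can be evaluated directly from the running sums:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r from the computational formula.

    Raises ZeroDivisionError if all x values (or all y values) are identical,
    because r is undefined when a variable has no variation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # prints 0.7746
```

For these five pairs, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / √1500 ≈ 0.7746.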

PROPERTIES OF THE LINEAR CORRELATION COEFFICIENT r

The value of r is always between -1.00 and +1.00 inclusive; that is, -1.00 ≤ r ≤ 1.00.
The value of r does not change if all values of either variable are converted to a different scale.
The value of r is not affected by the choice of x or y: interchange all x values and y values, and the value of r will not change.
r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
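These properties can be checked numerically. A minimal sketch under stated assumptions: the height/weight data are invented, `pearson_r` is a hypothetical helper written in the deviation form of r (algebraically equivalent to the computational formula), and only positive scale factors (unit conversions such as inches to centimeters) are used, since multiplying a variable by a negative number would flip the sign of r.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    # Deviation form: sum of cross-deviations divided by the square root of
    # the product of the sums of squared deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: heights in inches and weights in pounds
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 150, 165, 180]
r = pearson_r(height_in, weight_lb)

# Property: r lies between -1 and 1
assert -1.0 <= r <= 1.0

# Property: converting units (positive scale factors) leaves r unchanged
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]
assert isclose(pearson_r(height_cm, weight_kg), r)

# Property: interchanging x and y leaves r unchanged
assert isclose(pearson_r(weight_lb, height_in), r)

# Property: r only measures linear association. y = x^2 on a symmetric
# range is a perfect (but nonlinear) relationship, yet r is 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
assert isclose(pearson_r(xs, [v * v for v in xs]), 0.0, abs_tol=1e-12)
```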

Degree of Correlation (r, ) according to Guilford


Absolute value of r    Interpretation
0.00                   zero correlation; no relationship
0.01 to 0.20           slight correlation; almost negligible relationship
0.21 to 0.40           low correlation; definite but small relationship
0.41 to 0.70           moderate correlation; substantial relationship
0.71 to 0.90           high correlation; high/dependable relationship
0.91 to 0.99           very high correlation; very dependable relationship
1.00                   perfect correlation; perfect relationship
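The Guilford scale can be applied mechanically once r is known. This is a sketch only: the helper name `guilford_interpretation` is hypothetical, the scale is applied to the absolute value of r (the sign only gives the direction), and |r| is rounded to two decimals before comparison, an assumption made so that values such as 0.205 do not fall between two rows.

```python
def guilford_interpretation(r):
    """Return Guilford's verbal description for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    a = round(abs(r), 2)  # assumption: compare at two-decimal precision
    if a == 0.0:
        return "zero correlation; no relationship"
    if a <= 0.20:
        return "slight correlation; almost negligible relationship"
    if a <= 0.40:
        return "low correlation; definite but small relationship"
    if a <= 0.70:
        return "moderate correlation; substantial relationship"
    if a <= 0.90:
        return "high correlation; high/dependable relationship"
    if a <= 0.99:
        return "very high correlation; very dependable relationship"
    return "perfect correlation; perfect relationship"

print(guilford_interpretation(0.75))   # high correlation; high/dependable relationship
print(guilford_interpretation(-0.35))  # low correlation; definite but small relationship
```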

Interpreting the Linear Correlation Coefficient
If the absolute value of the computed value of r exceeds the tabled critical value (in Table A-6), conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.
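The decision rule compares |r|, not r, against the critical value, so a strong negative correlation can also be significant. A minimal sketch, assuming the critical value has already been looked up for the sample size and significance level; the function name is hypothetical and the 0.60 used below is a placeholder, not a value taken from Table A-6.

```python
def has_significant_linear_correlation(r, critical_r):
    """Decision rule: significant if |r| exceeds the tabled critical value."""
    # critical_r must come from Table A-6 (or an equivalent table of critical
    # values of r) for the sample size n and chosen significance level.
    if not 0.0 < critical_r < 1.0:
        raise ValueError("a critical value of r should lie between 0 and 1")
    return abs(r) > critical_r

critical = 0.60  # placeholder only, NOT a Table A-6 entry
print(has_significant_linear_correlation(0.75, critical))   # True
print(has_significant_linear_correlation(-0.75, critical))  # True (sign ignored)
print(has_significant_linear_correlation(0.45, critical))   # False
```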

Interpreting r: Explained Variation (Coefficient of Determination)
The value r² is the proportion of the variation in y that is explained by the linear relationship between x and y. Multiplying r² by 100 expresses this proportion as a percentage.
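The phrase "explained variation" can be made concrete: for the least-squares line, r² equals the explained variation Σ(ŷ - ȳ)² divided by the total variation Σ(y - ȳ)². The sketch below checks this identity on invented data; the variable names are illustrative only.

```python
from math import isclose, sqrt

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares line y-hat = b0 + b1*x and its predicted values
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

total_variation = sum((yi - y_bar) ** 2 for yi in y)          # sum of (y - y_bar)^2
explained_variation = sum((yh - y_bar) ** 2 for yh in y_hat)  # sum of (y_hat - y_bar)^2

r = sxy / sqrt(sxx * syy)
r_squared = explained_variation / total_variation

# Both routes agree: r^2 equals explained variation / total variation
assert isclose(r_squared, r ** 2)
print(f"r = {r:.4f}, r^2 = {r_squared:.4f} ({r_squared:.1%} of the variation in y explained)")
```

Here r ≈ 0.7746 and r² = 0.60, so about 60% of the variation in y is explained by the linear relationship with x; the remaining 40% is left unexplained.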

Question:
The results of your correlation analysis show a correlation of +0.8932 between salary and productivity. What do you know? What information is provided by the numerical value of the Pearson correlation? What proportion of the variation in productivity can be explained by the variation in salary?

REGRESSION ANALYSIS

Linear regression is the simplest type of prediction. When we use the observed values of X to estimate or predict the corresponding Y values, the process is called simple prediction. When more than one X variable is used, the outcome is a function of multiple predictors. Both simple and multiple predictions are made using a technique called regression analysis.

Regression is a term used to describe the process of estimating the relationship between two variables. The relationship is estimated by fitting a straight line through the given data. The method of least squares lets us find a line of best fit, called the regression line, which minimizes the sum of the squared errors of prediction.
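To see what "line of best fit" means under least squares, the sketch below fits the line on invented data and then checks that several nudged lines (slightly different intercept or slope) all have a larger sum of squared errors. The helper names `least_squares_line` and `sum_squared_errors` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def sum_squared_errors(b0, b1, xs, ys):
    """Sum of squared prediction errors (y - y-hat)^2 for the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
best = sum_squared_errors(b0, b1, x, y)
print(round(best, 4))  # 2.4

# Each of these nudged lines has a larger sum of squared errors
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.2, -0.1)]:
    assert sum_squared_errors(b0 + d0, b1 + d1, x, y) > best
```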

The regression equation algebraically describes the relationship between two variables. The graph of the regression equation is called the regression line (or line of best fit, or least-squares line). The regression equation expresses a relationship between x (called the independent variable, or predictor variable) and ŷ (called the dependent variable, or response variable).

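Equation of the regression line:
$$ \hat{y} = b_0 + b_1 x $$
y-intercept of the regression equation:
$$ b_0 = \bar{y} - b_1\bar{x} $$
Slope of the regression equation:
$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^{2} - (\sum x)^{2}} $$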
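A minimal sketch of the regression equation ŷ = b0 + b1x, computing the slope as b1 = [nΣxy - (Σx)(Σy)] / [nΣx² - (Σx)²] and the y-intercept as b0 = ȳ - b1x̄. The function name `regression_coefficients` and the data are hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) from the computational formulas:
    b1 = [n*sum(xy) - sum(x)*sum(y)] / [n*sum(x^2) - (sum(x))^2]
    b0 = y_bar - b1 * x_bar
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_coefficients(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")                 # y-hat = 2.20 + 0.60x
print(f"predicted y at x = 3.5: {b0 + b1 * 3.5:.2f}")  # 4.30
```

With n = 5, Σx = 15, Σy = 20, Σxy = 66 and Σx² = 55, the slope is b1 = 30/50 = 0.60 and the intercept is b0 = 4 - 0.60(3) = 2.20.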

Note: We cannot reliably use linear regression analysis to predict the expected value (ŷ) for an x value that lies beyond the minimum and maximum x values observed in the data set.
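One way to respect this rule in code is to refuse predictions outside the observed range of x. A minimal sketch; the function name `predict_in_range` and the fitted line ŷ = 2.2 + 0.6x are hypothetical.

```python
def predict_in_range(b0, b1, observed_x, x_new):
    """Return y-hat = b0 + b1*x_new, refusing to extrapolate beyond the
    smallest and largest observed x values."""
    lo, hi = min(observed_x), max(observed_x)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} lies outside the observed range [{lo}, {hi}]; "
            "the regression line should not be used there"
        )
    return b0 + b1 * x_new

# Hypothetical fitted line y-hat = 2.2 + 0.6x from observed x values 1..5
observed_x = [1, 2, 3, 4, 5]
print(predict_in_range(2.2, 0.6, observed_x, 4))
try:
    predict_in_range(2.2, 0.6, observed_x, 12)
except ValueError as err:
    print("refused:", err)
```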
