The study of the statistical relationship between two variables X and Y focuses
on two basic issues:
To determine the function that best explains the behaviour of the variable Y
(dependent or endogenous) by means of a set of variables X1, X2, ..., Xp
(independent or exogenous), assuming that a relationship exists among all these
variables (in the two-dimensional case there is only one explanatory variable, X).
This is the so-called regression function or curve.
Given two variables X and Y with joint frequency distribution $(x_i, y_j);\ n_{ij}$, the
regression of Y on X (Y/X) is the function that explains the variable Y for each
value of the variable X. In the same way, the regression of X on Y (X/Y) determines
the behaviour of X as a function of Y.
In order to determine these functions, we may use two criteria: type I regression
and type II regression.
______________________________________________________________________
Degree in Economics / Estefana Mourelle Espasandn 2016/2017
Lesson 4. Analysis of the relationship between variables Statistics I ~2~
______________________________________________________________________
Type I regression of Y on X
X \ Y    y1     y2     ...    yk
x1       n11    n12    ...    n1k
x2       n21    n22    ...    n2k
...      ...    ...    ...    ...
xh       nh1    nh2    ...    nhk
If we ask which value we should assign to Y when X = x1, the natural answer is the mean
of Y when X is x1, that is, the conditional mean of Y given X = x1. Applying the same
criterion to x2, we take the conditional mean of Y given X = x2, and so on.
Therefore, type I regression of Y on X consists of the pairs:
$(x_1;\ \bar{y}|x_1)$
$(x_2;\ \bar{y}|x_2)$
$(x_3;\ \bar{y}|x_3)$
...
$(x_h;\ \bar{y}|x_h)$
EXAMPLE 2
X \ Y    1    2    4    5
2        9    0    4    0
3        0    1    0    5
5        0    1    0    0
n = 20
Type I regression of Y on X
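Observed values

Marginal distribution of Y:
y_j    n_j
1      9
2      2
4      4
5      5
       20
$\bar{y} = \frac{1 \cdot 9 + \dots + 5 \cdot 5}{20} = 2.7$
$S_y^2 = \frac{(1 - 2.7)^2 \cdot 9 + \dots + (5 - 2.7)^2 \cdot 5}{20} = 3.01$

Conditional distribution of Y given X = 2:
y_j    n_{j|i}
1      9
2      0
4      4
5      0
       13
$\bar{y}|x{=}2 = \frac{1 \cdot 9 + 4 \cdot 4}{13} = 1.923$

Conditional distribution of Y given X = 3:
y_j    n_{j|i}
1      0
2      1
4      0
5      5
       6
$\bar{y}|x{=}3 = \frac{2 \cdot 1 + 5 \cdot 5}{6} = 4.5$

Conditional distribution of Y given X = 5:
y_j    n_{j|i}
1      0
2      1
4      0
5      0
       1
$\bar{y}|x{=}5 = \frac{2 \cdot 1}{1} = 2$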
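The conditional means above can be reproduced with a short Python sketch. It assumes the joint frequency table of Example 2 is stored as a list of rows (one row per value of X); the names `table` and `conditional_means` are illustrative and do not come from the notes.

```python
# Joint frequency table of Example 2: one row per value of X, one column per value of Y.
x_values = [2, 3, 5]
y_values = [1, 2, 4, 5]
table = [
    [9, 0, 4, 0],  # X = 2
    [0, 1, 0, 5],  # X = 3
    [0, 1, 0, 0],  # X = 5
]

def conditional_means(y_values, table):
    """Return the mean of Y within each row, i.e. for each value of X."""
    means = []
    for row in table:
        row_total = sum(row)
        means.append(sum(y * f for y, f in zip(y_values, row)) / row_total)
    return means

n_total = sum(sum(row) for row in table)
col_totals = [sum(col) for col in zip(*table)]  # marginal frequencies of Y

# Marginal mean and variance of Y (observed values).
y_bar = sum(y * f for y, f in zip(y_values, col_totals)) / n_total
s2_y = sum((y - y_bar) ** 2 * f for y, f in zip(y_values, col_totals)) / n_total

# Type I regression of Y on X: the pairs (x_i, conditional mean of Y given x_i).
type1 = list(zip(x_values, conditional_means(y_values, table)))

print(round(y_bar, 3), round(s2_y, 3))
print([(x, round(m, 3)) for x, m in type1])
```

Each pair in `type1` is one point of the type I regression; no functional form is imposed on them.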
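Theoretical values

Distribution of $\bar{y}|x = x_i$:
$\hat{y}(x_i)$    n_i
1.923             13
4.5               6
2                 1
                  20
$\bar{\hat{y}} = \frac{1.923 \cdot 13 + 4.5 \cdot 6 + 2 \cdot 1}{20} = 2.7 = \bar{y}$
$S_{\hat{y}}^2 = \frac{(1.923 - 2.7)^2 \cdot 13 + (4.5 - 2.7)^2 \cdot 6 + (2 - 2.7)^2 \cdot 1}{20} = 1.389$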
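As a quick check of the two figures above, the following Python sketch recomputes the mean and variance of the theoretical values. It assumes the conditional means are kept as exact fractions (25/13, 27/6, 2/1) instead of the rounded 1.923; the variable names are illustrative.

```python
# Theoretical values of the type I regression (conditional means of Y)
# and the number of observations behind each of them.
cond_means = [25 / 13, 27 / 6, 2 / 1]  # mean of Y given X = 2, 3, 5
row_totals = [13, 6, 1]
n_total = sum(row_totals)

# Frequency-weighted mean of the theoretical values.
mean_fit = sum(m * f for m, f in zip(cond_means, row_totals)) / n_total

# Frequency-weighted variance of the theoretical values.
var_fit = sum((m - mean_fit) ** 2 * f for m, f in zip(cond_means, row_totals)) / n_total

print(round(mean_fit, 3), round(var_fit, 3))
```

The weighted mean of the conditional means coincides with the overall mean of Y, as the notes state.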
Key idea
Advantage: it is the best possible fit.
Disadvantage: the function can adopt any form (it is the result of connecting dots).
Type II regression of Y on X
In order to obtain the regression curve of Y on X, we first plot each observed pair
of values as a point on coordinate axes (cluster of dots or scatter plot) and select
the type of function that best fits those points. Second, we determine this function
by minimizing the sum of the squared errors or residuals, $e_{ij}$; each residual is the
difference between the observed value of the dependent variable, $y_j$, and the
theoretical value, $\hat{y}_j$, which is obtained by replacing X with $x_i$ in the selected
function:
$\min \sum_i \sum_j (y_j - \hat{y}_j)^2 \, n_{ij}$ (least squares method)
- Type I regression is always the result of connecting dots, not a continuous curve,
which makes it less convenient for our purposes. For this reason, type II
regressions are more widely used.
- At a practical level, the type of function is not set a priori in type I regression,
whilst choosing it is the first step in the type II regression procedure.
When the function that best fits the cluster of dots is a straight line, we have a
linear regression. Its form is
$y = a + bx$ for Y/X
$x = a' + b'y$ for X/Y
Coefficients b and b′ (for Y/X and X/Y, respectively) are called regression
coefficients; a and a′ are the corresponding intercepts, that is, the points where
each line crosses the respective axis.
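For the line Y/X, the least squares criterion leads to closed-form expressions for a and b. The derivation below is the standard one and is not taken from the notes; it assumes the usual definitions of the covariance $S_{xy}$ and the variance $S_x^2$ as frequency-weighted moments.

```latex
\begin{aligned}
S(a,b) &= \sum_i \sum_j (y_j - a - b x_i)^2 \, n_{ij} \\
\frac{\partial S}{\partial a} = 0 &\;\Rightarrow\; \sum_i \sum_j (y_j - a - b x_i)\, n_{ij} = 0
  \;\Rightarrow\; a = \bar{y} - b\,\bar{x} \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum_i \sum_j x_i\,(y_j - a - b x_i)\, n_{ij} = 0 \\
&\;\Rightarrow\; \frac{1}{n}\sum_i \sum_j x_i y_j n_{ij} - \bar{x}\,\bar{y}
  = b\left(\frac{1}{n}\sum_i x_i^2 n_i - \bar{x}^2\right) \\
&\;\Rightarrow\; b = \frac{S_{xy}}{S_x^2}
\end{aligned}
```

The first condition also shows that the fitted line passes through the point of means $(\bar{x}, \bar{y})$.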
Let's look at an example where a straight line is a good fit for the cluster of dots
representing the pairs of values (X,Y):
(Source: http://www.idlcoyote.com/documents/cg_programs.php)
EXAMPLE 2 - CONTINUATION
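$Y = a + bX$, with $b = \frac{S_{xy}}{S_x^2}$ and $a = \bar{y} - b\bar{x}$

$\bar{x} = \frac{\sum_{i=1}^{r} x_i n_i}{n} = \frac{2 \cdot 13 + 3 \cdot 6 + 5 \cdot 1}{20} = 2.45$ ; $\bar{y} = 2.7$

$S_x^2 = \frac{\sum_{i=1}^{r} x_i^2 n_i}{n} - \bar{x}^2 = \frac{2^2 \cdot 13 + 3^2 \cdot 6 + 5^2 \cdot 1}{20} - 2.45^2 = 0.5475$

$S_{xy} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{c} x_i y_j n_{ij}}{n} - \bar{x}\,\bar{y} = \frac{2 \cdot 1 \cdot 9 + \dots + 5 \cdot 5 \cdot 0}{20} - (2.7 \cdot 2.45) = 0.435$

$b = \frac{0.435}{0.5475} = 0.795$ ; $a = 2.7 - (0.795 \cdot 2.45) = 0.752$

$Y = 0.752 + 0.795X$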
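The same linear fit can be sketched in Python directly from the joint frequency table of Example 2. The variable names are illustrative. Note that the code does not round b before computing a, so the intercept comes out as roughly 0.753 instead of the 0.752 obtained above with b rounded to 0.795.

```python
# Joint frequency table of Example 2: one row per value of X, one column per value of Y.
x_values = [2, 3, 5]
y_values = [1, 2, 4, 5]
table = [
    [9, 0, 4, 0],  # X = 2
    [0, 1, 0, 5],  # X = 3
    [0, 1, 0, 0],  # X = 5
]

n_total = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]        # marginal frequencies of X
col_totals = [sum(col) for col in zip(*table)]  # marginal frequencies of Y

# Means of X and Y.
x_bar = sum(x * f for x, f in zip(x_values, row_totals)) / n_total
y_bar = sum(y * f for y, f in zip(y_values, col_totals)) / n_total

# Variance of X and covariance of X and Y (mean of products minus product of means).
s2_x = sum(x * x * f for x, f in zip(x_values, row_totals)) / n_total - x_bar ** 2
s_xy = sum(
    x * y * f
    for x, row in zip(x_values, table)
    for y, f in zip(y_values, row)
) / n_total - x_bar * y_bar

# Least squares coefficients of the line Y = a + bX.
b = s_xy / s2_x
a = y_bar - b * x_bar

print(round(x_bar, 2), round(s2_x, 4), round(s_xy, 3))
print(round(b, 3), round(a, 3))
```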
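- the mean of the observed series, $\bar{y} = \sum_i \sum_j y_j \frac{n_{ij}}{n}$
- the mean of the residuals from the regression Y/X,
$\bar{e} = \sum_i \sum_j e_j \frac{n_{ij}}{n} = \sum_i \sum_j (y_j - \hat{y}_j) \frac{n_{ij}}{n} = 0$
- the mean of the theoretical values,
$\bar{\hat{y}} = \sum_i \sum_j \hat{y}_j \frac{n_{ij}}{n} = \sum_i \sum_j (y_j - e_j) \frac{n_{ij}}{n} = \bar{y} - \bar{e} = \bar{y}$
- variance of error: it measures the deviations between the theoretical value and
the observed value (dispersion left out of the regression line),
$S_e^2 = \sum_i \sum_j (y_j - \hat{y}_j)^2 \frac{n_{ij}}{n} = \frac{SSE}{n}$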
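These properties can be checked numerically on Example 2. The sketch below is a hedged illustration: it assumes the least squares line Y/X is fitted from the joint frequency table, and the helper `wmean` and the list `cells` are illustrative names introduced here.

```python
# Joint frequency table of Example 2: one row per value of X, one column per value of Y.
x_values = [2, 3, 5]
y_values = [1, 2, 4, 5]
table = [
    [9, 0, 4, 0],  # X = 2
    [0, 1, 0, 5],  # X = 3
    [0, 1, 0, 0],  # X = 5
]
n_total = sum(sum(row) for row in table)

# Flatten the table into (x, y, frequency) triples for the non-empty cells.
cells = [
    (x, y, f)
    for x, row in zip(x_values, table)
    for y, f in zip(y_values, row)
    if f
]
xs = [x for x, _, _ in cells]
ys = [y for _, y, _ in cells]

def wmean(values):
    """Frequency-weighted mean of a list holding one value per cell."""
    return sum(v * f for v, (_, _, f) in zip(values, cells)) / n_total

# Least squares line Y = a + bX.
x_bar, y_bar = wmean(xs), wmean(ys)
s2_x = wmean([(x - x_bar) ** 2 for x in xs])
s_xy = wmean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])
b = s_xy / s2_x
a = y_bar - b * x_bar

# Theoretical values and residuals, one per cell.
fitted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

mean_residual = wmean(residuals)         # should be 0
mean_fitted = wmean(fitted)              # should equal the mean of Y
s2_e = wmean([e ** 2 for e in residuals])  # variance of error, SSE / n

print(round(mean_residual, 10), round(mean_fitted, 3), round(s2_e, 4))
```

The residual mean is zero only up to floating-point error, which is why the comparison is made with a tolerance.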
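There is a relationship between these three variances (of the observed values, of
the theoretical values and of the errors), both in type I regression and in the
linear fit. It is as follows:
$S_y^2 = S_{\hat{y}}^2 + S_e^2$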
$R^2 = \frac{S_{\hat{y}}^2}{S_y^2} = \frac{SSR}{SST}$
$R^2 = 1 - \frac{S_e^2}{S_y^2} = 1 - \frac{SSE}{SST}$
$S_e^2 = \sum_i \sum_j (y_j - \hat{y}_j)^2 \frac{n_{ij}}{n} = S_y^2 - \frac{S_{xy}^2}{S_x^2}$
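For the straight line fitted by least squares, the error variance can be written in terms of the variances and the covariance. The short derivation below is a standard argument, not taken from the notes; it assumes $a = \bar{y} - b\bar{x}$ and $b = S_{xy}/S_x^2$, and writes the theoretical value explicitly as $a + b x_i$.

```latex
\begin{aligned}
y_j - (a + b x_i) &= (y_j - \bar{y}) - b\,(x_i - \bar{x})
  \qquad \text{since } a = \bar{y} - b\,\bar{x} \\
S_e^2 &= S_y^2 - 2b\,S_{xy} + b^2 S_x^2 \\
&= S_y^2 - \frac{S_{xy}^2}{S_x^2}
  \qquad \text{since } b = \frac{S_{xy}}{S_x^2} \\
R^2 &= 1 - \frac{S_e^2}{S_y^2} = \frac{S_{xy}^2}{S_x^2\, S_y^2}
\end{aligned}
```

The last line is the square of the linear correlation coefficient, so in the linear fit the coefficient of determination and $r^2$ coincide.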
Comments
- In the linear fit, $R^2 = r^2$.
- The coefficient that measures the degree of linear correlation between the
variables (r, linear coefficient of correlation) is:
$r = \frac{S_{xy}}{S_x S_y}$
Why do we incorporate r? To add to $r^2$ the nature of the statistical dependence
(positive or negative association).
Range of values of r: $-1 \le r \le 1$, where r = -1 indicates a perfect negative linear
relationship and r = 1 a perfect positive linear relationship.
EXAMPLE 2 - CONTINUATION
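In type I regression:
$R^2 = 1 - \frac{S_e^2}{S_y^2} = \frac{S_{\hat{y}}^2}{S_y^2} = \frac{1.389}{3.01} = 0.461$
$r^2 = \frac{S_{xy}^2}{S_x^2 S_y^2} = \frac{(0.435)^2}{0.5475 \cdot 3.01} = 0.115$
$\bar{e} = \bar{y} - \bar{\hat{y}} = 2.7 - 2.7 = 0$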
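The two goodness-of-fit figures of Example 2 can be recomputed with the Python sketch below, which also reports the sign of r. It is an illustration under the assumption that all moments are frequency-weighted; `wmean`, `cells` and `cond_mean` are illustrative names introduced here.

```python
import math

# Joint frequency table of Example 2: one row per value of X, one column per value of Y.
x_values = [2, 3, 5]
y_values = [1, 2, 4, 5]
table = [
    [9, 0, 4, 0],  # X = 2
    [0, 1, 0, 5],  # X = 3
    [0, 1, 0, 0],  # X = 5
]
n_total = sum(sum(row) for row in table)

# Flatten the table into (x, y, frequency) triples for the non-empty cells.
cells = [
    (x, y, f)
    for x, row in zip(x_values, table)
    for y, f in zip(y_values, row)
    if f
]
xs = [x for x, _, _ in cells]
ys = [y for _, y, _ in cells]

def wmean(values):
    """Frequency-weighted mean of a list holding one value per cell."""
    return sum(v * f for v, (_, _, f) in zip(values, cells)) / n_total

x_bar, y_bar = wmean(xs), wmean(ys)
s2_x = wmean([(x - x_bar) ** 2 for x in xs])
s2_y = wmean([(y - y_bar) ** 2 for y in ys])
s_xy = wmean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])

# Linear correlation coefficient: its sign gives the direction of the association.
r = s_xy / math.sqrt(s2_x * s2_y)

# Type I regression: the theoretical value of each cell is the conditional mean of its row.
cond_mean = {
    x: sum(y * f for y, f in zip(y_values, row)) / sum(row)
    for x, row in zip(x_values, table)
}
s2_fit = wmean([(cond_mean[x] - y_bar) ** 2 for x in xs])
r2_type1 = s2_fit / s2_y

print(round(r2_type1, 3), round(r ** 2, 3), round(r, 3))
```

In this example the type I regression explains a much larger share of the variance of Y than the straight line does, and r is positive, which indicates a positive association.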