Multiple Regression Example
relationships. Because we are short of time we will use a less involved approach and limit the
number of predictor variables in our examples.
Now the whole picture becomes more complex. We are concerned with the nature and
significance of the relations between the independent variables and the dependent variable.
Questions which are often asked are as follows:
1. What is the relative importance of the different factors?
2. What is the magnitude of the effect of each of the independent
variables on the dependent variable?
3. Can any independent variable be dropped because of a lack of
effect on the dependent variable?
4. Should any independent variables not yet included in the model
be considered for possible inclusion?
Simple Example of a Multiple Regression
Obs.     Y     X1      X2
 1      46     14     1.00
 2      51     15     1.25
 3      69     16     3.00
 4      74     17     3.25
 5      80     18     4.00
 6      82     19     5.25
 7      97     20     5.50
In the data above, X1 and X2 are two independent variables and Y is a dependent
variable. Perhaps the most useful summary of the information from this data set is provided by
the analysis of variance for regression:
Source              d.f.       S.S.        M.S.
Regression           2       1832.13      916.06
   X1                1       1824.14     1824.14 **
   X2                1          7.99        7.99 NS
Deviation from
   regression        4         83.30       20.82
------------------------------------------------
Total (corr.)        6       1915.43
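The entries of this table can be checked directly from the data. The following is a minimal sketch in plain Python, not the procedure of any particular statistics package. The helper name `corrected_sp` is ours, and Y for observation 4 is taken as 74, the value implied by the totals ΣY = 499 and ΣX1Y = 8709 in the normal equations.

```python
# Data from the example table (Y for observation 4 assumed to be 74, see lead-in).
Y = [46, 51, 69, 74, 80, 82, 97]
X1 = [14, 15, 16, 17, 18, 19, 20]
X2 = [1.00, 1.25, 3.00, 3.25, 4.00, 5.25, 5.50]
n = len(Y)


def corrected_sp(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


s11, s22, s12 = corrected_sp(X1, X1), corrected_sp(X2, X2), corrected_sp(X1, X2)
s1y, s2y, syy = corrected_sp(X1, Y), corrected_sp(X2, Y), corrected_sp(Y, Y)

# Slopes of the two-predictor model from the corrected (centered) normal equations.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

ss_total = syy                     # total (corrected), n - 1 d.f.
ss_reg = b1 * s1y + b2 * s2y       # regression on X1 and X2, 2 d.f.
ss_x1 = s1y ** 2 / s11             # X1 entered first, 1 d.f.
ss_x2_after_x1 = ss_reg - ss_x1    # extra sum of squares for X2 given X1, 1 d.f.
ss_error = ss_total - ss_reg       # deviation from regression, n - 3 d.f.
ms_error = ss_error / (n - 3)

rows = [
    ("Regression", 2, ss_reg),
    ("  X1", 1, ss_x1),
    ("  X2 after X1", 1, ss_x2_after_x1),
    ("Deviation", n - 3, ss_error),
]
for source, df, ss in rows:
    print(f"{source:<15}{df:>3}{ss:>10.2f}{ss / df:>10.2f}")
print(f"{'Total (corr.)':<15}{n - 1:>3}{ss_total:>10.2f}")

# F-ratios for the sequential terms, each tested against the error mean square.
f_x1 = ss_x1 / ms_error
f_x2_after_x1 = ss_x2_after_x1 / ms_error
print(f"F(X1) = {f_x1:.2f}, F(X2 after X1) = {f_x2_after_x1:.2f}")
```

The F-ratio for X1 is far above the tabulated 1% value of 21.20 for 1 and 4 d.f., while the ratio for X2 after X1 is below 1, which is what the ** and NS flags in the table record.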
The least squares equation for the model containing both X1 and X2 is obtained by solving the
following normal equations, where X0 = 1 for every observation:

    ΣX0² b0   +  ΣX0X1 b1  +  ΣX0X2 b2  =  ΣX0Y
    ΣX0X1 b0  +  ΣX1² b1   +  ΣX1X2 b2  =  ΣX1Y
    ΣX0X2 b0  +  ΣX1X2 b1  +  ΣX2² b2   =  ΣX2Y
Numerically, these equations are

        7 b0  +     119 b1  +   23.25 b2  =   499
      119 b0  +    2051 b1  +  417.75 b2  =  8709
    23.25 b0  +  417.75 b1  +   95.94 b2  =  1841.25
Solution of the above set of normal equations gives the following regression equation:

    Ŷ = -29.231 + 5.219 X1 + 3.549 X2
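As a check on the arithmetic, the normal equations can be built from the data and solved numerically. This is a sketch in plain Python under stated assumptions: `solve_linear_system` and `dot` are illustrative helpers of our own (a small Gauss-Jordan elimination), and Y for observation 4 is taken as 74, the value implied by ΣY = 499.

```python
# Data from the example table (Y for observation 4 assumed to be 74, see lead-in).
Y = [46, 51, 69, 74, 80, 82, 97]
X1 = [14, 15, 16, 17, 18, 19, 20]
X2 = [1.00, 1.25, 3.00, 3.25, 4.00, 5.25, 5.50]


def dot(u, v):
    """Sum of products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(a, rhs):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix [A | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# X0 is the column of ones that carries the intercept b0.
X0 = [1.0] * len(Y)
columns = [X0, X1, X2]

# Coefficient matrix (sums of squares and products) and right-hand side.
xtx = [[dot(ci, cj) for cj in columns] for ci in columns]
xty = [dot(ci, Y) for ci in columns]

b0, b1, b2 = solve_linear_system(xtx, xty)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
```

Printing `xtx` and `xty` shows the sums that appear as coefficients in the numerical normal equations. A linear algebra library would normally replace the hand-written solver.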
From the analysis of variance, we see that X1, entered first into the model, is the only
predictor variable which is required.
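If X1 is ignored, we have a simple regression of Y on X2 which gives b2 = 9.825. The
t-test of the hypothesis β2 = 0 in this simple linear regression model is as follows:

    t = b2 / √(s²/Σx2²) = 9.825 / √(21.827/18.714) = 9.10**

where s² = 21.827 is the error mean square of the simple regression and Σx2² = 18.714 is the
corrected sum of squares of X2.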
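The simple regression of Y on X2 and its t-test can be reproduced as follows. This is a sketch only: the helper `corrected_sp` is our own name, and Y for observation 4 is assumed to be 74, as implied by ΣY = 499.

```python
import math

# Data from the example table (Y for observation 4 assumed to be 74, see lead-in).
Y = [46, 51, 69, 74, 80, 82, 97]
X2 = [1.00, 1.25, 3.00, 3.25, 4.00, 5.25, 5.50]
n = len(Y)


def corrected_sp(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


s22 = corrected_sp(X2, X2)   # corrected sum of squares of X2
s2y = corrected_sp(X2, Y)    # corrected sum of products of X2 and Y
syy = corrected_sp(Y, Y)     # total (corrected) sum of squares of Y

slope = s2y / s22                       # b2 in the simple regression of Y on X2
ss_residual = syy - s2y ** 2 / s22      # deviation from the simple regression
s2 = ss_residual / (n - 2)              # error mean square with n - 2 d.f.
se_slope = math.sqrt(s2 / s22)          # standard error of the slope
t_value = slope / se_slope

print(f"b2 = {slope:.3f}, s^2 = {s2:.3f}, t = {t_value:.2f} with {n - 2} d.f.")
```

The resulting t is compared with the tabulated two-sided 1% value of 4.032 for 5 d.f.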
This shows that if either variable is included alone it is significant. However, the
second predictor variable isn't needed. This tells us that the two independent variables are
highly correlated.
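The degree of correlation between the two predictors can be computed from the data. A minimal sketch, with `corrected_sp` as an illustrative helper name of our own:

```python
import math

# Predictor columns from the example table.
X1 = [14, 15, 16, 17, 18, 19, 20]
X2 = [1.00, 1.25, 3.00, 3.25, 4.00, 5.25, 5.50]


def corrected_sp(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


# Simple correlation coefficient between X1 and X2.
r12 = corrected_sp(X1, X2) / math.sqrt(corrected_sp(X1, X1) * corrected_sp(X2, X2))
print(f"r(X1, X2) = {r12:.3f}")
```

A value this close to 1 means that X2 carries almost no information about Y that is not already supplied by X1.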
R², the coefficient of multiple determination, is

    R² = SS Regression/SS_Y = 1832.13/1915.43 = .957

R, the coefficient of multiple correlation, is √R² = .978.
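The test of the hypothesis that together these variables are not contributing
significantly to the explanation of Y is carried out by forming an F-ratio of mean square
regression over mean square error. Here F = 916.06/20.82 = 44.0 with 2 and 4 degrees of
freedom, which exceeds the tabulated 1% value of 18.00.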
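The arithmetic for R², R and the overall F-ratio uses only the sums of squares from the analysis of variance table. A short sketch, with variable names of our own choosing:

```python
import math

# Sums of squares and degrees of freedom taken from the analysis of variance table.
ss_regression, df_regression = 1832.13, 2
ss_error, df_error = 83.30, 4
ss_total = 1915.43

r_squared = ss_regression / ss_total      # coefficient of multiple determination
r_multiple = math.sqrt(r_squared)         # coefficient of multiple correlation

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_ratio = ms_regression / ms_error        # tests that both slopes are zero

print(f"R^2 = {r_squared:.3f}, R = {r_multiple:.3f}")
print(f"F = {f_ratio:.2f} with {df_regression} and {df_error} d.f.")
```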
Since the hypothesis is rejected, we conclude that one or both of these variables is (are)
contributing significantly to the explanation of Y. The predicted values for each of the
combinations of the Xi in the original data are:
Obs.    Predicted Y
 1        47.388
 2        53.495
 3        64.926
 4        71.032
 5        78.913
 6        88.569
 7        94.676

The expression for the variance of the predicted mean value of Y at the point X1 =
18.5 and X2 = 4.5 (a point not in the data set) is calculated as follows:

                               | 179.097  -13.713   16.310 |  |  1   |
    V(Ŷ) = s² [1  18.5  4.5]   | -13.713    1.054   -1.268 |  | 18.5 |
                               |  16.310   -1.268    1.577 |  |  4.5 |
                                      INVERSE MATRIX

where s² = error mean square = 20.82.
Note that the right-most expressions are matrices. We are dealing with the product of
matrices in computing this variance.
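The matrix product can be carried out numerically as sketched below. The inverse is that of the matrix of sums of squares and products from the normal equations. The function `invert` is an illustrative Gauss-Jordan routine of our own, not a library call, and s² = 20.82 is taken from the analysis of variance.

```python
import math

# Predictor columns from the example table, X0 being the column of ones.
X0 = [1.0] * 7
X1 = [14, 15, 16, 17, 18, 19, 20]
X2 = [1.00, 1.25, 3.00, 3.25, 4.00, 5.25, 5.50]
s2 = 20.82  # error mean square from the analysis of variance


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


columns = [X0, X1, X2]
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
inverse = invert(xtx)

# Point at which the mean of Y is predicted: X1 = 18.5, X2 = 4.5.
x_point = [1.0, 18.5, 4.5]
quadratic_form = sum(
    x_point[i] * inverse[i][j] * x_point[j] for i in range(3) for j in range(3)
)
variance = s2 * quadratic_form
std_error = math.sqrt(variance)

for row in inverse:
    print("  ".join(f"{v:10.4f}" for v in row))
print(f"variance = {variance:.3f}, standard error = {std_error:.3f}")
```

The square root of this variance is the standard error of the predicted mean, which is what regression programs usually report.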
Multiple regression, although appearing complicated, is necessary because many
biological phenomena are dependent upon more than one factor. This is a very useful
technique in agricultural research.
For your applications it would seem advisable to keep the number of predictor
variables to a manageable number. Not only does the computing become difficult with many
X variables but the prediction equation becomes unwieldy. Often it is possible to find a few
important predictor variables which may be used to predict Y well. Assessment of the relative
importance of the predictor variables is part of the process in conducting a multiple regression
analysis.
Standard Output From Regression Programs
1. The analysis of variance for the regression
2. R²
3. The regression equation
4. Standard errors and t values for the regression coefficients
5. Predicted Y's are usually optional
6. The standard errors of predicted Y's.
Kinds of Regression Models
1. Prediction - The object here is to find a set of predictors which do
a good job of predicting Y, the dependent variable. Less emphasis
is placed on the biological relevance of the factors. The main test
is whether or not a variable aids in the prediction of Y. There are
procedures available using computer packages which assist in the
building of a prediction model given that we know little of the
biological relevance of the factors. Some of these are: stepwise
regression, forward selection, maximum R² improvement, backward
elimination and all possible regressions.
2. Control - Here the levels of one or more quantitative factor(s)
are controlled in an experimental setting and then the
response to the controlled variable(s) is related to the levels
of the factors by multiple regression. We are usually trying
to determine optimum combinations or optimum conditions using
such an approach. A good example is an N-P-K fertilizer
experiment in which each factor has four levels. We then fit a
“response surface” to find the “physical optimum” and the
combination of rates of N, P and K which produce this optimum.
Usually part of the analysis is an economic analysis to
determine the economic optimum combination of N, P and K.
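Of the model-building procedures listed under Prediction, "all possible regressions" is the simplest to illustrate. The sketch below fits every non-empty subset of the two predictors from the earlier example and reports R² for each. It is an illustration under stated assumptions, not the algorithm of any particular computer package: `solve_linear_system` and `r_squared` are helper names of our own, and Y for observation 4 is taken as 74, as implied by ΣY = 499.

```python
from itertools import combinations

# Data from the earlier example (Y for observation 4 assumed to be 74, see lead-in).
Y = [46, 51, 69, 74, 80, 82, 97]
predictors = {
    "X1": [14, 15, 16, 17, 18, 19, 20],
    "X2": [1.00, 1.25, 3.00, 3.25, 4.00, 5.25, 5.50],
}
n = len(Y)


def solve_linear_system(a, rhs):
    """Solve a small square system by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def r_squared(names):
    """Fit Y on the named predictors (plus an intercept) and return R-squared."""
    cols = [[1.0] * n] + [predictors[name] for name in names]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * y for a, y in zip(ci, Y)) for ci in cols]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(coef, cols)) for i in range(n)]
    y_mean = sum(Y) / n
    ss_total = sum((y - y_mean) ** 2 for y in Y)
    ss_error = sum((y - f) ** 2 for y, f in zip(Y, fitted))
    return 1.0 - ss_error / ss_total


# Every non-empty subset of the candidate predictors.
results = {
    subset: r_squared(subset)
    for size in range(1, len(predictors) + 1)
    for subset in combinations(predictors, size)
}
for subset, value in results.items():
    print(f"{' + '.join(subset):<10} R^2 = {value:.4f}")
```

With only two candidates there are three models to compare. The number of subsets doubles with each added predictor, which is one reason the stepwise procedures are used when many candidate variables are available.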