
Multiple Regression Analysis

General descriptive form of a multiple linear equation:

Y-hat = a + b1X1 + b2X2 + ... + bkXk


Use k to represent the number of independent variables; k can be any positive integer, where:

- a is the intercept, the value of Y-hat when all the Xs are zero.
- bj is the amount by which Y-hat changes when that particular Xj increases by one unit, with the values of all other independent variables held constant.

When there are two independent variables, the regression equation is:

Y-hat = a + b1X1 + b2X2
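A quick numerical sketch of fitting this two-variable equation with ordinary least squares. The data are made up for illustration (Y is generated roughly as 1 + 2*X1 + 0.5*X2 plus small noise); only numpy is assumed:

```python
import numpy as np

# Made-up sample data: Y is roughly 1 + 2*X1 + 0.5*X2 plus small noise.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([4.1, 5.4, 9.2, 10.3, 14.1, 15.4])

# Design matrix: a column of ones for the intercept a, then X1 and X2.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares estimates of [a, b1, b2].
coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coefs

# b1 is the estimated change in Y-hat for a one-unit increase in X1,
# holding X2 constant (and likewise for b2 with X1 held constant).
y_hat = a + b1 * 3.0 + b2 * 2.0  # estimate of Y for X1 = 3, X2 = 2
print(a, b1, b2, y_hat)
```

The fitted coefficients land near the values used to generate the data, which is the point of the interpretation above: each bj is a per-unit effect with the other X held constant.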
If a multiple regression analysis contains more than two independent variables, we cannot use a graph to illustrate it, since graphs are limited to three dimensions.
- a = where the regression equation (plane) intersects the Y-axis; the regression intercept, or constant
- b1 = first regression coefficient: for each one-unit increase in X1, Y-hat changes by b1 (mind the signs when substituting values)
- Y-hat is used to estimate the value of Y; to use the regression equation, you need to know the values of the regression coefficients bj
- If a regression coefficient is negative, the relationship is negative/inverse

Evaluating a Multiple Regression Analysis: The ANOVA Table

The statistical analysis of a multiple regression equation is summarized in an ANOVA table. The total variation of the dependent variable (Y) is divided into two components:
- Regression: the variation of Y explained by all the independent variables
- Error or residual: the unexplained variation of Y, the difference between the actual and estimated values
These two sources are identified in the first column of the ANOVA table.
- Total degrees of freedom = n - 1
- Regression degrees of freedom = k, the number of independent variables in the multiple regression equation
- Degrees of freedom associated with the error term = total df minus regression df

In multiple regression, error df = n - (k + 1).

Sum of Squares (SS) refers to the amount of variation for each source:
- SS Total = total sum of squares
- SSR = regression sum of squares
- SSE = residual or error sum of squares
- MS = mean square for regression and for residual, calculated as SS/df
A smaller multiple standard error indicates a better or more effective prediction equation.

Coefficient of Multiple Determination (R2): the percent of the variation in the dependent variable, Y, explained by the variation in the set of independent variables X1, etc.
- Can range from 0 to 1; cannot assume negative values
- R2 = SSR / SS Total

Adjusted Coefficient of Determination

The coefficient of determination tends to increase as more independent variables are added to a multiple regression model. Each new variable makes the predictions more accurate, making SSE smaller and SSR larger, so R2 increases simply because the total number of independent variables increases. If k = n, the coefficient of determination is 1.0. The adjusted version is found by dividing SSE and SS Total by their respective degrees of freedom:

Adjusted R2 = 1 - [SSE / (n - (k + 1))] / [SS Total / (n - 1)]

Inferences in Multiple Linear Regression

When you create confidence intervals or perform hypothesis tests, you are viewing the data as a random sample from a population. In multiple regression, we assume there is an unknown population regression equation that relates the dependent variable to the k independent variables. This is called a model of the relationship.
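The ANOVA quantities described above can be computed directly. A minimal sketch using made-up data (n = 6 observations, k = 2 predictors; only numpy is assumed):

```python
import numpy as np

# Made-up data: n = 6 observations, k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([4.1, 5.4, 9.2, 10.3, 14.1, 15.4])
n, k = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b

SS_total = np.sum((Y - Y.mean()) ** 2)  # total variation in Y
SSE = np.sum((Y - Y_hat) ** 2)          # residual (unexplained) variation
SSR = SS_total - SSE                    # variation explained by the regression

df_reg = k             # regression df = number of independent variables
df_err = n - (k + 1)   # error df
MSR, MSE = SSR / df_reg, SSE / df_err   # mean squares = SS / df

R2 = SSR / SS_total
# Adjusted R2: each SS divided by its respective degrees of freedom.
R2_adj = 1 - (SSE / df_err) / (SS_total / (n - 1))
print(SSR, SSE, SS_total, R2, R2_adj)
```

Note that SSR + SSE = SS Total by construction, R2 stays between 0 and 1, and the adjusted R2 is never larger than R2, matching the properties listed above.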

Y = α + β1X1 + β2X2 + ... + βkXk

Greek letters = population parameters; a and the bj are sample statistics. A regression coefficient such as b2 is a point estimate for β2, follows the normal probability distribution, and the means of the sampling distributions are equal to the parameter values to be estimated.

Global Test: Testing the Multiple Regression Model

Global Test: investigates whether it is possible that all the independent variables have zero regression coefficients.

The test of whether the regression coefficients in the population are all zero uses:

Null hypothesis, H0: β1 = β2 = β3 = 0 (none of the independent variables can be used to estimate the dependent variable)
Alternate hypothesis, H1: not all the βs are 0
To test the null hypothesis, employ the F distribution:
- There is a family of F distributions
- F cannot be negative
- The distribution is continuous and positively skewed
- As X increases, the F curve approaches the horizontal axis but never touches it
The numerator is the regression sum of squares divided by its degrees of freedom, k; the denominator is the residual sum of squares divided by its degrees of freedom, n - (k + 1). The test addresses the basic null hypothesis that the two mean squares are equal. If we reject the null, F will be large (in the far right tail of the F distribution) and the p-value will be small, less than the significance level. The critical value of F is found in Appendix B.4.

Testing the null can also be based on the p-value, defined as the probability of observing an F-value as large as or larger than the computed F if the null hypothesis is true. If the p-value is less than the significance level, reject the null. An advantage of using the p-value is that it gives us the probability of making a Type I error.

Evaluating Individual Regression Coefficients

If βj equals 0, that particular independent variable is of no value in explaining any variation in the dependent variable. If there are coefficients for which the null hypothesis cannot be rejected, we may want to eliminate those variables from the regression equation.
- The test statistic follows the t distribution with n - (k + 1) df; the critical value for t is in Appendix B.2
- The standard error estimates the variability of each regression coefficient
- Test the independent variables individually to determine whether the net regression coefficients differ from zero; bi refers to any regression coefficient and sbi refers to the standard deviation of the sampling distribution of that regression coefficient
- p-values can also be used to test individual regression coefficients
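The global F test can be sketched numerically. The data below are made up (as in the earlier sketches), and scipy is assumed to be available for the F distribution:

```python
import numpy as np
from scipy import stats  # assumed available for the F distribution

# Made-up data: n = 6 observations, k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([4.1, 5.4, 9.2, 10.3, 14.1, 15.4])
n, k = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

SS_total = np.sum((Y - Y.mean()) ** 2)
SSE = np.sum((Y - X @ b) ** 2)
SSR = SS_total - SSE

# F = (SSR / k) / (SSE / (n - (k + 1))): ratio of the two mean squares.
F = (SSR / k) / (SSE / (n - (k + 1)))

# p-value: probability of an F this large or larger if H0 (all betas = 0)
# were true, from the F distribution with k and n - (k + 1) df.
p_value = stats.f.sf(F, k, n - (k + 1))
print(F, p_value)
```

Here the toy data fit very well, so F is large and the p-value falls far below any usual significance level, i.e. the null of all-zero coefficients is rejected.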

t = (bi - 0) / sbi
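This t statistic can be computed for every coefficient from the estimated standard errors. A sketch with the same made-up data (scipy assumed for the t distribution; the standard errors come from the diagonal of MSE * (X'X)^-1):

```python
import numpy as np
from scipy import stats  # assumed available for the t distribution

# Made-up data: n = 6 observations, k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([4.1, 5.4, 9.2, 10.3, 14.1, 15.4])
n, k = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

df_err = n - (k + 1)
MSE = np.sum((Y - X @ b) ** 2) / df_err

# Standard error of each coefficient: sqrt of MSE times the diagonal
# of (X'X)^-1. s_b[1] is sb1 for b1, s_b[2] is sb2 for b2.
s_b = np.sqrt(MSE * np.diag(np.linalg.inv(X.T @ X)))

t = b / s_b  # t = (bi - 0) / sbi for each coefficient
# Two-tailed p-values from the t distribution with n - (k + 1) df.
p = 2 * stats.t.sf(np.abs(t), df=df_err)
print(t, p)
```

Each p-value can then be compared against the significance level, exactly as described above for deciding whether an individual coefficient differs from zero.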

Strategy for deleting independent variables

1. Determine the regression equation.
2. Conduct the global test: find the df, refer to Appendix B.4 at the chosen significance level to set the decision rule, and compute the value of F.
3. Conduct tests of the regression coefficients individually to determine whether one or both regression coefficients differ from 0.
4. This whole process can be automated by statistical software.

Evaluating the Assumptions of Multiple Regression (validity of the individual and global tests):
1. There is a linear relationship.
2. The variation is the same for both large and small values of Y-hat.
3. The residuals follow the normal distribution.
4. The independent variables should not be correlated.
5. The residuals are independent.
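A few of these assumptions can be checked numerically from the residuals. A rough sketch with the same made-up data (numpy assumed; the Durbin-Watson-style statistic below is a simple stand-in for a formal independence test):

```python
import numpy as np

# Made-up data, as in the earlier sketches.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = np.array([4.1, 5.4, 9.2, 10.3, 14.1, 15.4])
n = len(Y)

X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
residuals = Y - X @ b

# Assumptions 1-3: residuals should scatter around zero with similar
# spread (plotting them against Y-hat is the usual visual check; here
# we just confirm the mean residual is zero).
mean_resid = residuals.mean()

# Assumption 4: the independent variables should not be correlated.
# In this toy data r is about 0.83, which this check would flag.
r12 = np.corrcoef(X1, X2)[0, 1]

# Assumption 5: successive residuals should be independent; a
# Durbin-Watson-style statistic near 2 suggests no autocorrelation
# (values near 0 or 4 suggest positive or negative autocorrelation).
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(mean_resid, r12, dw)
```

Note how the correlation check catches a problem the fit itself hides: the regression can look excellent even when the predictors are strongly correlated, which is exactly why assumption 4 is checked separately.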
