4.3 How to Find the Regression Equation
4.4 Properties of the Regression Coefficients
The dependent variable is also known as the regressand, predicted, or explained variable. The independent variable is also known as the regressor, predictor, or explanatory variable. Simple regression is used to examine the relationship between one dependent and one independent variable. After performing the analysis, the regression statistics can be used to predict the dependent variable when the independent variable is known. Regression goes beyond correlation by adding prediction capabilities.

The regression line (known as the least squares line) is a plot of the expected value of the dependent variable for all values of the independent variable. Technically, it is the line that "minimizes the squared residuals": the regression line is the one that best fits the data on a scatterplot. In the regression equation, y is the dependent variable and x is the independent variable. Here are three equivalent ways to mathematically describe a linear regression model:

1. y = intercept + (slope × x) + error
2. y = constant + (coefficient × x) + error
3. y = a + bx + e
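The claim that the least squares line "minimizes the squared residuals" can be checked directly: no other candidate line has a smaller sum of squared residuals. A minimal Python sketch, using made-up data (not from the text):

```python
# Illustrative data, not from the text
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

def sum_sq_residuals(a, b):
    """Sum of squared residuals e = y - (a + b*x) for a candidate line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Least squares estimates via the deviation-from-mean formulas
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
    / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

best = sum_sq_residuals(a, b)
# Perturbing the slope or intercept can only increase the squared residuals
assert best <= sum_sq_residuals(a + 0.5, b)
assert best <= sum_sq_residuals(a, b + 0.2)
```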
The slope quantifies the steepness of the line. It equals the change in Y for each unit change in X. It is expressed in the units of the Y-axis divided by the units of the X-axis. If the slope is positive, Y increases as X increases. If the slope is negative, Y decreases as X increases.
The Y intercept is the Y value of the line when X equals zero. It defines the elevation of the line. For two variables X and Y, we will have two regression lines, and they show the mutual relationship between the two variables. The regression line of Y on X gives the most probable estimate of the values of Y for given values of X, whereas the regression line of X on Y gives the most probable estimate of the values of X for given values of Y. Only one regression line: in the case of perfect correlation (r = ±1), both lines of regression coincide and we get only one line.
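The two regression lines are linked: the product of the two regression coefficients equals r², which is why the lines coincide exactly when r = ±1. A short Python sketch of this property, reusing the five aptitude-test observations from the worked example in this chapter:

```python
import math

# Aptitude scores and statistics grades from the worked example
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

byx = sxy / sxx                      # regression coefficient of y on x
bxy = sxy / syy                      # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)       # correlation coefficient

# byx * bxy = r**2, so the lines merge only when r = +1 or r = -1
assert math.isclose(byx * bxy, r ** 2)
```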
Method 2: by using the deviations-from-arithmetic-mean formula (for estimating byx).
Five randomly selected students took a math aptitude test before they began their statistics course. The Statistics Department has three questions:
i. What linear regression equation best predicts statistics performance, based on math aptitude scores?
ii. If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
iii. How well does the regression equation fit the data?
In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades. The last two rows show sums and mean scores that we will use to conduct the regression analysis.

Student   xi    yi   (xi - xmean)  (yi - ymean)  (xi - xmean)²  (yi - ymean)²  (xi - xmean)(yi - ymean)
1         95    85        17             8             289             64              136
2         85    95         7            18              49            324              126
3         80    70         2            -7               4             49              -14
4         70    65        -8           -12              64            144               96
5         60    70       -18            -7             324             49              126
Sum      390   385                                     730            630              470
Mean      78    77
The regression equation is a linear equation of the form:

y - ymean = byx (x - xmean)

where byx is the regression coefficient of y on x:

byx = Σ(xi - xmean)(yi - ymean) / Σ(xi - xmean)² = 470 / 730 = 0.643836

Substituting the means from the table:

y - 77 = 0.643836 (x - 78)
y = 0.643836 x + 26.78082

Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x), perform the computation, and you have an estimated value (y) for the dependent variable.
In our example, the independent variable is the student's score on the aptitude test. The dependent variable is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade would be:

y = 0.643836 × 80 + 26.78082 = 51.507 + 26.781 = 78.288
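The whole computation above can be reproduced in a few lines. A Python sketch of the deviation-from-mean method, using the data from the table:

```python
xs = [95, 85, 80, 70, 60]   # aptitude scores
ys = [85, 95, 70, 65, 70]   # statistics grades

xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)   # 78 and 77

# byx = sum of cross-deviations / sum of squared x-deviations = 470/730
byx = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
      / sum((x - xbar) ** 2 for x in xs)
a = ybar - byx * xbar                                # intercept

print(f"y = {byx:.6f} x + {a:.5f}")                  # y = 0.643836 x + 26.78082
print(f"prediction at x = 80: {byx * 80 + a:.3f}")   # 78.288
```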
4.5 Difference between Correlation and Regression
The difference between regression and correlation needs to be emphasised. Both methods attempt to describe the association between two (or more) variables, and are often confused by students and professional scientists alike!
1. Correlation makes no a priori assumption as to whether one variable is dependent on the other(s), and it is not concerned with the relationship between the variables; instead it gives an estimate of the degree of association between them. In fact, correlation analysis tests for interdependence of the variables.
2. Regression attempts to describe the dependence of a variable on one (or more) explanatory variables; it implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of effect is direct or indirect.
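The distinction shows up numerically: correlation is symmetric in x and y, while regression is not, so swapping the variables leaves r unchanged but changes the regression coefficient. A sketch with made-up data:

```python
import math

def corr(us, vs):
    """Correlation coefficient; symmetric in its two arguments."""
    ub, vb = sum(us) / len(us), sum(vs) / len(vs)
    suv = sum((u - ub) * (v - vb) for u, v in zip(us, vs))
    return suv / math.sqrt(sum((u - ub) ** 2 for u in us)
                           * sum((v - vb) ** 2 for v in vs))

def slope(us, vs):
    """Regression coefficient of v on u; NOT symmetric."""
    ub, vb = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - ub) * (v - vb) for u, v in zip(us, vs)) \
           / sum((u - ub) ** 2 for u in us)

# Illustrative data, not from the text
xs = [2, 4, 5, 7, 9]
ys = [10, 12, 17, 18, 24]

assert math.isclose(corr(xs, ys), corr(ys, xs))        # same either way
assert not math.isclose(slope(xs, ys), slope(ys, xs))  # direction matters
```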
4.6 Standard Error of Estimate
The standard error of estimate is also called the standard deviation of the error term e. It measures the variability of the observed values around the regression line:

Syx = √[ (Σy² - aΣy - bΣxy) / (n - 2) ]
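Applying this formula to the worked example from this chapter gives a concrete value. A Python sketch in which the shortcut form above is also checked against the residuals computed directly:

```python
import math

xs = [95, 85, 80, 70, 60]   # aptitude scores from the worked example
ys = [85, 95, 70, 65, 70]   # statistics grades
n = len(xs)

b = 470 / 730               # byx from the example
a = 77 - b * 78             # intercept = 26.78082

sum_y2 = sum(y * y for y in ys)                  # Σy²
sum_y = sum(ys)                                  # Σy
sum_xy = sum(x * y for x, y in zip(xs, ys))      # Σxy

# Shortcut form of the standard error of estimate
se = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Same answer from the residuals around the regression line
se_check = math.sqrt(sum((y - (a + b * x)) ** 2
                         for x, y in zip(xs, ys)) / (n - 2))
assert math.isclose(se, se_check)
print(f"{se:.2f}")   # about 10.45
```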