Where the average squared
distance of the points from the
regression line is minimized
Minimizing Prediction Error: What
that Means (For Math Types)
The Regression Coefficient: Close
Your Eyes if You Don’t Want the
Derivation
• by = rxy (sy/sx)
– by = regression coefficient
– rxy = correlation between X and Y
– sy = standard deviation of Y
– sx = standard deviation of X
• Compute by: divide the standard deviation
of Y (sy) by the standard deviation of X (sx),
then multiply by the Pearson correlation
(rxy) between X and Y
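The recipe above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the data in `xs` and `ys` are made up, and the helper names `pearson_r` and `regression_coefficient` are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def regression_coefficient(xs, ys):
    """by = rxy * (sy / sx), following the slide's formula."""
    return pearson_r(xs, ys) * (stdev(ys) / stdev(xs))

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_y = regression_coefficient(xs, ys)
print(round(b_y, 4))
```

For these made-up numbers the slope comes out at about 0.6, the same value obtained by dividing the sum of cross-products by the sum of squared X deviations.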
The Constant (a): More Math
• Regression Constant (a): the height of the
regression line; the value of Y where the regression line
crosses the Y axis, that is, where X = 0 (the Y intercept)
• a = Ȳ - by X̄
– a = the regression constant
– Ȳ = mean of Y
– by = regression coefficient
– X̄ = mean of X
• Compute a: multiply X̄ (mean of X) by the
regression coefficient (by) and then subtract that
product from Ȳ (mean of Y)
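As a rough sketch of that computation (again with made-up data, and with the hypothetical helper names `regression_constant` and `predict`), the constant follows directly from the two means and the slope:

```python
from statistics import mean

def regression_constant(xs, ys, b_y):
    """a = mean(Y) - by * mean(X)."""
    return mean(ys) - b_y * mean(xs)

def predict(x, a, b_y):
    """Predicted Y for a given X on the regression line."""
    return a + b_y * x

# Hypothetical data; 0.6 is the least-squares slope for these values.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_y = 0.6

a = regression_constant(xs, ys, b_y)
print(round(a, 4))                           # the Y intercept
print(round(predict(mean(xs), a, b_y), 4))   # prediction at the mean of X
```

One consequence of the formula is that the regression line always passes through the point (mean of X, mean of Y), which the second print statement illustrates.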
Plotting Regression Line
Errors of Prediction
Standard Error of Estimate
[Figure: Graduate GPA plotted against Undergraduate GPA]
Standard Error of Estimate: Another
Visual Representation
Is the prediction worth pursuing?
• Standard error
• Amount of variance explained by X
• Testing the regression coefficient (b) for
significance
Explaining Variance: How much?
• Total Variance: Σ(Y - Ȳ)²
• Predicted Variance: Σ(Y' - Ȳ)²
• Unpredicted Variance: Σ(Y - Y')²
Assessing Prediction Accuracy:
Explaining Variance
• Total Variance = Predicted variance + Residual
(unexplained) variance
• Coefficient of Determination (r²): Proportion of total
variance in Y that has been predicted by variable X
(r² = s²y' / s²y)
– Our example: r = .56, so r² = .3136
• Coefficient of Non-Determination (1 - r²): Proportion
of total variance in Y that is not predicted by X
– Our example: 1 - r² = 1 - .31 = .69
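The decomposition can be checked numerically. The sketch below uses made-up data (not the GPA example from the slides) and a hypothetical helper `fit_line`. It splits the total sum of squared deviations into a predicted part and a residual part, then takes their ratio.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares constant a and slope b for Y' = a + b*X."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
y_bar = mean(ys)
preds = [a + b * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)                 # total
ss_predicted = sum((p - y_bar) ** 2 for p in preds)          # explained by X
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, preds))   # unexplained

r_squared = ss_predicted / ss_total
print(round(ss_total, 4), round(ss_predicted + ss_residual, 4))
print(round(r_squared, 4), round(1 - r_squared, 4))
```

The first line of output shows the two parts adding back up to the total; the second shows the coefficients of determination and non-determination for this toy data set.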
Proportion of Explained (Predicted)
and Unexplained (Residual) Variance
[Figure: diagram of X and Y with rxy = .56; explained variance r² = .31 (31%)]
t-Test for Individual Regression
Coefficients (by)
• H0: β = 0 (where β is the population
regression coefficient)
• H1: β ≠ 0
• Compute a t statistic:
• t = (b - β)/sb = b/sb (how many standard
errors b lies from the hypothesized
population parameter under the null
hypothesis, β = 0)
t-Test of b: Our Example
• t = .24/.12 = 2.00
• Set alpha at .05 (two-tailed)
• Figure out df (N-2): 8
• t critical (.05/2, 8) = 2.306
• Decision: tobserved (2.00) < tcritical (2.306) so
do not reject the null hypothesis
• Conclusion: cannot conclude that the slope
is significantly different from 0 in the
population.
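The decision rule can be written out as a short sketch using the numbers from the example. The function name `t_statistic` is hypothetical, N = 10 is inferred from df = N - 2 = 8, and the critical value is copied from the slide rather than computed.

```python
def t_statistic(b, s_b, beta_null=0.0):
    """t = (b - beta) / sb; under H0, beta = 0 so t = b / sb."""
    return (b - beta_null) / s_b

b = 0.24          # sample regression coefficient (from the example)
s_b = 0.12        # standard error of b (from the example)
n = 10            # assumption: inferred from df = N - 2 = 8
df = n - 2

t_critical = 2.306   # two-tailed, alpha = .05, df = 8 (value from the slide)
t_observed = t_statistic(b, s_b)

# Two-tailed test: reject H0 only if |t| exceeds the critical value.
reject_null = abs(t_observed) > t_critical
print(df, round(t_observed, 2), reject_null)
```

Because the observed t falls short of the critical value, `reject_null` is `False`, matching the decision stated above.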
Our Conclusion: Do not reject the null
hypothesis
Warnings
• Simple regression assumes a straight line
relationship
• Outliers can dominate regression results
• Generalizing to the population assumes a random
sample
• Regression is correlational and does not show
that X causes Y