Email: bianh@ecu.edu
Phone: 328-5428
Location: 2307 Old Cafeteria Complex
Regression Equation
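$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e$
$Y_{predicted} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
$Y$: observed score of dependent variable
$Y_{predicted}$: predicted score of dependent variable
$b_0$: intercept
$p$: number of predictors
$b_1$ to $b_p$: weights or partial regression coefficients (slopes) for
the predictors
$x_1$ to $x_p$: scores of predictors
$e$: error of prediction (observed score minus predicted score)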
Positive and negative regression weights reflect the nature of
correlations between predictor and dependent variable.
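As a minimal sketch of how the equation turns predictor scores into a predicted score, the Python snippet below evaluates the intercept plus the weighted sum of the predictors for one case. The function name `predict` and all numbers are made up for illustration; they do not come from the drug-use data set.

```python
def predict(b0, weights, scores):
    """Return b0 + b1*x1 + ... + bp*xp for one case."""
    if len(weights) != len(scores):
        raise ValueError("need exactly one score per regression weight")
    return b0 + sum(b * x for b, x in zip(weights, scores))

# Hypothetical values, for illustration only.
b0 = 2.0               # intercept
weights = [0.5, -1.0]  # b1 is positive, b2 is negative
scores = [4.0, 3.0]    # x1 and x2 for one case

y_predicted = predict(b0, weights, scores)  # 2.0 + 0.5*4.0 + (-1.0)*3.0
print(y_predicted)
```

With these made-up weights, a higher x1 raises the predicted score and a higher x2 lowers it, which is what the sign of each weight expresses.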
[Figure: regression line of Y on X, with the intercept and slope marked]
[Figure: scatter plots showing a positive relationship, a negative relationship, and no relationship]
Residuals or errors =
observed score - predicted score
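The definition of a residual can be sketched in a few lines of Python. The example fits a one-predictor least-squares line to a small made-up data set and then subtracts each predicted score from the observed score. The data and variable names are assumptions for illustration only.

```python
# Made-up data: five cases, one predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for a single predictor.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
# Residual = observed score minus predicted score.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(slope, intercept)
print(residuals)
```

For a least-squares fit that includes an intercept, the residuals sum to zero (up to floating-point rounding), which is why residual plots are centered on 0.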
Normality
Linearity
Equal variance
Multicollinearity
Moderate to high inter-correlations among the
independent variables
It limits the size of R.
The model is unstable in terms of prediction.
It is hard to interpret the significance of predictors.
Checking assumptions
Histogram of the standardized or studentized
residuals (normality assumption)
Scatter plots: the dependent variable, standardized
predicted values, standardized residuals, deleted
residuals, adjusted predicted values, Studentized
residuals, or Studentized deleted residuals.
Exercise
One variable: a03 (race)
Recode a03 into three categories: white, black, and
others and create a new variable named a03r (1 =
White, 2 = Black, 3 = Others)
Then recode a03r into two dummy variables
White is the reference category
Two new dummy variables are: Dummy1 (black vs.
white) and Dummy2 (others vs. white)
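The dummy-coding step of the exercise can be sketched in Python as follows. The sketch starts from the already recoded variable a03r, because the original codes of a03 are not given here. The function name `dummy_code` is hypothetical, and in SPSS the same step would normally be done with the recode dialogs.

```python
def dummy_code(a03r):
    """Map a03r (1 = White, 2 = Black, 3 = Others) to (Dummy1, Dummy2).

    White is the reference category, so it is coded 0 on both dummies.
    """
    if a03r not in (1, 2, 3):
        raise ValueError("a03r must be 1, 2, or 3")
    dummy1 = 1 if a03r == 2 else 0  # black vs. white
    dummy2 = 1 if a03r == 3 else 0  # others vs. white
    return dummy1, dummy2

for code, label in [(1, "White"), (2, "Black"), (3, "Others")]:
    print(label, dummy_code(code))
```

Three categories need only two dummy variables: the reference category is identified by zeros on both.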
Category   Dummy1   Dummy2
White      0        0
Black      1        0
Others     0        1
[Figure: diagram of the regression model. Labels: Self-control, Marijuana use, Peer norms, Problems related to drug use, Error]
Data screening
The purpose of data screening is to check the assumptions of the
regression model
Residual plots:
used to check the constant variance assumption.
standardized residuals (on Y axis) versus standardized predicted
values (on X axis)
If there is no violation of assumptions, standardized residuals should
scatter randomly around a horizontal line of 0.
Histogram and Normal p-p plot of standardized or studentized
residuals
Used to check normality assumption
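To make the two axes of the residual plot concrete, the sketch below standardizes a set of residuals and predicted values in Python. It assumes one common definition: a standardized residual is the residual divided by the square root of the mean squared error, and a standardized predicted value is a z score of the predicted values. The exact formulas used by a given software package may differ, and the numbers are made up.

```python
import math

def standardize_for_plot(predicted, residuals, n_predictors):
    """Return (z_pred, z_resid) for a residual plot.

    Assumed definitions: z_resid = residual / sqrt(MSE),
    z_pred = z score of the predicted values (sample SD).
    """
    n = len(predicted)
    mse = sum(e * e for e in residuals) / (n - n_predictors - 1)
    z_resid = [e / math.sqrt(mse) for e in residuals]
    mean_pred = sum(predicted) / n
    sd_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in predicted) / (n - 1))
    z_pred = [(p - mean_pred) / sd_pred for p in predicted]
    return z_pred, z_resid

# Made-up predicted values and residuals from a one-predictor model.
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

z_pred, z_resid = standardize_for_plot(predicted, residuals, n_predictors=1)
for zp, zr in zip(z_pred, z_resid):
    print(round(zp, 2), round(zr, 2))  # X and Y coordinates of one point
```

Plotting the second column against the first gives the residual plot; a random scatter around the horizontal line at 0 is the pattern expected when the assumptions hold.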
Click Statistics
Click Plots
Click Save
1. Model Summary
a. R is a Pearson correlation between
predicted values and actual values of
dependent variable.
b. R2 is the squared multiple
correlation coefficient; it represents
the proportion of variance of the
dependent variable explained by the
combination of six predictors. Here,
14% of the variance of drug problem is
explained by the six predictors.
c. Adjusted R2 is a more conservative
estimate than R2.
2. ANOVA table
The significant F value, F(6, 195)
= 5.18, p < .01, indicates that there is
a significant relationship between
drug problem and the six predictors.
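The link between R2, adjusted R2, and the F value can be sketched with the standard formulas, using the figures reported above (six predictors, 195 residual degrees of freedom, R2 of about .14). The function names are hypothetical. Because .14 is a rounded value, the F computed here comes out near 5.3 rather than exactly the reported 5.18.

```python
def f_from_r2(r2, n_predictors, df_residual):
    """F = (R2 / p) / ((1 - R2) / df_residual)."""
    return (r2 / n_predictors) / ((1 - r2) / df_residual)

def adjusted_r2(r2, n_predictors, df_residual):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    n = df_residual + n_predictors + 1  # sample size implied by the df
    return 1 - (1 - r2) * (n - 1) / df_residual

r2 = 0.14          # rounded value from the model summary
n_predictors = 6
df_residual = 195

print(round(f_from_r2(r2, n_predictors, df_residual), 2))
print(round(adjusted_r2(r2, n_predictors, df_residual), 3))
```

Adjusted R2 is always at most R2, and the gap grows as predictors are added relative to the sample size, which is the sense in which it is more conservative.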
Collinearity
Tolerance is the percentage of the variance in a given
predictor that cannot be explained by the other predictors.
When the tolerances are close to 0, there is high
multicollinearity and the standard error of the regression
coefficients will be inflated.
Variance Inflation Factor (VIF) is the reciprocal of tolerance. A VIF
greater than 2 is usually considered problematic (based on SPSS manual).
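A minimal Python sketch of tolerance and VIF for the simplest case, a model with exactly two predictors, where the tolerance of each predictor is one minus their squared correlation. With more predictors, each predictor would instead be regressed on all the others. The data and the helper name `pearson_r` are made up for illustration.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# Made-up scores on two predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r12 = pearson_r(x1, x2)
tolerance = 1 - r12 ** 2  # variance in x1 not explained by x2
vif = 1 / tolerance       # variance inflation factor

print(round(r12, 2), round(tolerance, 2), round(vif, 2))
```

As the correlation between the predictors approaches 1, tolerance approaches 0 and VIF grows without bound.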
Histogram of
standardized residuals
Normal QQ plot
1. We want to know
whether the
distribution of errors
matches a normal
distribution.
2. If the selected
variable matches the
test distribution, the
points cluster around a
straight line.
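The idea behind the plot can be sketched in Python: sort the residuals and pair each one with the quantile a standard normal distribution would give at the same rank. The plotting position (i - 0.5) / n is an assumption chosen for simplicity; statistical packages may use a different formula. The residual values are made up.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted value with its theoretical normal quantile."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up standardized residuals.
z_resid = [-0.89, 0.67, 1.12, -0.67, -0.22]

points = qq_points(z_resid)
for expected, observed in points:
    print(round(expected, 2), observed)
```

If the residuals are approximately normal, the printed pairs fall close to a straight line when plotted.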
Residual plot
1. Our residuals scatter
randomly around 0.
2. The constant variance
assumption is not
violated.
3. The standardized
residual of ID 1090 is
3.06.
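One way to pick out cases such as ID 1090 is to flag standardized residuals beyond a cutoff. The sketch below assumes an absolute cutoff of 3, a common rule of thumb that is not stated in the slides. Apart from ID 1090 and its value of 3.06, the IDs and values are made up.

```python
# Only ID 1090 (3.06) comes from the output above; the rest are made up.
standardized_residuals = {1001: -0.42, 1037: 1.15, 1090: 3.06, 1102: -1.88}
CUTOFF = 3.0  # assumed rule of thumb, not taken from the slides

def flag_large_residuals(z_by_id, cutoff=CUTOFF):
    """Return the sorted IDs whose |standardized residual| exceeds cutoff."""
    return sorted(case_id for case_id, z in z_by_id.items() if abs(z) > cutoff)

flagged = flag_large_residuals(standardized_residuals)
print(flagged)
```

A flagged case is a candidate for a closer look, not automatically an error.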
Zero-order correlation
Simple bivariate correlations between independent variable and
dependent variable.
Partial correlation
Correlation between independent variable and dependent variable
after all other independent variables are controlled.
Part correlation
Correlation between independent variable and dependent variable
after the other independent variables are controlled in the
independent variable only (not in the dependent variable).
When squared, it represents the unique contribution of the
independent variable to the model.
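For a model with two predictors, the three correlations can be related with the standard formulas, sketched here in Python. The input correlations are made up, and the function names are hypothetical.

```python
import math

def partial_corr(r_y1, r_y2, r_12):
    """Correlation of y with x1, with x2 removed from both y and x1."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

def part_corr(r_y1, r_y2, r_12):
    """Correlation of y with x1, with x2 removed from x1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)

# Made-up zero-order correlations.
r_y1 = 0.5   # dependent variable with predictor 1
r_y2 = 0.4   # dependent variable with predictor 2
r_12 = 0.3   # predictor 1 with predictor 2

partial = partial_corr(r_y1, r_y2, r_12)
part = part_corr(r_y1, r_y2, r_12)

print(round(partial, 3), round(part, 3))
print(round(part ** 2, 3))  # unique contribution of predictor 1 to R2
```

The part correlation is never larger in absolute value than the partial correlation, because its denominator does not remove the other predictor from the dependent variable.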
Explore results
For self-efficacy, a higher score means lower self-efficacy. The results show that drug
users who used more marijuana and had lower self-efficacy were more likely to have
drug use problems.