This study attempts to create a model for predicting writing scores for high
school students. Such a model could be used in a variety of real-life
applications, such as determining which students could benefit most from additional
writing instruction. To generate this model I have taken a host of relevant variables,
including gender, race, socioeconomic status, ability in other academic areas, and type
of program and school, and analyzed them using multiple linear regression and
logit modeling. My regression analysis attempts to establish which variables are
important in predicting a student’s writing score. I follow this analysis with a logit
model that attempts to establish which variables are important in predicting whether
a student will score high on a writing test. After assessing the two models,
one regression and one logit, I have concluded that gender and
performance in other academic areas are the variables that help make such
predictions.
A priori expectations
The intercept coefficient is hard to interpret and in this case will
be relatively meaningless, so I have no a priori expectation for it.
The last two variables attempt to capture the effects of different
program types: general program and vocational program, respectively. Here I
have chosen to omit the academic program, making it the base case for
comparison. For this reason I expect both coefficients to be negative, because in
part 2 a standard t-test showed that, on average, students in academic programs
outperformed all others in writing scores.
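The dummy coding described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name and the string labels for the three program types are assumptions made for the example.

```python
def program_dummies(program):
    """Return (general, vocational) indicator values for one student.

    The academic program is the omitted base case, so it maps to (0, 0).
    The labels "academic", "general", and "vocational" are illustrative.
    """
    if program not in {"academic", "general", "vocational"}:
        raise ValueError(f"unknown program type: {program!r}")
    general = int(program == "general")
    vocational = int(program == "vocational")
    return general, vocational


for label in ("academic", "general", "vocational"):
    print(label, program_dummies(label))
```

Because academic students are coded (0, 0), each dummy coefficient measures the expected difference in writing score relative to an otherwise identical academic-program student, which is why a negative sign is anticipated for both.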
I ran the regression using the above model and got some expected results
but also many unexpected ones, all of which are summarized in table 1 under
model 1. As I expected, the coefficient on female was highly significant and
positive. Additionally, all the coefficients on the other test scores were positive;
only the reading test score was not significant, and even its p-value was barely above
the .05 threshold. No other variable was found to be significant, however, and many had
signs opposite to what I had expected. Since I expected other variables to be
important, I investigated collinearity using variance inflation factors (VIFs) to see
if that could be causing some of the unexpected results. The VIFs suggested that all of
the variables for other test scores were collinear. Since they all appear to be equally
important, instead of using one as a proxy for all the others I decided to create a
new variable that was the average of reading score, math score, science score, and
social studies score. This way I could replace the four test score variables with the
single test average variable. I ran the regression again using the updated
model, expecting to see some changes in the significance of some of my variables.
The same can be said for the other dropped variables, since they all likely
affect overall test scores and not just writing scores. In other words, if two
students have identical test scores in all other subjects, it is unlikely that one
will outperform the other in writing alone simply because that student is white or
attends a private school. Gender remains significant because, according
to our t-tests, females differ significantly from males in writing but not in
average test scores. In this instance, if two students, one male and one female, had
identical other test scores, the female would be expected to outperform her
counterpart on a writing test.
Final Model
The final model ends up being rather simple, with only two independent
variables: average test score and gender.
Both variables had their expected signs and were highly significant using the
current model. The final regression had an R-squared of 0.60, which means about
60% of the variability in a student’s writing score can be explained by the model.
The intercept coefficient was 6.58 but this has no practical interpretation.
The coefficient for the variable representing gender was approximately 5.50,
meaning that, all else held constant, a female would be expected to score 5.50 points
higher on the writing test than a male. A 95% confidence interval suggested the
difference could be as high as 7.20 points and as low as 3.81 points.
The coefficient for the average test score variable was approximately 0.83,
meaning that, all else held constant, a 1.00 point increase in average test score would
be expected to result in a 0.83 point increase in writing score. A 95% confidence
interval suggested the increase could be as high as 0.93 points and as low as 0.73
points.
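To make the interpretation concrete, the reported point estimates can be plugged into a small prediction function. This is a sketch only: the function and constant names are hypothetical, the coefficients are the rounded values quoted above, and the gender variable is assumed to be coded 1 for female and 0 for male.

```python
# Rounded point estimates quoted in the text for the final model.
INTERCEPT = 6.58
B_FEMALE = 5.50      # assumed coding: 1 = female, 0 = male
B_TEST_AVG = 0.83    # per point of average test score


def predict_writing(female, test_avg):
    """Predicted writing score from gender and average test score."""
    return INTERCEPT + B_FEMALE * (1 if female else 0) + B_TEST_AVG * test_avg


# Two hypothetical students with the same average test score of 50.
male_score = predict_writing(False, 50)
female_score = predict_writing(True, 50)
print(round(male_score, 2), round(female_score, 2))
print(round(female_score - male_score, 2))
```

The gap between the two predictions equals the gender coefficient regardless of the average test score chosen, which is what "all else held constant" means in this model.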