Jesse Harden
Department of Mathematics and Statistics
Radford University
Author's Note
Abstract
Introduction
When predicting a prospective student's Cumulative GPA, university
officials often consider two variables: the student's High School GPA and SAT
Score. Several models of prediction already exist, such as Astin & Oseguera's
simple linear model (0.077(High School GPA) + 0.000322(SAT Score)
- 0.2092) (Campbell, 2008), and Freshman GPA is often used to test the
validity of measures involved in admissions decisions (Wilson, 1983, p. 1).
With this in mind, I decided to investigate whether Freshman GPA is a
better predictor of Cumulative GPA than the usual measures, namely High
School GPA and SAT Scores. While such research may not be of interest to
admissions, it is certainly valuable to anyone looking to locate students who
need assistance or who are likely to be successful.
Of course, admissions offices are also often concerned with admitting
applicants who will eventually graduate from their institution. Thus, I also
wanted to test whether there was a significant relationship between the
traditional GPA predictors, together with Freshman GPA, and two outcomes:
whether or not a student graduates, and how long it takes them to graduate.
Therefore, this paper's focus is on determining a workable model
for predicting Cumulative GPA with traditional predictors and Freshman GPA,
and on investigating any potential relationship between GPA Predictors
and Retention Measures, specifically whether or not a student graduated
from their first college, and how many traditional semesters (Fall/Spring) it
took them to graduate.
This study has been broken down into two parts: the first part focuses
on building models to predict Cumulative GPA, while the second part deals
with retention measures and the traditional GPA predictors.
PART 1
Notes on Data Used for Study Part 1
Data for this study was taken from all Radford University
undergraduate students who entered as freshmen since Fall 2000, and thus
is not truly random. Furthermore, only the subset of students who also
graduated from Radford University, entered in a Fall Semester, and had a
High School GPA & SAT Score on file was used in this part of the study.
Caution should be taken when extrapolating any conclusions, especially to
other universities.
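The subsetting described above could be sketched in R as follows. This is only an illustration on made-up records: the data frame `students` and the column `ENTRY.TERM` are hypothetical names, not fields confirmed by the actual data set.

```r
# Hypothetical records; the data frame and the ENTRY.TERM column are
# illustrative assumptions, not the fields of the actual data set.
students <- data.frame(
  HS.GPA     = c(3.2, NA, 3.8, 2.9, 3.5),
  SAT        = c(1050, 980, NA, 1100, 1200),
  ENTRY.TERM = c("Fall", "Fall", "Fall", "Spring", "Fall"),
  DID.GRAD   = c(TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Keep graduates who entered in a Fall semester with both predictors on file.
part1 <- subset(
  students,
  DID.GRAD & ENTRY.TERM == "Fall" & !is.na(HS.GPA) & !is.na(SAT)
)

nrow(part1)
```

Only the first made-up record satisfies all three conditions, so a single row remains.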
The variables studied from the data include High School GPA (HS.GPA),
SAT Scores (SAT), and Freshman GPA (FMGPA), which are used to predict
Cumulative GPA (CUM.GPA).
producing p-values of much less than 0.001, and was tested for non-constant
variance using the bptest() function of the lmtest package with an alpha of
0.05. Both tests were done using RStudio, with both Freshman GPA exhibiting
non-constant variance. To account for this, I used White's Heteroscedasticity-Corrected Covariance Matrices to test significance, and compared the
parameter estimates to the ones found with summary(). Both were
significant, and there was no meaningful difference between any of the
parameter estimates. All R^2 values are drawn from the summary() function
output.
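A minimal sketch of this kind of check, run on simulated data rather than the study data, might look like the following. It assumes the corrected covariance matrix comes from hccm() in the car package and is passed to coeftest() from lmtest; the text names only bptest(), so that pairing is an assumption, as are the variables `x` and `y`.

```r
library(lmtest)  # bptest(), coeftest()
library(car)     # hccm()

set.seed(1)
n <- 500
# Simulated stand-in data (not the study data): the error spread grows with x,
# so the constant-variance assumption is violated by construction.
x <- runif(n, min = 2, max = 4)
y <- 1.5 + 0.5 * x + rnorm(n, sd = 0.1 * x^2)

fit <- lm(y ~ x)

# Breusch-Pagan test: a small p-value indicates non-constant variance.
bp <- bptest(fit)

# Re-test the coefficients with a heteroscedasticity-corrected covariance
# matrix (hccm() uses the HC3 variant by default).
robust <- coeftest(fit, vcov. = hccm(fit))

bp$p.value
robust
```

Note that the corrected covariance matrix changes only the standard errors, t values, and p-values; the coefficient estimates themselves are the same as those reported by summary().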
CUM.GPA = 1.25125 + 0.56203(HS.GPA), R^2 = 0.2669
CUM.GPA = 2.020 + 0.001008(SAT), R^2 = 0.06728
CUM.GPA = 1.473662 + 0.529696(FMGPA), R^2 = 0.4854
These equations suggest that Freshman GPA may help determine Cumulative
GPA to a greater extent than SAT Scores and High School GPA.
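As a worked example, the three fitted equations can be evaluated for a single student. The student below (High School GPA 3.0, SAT 1000, Freshman GPA 3.0) is hypothetical, and the function names are illustrative only.

```r
# Coefficients copied from the three simple regression equations.
predict_from_hs    <- function(hs_gpa) 1.25125  + 0.56203  * hs_gpa
predict_from_sat   <- function(sat)    2.020    + 0.001008 * sat
predict_from_fmgpa <- function(fmgpa)  1.473662 + 0.529696 * fmgpa

# Hypothetical student: HS GPA 3.0, SAT 1000, Freshman GPA 3.0.
predict_from_hs(3.0)
predict_from_sat(1000)
predict_from_fmgpa(3.0)
```

For this student the three models predict a Cumulative GPA of roughly 2.94, 3.03, and 3.06 respectively. The predictions are close to one another; the models differ mainly in how much of the variation in Cumulative GPA each one explains (the R^2 values).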
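CUM.GPA = 3.096 - 0.2771(HS.GPA) - 7.018e-4(SAT) - 0.4634(FMGPA)
+ 5.841e-5(HS.GPA:SAT) + 0.2212(HS.GPA:FMGPA) + 4.266e-4(SAT:FMGPA)
- 6.911e-5(HS.GPA:SAT:FMGPA)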
Call:
lm(formula = CUM.GPA ~ (HS.GPA + SAT + FMGPA)^3)

Residuals:
     Min       1Q   Median       3Q      Max
-1.28542 -0.21344  0.01477  0.21785  2.04091

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.096e+00  8.898e-01   3.480 0.000504 ***
HS.GPA           -2.771e-01  2.882e-01  -0.962 0.336269
SAT              -7.018e-04  8.783e-04  -0.799 0.424288
FMGPA            -4.634e-01  2.811e-01  -1.649 0.099236 .
HS.GPA:SAT        5.841e-05  2.834e-04   0.206 0.836727
HS.GPA:FMGPA      2.212e-01  8.852e-02   2.499 0.012474 *
SAT:FMGPA         4.266e-04  2.731e-04   1.562 0.118273
HS.GPA:SAT:FMGPA -6.911e-05  8.556e-05  -0.808 0.419301
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3141 on 11484 degrees of freedom
Multiple R-squared:  0.5395,    Adjusted R-squared:  0.5392
F-statistic:  1922 on 7 and 11484 DF,  p-value: < 2.2e-16
Clearly, many of the terms in this regression model are not
statistically significant. To address this, I used the step()
function in R to perform backward elimination, yielding the
following:
CUM.GPA = 2.393 - 0.05053(HS.GPA) - 9.158e-6(SAT) - 0.2411(FMGPA)
- 1.645e-4(HS.GPA:SAT) + 0.1504(HS.GPA:FMGPA) + 2.092e-4(SAT:FMGPA)
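Because this model contains interaction terms, the effect of one predictor depends on the values of the others. The sketch below evaluates the reduced equation, with signs and coefficients taken from the summary() output for this model, for a hypothetical student. It also computes the slope with respect to FMGPA, which is -0.2411 + 0.1504(HS.GPA) + 2.092e-4(SAT). The function and vector names are illustrative only.

```r
# Coefficients taken from the reduced model's summary() output.
b <- c(
  intercept = 2.393,
  hs        = -5.053e-02,
  sat       = -9.158e-06,
  fm        = -2.411e-01,
  hs_sat    = -1.645e-04,
  hs_fm     = 1.504e-01,
  sat_fm    = 2.092e-04
)

# Predicted CUM.GPA from the reduced model.
predict_cum_gpa <- function(hs, sat, fm) {
  unname(
    b["intercept"] + b["hs"] * hs + b["sat"] * sat + b["fm"] * fm +
      b["hs_sat"] * hs * sat + b["hs_fm"] * hs * fm + b["sat_fm"] * sat * fm
  )
}

# Change in predicted CUM.GPA per one-point change in FMGPA, holding the
# other two predictors fixed.
fmgpa_slope <- function(hs, sat) {
  unname(b["fm"] + b["hs_fm"] * hs + b["sat_fm"] * sat)
}

# Hypothetical student: HS GPA 3.0, SAT 1000, Freshman GPA 3.0.
predict_cum_gpa(hs = 3.0, sat = 1000, fm = 3.0)
fmgpa_slope(hs = 3.0, sat = 1000)
```

For this student the predicted Cumulative GPA is about 3.00, and each additional point of Freshman GPA raises the prediction by about 0.42. The negative main-effect coefficient on FMGPA therefore does not mean that a higher Freshman GPA lowers the prediction once the interaction terms are included.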
Call:
lm(formula = CUM.GPA ~ HS.GPA + SAT + FMGPA + HS.GPA:SAT + HS.GPA:FMGPA +
    SAT:FMGPA)

Residuals:
     Min       1Q   Median       3Q      Max
-1.28387 -0.21349  0.01499  0.21737  2.04705

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.393e+00  1.852e-01  12.924  < 2e-16 ***
HS.GPA       -5.053e-02  6.591e-02  -0.767   0.4433
SAT          -9.158e-06  1.897e-04  -0.048   0.9615
FMGPA        -2.411e-01  5.662e-02  -4.258 2.08e-05 ***
HS.GPA:SAT   -1.645e-04  6.456e-05  -2.548   0.0109 *
HS.GPA:FMGPA  1.504e-01  1.215e-02  12.379  < 2e-16 ***
SAT:FMGPA     2.092e-04  4.601e-05   4.547 5.49e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3141 on 11485 degrees of freedom
Multiple R-squared:  0.5395,    Adjusted R-squared:  0.5392
F-statistic:  2242 on 6 and 11485 DF,  p-value: < 2.2e-16
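The backward reduction with step() can be sketched on simulated data as follows. The data frame `sim` and its generating coefficients are made up for illustration and are not the study data; only the model formula and the step() call mirror the text.

```r
set.seed(42)
n <- 1000
# Simulated stand-in data (not the study data), on roughly GPA/SAT scales.
sim <- data.frame(
  HS.GPA = runif(n, 2, 4),
  SAT    = round(runif(n, 800, 1400)),
  FMGPA  = runif(n, 1.5, 4)
)
sim$CUM.GPA <- 1 + 0.2 * sim$HS.GPA + 0.5 * sim$FMGPA + rnorm(n, sd = 0.3)

# Full model: main effects plus all two- and three-way interactions.
full <- lm(CUM.GPA ~ (HS.GPA + SAT + FMGPA)^3, data = sim)

# Backward elimination: drop one term at a time for as long as AIC improves.
reduced <- step(full, direction = "backward", trace = 0)

formula(reduced)
```

step() selects terms by AIC rather than by p-value, and it never drops a main effect while an interaction containing it remains in the model. This is one reason a reduced model can still include terms with large individual p-values.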
This final model doesn't just have nearly the same adjusted R-squared
as the other models; all of its variables exhibit relatively decent significance.
While FMGPA's p-value is greater than 0.1 in this model, it is still far from the
0.4433 or 0.9615 of HS.GPA & SAT from the previous reduced model.
Part 1 Conclusions
All three multiple regression models show a moderate correlation,
somewhat stronger than that of Astin & Oseguera's model with an R^2 of 0.319
(Campbell, 2008). However, Freshman GPA alone resulted in a similar, if
slightly weaker, correlation to these mixed models. This suggests that
PART 2
Notes on Data Used for Study Part 2
Data for this study was taken from all Radford University
undergraduate students who entered as freshmen since Fall 2000, and thus
is not truly random. The data was further divided into three subsets. The
first is the same subset used in Part 1: students who graduated from Radford
University, entered in a Fall Semester, and had a High School GPA & SAT Score
on file. The second consists of all entries with a High School GPA who are
not still attending, and the third of all entries with a Freshman GPA who are
not still attending. Again, caution should be taken when extrapolating any
conclusions, especially to other universities.
The variables studied from the sampled observations include High
School GPA (HS.GPA) and Freshman GPA (FMGPA), which are used to predict
whether or not a student graduated from Radford University (DID.GRAD).
Given the extremely small values for fit, further testing was deemed
unnecessary.
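The surviving text does not show which model or fit measure produced these values. As one possible reading, assuming a logistic regression of DID.GRAD on a GPA predictor (the approach of the R-Bloggers tutorial in the references) and McFadden's pseudo-R^2 as the fit measure, a sketch on simulated data might look like this. The simulated relationship is deliberately weak and is not derived from the study data.

```r
set.seed(7)
n <- 2000
# Simulated stand-in data (not the study data): graduation depends only
# weakly on HS.GPA, so the fit is expected to be poor.
HS.GPA <- runif(n, 2, 4)
DID.GRAD <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.2 * HS.GPA))

# Logistic regression and the intercept-only model it is compared against.
model <- glm(DID.GRAD ~ HS.GPA, family = binomial)
null_model <- glm(DID.GRAD ~ 1, family = binomial)

# McFadden's pseudo-R^2: 1 - logLik(model) / logLik(null model).
mcfadden <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
mcfadden
```

A value near zero means the predictor improves the log-likelihood very little over the intercept-only model.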
Part 2 Conclusion
Code
library(lmtest)
library(car)
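# attach() makes the columns of gpaDataMod available by name
attach(gpaDataMod)
# Normal Q-Q plot for each variable used in Part 1
qqnorm(HS.GPA, main="QQ HS.GPA")
qqnorm(SAT, main="QQ SAT")
qqnorm(FMGPA, main="QQ FMGPA")
qqnorm(CUM.GPA, main="QQ CUM.GPA")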
References
Alice, M. (2015, September 13). How to perform a logistic regression in R. Retrieved
from R-Bloggers: http://www.r-bloggers.com/how-to-perform-a-logistic-regression-in-r/
Campbell, J. (2008). Analysis of institutional data in predicting student retention
utilizing knowledge discovery and statistical techniques. Northern Arizona
University. Retrieved from https://books.google.com/books?id=VajhPWlLwvsC