Professional Documents
Culture Documents
Justin Reich
Harvard University
justin_reich@mail.harvard.edu
class learningan opportunity to make a better life for
themselves and their families [7]. In ideal circumstances
for class-blind educational opportunities, we would hope
to see societys least affluent students equally or overrepresented in the distribution of registrants. The earliest
evidence of MOOC enrollment, however, suggests that
most registrants in courses already have a bachelors degree
[4, 9], raising the question of whether MOOCs are
reinforcing the advantages of the haves rather than
educating the have-nots [5]. Understanding patterns of
MOOC enrollment and engagement for students of different
backgrounds is critical to understanding the role that
MOOCs can play in ameliorating or exacerbating
educational inequalities.
In order to understand and address educational inequalities,
MOOC researchers need detailed demographic information
about participating students. In particular, we are interested
in demographic information that provides insight into
registrants socioeconomic status (SES), which is broadly
defined as ones access to financial, social, cultural, and
human capital resources [11]. A panel of experts
assembled by the National Center for Education Statistics
identifies a triad of measures that are generally agreed to be
most useful in characterizing student SES: family income,
parental educational attainment, and parental occupational
status. This information is regularly available to higher
education researchers, as colleges typically ask applicants
to report their parents educational attainment and
occupation, and applications for financial aid require
detailed information on family income and assets.
These SES measures, however, are not available for MOOC
registrants. In an effort to maximize enrollment, MOOC
providers minimize barriers to entry, so students can
register with only a few mouse clicks. As a result, MOOC
providers rely on brief, voluntary surveys to gather
demographic data. For instance, to register for with edX,
individuals are only asked to provide their gender, year of
birth, level of education, and address. These forms do not
paint a detailed demographic picture of the registrant. Many
courses then offer more detailed surveys for registered
students, but course faculty and MOOC providers are
rightly hesitant about soliciting personal information in the
early days of a course. Consequently, investigating the
impact of MOOCs on students from different backgrounds
requires creative approaches to leveraging available data.
on edX. This site registration includes a voluntary fourquestion survey asking for student gender, level of
education, year of birth, and address. Address is collected
as a single open-response text field. Of our 79,921
registrants60,892 or 76% completed the entire site
registration survey.
Upon enrolling in a HarvardX course, registrants were then
asked to complete a pre-course survey, but no mechanism
existed to enforce completion (students could register for
the course without even starting the survey). Registrants
were asked to complete the survey through in-course
prompts and emails, but response rates were generally low
as a proportion of all registrants, ranging from 14% to 45%.
This survey included approximately 20 items, including a
question asking students to self report their mothers and
fathers highest level of education, and response rates to
this item varied from 13% to 34% across courses.
We use three additional sources of publicly available data
in our analyses. The first source contains data from the
American Community Survey (5-year estimates for 20082012) organized at the zip code level and available for
download from the U.S. Census Bureau website [14, 15].
The U.S. Census Bureau delivers the American Community
Survey (ACS) each year to several million American
residents, and survey items include age, education, and
income [16]. This dataset offers the simplest way for
enriching our dataset with neighborhood-level demographic
data. The second source also relies on the ACS 5-year
estimates, but this dataset is organized at the person level
and available for download from the Minnesota Population
Centers IPUMS website [12]. We use this dataset to
compare the U.S. and HarvardX distributions of parent
education for adolescents. Lastly, we use a 2013 U.S.
demographic dataset available from Esri, a for-profit
provider of geographic information systems software and
data. Esris 2013 demographic dataset [6] provides
estimates for median household income and age
distributions within each census block group, and we used
the ArcGIS software package to match registrants to
census block group-level variables.
3. DATA ANALYTIC STRATEGY
.00003
.04
Density
.00002
.03
.00001
Density
.02
.01
0
20
40
Age
HX Age Distribution
60
80
Second, we obtain measures of parental education from precourse surveys, and use our more complete address data to
test for response bias. Again, as an exploratory step, we
compare the distributions of parental education of
HarvardX registrants and U.S. persons conditional on age.
Third, we append our HarvardX dataset to the ACS and
Esri datasets and fit regression models to estimate the
likelihood of a U.S. resident registering for a HarvardX
course, conditioned on our two SES measures and age.
From these models, we can evaluate the substantive effects
of parent education and neighborhood income on MOOC
enrollment. We also estimate SES differences by course,
which confirms our general findings.
NEIGHBORHOOD
INCOME
50000
100000
150000
Zip Code Median Income
US
US Age Distribution
4. DERIVING
ADDRESS
FROM
200000
250000
HarvardX
20
40
Age
US
60
80
HarvardX
90
CBG Median Income (Thousands)
60
70
80
50
Percent
30
20
10
0
40
< HS
HS
Assoc
US
BA
Grad
HarvardX
!!
.0008
HS
0.451
(0.413)
.0002
0
(.)
Assoc
1.250**
(0.438)
< HS
(2)
Linear
Parent
Ed
.0006
(1)
Parent Ed
Dummies
Probability
.0004
Bach
2.271***
(0.392)
Grad
3.054***
(0.387)
3
Parent Ed
Parent Ed
0.851***
(0.0499)
186320
41248.5
0.0638
5
Observations
186320
neg2ll
41239.8
r2_p
0.0640
df_m
8
Standard errors in parentheses
*
p < 0.05, ** p < 0.01, *** p < 0.001
Coefficients for Age Dummy Variables Omitted
(2)
Age Group
Interactions
0.0123***
(0.000124)
0.0135***
(0.000769)
13.17 * med_hinc
0
(.)
18.24 * med_hinc
-0.0000824
(0.000840)
25.29 * med_hinc
-0.00265**
(0.000845)
30.34 * med_hinc
-0.00266**
(0.000852)
35.39 * med_hinc
-0.00287***
(0.000869)
40.49 * med_hinc
-0.00191*
(0.000817)
50.59 * med_hinc
0.000287
(0.000820)
60.69 * med_hinc
0.00132
(0.000850)
232277133
828981.4
0.0230
64
.0015
Probability
.0005
.001
med_hinc
(1)
med_hinc
50
100
150
CBG Median Household Income (Thousands)
HX Reg Prob (Age = 17)
Observations
232277133
neg2ll
829125.6
r2_p
0.0228
df_m
57
Standard errors in parentheses
*
**
***
p < 0.05, p < 0.01, p < 0.001
Coefficients for Age Dummy Variables Omitted
200
Table 3: Regression models predicting for nine HarvardX 2013-2014 courses the difference in neighborhood income for
HarvardX registrants compared to U.S. residents, including variability by age. Coefficients are scaled in thousands of dollars.
(1)
Heroes
(3)
Clinical
Trials
22.59***
(3.911)
(4)
Public
Health
19.01***
(2.588)
(5)
Health
Policy
18.95***
(3.452)
(6)
Genomics
19.92***
(3.125)
(2)
Early
Christ
14.92***
(2.888)
31.49***
(4.674)
(7)
Science &
Cooking
19.70***
(1.562)
30.81***
(2.542)
(9)
Global
Health
23.88***
(4.567)
13.17 * hx_reg
0
(.)
0
(.)
0
(.)
0
(.)
0
(.)
0
(.)
0
(.)
0
(.)
0
(.)
18.24 * hx_reg
-6.601
(3.409)
-5.321
(3.090)
-5.238
(4.163)
-3.986
(2.769)
-0.393
(3.607)
-18.16***
(4.998)
-4.480**
(1.687)
-15.63***
(2.790)
-6.520
(4.757)
25.29 * hx_reg
-12.06***
(3.414)
-9.807**
(3.034)
-14.08***
(4.092)
-11.69***
(2.770)
-9.142*
(3.588)
-24.81***
(4.873)
-8.942***
(1.642)
-20.25***
(2.788)
-13.60**
(4.783)
30.34 * hx_reg
-13.80***
(3.491)
-8.377**
(3.043)
-11.67**
(4.101)
-12.61***
(2.832)
-9.625**
(3.641)
-21.94***
(4.925)
-8.592***
(1.649)
-19.29***
(2.873)
-14.39**
(4.874)
35.39 * hx_reg
-15.70***
(3.648)
-10.11**
(3.086)
-8.487*
(4.182)
-13.43***
(2.936)
-7.137
(3.727)
-18.27***
(5.026)
-6.308***
(1.684)
-17.83***
(3.001)
-15.07**
(5.086)
40.49 * hx_reg
-10.73**
(3.402)
-6.831*
(2.987)
-3.881
(4.111)
-11.76***
(2.834)
-3.916
(3.628)
-10.39*
(4.947)
-2.569
(1.635)
-14.86***
(2.807)
-8.839
(4.910)
50.59 * hx_reg
-4.353
(3.455)
-3.450
(2.982)
2.060
(4.238)
-7.043*
(2.923)
-0.395
(3.696)
-6.150
(5.144)
3.046
(1.651)
-8.945**
(2.874)
-6.392
(5.023)
1.185
(4.920)
232235955
0.0158
64
-5.529
(3.356)
232237352
0.0158
64
5.037
(3.892)
232237346
0.0158
64
-6.804
(6.572)
232234831
0.0158
64
3.666*
(1.714)
232253657
0.0159
64
-7.740**
(2.908)
232236799
0.0158
64
-1.654
(5.370)
232234896
0.0158
64
hx_reg
60.69 * hx_reg
2.240
0.0637
(3.606)
(3.020)
Observations
232235693
232240911
r2
0.0158
0.0158
df_m
64
64
Standard errors in parentheses
*
p < 0.05, ** p < 0.01, *** p < 0.001
Coefficients for Age Dummy Variables Omitted
(8)
China
7. DISCUSSION
from
SSRN:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2381263
[10] Kling, J. R., Liebman, J. B. and Katz, L. F. Experimental
analysis of neighborhood effects. Econometrica, 75, 1 ( 2007), 83119.
[11] National Center for Education Statistics. Improving the
Measurement of Socioeconomic Status for the National
Assessment of Educational Progress: A Theoretical Foundation.
National Center for Education Statistics, Washington, D.C., 2012.
Retrieved
October
15,
2014
from
NCES:
http://nces.ed.gov/nationsreportcard/pdf/researchcenter/socioecon
omic_factors.pdf
[12] Ruggles, S., Alexander, J. T., Genadek, K., Schroeder, M. B.
and Sobek, M. Integrated Public Use Microdata Series: Version
5.0. ( 2010).
[13] U.S. Census Bureau. American Community Survey, 2012.
(2013).
Retrieved
October
12,
2014,
from
http://www.census.gov/acs/www/Downloads/questionnaires/2012/
Quest12.pdf