

Assessing the quality of last menstrual period date on California birth records

Michelle Pearl (a), Megan L. Wier (a) and Martin Kharrazi (b)
(a) Sequoia Foundation, La Jolla, and (b) California Department of Health Services, Genetic Disease Screening Program, Richmond, CA, USA

Correspondence: Michelle Pearl, Sequoia Foundation c/o Genetic Disease Screening Program, California Department of Public Health, 850 Marina Bay Parkway, Rm. F175, Mail Stop 8200, Richmond, CA 94804, USA. E-mail: michelle.pearl@cdph.ca.gov

Conflicts of interest: the authors have declared no conflicts of interest.

Summary
Pearl M, Wier ML, Kharrazi M. Assessing the quality of last menstrual period date on California birth records. Paediatric and Perinatal Epidemiology 2007; 21(Suppl. 2): 50–61.
Birth certificate last menstrual period (LMP) date is widely used to estimate gestational
age in the US. While data quality concerns have been raised, no large population-based
study has isolated data quality issues by comparing birth record LMP (Birth LMP) with
reliable LMP dates from another source. We assessed LMP data quality in 2002 California
singleton livebirth records (n = 515 381) and in a subset of records with linked
prenatally collected LMP from California's statewide Prenatal Expanded Alpha-fetoprotein
Screening Program (XAFP) (n = 105 936). Missing or incomplete LMP data
affected 13% of birth records; 17% of those had complete LMP within XAFP records.
Data quality indicators supported XAFP LMP as more accurate than Birth LMP, with
a lower prevalence of digit preference, post-term delivery, out-of-range gestational age
estimates and implausible birthweight-for-gestational age. The bimodal birthweight
distribution evident at 20–31 weeks' gestation based on Birth LMP was nearly absent
with XAFP LMP-based gestational age. Approximately 32% of the second birthweight
mode was explained by apparent clerical errors in Birth LMP month. Digit preference
errors, particularly day 1, were associated with gestational age overestimation. Preterm
delivery rates were higher according to Birth (7.6%) vs. XAFP LMP (7.2%). One-fifth of
observed preterm and over half of observed post-term births using Birth LMP were not
true cases; 15% of true preterm cases were missed. African American or Hispanic, less
educated, and publicly insured or uninsured women were most likely to be misclassified and
have large LMP date discrepancies attributable to clerical or digit preference error.
The implementation of a revised birth certificate is an opportunity for targeted training
and data entry checks that could substantially improve LMP accuracy on birth
records.
Keywords: birth records, LMP date, accuracy, gestational age.

Introduction
Last menstrual period (LMP) date is the most widely
available source for estimating gestational age from
birth certificates in the US, and is the only source from
the California certificate of livebirth before 2007.
However, gestational age estimates from LMP in
general, and from birth records in particular, are prone
to error, as exhibited by digit preference [1–3] and implausible
values relative to birthweight [4]. Errors in gestational
age estimates from LMP have resulted in excess
post-term births relative to ultrasound estimates [1,5] and
a bimodal birthweight distribution among very early
preterm deliveries [6,7] not observed for very early
preterm deliveries identified through clinical and
ultrasound estimates [8,9].
It is unknown to what extent birth certificate LMP
data quality is affected by recall difficulties and clerical
error, beyond limitations inherent in the LMP dating
method and its assumption of conception 14 days after
the first day of menstrual bleeding (e.g. cycle length
variability, amenorrhoea, non-menstrual vaginal bleeding
mistaken for a normal period) [10]. Digit preference in

Paediatric and Perinatal Epidemiology, 21 (Suppl. 2), 50–61. © 2007 The Authors. Journal Compilation © 2007 Blackwell Publishing Ltd

Quality of LMP date


the reported day of the month, an indication of recall
error, is prevalent in LMP from birth records as well as
medical records [1–3]. Quantification of clerical errors in
recording and entry, such as month or year discrepancies
and month/day transpositions, requires comparison
of LMP from different sources, yet no such
population-based comparisons have been published.
The goals of this analysis are to (1) establish whether
prenatally collected LMP data from California's centralised
prenatal screening programme is more accurate
than LMP data from linked birth records; (2) quantify
the magnitude and impact of gestational age reporting
errors; (3) determine to what extent clerical and recall
error contribute to discrepancies in LMP dating; and
(4) identify population subgroups most affected by
poor LMP data quality. By comparing LMP dates from
birth records with a population-based source of reliable
LMP data, the study design isolates reporting
error in LMP rather than errors inherent in the LMP
dating methodology.

Methods
California singleton livebirth records from 2002
(n = 515 389) were linked to data from pregnant
women enrolled in the statewide Expanded Alpha-fetoprotein
Screening Program (XAFP) between July
2001 and December 2002. The XAFP is a voluntary,
triple marker screening programme offered to all
women entering prenatal care by 20 weeks' gestation.
In order to interpret serological markers, the programme
requires an estimate of gestational age based
on ultrasound, LMP, or physical examination, which
is reported by the medical provider at the time of
maternal blood collection (between 15 and 20 weeks'
gestation) and double-key entered by programme
personnel. The programme assigns a best estimate of
gestational age that prioritises ultrasound when available
as the gold standard, unless otherwise specified
by the provider. Between 20% and 25% of records are
routinely verified with providers before serological
interpretation, and those with positive or uninterpretable
screen results (roughly an additional 8%) receive
further follow-up to confirm gestational age.
Probabilistic matching was used to link records
from the XAFP and birth certificates, using mother's
name, date of birth, social security number, delivery
date, XAFP accession date, telephone number, street
address, city and zip code [11]. A conservative certainty
cut-off was used to minimise false matches. Overall,
327 218 livebirth records (63%) linked to an XAFP
record from the same pregnancy. As a quality control
measure, 1800 records with large gestational age discrepancies
or whose birth records indicated no prenatal
care before 6 months' gestation were reviewed for
matching accuracy, yielding six likely mismatches
(0.4%). No mismatches were found from manual
review of records with out-of-range gestational age
values based on XAFP LMP (<20 or >45 weeks,
n = 45).
Of 515 389 birth records in 2002, eight birth records
with missing birthweight, 29 468 missing date of LMP
and 37 155 missing only day of LMP were excluded,
yielding 448 758 complete records. Comparisons with
XAFP LMP data are based on 105 936 birth records
with complete LMP date linked to an XAFP record
with LMP date as the best estimate of gestational
age.
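The exclusion arithmetic in this paragraph can be checked with a short sketch. The variable names are illustrative only; the counts are those quoted in the text.

```python
# Record counts as quoted in the text above; variable names are illustrative.
total_births = 515_389
missing_birthweight = 8
missing_lmp_date = 29_468       # LMP date missing
missing_lmp_day_only = 37_155   # only the day of LMP missing

complete_records = (
    total_births - missing_birthweight - missing_lmp_date - missing_lmp_day_only
)
incomplete_lmp = missing_lmp_date + missing_lmp_day_only
births_with_birthweight = total_births - missing_birthweight

print(complete_records)                                       # 448758
print(round(100 * incomplete_lmp / births_with_birthweight, 1))  # 12.9
```

The second figure corresponds to the share of records with missing or incomplete LMP reported in Table 1.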
Data quality indicators evaluated include the proportion
with post-term deliveries, out-of-range gestational
age, implausible birthweight-for-gestational age, very
preterm births with implausibly high birthweights
(second birthweight mode), and digit preference. Gestational
age was calculated as the neonate's date of
birth minus the LMP date, with those <20 completed
weeks or >44 completed weeks considered out-of-range
and excluded from rate calculations. Preterm
was defined as 20–36 completed weeks and post-term
as 42–44 completed weeks. Implausible birthweight-for-gestational
age was determined according to
National Center for Health Statistics cut points
(upper birthweight limits: <20 weeks, 1000 g; 20–23 weeks,
2000 g; 24–27 weeks, 3000 g; 28–31 weeks, 4000 g;
lower birthweight limit: 32–47 weeks, 1000 g) [12].
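As a minimal sketch of the gestational age calculation and categories described above, assuming dates are held as Python `date` objects: the function names `completed_weeks` and `classify` are hypothetical and not part of the study's software.

```python
from datetime import date


def completed_weeks(lmp: date, birth: date) -> int:
    """Gestational age in completed weeks: days from LMP to birth, divided by 7 and rounded down."""
    return (birth - lmp).days // 7


def classify(weeks: int) -> str:
    """Apply the categories defined in the text (completed weeks)."""
    if weeks < 20 or weeks > 44:
        return "out-of-range"   # excluded from rate calculations
    if weeks <= 36:
        return "preterm"        # 20-36 completed weeks
    if weeks <= 41:
        return "term"           # 37-41 completed weeks
    return "post-term"          # 42-44 completed weeks


# Illustrative dates only, not study data.
weeks = completed_weeks(date(2002, 1, 1), date(2002, 10, 8))
print(weeks, classify(weeks))
```

Flooring the day difference, rather than rounding it, is what makes the result a count of completed weeks.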
To examine the bimodal birthweight distribution,
birthweight density plots were generated from birth
records for births between 20 and 27 weeks and
28–31 weeks' gestation, as defined by Birth LMP and
XAFP LMP, using kernel density estimation [13]. Birthweights
≥2200 g at 20–27 weeks' gestation and
≥2700 g at 28–31 weeks were considered to be in the
second birthweight mode. LMP days of the month
with frequency greater than expected by chance
include 1, 5, 10, 15, 20, 25 and 28. The overall expected
proportion with preferred digits is 23.0%, and the
expected proportion for each of days 1–28 is 3.3%. The magnitude
of measurement error in gestational age from
Birth LMP dates was estimated by the difference
between Birth LMP and XAFP LMP gestational
age estimates. Positive differences represent


M. Pearl et al.

overestimation of birth gestational age relative to
XAFP. Among discrepant records, the R² from linear
regression models of birthweight on gestational age,
defined by either Birth LMP or XAFP LMP, was
assessed. We further examined false-positive preterm
and post-term rates (1 − specificity = false positives/
gold-standard negatives), false-negative preterm rates
(1 − sensitivity = false negatives/gold-standard positives), and
false-positive preterm and post-term screen rates
(1 − positive predictive value = false positives/screen
positives), treating XAFP gestational age as the gold
standard.
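The three misclassification rates can be illustrated with the in-range counts reported in Table 4. This is a sketch, not the authors' code: the function name and data layout are hypothetical, and the denominators are taken as all XAFP-defined non-cases (false-positive rate), all XAFP-defined cases (false-negative rate) and all Birth LMP-defined cases (false-positive screen rate), which is the reading that reproduces the figures in Table 4.

```python
# Birth LMP (rows) by XAFP LMP (columns) gestational age category, using the
# in-range counts reported in Table 4 (records <20 or >44 weeks excluded).
CATEGORIES = ["20-31", "32-36", "37-41", "42-44"]
COUNTS = [
    [723, 77, 114, 5],       # Birth LMP 20-31 weeks
    [52, 5540, 1523, 10],    # Birth LMP 32-36 weeks
    [30, 1055, 91694, 598],  # Birth LMP 37-41 weeks
    [1, 57, 2010, 1770],     # Birth LMP 42-44 weeks
]


def misclassification_rates(positive):
    """Rates for the categories in `positive`, treating XAFP LMP as the gold standard."""
    tp = fp = fn = tn = 0
    for i, birth_cat in enumerate(CATEGORIES):
        for j, xafp_cat in enumerate(CATEGORIES):
            n = COUNTS[i][j]
            screen_pos = birth_cat in positive
            truly_pos = xafp_cat in positive
            if screen_pos and truly_pos:
                tp += n
            elif screen_pos:
                fp += n
            elif truly_pos:
                fn += n
            else:
                tn += n
    return {
        "false_positive_rate": fp / (fp + tn),         # 1 - specificity
        "false_negative_rate": fn / (fn + tp),         # 1 - sensitivity
        "false_positive_screen_rate": fp / (fp + tp),  # 1 - positive predictive value
    }


preterm = misclassification_rates({"20-31", "32-36"})
post_term = misclassification_rates({"42-44"})
for name, value in preterm.items():
    print(f"preterm {name}: {100 * value:.1f}%")
```

With these counts the preterm rates come out at 1.7%, 15.2% and 20.5%, and the post-term false-positive and false-positive screen rates at 2.0% and 53.9%, matching Table 4.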
Two error flags were evaluated to explain discrepancies:
clerical error and digit preference (indicating
recall error). Clerical error types were suggested by
Blair et al. [14] as well as the distribution of observed
discrepancies and include: dates that differ in only the
month or year field; dates that differ by 1 in the tens
digit of the day field (e.g. day 1 vs. 11); transposed
month and day; LMP equal to the delivery date; or
LMP 28 days or less before the child's date of birth,
possibly reflecting an estimated delivery date. The electronic
birth recording system used to enter 90% of
records in California in 2002 did not allow LMP entries
with dates beyond the delivery date. The XAFP data
entry programme triggers a double-check for LMP
dates beyond the date of blood collection. Records with
preferred digit LMP days were labelled digit preference
errors if the date was discrepant from the XAFP
date and the discrepancy was not also considered a
clerical error. The proportion of discrepancies and poor
data quality indicators explained by each error type
was evaluated by calculating the percentage change in
prevalence of each indicator when substituting XAFP
LMP values for Birth LMP values for records flagged as
either clerical or digit-preference error.
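The error flags described above can be sketched as a pair of rule-based functions. This is an illustration under stated assumptions, not the authors' implementation: the function names, the returned labels and the order in which the patterns are tested are all hypothetical.

```python
from datetime import date
from typing import Optional

PREFERRED_DAYS = {1, 5, 10, 15, 20, 25, 28}


def clerical_error_type(birth_lmp: date, xafp_lmp: date, delivery: date) -> Optional[str]:
    """Label the first clerical pattern matched by a discrepant Birth LMP, or None.

    The order of the checks is an assumption made for this sketch.
    """
    if birth_lmp == xafp_lmp:
        return None
    if birth_lmp == delivery:
        return "LMP equals delivery date"
    if 0 < (delivery - birth_lmp).days <= 28:
        return "LMP 28 days or less before delivery"
    b, x = birth_lmp, xafp_lmp
    if (b.year, b.day) == (x.year, x.day):
        return "month field only"
    if (b.month, b.day) == (x.month, x.day):
        return "year field only"
    same_month = (b.year, b.month) == (x.year, x.month)
    if same_month and b.day % 10 == x.day % 10 and abs(b.day // 10 - x.day // 10) == 1:
        return "tens digit of day"
    if b.year == x.year and (b.month, b.day) == (x.day, x.month):
        return "month/day transposed"
    return None


def is_digit_preference_error(birth_lmp: date, xafp_lmp: date, delivery: date) -> bool:
    """Preferred-digit day, discrepant from XAFP, and not already a clerical error."""
    return (
        birth_lmp != xafp_lmp
        and birth_lmp.day in PREFERRED_DAYS
        and clerical_error_type(birth_lmp, xafp_lmp, delivery) is None
    )


# Illustrative dates only: day 1 vs. day 11 counts as clerical, not digit preference.
print(clerical_error_type(date(2001, 4, 1), date(2001, 4, 11), date(2002, 1, 15)))
```

Testing the clerical patterns first mirrors the rule in the text that a preferred-digit discrepancy is only counted as digit preference error when it is not also a clerical error.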
The relationship between birth certificate demographic
and obstetric characteristics and data quality,
misclassification and gestational age estimates was
examined by comparing prevalence across subgroups
defined by: self-reported race/ethnicity, with Hispanic
ethnicity stratified by mother's birthplace [US-born or
foreign-born (Mexico in 87.5% of cases)]; maternal age;
years of completed education categorised as <12, 12
and >12; parity (number of livebirths before current
delivery); and source of payment for delivery, grouped
as Medi-Cal (California's Medicaid programme),
private insurance, uninsured or other (Medicare,
workers' compensation, other governmental and non-governmental programmes).

Results
Data completeness and population selection factors are
assessed in Table 1. In 2002 birth records, 12.9% of
deliveries were missing LMP dates, 55.8% of those
missing day only (data not shown). Missing or incomplete LMP data on birth records were associated with
African American and US-born Hispanic race/
ethnicity, younger maternal age, higher prevalence of
low birthweight, less than high-school education, and
Medi-Cal coverage (Table 1). Of records with missing
or incomplete LMP, 39.2% had complete ultrasound
data and 16.6% had complete LMP data in linked XAFP
records (data not shown).
Compared with non-XAFP participants, XAFP participants
were more likely to be under the age of
34 years, to have no previous livebirths, to have completed
more than 12 years of education, and to be privately
insured (Table 1). Among XAFP participants,
women with LMP as opposed to ultrasound best estimates
were more likely to be foreign-born Hispanic,
have less than high-school education, and have Medi-Cal
coverage. Both preterm and post-term birth rates
derived from birth certificate LMP were higher among
XAFP participants with ultrasound best estimates compared
with those with LMP best estimates (8.9% vs.
7.7% and 8.2% vs. 3.7%, respectively).
XAFP LMP appears to suffer from fewer data quality
problems than Birth LMP, as evidenced by fewer out-of-range
gestational age values, fewer preferred digits,
lower post-term rates and lack of a bimodal birthweight
distribution at early gestational ages (Table 2).
Preterm birth prevalence was higher according to
linked Birth LMP than XAFP LMP (7.6% vs. 7.2%).
Birth records linked to XAFP records had lower prevalence
of out-of-range gestational age, post-term births
and implausible birthweight-for-gestational age than
the overall birth population. Day 1 was the most commonly
reported day in overall birth records, and day 15
was most commonly reported by both Birth LMP and
XAFP LMP within the linked sample. While digit preference
is evident in both data sources for LMP date,
over-reporting of days 1 and 15 of the month was
higher in Birth LMP vs. XAFP LMP dates (Table 2).
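The expected frequencies against which digit preference is judged (3.3% for a single day of the month, 23.0% for the seven preferred days combined) can be reproduced under one simple assumption: that true LMP dates fall uniformly across a 365-day year. That assumption, the choice of 2001 as the reference year and the function name are illustrative; the paper does not state how the expected values were derived.

```python
import calendar

PREFERRED_DAYS = (1, 5, 10, 15, 20, 25, 28)


def expected_day_share(day: int, year: int = 2001) -> float:
    """Share of calendar days in `year` that fall on the given day of the month."""
    days_in_year = 366 if calendar.isleap(year) else 365
    months_with_day = sum(
        1 for month in range(1, 13) if calendar.monthrange(year, month)[1] >= day
    )
    return months_with_day / days_in_year


expected_single = expected_day_share(1)  # 12 occurrences in 365 days
expected_any = sum(expected_day_share(d) for d in PREFERRED_DAYS)

# Observed day 1 percentages in the linked sample, as reported in Table 2.
observed_day1 = {"Birth LMP": 6.2, "XAFP LMP": 4.3}
for source, pct in observed_day1.items():
    print(source, "observed/expected ratio:", round(pct / (100 * expected_single), 2))
```

Every day from 1 to 28 occurs in all twelve months, which is why a single expected value applies to each of the preferred days.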
The proportion of very preterm births falling within
the second birthweight mode was largest among the
overall birth population (26.7% of all births between 20
and 31 weeks), and was four times greater when using
Birth LMP than XAFP LMP to estimate gestational age
in the linked sample (Table 2). The second birthweight


Table 1. Characteristics of linked and unlinked study populations, California 2002 Live Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records

Columns (all values are column percentages):
  2002 Livebirths (n = 515 381)
    (1) Missing or incomplete LMP date, n = 66 623 (12.9%)
    (2) With LMP date, n = 448 758 (87.1%)
  2002 Livebirths(a) with Birth LMP (n = 448 758)
    (3) Not linked to XAFP(b), n = 178 012 (39.7%)
    (4) Linked to XAFP(c), n = 270 746 (60.3%)
  2002 Livebirths(a) with Birth LMP, linked to XAFP (n = 270 746)
    (5) XAFP ultrasound(d), n = 164 810 (60.9%)
    (6) XAFP LMP(d), n = 105 936 (39.1%)

Characteristic                          (1)     (2)     (3)     (4)     (5)     (6)
Race/ethnicity
  White                                29.1    31.2    32.1    30.7    32.0    28.6
  African American                      7.8     5.6     6.0     5.4     5.5     5.3
  Asian                                 8.1     8.8     7.5     9.6     9.9     9.1
  Hispanic, US-born                    21.9    17.5    16.1    18.3    18.1    18.7
  Hispanic, foreign-born               28.9    33.0    34.4    32.2    30.6    34.7
  Pacific Islander                      3.5     3.4     3.4     3.4     3.6     3.2
  American Indian/Alaskan Native        0.6     0.4     0.5     0.4     0.4     0.3
Age (years)
  <20                                  11.8     9.5    11.4     8.2     7.4     9.4
  20–24                                26.1    23.1    24.5    22.2    21.2    23.9
  25–34                                47.5    51.0    41.0    57.6    58.3    56.4
  >34                                  14.6    16.4    23.1    12.0    13.1    10.3
Education (years)
  <12                                  31.3    28.8    32.2    26.5    24.9    29.1
  12                                   31.6    28.3    28.2    28.3    28.2    28.7
  >12                                  37.1    42.9    39.6    45.1    47.0    42.3
Previous livebirths (parity)
  0                                    34.5    40.0    38.1    41.2    40.7    41.9
  1                                    32.6    31.7    30.1    32.7    32.9    32.5
  2+                                   32.9    28.4    31.8    26.1    26.5    25.6
Birthweight (g)
  <1500                                 1.2     0.9     0.9     0.8     0.9     0.7
  1500–2499                             5.0     4.0     4.2     3.8     4.0     3.6
  ≥2500                                93.8    95.2    95.0    95.4    95.1    95.8
Method of payment for delivery
  Medi-Cal                             47.7    42.6    47.3    39.6    36.0    45.1
  Any private                          48.8    53.0    44.4    58.7    62.3    53.1
  Uninsured                             2.4     2.3     4.1     1.2     1.1     1.3
  Other                                 1.1     2.0     4.2     0.6     0.6     0.5
Birth LMP gestational age (completed weeks)
  <20                                    NA     0.1     0.1     0.1     0.1     0.1
  20–31                                  NA     1.3     1.5     1.1     1.2     0.9
  32–36                                  NA     7.6     8.3     7.2     7.5     6.7
  37–41                                  NA    83.0    81.7    83.9    81.1    88.2
  42–44                                  NA     6.5     6.7     6.3     8.0     3.6
  >44                                    NA     1.6     1.7     1.5     2.2     0.6
  Preterm(e): 20–36                      NA     9.0    10.0     8.4     8.9     7.7
  Post-term(e): 42–44                    NA     6.6     6.8     6.4     8.2     3.7

(a) Excludes n = 8 records missing birthweight.
(b) Also includes n = 12 239 records that linked to XAFP but had no XAFP LMP or ultrasound data.
(c) Records with XAFP LMP or ultrasound data.
(d) Best estimate of gestational age used by the state-sponsored prenatal screening programme to interpret serological markers.
(e) Denominator excludes records with gestational ages <20 and >44 completed weeks.
LMP, last menstrual period; NA, not applicable.


Table 2. Data quality indicators by study population and LMP data source, California 2002 Live Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records

Columns (values are percentages unless shown as % (n)):
  (1) 2002 Livebirths, Birth LMP (n = 448 758)
  (2) 2002 Livebirths linked to XAFP records with LMP, Birth LMP (n = 105 936)
  (3) 2002 Livebirths linked to XAFP records with LMP, XAFP LMP (n = 105 936)

Indicator                                         (1)            (2)           (3)
Gestational age at birth (completed weeks)
  Out-of-range: <20                               0.1            0.1           0.02
  Out-of-range: >44                               1.6            0.6           0.03
  Preterm(a): 20–36                               9.0            7.6           7.2
  Term(a): 37–41                                 84.4           88.7          90.5
  Post-term(a): 42–44                             6.6            3.7           2.3
Digit preference, LMP day(b)
  Day 1                                           7.3            6.2           4.3
  Day 5                                           4.4            4.3           4.3
  Day 10                                          4.6            4.4           4.4
  Day 15                                          6.9            6.3           5.7
  Day 20                                          5.2            4.9           4.7
  Day 25                                          4.3            4.2           4.1
  Day 28                                          4.0            4.0           4.0
  Any preferred digit                            36.7           34.3          31.4
Implausible birthweight-for-gestational age       0.2            0.1           0.02
Second birthweight mode(c), % (n)
  20–27 weeks                              22.6 (477)      14.1 (47)       1.9 (6)
  28–31 weeks                             29.2 (1015)     21.0 (124)      6.4 (33)
  Overall: 20–31 weeks                    26.7 (1492)     18.5 (171)      4.7 (39)

(a) Denominator excludes records with gestational ages <20 and >44 completed weeks.
(b) Expected frequency of preferred digits is 3.3%.
(c) Proportion with birthweight ≥2200 g among deliveries 20–27 weeks and ≥2700 g among deliveries 28–31 weeks.
LMP, last menstrual period.

Figure 1. Birthweight distribution within LMP-based gestational age 20–27 completed weeks from birth (n = 333) and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP, n = 315) records, California 2002 Linked Birth and Prenatal Screening records. [Density plot: x-axis, Birthweight (g), 1000–5000; y-axis, Probability density, 0–0.0012; curves for XAFP and Birth.]

Figure 2. Birthweight distribution within LMP-based gestational age 28–31 completed weeks from birth (n = 590) and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP, n = 517) records, California 2002 Linked Birth and Prenatal Screening records. [Density plot: x-axis, Birthweight (g), 1000–5000; y-axis, Probability density, 0–0.0008; curves for XAFP and Birth.]

mode all but disappeared within 20–27 weeks' gestation
when gestational age was derived from XAFP
LMP (Fig. 1), and was greatly attenuated between 28
and 31 weeks (Fig. 2).
The majority of Birth LMP and XAFP LMP dates are
identical (71.1%), and 65.0% of discrepancies amount
to ≤1 week in either direction (Table 3). Among discrepant
records, XAFP LMP-derived days of gestation
have a stronger association with birthweight than Birth
LMP-derived days of gestation (n = 30 624; R² = 0.27
and R² = 0.01, respectively). Large (>2 weeks) gestational
age overestimates are 75% more common than
large underestimates (Table 3; 3.7% vs. 2.1%), and
account for 97.2% of gestational ages >44 weeks and
41.6% of post-term births (data not shown). Birth LMP
dates with preferred digits have larger discrepancies
and greater gestational age overestimation than dates
with non-preferred digits. Among Birth LMP dates
with day 1 of the month, 16.2% overestimate gestational
age by more than 2 weeks whereas 2.6% underestimate
gestational age. The vast majority of records in
the second birthweight mode (79.5%) underestimate
gestational age by more than 31 days relative to XAFP
LMP gestational age (Table 3).
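The discrepancy categories used in Table 3 can be sketched as a binning of the signed difference, in days, between Birth LMP-based and XAFP LMP-based gestational age. The function name and the exact label strings are hypothetical; positive differences mean that Birth LMP overestimates gestational age relative to XAFP.

```python
def discrepancy_bin(birth_ga_days: int, xafp_ga_days: int) -> str:
    """Bin the Birth minus XAFP gestational age difference (days) as in Table 3."""
    diff = birth_ga_days - xafp_ga_days
    if diff == 0:
        return "0 (no difference)"
    size = abs(diff)
    if size <= 7:
        low, high = 1, 7
    elif size <= 14:
        low, high = 8, 14
    elif size <= 31:
        low, high = 15, 31
    else:
        # Open-ended category for differences of 32 days or more.
        return "32+" if diff > 0 else "-32+"
    if diff > 0:
        return f"{low} to {high}"
    return f"-{low} to -{high}"


# Illustrative values only: a Birth LMP recorded 35 days later than the XAFP LMP
# shortens the Birth LMP-based gestational age by 35 days.
print(discrepancy_bin(250, 285))
```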
Table 3. Magnitude of difference between gestational ages calculated from Birth LMP vs. XAFP LMP date, by data quality indicators, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Columns (all values are column percentages); columns (2) to (5) are Birth LMP data quality indicators:
  (1) Overall (n = 105 936)
  (2) Among preferred digits (n = 36 333)
  (3) Among day 1 (n = 6614)
  (4) Implausible birthweight-for-gestational age (n = 91)
  (5) Among second birthweight mode (n = 171)

Birth minus XAFP
gestational age (days)      (1)     (2)     (3)     (4)     (5)
32+                         0.9     1.3     3.5     6.6     0.0
15–31                       2.8     4.6    12.8     0.0     0.0
8–14                        2.5     3.9     9.1     0.0     0.0
1–7                         8.7     9.8    13.6     1.1     0.6
0 (no difference)          71.1    65.7    52.1     7.7     9.9
-1 to -7                   10.1     9.8     4.9     0.0     1.2
-8 to -14                   1.8     2.3     1.5     1.1     1.2
-15 to -31                  1.6     2.1     2.2     1.1     7.6
-32+                        0.5     0.5     0.5    82.4    79.5
< -14 days                  2.1     2.6     2.6    83.5    87.1
> 14 days                   3.7     5.9    16.2     6.6     0.0

LMP, last menstrual period.

Table 4. Distribution of XAFP gestational age within Birth LMP gestational age categories (completed weeks), California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Birth LMP-based          XAFP LMP-based gestational age(a) (completed weeks)
gestational age
(completed weeks)      <20    20–31    32–36     37–41    42–44    >44    Total %    Total N
<20                     10        4        2        33        2      0        0.0         51
20–31                    3      723       77       114        5      1        0.9        923
32–36                    1       52     5540      1523       10      2        6.7       7128
37–41                    3       30     1055    91 694      598      7       88.2     93 387
42–44                    0        1       57      2010     1770      5        3.6       3843
>44                      0       22       34       503       32     13        0.6        604
Total
  %                    0.0      0.8      6.4      90.5      2.3    0.0      100.0
  N                     17      832     6765    95 877     2417     28                105 936
Missing
  %                    0.0      1.1      8.7      87.6      2.6    0.1      100.0
  N                      2      123      959      9693      284      9                 11 070

Preterm false-positive rate(b)              1652/97 724   = 1.7%
Preterm false-negative rate(b)              1143/7535     = 15.2%
Preterm false-positive screen rate(b)       1652/8044     = 20.5%
Post-term false-positive rate(b)            2068/102 876  = 2.0%
Post-term false-positive screen rate(b)     2068/3838     = 53.9%

(a) Diagonal values indicate birth records correctly categorised according to XAFP gestational age categories.
(b) Calculations exclude Birth and XAFP gestational ages <20 and >44 completed weeks (total n = 105 259). Because post-term births derived from either LMP source may be unreliable, a post-term false-negative rate is not presented.
LMP, last menstrual period.

Figure 3. Birth/XAFP LMP date discrepancies and poor data quality indicators: proportion explained by clerical error and digit preference error, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936). LMP, last menstrual period. [Stacked bar chart, 0–100%. Bars: All discrepant records (n = 30 624); < -14 days difference (n = 2236); > +14 days difference (n = 3929); <20 weeks, Birth LMP (n = 51); >44 weeks, Birth LMP (n = 604); Preterm false negatives (n = 1143); Preterm false positives (n = 1652); Post-term false positives (n = 2068); Second birthweight mode, 20–31 weeks (n = 171). Legend: Any clerical error; Day 1 digit preference error (non-clerical); Other digit preference error (non-clerical).]

Table 4 shows the cross-classification of gestational
age categories according to Birth LMP and XAFP
LMP. Within Birth LMP-based gestational age groups
of 20–31 and 32–36 weeks, 12.4% and 21.4%, respectively,
are term births based on XAFP LMP. More
than half of post-term births and 83.3% of those
>44 weeks according to Birth LMP are term births
based on XAFP LMP. The rate of false-negative
preterm births is 15.2%, and 20.5% of observed
preterm births and 53.9% of observed post-term
births are false positives. While the majority of these
misclassifications result from discrepancies of
>2 weeks, 30.6% of preterm false negatives, 22.4% of
preterm false positives and 22.9% of post-term false
positives result from discrepancies of ≤14 days (data
not shown). Birth records missing LMP dates with
linked XAFP LMP data have higher preterm rates
than linked records not missing LMP dates (9.8% and
7.2%, respectively) (Table 4).

Of all gestational age discrepancies, 46.3% can be
described as either clerical or digit preference errors.
Clerical errors observed from discrepancies between
Birth LMP and XAFP LMP dates represent 2.7% of all
linked records and 9.3% of all discrepancies, whereas
the prevalence of non-clerical digit preference error is
10.7% of all linked records and 37.0% of all discrepancies.
Among clerical errors, 2.2% are whole year
deviations, 0.9% possible confusions with estimated
delivery date, 47.7% whole month deviations, 1.2%
month/day transpositions, and 47.8% 10-day deviations.
Among clerical errors, XAFP LMP gestational
age is more closely related to birthweight (R² = 0.33),
whereas no relationship exists between Birth LMP gestational
age and birthweight (R² = 0.00).

Table 5. Maternal and infant characteristics by gestational age categories and data quality indicators, California 2002 Linked Birth and Prenatal Expanded Alpha-fetoprotein Screening Program (XAFP) records (n = 105 936)

Columns (all values are percentages):
  (1) Birth minus XAFP gestational age difference < -14 days
  (2) Birth minus XAFP gestational age difference > +14 days
  (3) Digit preference error, Birth LMP
  (4) Clerical error, Birth LMP
  (5) Preterm rate, Birth LMP
  (6) Preterm rate, XAFP LMP
  (7) Preterm false-negative rate(a)
  (8) Preterm false-positive rate(a)
  (9) Preterm false-positive screen rate(a)

Characteristic                  (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
Overall                         2.1    3.7   10.7    2.7    7.6    7.2   15.2    1.7   20.5
Race/ethnicity
  White                         1.2    2.9    9.6    1.8    6.1    5.9   11.5    1.0   14.8
  African American              2.2    5.4   12.9    2.7   11.3   10.9   13.1    2.1   16.1
  Asian                         1.6    2.8    8.9    2.2    7.0    6.5   13.3    1.3   18.7
  Hispanic, US-born             2.1    4.0   11.6    2.7    8.1    7.8   16.6    1.8   20.0
  Hispanic, foreign-born        3.1    4.2   11.4    3.6    8.1    7.3   18.2    2.3   26.4
  Pacific Islander              1.4    3.1    9.1    2.3    9.9    9.5   11.9    1.7   15.6
  Native American               1.7    3.9   11.7    1.7    7.6    7.3   19.2    1.8   22.2
Age (years)
  <20                           2.9    3.9   11.3    3.1    9.1    8.9   17.4    1.9   19.4
  20–24                         2.5    4.0   11.6    3.0    7.8    7.1   16.8    2.0   23.5
  25–34                         1.9    3.5   10.3    2.5    7.1    6.6   14.6    1.6   20.5
  >35                           1.8    3.8   10.5    2.7    9.0    8.6   12.3    1.5   15.7
Education (years)
  <12                           3.2    4.6   11.8    3.6    8.6    7.9   19.7    2.4   25.9
  12                            2.2    3.9   11.4    2.7    7.7    7.3   16.0    1.7   20.5
  >12                           1.3    2.9    9.4    2.0    6.9    6.5   10.8    1.2   16.0
Previous livebirths (parity)
  0                             1.9    3.2    9.7    2.4    7.8    7.5   13.7    1.4   17.0
  1                             2.1    3.7   10.8    2.7    6.8    6.1   14.3    1.7   23.4
  >1                            2.5    4.6   12.2    3.1    8.5    8.0   18.3    2.1   23.0
Method of payment for delivery
  Medi-Cal                      3.0    4.4   12.0    3.4    8.5    7.9   18.4    2.2   24.2
  Any private                   1.4    3.1    9.6    2.1    6.9    6.5   12.0    1.2   16.6
  Uninsured                     2.4    4.9    9.6    3.5    8.2    7.6   17.0    2.0   23.2
  Other                         2.0    2.3   12.1    2.7    6.1    5.2    3.4    1.1   17.7

(a) Excludes Birth and XAFP gestational ages <20 and >44 completed weeks (total n = 105 259, see Table 4 for detail).
LMP, last menstrual period.

Proportions displayed in Fig. 3 represent the amount
by which the prevalence of each data quality indicator
decreases when clerical or digit preference errors in
birth records are corrected using XAFP LMP as the gold
standard. Clerical errors are associated more with large
underestimates than overestimates of gestational age,
resulting in 33.1% of the preterm false positives and
31.6% of the second birthweight mode observed
between 20 and 31 weeks (all of the latter involved
errors in the month field). Digit preference error, especially day 1 error, is associated with large gestational age
overestimates. Day 1 errors, while representing only
2.7% of linked records, disproportionately contribute to
post-term out-of-range gestational ages, post-term false
positives and missed preterm cases.
Discrepancies between Birth LMP and XAFP LMP
gestational age estimates vary by population subgroup
(Table 5). Large underestimation of gestational age,
clerical errors and false-positive preterm rates are
apparent among foreign-born Hispanics, younger
women with less than high-school education, women
with high parity and with Medi-Cal or no insurance.
Large overestimation of gestational age, digit preference and post-term false-positive rates are observed
among African Americans, Native Americans, women
with low education level, high parity, and Medi-Cal or
no insurance. Preference for LMP day 1 is most prevalent among African Americans (data not shown) while
clerical errors are more prevalent among foreign-born
Hispanics.
Rates of preterm birth are approximately 5% lower
across population subgroups when defined according
to XAFP LMP than according to Birth LMP, with the
exception of foreign-born Hispanics, whose preterm
rates are 10% lower using XAFP LMP (Table 5). The
preterm birth rate among foreign-born Hispanics
appears to be lower than that for US-born Hispanics
using XAFP LMP, while rates are identical using Birth
LMP. Other preterm birth rate comparisons among
subgroups change little based on LMP data source.
However, overall preterm birth rates mask substantial
misclassification in both directions. Among African
Americans, for example, 13.1% of preterm cases are
missed using Birth LMP, whereas 16.1% of presumed
preterm cases are not true cases. Foreign-born Hispanics
have the highest preterm false-positive and false-positive
screen rates based on Birth LMP (2.3% and
26.4%, respectively). Native Americans and foreign-born
Hispanics have the highest preterm false-negative
rates (19.2% and 18.2%, respectively). Medi-Cal coverage
and lack of insurance, high parity, young age and
low education level are associated with high misclassification
of preterm births in both directions.
Overall post-term rates are 36% lower using XAFP
LMP compared with Birth LMP; however, this
decrease is higher among African Americans, women
aged over 35 years, women with high parity and
women with Medi-Cal or no insurance. African Americans and the uninsured have the highest post-term
false-positive rates (2.6% each), followed by Native
Americans, women with less than high-school education and women with Medi-Cal (data not shown).

Discussion
This is the first study to compare LMP dates from birth
certificates with a large, population-based source of
reliable, prenatally collected LMP data in order to
isolate data reporting errors. Birth LMP was discrepant
with XAFP LMP nearly a third of the time, resulting in
one-fifth of preterm births and half of post-term births
from birth records representing false positives, and
15% of true preterm cases being missed. Agreement
within 1 week was larger in the current study than a
previous comparison of LMP-based gestational age
from birth records with gestational age from medical
charts among normal-birthweight babies in northern
California (89% and 77–78%, respectively); however,
some chart estimates in that smaller study were
derived from ultrasound [15].
While menstrual dating has inherent flaws for estimating
gestational age, the recording of LMP date itself
is prone to errors amenable to improvement. California's
centralised XAFP prenatal screening programme
is the largest in the country, serving approximately 70%
of pregnant women in the State. As accurate gestational
age is needed for interpretation of risks for trisomies
and neural tube defects, XAFP data provide a
population-based source of gestational age in California.
Until now, only vital records have provided sufficient
numbers of very early deliveries to examine the
bimodal distribution of birthweight. The second birthweight
mode at early gestations appears to be largely
an issue of clerical and recall error, rather than pathological
non-menstrual bleeding misidentified as a
normal menstrual cycle [6]. XAFP LMP is more accurate
than LMP from birth certificates, as demonstrated by
lower rates of digit preference, out-of-range gestational
ages, implausible birthweight-for-gestational age and
post-term births. Over half of large discrepancies in
LMP dates were explained by suspected clerical and
digit preference errors, indicating that quality control
measures have the potential to improve gestational age
estimates.
Clerical errors may arise from recording dates from
the wrong field (e.g. estimated due date14 or child's
date of birth), manual error transcribing a date into a
chart or worksheet, or typographical error on data
entry. In this analysis, assessment of clerical error may
have been incomplete as misread digits in the day field
were only assessed if the tens digit differed by one or
the month and day were transposed. On the other
hand, random or recall error may have been flagged as suspected clerical error by chance. Among discrepant
records flagged for clerical error, birthweight was
strongly associated with XAFP LMP gestational age
while lacking association with Birth LMP gestational
age, suggesting errors are predominantly in Birth
LMP.
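The two day-field patterns mentioned above (a tens digit off by one, and month and day transposed) can be sketched as a simple comparison of the two dates. This is an illustrative reconstruction only: the function name, the flag labels and the requirement that the year match are assumptions, and the study's actual flagging rules may have differed.

```python
from datetime import date


def clerical_error_flags(birth_lmp: date, reference_lmp: date) -> set:
    """Return suspected clerical-error patterns for a pair of LMP dates."""
    flags = set()
    if birth_lmp == reference_lmp:
        return flags  # identical dates: nothing to flag

    same_year = birth_lmp.year == reference_lmp.year

    # Month and day swapped, e.g. 7 March recorded as 3 July.
    if (same_year
            and birth_lmp.month == reference_lmp.day
            and birth_lmp.day == reference_lmp.month):
        flags.add("month_day_transposed")

    # Same year and month, same units digit of the day, tens digit off by
    # one, e.g. day 12 recorded as day 22.
    if (same_year
            and birth_lmp.month == reference_lmp.month
            and birth_lmp.day % 10 == reference_lmp.day % 10
            and abs(birth_lmp.day // 10 - reference_lmp.day // 10) == 1):
        flags.add("day_tens_digit_off_by_one")

    return flags
```

Because recall error can produce the same patterns by chance, flags of this kind mark records as suspected rather than confirmed clerical errors.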
In 2002, California's XAFP programme required
double-key entry of all dates, thus providing built-in
error checks during data entry, verification of key
fields with providers where any data element was
missing, and follow-up of non-negative screening
results, which probably account for improved data
quality. The State vital statistics electronic data entry
programme requires confirmation of dates of LMP
that precede birth by more than 1 year and gestations
less than 140 days with birthweight >2000 g. Implementing double-key entry and expanding data checks
to other situations including mistaking the estimated
due date for the LMP, additional implausible birthweight entries and out-of-range gestational age estimates, could yield substantial improvements in Birth
LMP data quality.
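A minimal sketch of entry-time checks of this kind follows. The first two rules mirror the confirmation prompts described above; the third (an LMP date falling after the date of birth, which might indicate that a due date was entered instead) is a hypothetical extension added for illustration. The function name, the flag labels and the reading of "more than 1 year" as more than 365 days are assumptions.

```python
from datetime import date

MAX_DAYS_LMP_BEFORE_BIRTH = 365   # "more than 1 year", taken here as 365 days
MIN_GESTATION_DAYS = 140
HIGH_BIRTHWEIGHT_G = 2000


def lmp_edit_checks(lmp: date, birth: date, birthweight_g: int) -> list:
    """Return flags for entries that should be confirmed before acceptance."""
    flags = []
    gestation_days = (birth - lmp).days

    # Rule described in the text: LMP precedes birth by more than 1 year.
    if gestation_days > MAX_DAYS_LMP_BEFORE_BIRTH:
        flags.append("LMP_OVER_1_YEAR_BEFORE_BIRTH")

    # Rule described in the text: gestation under 140 days with
    # birthweight over 2000 g.
    if gestation_days < MIN_GESTATION_DAYS and birthweight_g > HIGH_BIRTHWEIGHT_G:
        flags.append("SHORT_GESTATION_HIGH_BIRTHWEIGHT")

    # Hypothetical extension: an LMP after the date of birth may be an
    # estimated due date entered in the LMP field.
    if gestation_days < 0:
        flags.append("LMP_AFTER_BIRTH")

    return flags
```

An empty list means the entry passes all three checks; any flag would prompt the person entering the data to confirm or correct the date.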
Birth LMP dates with preferred digits were more
likely than those with non-preferred digits to differ
from XAFP LMP dates by more than 2 weeks (8.5% vs.
4.4%). Increased discrepancies associated with preferred vs. non-preferred digits have also been reported
comparing gestational age estimates from XAFP LMP
dates with ultrasound gestational age estimates.3
Increased digit preference prevalence in Birth LMP
relative to XAFP LMP implies that mothers are directly
asked for LMP information during birth registration.
Increasing duration of LMP recall has been associated
with gestational age overestimation16 and may be one
explanation for the overestimation of gestational age
that we observed for Birth LMP dates with preferred
digits. Maternal querying and missing LMP dates may
both result from missing prenatal charts at the time of
birth registration.
The prevalence of digit preference in day of LMP
dates changed little between 1987 and 2002 (35.9% and
36.7%, respectively).3 While preference for day 1 in
Birth LMP dates was associated with large gestational
age overestimations and missed preterm cases in our
study, its prevalence in California birth records has
decreased from 7.7% in 2001 and 7.3% in 2002 to 5.7%
in 2003. Researchers should assess the degree of day 1
digit preference when relying on LMP dates for gestational age estimates, particularly among vulnerable
subpopulations.
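One way to carry out such an assessment is to tabulate the share of LMP dates falling on each day of the month and compare the share on day 1 with what an even spread of dates would give (12 of 365 dates, roughly 3.3%). The sketch below is illustrative; the function names are hypothetical, and the set of days treated as preferred is left to the caller because it is not defined in this section.

```python
from collections import Counter
from datetime import date


def day_of_month_shares(lmp_dates):
    """Share of LMP dates falling on each day of the month (1 to 31)."""
    counts = Counter(d.day for d in lmp_dates)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {day: counts.get(day, 0) / total for day in range(1, 32)}


def preference_share(lmp_dates, preferred_days):
    """Combined share of LMP dates falling on the caller's preferred days."""
    shares = day_of_month_shares(lmp_dates)
    return sum(shares.get(day, 0.0) for day in preferred_days)
```

Running `preference_share(dates, {1})` separately for each subpopulation of interest would show where day 1 preference is concentrated.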
LMP data in California birth records were not more
complete in 2002 than they were in 1987 (12.9% vs.
12.7%).3 Nationally in 2002, 5.1% of birth records were
missing only day of LMP and 5.5% were also missing
month and year (J. Martin, 7 Nov 2005, pers. comm.).
Missing LMP data threaten external validity of preterm
birth estimates. Births missing LMP data are disproportionately from vulnerable populations and have higher
risk of infant mortality.17,18 Implausible gestational ages
are frequently excluded from analysis, further compounding the missing data problem. In California birth
records, missing day is imputed as 15 for gestational age
calculation. However, unlike other States, clinical or
obstetric estimate of gestational age was unavailable to
substitute for records with incomplete LMP or out-of-range gestational age estimates until 2007.
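The imputation rule described above can be sketched as follows. The value 15 for a missing day comes from the text; the function name, the argument layout and the choice to return no estimate when month or year is missing are assumptions for the example.

```python
from datetime import date
from typing import Optional

IMPUTED_DAY = 15  # missing day of LMP is imputed as 15, as described in the text


def gestational_age_days(lmp_year: Optional[int],
                         lmp_month: Optional[int],
                         lmp_day: Optional[int],
                         birth: date) -> Optional[int]:
    """Days from LMP to birth, imputing a missing day as the 15th.

    Returns None when month or year is missing, since no estimate can be
    formed (an assumption of this sketch).
    """
    if lmp_year is None or lmp_month is None:
        return None
    day = IMPUTED_DAY if lmp_day is None else lmp_day
    return (birth - date(lmp_year, lmp_month, day)).days
```

For a true LMP on day 1 of a month, imputing day 15 shortens the estimate by 14 days, enough to move a birth across a weekly cut-off.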
Direct linkage of birth records to XAFP records
where LMP dates were considered the best estimate
of gestational age ensures the most direct LMP error
assessment possible on a large, population-based
sample. However, the population of women with
XAFP LMP as the best gestational age estimate differed
from the general birth population, with fewer women
over the age of 35 years or with post-high school education, and fewer preterm or low-birthweight deliveries.
Women aged over 35 years often elect to have a diagnostic test (e.g. amniocentesis) rather than a screening
test. Beyond selection factors related to prenatal screening participation, women in this study had LMP dates
considered reliable for screening interpretation. It is
likely that the LMP dates in birth records for these
women are more reliable than the LMP dates of
women who were referred for ultrasound, as suggested by the higher Birth LMP-based post-term rates
among women with ultrasound dating. Similarly, relative to linked records, the overall birth population had
higher digit preference and post-term rates and a
larger proportion in the second birthweight mode. For
these reasons, the discrepancies we observed probably
underestimate the true extent of LMP reporting errors in
the general population of California births.
Studies comparing birth certificate LMP with ultrasound gestational age estimates need to consider the
role of reporting error in vital records in addition to
inherent biological or methodological limitations of
LMP dating. Direct comparison of XAFP LMP with
ultrasound could also lead to biased conclusions
regarding the quality of the LMP dating method. We
observed an excess of post-term births based on XAFP
LMP dates within the subsample of XAFP records with
both LMP and ultrasound data, suggesting an overrepresentation of unreliable XAFP LMP dates necessitating ultrasound confirmation. This small subgroup of
XAFP participants with both XAFP LMP and ultrasound data, comprising 14% of all XAFP participants
and 1% of 1987 California livebirths, has been the focus
of previous research.3
Women of African American and Hispanic origin,
with less education and higher parity, and with public
or no insurance coverage were disproportionately
affected by misclassification and missing LMP data on
birth records. Foreign-born Hispanics had the highest
rates of clerical error and underestimated gestational
age, and a high preterm false-positive rate. However,
recall error, indicated by digit preference, was less pronounced than among African Americans and Native
Americans. The appearance of higher preterm rates
among US-born Hispanics relative to foreign-born
Hispanics using gestational age from XAFP LMP dates
suggests that reporting error in birth records, and clerical error in particular, may hide a Hispanic paradox for
preterm delivery similar to that observed for birthweight, as hypothesised by others.19 Indices based on
gestational age, such as small-for-gestational age or
adequacy of prenatal care, may also be biased among
these segments of the population.
Beginning in 2007, the obstetric estimate was added
to the California birth certificate, intended to reflect
ultrasound dating where available.20 Linked data from
the XAFP programme suggest that at least 39% of birth
records missing LMP could potentially have an obstetric estimate informed by ultrasound. Because women
obtaining ultrasound during pregnancy are not representative of the birth population, as well as for other
reasons, LMP dating will still be the primary source for
population monitoring of preterm delivery. We conclude that some limitations previously attributed to the
LMP dating method may be ameliorated through data
quality control measures. The training surrounding the
implementation of the revised birth certificate provides
an opportunity to emphasise appropriate sources for
gestational age data and to enhance data-checking
protocols.

Acknowledgements
This paper was partially supported through contract
CQ004942-LOS with the Centers for Disease Control
and Prevention, Atlanta, GA. The authors are indebted
to Joyce A. Martin of the Centers for Disease Control
and Prevention, National Center for Health Statistics,
and Alan Oppenheim of the California Department of
Health Services, Center for Health Statistics for insight
regarding national and State birth certificate data; Bob
Currier and Marie Roberson of the California Department of Health Services, Genetic Disease Branch and
Patricia M. Dietz of the Centers for Disease Control
and Prevention, National Center for Chronic Disease
Prevention and Health Promotion for their thoughtful
comments; Alan Hubbard of University of California,
Berkeley for statistical support; Allen Hom and Steve
Graham of the Sequoia Foundation for data linkage;
and Deborah Hildebrandt and Marissa Root for manuscript assistance.

References
1 Savitz DA, Terry JW Jr, Dole N, Thorp JM Jr, Siega-Riz AM,
Herring AH. Comparison of pregnancy dating by last
menstrual period, ultrasound scanning, and their
combination. American Journal of Obstetrics and Gynecology
2002; 187:1660–1666.
2 Frazier TM. Error in reported date of last menstrual period.
American Journal of Obstetrics and Gynecology 1959;
77:915–918.
3 Waller DK, Spears WD, Gu Y, Cunningham GC. Assessing
number-specific error in the recall of onset of last menstrual
period. Paediatric and Perinatal Epidemiology 2000;
14:263–267.
4 Alexander GR, Himes JH, Kaufman RB, Mor J, Kogan M. A
United States national reference for fetal growth. Obstetrics
and Gynecology 1996; 87:163–168.
5 Kramer MS, McLean FH, Boyd ME, Usher RH. The validity
of gestational age estimation by menstrual dating in term,
preterm, and postterm gestations. JAMA 1988;
260:3306–3308.
6 David RJ. The quality and completeness of birthweight and
gestational age data in computerized birth files. American
Journal of Public Health 1980; 70:964–973.
7 Vahratian A, Buekens P, Bennett TA, Meyer RE, Kogan MD,
Yu SM. Preterm delivery rates in North Carolina: are they
really declining among non-Hispanic African Americans?
American Journal of Epidemiology 2004; 159:59–63.
8 Mustafa G, David RJ. Comparative accuracy of clinical
estimate versus menstrual gestational age in
computerized birth certificates. Public Health Reports 2001;
116:15–21.
9 Dietz PM, England LJ, Callaghan WM, Pearl M, Wier ML,
Kharrazi M. A comparison of LMP-based and
ultrasound-based estimates of gestational age using linked
California livebirth and prenatal screening records.
Paediatric and Perinatal Epidemiology 2007; 21 (Suppl. 2):
62–71.
10 Alexander GR, Allen MC. Conceptualization, measurement,
and use of gestational age. I. Clinical and public health
practice. Journal of Perinatology 1996; 16:53–59.
11 SuperMATCH Concepts and Reference, Version 3.10. Boston:
Vality Technology Incorporated, March 2001.
12 National Center for Health Statistics. Instruction Manual,
Computer Edits for Natality Data, Part 12. Hyattsville, MD: US
Department of Health and Human Services, Centers for
Disease Control and Prevention, 1995.
13 Scott DW. Multivariate Density Estimation: Theory, Practice
and Visualization. New York: John Wiley & Sons 1992.
14 Blair E, Liu Y, Cosgrove P. Choosing the best estimate of
gestational age from routinely collected population-based
perinatal data. Paediatric and Perinatal Epidemiology 2004;
18:270–276.
15 Emery ES 3rd, Eaton A, Grether JK, Nelson KB. Assessment
of gestational age using birth certificate data compared with
medical record data. Paediatric and Perinatal Epidemiology
1997; 11:313–321.
16 Wegienka G, Baird DD. A comparison of recalled date of
last menstrual period with prospectively recorded dates.
Journal of Women's Health 2005; 14:248–252.
17 Buekens P, Delvoye P, Wollast E, Robyn C. Epidemiology of
pregnancies with unknown last menstrual period. Journal of
Epidemiology and Community Health 1984; 38:79–80.
18 Gould JB, Chavez G, Marks AR, Liu H. Incomplete birth
certificates: a risk marker for infant mortality. American
Journal of Public Health 2002; 92:79–81.
19 Deeb-Sossa N, Agans RP, Butron-Riveros BC, Balcazar H,
Kalsbeek WD, Buekens P. Development and testing of
interview questions to determine last menstrual period in
Mexican immigrant populations. Journal of Immigrant Health
2004; 6:127–136.
20 Wier ML, Pearl M, Kharrazi M. Gestational age estimation
on United States live birth certificates: a historical overview.
Paediatric and Perinatal Epidemiology 2007; 21 (Suppl. 2):4–12.
