
J Forensic Sci, 2017

doi: 10.1111/1556-4029.13564
PAPER Available online at: onlinelibrary.wiley.com

ANTHROPOLOGY

Kelly R. Kamnikar,1 M.A.; Amber M. Plemons,1 M.A.; and Joseph T. Hefner,1 Ph.D.

Intraobserver Error in Macromorphoscopic Trait Data*

ABSTRACT: As part of a much larger investigation into the use of macromorphoscopic trait data by forensic anthropologists to estimate
ancestry from unidentified skeletal remains, we conducted a fourteen-year (2002–2016) intraobserver error study. Motivated by the development
of a large macromorphoscopic database, which will potentially utilize data collected in 2002, quantification of observer error, the impact of
technological improvements in macromorphoscopic trait data collection, and observer experience is necessary. To maximize comparisons
between the two samples, ten macromorphoscopic traits were assessed. Results revealed three patterns of error relating to observer experience,
the introduction of new technologies, and error inherent in the method. Overall, this study found the effect of error on macromorphoscopic trait
analysis could be predicted and did not significantly impact their utility.

KEYWORDS: forensic science, forensic anthropology, ancestry, macromorphoscopic traits, quantitative data analysis, observer experience

1 Department of Anthropology, Michigan State University, 655 Auditorium Drive, East Lansing, MI 48824.
*Presented in part at the 69th Annual Scientific Meeting of the American Academy of Forensic Sciences, February 13-18, 2017, in New Orleans, LA.
Funded in part by the Award No. 2015-DN-BX-K012, awarded by the National Institute of Justice, Office of Justice Programs, United States Department of Justice.
The opinions, findings, and conclusions or recommendations expressed in this presentation are those of the authors and do not necessarily reflect the views of the Department of Justice.
Received 17 Mar. 2017; and in revised form 28 April 2017; accepted 28 April 2017.
© 2017 American Academy of Forensic Sciences

Creation of the biological profile for unidentified human skeletal remains is a starting point in many forensic anthropological investigations. Ancestry estimation is one key aspect of the biological profile and is accomplished through multiple methods, including: cranial metrics (1–3), postcranial metrics (4), dental metrics (5–8), dental morphology (9–12), and macromorphoscopic trait analysis (13–16). Macromorphoscopic traits are quasicontinuous morphological traits of the cranium, which are reflected as soft-tissue differences in the living (16,17). These traits qualify a bone's shape, the presence or absence of a feature, and/or a feature's degree of expression (14,18). Macromorphoscopic traits have historical ties to Hooton's Harvard List and were traditionally applied post hoc to explain and support ancestry estimations (14,18). Using 11 nonmetric traits, Hefner (14) refined, quantified, and standardized their use in ancestry estimation, placing macromorphoscopics in a more objective framework. Further studies demonstrated the value of macromorphoscopic trait analysis when applied to populations expanded beyond Hefner's original study (19–23).

The current research addresses long-term intraobserver error on macromorphoscopic traits in two ways. First, we quantify the implications of observer error introduced as observer experience increases or technological improvements are included, in an effort to measure the impact those factors have on macromorphoscopic trait data collection. The second objective targets the data collected in 2002 to ensure reliability and cohesion between the datasets for inclusion of those data into the Macromorphoscopic Databank (MaMD), a reference and research tool for macromorphoscopic trait analysis. To that end, we piloted a fourteen-year (2002–2016) intraobserver error study.

Interobserver error quantifies differences in independent observations of the same feature (or measurement) between two or more individuals in an effort to identify potential sources of error. Intraobserver error assesses consistency in observations of the same feature recorded by a single observer over multiple attempts (3). We address interobserver error for macromorphoscopic data in a separate paper, focusing herein on intraobserver error in macromorphoscopic data collection.

Metric and nonmetric analyses of cranial and postcranial elements are commonly used to document and quantify human variation in biological anthropology. A single observer or multiple observers may participate in the data collection process, which can last days, weeks, and/or months. Additional data collection can occur years after the original data were collected. Inter- and intraobserver analyses provide insight into the precision of observations and can aid in the identification of potential sources of error. Although long-term intraobserver error studies (i.e., >10 years) are limited in the literature, several studies address the implications of inter- and intraobserver error, illuminating potential factors contributing to data collection error. In general, these studies come to similar conclusions about the factors influencing error, which can include measurement technique, observer experience, and error inherent in a research design. One major assumption of data analysis is that observations can be replicated over time (24–26). In reality, they can be difficult to replicate with a high degree of precision (24). Incorrectly repeating some measure or observation is the very nature of intraobserver error, with roots in two factors.

One factor influencing observer-introduced error (and bias) is the observer's level of experience. Several studies report high
levels of error in data collected by multiple observers of varying experience levels. The proposed solution is often training prior to data collection (25–27), but this is not always possible. New method development should consider the role of experience to achieve consistent scores between observers and between periods of observation. Additional sources of error are biases built into a method through problematic protocols such as unclear definitions or more serious mistakes in method design and implementation. Jamison and Zegura (28) recommend assessing intravariable correlations to identify error sources. Low levels of correlation may indicate difficulty in accurately measuring or scoring a variable (28), which would necessitate refinement and revision of the method before implementation in research and data collection.

Macromorphoscopic data analysis has been subjected to relatively few intraobserver error studies. Hefner's (14) original paper presented the results of a small-scale intraobserver error test, but observations were separated by only two weeks. Regardless, significant intraobserver error was identified for anterior nasal spine and supranasal suture (14). A separate interobserver error study on macromorphoscopic traits found considerable variation in scoring between observers of different experience levels; some traits had agreement levels lower than expected for random allocation (29). The authors, again, recommended observers become familiar with identifying and assessing macromorphoscopic traits prior to data collection.

In this study, discrepancies between observations are expected to result from the following changes in data collection protocols: (i) the introduction of images in the MMS (Macromorphoscopic) interface; (ii) alterations in scales/values between 2002 and 2016; (iii) the introduction and implementation of a contour gauge to evaluate nasal bone contour; and (iv) increased observer experience. Each of these is briefly addressed.

The MMS v1.61 is a data collection program that represents a modified, free-standing version of the Macromorphoscopic Module in Osteoware (Smithsonian Institution 2016). MMS is a graphical user interface comprising a standard window with a definition for each trait and character state, as well as illustrations and descriptive comments. The illustrations offer visual clarification for each character state (i.e., the possible expressions of a trait) and are supplemented with anatomically based definitions (18). Second, MMS uses a numerical scoring scale for the traits. Since its implementation in 2005, some of these scales have shifted slightly, generally by one digit. For example, in 2002, a scoring scale ranged from 0 to 4, while in 2016, the same scale had been adjusted to range between 1 and 5. Third, a contour gauge was first recommended for use in 2005. This tool is used for assessing trait expression in nasal bone contour (NBC) and was first publicly introduced in 2009 (14). A contour gauge is critical for standardizing NBC scoring; it is a more objective method to assess the nuances of the individual character states. Finally, observer experience will likely have an impact on trait scores. Observer experience and familiarity with macromorphoscopic traits influence trait scoring because more experienced observers have been exposed to a greater range of human variation. Extreme trait values, often associated with the trait list approach (15), are much more rare when an observer has documented crania from all over the world.

Materials and Methods

Ten macromorphoscopic traits were assessed to ensure consistency between the two data collection periods. These include anterior nasal spine (ANS), inferior nasal aperture (INA), interorbital breadth (IOB), malar tubercle (MT), nasal aperture width (NAW), nasal bone contour (NBC), nasal overgrowth (NO), postbregmatic depression (PBD), posterior zygomatic tubercle (PZT), and zygomaticomaxillary suture (ZS). These ten traits were selected to maximize sample sizes and to evaluate a range of trait-value possibilities.

Paired data were collected for American Black (n = 127) and American White (n = 58) individuals from the Robert J. Terry Collection. The Terry Collection is a large documented skeletal collection housed at the Smithsonian Institution, National Museum of Natural History, in Washington, DC. The collection includes over 1500 American Black and American White individuals of known biological parameters with birth-years ranging from 1822 to 1943 (30). The Terry Collection was chosen to facilitate data comparisons. Macromorphoscopic trait data for the 2002 sample were collected for thesis research using paper forms and then-standard texts, references, and line drawings (Fig. 1). The same data were recollected in 2016 using MMS v1.61 (Fig. 2), specifically to assess intraobserver error and observer bias.

FIG. 1. One example of original paper forms used to score macromorphoscopic traits in 2002.

FIG. 2. Updated MMS v1.61 interface.

Statistical Analyses

The 2002 and 2016 samples were combined into a single data table for analysis. Several approaches to identify intraobserver error were considered. First, we implemented a quadratic-weighted Cohen's kappa test to assess error. This test was chosen over a traditional (unweighted) Cohen's kappa, as the weighted approach is better suited to test the agreement of ordered data (1 < 2 < 3) (31) unlike the traditional test, which does not control for the grade of disagreement.
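As a minimal sketch of the weighted statistic described above (the text does not say which software computed it, so this is not the authors' implementation), quadratic-weighted kappa can be written as κw = 1 − Σ w_ij o_ij / Σ w_ij e_ij, with disagreement weights w_ij = (i − j)²/(k − 1)² for k ordered character states, observed proportions o_ij, and chance-expected proportions e_ij. The function name, the example scores, and the one-step shift used to align a 0 to 4 scale with a 1 to 5 scale are illustrative assumptions; the scores are invented and are not study data.

```python
from collections import Counter


def quadratic_weighted_kappa(first, second, levels):
    """Quadratic-weighted Cohen's kappa for paired ordinal scores.

    first, second: equal-length sequences of scores from the two sessions.
    levels: ordered list of all possible character states, e.g. [1, 2, 3].
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    k = len(levels)
    if k < 2:
        raise ValueError("at least two ordered levels are required")
    position = {level: i for i, level in enumerate(levels)}
    n = len(first)

    # Observed joint counts and the marginal counts for each session.
    observed = Counter((position[a], position[b]) for a, b in zip(first, second))
    row = Counter(position[a] for a in first)
    col = Counter(position[b] for b in second)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the squared distance between states.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * row[i] * col[j] / (n * n)

    if weighted_expected == 0.0:
        # Both sessions used a single, identical state: agreement is complete.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for one trait (NOT data from this study).
scores_2002 = [0, 1, 1, 2, 3, 4, 2, 0]  # recorded on a 0-4 scale
scores_2016 = [1, 2, 3, 3, 4, 5, 3, 2]  # recorded on a 1-5 scale

# Assumed alignment: shift the older scale by one step before comparing.
aligned_2002 = [score + 1 for score in scores_2002]

kappa = quadratic_weighted_kappa(aligned_2002, scores_2016, levels=[1, 2, 3, 4, 5])
print(f"quadratic-weighted kappa = {kappa:.2f}")
```

Because the weights are squared distances, a one-state disagreement is penalized far less than a disagreement spanning the whole scale, which is the "grade of disagreement" an unweighted kappa ignores. Scores recorded on scales that differ by an offset must be aligned first, or every pair would count as a disagreement.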

Following Jamison and Zegura's (28) recommendations for assessing intraobserver error, we also incorporated correlation analyses. A correlation analysis using the corrplot package in R (32) explored rater agreement. Kernel density graphs were created to visualize differences between the two years. Separated by ancestry and data collection year (2002 or 2016), these graphs provide insight into observer error. Finally, classification accuracies from a canonical analysis of the principal coordinates (CAP) using all 10 traits and a reduced model using seven traits with the highest levels of observer agreement were calculated and assessed for statistically significant differences between the 2002 and 2016 samples.

Results

Kernel density estimate plots, by year, are provided in Figures 4 through 8. No obvious data collection errors are noted. The results of the intraobserver error assessments are provided below.

The Cohen's kappa test indicates moderate-to-good observer agreement for seven of the ten traits between the two observation periods. The frequency of agreement between observation periods ranges from a high of 74.05% (PBD) to as low as 28.11% (ZS) (Table 1).

TABLE 1. Results of Cohen's kappa test for all ten macromorphoscopic traits.

Trait  Observed Levels  Kappa  Standard Error  Frequency of Agreements/N  %
NBC    5                0.69   0.04            91                         77.78
NAW    3                0.64   0.06            116                        62.70
IOB    3                0.64   0.05            113                        61.08
INA    5                0.63   0.01            92                         78.63
PBD    2                0.62   0.05            137                        74.05
NO     2                0.58   0.06            136                        73.51
ANS    3                0.49   0.06            78                         68.42
MT     4                0.10   0.29            68                         36.76
ZS     3                0.06   nc              52                         28.11
PZT    4                <0.01  nc              63                         34.05

Bolded values represent moderate-to-good observer agreement.
nc = not calculated.

Three of the ten macromorphoscopic traits had significantly different rates of agreement, indicating relatively high intraobserver error. The three underperforming traits include MT (confidence interval κw = 0.029–0.046; levels = 4); ZS (CI κw = 0.031–0.093; levels = 3); and PZT (CI κw = 0.029–0.046; levels = 4).

Figure 3 illustrates the correlation coefficients between ANS, INA, IOB, NAW, NO, PBD, and PZT. All intertrait comparisons, except PZT, were positively correlated, indicating generally good observer agreement, or low observer bias. The matrix shows neither a positive nor negative correlation for PZT, suggesting no agreement between the two collection intervals (Fig. 3, left plot). Reassessing the level of agreement (Cohen's kappa) between MT and PZT identified a higher than expected level of agreement, particularly compared to the original calculations (Table 2).

To determine how observer bias and/or observer error affected classification accuracies, two CAP analyses were conducted using the chi-square distance metric (x-validated with substitution). Classification accuracies were assessed between years using all ten traits and, again, using the seven traits with high observer agreement in a reduced model. These accuracies were assessed for significant differences, by year. Despite the observer error noted above, all of the models had high classification accuracies and are similarly distributed. The 2002 sample correctly classified 92.2% of the sample using the seven-variable model and 93.5% of the sample using the ten-variable model.
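The percent-correct values reported for the CAP models (Tables 3 and 4) can be recomputed from the classification counts. The following is a minimal sketch, assuming each table row is the documented group and each column the group assigned by the model (a reading of the table layout that the text does not state explicitly); the function name is illustrative, and the counts are those printed in Table 3.

```python
def percent_correct(table):
    """Return per-group and overall percent correct from classification counts.

    table maps each documented group to a dict of {assigned group: count}.
    """
    per_group = {}
    correct = 0
    total = 0
    for group, assigned in table.items():
        n = sum(assigned.values())
        hits = assigned.get(group, 0)  # individuals assigned to their own group
        per_group[group] = 100.0 * hits / n
        correct += hits
        total += n
    return per_group, 100.0 * correct / total


# Counts as printed in Table 3 (ten-variable model).
ten_var_2002 = {"Black": {"Black": 100, "White": 10},
                "White": {"Black": 2, "White": 43}}
ten_var_2016 = {"Black": {"Black": 103, "White": 15},
                "White": {"Black": 5, "White": 47}}

for year, table in (("2002", ten_var_2002), ("2016", ten_var_2016)):
    per_group, overall = percent_correct(table)
    print(year, {g: round(p, 2) for g, p in per_group.items()}, round(overall, 2))
```

The overall figure is weighted by group size, so it sits closer to the accuracy of the larger American Black sample than to a simple average of the two group accuracies.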
Similarly, the 2016 sample correctly classified 91.1% and 88.2% of the sample using these same models, respectively (Tables 3 and 4).

FIG. 3. Plots illustrating correlation coefficients before and after data reconciliation. Blue values indicate a positive correlation between traits, and red values indicate a negative correlation between traits.

TABLE 2. Updated Cohen's kappa tests for malar tubercle (MT) and posterior zygomatic tubercle (PZT).

Trait  Observed Levels  Kappa  Standard Error  Frequency of Agreement/N  %
MT     4                0.36   0.07            89                        76.07
PZT    4                0.38   0.06            103                       65.19

TABLE 3. Results of the CAP analysis using all ten variables (chi-square distance metric).

Group    Percent Correct  Black  White
(2002)
Black    90.91            100    10
White    95.56            2      43
Total:   92.26            102    53
(2016)
Black    87.29            103    15
White    90.38            5      47
Total:   88.24            108    62

TABLE 4. Results of the CAP analysis using seven variables (chi-square distance metric).

Group    Percent Correct  Black  White
(2002)
Black    92.71            89     7
White    95.35            2      41
Total:   93.53            91     48
(2016)
Black    89.66            104    12
White    94.12            3      48
Total:   91.02            109    60

Discussion

To identify intraobserver error and document potential observer bias, three general patterns (i.e., observer bias and/or intraobserver error) influencing trait scores were recognized. These patterns and data discrepancies include (i) differences attributed to observer experience; (ii) an increase in objectivity through the introduction of new technologies for macromorphoscopic trait scoring; and (iii) the identification of errors inherent in the original methodological design. Each of these is outlined in detail below.

Observer Experience

Trait frequencies for the individual character states differed between the two datasets. These differences are potentially related to observer experience and an increased exposure to a wider range of human variation between the data collection periods. Augmented observer experience likely results from more familiarity with the intricacies of the macromorphoscopic method and with the individual character state expressions.

Some of the differences noted in the kernel density plots (Figs 4–8) illustrate shifts in character state frequencies from extreme expressions (either minimum or maximum scores) to more intermediate expressions. Extreme character expressions were less frequent in the 2016 data for the following traits: ANS, IOB, and NAW (Fig. 4a,b); however, scores for INA included more extreme trait values. All four of these traits are expressed as three or more character states. Comparison of the ANS data highlights this shift; the most frequent expression in 2002 for both American Whites and American Blacks was the minimum character expression (ANS = 1). In the 2016 sample, however, the greatest frequencies corresponded to the intermediate character expression (ANS = 2). Trait frequencies for IOB and NAW were similarly shifted. INA scores were the exception. Within both the American Black and the American White sample, the most frequent expressions shifted
to more extreme character states (i.e., American Black = INA 2 to INA 1; American White sample = INA 4 to INA 5).

FIG. 4. Density plots for traits exhibiting movement toward their middle expression. Dashed lines represent sectioning points used in OSSA.

The distribution of binary traits, dichotomous variables defining presence/absence data, also proved interesting. The kernel density plots for NO illustrate a very low rate of change in trait frequencies between observation periods. Values for PBD show discrepancies (Fig. 5), which we attribute to observer experience. As an observer becomes more familiar with the manifestation of PBD, more subtle depressions are scored as present.

FIG. 5. Density plots for binary traits. Dashed lines represent sectioning points used in OSSA.

A proper understanding of macromorphoscopic trait expression is linked to experience: novice observers should rely heavily on written definitions and images, while more experienced observers will only do so initially or when a unique expression is encountered. Any difficulty distinguishing character state manifestations may be attributed to poorly worded or inadequate definitions or the failure of the method to capture all of a trait's character state expressions. We propose amendments to the MMS method to clarify unclear trait definitions in a forthcoming publication, specifically for IOB and NAW. These amendments suggest scoring these features as a ratio of the relative size of the feature to the entire midfacial region, which Hefner (14) attempted but failed to do.

Observer experience is potentially the most interesting factor influencing observer error and bias. Intermediate values for all ten traits demonstrated a lower proportion of agreement between observation periods; the 2016 sample demonstrated a higher proportion of intermediate scores, suggesting more experienced observers are less likely to assign extreme trait values, at least for the American Black and American White samples used herein. This prompts an interesting question: When scoring IOB and NAW, would you preferentially accept observations provided by a seasoned expert over those provided by a novice? The only valid way to answer this question is validation of both observations in a quantifiable manner. Future research should identify methods to quantify observer experience and assess its true role in the collection of macromorphoscopic trait data. Observer experience seemingly plays a role when scores are binary, and the identification of modest expressions is fundamental, such as when scoring PBD.

New and Improved Technologies

Since Hefner (14,33,34) first introduced the MMS method, adjustments have been made to the scaling system (i.e., character states) through the institution of new technologies, such as a contour gauge and necessary minor adjustments in scoring strategies. These modifications may impact trait frequencies, particularly if errors occur during data maintenance (e.g., database management). Kernel density plots identified a scaling difference for ZS (Fig. 6) and the impact of a contour gauge on scores for NBC (Fig. 7). In 2002, the scoring scale for ZS ranged from 0 to 3; by 2016, the scale excluded a value for unobservable and decreased in range to only include values between 0 and 2. Issues like this one may indicate that some of the data from 2002 should not be included in the final, released version of the MaMD. Two issues were also identified in the NBC density graph. First, using a contour gauge produced well-defined, although relatively evenly distributed, clusters, indicating greater objectivity in NBC scores. Second, the 2002 trait score frequencies show data smoothing without well-defined character state boundaries. This issue is less easily remedied. During the development of the OSSA method (15), data for NBC were inadvertently transformed (optimized) in the main data file. As such, these data no longer represent actual observations and therefore cannot be included in the MaMD.

FIG. 6. Density plots illustrating differences in scales.

FIG. 7. Density plot illustrating the improvement of scores with the introduction of new technologies. Dashed lines represent sectioning points used in OSSA.

FIG. 8. Density plots for Posterior Zygomatic Tubercle (PZT) and Malar Tubercle (MT).

Error Inherent in the Method

Hefner's (14) original study shows moderate-to-high intraobserver agreement between all traits and suggests traits can be scored with repeatability, although that study did identify supranasal suture (SPS) and ANS as the traits with the lowest level of repeatability. We did not test intraobserver error on SPS in the current study, as Hefner (14) suggested eliminating SPS from consideration until a better definition is provided. However, we were able to identify several problematic traits. In an attempt to understand the observed bias or error in PZT scores between 2002 and 2016,
we reexamined Hefner's (33) thesis, where the majority of these traits were first clarified. We noted inconsistencies therein between the current definitions for PZT and MT and a striking correspondence to older definitions of traits coded with similar three-letter abbreviations. In his thesis, Hefner used the abbreviation MT for the marginal tubercle, which is now identified as the posterior zygomatic tubercle (PZT); ZMT originally identified the zygomatic tubercle, but that trait is now referred to as the malar tubercle (MT). Further investigation of the frequency table data in that thesis identified the inadvertent interchange between these two variables. Exchanging the 2002 data between MT and PZT corrected the issues we identified above in the intertrait correlations, which now demonstrate a significant correlation (Fig. 3, right matrix). Jamison and Zegura (28) noted the importance of testing intraobserver error to identify such issues in a dataset. Once the data were exchanged, density plots for trait frequencies of MT and PZT (Fig. 8) show no significant differences. In fact, PZT frequencies in the 2002 and 2016 data for American Whites are almost identical. Trait frequencies for MT do shift to lower values in the American White sample, but score frequencies for MT remain similar among the American Black sample for both data collection periods.

Results from this study may explain some of the error identified by other practitioners. For example, Klales and Kenyhercz (29) compared their results to Hefner's original study (14) and suggest sectioning points between groups similar to Hefner and Ousley's OSSA (15), with the exception of NBC. Klales and Kenyhercz (29) also report lower frequencies of extreme scores for ANS, INA, NAW, and PBD, but a higher frequency of more extreme scores reported for IOB and transverse palatine suture (not included in the current study). They suggest these differences may reflect temporal differences between the two sample populations; however, the differences between the Terry and Todd collection could also reflect errors in the original data collection protocol from 2002 and greater experience with macromorphoscopic traits.

Conclusions

Quantifying short- and long-term intraobserver error is important for validating more broadly used methods. Our objectives were to identify causes of intraobserver error in macromorphoscopic trait data collection and to determine whether data collected in 2002 are reliable and appropriate for use as reference data in the MaMD.

Intraobserver error tests, in this case, identified mistakes with trait nomenclature that occurred early on in the method's introduction and subsequent development. This error was not noted earlier, because frequency tables between the two traits were not significantly different. This mistake could potentially impact validation studies and interobserver error tests. At times, these traits were identified as inadequate for ancestry assessments. In point of fact, the opposite may be true. Statistical reanalysis of PZT and MT could clarify this predicament.

Overall, error between traits in this study was negligible, thus confirming that most of the 2002 data [ANS, INA, IOB, MT, NAW, NO, PBD, PZT, and ZS] are suitable for inclusion in the MaMD. The 2002 data for NBC and SPS will not be included in the reference databank.

This study set out to evaluate and quantify long-term intraobserver error on macromorphoscopic trait analysis. Our results suggest familiarity and training will decrease observer error and support Klales and Kenyhercz's (29) recommendation of a training period prior to macromorphoscopic trait data collection. As new technologies supplant older approaches, further reduction in observer error is possible.

Acknowledgments

The authors would like to thank D. Hunt and C.J. Dudar from the Smithsonian Institution, National Museum of Natural History for access to the collections, and M. Ancern for special considerations. We would also like to thank Kandus C. Linde for assistance with the collection of the 2016 data. The authors would also like to thank the two anonymous reviewers, whose comments and insights strengthened the manuscript.

References

1. Ousley SD, Jantz RL. Fordisc 3 and statistical methods for estimating sex and ancestry. In: Dirkmaat DC, editor. A companion to forensic anthropology. Oxford, U.K.: Wiley-Blackwell, 2012;311–29.
2. Konigsberg L, Algee-Hewitt B, Steadman DW. Estimation and evidence in forensic anthropology: sex and race. Am J Phys Anthropol 2009;139:77–90.
3. Dudzik B, Kolatorowicz A. Craniometric data analysis and estimation of biodistance. In: Pilloud M, Hefner J, editors. Biological distance analysis: forensic and bioarchaeological perspectives. New York, NY: Elsevier, 2016;35–60.
4. Spradley MK. Ancestry estimation from the postcranial skeleton. In: Berg G, Ta'ala S, editors. Biological affinity in forensic identification of human skeletal remains: beyond black and white. Boca Raton, FL: CRC Press, 2015;83–94.
5. Pilloud M, Kenyhercz MW. Dental metrics in biodistance analysis. In: Pilloud M, Hefner JT, editors. Biological distance analysis: forensic and bioarchaeological perspectives. New York, NY: Elsevier, 2016;135–55.
6. Harris E, Foster C. Size matters: discrimination between American blacks and whites, males and females, using tooth crown dimensions. In: Berg G, Ta'ala S, editors. Biological affinity in forensic identification of human skeletal remains: beyond black and white. Boca Raton, FL: CRC Press, 2015;209–38.
7. Pilloud M, Hefner JT, Hanihara T, Hayashi A. The use of tooth crown measurements in the assessment of ancestry. J Forensic Sci 2014;59:1493–501.
8. Lease L, Sciulli P. Brief communication: discrimination between European-American and African-American children based on deciduous dental metrics and morphology. Am J Phys Anthropol 2005;126:56–60.
9. Pilloud M, Edgar H, George RW, Scott G. Dental morphology in biodistance analysis. In: Pilloud M, Hefner JT, editors. Biological distance analysis: forensic and bioarchaeological perspectives. New York, NY: Elsevier, 2016;109–33.
10. Edgar H. Prediction of race using characteristics of dental morphology. J Forensic Sci 2005;50:269–73.
11. Edgar H. Estimation of ancestry using dental morphological characteristics. J Forensic Sci 2013;58:S3–8.
12. Scott G. Population variation of Carabelli's trait. Hum Biol 1980;52:63–78.
13. Gill G, Rhine S. Skeletal attribution of race: methods for forensic anthropology. Albuquerque, NM: Anthropological Papers of the Maxwell Museum of Anthropology, 1990.
14. Hefner JT. Cranial nonmetric variation and estimating ancestry. J Forensic Sci 2009;54:985–95.
15. Hefner JT, Ousley S. Statistical classification methods for estimating ancestry using morphoscopic traits. J Forensic Sci 2014;59:883–90.
16. Hefner JT. Biological distance analysis, cranial morphoscopic traits, and ancestry assessment in forensic anthropology. In: Pilloud M, Hefner JT, editors. Biological distance analysis: forensic and bioarchaeological perspectives. New York, NY: Elsevier, 2016;301–15.
17. Brues AM. The once and future diagnosis of race. Skeletal attribution of race: methods for forensic anthropology. Albuquerque, NM: Maxwell Museum of Anthropology, 1990;1–9.
18. Plemons AM, Hefner JT. Ancestry estimation using macromorphoscopic traits. Acad Forensic Pathol 2016;6:400–12.
19. L'Abbe E, Rooyen C, Nawrocki S, Becker P. An evaluation of nonmetric cranial traits used to estimate ancestry in a South African sample. Forensic Sci Int 2011;209:E1–7.
20. Hurst C. Morphoscopic trait expressions used to identify Southwest Hispanics. J Forensic Sci 2012;57:859–65.
21. Kenyhercz MW, Klales AR, Rainwater CW, Fredette SM. The optimized summed scored attributes method for the classification of U.S. Blacks and Whites: a validation study. J Forensic Sci 2017;62:174–80.
22. Monsalve T, Hefner JT. Macromorphoscopic trait expression in a cranial sample from Medellin, Colombia. Forensic Sci Int 2016;266:574.
23. Birkby W, Fenton T, Anderson B. Identifying southwest Hispanics using nonmetric traits and the cultural profile. J Forensic Sci 2008;53:29–33.
24. Utermohle C, Zegura S. Intra and interobserver error in craniometry: a cautionary tale. Am J Phys Anthropol 1982;57:303–10.
25. Utermohle C, Zegura S, Heathcote G. Multiple observers, humidity, and choice of precision statistics: factors influencing craniometric data quality. Am J Phys Anthropol 1983;61:85–95.
26. Yezerinac S, Lougheed S, Handford P. Measurement error and morphometric studies: statistical power and observer experience. System Biol 1992;41:471–82.
27. Perini T, Oliveira GL, Ornellas JS, Oliveira FP. Technical error of measurement in anthropometry. Revista Brasileira de Medicina do Esporte 2005;11:81–5.
28. Jamison P, Zegura S. A univariate and multivariate examination of measurement error in anthropometry. Am J Phys Anthropol 1974;40:197–203.
29. Klales AR, Kenyhercz MW. Morphological assessment of ancestry using cranial macromorphoscopics. J Forensic Sci 2014;60:13–20.
30. Usher B. Reference samples: the first step in linking biology and age in the human skeleton. In: Hoppa R, Vaupel J, editors. Paleodemography: age distributions from skeletal samples. Cambridge, U.K.: Cambridge University Press, 2002;29–47.
31. Cohen J. A coefficient of agreement for nominal scales. Edu Psychol Meas 1960;20:37–46.
32. Wei T. Corrplot: visualization of a correlation matrix, in R package. Vienna, Austria: R Foundation for Statistical Computing, 2013.
33. Hefner JT. Assessing nonmetric cranial traits currently used in forensic determination of ancestry [master's thesis]. Gainesville, FL: The University of Florida, 2003.
34. Hefner JT. The statistical determination of ancestry using cranial nonmetric traits [doctoral dissertation]. Gainesville, FL: The University of Florida, 2007.

Additional information and reprint requests:
Kelly R. Kamnikar, M.A.
Department of Anthropology
Michigan State University
655 Auditorium Drive
East Lansing
MI 48824
E-mail: kellykamnikar@gmail.com
