Applied Neuropsychology: Adult

ISSN: 2327-9095 (Print) 2327-9109 (Online) Journal homepage: http://www.tandfonline.com/loi/hapn21

Utility of critical items within the Recognition Memory Test and Word Choice Test

Laszlo A. Erdodi, Bradley T. Tyson, Christopher A. Abeare, Brandon G. Zuccato, Jaspreet K. Rai, Kristian R. Seke, Sanya Sagar & Robert M. Roth

To cite this article: Laszlo A. Erdodi, Bradley T. Tyson, Christopher A. Abeare, Brandon G.
Zuccato, Jaspreet K. Rai, Kristian R. Seke, Sanya Sagar & Robert M. Roth (2017): Utility of critical
items within the Recognition Memory Test and Word Choice Test, Applied Neuropsychology: Adult

To link to this article: http://dx.doi.org/10.1080/23279095.2017.1298600

Published online: 17 Mar 2017.

APPLIED NEUROPSYCHOLOGY: ADULT
http://dx.doi.org/10.1080/23279095.2017.1298600

Utility of critical items within the Recognition Memory Test and Word Choice Test

Laszlo A. Erdodi^a,b, Bradley T. Tyson^c,b, Christopher A. Abeare^a, Brandon G. Zuccato^a, Jaspreet K. Rai^a, Kristian R. Seke^a, Sanya Sagar^a and Robert M. Roth^b

^a Department of Psychology, University of Windsor, Windsor, Ontario, Canada; ^b Department of Psychiatry, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA; ^c Western Washington Medical Group, Everett, Washington, USA

ABSTRACT
This study was designed to examine the clinical utility of critical items within the Recognition Memory Test (RMT) and the Word Choice Test (WCT). Archival data were collected from a mixed clinical sample of 202 patients clinically referred for neuropsychological testing (54.5% male; mean age = 45.3 years; mean level of education = 13.9 years). The credibility of a given response set was psychometrically defined using three separate composite measures, each of which was based on multiple independent performance validity indicators. Critical items improved the classification accuracy of both tests. They increased sensitivity by correctly identifying an additional 2–17% of the invalid response sets that passed the traditional cutoffs based on total score. They also increased specificity by providing additional evidence of noncredible performance in response sets that failed the total score cutoff. The combination of failing the traditional cutoff, but passing critical items was associated with increased risk of misclassifying the response set as invalid. Critical item analysis enhances the diagnostic power of both the RMT and WCT. Given that critical items require no additional test material or administration time, but help reduce both false positive and false negative errors, they represent a versatile, valuable, and time- and cost-effective supplement to performance validity assessment.

KEYWORDS
Critical item analysis; performance validity testing; Recognition Memory Test; Word Choice Test

Introduction

The current climate of health care is characterized by increasing emphasis on time- and cost-effective service delivery. As a result, neuropsychologists are under growing pressure to administer shorter test batteries. In order to maximize the quantity and quality of information gleaned from these brief assessments, the strategic selection of assessment tools has never been more important. This shift toward a more resource-conscious model of assessment is reflected in the development of abbreviated batteries (e.g., Repeatable Battery for the Assessment of Neuropsychological Status; Randolph, 1998) and shorter versions of existing tests (e.g., Boston Naming Test-15: Morris et al., 1989; California Verbal Learning Test Second Edition [CVLT-II] Short Form: Delis, Kramer, Kaplan, & Ober, 2000).

In addition to conducting an adequate assessment of cognitive functioning, however, neuropsychologists must also assess performance validity. Indeed, the clinical utility of neuropsychological testing depends on the examinee's ability and willingness to demonstrate their true ability level (Bigler, 2015), and there is an emerging consensus in the field that an objective evaluation of performance validity must be an integral part of the assessment process (Bush, Heilbronner, & Ruff, 2014; Chafetz et al., 2015). The administration of multiple, non-redundant performance validity tests (PVTs) distributed throughout the assessment has been identified as the best approach for differentiating credible from non-credible response sets (Boone, 2009; Larrabee, 2012).

Given that PVTs provide little information about cognitive functioning, it is becoming increasingly important for neuropsychologists to glean information about performance validity without additional test material or increased administration and scoring time. Over the past decades, researchers have explored creative ways of improving the signal detection properties of existing PVTs, including the development of new indicators within existing neuropsychological tests (Arnold et al., 2005; Erdodi, Tyson, Abeare, et al., 2016; Greiffenstein, Baker, & Gola, 1994).

Boone, Salazar, Lu, Warner-Chacon, and Razani (2002) developed a recognition trial to the Rey 15-item test that adds only about 30 seconds in administration

CONTACT Laszlo A. Erdodi lerdodi@gmail.com 168 Chrysler Hall South, 401 Sunset Ave, Windsor ON N9B 3P4.
© 2017 Taylor & Francis Group, LLC

time, but significantly improves the instrument's sensitivity while maintaining high specificity. Similarly, the first trial of the Test of Memory Malingering (TOMM; Tombaugh, 1996), although initially conceived as an inactive learning trial, has been shown to effectively discriminate valid from invalid cognitive test performance. Specifically, Trial 1 has been found to have adequate sensitivity and specificity against the standard, full administration of the TOMM (Bauer, O'Bryant, Lynch, McCaffrey, & Fisher, 2007; Fazio, Denning, & Denney, 2017; Hilsabeck, Gordon, Hietpas-Wilson, & Zartman, 2011; Horner, Bedwell, & Duong, 2006; Wisdom, Brown, Chen, & Collins, 2012), and other stand-alone PVTs used in isolation (Denning, 2012) and in combination (Jones, 2013; Kulas, Axelrod, & Rinaldi, 2014). Based on this evidence, some researchers suggested that Trial 1 of the TOMM can function as a stand-alone PVT (Bauer et al., 2007; Hilsabeck et al., 2011; Horner et al., 2006; O'Bryant, Engel, Kleiner, Vasterling, & Black, 2007).

Another example of this after-market enhancement of an existing PVT was the introduction of a time-cutoff to the Recognition Memory Test (RMT; Warrington, 1984), which effectively differentiated valid and invalid responders independent of the traditional accuracy score. It also boosted the RMT's overall sensitivity when combined with the accuracy score while maintaining high specificity (M. S. Kim, Boone, Victor, Marion, et al., 2010). Similarly, it was recently shown that adding a time-cutoff to the Word Choice Test (WCT; Pearson, 2009) not only enhanced the sensitivity of the accuracy score, but also functioned as an independent validity indicator (Erdodi, Tyson, Shahein, et al., 2017).

The present study was designed to explore the clinical utility of critical items within the RMT and WCT. Previous research suggests that while the RMT is more difficult than the WCT at the raw score level, once the cutoffs are adjusted to account for the difference, the two instruments have comparable classification accuracy (Davis, 2014; Erdodi, Kirsch, Lajiness-O'Neill, Vingilis, & Medoff, 2014). However, despite the imperfect classification accuracy of traditional cutoffs based on RMT and WCT total scores, the discriminant power of item-level data has not been investigated within these tests.

Cutoffs established by earlier studies (RMT ≤39: Iverson & Franzen, 1994; RMT ≤42: M. S. Kim, Boone, Victor, Marion, et al., 2010; Erdodi, Kirsch, et al., 2014; WCT ≤42: Barhon, Batchelor, Meares, Chekaluk, & Shores, 2015; WCT ≤46: Davis, 2014) and the technical manual (WCT 32–47; Pearson, 2009) imply that one can provide a correct answer on the majority of the items and still fail these PVTs. Moreover, the upper limit of theoretical chance level responding is 32. In other words, random responding and a 64% overall accuracy can coexist, suggesting that a large proportion of test items have poor negative predictive power.

Since test items tend to vary in difficulty level and hence, in their relative contribution to the diagnostic accuracy of the overall scale, a critical item analysis has the potential to increase the clinical utility of the instrument by identifying items that best discriminate between credible and noncredible response sets. It has long been recognized in psychometric theory that shorter tests can be more reliable than longer tests if they are based on carefully calibrated items (Embretson, 1996). Although averaging performance across a large number of item responses with heterogeneous item characteristic curves is common practice in test development, it can weaken the measurement model. Conversely, reducing the number of test items to a select few that have the strongest relationship with the target construct can preserve (Bilker, Wierzbicki, Brensinger, Gur, & Gur, 2014) or even improve (Erdodi, Jongsma, & Issa, 2017) overall diagnostic power. Therefore, we hypothesized that critical items would enhance the overall classification accuracy of the RMT and WCT by increasing either the sensitivity or the specificity of the total score to invalid responding.

Cutoff scores based on such critical items can offer additional information about performance validity that is non-redundant with results obtained from cutoffs based on the total score. This second opinion, in turn, can be used to confirm or challenge the outcome based on traditional cutoffs. The availability of multiple indicators of performance validity within a single PVT is especially useful in the interpretation of scores that fall in the indeterminate range ("near passes"; Bigler, 2012, 2015), where the classification of an examinee's performance as either Pass or Fail is particularly difficult.

Method

Participants

The sample consisted of 202 patients (54.5% female, 87.1% right-handed) clinically referred for neuropsychological testing at a northeastern academic medical center. Mean age was 45.3 years (SD = 16.8), while mean level of education was 13.9 years (SD = 2.7). The most common diagnostic categories were psychiatric (44.1%), traumatic brain injury (37.1%), mixed neurological (15.8%), or general medical (3%) conditions. Overall, patients reported a mild level of depression (M_BDI-II = 16.9, SD_BDI-II = 12.0) and anxiety (M_BAI = 13.1, SD_BAI = 10.0).
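The upper limit of theoretical chance-level responding cited in the Introduction (32 of 50 items, i.e., 64% accuracy) can be reproduced from the binomial distribution. A minimal sketch, assuming the 50-item two-alternative forced-choice format of the RMT and WCT and a conventional z = 1.96 upper bound (both assumptions ours; the article does not show this derivation):

```python
from math import sqrt

def chance_upper_limit(n_items: int, p: float = 0.5, z: float = 1.96) -> int:
    """Highest raw score still consistent with random guessing:
    binomial mean plus z standard deviations, rounded."""
    mean = n_items * p
    sd = sqrt(n_items * p * (1 - p))  # sqrt(n * p * (1 - p))
    return round(mean + z * sd)

# 50 two-alternative items answered by guessing (p = .5)
limit = chance_upper_limit(50)  # 25 + 1.96 * 3.54 ≈ 31.9 → 32
print(limit, f"= {limit / 50:.0%} accuracy")
```

With 50 items this yields 32 (64%), matching the figure given in the Introduction.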

Table 1. List of tests administered.


Test name Abbreviation Norms %ADM
Beck Anxiety Inventory BAI 60.9
Beck Depression Inventory, 2nd edition BDI-II 89.1
California Verbal Leaning Test, 2nd edition CVLT-II Manual 99.0
Complex Ideational Material CIM Heaton 44.6
Conners Continuous Performance Test, 2nd edition CPT-II Manual 62.3
Letter and Category Fluency Test FAS & animals Heaton 91.1
Finger Tapping Test FTT Heaton 52.5
Recognition Memory Test RMT 100.0
Rey Complex Figure Test RCFT Manual 90.1
Wechsler Adult Intelligence Scale, 4th edition WAIS-IV Manual 98.0
Wechsler Memory Scale, 4th edition WMS-IV Manual 97.0
Wide Range Achievement Test, 4th edition WRAT-4 Manual 70.8
Wisconsin Card Sorting Test WCST Manual 88.1
Word Choice Test WCT 100.0
Note. Heaton: Demographically adjusted norms published by Heaton, Miller, Taylor, and Grant (2004); Manual: Normative data published in the technical manual; %ADM: Percentage of the sample to which each test was administered.

Materials

A core battery of neuropsychological tests was administered to the majority of the sample (Table 1). However, the exact test list varied based on the unique assessment needs of individual patients. The main criterion PVT was a composite of eleven independent validity indicators labeled Validity Index Eleven (VI-11). The VI-11 reflects the traditional approach of counting the number of PVT failures along dichotomized (Pass/Fail) cutoffs (Boone et al., 2002; M. S. Kim, Boone, Victor, Marion, et al., 2010; Nelson et al., 2003), a well-established practice that represents the conceptual foundation of performance validity assessment (Boone, 2013; Larrabee, 2012).

Some components of the VI-11 had multiple different indicators (Table 2). Failing any of these was counted as an overall Fail (1). Failing multiple indicators within the same component did not change the outcome (1). Missing scores were counted as Pass (0). The heterogeneity in stimulus properties, testing paradigm, sensory modality, and number of indicators contributing to the final outcome (i.e., valid vs. invalid) in each of the constituent PVTs likely results in a non-linear combination of the cumulative evidence on the credibility of the overall neurocognitive profile. However, such method variance is a ubiquitous feature in performance validity research, and is generally considered more of a strength than a weakness (Boone, 2007; Iverson & Binder, 2000; Larrabee, 2003, 2014; Lichtenstein, Erdodi, & Linnea, 2017).

The total value of the VI-11 was computed by summing its components. A VI-11 ≤1 was considered

Table 2. Base rates of failure for VI-11 components, cutoffs, and references for each indicator.
Test BRFail Indicator Cutoff Reference
Animals 18.3 T-score ≤33 Hayward, Hall, Hunt, and Zubrick (1987); Sugarman and Axelrod (2015)
CIM 7.9 Raw score ≤9 Erdodi and Roth (2016); Erdodi, Tyson, Abeare, et al. (2016)
  T-score ≤29 Erdodi and Roth (2016); Erdodi, Tyson, Abeare, et al. (2016)
CVLT-II 16.8 HitsRecognition ≤10 Greve, Curtis, Bianchini, and Ord (2009); Wolfe et al. (2010)
  FCR ≤15 Bauer et al. (2007); D. Delis (personal communication, May 2012)
Digit Span 29.2 RDS ≤7 Greiffenstein et al. (1994); Pearson (2009)
  ACSS ≤6 Axelrod, Fichteberg, Millis, and Wertheimer (2006); Spencer et al. (2013); Trueblood (1994)
  LDF ≤4 Heinly, Greve, Bianchini, Love, and Brennan (2005)
FAS 12.4 T-score ≤33 Curtis, Thompson, Greve, and Bianchini (2008); Sugarman and Axelrod (2015)
Rey-15 12.4 Recall ≤9 Lezak (1995); Boone et al. (2002)
RCFT 34.2 Copy raw ≤26 Lu, Boone, Cozolino, and Mitchell (2003); Reedy et al. (2013)
  3-min raw ≤9.5 Lu et al. (2003); Reedy et al. (2013)
  TPRecognition ≤6 Lu et al. (2003); Reedy et al. (2013)
  Atyp RE ≥1 Blaskewitz, Merten, and Brockhaus (2009); Lu et al. (2003)
Symbol Search 20.8 ACSS ≤6 Etherton, Bianchini, Heinly, and Greve (2006); Erdodi, Abeare, et al. (2017)
WCST 17.3 FMS ≥2 Larrabee (2003); Suhr and Boyer (1999)
  LRE >1.9 Greve, Bianchini, Mathias, Houston, and Crouch (2002); Suhr and Boyer (1999)
WMS-IV LM 19.8 I ACSS ≤3 Bortnik et al. (2010)
  II ACSS ≤4 Bortnik et al. (2010)
  Recognition ≤20 Bortnik et al. (2010); Pearson (2009)
WMS-IV VR 19.3 Recognition ≤4 Pearson (2009)
Note. BRFail: Base rate of failure (% of the sample that failed one or more indicators within the test); CIM: Complex Ideational Material; CVLT-II: California Verbal Learning Test, 2nd edition; FCR: Forced choice recognition; RDS: Reliable digit span; ACSS: Age-corrected scaled score; LDF: Longest digit span forward; RCFT: Rey Complex Figure Test; TPRecognition: Recognition true positives; Atyp RE: Atypical recognition errors; WCST: Wisconsin Card Sorting Test; FMS: Failure to maintain set; LRE: Logistic regression equation; WMS-IV: Wechsler Memory Scale, 4th edition; LM: Logical Memory; VR: Visual Reproduction.

a Pass. Given that the most liberal cutoff available was applied to a relatively high number of constituent PVTs, the model is optimized for sensitivity by design. Therefore, to protect against false positive errors, a higher threshold (≥3) was used to define Fail on the VI-11. A score of two was considered inconclusive and hence, excluded from further analyses involving the VI-11 to preserve the diagnostic purity of the criterion groups (Erdodi & Roth, 2016; Greve & Bianchini, 2004; Lichtenstein, Erdodi, Rai, Mazur-Mosiewicz, & Flaro, 2016; Sugarman & Axelrod, 2015).

As an aggregate measure of several PVTs representing a wide range of sensory modalities and testing paradigms, the VI-11 is a representative measure of performance validity that incorporates information from multiple independent instruments. At the same time, it is a heterogeneous composite that may introduce a source of error into the signal detection analyses. To address that, two new composite measures were developed, labeled Erdodi Index. The first one was constructed by aggregating five forced-choice recognition based PVTs (EI-5REC), and the second one by aggregating five processing speed based PVTs (EI-5PSP), following the methodology described by Erdodi and colleagues (Erdodi, Pelletier, & Roth, 2016; Erdodi, Roth, et al., 2014).

The two versions of the EI-5 were designed to mirror the dual nature of the RMT and WCT: the overall recognition accuracy score (EI-5REC) and the time taken to complete the recognition trial (EI-5PSP). As such, they serve as modality-specific criterion measures, providing a more nuanced analysis of the RMT's and WCT's classification accuracy. Previous research found that the inherent signal detection properties of the reference PVT alter the classification accuracy of the instrument under investigation (Erdodi, Tyson, Shahein, et al., 2017), arguing for methodological pluralism in calibrating new tests (Erdodi, Abeare, et al., 2017).

In addition, the EI-5s have the advantage of capturing the underlying continuity in performance validity by differentiating between near passes (Bigler, 2012, 2015) and extreme forms of failure (Table 3). Two-thirds of the sample obtained values ≤1 on both versions of the EI-5, placing them in the passing range. Around 20% of the sample obtained EI-5 values of two or three, indicating either a single failure at the most conservative cutoff or multiple failures at more liberal cutoffs. Regardless of the specific combination, this range of performance starts to raise doubts about the credibility of the profile, without providing evidence that is strong enough to render the entire data set invalid. Therefore, this range was labeled as Borderline, and excluded from calculating classification accuracy. An EI-5 value ≥4, however, indicates either multiple failures at the most liberal cutoffs, or at least two failures at more conservative cutoffs. As such, this range of performance provides sufficient evidence to confidently classify the profile as invalid.

Procedure

All tests were administered and scored by trained staff psychometricians, pre-doctoral interns or post-doctoral fellows under the supervision of licensed psychologists with specialty training in neuropsychology, following standard instructions. The RMT and WCT were administered in counterbalanced order, either at the beginning or the end of the test battery. The study

Table 3. Components of the EI-5s with different levels of cutoff scores and corresponding base rates of failure.

EI-5REC
Components 0 1 2 3
FCR_CVLT-II 16 15 14 ≤13
 BR 85.6 4.6 3.1 6.7
LM_WMS-IV Recognition >20 18–20 17 ≤16
 BR 84.7 8.9 2.5 4.0
RCFT REC-TP >6 6 4 ≤3
 BR 86.7 6.1 4.4 2.8
VPA_WMS-IV Recognition >35 32–35 28–29 ≤27
 BR 85.1 8.4 4.5 2.0
VR_WMS-IV Recognition >4 4 3 ≤2
 BR 83.2 7.4 4.0 5.4

EI-5PSP
Components 0 1 2 3
Animals T >33 25–33 21–24 ≤20
 BR 81.7 9.9 4.5 4.0
CPT-II #Fail 0 1 2 ≥3
 BR 71.8 12.4 4.5 11.4
FAS T >33 32–33 28–31 ≤27
 BR 86.1 5.9 4.0 4.0
FTT #Fail 0 1 ≥2
 BR 92.7 5.8 1.5
WAIS-IV CD >5 5 4 ≤3
 BR 85.8 4.1 6.1 4.1

Note. EI-5REC: Erdodi Index Five-variable model based on measures of recognition memory; EI-5PSP: Erdodi Index Five-variable model based on measures of processing speed; LM: Logical Memory (Bortnik et al., 2010; Pearson, 2009); VPA: Verbal Paired Associates (Pearson, 2009); VR: Visual Reproduction (Pearson, 2009); FCR_CVLT-II: California Verbal Learning Test, 2nd edition, Forced Choice Recognition (Bauer, Yantz, Ryan, Warned, & McCaffrey, 2005; D. Delis, personal communication, May 2012; Erdodi, Kirsch, et al., 2014; Erdodi, Roth, et al., 2014); RCFT REC-TP: Rey Complex Figure Test recognition true positives (Lu et al., 2003; Reedy et al., 2013); FTT #Fail: Finger Tapping Test, number of scores at ≤35/≤28 dominant hand and ≤66/≤58 combined mean raw scores (Arnold et al., 2005; Axelrod, Meyers, & Davis, 2014); FAS: Letter fluency T-score (Curtis et al., 2008; Sugarman & Axelrod, 2015); Animals: Category fluency T-score (Sugarman & Axelrod, 2015); CPT-II #Fail: Conners Continuous Performance Test, 2nd edition; number of T-scores >70 on Omissions, Hit Reaction Time Standard Error, Variability, and Perseverations (Erdodi, Roth, et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010); WAIS-IV CD: Coding age-corrected scaled score (Etherton et al., 2006; N. Kim, Boone, Victor, Lu, et al., 2010; Trueblood, 1994); BR: Base rate (%).

was approved by the ethics board of the hospital where the data were collected and the university where the project was finalized. Relevant APA ethical guidelines regulating research with human participants were followed throughout the study.

Data analysis

Descriptive statistics (mean, standard deviation, base rates of failure) were reported for relevant variables. The main inferential analyses were one-way analyses of variance (ANOVAs), independent t-tests and Chi-square tests of independence. Effect size estimates were computed using partial eta squared (η²) and Cohen's d. Sensitivity and specificity were calculated using standard formulas (Grimes & Schultz, 2005).

Results

Validating the criterion measures

All ANOVAs using the trichotomized VI-11 (Pass-Borderline-Fail) as the independent variable and the RMT, WCT, EI-5REC and EI-5PSP as dependent variables were statistically significant. Effect size estimates ranged from large (η² = .16) to very large (η² = .36). Although all post hoc contrasts between the Pass and Fail conditions were significant, the Pass vs. Borderline contrast failed to reach significance on the RMT and WCT accuracy scores. Likewise, the Borderline vs. Fail contrast was not significant on the WCT completion time (Table 4).

Similarly, all ANOVAs using the trichotomized EI-5REC (Pass-Borderline-Fail) as the independent variable and the RMT, WCT, VI-11 and EI-5PSP as dependent variables were statistically significant. Effect sizes ranged from .07 (medium) to .41 (very large). All post hoc contrasts were significant, excepting the RMT completion time and EI-5PSP. On these two outcome measures, the Borderline vs. Fail contrast did not reach significance (Table 5).

Finally, although all ANOVAs using the trichotomized EI-5PSP (Pass-Borderline-Fail) as the independent variable, and the RMT, WCT, VI-11 and EI-5REC as dependent variables were statistically significant, effect sizes were noticeably smaller (η² = .06–.26). As before, post hoc contrasts between Borderline vs. Fail conditions were non-significant, with the exception of the VI-11 as the outcome measure (Table 6). These analyses provide empirical support for the three validity composites, as well as the exclusion of the Borderline scores. Patients scoring in this indeterminate range had significantly more evidence of invalid responding than those in the Pass condition. At the same time, they did not demonstrate PVT failures severe enough to be classified as invalid beyond a reasonable doubt. The spiking within-group variability in the Borderline group further substantiates concerns that assigning these patients to either the Pass or the Fail group would inadvertently misclassify a large proportion of this subsample.

Classification accuracy of traditional cutoffs

At the ≤42 cutoff proposed by M. S. Kim, Boone, Victor, Marion, et al. (2010), the RMT had .46 sensitivity at .88 specificity against the VI-11. This is comparable to the .47 sensitivity and .85 specificity against the EI-5PSP. Classification accuracy improved notably against the EI-5REC (.88 sensitivity at .91 specificity).
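The sensitivity and specificity values reported in this and the following sections follow the standard 2 × 2 formulas cited (Grimes & Schultz, 2005). A minimal sketch, with hypothetical cell counts (not the study's actual data):

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of criterion-invalid response sets correctly flagged."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of criterion-valid response sets correctly passed."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 69 invalid and 101 valid profiles
print(round(sensitivity(32, 37), 2))  # 32 of 69 flagged → 0.46
print(round(specificity(89, 12), 2))  # 89 of 101 passed → 0.88
```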

Table 4. Results of one-way ANOVAs on RMT, WCT, EI-5REC, and EI-5PSP scores across VI-11 classification ranges.
VI-11: 0–1 (PASS, n = 101) | 2 (BOR, n = 32) | ≥3 (FAIL, n = 69)
 PASS BOR FAIL F p η² Significant post hocs
RMTAccuracy M 46.9 44.1 39.9 21.0 <.001 .17 PASS vs. FAIL; BOR vs. FAIL
 SD 3.8 9.0 9.1
RMTTime M 132.9 154.9 194.2 17.3 <.001 .15 PASS vs. FAIL; PASS vs. BOR
 SD 62.8 56.1 75.7
WCTAccuracy M 49.0 46.5 44.0 18.7 <.001 .16 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 1.6 7.2 7.3
WCTTime M 110.7 132.6 177.7 18.7 <.001 .16 PASS vs. FAIL; BOR vs. FAIL
 SD 50.1 56.5 94.8
EI-5REC M 0.2 1.2 3.0 55.6 <.001 .36 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 0.6 1.2 2.7
EI-5PSP M 0.6 1.7 2.7 22.3 <.001 .18 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 1.1 1.9 2.9
Note. Post hoc pairwise contrasts were computed using the least significant difference method; VI-11: Validity Index Eleven; BOR: Borderline; η²: Partial eta squared; RMTAccuracy: Recognition Memory Test Words (Accuracy score); RMTTime: Recognition Memory Test Words (Completion time in seconds); WCTAccuracy: Word Choice Test (Accuracy score); WCTTime: Word Choice Test (Completion time in seconds); EI-5REC: Erdodi Index Five-variable model based on measures of recognition memory; EI-5PSP: Erdodi Index Five-variable model based on measures of processing speed.
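The VI-11 grouping in Table 4 follows the scoring rules laid out in the Method section: any indicator failure fails its component, repeated failures within a component do not add, missing scores count as Pass, components are summed, and totals of ≤1 / 2 / ≥3 map to Pass / inconclusive / Fail. A sketch of that logic; the data layout is our illustration, not the study's code:

```python
def score_component(indicator_failed: list) -> int:
    """One VI-11 component: Fail (1) if any indicator within it failed;
    missing indicators (None) count as Pass, per the Method section."""
    return int(any(f is True for f in indicator_failed))

def vi11(components: list) -> int:
    """Sum of the eleven dichotomized component scores."""
    return sum(score_component(c) for c in components)

def classify_vi11(total: int) -> str:
    """Pass at <=1, inconclusive (excluded) at 2, Fail at >=3."""
    if total <= 1:
        return "Pass"
    return "Inconclusive" if total == 2 else "Fail"

# Hypothetical profile: 11 components, two with failures, one with missing data
profile = [[True], [True, None], [None]] + [[False]] * 8
print(vi11(profile), classify_vi11(vi11(profile)))  # 2 → Inconclusive
```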

Table 5. Results of one-way ANOVAs on RMT, WCT, VI-11, and EI-5PSP scores across EI-5REC classification ranges.
EI-5REC: 0–1 (PASS, n = 138) | 2–3 (BOR, n = 41) | ≥4 (FAIL, n = 23)
 PASS BOR FAIL F p η² Significant post hocs
RMTAccuracy M 46.8 42.1 32.4 56.50 <.001 .36 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 4.1 9.5 8.3
RMTTime M 141.0 184.1 209.8 13.80 <.001 .12 PASS vs. BOR; PASS vs. FAIL
 SD 67.5 62.5 76.6
WCTAccuracy M 48.9 45.5 37.4 68.40 <.001 .41 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 1.8 6.8 8.6
WCTTime M 114.4 165.6 225.3 31.20 <.001 .24 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 50.5 67.4 125.3
VI-11 M 1.2 3.6 4.6 73.00 <.001 .42 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 1.5 1.7 1.6
EI-5PSP M 1.1 2.4 2.1 7.32 <.001 .07 PASS vs. BOR; PASS vs. FAIL
 SD 1.7 3.0 2.3
Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-5REC: Erdodi Index Five-variable model based on measures of recognition memory; BOR: Borderline; η²: Partial eta squared; RMTAccuracy: Recognition Memory Test Words (Accuracy score); RMTTime: Recognition Memory Test Words (Completion time in seconds); WCTAccuracy: Word Choice Test (Accuracy score); WCTTime: Word Choice Test (Completion time in seconds); VI-11: Validity Index Eleven; EI-5PSP: Erdodi Index Five-variable model based on measures of processing speed.
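The EI-5REC and EI-5PSP ranges heading Tables 5 and 6 come from the banded component scoring shown in Table 3: each of five components earns 0–3 points against increasingly conservative cutoffs, and the summed index is trichotomized (≤1 Pass, 2–3 Borderline, ≥4 Fail). A sketch of that aggregation, using the CVLT-II FCR bands from Table 3 as an example (function and variable names are ours):

```python
def ei5_component(score: float, bands: list) -> int:
    """Score one EI-5 component 0-3: bands run from the most liberal to
    the most conservative failing cutoff; a score at or below a band
    earns (at least) that level, as in Table 3."""
    level = 0
    for i, cutoff in enumerate(bands, start=1):
        if score <= cutoff:
            level = i
    return level

def classify_ei5(total: int) -> str:
    """Trichotomize the summed index: <=1 Pass, 2-3 Borderline, >=4 Fail."""
    if total <= 1:
        return "Pass"
    return "Borderline" if total <= 3 else "Fail"

# CVLT-II Forced Choice Recognition bands from Table 3 (15 / 14 / <=13)
fcr_bands = [15, 14, 13]
print(ei5_component(16, fcr_bands))  # above all cutoffs → 0
print(ei5_component(13, fcr_bands))  # at the most conservative cutoff → 3
print(classify_ei5(3 + 1))           # total of 4 → Fail
```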

Previous research suggests that a WCT cutoff of ≤45 corresponds to an RMT ≤42 (Davis, 2014; Erdodi, Kirsch, et al., 2014). This cutoff had .41 sensitivity at .95 specificity against the VI-11, which is similar to the .33 sensitivity and .86 specificity observed against the EI-5PSP. Sensitivity improved against the EI-5REC (.74), while specificity remained essentially the same (.94).

Identifying a pool of critical items

The failure rate on each RMT and WCT item was compared between those who passed and those who failed the criterion PVTs. The items that were retained met the following inclusion criteria: (1) The proportion of correct responses was significantly higher in the valid group compared to the invalid group; (2) The proportion of correct responses in the valid group was at least 15% higher compared to the invalid group; and (3) The item met the first two criteria against all three criterion PVTs. This last restriction was introduced to minimize the effect of instrumentation artifacts and therefore, improve the generalizability of the findings, ensuring that the critical items will perform well against a variety of different criterion PVTs.

Establishing groups of critical items

The seven best items meeting all three criteria were selected for further analyses (critical seven or CR-7) within both tests. Next, a smaller group of five critical items (CR-5) was created by dropping the two CR-7 items with the least discriminant power. Finally, the number of critical items was further reduced to three

Table 6. Results of one-way ANOVAs on RMT, WCT, VI-11, and EI-5REC scores across EI-5PSP classification ranges.
EI-5PSP: 0–1 (PASS, n = 133) | 2–3 (BOR, n = 48) | ≥4 (FAIL, n = 21)
 PASS BOR FAIL F p η² Significant post hocs
RMTAccuracy M 45.8 41.5 39.8 10.10 <.001 .09 PASS vs. BOR; PASS vs. FAIL
 SD 6.2 8.5 10.2
RMTTime M 135.5 193.5 209.2 21.10 <.001 .18 PASS vs. BOR; PASS vs. FAIL
 SD 57.9 80.1 73.3
WCTAccuracy M 47.9 45.4 44.4 6.02 <.01 .06 PASS vs. BOR; PASS vs. FAIL
 SD 3.9 7.3 9.2
WCTTime M 122.5 166.0 160.7 7.35 <.01 .07 PASS vs. BOR; PASS vs. FAIL
 SD 70.3 86.9 54.6
VI-11 M 1.4 2.8 4.5 34.20 <.001 .26 PASS vs. BOR; PASS vs. FAIL; BOR vs. FAIL
 SD 1.6 1.9 2.0
EI-5REC M 1.0 1.8 2.1 3.96 <.05 .04 PASS vs. BOR; PASS vs. FAIL
 SD 2.0 2.4 2.3
Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-5PSP: Erdodi Index Five-variable model based on measures of processing speed; BOR: Borderline; η²: Partial eta squared; RMTAccuracy: Recognition Memory Test Words (Accuracy score); RMTTime: Recognition Memory Test Words (Completion time in seconds); WCTAccuracy: Word Choice Test (Accuracy score); WCTTime: Word Choice Test (Completion time in seconds); VI-11: Validity Index Eleven; EI-5REC: Erdodi Index Five-variable model based on measures of recognition memory.

(CR-3), retaining only those with the highest discriminant power. The value of each subset of critical items reflects the number of incorrect responses (i.e., higher values indicate stronger evidence of invalid performance). Having these three combinations of critical items increases the chances of identifying non-credible responding, as it provides alternative detection strategies. The specific combination of critical items is not disclosed within this manuscript to protect test security and to guard the newly developed diagnostic tool from unauthorized use. However, the information will be provided to qualified clinicians. Interested readers should contact the first author.

Signal detection performance of critical items in the RMT

A CR-7RMT cutoff of ≥3 achieved good specificity (.91–.96) and variable sensitivity (.38–.70). Increasing the cutoff to ≥4 produced the predictable trade-off between improved specificity (.97–1.00) and declining sensitivity (.18–.35). Increasing the cutoff to ≥5 reached the point of diminishing returns, with minimal gains in specificity (.98–1.00) but further deterioration in sensitivity (.09–.24).

A CR-5RMT cutoff of ≥2 produced an acceptable combination of sensitivity (.54) and specificity (.86) against the VI-11, but failed to reach the lower threshold for specificity against the EI-5s. Increasing the cutoff to ≥3 resulted in a marked increase in specificity (.95–.99), but a proportional loss in sensitivity (.27–.39). A further increase to ≥4 sacrificed much of the sensitivity (.10–.22) for negligible gains in specificity (.97–1.00).

A CR-3RMT cutoff of ≥1 failed to reach minimum specificity against any of the criterion PVTs. Therefore, it received no further consideration. However, increasing the cutoff to ≥2 produced good combinations of sensitivity (.33–.43) and specificity (.91–.95). Failing all three of the CR-3RMT items was associated with near-perfect specificity (.97–1.00), but low sensitivity (.10–.19). Further details are displayed in Table 7.

Signal detection performance of critical items in the WCT

A CR-7WCT cutoff of ≥1 produced an acceptable combination of sensitivity (.58) and specificity (.84) against the VI-11, but failed to reach the lower threshold for specificity against the EI-5s. Increasing the cutoff to ≥2 resulted in notable improvement in specificity (.91–.99), with relatively well-preserved, although fluctuating, sensitivity (.34–.70). Further increasing the cutoff to ≥3 achieved minimal gains in specificity (.92–.99), but sacrificed some of the sensitivity (.19–.65).

A CR-5WCT cutoff of ≥1 produced good combinations of sensitivity (.49–.87) and specificity (.84–.95) against all criterion PVTs. Increasing the cutoff to ≥2 produced the predictable trade-off between rising specificity (.95–1.00) and declining sensitivity (.27–.65). Raising the cutoff to ≥3 resulted in consistently high specificity (.95–1.00), but low and fluctuating sensitivity (.14–.48).

Similarly, a CR-3WCT cutoff of ≥1 produced good combinations of sensitivity (.40–.74) and specificity (.88–.95) against all criterion PVTs. Increasing the cutoff to ≥2 sacrificed half of the sensitivity (.22–.57) for small gains in specificity (.95–1.00). As with the RMT, failing all three of the CR-3WCT items was associated with near-perfect specificity (.98–1.00), but low sensitivity (.05–.13). Further details are displayed in Table 8.

Unique contribution of critical items to the classification accuracy of the RMT and WCT

To objectively evaluate the unique contribution of critical items above and beyond traditional cutoffs, we examined the profiles that failed one cutoff, but passed the other, in relation to VI-11 scores. Between 1.8% and

Table 7. Sensitivity and specificity of three combinations of critical items within the RMT across different cutoffs.

                               VI-11            EI-5REC          EI-5PSP
                               (n = 170)        (n = 161)        (n = 154)
                               34.2%            14.3%            13.6%
          Cutoff  BRFail (%)   SENS    SPEC     SENS    SPEC     SENS    SPEC
CR-7RMT     ≥3      17.6        .39     .96      .70     .92      .38     .91
            ≥4       7.0        .18    1.00      .35     .97      .29     .97
            ≥5       4.0        .09    1.00      .13     .98      .24     .98
CR-5RMT     ≥2      28.6        .54     .86      .78     .83      .52     .80
            ≥3      11.1        .27     .99      .39     .95      .38     .95
            ≥4       3.5        .10    1.00      .22     .99      .14     .97
CR-3RMT     ≥1      40.2        .67     .72      .78     .69      .76     .69
            ≥2      15.6        .33     .95      .43     .91      .33     .92
            ≥3       4.5        .10    1.00      .17     .99      .19     .97
Note. VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); EI-5REC: Recognition memory based validity composite (Pass ≤1, Fail ≥4); EI-5PSP: Processing speed based validity composite (Pass ≤1, Fail ≥4); SENS: Sensitivity; SPEC: Specificity; BRFail: Base rate of failure (% of the sample that scored at or above the given cutoff); RMT: Recognition Memory Test – Words; CR: Critical items.
8 L. A. ERDODI ET AL.

Table 8. Sensitivity and specificity of three combinations of critical items within the WCT across different cutoffs.

                               VI-11            EI-5REC          EI-5PSP
                               (n = 170)        (n = 161)        (n = 154)
                               34.2%            14.3%            13.6%
          Cutoff  BRFail (%)   SENS    SPEC     SENS    SPEC     SENS    SPEC
CR-7WCT     ≥1      31.8        .58     .84      .91     .83      .52     .74
            ≥2      14.6        .34     .99      .70     .97      .38     .91
            ≥3      10.6        .25     .99      .65     .98      .19     .92
CR-5WCT     ≥1      22.2        .49     .95      .87     .93      .52     .84
            ≥2      10.6        .27    1.00      .65     .99      .29     .95
            ≥3       6.6        .16    1.00      .48    1.00      .14     .95
CR-3WCT     ≥1      19.2        .40     .95      .74     .93      .43     .88
            ≥2       9.1        .22    1.00      .57     .99      .24     .95
            ≥3       2.0        .05    1.00      .13    1.00      .05     .98
Note. VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); EI-5REC: Recognition memory based validity composite (Pass ≤1, Fail ≥4); EI-5PSP: Processing speed based validity composite (Pass ≤1, Fail ≥4); SENS: Sensitivity; SPEC: Specificity; BRFail: Base rate of failure (% of the sample that scored at or above the given cutoff); WCT: Word Choice Test; CR: Critical items.
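The cutoff sweeps in Tables 7 and 8 amount to a standard signal-detection tabulation: a response set is flagged when its critical-item error count reaches the cutoff, and the flag is scored against a dichotomous criterion PVT. A minimal sketch of that computation (our illustration; the arrays below are hypothetical, not the study data):

```python
import numpy as np

def classification_stats(cr_errors, criterion_invalid, cutoff):
    """Sensitivity, specificity, and base rate of failure for an
    error-count cutoff scored against a criterion PVT (True = invalid)."""
    cr_errors = np.asarray(cr_errors)
    criterion_invalid = np.asarray(criterion_invalid, dtype=bool)
    flagged = cr_errors >= cutoff  # Fail = error count at or above the cutoff
    sens = (flagged & criterion_invalid).sum() / criterion_invalid.sum()
    spec = (~flagged & ~criterion_invalid).sum() / (~criterion_invalid).sum()
    return float(sens), float(spec), float(flagged.mean())

# Hypothetical CR-7 error counts and criterion PVT status for 10 examinees
errors = [0, 1, 3, 5, 0, 2, 2, 0, 1, 6]
invalid = [False, False, True, True, False, False, True, False, False, True]
sens, spec, br_fail = classification_stats(errors, invalid, cutoff=3)
print(sens, spec, br_fail)  # 0.75 1.0 0.3
```

Raising `cutoff` trades sensitivity for specificity, which is exactly the pattern both tables document.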

5.0% of patients who passed the traditional cutoff on the RMT (>42) failed select CRRMT cutoffs (Table 9). Within the WCT, between 5.6% and 17.5% of those who passed the traditional cutoff (>45) failed select CRWCT cutoffs (Table 10). These results suggest that critical items increase the sensitivity of both tests, while maintaining high specificity.

Failing critical items increases the confidence in the decision to classify the response set as invalid even in examinees who already failed the traditional cutoff. Among the subset of patients who failed the traditional cutoff on the RMT (≤42), those who also failed CRRMT had higher VI-11 scores, providing stronger evidence of non-credible presentation. As with the RMT, higher VI-11 scores were observed among patients who failed the CRWCT cutoffs even within the subsample that failed the traditional WCT cutoff (≤45).

Conversely, patients who failed the traditional cutoff, but provided correct answers on the critical items, were examined separately. Everybody who scored ≤42 on the RMT failed at least one critical item. Eight of them produced a CR-7RMT score of 1. Three of these patients were clear false positive errors based on their PVT profile: they passed all three reference PVTs (VI-11, EI-5REC, and EI-5PSP), the most liberal cutoff on the WCT (≤47), and other free-standing PVTs. The remaining five patients will be discussed below.

One of these could be considered a false positive based on the combination of clinical history and neurocognitive profile: a retired physician in his 70s diagnosed with amnestic Mild Cognitive Impairment. He scored in the Borderline range on the VI-11 and EI-5REC, had a WCT score of 47, and 11 unique errors on the WCST. His FSIQ was 118, with a Coding ACSS of 15. His performance on the acquisition trials of the CVLT-II was high average (43/80), but his long-delay free recall was borderline (3/16). He obtained a perfect score on the Rey-15 and the first trial of the Test of Memory Malingering. Thus, his performance is broadly consistent with his diagnosis.

The fifth patient was a 19-year-old woman who scored in the Borderline range on the VI-11, EI-5REC, and EI-5PSP. She failed the WCT (40) and the Test of Memory Malingering (35, 36, 36), produced eight unique errors on the WCST, a Vocabulary minus Digit Span ACSS difference of 6, and a CVLT-II logistic regression equation score (.72) in the failing range (Wolfe et al., 2010). In addition, she produced several combinations of scores that are internally inconsistent. Given the identifiable external incentive to appear impaired (on athletic scholarship, struggling in her classes, seeking an ADHD diagnosis, stimulant medication, and academic accommodations), she met criteria for Malingered

Table 9. VI-11 scores as a function of passing or failing the traditional RMT cutoff or select cutoffs on the critical items.

                       RMT Pass (>42)                      RMT Fail (≤42)
                       VI-11                               VI-11
               n      M      SD      p      d       n      M      SD      p      d
CR-7RMT  <3   144   1.59   1.71   <.05    .78      20   2.50   2.24   <.05    .72
         ≥3     6   3.33   2.66                    29   3.97   1.80
CR-5RMT  <3   147   1.62   1.73   <.05    .79      30   2.97   2.30   <.05    .52
         ≥3     3   3.67   3.22                    19   4.00   1.60
CR-3RMT  <2   142   1.61   1.76    .06    .55      26   3.00   2.28    .10    .24
         ≥2     8   2.63   1.92                    23   3.78   1.83
Note. RMT: Recognition Memory Test (Words); VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); CR: Critical items.

Table 10. VI-11 scores as a function of passing or failing the traditional WCT cutoff or select cutoffs on the critical items.

                       WCT Pass (>45)                      WCT Fail (≤45)
                       VI-11                               VI-11
               n      M      SD      p      d       n      M      SD      p      d
CR-7WCT  <2   132   1.54   1.71   <.05    .48       3   2.00   1.00    .06   1.18
         ≥2    28   2.43   2.01                    35   3.89   2.03
CR-5WCT   0   148   1.53   1.68   <.05   1.14       6   2.33   1.21   <.05    .99
         ≥1    12   3.67   2.06                    32   4.00   2.05
CR-3WCT   0   151   1.59   1.71   <.05    .91       9   3.00   1.80    .11    .38
         ≥1     9   3.44   2.30                    29   3.97   2.06
Note. WCT: Word Choice Test; VI-11: Validity composite based on eleven independent embedded indicators (Pass ≤1, Fail ≥3); CR: Critical items.
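The group comparisons in Tables 9 and 10 report Cohen's d for the difference in mean VI-11 scores between examinees below versus at or above a critical-item cutoff. A short sketch of that effect-size computation using the pooled standard deviation (the group values below are illustrative, not the study data):

```python
import math

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)  # sample variance
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m2 - m1) / pooled_sd

# Hypothetical VI-11 scores grouped by critical-item outcome
passed_cr = [0, 1, 2, 1, 0, 3, 1]  # below the critical-item cutoff
failed_cr = [3, 4, 2, 5, 4]        # at or above the critical-item cutoff
print(round(cohens_d(passed_cr, failed_cr), 2))  # 2.24
```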

Neurocognitive Dysfunction introduced by Slick, Sherman, and Iverson (1999).

The sixth patient was a 47-year-old woman with a history of childhood abuse who scored in the Borderline range on the EI-5REC, failed the VI-11 (3), the WCT (40), and the Dot Counting Test (28.3), in addition to the validity cutoffs in Coding (ACSS: 5) and Symbol Search (ACSS: 4). Her CVLT-II profile was internally inconsistent, with a low average acquisition score of 42/80 and a Forced Choice Recognition score of 11/16 (invalid beyond reasonable doubt). Overall, her profile can be considered invalid, and hence, a true positive.

The last two patients are more difficult to classify. One of them was a 59-year-old woman with a history of incestuous sexual trauma. She passed the EI-5PSP, but failed the WCT (43), the Test of Memory Malingering (34, 47, 49), the EI-5REC (8), and the VI-11 (7). Despite ample evidence of invalid performance, all her PVT failures were limited to memory tests, which was consistent with her self-reported decline of attention and memory. In addition, she performed in the high average range on Coding, letter fluency, and the Stroop test.

The last patient had an identifiable external incentive to appear impaired in combination with a documented history of stroke and brain tumor treated with radiation. She passed the EI-5REC, EI-5PSP, and VI-11, but failed the WCT (47) and the logistic regression equation (.74) by Wolfe et al. (2010). In addition, she performed in the average to high average range on Coding, Logical Memory, animal fluency, and the Trail Making Test.

On the WCT, three patients failed the traditional cutoff (≤45) but provided correct answers on all of the CR-7WCT items. Two of them were clear false positive errors: a 29-year-old man with 17 years of education referred for a mild TBI and a 57-year-old man with 18 years of education referred for Parkinson's disease. The former passed the EI-5REC, EI-5PSP, VI-11, the RMT (49), as well as the Test of Memory Malingering (47, 50, 50), and produced a largely intact neurocognitive profile. The latter scored in the Borderline range on the EI-5REC and VI-11, passed the RMT (43) as well as the Test of Memory Malingering (48, 50, 50), and produced a profile consistent with his diagnosis.

The third patient was a 39-year-old woman diagnosed with Personality Disorder NOS with Borderline Features and fibromyalgia. She passed the EI-5REC, scored in the Borderline range on the EI-5PSP, but failed the VI-11, the RMT (30), the Test of Memory Malingering (33, 42, 39), and the Rey-15 (7). As such, her neurocognitive profile can be confidently classified as invalid. Nevertheless, she demonstrated average to high average performance on tests of memory and processing speed.

Discussion

This study examined the potential of critical items within the RMT and WCT to enhance the diagnostic power of the traditional cutoffs based on total scores in a large sample of patients clinically referred for neuropsychological assessment. Our hypothesis that critical items would improve classification accuracy was supported. Critical items increased the sensitivity of both tests by correctly identifying 2–17% of response sets that passed the traditional cutoffs as invalid. They also increased specificity by providing additional empirical evidence that response sets identified as invalid by the traditional cutoffs were correctly classified as such. In addition, a total score in the failing range combined with correct responses on the critical items was associated with an increased risk of a false positive error, indicating the need for further analysis of the PVT profile. Although a simultaneous increase in both sensitivity and specificity seems paradoxical at face value, this bidirectional improvement in classification accuracy follows the inner logic behind critical item analysis.

The heuristic underlying this method is that the total number of items failed and the specific combination of items failed contribute nonredundant information about performance validity (Bortnik et al., 2010; Killgore & DellaPietra, 2000). If an examinee passed the cutoff based on total score, but failed the cutoff

based on critical items, the new method increased sensitivity by correctly detecting a non-credible examinee that was missed by the traditional method. Conversely, if an examinee just barely failed the most liberal cutoff based on total score, some would interpret this performance as a near-pass (Bigler, 2012), and argue that it actually represents a false positive error (i.e., the instrument has unacceptably low specificity). However, if the examinee in question also failed a certain combination of critical items associated with higher specificity, that pattern of performance would strengthen the evidence that the overall profile is indeed invalid. As such, critical item analysis could effectively increase the specificity of a given diagnostic decision.

Critical items could also serve as a safeguard against false positive errors. If an examinee failed the traditional cutoff based on total score, but provided correct responses on the critical items, the profile may warrant a more in-depth analysis. Our data suggest that at least half of such cases were incorrectly classified as invalid (i.e., they are false positives). In addition, patients with complex psychiatric history were overrepresented among those who were eventually determined to be true positives. If this finding replicates in larger samples, the discrepancy between the total score and critical item cutoff on the RMT and WCT could prove useful in subtyping invalid performance. Since malingering (Slick et al., 1999), a cry for help (Berry et al., 1996), and psychogenic interference (Erdodi, Tyson, Abeare, et al., 2016) differ in etiology, distinguishing among them would enhance diagnostic accuracy as well as improve the clinical management of patients currently lumped together on the failing side of PVTs.

The three levels of critical items (CR-7, CR-5, and CR-3) employ different detection strategies. Essentially, they trade the size of the item pool for the number of item failures required to deem a response set invalid. For example, within the RMT, ≥3 failures on the CR-7 or CR-5 have comparable classification accuracy to ≥2 failures on the CR-3. Likewise, within the WCT, ≥2 failures on the CR-7 or CR-5 have similar classification accuracy to ≥1 failures on the CR-3.

However, critical items in both tests capitalize on the inherent differences among test items regarding their ability to differentiate between credible and non-credible responding. They rescale the original test by eliminating inactive items, and only retain the ones with the highest discriminant power (Embretson, 1996). As such, they provide evaluators with an alternative method of assessing performance validity. Instead of counting how many of the total items the examinee failed, they focus on which items were failed. Therefore, critical item analyses enhance the interpretation of a given score on the RMT or WCT above and beyond the total score by demonstrating that, even within the subset of patients who passed the old cutoff based on all 50 items, there was a significant difference between those who passed and those who failed the cutoff based on critical items. For example, those who scored >42 on the RMT and also passed the CR-7RMT had statistically and clinically significantly lower VI-11 scores (i.e., were more likely to be valid) than those who scored >42 on the RMT, but failed the CR-7RMT.

Depending on the specific combination of item-level scores, this approach can promote superior overall classification accuracy. For example, while an RMT total score of 48 is considered a clear Pass (Iverson & Franzen, 1994; M. S. Kim, Boone, Victor, Marion, et al., 2010), if the two incorrect responses were from the CR-3RMT, that score provides strong evidence (.91–.95 specificity) of non-credible responding. Even if the total score is already below the cutoff, critical items can still enhance the confidence in classifying it as invalid. For example, a WCT total score of 46 is already considered a Fail, with specificity between .87 (Davis, 2014) and .92 (Erdodi, Kirsch, et al., 2014). However, if at least two of the incorrect responses were from the CR-5WCT, specificity increases to .95–1.00.

When compared to the RMT, critical items appear more useful within the WCT. They expand sensitivity further (6–18%) compared to the RMT (2–5%). Also, among patients who failed the traditional cutoff based on total score, passing or failing the critical items had a stronger relationship with the number of PVT failures within the WCT (d: .38–1.18) relative to the RMT (d: .24–.72). This finding adds to the accumulating empirical evidence supporting the clinical utility of the WCT (Barhon et al., 2015; Davis, 2014; Erdodi, Kirsch, et al., 2014; Miller et al., 2011).

This study developed three different sets of critical items within the RMT and WCT that enhance the classification accuracy of both instruments in a clinically meaningful way. Critical item analysis requires no additional test material or administration time, yet provides a time- and cost-effective alternative to evaluate performance validity independent of traditional total score cutoffs. In a sense, critical items re-examine the data and provide a second opinion regarding the clinical classification of a given RMT or WCT response set. Additionally, they provide objective, data-driven information to address the contested issue of near passes (Bigler, 2012, 2015), with clear clinical and forensic implications.

The findings should be interpreted within the context of the study's limitations. The sample was geographically restricted, and diagnostically heterogeneous.
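The bidirectional logic argued in the Discussion can be condensed into a two-stage decision rule: critical items add sensitivity when the total score passes, and add confidence (or flag a possible false positive) when it fails. The sketch below uses the RMT cutoffs cited in the text (total score ≤42 = Fail; CR-7 errors ≥3 = Fail), but the function and its labels are our illustration, not the authors' algorithm:

```python
def classify_rmt(total_score, cr7_errors, total_cutoff=42, cr_cutoff=3):
    """Combine the traditional total-score cutoff with a critical-item
    error-count cutoff, as described in the text (illustrative labels)."""
    total_fail = total_score <= total_cutoff  # traditional cutoff
    cr_fail = cr7_errors >= cr_cutoff         # critical-item cutoff
    if total_fail and cr_fail:
        return "invalid (high confidence)"            # converging evidence
    if not total_fail and cr_fail:
        return "invalid (caught by critical items)"   # sensitivity gain
    if total_fail and cr7_errors == 0:
        return "possible false positive - review profile"
    return "invalid (total score only)" if total_fail else "valid"

# A 'clear Pass' total score of 48 is still flagged when the two
# errors cluster on critical items:
print(classify_rmt(48, 3))  # invalid (caught by critical items)
```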

Future research would benefit from replication using different, more homogenous samples and different reference PVTs. Finally, given that the clinical utility of critical item analysis ultimately depends on the number of incorrect answers that occur on these specific items, the model is still vulnerable to chance. Therefore, the generalizability of the results can only be determined by independent replications of the present study.

Acknowledgments

This project received no financial support from outside funding agencies. Relevant ethical guidelines regulating research involving human participants were followed throughout the project. All data collection, storage, and processing was done in compliance with the Helsinki Declaration. The authors have no disclosures to make that could be interpreted as conflict of interest.

References

Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPhearson, S. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19(1), 105–120. doi:10.1080/13854040490888567
Axelrod, B. N., Fichteberg, N. L., Millis, S. R., & Wertheimer, J. C. (2006). Detecting incomplete effort with digit span from the Wechsler Adult Intelligence Scale – Third Edition. The Clinical Neuropsychologist, 10, 513–523. doi:10.1080/13854040590967117
Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger tapping test performance as a measure of performance validity. The Clinical Neuropsychologist, 28(5), 876–888. doi:10.1080/13854046.2014.907583
Barhon, L. I., Batchelor, J., Meares, S., Chekaluk, E., & Shores, E. A. (2015). A comparison of the degree of effort involved in the TOMM and the ACS word choice test using a dual-task paradigm. Applied Neuropsychology: Adult, 22(2), 114–123.
Bauer, L., O'Bryant, S. E., Lynch, J. K., McCaffrey, R. J., & Fisher, J. M. (2007). Examining the test of memory malingering trial 1 and word memory test immediate recognition as screening tools for insufficient effort. Assessment, 14(3), 215–222. doi:10.1177/1073191106297617
Bauer, L., Yantz, C. L., Ryan, L. M., Warned, D. L., & McCaffrey, R. J. (2005). An examination of the California verbal learning test – II to detect incomplete effort in a traumatic brain injury sample. Applied Neuropsychology, 12(4), 202–207. doi:10.1207/s15324826an1204_3
Berry, D. T. R., Adams, J. J., Clark, C. D., Thacker, S. R., Burger, T. L., Wetter, M. W., Baer, R. A., & Borden, J. W. (1996). Detection of a cry for help on the MMPI-2: An analog investigation. Journal of Personality Assessment, 67(1), 26–36.
Bigler, E. D. (2012). Symptom validity testing, effort and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 632–642. doi:10.1017/s1355617712000252
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9(3), 421–444. doi:10.1007/s11682-015-9409-1
Bilker, W. B., Wierzbicki, M. R., Brensinger, C. M., Gur, R. E., & Gur, R. C. (2014). Development of abbreviated eight-item form of the Penn verbal reasoning test. Assessment, 21, 669–678. doi:10.1177/1073191114524270
Blaskewitz, N., Merten, T., & Brockhaus, R. (2009). Detection of suboptimal effort with the Rey complex figure test and recognition trial. Applied Neuropsychology, 16, 54–61. doi:10.1080/09084280802644227
Boone, K. B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York, NY: Guilford.
Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examination. The Clinical Neuropsychologist, 23(4), 729–741. doi:10.1080/13854040802427803
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York, NY: Guilford.
Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002). The Rey 15-item recognition trial: A technique to enhance sensitivity of the Rey 15-item memorization test. Journal of Clinical and Experimental Neuropsychology, 24(5), 561–573.
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Victor, T. L., & Zeller, M. A. (2010). Examination of various WMS-III logical memory scores in the assessment of response bias. The Clinical Neuropsychologist, 24(2), 344–357. doi:10.1080/13854040903307268
Bush, S. S., Heilbronner, R. L., & Ruff, R. M. (2014). Psychological assessment of symptom and performance validity, response bias, and malingering: Official position of the association for scientific advancement in psychological injury and law. Psychological Injury and Law, 7(3), 197–205. doi:10.1007/s12207-014-9198-7
Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini, K. J., Boone, K. B., Kirkwood, M. W., … Ord, J. S. (2015). Official position of the American academy of clinical neuropsychology social security administration policy on validity testing: Guidance and recommendations for change. The Clinical Neuropsychologist, 29(6), 723–740. doi:10.1080/13854046.2015.1099738
Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini, K. J. (2008). Verbal fluency indicators of malingering in traumatic brain injury: Classification accuracy in known groups. The Clinical Neuropsychologist, 22, 930–945. doi:10.1080/13854040701563591
Davis, J. J. (2014). Further consideration of advanced clinical solutions word choice: Comparison to the recognition memory test – Words and classification accuracy on a clinical sample. The Clinical Neuropsychologist, 28(8), 1278–1294. doi:10.1080/13854046.2014.975844
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (2000). California Verbal Learning Test – Second Edition, Adult Version, manual. San Antonio, TX: Psychological Corporation.
Denning, J. H. (2012). The efficiency and accuracy of the test of memory malingering trial 1, errors on the first 10 items of the test of memory malingering, and five embedded measures in predicting invalid test performance. Archives

of Clinical Neuropsychology, 27(4), 417–432. doi:10.1093/arclin/acs044
Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8(4), 341.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2017). WAIS-IV processing speed scores as measures of noncredible responding – The third generation of embedded performance validity indicators. Psychological Assessment, 29(2), 148–157. doi:10.1037/pas0000319
Erdodi, L. A., Jongsma, K. A., & Issa, M. (2017). The 15-item version of the Boston naming test as an index of English proficiency. The Clinical Neuropsychologist, 31(1), 168–178.
Erdodi, L. A., Kirsch, N. L., Lajiness-O'Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the recognition memory test and the word choice test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7(3), 255–263. doi:10.1007/s12207-014-9197-8
Erdodi, L. A., Pelletier, C. L., & Roth, R. M. (2016). Elevations on select Conners' CPT-II scales indicate noncredible responding in adults with traumatic brain injury. Applied Neuropsychology: Adult, 1–10. doi:10.1080/23279095.2016.1232262 [Advance online publication]
Erdodi, L. A., & Roth, R. M. (2016). Low scores on BDAE complex ideational material are associated with invalid performance in adults without aphasia. Applied Neuropsychology: Adult, 1–11. doi:10.1080/23279095.2016.1154856 [Advance online publication]
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29(5), 456–466. doi:10.1093/arclin/acu026
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE complex ideational material – A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120. doi:10.1007/s12207-016-9254-6
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D., Abeare, C. A., Pelletiere, C. L., … Roth, R. M. (2017). The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy. Journal of Clinical and Experimental Neuropsychology, 39(4), 369–383. doi:10.1080/13803395.2016.1230181 [Advance online publication]
Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W. (2006). Pain, malingering, and performance on the WAIS-III processing speed index. Journal of Clinical and Experimental Neuropsychology, 28, 1218–1237. doi:10.1080/13803390500346595
Fazio, R. L., Denning, J. H., & Denney, R. L. (2017). TOMM Trial 1 as a performance validity indicator in a criminal forensic sample. The Clinical Neuropsychologist, 31(1), 251–267.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224. doi:10.1037//1040-3590.6.3.218
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendation. Archives of Clinical Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction with the Wisconsin card sorting test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16(2), 179–191.
Greve, K. W., Curtis, K. L., Bianchini, K. J., & Ord, J. S. (2009). Are the original and second edition of the California verbal learning test equally accurate in detecting malingering? Assessment, 16(3), 237–248.
Grimes, D. A., & Schulz, K. F. (2005). Refining clinical diagnosis with likelihood ratios. The Lancet, 365(9469), 1500–1505. doi:10.1016/s0140-6736(05)66422-7
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can localized brain impairment be simulated on neuropsychological test profiles? Australian and New Zealand Journal of Psychiatry, 21, 87–93. doi:10.3109/00048678709160904
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A. (2005). WAIS digit-span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444.
Hilsabeck, R. C., Gordon, S. N., Hietpas-Wilson, T., & Zartman, A. L. (2011). Use of trial 1 of the test of memory malingering (TOMM) as a screening measure of effort: Suggested discontinuation rules. The Clinical Neuropsychologist, 25(7), 1228–1238. doi:10.1080/13854046.2011.589409
Horner, M. D., Bedwell, J. S., & Duong, A. (2006). Abbreviated form of the test of memory malingering. International Journal of Neuroscience, 116, 1181–1186. doi:10.1080/00207450500514029
Iverson, G. L., & Binder, L. M. (2000). Detecting exaggeration and malingering in neuropsychological assessment. Journal of Head Trauma Rehabilitation, 15(2), 829–858. doi:10.1097/00001199-200004000-00006
Iverson, G. L., & Franzen, M. D. (1994). The recognition memory test, digit span, and Knox cube test as markers of malingered memory impairment. Assessment, 1(4), 323–334.
Jones, A. (2013). Test of memory malingering: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27(6), 1043–1059. doi:10.1080/13854046.2013.804949
Killgore, W. D., & DellaPietra, L. (2000). Using the WMS-III to detect malingering: Empirical validation of the rarely missed index (RMI). Journal of Clinical and Experimental Neuropsychology, 22(6), 761–771. doi:10.1076/jcen.22.6.761.960
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., … Zeller, M. A. (2010). The Warrington recognition memory test for words as a measure of response bias: Total score and response time cutoffs developed on real world credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70. doi:10.1093/arclin/acp088

Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C. (2010). Sensitivity and specificity of a digit symbol recognition trial in the identification of response bias. Archives of Clinical Neuropsychology, 25, 420–428.

Kulas, J. F., Axelrod, B. N., & Rinaldi, A. R. (2014). Cross-validation of supplemental test of memory malingering scores as performance validity measures. Psychological Injury and Law, 7(3), 236–244. doi:10.1007/s12207-014-9200-4

Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., & French, L. M. (2013). Clinical utility of the Conners continuous performance test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25(2), 339–352. doi:10.1037/a0030915

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17(3), 410–425. doi:10.1076/clin.17.3.410.18089

Larrabee, G. J. (2012). Assessment of malingering. In G. J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach (2nd ed., pp. 117–159). New York, NY: Oxford University Press.

Larrabee, G. J. (2014). False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology, 29, 364–373. doi:10.1093/arclin/acu019

Lezak, M. D. (1995). Neuropsychological assessment. New York, NY: Oxford University Press.

Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017). Introducing a forced-choice recognition task to the California Verbal Learning Test–Children's Version. Child Neuropsychology, 23(3), 284–299. doi:10.1080/09297049.2015.1135422

Lichtenstein, J. D., Erdodi, L. A., Rai, J. K., Mazur-Mosiewicz, A., & Flaro, L. (2016). Wisconsin card sorting test embedded validity indicators developed for adults can be extended to children. Child Neuropsychology, 1–14. doi:10.1080/09297049.2016.1259402 [Advance online publication]

Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth complex figure test and the Meyers and Meyers recognition trial in the detection of suspect effort. The Clinical Neuropsychologist, 17(3), 426–440.

Miller, J. B., Millis, S. R., Rapport, L. J., Bashem, J. R., Hanks, R. A., & Axelrod, B. N. (2011). Detection of insufficient effort using the advanced clinical solutions for the Wechsler Memory scale. The Clinical Neuropsychologist, 25(1), 160–172. doi:10.1080/13854046.2010.533197

Morris, J. C., Heyman, A., Mohs, R. C., Hughes, J. P., van Belle, G., Fillenbaum, G., … Clark, C. (1989). The consortium to establish a registry for Alzheimer's disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer's disease. Neurology, 39(9), 1159–1165.

Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., & Grills, C. (2003). The relationship between eight measures of suspect effort. The Clinical Neuropsychologist, 17(2), 263–272. doi:10.1076/clin.17.2.263.16511

O'Bryant, S. E., Engel, L. R., Kleiner, J. S., Vasterling, J. J., & Black, F. W. (2007). Test of memory malingering (TOMM) trial 1 as a screening measure for insufficient effort. The Clinical Neuropsychologist, 21, 511–521.

Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners continuous performance test-II. Journal of Clinical and Experimental Neuropsychology, 32(4), 380–387. doi:10.1080/13803390903066881

Pearson. (2009). Advanced clinical solutions for the WAIS-IV and WMS-IV: Technical manual. San Antonio, TX: Author.

Randolph, C. (1998). Repeatable Battery for the Assessment of Neuropsychological Status (RBANS): Manual. San Antonio, TX: Psychological Corporation.

Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H., Victor, T. L., … Wright, M. J. (2013). Cross-validation of the Lu and colleagues (2003) Rey-Osterrieth complex figure test effort equation in a large known-group sample. Archives of Clinical Neuropsychology, 28, 30–37. doi:10.1093/arclin/acs106

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545–561. doi:10.1076/1385-4046(199911)13:04;1-y;ft545

Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable digit span is no more accurate than age-corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mTBI. The Clinical Neuropsychologist, 27(8), 1362–1372. doi:10.1080/13854046.2013.845248

Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult, 22(2), 141–146.

Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin card sorting test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21(5), 701–708.

Tombaugh, T. N. (1996). Test of Memory Malingering. New York, NY: Multi-Health Systems.

Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 16(4), 597–607. doi:10.1080/01688639408402671

Warrington, E. K. (1984). Recognition Memory Test manual. Berkshire, UK: NFER-Nelson.

Wisdom, N. M., Brown, W. L., Chen, D. K., & Collins, R. L. (2012). The use of all three test of memory malingering trials in establishing the level of effort. Archives of Clinical Neuropsychology, 27, 208–212. doi:10.1093/arclin/acr107

Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California verbal learning test-II (CVLT-II). The Clinical Neuropsychologist, 24(1), 153–168. doi:10.1080/13854040903107791