
Assessing Critical Thinking in Higher

Education: Current State and Directions


for Next-Generation Assessment
June 2014
Research Report
ETS RR-14-10
Ou Lydia Liu
Lois Frankel
Katrina Crotts Roohr
ETS Research Report Series
EIGNOR EXECUTIVE EDITOR
James Carlson
Principal Psychometrician
ASSOCIATE EDITORS
Beata Beigman Klebanov
Research Scientist
Heather Buzick
Research Scientist
Brent Bridgeman
Distinguished Presidential Appointee
Keelan Evanini
Managing Research Scientist
Marna Golub-Smith
Principal Psychometrician
Shelby Haberman
Distinguished Presidential Appointee
Gary Ockey
Research Scientist
Donald Powers
Managing Principal Research Scientist
Gautam Puhan
Senior Psychometrician
John Sabatini
Managing Principal Research Scientist
Matthias von Davier
Director, Research
Rebecca Zwick
Distinguished Presidential Appointee
PRODUCTION EDITORS
Kim Fryer
Manager, Editing Services
Ayleen Stellhorn
Editor
Since its 1947 founding, ETS has conducted and disseminated scientific research to support its products and services, and to advance the measurement and education fields. In keeping with these goals, ETS is committed to making its research freely available to the professional community and to the general public. Published accounts of ETS research, including papers in the ETS Research Report series, undergo a formal peer-review process by ETS staff to ensure that they meet established scientific and professional standards. All such ETS-conducted peer reviews are in addition to any reviews that outside organizations may provide as part of their own publication processes. Peer review notwithstanding, the positions expressed in the ETS Research Report series and other published accounts of ETS research are those of the authors and not necessarily those of the Officers and Trustees of Educational Testing Service.
The Daniel Eignor Editorship is named in honor of Dr. Daniel R. Eignor, who from 2001 until 2011 served the Research and Development division as Editor for the ETS Research Report series. The Eignor Editorship has been created to recognize the pivotal leadership role that Dr. Eignor played in the research publication process at ETS.
ETS Research Report Series ISSN 2330-8516
RESEARCH REPORT
Assessing Critical Thinking in Higher Education: Current
State and Directions for Next-Generation Assessment
Ou Lydia Liu, Lois Frankel, & Katrina Crotts Roohr
Educational Testing Service, Princeton, NJ
Critical thinking is one of the most important skills deemed necessary for college graduates to become effective contributors in the global workforce. The first part of this article provides a comprehensive review of its definitions by major frameworks in higher education and the workforce, existing assessments and their psychometric qualities, and challenges surrounding the design, implementation, and use of critical thinking assessment. In the second part, we offer an operational definition that is aligned with the dimensions of critical thinking identified from the reviewed frameworks and discuss the key assessment considerations when designing a next-generation critical thinking assessment. This article has important implications for institutions that are currently using, planning to adopt, or designing an assessment of critical thinking.
Keywords: Critical thinking; student learning outcomes; higher education; next-generation assessment
doi:10.1002/ets2.12009
Critical thinking is one of the most frequently discussed higher order skills, believed to play a central role in logical thinking, decision making, and problem solving (Butler, 2012; Halpern, 2003). It is also a highly contentious skill in that researchers debate about its definition; its amenability to assessment; its degree of generality or specificity; and the evidence of its practical impact on people's academic achievements, career advancements, and personal life choices. Despite this contention, critical thinking has received heightened attention from educators and policy makers in higher education and has been included as one of the core learning outcomes of college students by many institutions. For example, in a relatively recent survey conducted by the Association of American Colleges and Universities (AAC&U, 2011), 95% of the chief academic officers from 433 institutions rated critical thinking as one of the most important intellectual skills for their students. The finding resonated with voices from the workforce, in that 81% of the employers surveyed by AAC&U (2011) wanted colleges to place a stronger emphasis on critical thinking. Similarly, Casner-Lotto and Barrington (2006) found that among 400 surveyed employers, 92.1% identified critical thinking/problem solving as a very important skill for 4-year college graduates to be successful in today's workforce. Critical thinking was also considered important for high school and 2-year college graduates.
The importance of critical thinking is further confirmed in a recent research study conducted by Educational Testing Service (ETS, 2013). In this research, provosts or vice presidents of academic affairs from more than 200 institutions were interviewed regarding the most commonly measured general education skills, and critical thinking was one of the most frequently mentioned competencies considered essential for both academic and career success. The focus on critical thinking also extends to international institutions and organizations. For instance, the Assessment of Higher Education Learning Outcomes (AHELO) project sponsored by the Organisation for Economic Co-operation and Development (OECD, 2012) includes critical thinking as a core competency when evaluating general learning outcomes of college students across nations.
Despite the widespread attention on critical thinking, no clear-cut definition has been identified. Markle, Brenneman, Jackson, Burrus, and Robbins (2013) reviewed seven frameworks concerning general education competencies deemed important for higher education and/or the workforce: (a) the Assessment and Teaching of 21st Century Skills, (b) Lumina Foundation's Degree Qualifications Profile, (c) the Employment and Training Administration Industry Competency Model Clearinghouse, (d) European Higher Education Area Competencies (Bologna Process), (e) Framework for Higher Education Qualifications, (f) Framework for Learning and Development Outcomes, and (g) AAC&U's Liberal Education and America's Promise (LEAP; see Table 1). Although the definitions in various frameworks overlap, they also vary to a large degree in terms of the core features underlying critical thinking.
Corresponding author: O. L. Liu, E-mail: lliu@ets.org
ETS Research Report No. RR-14-10. © 2014 Educational Testing Service
O. L. Liu et al. Assessing Critical Thinking in Higher Education
In the first part of this paper, we review existing definitions and assessments of critical thinking. We then discuss the challenges and considerations in designing assessments for critical thinking, focusing on item format, scoring, validity and reliability evidence, and relevance to instruction. In the second part of this paper, we propose an approach for developing a next-generation critical thinking assessment by providing an operational definition for critical thinking and discussing key assessment features.
We hope that our review of existing assessments in light of construct representation, item format, and validity evidence will benefit higher education institutions as they choose among available assessments. Critical thinking has gained widespread attention as recognition of the importance of college learning outcomes assessment has increased. As indicated by a recent survey on the current state of student learning outcomes assessment (Kuh, Jankowski, Ikenberry, & Kinzie, 2014), the percentage of higher education institutions using an external general measure of student learning outcomes grew from less than 40% to nearly 50% from 2009 to 2013. We also hope that our proposed approach for a next-generation critical thinking assessment will inform institutions when they develop their own assessments. We call for close collaborations between institutions and testing organizations in designing a next-generation critical thinking assessment to ensure that the assessment will have instructional value and meet industry technical standards.
Part I: Current State of Assessments, Research, and Challenges
Definitions of Critical Thinking
One of the most debatable features about critical thinking is what constitutes critical thinking, that is, its definition. Table 1 shows definitions of critical thinking drawn from the frameworks reviewed in the Markle et al. (2013) paper. The different sources of the frameworks (e.g., higher education and workforce) focus on different aspects of critical thinking. Some value the reasoning process specific to critical thinking, while others emphasize the outcomes of critical thinking, such as whether it can be used for decision making or problem solving. An interesting phenomenon is that none of the frameworks referenced in the Markle et al. paper offers actual assessments of critical thinking based on the group's definition. For example, in the case of the VALUE (Valid Assessment of Learning in Undergraduate Education) initiative as part of the AAC&U's LEAP campaign, VALUE rubrics were developed with the intent to serve as generic guidelines when faculty members design their own assessments or grading activities. This approach provides great flexibility to faculty and accommodates local needs. However, it also raises concerns of reliability in terms of how faculty members use the rubrics. A recent AAC&U research study found that the percent agreement in scoring was fairly low when multiple raters scored the same student work using the VALUE rubrics (Finley, 2012). For example, the percentage of perfect agreement using four scoring categories across multiple raters was only 36% when the critical thinking rubric was applied.
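To make the perfect-agreement statistic concrete, the sketch below computes the proportion of student artifacts on which every rater assigns the same rubric score. The three-rater setup, the scores, and the function name are hypothetical illustrations, not data from the Finley (2012) study.

```python
def exact_agreement_rate(ratings):
    """Proportion of artifacts on which every rater assigned the same score.

    `ratings` holds one tuple per student artifact, with one rubric score
    per rater (hypothetical data; not the actual AAC&U/Finley data set).
    """
    agree = sum(1 for scores in ratings if len(set(scores)) == 1)
    return agree / len(ratings)

# Three hypothetical raters scoring five artifacts on a 4-category rubric
ratings = [(3, 3, 3), (2, 3, 2), (4, 4, 4), (1, 2, 2), (3, 3, 3)]
print(exact_agreement_rate(ratings))  # 3 of 5 artifacts agree -> 0.6
```

Stricter "perfect agreement" definitions like this one tend to produce lower figures than pairwise or adjacent-agreement statistics, which is one reason reported rubric reliabilities vary across studies.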
In addition to the frameworks discussed by Markle et al. (2013), there are other influential research efforts on critical thinking. Unlike the frameworks discussed by Markle et al., these research efforts have led to commercially available critical thinking assessments. For example, in a study sponsored by the American Philosophical Association (APA), Facione (1990b) spearheaded the effort to identify a consensus definition of critical thinking using the Delphi approach, an expert consensus approach. For the APA study, 46 members recognized as having experience or expertise in critical thinking instruction, assessment, or theory shared reasoned opinions about critical thinking. The experts were asked to provide their own list of the skill and dispositional dimensions of critical thinking. After rounds of discussion, the experts reached an agreement on the core cognitive dimensions (i.e., key skills or dispositions) of critical thinking: (a) interpretation, (b) analysis, (c) evaluation, (d) inference, (e) explanation, and (f) self-regulation, making it clear that a person does not have to be proficient at every skill to be considered a critical thinker. The experts also reached consensus on the affective, dispositional components of critical thinking, such as "inquisitiveness with regard to a wide range of issues," "concern to become and remain generally well-informed," and "alertness to opportunities to use CT [critical thinking]" (Facione, 1990b, p. 13). Two decades later, the approach AAC&U took to define critical thinking was heavily influenced by the APA definitions.
Halpern also led a noteworthy research and assessment effort on critical thinking. In her 2003 book, Halpern defined critical thinking as
Table 1: Definitions of Critical Thinking From Current Frameworks of Learning Outcomes

Assessment and Teaching of 21st Century Skills (ATC21S)
Author: University of Melbourne, sponsored by Cisco, Intel, and Microsoft
Critical thinking term: Ways of thinking: critical thinking, problem solving, and decision making
Definition: The ways of thinking can be categorized into knowledge, skills, and attitudes/values/ethics (KSAVE). Knowledge includes: (a) reason effectively, use systems thinking, and evaluate evidence; (b) solve problems; and (c) clearly articulate. Skills include: (a) reason effectively and (b) use systems thinking. Attitudes/values/ethics include: (a) make reasoned judgments and decisions, (b) solve problems, and (c) attitudinal disposition (Binkley et al., 2012)

The Degree Qualifications Profile (DQP) 2.0
Author: Lumina Foundation
Critical thinking term: Analytical inquiry
Definition: A student who (a) "identifies and frames a problem or question in selected areas of study and distinguishes among elements of ideas, concepts, theories or practical approaches to the problem or question" (associate's level), (b) "differentiates and evaluates theories and approaches to selected complex problems within the chosen field of study and at least one other field" (bachelor's level), and (c) "disaggregates, reformulates and adapts principal ideas, techniques or methods at the forefront of the field of study in carrying out an essay or project" (master's level; Adelman, Ewell, Gaston, & Schneider, 2014, pp. 19-20)

The Employment and Training Administration Industry Competency Model Clearinghouse
Author: U.S. Department of Labor (USDOL), Employment and Training Administration
Critical thinking term: Critical and analytical thinking
Definition: A person who "possesses sufficient inductive and deductive reasoning ability to perform [their] job successfully; critically reviews, analyzes, synthesizes, compares and interprets information; draws conclusions from relevant and/or missing information; understands the principles underlying the relationship among facts and applies this understanding when solving problems" (i.e., reasoning) and "identifies connections between issues; quickly understands, orients to, and learns new assignments; shifts gears and changes direction when working on multiple projects or issues" (i.e., mental agility; USDOL, 2013)

A Framework for Qualifications of the European Higher Education Area (Bologna Process)
Author: European Commission: European Higher Education Area
Critical thinking term: Not specified; defined in terms of skills related to critical thinking required of students completing the first cycle (e.g., bachelor's level)
Definition: Students completing the first-cycle qualification (e.g., bachelor's level) "can apply their knowledge and understanding in a manner that indicates a professional approach to their work or vocation, and have competences typically demonstrated through devising and sustaining arguments and solving problems within their field of study" and "have the ability to gather and interpret relevant data (usually within their field of study) to inform judgments that include reflection on relevant social, scientific or ethical issues" (Ministry of Science Technology and Innovation, 2005, p. 194)

Framework for Higher Education Qualifications (QAA-FHEQ)
Author: Quality Assurance Agency for Higher Education
Critical thinking term: Not specified; defined in terms of skills related to critical thinking demonstrated by students receiving a bachelor's degree with honors
Definition: A student who is able to "critically evaluate arguments, assumptions, abstract concepts and data (that may be incomplete), to make judgments, and to frame appropriate questions to achieve a solution" or "identify a range of solutions" to a problem (QAA, 2008, p. 19)

Framework for Learning and Development Outcomes
Author: The Council for the Advancement of Standards (CAS) in Education
Critical thinking term: Critical thinking
Definition: "Identifies important problems, questions, and issues; analyzes, interprets, and makes judgments of the relevance and quality of information; assesses assumptions and considers alternative perspectives and solutions" (CAS Board of Directors, 2008, p. 2)

Liberal Education and America's Promise (LEAP)
Author: Association of American Colleges and Universities
Critical thinking term: Critical thinking
Definition: "A habit of mind characterized by the comprehensive exploration of issues, ideas, artifacts, and events before accepting or formulating an opinion or conclusion" (Rhodes, 2010, p. 1)
the use of those cognitive skills or strategies that increase the probability of a desirable outcome. It is used to describe thinking that is purposeful, reasoned, and goal directed: the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions, when the thinker is using skills that are thoughtful and effective for the particular context and type of thinking task. (Halpern, 2003, p. 6)

Halpern's approach to critical thinking has a strong focus on the outcome or utility aspect of critical thinking, in that critical thinking is conceptualized as a tool to facilitate decision making or problem solving. Halpern recognized several key aspects of critical thinking, including verbal reasoning, argument analysis, assessing likelihood and uncertainty, making sound decisions, and thinking as hypothesis testing (Halpern, 2003).
These two research efforts, led by Facione and Halpern, lent themselves to two commercially available assessments of critical thinking, the California Critical Thinking Skills Test (CCTST) and the Halpern Critical Thinking Assessment (HCTA), respectively, which are described in detail in the following section, where we discuss existing assessments. Interested readers are also pointed to research concerning constructs overlapping with critical thinking, such as argumentation (Godden & Walton, 2007; Walton, 1996; Walton, Reed, & Macagno, 2008) and reasoning (Carroll, 1993; Powers & Dwyer, 2003).
Existing Assessments of Critical Thinking
Multiple Themes of Assessments
As with the multivariate nature of the definitions offered for critical thinking, critical thinking assessments also tend to capture multiple themes. Table 2 presents some of the most popular assessments of critical thinking, including the CCTST (Facione, 1990a), California Critical Thinking Disposition Inventory (CCTDI; Facione & Facione, 1992), Watson-Glaser Critical Thinking Appraisal (WGCTA; Watson & Glaser, 1980), Ennis-Weir Critical Thinking Essay Test (Ennis & Weir, 1985), Cornell Critical Thinking Test (CCTT; Ennis, Millman, & Tomko, 1985), ETS Proficiency Profile (EPP; ETS, 2010), Collegiate Learning Assessment+ (CLA+; Council for Aid to Education, 2013), Collegiate Assessment of Academic Proficiency (CAAP; CAAP Program Management, 2012), and the HCTA (Halpern, 2010). The last column in Table 2 shows how critical thinking is operationally defined in these widely used assessments. The assessments overlap in a number of key themes, such as reasoning, analysis, argumentation, and evaluation. They also differ along a few dimensions, such as whether critical thinking should include decision making and problem solving (e.g., CLA+, HCTA, and California Measure of Mental Motivation [CM3]), be integrated with writing (e.g., CLA+), or involve metacognition (e.g., CM3).
Assessment Format
The majority of the assessments exclusively use selected-response items such as multiple-choice or Likert-type items (e.g., CAAP, CCTST, and WGCTA). EPP, HCTA, and CLA+ use a combination of multiple-choice and constructed-response items (though the essay is optional in EPP), and the Ennis-Weir test is an essay test. Given the limited testing time, only a small number of constructed-response items can typically be used in a given assessment.
Test and Scale Reliability
Although constructed-response items have great face validity and have the potential to offer authentic contexts in assessments, they tend to have lower levels of reliability than multiple-choice items for the same amount of testing time (Lee, Liu, & Linn, 2011). For example, according to a recent report released by the sponsor of the CLA+, the Council for Aid to Education (Zahner, 2013), the reliability of the 60-min constructed-response section is only .43. The test-level reliability is .87, largely driven by the reliability of CLA+'s 30-min short multiple-choice section.
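For readers less familiar with how internal-consistency figures of this kind are produced, the sketch below computes Cronbach's alpha from item-level scores. The formula is the standard one; the small response matrix is invented purely for illustration and is unrelated to the CLA+ or CCTST data reported here.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items: one list of scores per item.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    where k is the number of items. Uses population variances.
    """
    k = len(item_scores)
    n = len(item_scores[0])

    def pvar(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total score per test taker across all items
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return k / (k - 1) * (1 - sum(pvar(item) for item in item_scores) / pvar(totals))

# Three dichotomously scored items answered by four hypothetical test takers
items = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]]
print(round(cronbach_alpha(items), 2))  # -> 0.63
```

Alpha rises when items covary strongly and when a scale has many items, which is one reason short subscales (like those discussed next) often show low internal consistency.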
Because of the multidimensional nature of critical thinking, many existing assessments include multiple subscales and report subscale scores. The main advantage of subscale scores is that they provide detailed information about test takers' critical thinking ability. The downside, however, is that these subscale scores are typically challenged by their unsatisfactory reliability and the lack of distinction between scales. For example, the CCTST reports scores on overall reasoning skills and subscale scores on five aspects of critical thinking: (a) analysis, (b) evaluation, (c) inference, (d) deduction, and (e) induction. However, Leppa (1997) reported that the subscales have low internal consistency, from .21 to .51, much
T
a
b
l
e
2
E
x
i
s
t
i
n
g
A
s
s
e
s
s
m
e
n
t
s
o
f
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
T
e
s
t
V
e
n
d
o
r
F
o
r
m
a
t
D
e
l
i
v
e
r
y
L
e
n
g
t
h
F
o
r
m
s
a
n
d
i
t
e
m
s
T
e
m
e
s
/
t
o
p
i
c
s
C
a
l
i
f
o
r
n
i
a
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
D
i
s
p
o
s
i
t
i
o
n
I
n
v
e
n
t
o
r
y
(
C
C
T
D
I
)
I
n
s
i
g
h
t
A
s
s
e
s
s
m
e
n
t
(
C
a
l
i
f
o
r
n
i
a
A
c
a
d
e
m
i
c
P
r
e
s
s
)
a
S
e
l
e
c
t
e
d
-
r
e
s
p
o
n
s
e
(
L
i
k
e
r
t
s
c
a
l
e

e
x
t
e
n
t
t
o
w
h
i
c
h
s
t
u
d
e
n
t
s
a
g
r
e
e
o
r
d
i
s
a
g
r
e
e
)
O
n
l
i
n
e
o
r
p
a
p
e
r
/
p
e
n
c
i
l
3
0
m
i
n
7
5
i
t
e
m
s
(
s
e
v
e
n
s
c
a
l
e
s
:
9

1
2
i
t
e
m
s
p
e
r
s
c
a
l
e
)
T
i
s
t
e
s
t
c
o
n
t
a
i
n
s
s
e
v
e
n
s
c
a
l
e
s
o
f
c
r
i
t
i
c
a
l
t
h
i
n
k
i
n
g
:
(
a
)
t
r
u
t
h
-
s
e
e
k
i
n
g
,
(
b
)
o
p
e
n
-
m
i
n
d
e
d
n
e
s
s
,
(
c
)
a
n
a
l
y
t
i
c
i
t
y
,
(
d
)
s
y
s
t
e
m
a
t
i
c
i
t
y
,
(
e
)
c
o
n
f
d
e
n
c
e
i
n
r
e
a
s
o
n
i
n
g
,
(
f
)
i
n
q
u
i
s
i
t
i
v
e
n
e
s
s
,
a
n
d
(
g
)
m
a
t
u
r
i
t
y
o
f
j
u
d
g
m
e
n
t
(
F
a
c
i
o
n
e
,
F
a
c
i
o
n
e
,
&
S
a
n
c
h
e
z
,
1
9
9
4
)
C
a
l
i
f
o
r
n
i
a
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
S
k
i
l
l
s
T
e
s
t
(
C
C
T
S
T
)
I
n
s
i
g
h
t
A
s
s
e
s
s
m
e
n
t
(
C
a
l
i
f
o
r
n
i
a
A
c
a
d
e
m
i
c
P
r
e
s
s
)
M
u
l
t
i
p
l
e
-
c
h
o
i
c
e
(
M
C
)
O
n
l
i
n
e
o
r
p
a
p
e
r
/
p
e
n
c
i
l
4
5
m
i
n
3
4
i
t
e
m
s
(
v
i
g
n
e
t
t
e
b
a
s
e
d
)
T
e
C
C
T
S
T
r
e
t
u
r
n
s
s
c
o
r
e
s
o
n
t
h
e
f
o
l
l
o
w
i
n
g
s
c
a
l
e
s
:
(
a
)
a
n
a
l
y
s
i
s
,
(
b
)
e
v
a
l
u
a
t
i
o
n
,
(
c
)
i
n
f
e
r
e
n
c
e
,
(
d
)
d
e
d
u
c
t
i
o
n
,
(
e
)
i
n
d
u
c
t
i
o
n
,
a
n
d
(
f
)
o
v
e
r
a
l
l
r
e
a
s
o
n
i
n
g
s
k
i
l
l
s
(
F
a
c
i
o
n
e
,
1
9
9
0
a
)
C
a
l
i
f
o
r
n
i
a
M
e
a
s
u
r
e
o
f
M
e
n
t
a
l
M
o
t
i
v
a
t
i
o
n
(
C
M
3
)
I
n
s
i
g
h
t
A
s
s
e
s
s
m
e
n
t
(
C
a
l
i
f
o
r
n
i
a
A
c
a
d
e
m
i
c
P
r
e
s
s
)
S
e
l
e
c
t
e
d
-
r
e
s
p
o
n
s
e
(
4
-
p
o
i
n
t
L
i
k
e
r
t
s
c
a
l
e
:
s
t
r
o
n
g
l
y
d
i
s
a
g
r
e
e
t
o
s
t
r
o
n
g
l
y
a
g
r
e
e
)
O
n
l
i
n
e
o
r
p
a
p
e
r
/
p
e
n
c
i
l
2
0
m
i
n
7
2
i
t
e
m
s
T
i
s
a
s
s
e
s
s
m
e
n
t
m
e
a
s
u
r
e
s
a
n
d
r
e
p
o
r
t
s
s
c
o
r
e
s
o
n
t
h
e
f
o
l
l
o
w
i
n
g
a
r
e
a
s
:
(
a
)
l
e
a
r
n
i
n
g
o
r
i
e
n
t
a
t
i
o
n
,
(
b
)
c
r
e
a
t
i
v
e
p
r
o
b
l
e
m
s
o
l
v
i
n
g
,
(
c
)
c
o
g
n
i
t
i
v
e
i
n
t
e
g
r
i
t
y
,
(
d
)
s
c
h
o
l
a
r
l
y
r
i
g
o
r
,
a
n
d
(
e
)
t
e
c
h
n
o
l
o
g
i
c
a
l
o
r
i
e
n
t
a
t
i
o
n
(
I
n
s
i
g
h
t
A
s
s
e
s
s
m
e
n
t
,
2
0
1
3
)
C
o
l
l
e
g
i
a
t
e
A
s
s
e
s
s
m
e
n
t
o
f
A
c
a
d
e
m
i
c
P
r
o
f
c
i
e
n
c
y
(
C
A
A
P
)
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
A
C
T
M
C
P
a
p
e
r
/
p
e
n
c
i
l
4
0
m
i
n
3
2
i
t
e
m
s
(
i
n
c
l
u
d
e
s
f
o
u
r
p
a
s
s
a
g
e
s
r
e
p
r
e
s
e
n
t
a
t
i
v
e
o
f
i
s
s
u
e
s
c
o
m
m
o
n
l
y
e
n
c
o
u
n
t
e
r
e
d
i
n
a
p
o
s
t
s
e
c
o
n
d
a
r
y
c
u
r
r
i
c
u
l
u
m
)
T
e
C
A
A
P
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
m
e
a
s
u
r
e
s
s
t
u
d
e
n
t
s

s
k
i
l
l
s
i
n
a
n
a
l
y
z
i
n
g
e
l
e
m
e
n
t
s
o
f
a
n
a
r
g
u
m
e
n
t
,
e
v
a
l
u
a
t
i
n
g
a
n
a
r
g
u
m
e
n
t
,
a
n
d
e
x
t
e
n
d
i
n
g
a
r
g
u
m
e
n
t
s
(
C
A
A
P
P
r
o
g
r
a
m
M
a
n
a
g
e
m
e
n
t
,
2
0
1
2
)
C
o
l
l
e
g
i
a
t
e
L
e
a
r
n
i
n
g
A
s
s
e
s
s
m
e
n
t
+
(
C
L
A
+
)
C
o
u
n
c
i
l
f
o
r
A
i
d
t
o
E
d
u
c
a
t
i
o
n
(
C
A
E
)
P
e
r
f
o
r
m
a
n
c
e
t
a
s
k
(
P
T
)
a
n
d
M
C
O
n
l
i
n
e
9
0
m
i
n
(
6
0
m
i
n
f
o
r
P
T
;
3
0
m
i
n
f
o
r
M
C
)
2
6
i
t
e
m
s
(
o
n
e
P
T
;
2
5
M
C
)
T
e
C
L
A
+
P
T
s
m
e
a
s
u
r
e
h
i
g
h
e
r
o
r
d
e
r
s
k
i
l
l
s
i
n
c
l
u
d
i
n
g
:
(
a
)
a
n
a
l
y
s
i
s
a
n
d
p
r
o
b
l
e
m
s
o
l
v
i
n
g
,
(
b
)
w
r
i
t
i
n
g
e
f
e
c
t
i
v
e
n
e
s
s
,
a
n
d
(
c
)
w
r
i
t
i
n
g
m
e
c
h
a
n
i
c
s
.
T
e
M
C
i
t
e
m
s
a
s
s
e
s
s
(
a
)
s
c
i
e
n
t
i
f
c
a
n
d
q
u
a
n
t
i
t
a
t
i
v
e
r
e
a
s
o
n
i
n
g
,
(
b
)
c
r
i
t
i
c
a
l
r
e
a
d
i
n
g
a
n
d
e
v
a
l
u
a
t
i
o
n
,
a
n
d
(
c
)
c
r
i
t
i
q
u
i
n
g
a
n
a
r
g
u
m
e
n
t
(
Z
a
h
n
e
r
,
2
0
1
3
)
ETS Research Report No. RR-14-10. 2014 Educational Testing Service 5
O. L. Liu et al. Assessing Critical Thinking in Higher Education
T
a
b
l
e
2
C
o
n
t
i
n
u
e
d
T
e
s
t
V
e
n
d
o
r
F
o
r
m
a
t
D
e
l
i
v
e
r
y
L
e
n
g
t
h
F
o
r
m
s
a
n
d
i
t
e
m
s
T
e
m
e
s
/
t
o
p
i
c
s
C
o
r
n
e
l
l
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
T
e
s
t
(
C
C
T
T
)
T
e
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
C
o
.
M
C
C
o
m
p
u
t
e
r
b
a
s
e
d
(
u
s
i
n
g
t
h
e
s
o
f
w
a
r
e
)
o
r
p
a
p
e
r
/
p
e
n
c
i
l
5
0
m
i
n
(
c
a
n
a
l
s
o
b
e
a
d
m
i
n
i
s
t
e
r
e
d
u
n
t
i
m
e
d
)
L
e
v
e
l
X
:
7
1
i
t
e
m
s
L
e
v
e
l
X
i
s
i
n
t
e
n
d
e
d
f
o
r
s
t
u
d
e
n
t
s
i
n
G
r
a
d
e
s
5

1
2
+
a
n
d
m
e
a
s
u
r
e
s
t
h
e
f
o
l
l
o
w
i
n
g
s
k
i
l
l
s
:
(
a
)
i
n
d
u
c
t
i
o
n
,
(
b
)
d
e
d
u
c
t
i
o
n
,
(
c
)
c
r
e
d
i
b
i
l
i
t
y
,
a
n
d
(
d
)
i
d
e
n
t
i
f
c
a
t
i
o
n
o
f
a
s
s
u
m
p
t
i
o
n
s
(
T
e
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
C
o
.
,
2
0
1
4
)
L
e
v
e
l
Z
:
5
2
i
t
e
m
s
L
e
v
e
l
Z
i
s
i
n
t
e
n
d
e
d
f
o
r
s
t
u
d
e
n
t
s
i
n
G
r
a
d
e
s
1
1

1
2
+
a
n
d
m
e
a
s
u
r
e
s
t
h
e
f
o
l
l
o
w
i
n
g
s
k
i
l
l
s
:
(
a
)
i
n
d
u
c
t
i
o
n
,
(
b
)
d
e
d
u
c
t
i
o
n
,
(
c
)
c
r
e
d
i
b
i
l
i
t
y
,
(
d
)
i
d
e
n
t
i
f
c
a
t
i
o
n
o
f
a
s
s
u
m
p
t
i
o
n
s
,
(
e
)
s
e
m
a
n
t
i
c
s
,
(
f
)
d
e
f
n
i
t
i
o
n
,
a
n
d
(
g
)
p
r
e
d
i
c
t
i
o
n
i
n
p
l
a
n
n
i
n
g
e
x
p
e
r
i
m
e
n
t
s
(
T
e
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
C
o
.
,
2
0
1
4
)
E
n
n
i
s

W
e
i
r
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
E
s
s
a
y
T
e
s
t
M
i
d
w
e
s
t
P
u
b
l
i
c
a
t
i
o
n
s
E
s
s
a
y
P
a
p
e
r
/
p
e
n
c
i
l
4
0
m
i
n
N
i
n
e
-
p
a
r
a
g
r
a
p
h
e
s
s
a
y
/
l
e
t
t
e
r
T
i
s
a
s
s
e
s
s
m
e
n
t
m
e
a
s
u
r
e
s
t
h
e
f
o
l
l
o
w
i
n
g
a
r
e
a
s
o
f
t
h
e
c
r
i
t
i
c
a
l
t
h
i
n
k
i
n
g
c
o
m
p
e
t
e
n
c
e
:
(
a
)
g
e
t
t
i
n
g
t
h
e
p
o
i
n
t
,
(
b
)
s
e
e
i
n
g
r
e
a
s
o
n
s
a
n
d
a
s
s
u
m
p
t
i
o
n
s
,
(
c
)
s
t
a
t
i
n
g
o
n
e

s
p
o
i
n
t
,
(
d
)
o
f
e
r
i
n
g
g
o
o
d
r
e
a
s
o
n
s
,
(
e
)
s
e
e
i
n
g
o
t
h
e
r
p
o
s
s
i
b
i
l
i
t
i
e
s
,
a
n
d
(
f
)
r
e
s
p
o
n
d
i
n
g
a
p
p
r
o
p
r
i
a
t
e
l
y
t
o
a
n
d
/
o
r
a
v
o
i
d
i
n
g
a
r
g
u
m
e
n
t
w
e
a
k
n
e
s
s
e
s
(
E
n
n
i
s
&
W
e
i
r
,
1
9
8
5
)
E
T
S
P
r
o
f
c
i
e
n
c
y
P
r
o
f
l
e
(
E
P
P
)
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
E
T
S
M
C
O
n
l
i
n
e
a
n
d
p
a
p
e
r
/
p
e
n
c
i
l
A
b
o
u
t
4
0
m
i
n
(
f
u
l
l
t
e
s
t
i
s
2
h
)
2
7
i
t
e
m
s
(
s
t
a
n
d
a
r
d
f
o
r
m
)
T
e
C
r
i
t
i
c
a
l
T
i
n
k
i
n
g
c
o
m
p
o
n
e
n
t
o
f
t
h
i
s
t
e
s
t
m
e
a
s
u
r
e
s
a
s
t
u
d
e
n
t
s

a
b
i
l
i
t
y
t
o
:
(
a
)
d
i
s
t
i
n
g
u
i
s
h
b
e
t
w
e
e
n
r
h
e
t
o
r
i
c
a
n
d
a
r
g
u
m
e
n
t
a
t
i
o
n
i
n
a
p
i
e
c
e
o
f
n
o
n
f
c
t
i
o
n
p
r
o
s
e
,
(
b
)
r
e
c
o
g
n
i
z
e
a
s
s
u
m
p
t
i
o
n
s
a
n
d
t
h
e
b
e
s
t
h
y
p
o
t
h
e
s
i
s
t
o
a
c
c
o
u
n
t
f
o
r
i
n
f
o
r
m
a
t
i
o
n
p
r
e
s
e
n
t
e
d
,
(
c
)
i
n
f
e
r
a
n
d
i
n
t
e
r
p
r
e
t
a
r
e
l
a
t
i
o
n
s
h
i
p
b
e
t
w
e
e
n
v
a
r
i
a
b
l
e
s
,
a
n
d
(
d
)
d
r
a
w
v
a
l
i
d
c
o
n
c
l
u
s
i
o
n
s
b
a
s
e
d
o
n
i
n
f
o
r
m
a
t
i
o
n
p
r
e
s
e
n
t
e
d
(
E
T
S
,
2
0
1
0
)
6 ETS Research Report No. RR-14-10. 2014 Educational Testing Service
O. L. Liu et al. Assessing Critical Thinking in Higher Education
Table 2 Continued (Test, Vendor, Format, Delivery, Length, Forms and items, Themes/topics)

Test: Halpern Critical Thinking Assessment (HCTA)
Vendor: Schuhfried Publishing, Inc.
Format: Forced choice (MC, ranking, or rating of alternatives) and open-ended
Delivery: Computer based
Length: 60-80 min, but test is untimed (Form S1); 20 min, but test is untimed (Form S2)
Forms and items: 25 scenarios of everyday events (five per subcategory); S1: Both open-ended and forced choice items; S2: All forced choice items
Themes/topics: This test measures five critical thinking subskills: (a) verbal reasoning skills, (b) argument and analysis skills, (c) skills in thinking as hypothesis testing, (d) using likelihood and uncertainty, and (e) decision-making and problem-solving skills (Halpern, 2010)

Test: Watson-Glaser Critical Thinking Appraisal tool (WGCTA)
Vendor: Pearson
Format: MC
Delivery: Online and paper/pencil
Length: Standard: 40-60 min (Forms A and B) if timed; Short form: 30 min if timed; Watson-Glaser II: 40 min if timed
Forms and items: 80 items (standard); 40 items (short form); 40 items (Watson-Glaser II)
Themes/topics: The WGCTA is composed of five tests: (a) inference, (b) recognition of assumptions, (c) deduction, (d) interpretation, and (e) evaluation of arguments. Each test contains both neutral and controversial reading passages and scenarios encountered at work, in the classroom, and in the media. Although there are five tests, only the total score is reported (Watson & Glaser, 2008a, 2008b). Watson-Glaser II measures and provides interpretable subscores for three critical thinking skill domains that are both contemporary and business relevant, including the ability to: (a) recognize assumptions, (b) evaluate arguments, and (c) draw conclusions (Watson & Glaser, 2010).

a. Insight Assessment also owns other, more specialized critical thinking tests, such as the Business Critical Thinking Skills Test (BCTST) and the Health Sciences Reasoning Test (HSRT).
lower than the reliabilities (i.e., .68 to .70) reported by the authors of the CCTST (Ku, 2009). Another example is that the WGCTA provides subscale scores on inference, recognition of assumptions, deduction, interpretation, and evaluation of arguments. Studies found that the internal consistency of some of these subscales was low and varied widely, from .17 to .74 (Loo & Thorpe, 1999). Additionally, there was no clear evidence of distinct subscales: a meta-analysis of 60 published studies recovered a single-component structure (Bernard et al., 2008). Studies also reported unstable factor structure and low reliability for the CCTDI (Kakai, 2003; Walsh & Hardy, 1997; Walsh, Seldomridge, & Badros, 2007).
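The reliability figures cited above are typically internal-consistency estimates (Cronbach's alpha). As a rough illustration of how such a coefficient is computed, here is a minimal Python sketch; the item responses are hypothetical, not data from any of the tests discussed:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items: list of equal-length score lists, one per item
           (positions index examinees).
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))


# Hypothetical 0/1 responses: four items, five examinees.
responses = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1],
]
alpha = cronbach_alpha(responses)  # roughly .2: tiny scales yield low alpha
```

This makes concrete why the short subscales discussed above struggle to reach acceptable reliability: with few items, the summed true-score variance is small relative to item-level noise.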
Comparability of Forms
For reasons such as test security and construct representation, most assessments employ multiple forms. The comparability among forms is another source of concern. For example, Jacobs (1999) found that Form B of the CCTST was significantly more difficult than Form A. Other studies also found low comparability between the two CCTST forms (Bondy, Koenigseder, Ishee, & Williams, 2001).
Validity
Table 3 presents some of the more recent validity studies for existing critical thinking assessments. Most studies focus on the correlation of critical thinking scores with scores on other general cognitive measures. For example, critical thinking assessments showed moderate correlations with general cognitive assessments such as the SAT and GRE tests (e.g., Ennis, 2005; Giancarlo, Blohm, & Urdan, 2004; Liu, 2008; Stanovich & West, 2008; Watson & Glaser, 2010). They also showed moderate correlations with course grades and GPA (Gadzella et al., 2006; Giancarlo et al., 2004; Halpern, 2006; Hawkins, 2012; Liu & Roohr, 2013; Williams et al., 2003). A few studies have looked at the relationship of critical thinking to behaviors, job performance, or life events. Ejiogu, Yang, Trent, and Rose (2006) examined scores on the WGCTA and found that they correlated moderately and positively with job performance (corrected r = .32 to .52). Butler (2012) examined the external validity of the HCTA and concluded that those with higher critical thinking scores had fewer negative life events than those with lower critical thinking skills (r = −.38).
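The "corrected r" values in the Ejiogu et al. study reflect the standard correction for attenuation, which divides an observed validity coefficient by the square root of the criterion's reliability. A minimal sketch (the numbers below are hypothetical, not taken from that study):

```python
import math


def correct_for_attenuation(r_observed, criterion_reliability):
    """Disattenuate an observed test-criterion correlation for
    unreliability in the criterion (e.g., supervisor ratings):
        r_corrected = r_observed / sqrt(criterion_reliability)
    """
    return r_observed / math.sqrt(criterion_reliability)


# Hypothetical: observed r = .40 against ratings with reliability .64
r_corrected = correct_for_attenuation(0.40, 0.64)  # .50
```

The correction estimates what the correlation would be if the criterion were measured without error, which is why corrected coefficients run higher than observed ones.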
Our review of validity evidence for existing assessments revealed that the quality and quantity of research support varied significantly among existing assessments. Common problems with existing assessments include insufficient evidence of distinct dimensionality, unreliable subscores, noncomparable test forms, and unclear evidence of differential validity across groups of test takers. In a review of the psychometric quality of existing critical thinking assessments, Ku (2009) observed that studies conducted by researchers not affiliated with the authors of the tests tend to report lower psychometric quality than studies conducted by the authors and their affiliates.
For future research, a component of validity that is missing from many of the existing studies is the incremental predictive validity of critical thinking. As Kuncel (2011) pointed out, evidence is needed to clarify how well critical thinking skills predict desirable outcomes (e.g., job performance) beyond what is predicted by other general cognitive measures. Without controlling for other types of general cognitive ability, it is difficult to evaluate the unique contributions that critical thinking skills make to the various outcomes. For example, the Butler (2012) study did not control for any measures of participants' general cognitive ability. Hence, it leaves room for an alternative explanation: other aspects of people's general cognitive ability, rather than critical thinking, may have contributed to their life success.
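The incremental-validity question can be posed concretely with the standard two-predictor regression identity; the sketch below computes how much outcome variance critical thinking adds beyond a general-ability measure. The correlations are hypothetical placeholders, chosen only to illustrate the computation:

```python
def incremental_r2(r_yg, r_yc, r_gc):
    """Variance in outcome y explained by general ability (g) alone vs.
    by g plus critical thinking (c), via the two-predictor identity:
        R^2 = (r_yg^2 + r_yc^2 - 2*r_yg*r_yc*r_gc) / (1 - r_gc^2)
    Returns (R^2 for g alone, R^2 for g and c, increment).
    """
    r2_g = r_yg ** 2
    r2_full = (r_yg ** 2 + r_yc ** 2 - 2 * r_yg * r_yc * r_gc) / (1 - r_gc ** 2)
    return r2_g, r2_full, r2_full - r2_g


# Hypothetical values: y-g correlation .50, y-c correlation .38,
# and substantial overlap (r_gc = .60) between the two predictors.
r2_g, r2_full, delta = incremental_r2(0.50, 0.38, 0.60)  # delta is only .01
```

Even a respectable zero-order correlation for critical thinking can shrink to a negligible increment once a correlated general-ability measure is entered first, which is precisely why the control matters.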
Challenges in Designing Critical Thinking Assessment
Authenticity Versus Psychometric Quality
A major challenge in designing an assessment for critical thinking is to strike a balance between the assessment's authenticity and its psychometric quality. Most current assessments rely on multiple-choice items when measuring critical thinking. The advantages of such assessments lie in their objectivity, efficiency, high reliability, and low cost. Typically, within the same amount of testing time, multiple-choice items are able to provide more information about what the test takers know as compared to constructed-response items (Lee et al., 2011). Wainer and Thissen (1993) reported that the scoring of 10 constructed-response items costs about $30, while the cost of scoring enough multiple-choice items to achieve the same level of reliability was only 1¢. Although multiple-choice items cost less to score, they typically cost more in
Table 3 Validity Evidence (Author/year, Critical thinking assessment, Subjects, Sample size, Validity)

Author/year: Butler (2012)
Critical thinking assessment: HCTA
Subjects: Community college students; state university students; and community adults
Sample size: 131
Validity: Significant moderate correlation with the real-world outcomes of critical thinking inventory (r(131) = −.38), meaning those with higher critical thinking scores reported fewer negative life events

Author/year: Ejiogu et al. (2006)
Critical thinking assessment: WGCTA Short Form
Subjects: Analysts in a government agency
Sample size: 84
Validity: Significant moderate correlations corrected for criterion unreliability ranging from .32 to .52 with supervisory ratings of job performance behaviors; highest correlations were with analysis and problem solving (r(68) = .52), and with judgment and decision making (r(68) = .52)

Author/year: Ennis (2005)
Critical thinking assessment: Ennis-Weir Critical Thinking Essay Test
Subjects: Undergraduates in an educational psychology course (Taube, 1997)
Sample size: 198
Validity: Moderate correlation with WGCTA (r(187) = .37); low to moderate correlations with personality assessments ranging from .24 to .35; low to moderate correlations with SAT verbal (r(155) = .40), SAT quantitative (r(155) = .28), and GPA (r(171) = .28)

Subjects: Malay undergraduates with English as a second language (Moore, 1995)
Sample size: 60
Validity: Correlations with SAT verbal (pretest: r(60) = .34, posttest: r(60) = .59), TOEFL (pre: r(60) = .35, post: r(60) = .48), ACT (pre: r(60) = .25, post: r(60) = .66), TWE (pre: r(60) = −.56, post: r(60) = −.07), SPM (pre: r(60) = .41, post: r(60) = .35)

Subjects: 10th-, 11th-, and 12th-grade students (Norris, 1995)
Sample size: 172
Validity: Low to moderate correlations with WGCTA (r(172) = .28), CCTT (r(172) = .32), and Test on Appraising Observations (r(172) = .25)

Author/year: Gadzella et al. (2006)
Critical thinking assessment: WGCTA Short Form
Subjects: State university students (psychology, educational psychology, and special education undergraduate majors; graduate students)
Sample size: 586
Validity: Low to moderately high significant correlations with course grades ranging from .20 to .62 (r(565) = .30 for total group; r(56) = .62 for psychology majors)

Author/year: Giddens and Gloeckner (2005)
Critical thinking assessment: CCTST; CCTDI
Subjects: Baccalaureate nursing program in the southwestern United States
Sample size: 218
Validity: Students who passed the NCLEX had significantly higher total critical thinking scores on the CCTST entry test (t(101) = 2.5*, d = 1.0), CCTST exit test (t(191) = 3.0**, d = .81), and the CCTDI exit test (t(183) = 2.6**, d = .72) than students who failed the NCLEX

Author/year: Halpern (2006)
Critical thinking assessment: HCTA
Subjects: Study 1: Junior and senior students from high school and college in California
Sample size: 80 high school, 80 college
Validity: Moderate significant correlations with the Arlin Test of Formal Reasoning (r = .32) for both groups

Subjects: Study 2: Undergraduate and second-year master's students from California State University, San Bernardino
Sample size: 145 undergraduates, 32 master's
Validity: Moderate to moderately high correlations with the Need for Cognition scale (r = .32), GPA (r = .30), SAT Verbal (r = .58), SAT Math (r = .50), GRE Analytic (r = .59)

Author/year: Giancarlo et al. (2004)
Critical thinking assessment: CM3
Subjects: 9th- and 11th-grade public school students in northern California (validation study 2)
Sample size: 484
Validity: Statistically significant correlation ranges between four CM3 subscales (learning, creative problem solving, mental focus, and cognitive integrity) and measures of mastery goals (r(482) = .09 to .67), self-efficacy (r(482) = .22 to .47), SAT9 Math (r(379) = .18 to .33), SAT9 Reading (r(387) = .13 to .43), SAT9 Science (r(380) = .11 to .22), SAT9 Language/Writing (r(382) = .09 to .17), SAT9 Social Science (r(379) = .09 to .18), and GPA (r(468) = .19 to .35)

Subjects: 9th- to 12th-grade all-female college preparatory students in Missouri (validation study 3)
Sample size: 587
Validity: Statistically significant correlation ranges between four CM3 subscales (learning, creative problem solving, mental focus, and cognitive integrity) and PSAT Math (r(434) = .15 to .37), PSAT Verbal (r(434) = .20 to .31), PSAT Writing (r(291) = .21 to .33), PSAT selection index (r(434) = .23 to .40), and GPA (r(580) = .21 to .46)

Author/year: Hawkins (2012)
Critical thinking assessment: CCTST
Subjects: Students enrolled in undergraduate English courses at a small liberal arts college
Sample size: 117
Validity: Moderate significant correlations between total score and GPA (r = .45). Moderate significant subscale correlations with GPA ranged from .27 to .43
Table 3 Continued (Author/year, Critical thinking assessment, Subjects, Sample size, Validity)

Author/year: Liu and Roohr (2013)
Critical thinking assessment: EPP
Subjects: Community college students from 13 institutions
Sample size: 46,402
Validity: Students with higher GPA and students with more credit hours performed higher on the EPP as compared to students with low GPA and fewer credit hours. GPA was the strongest significant predictor of critical thinking (β = .21, η2 = .04)

Author/year: Watson and Glaser (2010)
Critical thinking assessment: WGCTA
Subjects: Undergraduate educational psychology students (Taube, 1997)
Sample size: 198
Validity: Moderate significant correlations with SAT Verbal (r(155) = .43), SAT Math (r(155) = .39), GPA (r(171) = .30), and Ennis-Weir (r(187) = .37). Low to moderate correlations with personality assessments ranging from .07 to .33

Subjects: Three semesters of freshman nursing students in eastern Pennsylvania (Behrens, 1996)
Sample size: 172
Validity: Moderately high significant correlations with fall semester GPA ranging from .51 to .59

Subjects: Education majors in an educational psychology course at a southwestern state university (Gadzella, Baloglu, & Stephens, 2002)
Sample size: 114
Validity: Significant correlation between total score and GPA (r = .28) and significant correlations between the five WGCTA subscales and GPA ranging from .02 to .34

Author/year: Williams et al. (2003)
Critical thinking assessment: CCTST; CCTDI
Subjects: First-year dental hygiene students from seven U.S. baccalaureate universities
Sample size: 207
Validity: Significant correlations between the CCTST and CCTDI at baseline (r = .41) and at second semester (r = .26). Significant correlations between CCTST and knowledge, faculty ratings, and clinical reasoning ranging from .24 to .37 at baseline, and from .23 to .31 at the second semester. For the CCTDI, significant correlations ranged from .15 to .19 at baseline with knowledge, faculty ratings, and clinical reasoning, and with faculty reasoning (r = .21) at second semester. The CCTDI was a more consistent predictor of student performance (4.9-12.3% variance explained) than traditional predictors such as age, GPA, and number of college hours (2.1-4.1% variance explained)

Author/year: Williams, Schmidt, Tilliss, Wilkins, and Glasnapp (2006)
Critical thinking assessment: CCTST; CCTDI
Subjects: First-year dental hygiene students from three U.S. baccalaureate dental hygiene programs
Sample size: 78
Validity: Significant correlation between CCTST and CCTDI (r = .29) at baseline. Significant correlations between CCTST and NBDHE Multiple-Choice (r = .35) and Case-Based tests (r = .47) at baseline and at program completion (r = .30 and .33, respectively). Significant correlations between CCTDI and NBDHE Case-Based at baseline (r = .25) and at program completion (r = .40). CCTST was a more consistent predictor of student performance on both NBDHE Multiple-Choice (10.5% variance explained) and NBDHE Case-Based scores (18.4% variance explained) than traditional predictors such as age, GPA, and number of college hours

Note. TWE = Test of Written English; SPM = Composite score for the national-level Malaysian Certificate of Education; NCLEX = National Council Licensure Examination; NBDHE = National Board Dental Hygiene Examination.
*p < .05. **p ≤ .01.
assessment development than constructed-response items. That being said, the overall cost structure of multiple-choice versus constructed-response items will depend on the number of scores that are derived from a given item over its lifecycle.
Studies also show high correlations between multiple-choice items and constructed-response items measuring the same constructs (Klein et al., 2009). Rodriguez (2003) investigated the construct equivalence of the two item formats through a meta-analysis of 63 studies and concluded that the two formats are highly correlated when measuring the same content: the mean correlation was around .95 with item stem equivalence and .92 without stem equivalence. The Klein et al. (2009) study compared the construct validity of three standardized assessments of college learning outcomes (i.e., EPP, CLA, and CAAP), including critical thinking. The school-level correlation between a multiple-choice and a constructed-response critical thinking test was .93.
Given that there may be situations where constructed-response items are more expensive to score and that multiple-choice items can measure the same constructs equally well in some cases, one might argue that it makes more sense to use all multiple-choice items and disregard constructed-response items; however, with constructed-response items, it is possible to create more authentic contexts and assess students' ability to generate rather than select responses. In real-life situations where critical thinking skills need to be exercised, there will not be choices provided. Instead, people will be expected to come up with their own options and determine which one is preferable based on the question at hand. Research has long established that the ability to recognize is different from the ability to generate (Frederiksen, 1984; Lane, 2004; Shepard, 2000). In the case of critical thinking, constructed-response items could be a better proxy for real-world scenarios than multiple-choice items.
We agree with researchers who call for multiple item formats in critical thinking assessments (e.g., Butler, 2012; Halpern, 2010; Ku, 2009). Constructed-response items alone will not meet psychometric standards because of their low internal consistency, one type of reliability. A combination of multiple item formats offers the potential for an authentic and psychometrically sound assessment.
Instructional Value Versus Standardization
Another challenge in designing a standardized critical thinking assessment for higher education is the need to pay attention to the assessment's instructional relevance. Faculty members are sometimes concerned about the limited relevance of general student learning outcomes assessment results, as these assessments tend to be created in isolation from curriculum and instruction. For example, although most institutions consider critical thinking a necessary skill for their students (AAC&U, 2011), not many offer courses to foster critical thinking specifically. Therefore, even if the assessment results show that students at a particular institution lack critical thinking skills, no specific department, program, or faculty would claim responsibility for it, which greatly limits the practical use of the assessment results. It is important to identify the common goals of general higher education and translate them into the design of the learning outcomes assessment. The VALUE rubrics created by AAC&U (Rhodes, 2010) are good examples of how a common framework can be created to align expectations about college students' critical thinking skills. While one should pay attention to the assessment's instructional relevance, one should also keep in mind that tension will always exist between instructional relevance and standardization. Standardized assessment can offer comparability and generalizability across institutions and across programs within an institution. An assessment designed to reflect closely the objectives and goals of a particular program will have great instructional relevance and will likely offer rich diagnostic information about the students in that program, but it may not serve as a meaningful measure of outcomes for students in other programs. When designing an assessment for critical thinking, it is essential to find the balance point at which the assessment results bear meaning for instructors and provide information to support comparisons across programs and institutions.
Institutional Versus Individual Use
Another concern is whether the assessment should be designed to provide results for institutional use or individual use, a decision that has implications for psychometric considerations such as reliability and validity. For an institutional-level assessment, the results only need to be reliable at the group level (e.g., major, department), while for an individual assessment, the results have to be reliable at the individual test-taker level. Typically, more items are required to achieve acceptable individual-level reliability than institution-level reliability. When assessment results are used only at an aggregate level, which is how they are currently used by most institutions, the validity of the test scores is in question because students
may not expend their maximum effort when answering the items. Student motivation when taking a low-stakes assessment has long been a source of concern. A recent study by Liu, Bridgeman, and Adler (2012) confirmed that motivation plays a significant role in affecting student performance on low-stakes learning outcomes assessments in higher education. Conclusions about students' learning gains in college could vary significantly depending on whether they are motivated to take the test. If possible, the assessment should be designed to provide reliable information about individual test takers, which allows test takers to benefit from the test (e.g., by obtaining a certificate of achievement). The increased stakes may help boost students' motivation while taking such assessments.
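The gap between group-level and individual-level reliability can be quantified with the Spearman-Brown prophecy formula, which projects reliability as a test is lengthened. A minimal sketch with hypothetical numbers:

```python
def spearman_brown(reliability, factor):
    """Projected reliability when a test is lengthened by `factor`:
        rel_new = f * rel / (1 + (f - 1) * rel)
    """
    return factor * reliability / (1 + (factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """Lengthening factor needed to raise `reliability` to `target`
    (Spearman-Brown solved for the factor)."""
    return target * (1 - reliability) / (reliability * (1 - target))


# Hypothetical: a short section with reliability .60 must be lengthened
# by a factor of about 2.7 to reach the ~.80 often expected when scores
# are reported for individual test takers.
factor = length_factor_needed(0.60, 0.80)
```

This is why a form that is perfectly adequate for aggregate institutional reporting may need substantially more items, and more testing time, before individual score reporting is defensible.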
General Versus Domain-Specific Assessment
Critical thinking has been defined as a generic skill in many of the existing frameworks and assessments (e.g., Bangert-Drowns & Bankert, 1990; Ennis, 2003; Facione, 1990b; Halpern, 1998). On one hand, many educators and philosophers believe that critical thinking is a set of skills and dispositions that can be applied across specific domains (Davies, 2013; Ennis, 1989; Moore, 2011). The generalists depict critical thinking as an enabling skill similar to reading and writing, and argue that it can be taught outside the context of a specific discipline. On the other hand, the specifists' view is that critical thinking is a domain-specific skill and that the type of critical thinking skills required for nursing would be very different from those practiced in engineering (Tucker, 1996). To date, much of the debate remains at the theoretical level, with little empirical evidence confirming either the generality or the specificity of critical thinking (Nicholas & Labig, 2013). One empirical study has yielded mixed findings. Powers and Enright (1987) surveyed 255 faculty members in six disciplinary domains to gain understanding of the kind of reasoning and analytical abilities required for successful performance at the graduate level. The authors found that some general skills, such as "reasoning or problem solving in situations in which all the needed information is not known," were valued by faculty in all domains (p. 670). Despite the consensus on some skills, faculty members across subject domains showed marked differences in their perceptions of the importance of other skills. For example, knowing the rules of formal logic was rated of high importance for computer science but not for other disciplines (p. 678).
Tuning USA is one of the efforts that considers critical thinking in a domain-specific context. Tuning USA is a faculty-driven process that aims to align goals and define competencies at each degree level (i.e., associate's, bachelor's, and master's) within a discipline (Institute for Evidence-Based Change, 2010). For Tuning USA, there are goals to foster critical thinking within certain disciplinary domains, such as engineering and history. For example, for engineering students who work on design, critical thinking suggests that they develop "an appreciation of the uncertainties involved, and the use of engineering judgment" (p. 97) and that they understand "consideration of risk assessment, societal and environmental impact, standards, codes, regulations, safety, security, sustainability, constructability, and operability at various stages of the design process" (p. 97).
In addition, there is insufficient empirical evidence showing that, as a generic skill, critical thinking is distinguishable from other general cognitive abilities measured by validated assessments such as the SAT and GRE tests (see Kuncel, 2011). Kuncel therefore argued that instead of being treated as a generic skill, critical thinking is more appropriately studied as a domain-specific construct. This view may be correct, or at least plausible, but there also needs to be empirical evidence demonstrating that critical thinking is a domain-specific skill. It is true that examples of critical thinking offered by members of the nursing profession may be very different from those cited by engineers, but content knowledge plays a significant role in this distinction. Would it be reasonable to assume that skillful critical thinkers can be successful when they transfer from one profession to another with sufficient content training? Whether and how content knowledge can be disentangled from higher order critical thinking skills, as well as from other cognitive and affective faculties, awaits further investigation.
Despite the debate over the nature of critical thinking, most existing critical thinking assessments treat this skill as generic. Apart from the theoretical reasons, it is much more costly and labor-intensive to design, develop, and score a critical thinking assessment for each major field of study. If assessments are designed only for popular domains with large numbers of students, students in less popular majors are deprived of the opportunity to demonstrate their critical thinking skills. From a score user's perspective, because of the interdisciplinary nature of many jobs in the 21st century workforce, many employers value generic skills that are transferable from one domain to another (AAC&U, 2011; Chronicle of Higher Education, 2012; Hart Research Associates, 2013), which makes an assessment of critical thinking in a particular domain less attractive.
Total Versus Subscale Scores
Another challenge related to critical thinking assessment is whether to offer subscale scores. Given the multidimensional
nature of the critical thinking construct, it is a natural tendency for assessment developers to consider subscale scores for
critical thinking. Subscale scores have the advantage of offering detailed information about test takers' performance on
each of the subscales and also have the potential to provide diagnostic information for teachers or instructors if the scores
are going to be used for formative purposes (Sinharay, Puhan, & Haberman, 2011). However, one should not lose sight of
the psychometric requirements when offering subscale scores. Evidence is needed to demonstrate that there is a real and
reliable distinction among the subscales. Previous research reveals that for some of the existing critical thinking assessments,
there is a lack of support for the factor structure on which subscale score reporting is based (e.g., CCTDI; Kakai,
2003; Walsh & Hardy, 1997; Walsh et al., 2007). Another psychometric requirement is that the subscale scores must be
reliable enough to be of real value to score users from sample to sample and time to time. Owing to limited testing time,
many existing assessments include only a small number of items in each subscale, which will likely affect the reliability of
the subscale score. For example, the CLA+'s performance tasks constitute one of the subscales of the CLA+ critical thinking
assessment. The performance tasks typically include a small number of constructed-response items, and the reported
reliability is only .43 for this subscale on one of the CLA+ forms (Zahner, 2013). Subscale scores with low levels of reliability
could provide misleading information for score users and threaten the validity of any decisions based on the subscores,
despite the good intention of providing more detail for stakeholders.
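The effect of subscale length on reliability can be illustrated with the Spearman-Brown prophecy formula, which predicts how reliability changes when a test is shortened to a fraction of its length. The sketch below is our own illustration; the reliability values and item counts are hypothetical and not drawn from any of the assessments discussed.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability of a test whose length is scaled by
    length_factor (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical example: a 40-item total score with reliability .90,
# from which a 5-item subscale is reported (length factor 5/40).
print(round(spearman_brown(0.90, 5 / 40), 2))  # 0.53
```

Under these hypothetical numbers, even a highly reliable total score yields a subscale reliability barely above .5, in the same range as the .43 reported for the CLA+ performance-task subscale.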
In addition to psychometric considerations, the choice to offer a total test score alone or with subscale scores also
depends on how the critical thinking scores will be used. For example, from a score user's perspective, such as that of an
employer, a holistic judgment of a candidate's critical thinking skills could be more valuable than the evaluation of several
discrete aspects of critical thinking, since, in real-life settings, critical thinking is typically exercised as an integrated skill
(e.g., evaluation, analysis, and argumentation) in problem solving or decision making. One future direction for research
could be to compare the predictive validity of discrete versus aggregated critical thinking scores in predicting life, work,
or academic success.
Human Versus Automated Scoring
As many researchers agree that multiple assessment formats are needed for critical thinking assessment, the use of
constructed-response items raises questions of scoring. High cost and rater subjectivity are frequent concerns for
human scoring of constructed-response items (Adams, Whitlow, Stover, & Johnson, 1996; Ku, 2009; Williamson, Xi,
& Breyer, 2012). Automated scoring could be a viable solution to these concerns. There are automated scoring tools
designed to score both short-answer questions (e.g., the c-rater scoring engine; Leacock & Chodorow, 2003; c-rater-ML)
and essay questions (e.g., the e-rater scoring engine; Bridgeman, Trapani, & Attali, 2012; Burstein, Chodorow, & Leacock,
2004; Burstein & Marcu, 2003). A key distinction is that for short-answer items, automated scoring evaluates the content of
the responses (e.g., accuracy of knowledge), while for essay questions it evaluates the writing quality of the responses (e.g.,
grammar, coherence, and argumentation). When the assessment results carry moderate to high stakes, it is important to
examine the accuracy of automated scores to make sure they achieve an acceptable level of agreement with valid human
scores. In many cases, automated scoring can be used as a substitute for the second human rater and can be compared
with the score from the first human rater. If the discrepancy between the human and machine scores exceeds what is
typically allowed between two human raters, additional human scoring is introduced for adjudication.
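The adjudication procedure described above can be sketched as follows. The one-point discrepancy threshold and the averaging rules are hypothetical policy choices for illustration, not rules prescribed by any particular scoring program.

```python
def needs_adjudication(human_score: float, machine_score: float,
                       max_discrepancy: float = 1.0) -> bool:
    """Flag a response for additional human scoring when the machine score
    differs from the first human rater's score by more than the allowed
    discrepancy (a hypothetical one-point threshold on the score scale)."""
    return abs(human_score - machine_score) > max_discrepancy


def final_score(human_score: float, machine_score: float,
                adjudicator=None, max_discrepancy: float = 1.0) -> float:
    """Average human and machine scores when they agree closely; otherwise
    call in an adjudicating human rater (must be provided in that case)
    and average the two human scores."""
    if needs_adjudication(human_score, machine_score, max_discrepancy):
        adjudicated = adjudicator()  # second human rater resolves the discrepancy
        return (human_score + adjudicated) / 2
    return (human_score + machine_score) / 2


print(final_score(4, 4.5))                        # scores agree closely: 4.25
print(final_score(4, 6, adjudicator=lambda: 5))   # discrepant: (4 + 5) / 2 = 4.5
```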
Faculty Involvement
In addition to summative uses such as accreditation, accountability, and benchmarking, an important formative use of
student learning outcomes scores could be to provide diagnostic information for faculty to improve instruction. In the
spring 2013 survey of the current state of student learning outcomes assessment in U.S. higher education by the National
Institute for Learning Outcomes Assessment (NILOA), close to 60% of the provosts from 1,202 higher education institutions
indicated that having more faculty members use the assessment results was their top priority (Kuh et al., 2014).
Standardized student learning outcomes assessments have long faced criticism that they lack instructional relevance. In
our view, that is not a problem with standardized assessments per se, but an inherent problem when two diametrically
different purposes or uses are imposed on a single assessment. When standardization is called for to summarize information
beyond content domains for hundreds or even thousands of students, it is less likely that the assessments can cater to
the unique instructional characteristics the students have been exposed to, making it difficult for the assessment results to
provide information that is specific and meaningful for each instructor. Creative strategies need to be employed to somehow
unify these summative and formative purposes. A possible strategy is to introduce a customization component to a
standardized assessment, allowing faculty, either by institution or by disciplinary domain, to be involved in the assessment
design, sampling, analysis, and score interpretation process. For any student learning outcomes assessment results to be
of instructional value, faculty should be closely involved in the development process and fully understand the outcome of
the assessment.
Part II: A Proposed Framework for Next-Generation Critical Thinking Assessment
Operational Definition of Critical Thinking
Based on a broad review of existing frameworks of critical thinking in higher education (e.g., LEAP and the Degree Qualifications
Profile [DQP]) and empirical research on critical thinking (e.g., Halpern, 2003, 2010; Ku, 2009), we propose
an operational definition for a next-generation critical thinking assessment (Table 4). This framework consists of five
dimensions, including two analytical dimensions (i.e., evaluating evidence and its use; analyzing arguments); two synthetic
dimensions, which assess students' abilities to understand implications and consequences and to produce their own
arguments; and one dimension relevant to all of the analytical and synthetic dimensions: understanding causation and
explanation.
We define each of the dimensions in Table 4, along with a brief description and foci for assessing each dimension. For
example, an important analytical dimension is evaluate evidence and its use. This dimension considers evidence in larger
contexts, appropriate use of experts and other sources, checking for bias, and evaluating how well the evidence provided
contributes to the conclusion for which it is proffered. This dimension (like the others in our framework) is aligned with
definitions and descriptions from several of the existing frameworks involving critical thinking, such as Lumina's DQP
and AAC&U's VALUE rubrics within the LEAP campaign, as well as assessments involving critical thinking such as the
Programme for International Student Assessment's (PISA) problem-solving framework.
Assessment Design for a Next-Generation Critical Thinking Construct
In the following section, we discuss structural features, task types, contexts, item formats, and accessibility considerations
for designing a next-generation critical thinking assessment.
Structural Features and Task Types
To measure the dimensions defined in our construct, it is important to consider item types with a variety of structural
features and a variety of task types, which provide elements of authenticity and engaging methods for test takers to interact
with material. These features go beyond the more standard multiple-choice, short-answer, and essay types (although these
types remain available for use). See Table 5 for some possible structural features that can be employed for a critical thinking
assessment. Because task types specifically address the foci of assessment, and structural features describe a variety of ways
the tasks could be presented for the best combination of authenticity and measurement efficiency, the possible task types
are provided separately in Table 6.
Contexts and Formats
Each task can be undertaken in a variety of contexts that are relevant to higher education. One major division of contexts
is between the qualitative and quantitative realms. Considerations of evidence and claims, implications, and argument
structure are equally relevant to both realms, even though the types of evidence and claims, as well as the format in which
they are presented, may differ. Within and across these realms are broad subject-matter contexts that are central to most
higher education programs, including (a) social science, (b) humanities, and (c) natural science. Assessments based on
this framework would include representation from all of these major areas, as well as of both qualitative and quantitative
Table 4 Critical Thinking Framework

Analytical dimensions

Evaluate evidence and its use
Description and rationale: Evidence provided in support of a position can be evaluated apart from the position advanced. In the foci of assessment, the factual basis for the evidence may be related to, but may also be evaluated independently of, evaluations of sources and/or biases.
Foci of assessment:
- Evaluate evidence in larger context: Consider the larger context, which may include general knowledge, additional background information provided, or additional evidence included within an argument.
- Evaluate relevance and expertise of sources: Consider the reliability of the source (person, organization, and document) of evidence included in an argument. In evaluating sources, students should be able to consider such factors as relevant expertise and access to information.
- Recognize possibilities of bias in evidence offered: Consider potential biases in persons or other sources providing or organizing data, including potential motivations a source may have for providing truthful or misleading information. A piece of evidence, though well founded, may yet be used inappropriately, to draw a conclusion that it does not support, or represented as providing more support than is warranted.
- Evaluate relevance of evidence and how well it supports the conclusion stated or implied in the argument: Evaluate the overall relevance of evidence for the conclusion. Evaluate the consistency of conclusions drawn or posited with the evidence presented. Evaluate the strength of evidence offered.

Analyze and evaluate arguments
Description and rationale: It can be difficult to evaluate an argument without an adequate grasp of its structure: What is assumed (implicitly or explicitly)? How does the author intend the premises to lead to the conclusion? Are there intermediate argument steps? Knowing the relationships among parts of an argument is helpful in finding its strong and weak points.
Foci of assessment:
- Analyze argument structure: Identify stated and unstated premises, conclusions, and intermediate steps. Understand the language of argumentation, recognizing linguistic cues.
- Evaluate argument structure: Distinguish valid from invalid arguments, including recognizing structural flaws that may be present in an invalid argument, such as holes in reasoning.

Synthetic dimensions

Understand implications and consequences
Description and rationale: The conclusion of an argument is not always explicitly stated. Furthermore, arguments and positions on issues can have consequences and implications that go beyond the original argument: If we accept some particular principle, what follows? What might be some possible results (intended or otherwise) of a recommended course of action?
Foci of assessment:
- Draw or recognize conclusions from evidence provided: When a conclusion is not explicitly stated in an argument or collection of evidence, draw or recognize deductive and supported conclusions.
- Extrapolate implications: Take the reasoning to the next step(s) to understand what further consequences are supported or deductively implied by an argument or collection of evidence.

Develop sound and valid arguments
Description and rationale: This dimension recognizes that students should be able not only to understand and evaluate arguments made by others, but also to develop their own arguments that are valid (based on good reasoning) and sound (valid and based on good evidence).
Foci of assessment:
- Develop valid arguments: Employ reasoning structures that properly link evidence with conclusions.
- Develop sound arguments: Select or provide appropriate evidence, as part of a valid argument.

Relevant to analytical and synthetic dimensions

Understand causation and explanation
Description and rationale: This dimension is applicable to and works with all of the analytical and synthetic dimensions, because it can involve considerations of evidence, implications, and argument structure, as well as either evaluation or argument production. Causes or explanations feature prominently in a wide range of critical thinking contexts.
Foci of assessment:
- Evaluate causal claims, including distinguishing causation from correlation, and considering possible alternative causes or explanations.
- Generate or evaluate explanations.
Table 5 Possible Assessment Structural Features

Mark material in text: This structure requires examinees to mark up a text according to instructions provided.
Select statements: From a group of statements provided, examinees select statements that individually or jointly play a particular role.
Create/fill out table: Examinees create or fill in a table according to directions given.
Produce a diagram: Based on material supplied, examinees produce or fill in a diagram that analyzes or evaluates that material.
Multistep selections: Examinees go through a series of steps involving making selections, the results of which then generate further selections to make.
Short constructed-response: Examinees must respond in their own words to a prompt based on text, graph, or other stimuli.
Essay: Based on material supplied, examinees write an essay evaluating an argument made for a particular conclusion or produce an argument of their own to support a position on an assigned topic.
Single- and multiple-selection multiple-choice: Examinees select one or more answer choices from those provided. They may be instructed to select a particular number of choices or to select all that apply. The number of choices offered may vary.
Table 6 Possible Task Types for Next-Generation Critical Thinking Assessment

Categorize information: Examinees categorize a set of statements drawn from or pertaining to a stimulus.
Identify features: Examinees identify one or more specified features in an argument or list of statements. Such features might include opinions, hypotheses, facts, supporting evidence, conclusions, emotional appeals, reasoning errors, and so forth.
Recognize evidence/conclusion relationships: Examinees match evidence statements with the conclusions they support or undermine.
Recognize inconsistency: From a list of statements, or an argument, examinees indicate two that are inconsistent with one another or one that is inconsistent with all of the others.
Revise argument: Examinees improve a provided argument according to provided directions.
Supply critical questions: Examinees provide or identify types of information that must be sought in order to evaluate an argument or claim (Godden & Walton, 2007).
Multistep argument evaluation or creation: To go beyond a surface understanding of relationships between evidence and conclusions (supporting, undermining, irrelevant), examinees proceed through a series of steps to evaluate an argument.
Detailed argument analysis: Examinees analyze the structure of an argument, indicating premises, intermediate and final conclusions, and the paths used to reach the conclusions.
Compare arguments: Two or more arguments for or against a claim are provided. Examinees compare or describe possible interactions between the arguments.
Draw conclusion/extrapolate information: Examinees draw inferences from information provided or extrapolate additional likely consequences.
Construct argument: Based on information provided, examinees construct an argument for or against a particular claim, or construct an argument for or against a provided claim, drawing on one's own knowledge and experience.
material appropriate to a given subject area. The need to include quantitative material and skills (e.g., understanding of
basic statistical topics such as sample size and representation) is borne out by literature indicating that quantitative literacy
is one of the skill domains for which college graduates report being least prepared (McKinsey & Company, 2013).
In addition to varying contexts, evidence, arguments, and claims, it is recommended that a critical thinking assessment
include material presented in a variety of formats, as it is important for higher education to equip students with the ability
to think critically about materials in various formats. Item formats can include graphs, charts, maps, images or figures,
audio, and/or video material as evidence for a claim, or may be entirely presented using audio and/or video. In addition,
a variety of textual or linguistic style formats may be used (e.g., letter to the editor, public address, and formal debate). In
these cases, it is important for assessment developers to be clear about the extent to which the use of a particular format is
intended primarily as an authentic method of conveying the evidence and/or argument, and when it is instead intended
to test students' ability to work with those specific formats. Using the language of evidence-centered design
(e.g., Hansen & Mislevy, 2008), this can be referred to as distinguishing cases where the ability to use a particular format is
focal to the intended construct (and thus is essential to the item) from those where it is nonfocal to the intended construct
(and thus the format can, as needed, be replaced with one that is more accessible). Items that require the use of certain
nonfocal abilities can pose an unnecessary accessibility challenge, as we discuss below.
Delivery Modes and Accessibility
Accessibility to individuals with disabilities is important to ensure that an assessment is valid for all test takers, as well
as to ensure fairness and inclusiveness. Based on data from the U.S. Department of Education and National Center for
Education Statistics (Snyder & Dillow, 2012, Table 242), in 2007–2008 about 11% of undergraduate students reported
having a disability. Accessibility for individuals with disabilities or those not fluent in the target language or culture must
be considered when determining whether and how to use the format elements described above in assessment design. In
cases where the item formats are introduced primarily for authenticity, as opposed to direct measurement of facility with
the format, alternate modes of presentation should be made available. With these considerations in mind, it is important
to design an assessment with a variety of delivery modes. For example, for a computer-based item requiring examinees
to categorize statements, most examinees could do so by using a drag-and-drop (or a click-to-select, click-to-place) interface.
Such interfaces are difficult, however, for individuals with disabilities that interfere with mouse use, such as visual
or motor impairments. Because these mouse-mediated methods of categorizing are only means to record responses, not
the construct being tested, examinees could alternatively fill in a screen reader-friendly table, use a screen-readable
drop-down menu, or type in their responses. Similarly, when examinees are asked to select statements in a passage, they might
click on them to highlight with a mouse, make selections from a screen reader-friendly drop-down list, or type out the
relevant statements. As each item and item type is developed, care must be taken to ensure that there will be convenient
and accessible methods for accessing the questions and stimulus material and for entering responses. That is, the assessment
should employ features that enhance authenticity and face validity for most test takers, but that do not undermine
accessibility, and hence validity, for test takers with disabilities who lack alternate methods of interacting with
the material.
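The separation between input method and recorded response can be sketched as follows; the mode names and raw-response formats here are our own illustrative assumptions, not features of any particular delivery platform.

```python
def normalize_response(raw, mode: str) -> set:
    """Map responses from different input modes to one canonical form
    (a set of selected statement labels), so that scoring never depends
    on the interaction method. Mode names are illustrative assumptions."""
    if mode == "drag_and_drop":   # raw: list of dropped item labels
        return set(raw)
    if mode == "dropdown":        # raw: list of menu selections
        return set(raw)
    if mode == "typed":           # raw: comma-separated text, e.g. "3, 4, 6"
        return {token.strip() for token in raw.split(",") if token.strip()}
    raise ValueError(f"unknown input mode: {mode}")

# All entry methods yield the same scorable response.
print(normalize_response(["3", "4"], "drag_and_drop") ==
      normalize_response("3, 4", "typed"))  # True
```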
Some of the considerations advanced above may be clarified by a sample item (Figure 1), fitting into one of the synthetic
dimensions: develop sound and valid arguments. This item requires the examinee to synthesize provided information to
create an argument for an assigned conclusion (that the temperature in the tropics was significantly higher 60 million
years ago than it is now). The task type (Table 6) is construct argument, and its structural feature (Table 5) is select
statements, which here involves typing the numbers of the selected statements into boxes. Other selection methods are
possible without changing the construct, such as clicking to highlight, dragging and dropping into a list of selections, and
typing or dictating the numbers matching the selected statements. Because the item is amenable to a variety of interaction
methods, it is fully accessible while breaking the bounds of a traditional multiple-choice item. Finally, it is in the natural
science context, making use of qualitative reasoning.
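Scoring such a select-statements task reduces to comparing the set of selected statement numbers against a key, independent of the interaction method used to enter them. In this sketch the key {3, 4, 6} is our own reading of the sample item in Figure 1, not an official answer key.

```python
def score_select_statements(selected: set, key: set) -> int:
    """Dichotomous scoring: credit only when the selected statements exactly
    match the keyed argument, independent of how the response was entered."""
    return 1 if selected == key else 0

# Assumed key: Titanoboa was much larger than living snakes (6), it was
# coldblooded and depended on ambient temperature (4), and large coldblooded
# animals need high ambient temperatures (3) -- together supporting the
# claim of a hotter ancient tropics.
KEY = {3, 4, 6}
print(score_select_statements({3, 4, 6}, KEY))  # 1
print(score_select_statements({2, 3, 4}, KEY))  # 0
```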
Potential Advantages of the Proposed Framework and Assessment Considerations
There are several features that distinguish the proposed framework and assessment from existing frameworks and assessments.
First, it intends to capture both the analytical and synthetic dimensions of critical thinking. The dimensions are
clearly defined, and the operational definitions are concrete enough to be translated into assessments. Some of the existing
assessments lump multiple constructs together and vaguely call them critical thinking and reasoning without clearly
defining what each component means. In our view, our framework and assessment specifications build on many existing
efforts and represent the critical step of transforming a framework into an effective assessment. Second, our considerations
for a proposed critical thinking assessment recommend employing multiple assessment formats, in addition to
traditional multiple-choice items and short-answer items. Innovative item types can enhance the measurement of a wide
range of critical thinking skills and are likely to help students engage in test taking.

Directions: Read the background information and then perform the task.
Background
Titanoboa cerrejonensis is a prehistoric snake that lived in the tropics about 60 million years ago.
Task: Identify three of the following statements that together constitute an argument in support of the
claim that the temperature in the tropics was significantly higher 60 million years ago than it is now.
1. As they are today, temperatures 60 million years ago were significantly higher in the tropics than
in temperate latitudes.
2. High levels of carbon dioxide in the atmosphere lead to high temperatures on Earth's surface.
3. Larger coldblooded animals require higher ambient temperatures to maintain a necessary
metabolic rate.
4. Like other coldblooded animals, Titanoboa depended on its surroundings to maintain its body
temperature.
5. Muscular activity would have led to a temporary increase in the body temperature of Titanoboa.
6. Titanoboa is several times larger than the largest snakes now in existence.
In the boxes below, type in the numbers that correspond to the statements you select.
Figure 1 A sample synthetic dimension item (i.e., develop sound and valid arguments). This item also shows the construct argument
task type, the select-statements structural feature, and the natural science context.

Third, the new framework and assess-
ment emphasize the critical balance between the authenticity of the assessment and its technical quality. The assessment
should include both real-world and higher level academic materials, as well as students' analyses or creation of extended
arguments. At the same time, rigorous analyses should be conducted to ensure that the assessment meets psychometric standards.
Finally, our considerations for assessment emphasize a commitment to providing access for test takers with disabilities,
including low-incidence sensory disabilities (e.g., blindness), which is unparalleled among existing assessments. Given
the substantial percentage of students with disabilities in undergraduate education, it is necessary to ensure that the
hundreds of thousands of students whose access would otherwise be denied have the opportunity to demonstrate their
critical thinking ability.
Conclusion
Designing a next-generation critical thinking assessment is a complicated effort that requires collaboration among
domain experts, assessment developers, measurement experts, institutions, and faculty members. Coordinated efforts are
required throughout the process of assessment development, including defining the construct, designing the assessment,
pilot testing and field testing to evaluate the psychometric quality of the assessment items and establish scales, setting
standards to determine proficiency levels, and researching validity. An assessment will also likely undergo iterations
for improved validity, reliability, and connections to general undergraduate education. With the proposed framework
for a next-generation critical thinking assessment, we hope to make the assessment approach more transparent to
stakeholders and alert assessment developers and score users to the many issues that influence the quality and practical
uses of critical thinking scores.
References
Adams, M. H., Whitlow, J. F., Stover, L. M., & Johnson, K. W. (1996). Critical thinking as an educational outcome: An evaluation of
current tools of measurement. Nurse Education, 21(3), 2332.
Adelman, C., Ewell, P., Gaston, P., & Schneider, C. G. (2014). Te Degree Qualifcations Profle 2.0: Defning U.S. degrees through demon-
stration and documentation of college learning. Indianapolis, IN: Lumina Foundation.
Association of American Colleges and Universities. (2011). Te LEAP vision for learning: Outcomes, practices, impact, and employers
view. Washington, DC: Author.
Bangert-Drowns, R. L., & Bankert, E. (1990, April). Meta-analysis of efects of explicit instruction for critical thinking. Paper presented
at the annual meeting of the American Educational Research Association, Boston, MA.
Behrens, P. J. (1996). Te WatsonGlaser Critical Tinking Appraisal and academic performance of diploma school students. Journal
of Nursing Education, 35, 3436.
Bernard, R., Zhang, D., Abrami, P., Sicoly, F., Borokhovski, E., & Surkes, M. (2008). Exploring the structure of the WatsonGlaser
Critical Tinking Appraisal: One scale or many subscales? Tinking Skills and Creativity, 3, 1522.
Binkley, M., Erstad, O., Herman, J., Raizen, S., Ripley, M., & Rumble, M. (2012). Defning 21st century skills. In P. Grifn, B.
McGaw, & E. Care (Eds.), Assessment and teaching of 21st century skills (pp. 1766). New York, NY: Springer Science and Business
Media B.V.
Bondy, K., Koenigseder, L., Ishee, J., & Williams, B. (2001). Psychometric properties of the California Critical Tinking Tests. Journal
of Nursing Measurement, 9, 309328.
Bridgeman, B., Trapani, C., & Attali, Y. (2012). Comparison of human and machine scoring of essays: Diferences by gender, ethnicity,
and country. Applied Measurement in Education, 25(1), 2740.
Burstein, J., Chodorow, M., & Leacock, C. (2004). Automated essay evaluation: Te Criterion Online Service. AI Magazine, 25(3),
2736.
Burstein, J., & Marcu, D. (2003). Automated evaluation of discourse structure in student essays. In M. D. Shermis & J. Burstein (Eds.),
Automated essay scoring: A cross-disciplinary perspective (pp. 209229). Mahwah, NJ: Routledge.
Butler, H. A. (2012). Halpern Critical Tinking Assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psy-
chology, 25(5), 721729.
CAAP Program Management. (2012). ACT CAAP technical handbook 20112012. Iowa City, IA: Author. Retrieved from http://www.
act.org/caap/pdf/CAAP-TechnicalHandbook.pdf
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge University Press.
CAS Board of Directors. (2008). Council for the advancement of standards: Learning and development outcomes. Retrieved from
http://standards.cas.edu/getpdf.cfm?PDF=D87A29DC-D1D6-D014-83AA8667902C480B
Casner-Lotto, J., & Barrington, L. (2006). Are they really ready to work? Employers' perspectives on the basic knowledge and applied skills of new entrants to the 21st century U.S. workforce. New York, NY: The Conference Board, Inc.
Chronicle of Higher Education. (2012). The role of higher education in career development: Employer perceptions [PowerPoint slides]. Retrieved from http://chronicle.com/items/biz/pdf/Employers%20Survey.pdf
Council for Aid to Education. (2013). CLA+ overview. Retrieved from http://cae.org/performance-assessment/category/cla-overview/
The Critical Thinking Co. (2014). Cornell Critical Thinking Test level Z. Retrieved from http://www.criticalthinking.com/cornell-critical-thinking-test-level-z.html
Davies, M. (2013). Critical thinking and the disciplines reconsidered. Higher Education Research and Development, 32(4), 529–544.
Educational Testing Service. (2010). ETS Proficiency Profile user's guide. Princeton, NJ: Author.
Educational Testing Service. (2013). Quantitative market research [PowerPoint slides]. Princeton, NJ: Author.
Ejiogu, K. C., Yang, Z., Trent, J., & Rose, M. (2006, May). Understanding the relationship between critical thinking and job performance.
Poster presented at the 21st annual conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Ennis, R. H. (1989). Critical thinking and subject specificity: Clarification and needed research. Educational Researcher, 18(3), 4–10.
Ennis, R. H. (2003). Critical thinking assessment. In D. Fasko (Ed.), Critical thinking and reasoning (pp. 293–310). Cresskill, NJ: Hampton Press.
Ennis, R. H. (2005). Supplement to the test/manual entitled the Ennis–Weir Critical Thinking Essay Test. Urbana: Department of Educational Policy Studies, University of Illinois at Urbana–Champaign.
Ennis, R. H., Millman, J., & Tomko, T. N. (1985). Cornell Critical Thinking Tests. Pacific Grove, CA: Midwest Publications.
Ennis, R. H., & Weir, E. (1985). The Ennis–Weir Critical Thinking Essay Test. Pacific Grove, CA: Midwest Publications.
Facione, P. A. (1990a). The California Critical Thinking Skills Test-college level. Technical report #2: Factors predictive of CT skills. Millbrae, CA: California Academic Press.
20 ETS Research Report No. RR-14-10. 2014 Educational Testing Service
O. L. Liu et al. Assessing Critical Thinking in Higher Education
Facione, P. A. (1990b). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. Research findings and recommendations. Millbrae, CA: California Academic Press.
Facione, P. A., & Facione, N. C. (1992). The California Critical Thinking Dispositions Inventory. Millbrae, CA: California Academic Press.
Facione, N. C., Facione, P. A., & Sanchez, C. A. (1994). Critical thinking disposition as a measure of competent clinical judgment: The development of the California Critical Thinking Disposition Inventory. Journal of Nursing Education, 33(8), 345–350.
Finley, A. P. (2012). How reliable are the VALUE rubrics? Peer Review: Emerging Trends and Key Debates in Undergraduate Education, 14(1), 31–33.
Frederiksen, N. (1984). The real test bias: Influence of testing on teaching and learning. American Psychologist, 39, 193–202.
Gadzella, B. M., Baloglu, M., & Stephens, R. (2002). Prediction of GPA with educational psychology grades and critical thinking scores. Education, 122(3), 618–623.
Gadzella, B. M., Hogan, L., Masten, W., Stacks, J., Stephens, R., & Zascavage, V. (2006). Reliability and validity of the Watson–Glaser Critical Thinking Appraisal forms for different academic groups. Journal of Instructional Psychology, 33(2), 141–143.
Giancarlo, C. A., Blohm, S. W., & Urdan, T. (2004). Assessing secondary students' disposition toward critical thinking: Development of the California Measure of Mental Motivation. Educational and Psychological Measurement, 64(2), 347–364.
Giddens, J., & Gloeckner, G. W. (2005). The relationship of critical thinking to performance on the NCLEX-RN. Journal of Nursing Education, 44, 85–89.
Godden, D. M., & Walton, D. (2007). Advances in the theory of argumentation schemes and critical questions. Informal Logic, 27(3), 267–292.
Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53, 449–455.
Halpern, D. F. (2003). Thought and knowledge: An introduction to critical thinking. Mahwah, NJ: Erlbaum.
Halpern, D. F. (2006). Is intelligence critical thinking? Why we need a new definition of intelligence. In P. C. Kyllonen, R. D. Roberts, & L. Stankov (Eds.), Extending intelligence: Enhancement and new constructs (pp. 349–370). New York, NY: Erlbaum.
Halpern, D. F. (2010). Halpern Critical Thinking Assessment manual. Vienna, Austria: Schuhfried GmbH.
Hansen, E. G., & Mislevy, R. J. (2008). Design patterns for improving accessibility for test takers with disabilities (Research Report No.
RR-08-49). Princeton, NJ: Educational Testing Service.
Hart Research Associates. (2013). It takes more than a major: Employer priorities for college learning and student success. Washington, DC: Author. Retrieved from http://www.aacu.org/leap/documents/2013_EmployerSurvey.pdf
Hawkins, K. T. (2012). Thinking and reading among college undergraduates: An examination of the relationship between critical thinking skills and voluntary reading (Doctoral dissertation). University of Tennessee, Knoxville. Retrieved from http://trace.tennessee.edu/utk_graddiss/1302
Insight Assessment. (2013). California Measure of Mental Motivation level III. Retrieved from http://www.insightassessment.com/Products/Products-Summary/Critical-Thinking-Attributes-Tests/California-Measure-of-Mental-Motivation-Level-III
Institute for Evidence-Based Change. (2010). Tuning educational structures: A guide to the process. Version 1.0. Encinitas, CA: Author. Retrieved from http://tuningusa.org/TuningUSA/tuningusa.publicwebsite/b7/b70c4e0d-30d5-4d0d-ba75-e29c52c11815.pdf
Jacobs, S. S. (1999). The equivalence of forms A and B of the California Critical Thinking Skills Test. Measurement and Evaluation in Counseling and Development, 31(4), 211–222.
Kakai, H. (2003). Re-examining the factor structure of the California Critical Thinking Disposition Inventory. Perceptual and Motor Skills, 96, 435–438.
Klein, S., Liu, O. L., Sconing, J., Bolus, R., Bridgeman, B., Kugelmass, H., ... Steedle, J. (2009). Test validity study (TVS) report. New York, NY: Collegiate Learning Assessment.
Ku, K. Y. L. (2009). Assessing students' critical thinking performance: Urging for measurements using multi-response format. Thinking Skills and Creativity, 4, 70–76.
Kuh, G. D., Jankowski, N., Ikenberry, S. O., & Kinzie, J. (2014). Knowing what students know and can do: The current state of student learning outcomes assessment in U.S. colleges and universities. Champaign, IL: National Institute for Learning Outcomes Assessment.
Kuncel, N. R. (2011, January). Measurement and meaning of critical thinking. Report presented at the National Research Council's 21st Century Skills Workshop, Irvine, CA.
Lane, S. (2004). Validity of high-stakes assessment: Are students engaged in complex thinking? Educational Measurement: Issues and Practice, 23(3), 6–14.
Leacock, C., & Chodorow, M. (2003). C-rater: Automated scoring of short-answer questions. Computers and the Humanities, 37(4), 389–405.
Lee, H.-S., Liu, O. L., & Linn, M. C. (2011). Validating measurement of knowledge integration in science using multiple-choice and explanation items. Applied Measurement in Education, 24(2), 115–136.
Leppa, C. J. (1997). Standardized measures of critical thinking: Experience with the California Critical Thinking Tests. Nurse Education, 22, 29–33.
Liu, O. L. (2008). Measuring learning outcomes in higher education using the measure of academic proficiency and progress (MAPP) (Research Report No. RR-08-47). Princeton, NJ: Educational Testing Service.
Liu, O. L., Bridgeman, B., & Adler, R. M. (2012). Measuring learning outcomes in higher education: Motivation matters. Educational Researcher, 41(9), 352–362.
Liu, O. L., & Roohr, K. C. (2013). Investigating 10-year trends of learning outcomes at community colleges (Research Report No. RR-13-34). Princeton, NJ: Educational Testing Service.
Loo, R., & Thorpe, K. (1999). A psychometric investigation of scores on the Watson–Glaser Critical Thinking Appraisal new forms. Educational and Psychological Measurement, 59, 995–1003.
Markle, R., Brenneman, M., Jackson, T., Burrus, J., & Robbins, S. (2013). Synthesizing frameworks of higher education student learning
outcomes (Research Report No. RR-13-22). Princeton, NJ: Educational Testing Service.
McKinsey & Company. (2013). Voice of the graduate. Philadelphia, PA: Author. Retrieved from http://mckinseyonsociety.com/
downloads/reports/Education/UXC001%20Voice%20of%20the%20Graduate%20v7.pdf
Ministry of Science, Technology and Innovation. (2005). A framework for qualifications of the European higher education area: Bologna working group on qualifications frameworks. Copenhagen, Denmark: Author.
Moore, R. A. (1995). The relationship between critical thinking, global English language proficiency, writing, and academic development for 60 Malaysian second language learners (Unpublished doctoral dissertation). Indiana University, Bloomington.
Moore, T. J. (2011). Critical thinking and disciplinary thinking: A continuing debate. Higher Education Research and Development, 30(3), 261–274.
Nicholas, M. C., & Labig, C. E. (2013). Faculty approaches to assessing critical thinking in the humanities and the natural and social sciences: Implications for general education. The Journal of General Education, 62(4), 297–319.
Norris, S. P. (1995). Format effects on critical thinking test performance. The Alberta Journal of Educational Research, 41(4), 378–406.
OECD. (2012). Education at a glance 2012: OECD indicators. Paris, France: OECD Publishing. Retrieved from http://www.oecd.org/edu/EAG%202012_e-book_EN_200912.pdf
Powers, D. E., & Dwyer, C. A. (2003). Toward specifying a construct of reasoning (Research Memorandum No. RM-03-01). Princeton,
NJ: Educational Testing Service.
Powers, D. E., & Enright, M. K. (1987). Analytical reasoning skills in graduate study: Perception of faculty in six fields. Journal of Higher Education, 58(6), 658–682.
Quality Assurance Agency. (2008). The framework for higher education qualifications in England, Wales and Northern Ireland: August 2008. Mansfield, England: Author.
Rhodes, T. L. (Ed.) (2010). Assessing outcomes and improving achievement: Tips and tools for using rubrics. Washington, DC: Association
of American Colleges and Universities.
Rodriguez, M. C. (2003). Construct equivalence of multiple-choice and constructed-response items: A random effects synthesis of correlations. Journal of Educational Measurement, 40(2), 163–184.
Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.
Sinharay, S., Puhan, G., & Haberman, S. J. (2011). An NCME instructional module on subscores. Educational Measurement: Issues and Practice, 30(3), 29–40.
Snyder, T. D., & Dillow, S. A. (2012). Digest of education statistics 2011 (NCES 2012-001). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics. Retrieved from http://nces.ed.gov/pubs2012/2012001.pdf
Stanovich, K. E., & West, R. F. (2008). On the relative independence of thinking biases and cognitive ability. Journal of Personality and Social Psychology, 94(4), 672–695.
Taube, K. T. (1997). Critical thinking ability and disposition as factors of performance on a written critical thinking test. The Journal of General Education, 46(2), 129–164.
Tucker, R. W. (1996). Less than critical thinking. Assessment and Accountability Forum, 6(3/4), 1–6.
U.S. Department of Labor. (2013). Competency model clearinghouse: Critical and analytical thinking. Retrieved from http://www.
careeronestop.org/competencymodel/blockModel.aspx?tier_id=2&block_id=12
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103–118.
Walsh, C. M., & Hardy, R. C. (1997). Factor structure stability of the California Critical Thinking Disposition Inventory across sex and various students' majors. Perceptual and Motor Skills, 85, 1211–1228.
Walsh, C. M., Seldomridge, L. A., & Badros, K. K. (2007). California Critical Thinking Disposition Inventory: Further factor analytic examination. Perceptual and Motor Skills, 104, 141–151.
Walton, D. N. (1996). Argumentation schemes for presumptive reasoning. Mahwah, NJ: Erlbaum.
Walton, D., Reed, C., & Macagno, F. (2008). Argumentation schemes. Cambridge, England: Cambridge University Press.
Watson, G., & Glaser, E. M. (1980). Watson–Glaser Critical Thinking Appraisal, forms A and B manual. San Antonio, TX: The Psychological Corporation.
Watson, G., & Glaser, E. M. (2008a). Watson–Glaser Critical Thinking Appraisal, forms A and B manual. Upper Saddle River, NJ: Pearson Education.
Watson, G., & Glaser, E. M. (2008b). Watson–Glaser Critical Thinking Appraisal short form manual. Upper Saddle River, NJ: Pearson Education.
Watson, G., & Glaser, E. M. (2010). Watson–Glaser II Critical Thinking Appraisal: Technical manual and user's guide. San Antonio, TX: NCS Pearson.
Williams, K. B., Glasnapp, D., Tilliss, T., Osborn, J., Wilkins, K., Mitchell, S., ... Schmidt, C. (2003). Predictive validity of critical thinking skills for initial clinical dental hygiene performance. Journal of Dental Education, 67(11), 1180–1192.
Williams, K. B., Schmidt, C., Tilliss, T. S. I., Wilkins, K., & Glasnapp, D. R. (2006). Predictive validity of critical thinking skills and dispositions for the National Board Dental Hygiene Examination: A preliminary investigation. Journal of Dental Education, 70(5), 536–544.
Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13.
Zahner, D. (2013). Reliability and validity of CLA+. New York, NY: Council for Aid to Education. Retrieved from http://cae.org/images/uploads/pdf/Reliability_and_Validity_of_CLA_Plus.pdf
Action Editor: Donald Powers
Reviewers: Douglas Baldwin and Paul Deane
E-RATER, ETS, the ETS logo, GRE, LISTENING. LEARNING. LEADING., TOEFL, and TWE are registered trademarks of Educational
Testing Service (ETS). C-RATER is a trademark of ETS. SAT is a registered trademark of the College Board. All other trademarks
are property of their respective owners.
Find other ETS-published reports by searching the ETS ReSEARCHER database at http://search.ets.org/researcher/