
Clinical Child and Family Psychology Review (CCFP) PP083-04-297983 February 27, 2001 16:45 Style file version Nov. 07, 2000

Clinical Child and Family Psychology Review, Vol. 4, No. 1, 2001

The Child and Adolescent Functional Assessment Scale (CAFAS): Review and Current Status1

Michael P. Bates2

Measures of impairment in psychological and behavioral functioning have a long history in the field of children’s mental health, and appear particularly useful in eligibility determination, treatment planning, and outcome evaluation of services for children and adolescents with serious emotional disturbance (SED). One recently developed multidimensional measure of functional impairment, the Child and Adolescent Functional Assessment Scale (CAFAS; K. Hodges, 1989, 1997), has enjoyed widespread use nationwide. It has been adopted as a tool for making treatment eligibility decisions and documenting outcomes on a statewide level in more than 20 states and on a local level in dozens of research and demonstration projects. In this paper, the technical merits of the CAFAS are closely examined, with the conclusion that empirical evidence is lacking to support its valid use in making the types of treatment decisions for which it is currently being employed across the nation. Furthermore, there appears to be little concern among mental health researchers, practitioners, administrators, and state legislators about these apparent limitations of the CAFAS. The potential benefits of establishing objective and valid level-of-need criteria using the CAFAS are numerous, and the interest in doing so is clear; however, the psychometric limitations of the scale identified in this review need to be addressed before its full potential can be realized.
KEY WORDS: functional impairment; measurement.

Measures of impairment in psychological and behavioral functioning have a long history in the field of children’s mental health. These level-of-functioning (LOF) scales have many promising features such as cost and time effectiveness, clinical utility, and understandability to a wide audience. They also appear to be particularly promising for use with children and adolescents with serious emotional disturbance (SED).3 Whereas the earliest measures of functional impairment were hailed for providing simple scores along a single dimension of global functioning, they have also been criticized for containing vague descriptors and being susceptible to rater bias. Recently, multidimensional measures of functional impairment have been developed that, presumably, have greater resistance to rater bias. One of these multidimensional measures, the Child and Adolescent Functional Assessment Scale (CAFAS; Hodges, 1989, 1997), has enjoyed widespread use nationwide. For example, it has been adopted on a statewide level in more than 20 states and on a local level in dozens of research and demonstration projects. In fact, several states are using it as the sole determinant of placement and funding decisions for children’s behavioral health services. Given this widespread adoption of the CAFAS, it is timely and necessary to review this scale within the context of LOF assessment. The purpose of this paper is to describe the current status of the CAFAS in the field of children’s mental health services, to critically examine its technical qualities, and to propose future research activities that may enhance its validity.

1 This article was adapted from portions of the author’s doctoral dissertation.
2 Counseling, Clinical, School Psychology Program, Graduate School of Education, University of California, Santa Barbara, California, 93106-9490; e-mail: mbates@education.ucsb.edu.
3 The term serious emotional disturbance (SED) was replaced with the term emotional disturbance (ED) in the 1997 reauthorization of the Individuals with Disabilities Education Act (IDEA). Many scholars have also used the less stigmatizing term emotional and behavioral disorders (EBD) to refer to this population. SED is used in this paper because this is the term used in both the IDEA and Center for Mental Health Services (CMHS) definitions.

1096-4037/01/0300-0063$19.50/0 © 2001 Plenum Publishing Corporation

BACKGROUND

Measures of functional impairment have numerous uses in the diagnosis, treatment, and evaluation of children’s mental health problems. For example, the definitions of serious emotional disturbance (SED) issued by the Center for Mental Health Services (CMHS, 1999) and contained within the Individuals with Disabilities Education Act (IDEA, 1990) both cite functional impairment as a critical component of SED. The CMHS, which has funded more than 40 nationwide sites for developing systems of care for comprehensive services for youths with SED, recently issued the following definition of children with SED:

. . . persons from birth up to age 18 who currently or at any time during the past year have had a diagnosable mental, behavioral, or emotional disorder of sufficient duration to meet diagnostic criteria specified within DSM-III-R (or the most recent edition of DSM) that resulted in functional impairment which substantially interferes with or limits the child’s role or functioning in family, school, or community activities. (Federal Register, 1993, p. 29425)

Similarly, the IDEA definition of SED requires that certain characteristics exist “over a long period of time and to a marked degree that adversely affects a child’s educational performance” [34 CFR 300.5(b)(8)]. Under both of these definitions, the assessment of functional impairment is a required component.

The construct of global functioning has also become an important component of determining eligibility to receive mental health services. For example, CMHS administers a block grant program to allocate funds to community mental health agencies for the provision of services to youths with SED. As part of the application for this process, states must estimate the incidence (number of new cases) and prevalence (total number of cases per year) of SED, using, in part, LOF measures (Federal Register, 1993). Similarly, many states have adopted a managed care perspective and seek Medicaid reimbursement for mental health services for children and adolescents with mental health problems. Under this system, eligibility for Medicaid-funded services is contingent upon demonstration that the youth exhibits some level of functional impairment (Anderson, Berlant, Mauch, & Maloney, 1996; Srebnik, Uehara, & Smukler, 1998). This requirement represents a significant change from traditional reimbursement models, in which a diagnostic classification, such as a diagnosis from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (American Psychiatric Association, 1994), was sufficient to establish eligibility for any available services (Hodges & Gust, 1995; Pokorny, 1991).

In addition to these uses for LOF scales, much has been written about their utility in outcome assessment (e.g., Lambert, 1994; Newman, 1980; Pokorny, 1991). Historically, LOF measures have been widely used in outcome assessment. For example, Lambert and McRoberts (1993) found that LOF indicators comprised 53% of the therapist-completed outcome measures used in psychotherapy treatment studies published in the Journal of Consulting and Clinical Psychology between 1986 and 1991. One reason that LOF measures appear to be especially useful in outcome evaluation is that they provide a standard means of comparing clients across diagnoses or settings or both (Burlingame, Lambert, Reisinger, Neff, & Mosier, 1995). Burlingame et al. (1995, p. 228) summarized this point nicely:

Risk assessment establishes the pretreatment degree of severity of the patient to the level playing field when comparing outcomes from different providers, clinics, or patient groups. Outcome assessment procedures used in risk assessment should ensure that one is comparing apples with apples when it comes to initial severity of patients’ disorders. If initial patient severity is not accounted for, then one health care institution may erroneously appear to exhibit poorer outcomes due solely to treating more or less symptomatically severe cases. Reliable risk assessment is even more important in mental health outcomes where improvement is measured in shades of gray in contrast to the black-and-white comparisons often possible in other areas of the health care industry.

This aspect of LOF measures appears particularly useful for studies of treatment effectiveness for youths with SED because this classification is a heterogeneous and complex diagnostic category encompassing a variety of emotional and behavioral problems. This definitional complexity creates a problem for researchers who wish to study this population. Especially under the system of care model that typically encourages broader service eligibility, two individuals with SED may exhibit vastly different symptoms or behaviors and may require different interventions. Thus, LOF assessment may be the best tool to provide a common metric by which to compare these youths.
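The point made by Burlingame et al. about accounting for initial severity can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the provider names, severity bands, and scores are invented, higher scores are taken to mean greater impairment, and stratifying by intake severity is only one simple way of comparing like with like, not a procedure described by the authors cited above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (provider, intake severity band, intake score, discharge score).
# Higher scores mean greater impairment; all values are invented for illustration.
RECORDS = [
    ("Clinic A", "mild", 40, 20),
    ("Clinic A", "mild", 50, 30),
    ("Clinic A", "severe", 120, 90),
    ("Clinic B", "mild", 40, 20),
    ("Clinic B", "severe", 130, 100),
    ("Clinic B", "severe", 110, 80),
]


def mean_discharge_by_provider(rows):
    """Unadjusted comparison: mean discharge score for each provider."""
    scores = defaultdict(list)
    for provider, _band, _intake, discharge in rows:
        scores[provider].append(discharge)
    return {provider: mean(values) for provider, values in scores.items()}


def mean_improvement_by_stratum(rows):
    """Severity-stratified comparison: mean drop in score per (provider, band)."""
    gains = defaultdict(list)
    for provider, band, intake, discharge in rows:
        gains[(provider, band)].append(intake - discharge)
    return {key: mean(values) for key, values in gains.items()}


unadjusted = mean_discharge_by_provider(RECORDS)
stratified = mean_improvement_by_stratum(RECORDS)
```

In this invented data set, Clinic B has the higher (worse) mean discharge score only because it treats more severe cases; within each severity band the two clinics show the same mean improvement.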

Sechrest and colleagues (Sechrest, McKnight, & McKnight, 1996) extended this argument. They strongly recommended that treatment outcome scales be calibrated, not only to provide standardized normative scores, but to develop a standard measure by which to assess meaningful change. They advocated the use of procedures to associate changes in a scale’s scores with actual change in behavior or functioning. For example, they argued that “actual change in behavior or functioning is critical for assessing treatment outcome, rather than simply inferring change from a metric of uncertain meaning” (p. 1065). A decrease of 10 scale units on a depression scale, for example, might represent a decrease of some degree in the intensity or severity of specific symptoms. Yet, how much of a change in behavior actually occurred and what impact might such changes have on meaningful indicators such as functional status or quality of life? According to these authors, it is critical to establish change in functioning as the meaningful criterion of the effectiveness of psychotherapeutic intervention. From this perspective, LOF measures should play a key role in the calibration of other psychological measures and documentation of “real-life” changes in social, emotional, and behavioral status.

The uses of LOF measures for purposes of assessment, eligibility determination, and outcome evaluation in children’s mental health services are numerous. Furthermore, there is a clear interest in collecting LOF data on both the state and local levels [Georgetown University National Technical Assistance Center (GUNTAC), 2000]. Perhaps the most widely used LOF scale, at least on the statewide level, is the CAFAS. The following section presents a brief survey of current CAFAS usage on both the state and local levels.

EXTENT OF USE OF CAFAS

Statewide Implementation

Many states have adopted policies or passed legislation mandating the use of the CAFAS on a statewide basis. It appears that this trend has increased over the past few years. For example, in a late-1993 survey of state usage of LOF measures, Hodges and Gust (1995) found that four states (Arizona, New Hampshire, North Carolina, and Wisconsin) were consistently using the CAFAS statewide. As of July 2000, there were 30 states that had implemented or were considering implementing the CAFAS statewide (GUNTAC, 2000; Hodges, Wong, & Latessa, 1998). Table I lists these 30 states and describes how they are using the scale.

The primary uses of the CAFAS, at least on the statewide level, appear to be for performance outcome assessment and service eligibility determination. For example, in August 1995, the state of Florida began using the CAFAS as part of its state-legislated mandate to collect performance outcome data for all children receiving mental health services (Massey et al., 1998). In Virginia, the CAFAS was selected as one component of a statewide performance and outcome system (POMS) to assess outcomes within the public mental health system (Koch & Brunk, 1998). Beginning April 1998, in California, the CAFAS is completed for every youth receiving mental health services through every county mental health department (G. M. Pettigrew, personal communication, July 21, 1997). As shown in Table I, numerous other states, including Delaware, Georgia, Maine, North Dakota, South Carolina, and Tennessee, are currently using the CAFAS for performance assessment purposes. Additionally, Illinois and Kentucky are currently considering implementing the scale statewide (J. Call, personal communication, January 28, 2000; GUNTAC, 2000). Ohio recently switched from the CAFAS to the Ohio Youth Scales (Ogles, Melendez, Davis, & Lunnen, 1999) for its performance assessment (M. Wood, personal communication, February 9, 2000).

Many states have implemented systematic collection of the CAFAS to determine eligibility for services. For example, the North Carolina Department of Mental Health is currently using the CAFAS to establish service eligibility for youths with mental health needs (Behar & Stelle, 1997; S. Clark, personal communication, November 19, 1997). The CAFAS is also being used statewide in Virginia to determine levels of care to manage services funded by the recent Comprehensive Services Act (Kirkman et al., 1999). Louisiana and Massachusetts are also using the CAFAS to determine level of need for Medicaid-funded services (Hersch, 1998; Lemoine & McDermott, 1998). Michigan is in the process of developing empirical service eligibility guidelines using the CAFAS and other data (i.e., risk factors, clinical condition) to predict type and intensity of services (Hodges, Warren, & Wotring, 1998). North Dakota uses the CAFAS and diagnostic criteria to assign youths to one of four eligibility categories (J. Perry, personal communication, January 31, 2000). In addition, both Georgia and South

Table I. Summary of Statewide Implementation of the CAFAS

| State | Purpose of CAFAS use | Approx. date of implementation | Source(s) |
|---|---|---|---|
| AL | Using CAFAS along with a battery of other measures (CBCL, YSR, and Parent Questionnaire) for outcome evaluation on a statewide basis. | At least since 1999 | Georgetown University National Technical Assistance Center (GUNTAC), 2000 |
| AZ | Cutoff total score of 90 on CAFAS qualifies youth for Intensive Case Management Services funded by the Division of Behavioral Health Services of the Arizona Department of Health Services (considering revising criteria to include diagnostic information). | At least since October, 1993 | Hodges & Gust, 1995; Schwartz & Perkins, 1997 |
| CA | Component of state-mandated performance outcome assessment for all youths receiving Department of Mental Health services for 2 months or longer. | April 1, 1998 | G. M. Pettigrew, personal communication, July 21, 1997; GUNTAC, 2000 |
| DE | Clinical service management teams using CAFAS for treatment planning and outcome evaluation with all youths receiving Medicaid or state-funded services. | At least since 1999 | R. Ray, personal communication, January 31, 2000 |
| FL | Component of state-legislated collection of performance outcome data for all children receiving services funded by the Department of Children and Families. | August, 1995 | Massey, Kershaw, Armstrong, Shepard, & Wu, 1998 |
| GA | All providers will be mandated to collect CAFAS as component of the Performance Measurement & Evaluation System (PERMES). Will become sole criterion for determining eligibility and level-of-need. | March 1, 2000 | GUNTAC, 2000; S. Lindsey, personal communication, January 28, 2000 |
| IL | Piloting the CAFAS as part of a study on the feasibility of implementing MHSIP Consumer Oriented Report Card. | At least since 1999 | GUNTAC, 2000 |
| IN | Using Miniscale version (with two added subscales: Environment and Reliance) for performance assessment. | At least since 1997 | J. Phillips, personal communication, January 28, 2000 |
| KY | Currently used in some programs. Recommended for use by KY Managed Care Outcomes Committee. May be integrated with statewide evaluation protocol. | July, 1999 | GUNTAC, 2000 |
| LA | Sole criterion to establish level-of-need (LON) to receive one of 3 Medicaid-funded service packages (high, medium, and low). | December, 1995 | Lemoine, Speier, Ellzey, & Pine, 1997; Lemoine & McDermott, 1998 |
| ME | In process of implementing CAFAS along with other measures (CALOCUS, BERS) for performance assessment, service planning, and outcome evaluation for youths receiving Mental Health case management services. | At least since 1999 | S. Amero, personal communication, February 1, 2000 |
| MD | Piloting CAFAS via phone interviews with a sample of total youths served as evaluation of first year of managed care reform. | At least since 1998 | GUNTAC, 2000 |
| MA | Cut-off score of 80 using six of eight subscales, in conjunction with diagnosable disorder of 1-year duration, to determine eligibility for services funded by Department of Mental Health. | July 1, 1996 | Irvin & Hersch, 1997; Hersch, 1998 |
| MI | Presently developing guidelines to predict type and intensity of services from CAFAS scores and diagnostic/risk information. | No information given | Hodges, et al., 1998 |
| MN | Statewide CAFAS use is encouraged but not mandated as component of measuring client and family outcomes. | At least since 1999 | GUNTAC, 2000 |
| MO | Component of preliminary study to assess outcomes for children and adolescents receiving public mental health services funded by the Department of Mental Health. | October 1, 1995 | Daniels & Clements, 1997 |
| NE | Collected at intake, every 6 months, and at discharge while in Professional Partner Program. | At least since 1999 | GUNTAC, 2000 |
| NH | Using Miniscale Version (see IN) and diagnostic information to determine eligibility for services. Planning to implement full version of the scale beginning July 2000. | At least since October, 1993 | GUNTAC, 2000; J. Perry, personal communication, January 31, 2000 |
| NJ | Piloting the CAFAS in Southern Region with the long-term goal to use statewide. | Summer, 2000 | GUNTAC, 2000 |
| NY | Administered with other battery instruments at intake and every 6 months in the F.R.I.E.N.D.S. program. | At least since 1999 | GUNTAC, 2000 |
| NC | Primary criterion to authorize levels of care related to six levels of intensity of services for children with mental health and/or substance use problems. | January, 1994 (statewide by 1997) | Behar & Stelle, 1997; S. Clark, personal communication, November 19, 1997 |
| ND | Expanding use of CAFAS from 3 to all 8 state regions for outcome assessment and treatment planning. | At least since 1999 | K. Moum, personal communication, January 28, 2000 |
| OH | Component of pilot study during 1998–99. Switched to Ohio Youth Scales in 2000. | 1998 | GUNTAC, 2000 |
| OR | Using the CAFAS statewide along with the CGAS for outcome evaluation. | At least since 1999 | GUNTAC, 2000 |
| SC | Currently mandated for use in treatment planning and outcome evaluation in inpatient and outpatient child and adolescent programs. Also in process of developing criterion scores for eligibility determination. | At least since 1999 | D. Mahrer, personal communication, February 1, 2000 |
| SD | CAFAS is principal instrument used across inpatient and outpatient settings statewide. | At least since 1999 | GUNTAC, 2000 |
| TN | Component of Children’s Plan Outcome Review Team (C-PORT) used in evaluation of service system for all children in state custody. | 1994 | Heflinger & Simpkins, 1997; O’Neal & Wade, 1998 |
| VT | Component of evaluation battery designed by University of VT Evaluation Team to create linkages across multiple state grants. | At least since 2000 | GUNTAC, 2000 |
| VA | Component of performance and outcome measurement system (POMS) being piloted statewide to assess outcomes of child/adolescent public mental health services, and used to determine Level of Care for services funded by the Comprehensive Services Act. | Summer, 1997 | Koch & Brunk, 1998; Kirkman, Brunk, & Cohen, 1999 |
| WV | Component of assessment battery required for all children receiving Medicaid-reimbursed behavioral health services. | At least since 1999 | GUNTAC, 2000 |

Note. Information regarding current usage of evaluation instruments in children’s mental health services for each state can be found at the Georgetown University National Technical Assistance Center (GUNTAC) website (http://www.dml.georgetown.edu/depts/pediatrics/gucdc/eval.html).
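Two kinds of score-based eligibility rule appear in Table I: a cutoff used on its own (as in the Arizona entry) and a cutoff combined with a diagnostic requirement (as in the Massachusetts entry). The sketch below contrasts the two. It is a hypothetical illustration, not any state's actual procedure: the function names are invented, and because the table does not say whether a score exactly at the cutoff qualifies, the code assumes that it does.

```python
def meets_cutoff(score, cutoff):
    """Score-only rule. Assumes a score equal to the cutoff qualifies."""
    return score >= cutoff


def meets_cutoff_and_diagnosis(score, cutoff, has_qualifying_diagnosis):
    """Conjunctive rule: the score must reach the cutoff AND a diagnosis must be present."""
    return meets_cutoff(score, cutoff) and has_qualifying_diagnosis


# Hypothetical youth with a total score of 90 and no qualifying diagnosis.
score_only = meets_cutoff(90, cutoff=90)
score_and_dx = meets_cutoff_and_diagnosis(90, cutoff=80, has_qualifying_diagnosis=False)
```

Under these assumptions the same youth qualifies under the score-only rule but not under the conjunctive rule, which is why the choice between the two matters when a scale is used for eligibility decisions.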

Carolina are in the process of developing guidelines for using the CAFAS in eligibility determination (S. Lindsey, personal communication, January 28, 2000; K. Moum, personal communication, January 31, 2000).

Several recent changes in the way state mental health departments conduct business appear to have contributed to this rise in CAFAS usage. First, the inclusion of the functional impairment stipulation in the CMHS definition of SED now requires states to operationally define and measure functional impairment to receive federal block grant funding for treatment of youths with SED. Second, with many states adopting a managed care model of service delivery, third-party payers such as Medicaid are requiring documentation of functional impairment to justify treatment decisions (Anderson et al., 1996; Srebnik et al., 1998). Third, the fields of psychology and mental health have sparked a demand for empirically justified treatment methods, which has created the need to collect objective outcome data using instruments such as the CAFAS (Kazdin & Weisz, 1998; Task Force, 1995).

Demonstration Projects

In addition to statewide implementation, the CAFAS is widely used as an outcome measure on a smaller scale in local mental health settings and evaluation projects across the country. The CAFAS was developed as one of the outcome measures for the Fort Bragg Evaluation Project (FBEP; Bickman, 1996a, 1996b). This project has recently received much public scrutiny, primarily because the evaluators reported no significant differences in outcomes between the experimental and control groups. In a single issue, the American Psychologist (May, 1997) devoted eight commentary articles in response to Bickman’s findings. The interest raised was due primarily to Bickman’s conclusion that the $80 million experimental treatment (a system of care for youths with SED) produced no better outcomes than the traditional mental health system control. Although many have questioned Bickman’s conclusions (e.g., Friedman & Burns, 1996; Pires, 1997), it is clear that the CAFAS was an essential component of Bickman’s arguments.

The CAFAS is also being used in other system of care projects. The Center for Mental Health Services, for example, has funded more than 40 sites nationwide to develop, implement, and evaluate systems of care for youths with SED. The CAFAS was selected as one of the mandatory outcome measures that evaluators at each site must collect. Similarly, the CAFAS is mandated for use in all county system of care projects funded by California State Assembly Bill 3015 (18 counties; A. Rosenblatt, Wyman, Kingdon, & Ichinose, 1997). Table II lists these and some of the additional research projects that have used or are currently using the CAFAS as an outcome measure.

EVALUATION OF LOF MEASURES

Given that global functioning plays an important role in the provision and evaluation of mental health services, and that the CAFAS in particular has been adopted on such a widespread scale, it is prudent to evaluate LOF measures for their technical and practical adequacy in serving these purposes. Several authors have offered criteria for selecting appropriate measures to assess treatment outcomes in studies of service delivery in mental health settings (Green & Newman, 1996; Newman & Ciarlo, 1994; Newman, Hunter, & Irving, 1987; Vermillion & Pfeiffer, 1993). Although there are differences between these sets of criteria, they seem to converge into the following four broad features of desirable outcome measures: (a) strong psychometric properties, (b) validity for use with target populations, (c) ease of use, and (d) utility. In the following sections, these guidelines will be applied as a framework to discuss the evaluation of LOF measures in general and the CAFAS in detail.

Studies of the reliability and validity of LOF measures have generally produced mixed results; although most suggest that their psychometric qualities are moderate to good (Bird & Gould, 1995; Hodges & Gust, 1995), others have characterized them as unacceptable (Zimmerman, 1996). Perhaps where LOF measures excel is in their ease of use. Most LOF measures employ a simple methodology, have minimal cost, take little time to complete, and can usually be completed by nonprofessionals, though perhaps with questionable validity (B. Green, Shirk, Hanze, & Wanstrath, 1994; Hodges & Gust, 1995). For most LOF scales, training materials are not available; yet because of their simple methodology they may not be needed. LOF scales also have high utility. They usually generate a single score that is easily applied to clinical treatment and outcome assessment, and provide a common metric by which to compare clients with different diagnostic features.

Unidimensional scales assessing global functioning have a long history of use in diagnosis, treatment,

Table II. Summary of CAFAS Use in Research and Demonstration Projects

| Project title | Description of project & CAFAS use | Source |
|---|---|---|
| Fort Bragg (NC) Evaluation Project (FBEP) | Demonstration project comparing youths who received continuum of care mental health services with those who received CHAMPUS-funded services. CAFAS was one of many outcome measures. | Bickman, 1996a, 1996b |
| California Assembly Bill 3015-Funded County Sites | Eight counties in CA funded to develop, implement, and evaluate systems of care for youths with SED. CAFAS is used as one component of evaluation. | Rosenblatt, et al., 1997 |
| CMHS-Funded Sites | More than 40 nationwide sites funded to develop, implement, and evaluate systems of care for youths with SED. CAFAS is mandated component of outcome evaluation. | “Comprehensive community mental health services for children program,” 1999 |
| Mental Health Services Program for Youth (MHSPY) Replication Project (Indianapolis, IN) | Using CAFAS scores to assess client outcomes and track service accountability of this system of care for youths with SED. | Rotto, Sokol, Matthews, & Russell, 1998 |
| Anne E. Casey Foundation’s Mental Health Initiative for Urban Children | Using CAFAS scores to assess client outcomes, service fidelity, and clinical impact of community-based services for children at-risk of out-of-home placement in three Boston neighborhoods. | Gutierrez-Mayka, 1998 |
| Wraparound Milwaukee (WI) | Using CAFAS scores to evaluate a pilot study of the effectiveness of “wraparound” services for youths with SED. | Kamradt, Kostan, & Pina, 1998 |
| MENTOR (Boston, MA) | Using CAFAS scores to assess outcomes for youths served by this national provider of community-based child/adolescent mental health programs. | Altaffer & Stelk, 1998 |
| Cleo Wallace Center (Westminster, CO) | Using CAFAS scores at intake and discharge to establish need for service and monitor treatment outcomes in this residential psychiatric facility. | Jacobson & Meyer, 1997 |
| Youth Alliance of Central Georgia | Using CAFAS scores at intake, at 3-month intervals, and discharge to assess progress of youths served in a variety of mental health treatment facilities. | Feibelman, 1998 |
| School and Community Study (KY & VT) | Using CAFAS to evaluate study of four model school-based programs for inclusion of children with SED in communities with a system of care. | Oliveira, Rivera, Kutash, Duchnowski, & Calvanese, 1998 |
| Illinois State Board of Education Sites | Using CAFAS to evaluate study of community-based supports and services for children with emotional and behavioral disabilities and their families. | Eber & Rolf, 1998 |
| Prime Time Project (King County, WA) | Using CAFAS scores to describe and monitor adolescents enrolled in community-based intervention for youths with SED and involvement in the juvenile justice system. | Selby, Trupin, McCauley, & Vander Stoep, 1998 |

and evaluation of mental health problems. The first generation of global level of functioning scales was the Health-Sickness Rating Scale (HSRS) developed by Luborsky (1962). This was a 100-point scale with eight descriptor anchor points. Although the HSRS was easy to use, it was criticized because its anchor points included both behavioral descriptions and diagnostic categories and were unevenly distributed within its total range (Friis, 1996). Developed as an improvement over the HSRS, the Global Assessment Scale (GAS; Endicott, Spitzer, Fleiss, & Cohen, 1976) was also a 100-point scale marked by the inclusion of 10 evenly distributed anchor points, which contained no diagnostic categories (Friis, 1996). Both of these scales were designed for use with adults.

The most widely accepted and utilized unidimensional LOF scale for youths, the Children’s Global Assessment Scale (CGAS, Shaffer et al., 1983), was developed as an adaptation of the GAS for use with a younger population. Similar to the GAS, it contained 10 anchor points evenly distributed between 0 and 100. Much of the wording of descriptors was significantly altered, however, for use with children and adolescents. The Global Assessment of Functioning

(GAF) scale was first introduced in 1987 as Axis V of the multiaxial diagnostic system of the DSM-III-R (American Psychiatric Association, 1987). This scale was conceptually very similar to the GAS and the CGAS, although with a range of only 0–90. Whereas the descriptors of the CGAS were written exclusively for use with children, the descriptors of the GAF were more general and designed for use with both adults and children. With the publication of the DSM-IV (1994), the total range of the GAF was extended to 0–100 by adding definitions for the 91–100 range of functioning. Both the GAF and CGAS anchor points contain a mix of behavioral descriptions and symptoms.

There have been few published studies of the reliability or validity of unidimensional global rating scales (Friis, 1996). It appears that most of the studies that do exist were conducted using the CGAS. Relating to the stability of the CGAS, test-retest reliability coefficients have been generally positive (.74–.76, Bird, Canino, Rubio-Stipec, & Ribera, 1987; Canino et al., 1987; .69–.95, Shaffer et al., 1983). Relating to interrater reliability, the evidence is more mixed. Generally, interrater reliability has been adequate in studies using professional raters when information is gathered through case histories or in-person interviews. In the original study of the CGAS, for example, Shaffer et al. (1983) reported a high coefficient of interrater reliability (.84). Raters in this study were five second-year psychiatry fellows responding to case vignettes. In a second study in which the GAF (DSM-III version) was also completed, two child psychiatrists conducted in-depth diagnostic interviews with both parents of 191 children. Two additional child psychiatrists completed ratings from observing the videotapes of these interviews. The interrater reliability coefficients for overall severity were .72 for the CGAS, and .74 (current) and .73 (past 6 months) for the GAF (Bird et al., 1987).

Whereas these initial findings demonstrated moderate support for the interrater reliability of the CGAS and GAF, Green et al. (1994) argued that none of these studies used raters who were actually involved in the treatment of the child. Addressing this issue, these authors found somewhat lower interrater reliabilities for attending psychiatrist raters (.62), and comparable reliabilities for milieu staff raters (.76), who completed the CGAS on 95 child hospital inpatients upon admission and discharge. Unsatisfactory reliability coefficients were also reported in an earlier applied field study conducted by Herman (1983, as cited in Hodges & Gust, 1995). A more recent study (Rey, Starling, Wever, Dossetor, & Plapp, 1995) documented low coefficients of interrater reliability among 20 experienced clinicians using the GAF (DSM-III-R version; .54 for outpatients, .66 for inpatients) and the CGAS (.63 for outpatients, .53 for inpatients) to rate children in inpatient and outpatient treatment settings. Thus, the interrater reliabilities of the CGAS and the GAF appear to be adequate only under certain conditions.

Studies of the validity of the CGAS have also yielded mixed results. On the one hand, Bird et al. (1987) found moderate correlations (absolute value range = .40–.65) between CGAS ratings and scores on the Child Behavior Checklist (CBCL; Achenbach, 1991). Using the criterion score of 70 on the CGAS to form impaired and nonimpaired groups, these authors also found significant group differences on CBCL total problem scores, clinical status (case–noncase), referral status (referred–nonreferred), and number of clinical diagnoses. Green et al. (1994), on the other hand, failed to find significant correlations between CGAS and CBCL scores, but reported that CGAS scores correlated significantly with indices of children’s competence. Thus, these results provide some evidence, but not compelling support, for the CGAS’s valid use in making clinical treatment decisions.

Hodges and Gust (1995, p. 407) concluded that the CGAS “has satisfactory reliability and validity when used by professionals and when used in a situation in which there is minimal information variance (i.e., information on which the score is based is consistent across all raters).” These authors and others (Green et al., 1994; Rey et al., 1995) emphasized that more research is needed to assess the adequacy of the CGAS and similar measures for use in less-controlled applied settings. Hodges and Gust (1995) suggested that the CGAS and other unidimensional global functioning measures are particularly vulnerable to respondent bias when the amount of available information about the child is low. Whereas the threat of bias is present to some extent in all rating scales, one goal of scale development, use, and evaluation should be to minimize the degree to which respondent bias contributes to the given score (Hodges & Gust, 1995). Thus, to generate an estimate of level of functioning that is less prone to respondent bias, these authors advocated the use of multidimensional scales that attempt to measure global functioning across a variety of domains.

Although several multidimensional functioning scales have appeared in the literature, descriptive or psychometric information (or both) about them is virtually nonexistent. The Colorado Client Assessment
Record (CCAR; Ellis, Wilson, & Foster, 1984) appears to have been the first multidimensional checklist of client functioning. The CCAR consists of 77 checklist items in the following nine domains: socio-legal, substance use, medical/physical illness/injury, thinking, personal distress, personal behavior, interpersonal behavior, interpersonal relations, role performance (employment, academic training, and management of personal affairs), and meeting basic needs. Item and factor analyses were conducted to arrive at these domain groupings. Although the developers claim the CCAR has an extensive research background, much of it is unpublished. Unfortunately, no reliability data on the CCAR are available and the only validity evidence reported was that scores from a preliminary version of the scale discriminated hospital from clinic clients both at admission and at discharge (Ellis et al., 1984).

The North Carolina Functional Assessment Scale (NCFAS) is another multidimensional functioning scale that was adapted from the CCAR. Again, very little information about the NCFAS is available in the literature. To date, only one study using the NCFAS has been published (Walker, Minor-Schork, Bloch, & Esinhart, 1996). According to these authors, the NCFAS is a clinician-administered rating scale designed for use with adults. Level of functioning is rated along six dimensions: role performance, emotional health, ability to care for basic needs, behavior, thinking, and substance use. These scales are combined to yield a global score ranging from 0 to 180, with a score of 40 or above indicating significant functional disability (Walker et al., 1996). Published reliability or other validity data are unavailable.

Studies of the reliability and validity of unidimensional LOF measures have yielded mixed results. In particular, these scales appear to be highly vulnerable to rater bias when used in conditions of high information variance. Whereas multidimensional LOF measures appear on the surface to resolve this problem, the reliability and validity evidence to support this claim is essentially nonexistent. To date, the CAFAS is the only multidimensional LOF measure for which published reliability and validity studies are available.

DESCRIPTION OF THE CAFAS

Both the CCAR and NCFAS were designed for use with adults. In order to fill the need for a multidimensional global assessment scale for children and adolescents, Hodges (1989, 1997) created the Child and Adolescent Functional Assessment Scale (CAFAS). Adapted from the NCFAS (Bickman, Heflinger, Pion, & Behar, 1992), the CAFAS initially contained five domains (Role Performance, Moods/Emotions, Behavior Toward Others/Self, Thinking, and Substance Use) with possible total scores ranging from 0 to 150. In later versions, the Role Performance subscale was divided into School/Work, Home, and Community domains. The instrument was originally designed as an outcome measure in the Fort Bragg Evaluation Project (FBEP; Bickman et al., 1992) for use with children and adolescents with severe emotional and behavioral disorders (Hodges & Wong, 1996).

Overview

The CAFAS (Hodges, 1989, 1997) is a rating scale designed to measure functional impairment across multiple domains in children and adolescents, and their caregivers. Impairment is operationalized as the degree to which the youth’s problems interfere with his or her functioning in various life roles (e.g., student, family member, worker, friend, citizen). To complete the scale, a rater reviews a list of 165 behavioral descriptions and selects those statements that describe the child’s most severe level of functioning during a given time period (usually the past 1–3 months). The list of behaviors falls into the following five domains:

1. Role Performance – effectiveness of the youth’s ability to fulfill societal roles, including School/Work, Home, and Community subscales;
2. Behavior Toward Others/Self – appropriateness of the youth’s daily behavior;
3. Moods/Self-Harm – modulation of the youth’s emotional life and extent to which youth demonstrates self-harmful behavior, including Moods/Emotions and Self-Harmful Behavior subscales;
4. Thinking – ability of the youth to use rational thought processes; and
5. Substance Use – the youth’s substance use and the extent to which it is inappropriate and disruptive.

A second portion of the scale allows the rater to assess functional impairment in the caregiver. Because these caregiver subscales are supplementary (Hodges, 1997) and the first five scales often are used
exclusively, this paper will focus only on the first portion of the instrument. Hodges and Wong (1996) suggested that the CAFAS may be useful in (a) linking level of care to level of need, (b) evaluating and planning programs, (c) conducting client-oriented cost outcome studies, and (d) providing consumer “report cards.”

Scale Development

There is currently no available information in the published literature explaining how the CAFAS was developed. Both Hodges (1997) and Bickman et al. (1992) stated that the CAFAS was adapted from the NCFAS scale as part of the FBEP. In fact, 67% of the items on the original version of the CAFAS were duplicate or modified NCFAS items. One source, the Clinical Training Manual of the Children and Youth Performance Outcome Program implemented by the California Department of Mental Health (1997, p. 61), provides some information about the origins of the CAFAS. According to this document, the author of the CAFAS

“. . . made extensive modifications to the items and scales of the NCFAS to render them more appropriate for children, and subsequently sought input from 40 experts on three separate occasions after each revision of the developing instrument. Colleagues were selected who could provide input from a variety of perspectives, including child psychopathology, normal development, and the special needs of Hispanic and Afro-American children. Suggestions were also obtained from spokespersons for parent advocate groups.”

No further information is available about the specific methods used in the item selection and revision process, nor how the input and suggestions were obtained and used. From the available literature, it cannot be determined whether the CAFAS items and subscales were primarily derived using empirical or rational methods.

According to the manual (Hodges, 1997), the CAFAS is not based on a particular theory of child psychopathology. Thus, ratings are not intended to reflect any underlying etiology or dynamics regarding the youth’s problems, but to profile the degree of disruption in the youth’s current functioning. It may be argued, however, that scale development (selection of items, determination of content area, etc.) must be influenced by some theoretical assumptions about child development and functioning, whether the developer is fully aware of these assumptions or not (Reckase, 1996). Without specifically addressing these assumptions, the scale developer leaves it to the user to infer them from the scale’s construction and scoring scheme.

Scoring

According to the author, each item on the CAFAS is presented in specific behavioral terms and assigned to a given functional impairment score as follows:

1. “30”: Severe – severe disruption or incapacitation;
2. “20”: Moderate – persistent disruption or major occasional disruption of functioning;
3. “10”: Mild – significant problems or distress; and
4. “0”: Minimal or No Impairment – no disruption of functioning.

On each subscale, multiple items are given for each severity level. To generate a score for a scale or subscale, the highest indicated level of severity is recorded, even if multiple items at that severity level are endorsed. For example, a rater would assign a score of 20 (for Moderate impairment) to a subscale whether one or three Moderate items were endorsed (assuming that no Severe items were endorsed). Using the original scoring scheme for the Role Performance and Moods/Emotions scales, the highest subscale score is recorded as the overall scale score. Thus, a child who is rated 20 on School/Work, 10 on Home, and 30 on Community would receive a Role Performance score of 30 (the highest of the subscale scores). It is important to emphasize that, according to the scoring directions suggested in the manual, only items within the maximum severity level endorsed for a given subscale are evaluated by the rater. Thus, if one or more items in the Severe category are endorsed, the rater would skip the Moderate, Mild, and Minimal items and proceed to the next subscale.

For each of the five CAFAS scales the possible scores range from 0 to 30 (by tens). A total scale score is generated by summing the five scale scores, and can range from 0 to 150. It should be noted that, although this scoring system is suggested in the original manual (Hodges, 1989), the revised version (Hodges, 1997) expands the scoring range to 0–240 by retaining each of the three School/Work, Home,
Table III. Relationship Between Scoring Systems for the 5-Scale and 8-Scale Versions of the CAFAS

8-Scale version                          5-Scale version
School/work, Community, Home        →    Role performance (max. score)
Behavior toward others                   Behavior toward others
Moods/emotions, Moods/self-harm     →    Moods/emotions (max. score)
Thinking                                 Thinking
Substance use                            Substance use
TOTAL SCORE: Range (0–240)               TOTAL SCORE: Range (0–150)

Note. Each CAFAS subscale score ranges from 0 to 30 by tens, such that the total score for the 5-scale version has only 16 possible values (25 possible values for the 8-scale version).
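
The relationship summarized in Table III can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the manual's scoring procedure: the dictionary keys and function names are hypothetical, and the only rules encoded are those described in the text (subscale scores of 0 to 30 by tens; the 5-scale total takes the highest subscale score within Role Performance and within Moods, whereas the 8-scale total sums all eight subscale scores).

```python
from itertools import product

# Severity scores a subscale can take (0-30 by tens), as stated in the text.
LEVELS = (0, 10, 20, 30)

# Hypothetical key names for the eight subscales listed in Table III.
ROLE = ("school_work", "home", "community")
MOODS = ("moods_emotions", "moods_self_harm")
SHARED = ("behavior_toward_others", "thinking", "substance_use")


def total_8_scale(scores):
    """8-scale total: sum every subscale score (range 0-240)."""
    return sum(scores[name] for name in ROLE + MOODS + SHARED)


def total_5_scale(scores):
    """5-scale total: Role Performance and Moods each take their highest
    subscale score, then the five scale scores are summed (range 0-150)."""
    role = max(scores[name] for name in ROLE)
    moods = max(scores[name] for name in MOODS)
    return role + moods + sum(scores[name] for name in SHARED)


# The Role Performance ratings follow the worked example in the text
# (20, 10, 30); the remaining ratings are made up for illustration.
example = {
    "school_work": 20, "home": 10, "community": 30,
    "moods_emotions": 10, "moods_self_harm": 0,
    "behavior_toward_others": 20, "thinking": 0, "substance_use": 10,
}

print(total_5_scale(example))  # 70
print(total_8_scale(example))  # 100

# Enumerate every possible rating profile and count the distinct totals.
names = ROLE + MOODS + SHARED
profiles = [dict(zip(names, combo)) for combo in product(LEVELS, repeat=8)]
print(len({total_5_scale(p) for p in profiles}))  # 16
print(len({total_8_scale(p) for p in profiles}))  # 25
```

The same hypothetical child receives a total of 70 under one scheme and 100 under the other, which illustrates why totals computed under different schemes are not directly comparable. The enumeration reproduces the counts given in the table note.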

and Community Role Performance scores and both of the Moods/Emotions and Moods/Self-Harm scores (see Table III). One concerning trend is that multiple scoring schemes have been employed in published CAFAS studies. For example, Furlong, Casas, and colleagues (Robertson et al., 1998; J. Rosenblatt et al., 1998; Wood et al., 1998) employed the five-scale scoring method following the original manual guidelines. J. Rosenblatt and A. Rosenblatt (1999) employed the eight-scale scoring method as suggested in the revised manual. Lemoine and McDermott (1998) used the five-scale scoring scheme and included the two caregiver scales in the total score to generate a possible 0–210 range. Hersch (1998) eliminated the Community Role Performance and Substance Use subscales from the eight-scale scoring scheme to generate a possible total score range from 0 to 180. The state of Indiana uses a miniscale version of the CAFAS that includes two additional subscales, Environment and Reliance (J. Phillips, personal communication, January 28, 2000). With such nonconformity in scoring the instrument, it is imperative to clearly specify how the total scores were calculated when comparing CAFAS scores across studies or programs.

Target Population

The CAFAS is intended for use with “children and adolescents who have or may have emotional, behavioral, substance use, psychiatric, or psychological problems” (Hodges, 1997, p. 1-1). This includes youths who are referred for these problems or who are at risk for developing them. The author suggests that the CAFAS is particularly useful in assessing outcomes for youths with SED. It is intended for children aged 6–17 years, although there is also a version for younger children aged 4–7 years, the Preschool and Early Childhood Functional Assessment Scale (PECFAS; Hodges, 1997). Hodges and Wong (1996) found no significant differences in CAFAS scores between gender and racial/ethnic groups, suggesting that it may be a useful component of culturally competent assessment. A Spanish language version is also available.

Raters and Training

The CAFAS was designed to be completed by clinicians or other trained administrators who are working with the youth and family. It is also preferred that raters have graduate training in a mental health field and “be knowledgeable about the spectrum of behavioral and emotional problems which [sic] children may experience” (Hodges, 1997, p. 6-2). Nonclinicians may complete the scale, but it is suggested that they receive full training and use the optional structured interview to collect information about the youth. One particularly strong feature of the CAFAS is the availability of a well-developed training manual with numerous training vignettes. Clinician raters may use various sources of information to complete the CAFAS, including interviews with the child and family, interviews with other professionals familiar with the child’s behaviors, and record reviews.

EVALUATION OF THE CAFAS

The following section describes an evaluation of the CAFAS, using the previously outlined criteria for assessing treatment outcome measures. Evidence from the manual (Hodges, 1997) and relevant articles is explored in the context of the categories of
psychometric properties, validity for use with target population, ease of use, and utility.

Psychometric Properties

The following psychometric properties are addressed: (a) internal consistency reliability, (b) interrater reliability, (c) stability of scores, (d) content and structural validity, (e) concurrent validity, (f) criterion-related validity, and (g) predictive validity.

Internal Consistency Reliability

Little information about the internal consistency reliability of the CAFAS and its scales is available in the manual and none appears in published articles. In the manual, Hodges (1997) stated that the internal consistency coefficient (Cronbach’s alpha) values ranged from 0.63 to 0.68 for the different waves in the FBEP (Breda, 1996), and cited her own psychometric paper (Hodges & Wong, 1996) as the source for these data. Unfortunately, these data do not appear in this paper and therefore the context under which they were generated is unclear. In the manual, Hodges (1997) stated that these internal consistency values (0.63 to 0.68) “reflect on the homogeneity of the scales of the CAFAS” and “are supportive of the reliability of the CAFAS” (p. 2-1). The author also stated that this reliability evidence is “especially true [sic] given that the separate scales are intended to assess different domains of impairment” (p. 2-1), and that the reliability of the entire scale would decrease with the omission of any of the individual scales.

These arguments can be critically examined on several points. First, coefficient alpha values of 0.63–0.68 are generally considered relatively low and do not provide compelling evidence of internal consistency (Clark & Watson, 1995; Schmitt, 1996). Instead, lower values of coefficient alpha suggest variability in item content, which may still be congruent with the desired goals of a particular scale depending on the heterogeneity of the construct under study (Clark & Watson, 1995; Schmitt, 1996). Second, the procedures for completing the scale require selecting items in only the most impaired category on each subscale. Thus, on a given subscale, several items within the same impairment category may be endorsed, but never can two items of different severities be endorsed. This necessitates a correlation of zero between items in differing impairment categories. As a result, estimates of coefficient alpha will be greatly attenuated. It can therefore be concluded that internal consistency reliability of the CAFAS has not been established. Given that this appears to be an inappropriate method of evaluating LOF measures, however, this does not appear to be a critical weakness of the scale.

Interrater Reliability

In contrast to internal consistency reliability, evidence for interrater reliability for the CAFAS has been well documented (Hodges, 1997; Hodges & Wong, 1996). Using 20 training vignettes and four discrete samples (N = 54) of undergraduate students, graduate students, and child service agency staff members, Hodges and Wong (1996) assessed interrater reliability in two ways. First, they calculated Pearson product-moment correlations between the raters’ scores and a criterion score for each vignette. Criterion scores were generated by consensus of the primary author and a board-certified child psychiatrist. Pearson coefficients were then transformed to z-scores and averaged across raters. Second, they calculated intraclass correlations (ICCs) based on analysis of variance procedures to provide an estimate of raters’ agreement with each other. Aggregated Pearson coefficients for each of the four samples ranged from .74 to .99; ICCs ranged from .63 to .96.

Most of these values indicate good interrater reliability. However, this method of reliability estimation is suspect in that the reliability coefficients were generated from ratings of subscales, not individual items. Thus, it provides no information about the degree of agreement between raters on actual behaviors, but only on severity of groups of behaviors. Two raters could disagree about the behaviors a given child exhibits, but appear to be perfectly reliable if these behaviors were assigned to equivalent severity categories. Furthermore, rater A could endorse five severe items on a subscale whereas rater B could endorse only one severe item, yet by these methods they would demonstrate perfect agreement due to the maximum scoring criteria. It would be of interest to examine interrater reliability of the CAFAS, using individual items as the unit of analysis. It would also be of interest to replicate this reliability study using a
larger sample of raters, using raters involved in child treatment, or using “real life” ratings as opposed to vignettes.

According to the authors, values for the Thinking subscale were not reported “due to low frequency of formal thought problems or organicity in the vignettes, which were designed to be representative of typical clinical presentations” (Hodges & Wong, 1996, p. 499). This argument begets the following question: why was this subscale included on the CAFAS if it does not contribute to the assessment of “typical” clinical presentations? Is this subscale less important than the other scales? Even if this were the case, it would still be necessary to explore the interrater reliability of this subscale to establish reliability estimates for the entire CAFAS. Thus, it would be desirable to include formal thought problems or organicity on future interrater reliability training vignettes.

Ogles, Davis, and Lunnen (1999) tested the interrater reliability of the CAFAS under two methods of presentation of case data: (a) manual vignettes and (b) archival data from actual cases. As would be expected, the interrater correlations for CAFAS total scores generated using manual vignettes by three groups of raters [undergraduate students (.88), graduate students (.89), and case managers (.94)] dropped considerably when actual case data were used (.66, .75, and .55, respectively). A major drawback of this study, however, was small sample size: there were only four raters in each of the three groups.

Given these questions about the interrater reliability studies of the CAFAS, it cannot be concluded that raters tend to agree with each other on specific items. It may be reasonably concluded, however, that they do tend to agree fairly well with each other on the severity of behaviors, at least on four of the five subscales, in response to vignettes. Given that the severity level (as opposed to the item level) is the suggested level of analysis for the CAFAS, this evidence provides, at the very least, moderate support for the interrater reliability of the CAFAS when used under these conditions. Closer inspection of the CAFAS training vignettes, in fact, reveals that case information often contains wording that is identical to that of individual items on the scale, creating a rating situation that is likely much less complex than actual usage conditions. Furthermore, as previously discussed, estimates of interrater reliability are maximized under conditions when information variance is low. Thus, it would still be necessary to demonstrate that the interrater reliability of the CAFAS holds up under actual usage conditions, that is, when client information is varied or limited.

Stability of Scores

Only one study (Hodges, 1995) has examined the test-retest reliability of the CAFAS. In this study, CAFAS ratings were gathered by two different raters at 1-week intervals via telephone interviews with mothers of 56 youths. Interviews were conducted by trained graduate students. The Pearson product-moment correlation coefficients between the two scores were as follows: Total Score = 0.95; Role Performance score = 0.84; Behavior Toward Self and Others = 0.82; Moods/Emotions = 0.91; and Thinking = 0.89. No explanations were provided for the absence of correlations for the Substance Use subscale. Results of follow-up t-tests indicated no significant differences between Time 1 and Time 2 ratings for any of the scale scores or the total score. In general, these findings provide fairly strong evidence that CAFAS scores are stable over a period of 1 week, using the interview protocol. Again, it would be informative to explore the stability of scores generated by clinician raters under actual usage conditions.

Content and Structural Validity

To date, there is no available information in either the published literature or the manual concerning the content validity of the CAFAS items. As previously discussed, only one secondary source (California Department of Mental Health, 1997) has addressed the details of item selection. Given this limited information regarding the development of the scale, several problems emerge. First, it is unclear how items were selected for inclusion in the scale, what the underlying factor structure of the instrument is, and whether individual items represent the constructs to which they were assigned. As most scale development scholars agree (cf. Reckase, 1996), it is imperative to have clear theoretical and empirical reasoning to make meaningful decisions about inclusion of items and creation of subscales. To demonstrate this reasoning, for example, one might summarize ratings from expert judges (theoretical) or conduct factor analyses (empirical). Given that this reasoning is lacking, it must be concluded that the content validity of the CAFAS is suspect. Second, given that the construct of global functioning and its subscale domains are not
operationalized, it is unclear whether the items provide sufficient or excessive coverage. It is also unclear whether the CAFAS items represent the most theoretically or technically sound items from a larger pool, or whether they were subjected to any form of item analysis.

Evidence supporting the structural and scaling validity of the CAFAS is also unavailable. In other words, there is no supporting evidence suggesting that items in the “Severe” (“Moderate,” etc.) category actually reflect severe (moderate, etc.) functional impairment. On the School/Work Role Performance scale, for example, the following items are scored 30 for “severe impairment”:

#004 – “Harmed or made serious threat to hurt a teacher/peer/co-worker/supervisor. . .”
#006 – “Chronic truancy resulting in negative consequences (e.g., detention, loss of course credit, failing courses or tests, parents notified. . .)”
#008 – “Disruptive behavior, related to poor attention or high activity level, persists despite the youth having been placed in a special learning environment or receiving a specialized program or treatment. . .”

It may be the case that these items do reflect a similar level of functional impairment. Conversely, these items may be associated with different levels of impairment. One could argue, for example, that a student who harmed or threatened a teacher is more functionally impaired than a student who received a detention for repeated truancy. What is clearly needed is empirical evidence to demonstrate that the CAFAS items reflect a unidimensional continuum of severity and are appropriately scaled. Such evidence would greatly enhance the construct validity of this instrument.

One potential problem with the scaling of the CAFAS is that the items comprising the “Minimal or No Impairment” severity level do not contribute to the total score. Thus, a child’s subscale and total scores are not affected by whether these items are endorsed or not. It appears, then, that the purpose of these items is purely descriptive in nature. From initial inspection of the items, it appears that some attempt to reflect the absence of impairment (e.g., #030 – “Functions satisfactorily even with distractions”), whereas others seem to reflect positive functioning (e.g., #037 – “Graduated from high school or received GED”). It is interesting that both of these behaviors are scored 0 even though one could argue that they represent different levels of functioning. Certainly this information would be important to understanding the scope of a youth’s functioning, but insufficient attention is given to positive functioning on the CAFAS (as it does not even affect the score). One might reasonably conclude that the CAFAS is aimed at assessing only impairment in functioning, rather than positive functioning, for the majority of items are in the “Severe” and “Moderate” severity levels. It thus appears that the items in the “Minimal or No Impairment” severity category are superfluous to its use as an objective outcome measure.

Upon face inspection, there also appear to be a number of items with overlapping content. For example, it may be argued that the following items represent essentially equivalent content:

#012 – “Non-compliant behavior which results in persistent or repeated disruption of group functioning or becomes known to authority figures other than classroom teacher (e.g., principal) because of severity and/or chronicity”
#013 – “Inappropriate behavior which results in persistent or repeated disruption of group functioning or becomes known to authority figures other than classroom teacher (e.g., principal) because of severity and/or chronicity” [emphases added]

In the interest of parsimony, it would be desirable to reduce item redundancy by eliminating extraneous items or combining items with similar content. Item analysis, a critical step in scale construction and refinement, would be appropriate to achieve these aims and to select the “best” items for the scale.

Another problem is that the suggested scoring system for the CAFAS is theoretically confusing in that it combines compensatory and maximum strategies. Within each subscale, the maximum severity is scored, and then these scores are summed across subscales to generate a global score, apparently combining two theoretically competing scoring models. Certainly there are other scoring models that might be applied to the CAFAS (and other multidimensional LOF scales) that might prove equally or more valid. A variety of scoring models might be particularly useful for LOF assessment, including compensatory, average, conjunctive, or disjunctive models. Because choice of scoring model can have significant impact on the relative ranking of respondents (see Bates, 1999 for discussion), it is suggested that the original scoring system of the CAFAS be revisited and examined using empirical methods.

To address these problems, Bates (1999) recently completed a study investigating the scaling properties of CAFAS items. The study was conducted in three
phases. In Phase 1, a group of expert raters was asked to indicate (a) the degree to which each of the items represents the subscale construct to which it was originally assigned, and (b) how well the item tapped the given construct. In Phase 2, additional expert raters were asked to indicate their perceptions of severity by assigning to each item a severity rating on a 9-point scale. Using these ratings, successive intervals scaling techniques were used to generate weighted rankings for each item. These rankings were then used to calculate weights for each item, providing a method to investigate the validity of the original scoring system. In Phase 3, CAFAS data were collected on a sample of youths enrolled in a cross-agency system of care project for youths with SED. CAFAS scores calculated with these derived item weights and a consistent average scoring model were then compared with CAFAS scores generated by the original method with reference to the strengths of their associations with other outcome measures such as the CBCL, risk factors, and educational indicators.

The results of this study generally failed to support the suggested scoring system for the CAFAS and instead indicated that empirically guided alterations in the scoring system performed as well as or better than the original in several measures of concurrent validity. Specifically, there were multiple occasions of item order reversal, where the relative severity rankings using the empirically derived values were reversed compared with the values suggested in the original version (e.g., an item with an original severity of 20 had a higher empirically derived weight than an item with an original severity of 30 did). More problematic was the finding that empirically derived item weights did not hold equivalency across subscales, such that items with original severity values of 30 on the Community subscale, for example, were rated as much more severe than items with original severity values of 30 on the School/Work subscale. These results call into question the structural validity of the CAFAS and should cause concern about the validity of current and past usage of the instrument.

Concurrent Validity

There have been several investigations of the concurrent validity of the CAFAS total score. In the first study,4 the relationship between CAFAS total scores and scores on the CGAS was investigated in the FBEP sample. Pearson correlations between the CAFAS and the CGAS ranged from −0.72 to −0.91 for three time periods of data collection [Note: correlations are negative because higher values of CAFAS scores reflect greater impairment, whereas lower CGAS scores reflect greater impairment]. There was also significant agreement between the CAFAS and CGAS in categorization of youths in one of four levels of impairment: severe, moderate, mild, or slight/none. Although no further information about this study is available, it does provide preliminary, albeit limited, evidence of the construct validity of the CAFAS.

4 This study was reported in the CAFAS manual (Hodges, 1997), but no reference was given.

In the second study (Hodges & Wong, 1996), also using the FBEP data, analyses were conducted to demonstrate the construct validity of the CAFAS by investigating its relationships with global measures of psychopathology and problematic behaviors. Evaluation measures collected in the FBEP project included (a) the Child Assessment Scale (CAS; Hodges, 1990) and its parent form, the Parent Child Assessment Scale (PCAS; Hodges, 1990), which generate global scores indicating general psychopathology; (b) the Burden of Care Questionnaire (BCQ; Bickman, 1996b; Bickman et al., 1992), developed specifically for use with the FBEP to assess objective and subjective burden experienced by parents of children with serious emotional or behavioral problems; and (c) the Child Behavior Checklist (CBCL; Achenbach, 1991) for the parent, the Youth Self-Report (YSR; Achenbach & Edelbrock, 1983) for youths aged 11 and older, and the Teacher Report Form (TRF; Edelbrock & Achenbach, 1984) for the teacher, instruments designed to assess perceptions of problematic behaviors from multiple informants. Correlations between the CAFAS and other global measures of problematic functioning across four points in time were as follows: PCAS (.59, .62, .58, .63); CBCL (.42, .49, .48, .47); CAS (.54, .56, .55, .52); and BCQ (.36, .42, .43, .42). As indicated, moderate positive correlations were found for all measures across all time periods, providing evidence of concurrent validity between the CAFAS and a constellation of problematic behaviors.

Criterion-Related Validity

To measure the association between CAFAS total scores and individual problematic behaviors (data gathered through interviews with parents, and CBCL, TRF, and YSR scores), Hodges and Wong
(1996) bifurcated CAFAS total scores into two categories: presence and absence of pathology. For the first wave of data (intake), the authors used a total score of 80 as the cutoff between the categories; a score of 50 was used as a cutoff for the three follow-up time periods of 6, 12, and 18 months postintake. Little explanation was given of the rationale for choosing these cutoff scores, other than the observation that approximately 20% of each sample fell into the “pathological” group. [The authors stated that they “considered these respondents to be seriously impaired” (p. 455), yet did not explain why they did not consider respondents who scored between 50 and 70 at intake to also be seriously impaired.] A series of logistic regression analyses was then conducted using CAFAS category as a criterion and the following variable sets as predictors: (a) problems in social relationships (with other children, other students, siblings, parents, and teachers); (b) risk behaviors (physically attacked people, threatened people, talked about killing self); (c) involvement in juvenile justice (arrested, convicted of crime, placed on probation, spent time in correctional facility, saw probation/law enforcement officers, detention center); and (d) school-related behaviors (disliked school, skipped school, disciplined in school, suspended, grades, happiness at school, worked much less hard than others, repeated grade). Results indicated that each of these variables was highly significant in predicting CAFAS category for at least one (and oftentimes all four) of the time periods. The authors concluded that these results provide support for the validity of the CAFAS as a measure of impairment across multiple spheres of functioning.

Predictive Validity

In a third study involving the FBEP, Hodges and Wong (1997) investigated the predictive validity of the CAFAS total score. CAFAS total scores at intake were used to predict restrictiveness of care levels, cost of services, and number of services at both 6 and 12 months postintake. Restrictiveness of care was operationalized along the following continuum: outpatient care, intensive nonresidential care, residential care (e.g., group home), residential treatment center, and inpatient hospitalization. Cost of services was operationalized as the total cost of all services received, whereas number of services received was operationalized as the number of bed days (for inpatient or residential) and total number of days on which any service was delivered. [At 6 months postintake, the number of service days ranged from 1 to 370, raising questions about how this variable was operationalized.] Results indicated that even after controlling for the effects of other instruments (e.g., CBCL, CAS, PCAS, BCQ), the CAFAS total scores significantly predicted these indicators of service utilization at both follow-up time periods, with proportion of unique variance explained ranging from .04 to .11. Although these values appear low, the CAFAS total score was the single best predictor of service utilization and cost. The results of additional analyses indicated that the CAFAS total score in combination with psychiatric diagnostic information [e.g., DSM-IV (1994) diagnosis] best predicted service utilization and cost.

Validity for Use With Target Population

The CAFAS was designed for use with youths with a variety of emotional and behavioral problems, specifically those with SED. The psychometric data previously presented were collected through the Fort Bragg Evaluation Project (Bickman, 1996b), a demonstration project comparing a continuum of care with traditional mental health services for youths with SED. Thus, these psychometric data are clearly relevant to the target population and do provide preliminary support for the valid use of the CAFAS with children and adolescents with SED. As previously discussed, SED is a heterogeneous diagnostic category covering a wide array of symptoms and problem behaviors. The CAFAS has face validity in that its items appear to cover the breadth and depth of emotional and behavioral problems that children and adolescents with SED face. Yet, its construct validity would be enhanced with an item analysis to ensure that item coverage is truly representative of SED. At the least, the strategies used to select items should be addressed in the manual.

In an attempt to demonstrate validity for use with a diverse population, Hodges (1996) also reported that, using strict criteria, no significant differences in CAFAS scores were found across gender, racial/ethnic, or caregiver education level groupings. Although this speaks to the comparability of CAFAS scores, it does not provide sufficient evidence to conclude that the scale holds equivalent meaning across groups. Although a discussion of necessary conditions to establish equivalence is beyond the scope of this paper, Reid (1995) provided a well-reasoned model for
demonstrating cross-cultural equivalence of rating scales. He highlighted the need to explore four forms of equivalence: (a) linguistic – the degree to which “. . . content and grammar have similar connotative and denotative meaning across cultures” (Marsella & Kameoka, 1989, p. 239 as cited in Reid, 1995); (b) conceptual – the degree to which constructs in assessment hold similar conceptual meaning; (c) scale – the degree to which raters share a common understanding of the uses and metric of a scale; and (d) normative – the degree to which norms developed for one culture are appropriate for another. Thus, further study of the CAFAS items and structure, and how these are interpreted by various cultural groups (or other meaningful groupings), needs to occur before the equivalence of the scale can be established.

Ease of Use

Hodges (1997) stated that the CAFAS takes about 10 min to complete if the rater is very familiar with the child’s behavior and functioning. No time guidelines are given for raters who are unfamiliar with the child. In practical terms, the CAFAS may actually take longer than 10 min to complete, given the large number of items. Nonclinicians can use the CAFAS, although it is suggested they receive full training and use a structured interview to gather information. The training materials included with the CAFAS are extensive, consisting of detailed instructions for scoring, demonstration vignettes with ratings provided, and 10 vignettes for testing rater reliability. A simple measure of the ease of use of the CAFAS is available from state reports: states are required to report the evaluation instruments used in children’s mental health services to the Georgetown University National Technical Assistance Center (GUNTAC) and are asked to rate the burden of these instruments on a 5-point scale (1: low burden, 5: high burden). Of the 18 states reporting these data on the CAFAS, more than half gave a rating of 4 or 5 (mean = 3.6; GUNTAC, 2000). Thus, there does appear to be some burden associated with the use of the CAFAS.

Utility

One of the strengths of the CAFAS, as with most LOF measures, is its clinical utility. The CAFAS total score appears to provide a meaningful metric by which to compare youths with a variety of emotional and behavioral difficulties, although there remain questions about its validity. It also appears to be easily understood by nonclinicians. One potential problem in tracking clinical changes, however, is that no meaning is associated with the scale’s intervals. This is essentially an issue of social or clinical validity. The clinical utility of the CAFAS would greatly benefit from supporting evidence of this type (Sechrest et al., 1996). Other evidence in support of the CAFAS’s utility comes from the extent to which it has been adopted on both the state and local levels.

RECOMMENDATIONS FOR FUTURE RESEARCH

The CAFAS has been implemented extensively on the state and local levels. Despite this trend, there is surprisingly little evidence supporting the psychometric and clinical validity of this scale. Indeed, it is disconcerting that the CAFAS has been so widely endorsed, especially at the legislative level, without empirical demonstration of its validity for use in making the types of treatment decisions for which it is currently being employed across the nation. Furthermore, there appears to be little concern (at least as expressed in the available literature) among mental health researchers, practitioners, administrators, and state legislators about these apparent limitations of the CAFAS.

The widespread use of the CAFAS is an indication that there is a growing recognition of the need to assess multiple dimensions of functional impairment. With managed care and new regulations (e.g., changes in the federal definition of SED), policy makers, administrators, treatment providers, and program evaluators are increasingly faced with the dilemma of having to select objective and valid measures of children’s functioning while having none to choose from. At the time this review was written, the CAFAS was the only children’s multidimensional LOF instrument with any published articles on its technical merits. Several other measures of children’s functioning have been introduced and implemented in recent years (e.g., Ohio Youth Scales, Ogles, Melendez, et al., 1999; Child and Adolescent Scale of Temperament and Life Functioning (CASTLE); and Child Functional Assessment Rating Scale (CFARS), no references given but mentioned in GUNTAC, 2000), yet no information about them exists in the published literature and it is unclear
to what extent they have demonstrated validity for the purposes for which they are being used. Given this context, the CAFAS may represent the best available option, thus accounting for its widespread use.

The potential benefits of the establishment of objective and valid level-of-need criteria, using the CAFAS, are numerous, and clearly the interest in doing so is high. Presumably, such a development would lead to increased precision in decision-making for the delivery of mental health services, which would, in turn, lead to lower costs and better care management. It may also lead to improvements in matching the level of care to the level of client need. For any of these benefits to be realized, however, the psychometric limitations of the CAFAS identified in this review need to be addressed. Toward this aim, the following section provides suggested directions for further research on the CAFAS.

1. Technical properties – Further research is needed on the technical properties of the CAFAS, particularly its factor structure, interrater reliability, and stability. A more in-depth investigation of whether the subscale domains hold up to empirical analysis would be desirable. As discussed in this review, analyses of the interrater reliability on an item level and test-retest reliability are also needed. These analyses should be performed on all of the CAFAS subscales, even if the incidence of certain items (e.g., Thinking) is low in the clinical population.

2. Concurrent validity – Further evidence of the CAFAS’s concurrent validity with additional measures is needed. Potential concurrent measures to establish the validity of the CAFAS might include another measure of LOF, more elaborate measures of school functioning, self- and parent-report measures of substance use, juvenile justice indicators, diagnostic indicators such as DSM-IV (1994) diagnoses, as well as additional behavioral, social, and emotional assessments.

3. Discriminant validity – Studies of the CAFAS’s discriminant validity would be useful in determining whether scores can reliably differentiate between samples of interest (e.g., clinical vs. nonclinical). Many scholars (cf. Bird & Gould, 1995; Weissman, Warner, & Fendrich, 1990) have espoused the utility of LOF assessment for improving the precision of nosological diagnoses, such as those employed in the DSM-IV (1994). For example, Bird and Gould (1995) reported that studies using symptomatic criteria alone have overestimated the prevalence rates of most childhood disorders, such that as many as “one-third to one-half of the children in a population have been found to meet criteria for one or more diagnostic categories” (p. 92). When information about symptom severity and dysfunctionality is included, these rates drop to levels more in line with theoretical and clinical consensus. Demonstration of discriminant validity, using the CAFAS, therefore, would presumably lead to lower misclassification rates and greater confidence in diagnostic decision making (e.g., type and amount of services rendered; Herman & Mowbray, 1991; Srebnik et al., 1998).

4. Predictive validity – Perhaps the most pressing need is to establish the predictive validity of the CAFAS. Because it is commonly used at intake to make treatment decisions, it is vital to investigate whether CAFAS scores predict important treatment variables, such as number and amount of services, length of treatment, and cost of treatment. Newman and Tejeda (1996) reported the initiation of such a project through the Indiana Division of Mental Health (IDMH) and indicated that the CAFAS was the intended instrument for use with children and adolescents (with a multidimensional adult LOF measure to be developed by the first author). Briefly, the aims of this project are to (a) investigate the psychometric properties of the LOF measures, (b) track LOF and service data from service providers, and (c) identify “cost-homogenous groups” (or clusters of clients with similar LOF and service costs), with the ultimate goal of providing data for the creation of actuarial criteria. These criteria will then be revisited and refined as needed on an ongoing basis. Other states have initiated similar projects, using the CAFAS (see Table I; Behar & Stelle, 1997; Heflinger & Simpkins, 1997; Hersch, 1998; Hodges, Warren, & Wotring, 1998; Schwartz & Perkins, 1997) and other scales (Newman & Tejeda, 1996; Srebnik et al., 1998). Such studies might eventually produce valid and empirically based algorithms for matching CAFAS scores with levels of care.

5. Applications – A variety of other applications of the CAFAS appear to hold promise to increase its clinical utility in the delivery of
mental health services to youths. One innovative approach to classification of psychopathology in both adults and children has involved the application of cluster analysis to client characteristics, particularly LOF, to generate client typologies. Herman and Mowbray (1991), for example, assessed 2,447 adults with serious mental illness using 16 scales of daily functioning (e.g., community living, substance abuse, level of health care needs) and subjected these data to cluster analysis to “organize and identify the patterns within the rich array of information provided by multidimensional LOF assessments” (p. 102). Results indicated six relatively homogeneous clusters or client types that were then used to facilitate analysis of service utilization patterns and population differences across service sites. Several other studies have demonstrated the clinical utility of cluster analyses in child populations (Lahey et al., 1988; McDermott & Weisz, 1995; Rosenblatt et al., 1998; Wood et al., 1998).

Herman and Mowbray (1991, p. 111) concisely supported the utility of cluster analysis as follows:

The cluster analysis . . . has the descriptive advantages expected: it succinctly summarize [sic] a substantive amount of data about client functioning and severity levels overall, as well as about special treatment needs, such as health care and substance abuse. Data from statewide studies in the past have been difficult to use because agencies were presented with dozens of variables and asked to meaningfully interpret why their clients are more functional than the state averages on some variables, less functional on others, etc. With a cluster analysis, agencies are presented with simple statements . . . regarding the proportion of clients they serve from each cluster. Thus, cluster analysis based on LOF data appears to be a burgeoning and useful technique to summarize and interpret an often overwhelming amount of client data.

In sum, the CAFAS holds substantial promise as an important tool for use in diagnosis, treatment, and evaluation of youths with EBD. This review suggests that the technical adequacy of the CAFAS has yet to be clearly established. It is hoped that more defensible, empirically based methods can be used to assess children’s global functioning more efficiently and accurately, with the ultimate goal of enhancing allocation of resources and service delivery to youths with emotional and behavioral disorders.

REFERENCES

Achenbach, T. (1991). Child behavior checklist manual, 1991. Burlington, VT: University of Vermont.
Achenbach, T. M., & Edelbrock, C. (1983). Manual for the child behavior checklist and revised child behavior profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Altaffer, F., & Stelk, W. (1998, March). Thriving in the frenzy: Outcomes reporting in a national provider of child/adolescent services. Paper presented at the 11th Annual Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.
American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., Rev.). Washington, DC: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Anderson, D. F., Berlant, J. L., Mauch, D., & Maloney, W. R. (1996). Managed behavioral health care services. In P. R. Kongstvedt (Ed.), The managed health care handbook (3rd ed., pp. 341–366). Gaithersburg, MD: Aspen.
Bates, M. P. (1999). Global functioning within a system of care for youths with serious emotional disturbance: A closer look at the Child and Adolescent Functional Assessment Scale (CAFAS). Unpublished doctoral dissertation, University of California, Santa Barbara.
Behar, L., & Stelle, L. (1997). Criteria for accessing child mental health and substance abuse services in North Carolina. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 262–264). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Bickman, L. (1996a). A continuum of care: More is not always better. American Psychologist, 51, 689–701.
Bickman, L. (1996b). The evaluation of a children’s mental health managed care demonstration. Journal of Mental Health Administration, 23, 7–15.
Bickman, L., Heflinger, C. A., Pion, G., & Behar, L. (1992). Evaluation planning for an innovative children’s mental health system. Clinical Psychology Review, 12, 853–865.
Bird, H., Canino, G., Rubio-Stipec, M., & Ribera, J. C. (1987). Further measures of the psychometric properties of the Children’s Global Assessment Scale. Archives of General Psychiatry, 44, 821–824.
Bird, H. R., & Gould, M. S. (1995). The use of diagnostic instruments and global measures of functioning in child psychiatry epidemiological studies. In F. C. Verhulst & H. M. Koot (Eds.), The epidemiology of child and adolescent psychopathology (pp. 86–103). Oxford, UK: Oxford University Press.
Breda, C. S. (1996). Methodological issues in evaluating mental health outcomes of a children’s mental health managed care demonstration. Journal of Mental Health Administration, 23, 40–50.
Burlingame, G. M., Lambert, M. J., Reisinger, C. W., Neff, W. M., & Mosier, J. (1995). Pragmatics of tracking mental health outcomes in a managed care setting. Journal of Mental Health Administration, 22, 226–236.
California Department of Mental Health. (1997). The Children and Youth Performance Outcome Program: Clinical training manual. Sacramento, CA: Author.
Canino, G., Bird, H. R., Rubio-Stipec, M., Woodbury, M. A., Ribera, J. C., Huertas, S. E., & Sesman, M. (1987). Reliability of child diagnosis in a Hispanic sample. Journal of the American Academy of Child Psychiatry, 26, 560–565.
Center for Mental Health Service. (1999, December 17). Comprehensive community mental health services for children program. Washington, DC: Author. Retrieved December 17, 1999 from the World Wide Web: http://www.mentalhealth.org/publications//allpubs/CA-0013/ccmhse.htm#TOP
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319.
Daniels, L. V., & Clements, L. (1997). The utilization of the Child and Adolescent Functional Assessment Scale for assessing program and clinical outcomes, mental health policy, and child outcomes in Missouri. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 420–423). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Eber, L., & Rolf, K. (1998). Education’s role in the system of care: Student/family outcomes. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 175–180). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Edelbrock, C., & Achenbach, T. (1984). The teacher version of the Child Behavior Profile: I. Boys aged 6–11. Journal of Consulting and Clinical Psychology, 52, 207–217.
Ellis, R. H., Wilson, N. Z., & Foster, F. M. (1984). Statewide treatment in outcome assessment in Colorado: The Colorado Client Assessment Record (CCAR). Community Mental Health Journal, 20, 72–89.
Endicott, J., Spitzer, R. L., Fleiss, J. L., & Cohen, J. (1976). The Global Assessment Scale. Archives of General Psychiatry, 33, 766–771.
Federal Register 29422–29425. (1993, May 20).
Feibelman, N. D., III. (1998). A system of care for children’s mental health. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 43, 44). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Friedman, R. M., & Burns, B. J. (1996). The evaluation of the Fort Bragg demonstration project: An alternative interpretation of the findings. Journal of Mental Health Administration, 23, 128–136.
Friis, H. L. S. (1996). Routine evaluation of mental health: Reliable information or worthless ‘guesstimates’? Acta Psychiatrica Scandinavica, 93, 125–128.
Georgetown University National Technical Assistance Center. (1999). Evaluation Instruments [Table]. Washington, DC: Author. Retrieved January 24, 2000 from the World Wide Web: http://www.dml.georgetown.edu/depts/pediatrics/gucdc/instruments 1.html
Green, B., Shirk, S., Hanze, D., & Wanstrath, J. (1994). The Children’s Global Assessment Scale in clinical practice: An empirical evaluation. Journal of the American Academy of Child and Adolescent Psychiatry, 33, 1158–1164.
Green, R. S., & Newman, F. L. (1996). Criteria for selecting outcome instruments to assess treatment outcomes. Residential Treatment for Children and Youth, 13, 29–48.
Gutierrez-Mayka, M. (1998, March). Findings from an evaluation of a community-based intervention for children at-risk in Boston, MA. Paper presented at the 11th Annual Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.

Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Herman, S. E., & Mowbray, C. T. (1991). Client typology based on functioning level assessments: Utility for service planning and monitoring. Journal of Mental Health Administration, 18, 101–115.
Hersch, P. (1998). Implementing eligibility determination process for children’s mental health services in Massachusetts, Characteristics of youth: The first six months. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 377–381). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Hodges, K. (1989). Child and Adolescent Functional Assessment Scale. Unpublished manuscript, Eastern Michigan University, Ypsilanti.
Hodges, K. (1990). Child Assessment Schedule. Unpublished manuscript, Eastern Michigan University, Ypsilanti.
Hodges, K. (1995, March). Psychometric study of a telephone interview for the CAFAS using an expanded version of the scale. Paper presented at the 8th annual research conference: A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.
Hodges, K. (1996). Summary of psychometric data on the CAFAS. Ann Arbor, MI: Author.
Hodges, K. (1997). CAFAS manual for training coordinators, clinical administrators, and data managers. Ann Arbor, MI: Author.
Hodges, K., & Gust, J. (1995). Measures of impairment for children and adolescents. Journal of Mental Health Administration, 22, 403–413.
Hodges, K., & Wong, M. M. (1996). Psychometric characteristics of a multidimensional measure to assess impairment: The Child and Adolescent Functional Assessment Scale. Journal of Child and Family Studies, 5, 445–467.
Hodges, K., & Wong, M. M. (1997). Use of the Child and Adolescent Functional Assessment Scale to predict service utilization and cost. Journal of Mental Health Administration, 24, 278–290.
Hodges, K., Warren, B., & Wotring, J. (1998, March). The development of a set of criteria for determining levels of care for youth with SED based on empirical data. Paper presented at the 11th Annual Research Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.
Hodges, K., Wong, M. M., & Latessa, M. (1998). Use of the Child and Adolescent Functional Assessment Scale (CAFAS) as an outcome measure in clinical settings. Journal of Behavioral Health Services and Research, 25, 325–336.
Individuals with Disabilities Education Act, 20 U.S.C. Sec. 1400 (1990).
Irvin, E., & Hersch, P. (1997). Proposed eligibility criteria and procedures for enrollment in Department of Mental Health continuing care. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 264–267). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Jacobson, C. V., & Meyer, T. (1997). Assessment of patient functioning in a child and adolescent psychiatric facility. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s
Heflinger, C. A., & Simpkins, C. G. (1997). CAFAS: Evaluating Mental Health: Expanding the Research Base, February 26 to
statewide service. In C. Liberton, K. Kutash, & R. Friedman February 28, 1996 (pp. 297–301). Tampa, FL: University of
(Eds.), The 9th Annual Research Conference Proceedings, A South Florida, The Louis de la Parte Florida Mental Health
System of Care for Children’s Mental Health: Expanding the Institute, Research and Training Center for Children’s Mental
Research Base February 26 to February 28, 1996 (pp. 415–420). Health.
Kamradt, B., Kostan, M. J., & Pina, V. (1998). Wraparound Milwaukee: Two year follow-up on the Twenty-Five Kid Project. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 225–228). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developing empirically supported child and adolescent treatments. Journal of Consulting and Clinical Psychology, 66, 19–36.
Kirkman, C., Brunk, M., & Cohen, R. (1999). Determining levels of need for decategorized funding of services for children with emotional and behavior disturbance. In J. Willis, C. Liberton, K. Kutash, & R. Friedman (Eds.), The 11th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, March 8 to March 11, 1997 (pp. 21–26). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Koch, J. R., & Brunk, M. (1998). An outcomes management system for child/adolescent public mental health services. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 359–363). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Lahey, B. B., Pelham, W. E., Schaughency, E. A., Atkins, M. S., Murphy, H. A., Hynd, G., Russo, M., Hartdagen, S., & Lorys-Vernon, A. (1988). Dimensions and types of attention deficit disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 27, 360–365.
Lambert, M. J. (1994). Use of psychological tests for outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 75–97). Hillsdale, NJ: Lawrence Erlbaum.
Lambert, M. J., & McRoberts, C. H. (1993, April). Outcome measurement in JCCP: 1986–1991. Paper presented at the meeting of the Western Psychological Association, Phoenix, AZ.
Lemoine, R. L., & McDermott, B. E. (1998). Assessing levels and profiles of service need using the CAFAS. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 371–375). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Lemoine, R., Speier, T., Ellzey, S., & Pine, J. (1997). Using the Child and Adolescent Functional Assessment Scale (CAFAS) to establish level-of-need for Medicaid managed care services. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 267–270). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Luborsky, L. (1962). Clinicians’ judgments of mental health: A proposed scale. Archives of General Psychiatry, 7, 407–417.
Massey, T., Kershaw, M. A., Armstrong, M., Shepard, J., & Wu, L. (1998). The children’s performance outcome measures: Results after six months. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 353–358). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
McDermott, P. A., & Weiss, R. V. (1995). A normative typology of healthy, subclinical, and clinical behavior styles among American children and adolescents. Psychological Assessment, 7, 162–170.
Newman, F. L. (1980). Global scales: Strengths, uses, and problems of global scales as an evaluation instrument. Evaluation and Program Planning, 3, 257–268.
Newman, F. L., & Ciarlo, J. A. (1994). Criteria for selecting psychological instruments for treatment outcome assessment. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 98–110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newman, F. L., Hunter, R. H., & Irving, D. (1987). Simple measures of progress and outcome in the evaluation of mental health services. Evaluation and Program Planning, 10, 209–218.
Newman, F. L., & Tejeda, M. J. (1996). The need for research that is designed to support decisions in the delivery of mental health services. American Psychologist, 51, 1040–1049.
O’Neal, L., & Wade, P. (1998, March). Use of CAFAS and case reviews for outcome evaluation. Paper presented at the 11th Annual Research Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.
Ogles, B. M., Davis, D., & Lunnen, K. M. (1999). Inter-rater reliability of four measures of youth functioning. In J. Willis, C. Liberton, K. Kutash, & R. Friedman (Eds.), The 11th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, March 8 to March 11, 1997 (pp. 321–326). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Ogles, B. M., Melendez, G., Davis, D. C., & Lunnen, K. M. (1999). The Ohio Youth Problems, Functioning, and Satisfaction Scales: User’s manual. Unpublished manuscript, Ohio University, Athens.
Oliveira, B., Rivera, V. R., Kutash, K., Duchnowski, A. J., & Calvanese, P. K. (1998). The school and community study: Summary of preliminary baseline data. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 141–146). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Pires, S. A. (1997). Lessons learned from the Fort Bragg Demonstration: An overview. In S. A. Pires (Ed.), Lessons learned from the Fort Bragg Demonstration (pp. 1–21). Tampa, FL: University of South Florida, Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Pokorny, L. J. (1991). A summary measure of client level of functioning: Progress and challenges for use within mental health agencies. Journal of Mental Health Administration, 18, 80–87.
Reckase, M. D. (1996). Test construction in the 1990s: Recent approaches every psychologist should know. Psychological Assessment, 8, 354–359.
Reid, R. (1995). Assessment of ADHD with culturally different groups: The use of behavioral ratings scales. School Psychology Review, 24, 537–560.
Rey, J. M., Starling, J., Wever, C., Dossetor, D. R., & Plapp, J. M. (1995). Inter-rater reliability of global assessment of functioning in a clinical setting. Journal of Child Psychology and Psychiatry, 36, 787–792.
Robertson, L. M., Bates, M. P., Wood, M., Rosenblatt, J. A., Furlong, M. J., Casas, J. M., & Schweir, P. (1998). Educational placements of students with emotional and behavioral disorders
served by probation, mental health, public health, and social services. Psychology in the Schools, 35, 333–345.
Rosenblatt, A., Wyman, N., Kingdon, D., & Ichinose, C. (1997). Managing what you measure: Creating outcome driven systems of care for youth with serious emotional disturbance. Unpublished manuscript.
Rosenblatt, J., & Rosenblatt, A. (1999). Academic achievement and mental health functioning: An illusory or realistic relationship? In J. Willis, C. Liberton, K. Kutash, & R. Friedman (Eds.), The 11th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, March 8 to March 11, 1997 (pp. 112–117). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Rosenblatt, J., Robertson, L., Bates, M., Wood, M., Furlong, M. J., & Sosna, T. (1998). Troubled or troubling? Characteristics of youths referred to a system of care without system-level referral constraints. Journal of Emotional and Behavioral Disorders, 6, 42–54.
Rotto, K. L., Sokol, P. I. T., Matthews, B., & Russell, L. (1998, March). A practitioners view of outcomes. Paper presented at the 11th Annual Conference, A System of Care for Children’s Mental Health: Expanding the Research Base, Tampa, FL.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.
Schwartz, A., & Perkins, S. (1997). Criteria used in determining appropriateness of service utilization in Arizona. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 9th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 26 to February 28, 1996 (pp. 270–272). Tampa, FL: University of South Florida, the Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065–1071.
Selby, P. M., Trupin, E. W., McCauley, E., & Vander Stoep, A. (1998). The Prime Time Project: Preliminary review of the first year of a community-based intervention for youth in the juvenile justice system. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 339–344). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Shaffer, D., Gould, M. S., Brasic, J., Ambrosini, P., Fisher, P., Bird, H., & Aluwahlia, S. (1983). A Children’s Global Assessment Scale (CGAS). Archives of General Psychiatry, 40, 1228–1231.
Srebnik, D., Uehara, E., & Smukler, M. (1998). Field test of a tool for level-of-care decisions in community mental health systems. Psychiatric Services, 49, 91–97.
Task Force on Promotion and Dissemination of Psychological Procedures. (1995). Training in and dissemination of empirically-validated psychological procedures: Report and recommendations. Clinical Psychologist, 48, 3–23.
Vermillion, J., & Pfeiffer, S. (1993). Treatment outcomes and continuous quality improvement: Two aspects of program evaluation. Psychiatric Hospital, 24, 9–14.
Walker, R., Minor-Schork, D., Bloch, R., & Eisenhart, J. (1996). High risk factors for rehospitalization within six months. Psychiatric Quarterly, 67, 235–243.
Weissman, M. M., Warner, V., & Fendrich, M. (1990). Applying impairment criteria to children’s psychiatric diagnosis. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 789–795.
Wood, M., Rosenblatt, J. A., Furlong, M. J., Robertson, L. M., Bates, M. P., & Casas, J. M. (1998). Evaluating system of care clinical outcomes by youth risk profiles. In C. Liberton, K. Kutash, & R. Friedman (Eds.), The 10th Annual Research Conference Proceedings, A System of Care for Children’s Mental Health: Expanding the Research Base, February 23 to February 26, 1997 (pp. 407–414). Tampa, FL: University of South Florida, The Louis de la Parte Florida Mental Health Institute, Research and Training Center for Children’s Mental Health.
Zimmerman, D. P. (1996). A comparison of commonly used treatment measures. Residential Treatment for Children and Youth, 13, 49–69.
