Inter-Rater Reliability of Cognitive-Behavioral Case Formulations of Depression: A Replication

Cognitive Therapy and Research, Vol. 23, No. 3, 1999, pp. 271-283
Jacqueline B. Persons1,3 and Andrew Bertagnolli2
We developed a model of cognitive-behavioral case formulation and tested several
hypotheses about therapists' ability to use it to obtain cognitive-behavioral
formulations of cases of depressed patients. We tested whether clinicians, using measures we
developed, could correctly identify patients' overt problems and agree on assessments
of patients' underlying schemas. Clinicians offered cognitive-behavioral formulations
for three cases after listening to audiotapes of initial interviews with depressed women
conducted by the first author in her private practice. Therapists identified 67% of
patients' overt problems. When schema ratings were averaged over five judges,
inter-rater reliability was good (inter-rater reliability coefficients averaged 0.72); single
judges showed poor inter-rater agreement on schema ratings (inter-rater reliability
coefficients averaged 0.37). Providing therapists with a specific context in which to
make ratings did not improve schema agreement. Ph.D.-trained therapists were more
accurate than non-Ph.D.-trained therapists in identifying patients' problems. Most
findings replicated those obtained in an earlier study.
KEY WORDS: case formulation; inter-rater reliability; schemas.
One goal of cognitive-behavior therapy (CBT) is to solve overt problems by
changing cognitions and behaviors. Change in underlying cognitions, or schemas,
is also considered quite important, both in the process of treating overt problems
and to prevent relapse. Therefore, reliable methods for assessing patients' overt
problems and underlying schemas are needed. The importance of case formulation
to the practice of CBT is reflected in the fact that the newest measure of cognitive
therapy adherence includes items intended to assess the therapist's use of an
individualized formulation (Liese, 1995).
Persons (1989, 1993a; Persons & Tompkins, 1997) developed a framework for
conceptualizing cases from a cognitive-behavioral point of view. Cognitive-Behavioral
Case Formulation emphasizes the importance of identifying the patient's
overt problems and specifying the underlying schemas, or core beliefs, that, when
activated by life events, are postulated to cause the overt problems (cf. Beck, Rush,
Shaw, & Emery, 1979).
1 University of California, San Francisco, and San Francisco Bay Area Center for Cognitive Therapy,
Oakland, California.
2 California School of Professional Psychology, Alameda/Berkeley, California.
3 Please direct correspondence to Dr. Persons at the San Francisco Bay Area Center for Cognitive
Therapy, 5435 College Avenue, Oakland, California 94618.
0147-5916/99/0600-0271$16.00/0 © 1999 Plenum Publishing Corporation
The Cognitive-Behavioral (CB) Case Formulation model asks therapists to
make a list of the patient's overt problems; these are concrete difficulties, such as
depressive symptoms, fear of freeway driving, social anxiety, binge eating, legal
problems, financial difficulties, and interpersonal conflicts. Using the CB Case
Formulation model, therapists make a comprehensive problem list, identifying both
the problems the patient asks for help with and others that the patient may
not mention. The need for a comprehensive problem list is based on the notion
that the therapist should know about not only the patient's stated presenting problem
but also other problems that the patient may have but may not spontaneously
report (see also Nezu & Nezu, 1993; Surber, 1994; Turkat & Maisto, 1985). For
example, depressed patients often abuse substances; if the therapist treating a
depressed patient is not aware of the patient's substance abuse, this problem can
undermine the depression treatment. In the present study, we test the hypothesis
that therapists, following brief training that emphasizes the importance of a
comprehensive problem list and provides some guidelines for making a problem list, can
make a comprehensive problem list for a patient.
The cognitive-behavior therapist also identifies schemas, or core beliefs, that
the therapist hypothesizes underpin and cause the overt problems when activated
by life events or situations. In the CB Case Formulation, therapists identify the
patient's views of self, others, and the world. In the present study, we test the
hypothesis that therapists, following some brief training, can agree on ratings of
schemas for a particular patient. We assess whether therapists can agree on schemas
rather than whether their schema ratings are accurate because no criterion measure
of a person's schemas is available.
Few investigators have studied cognitive-behavioral case conceptualization.
Beckham et al. (1984) showed that therapists were 76% accurate in identifying, for
a particular patient (four patients were studied), the underlying schemas chosen
by another team of clinicians as characteristic of that patient. Muran and colleagues
(Muran & Segal, 1992; Muran, Segal, & Samstag, 1994) developed an idiographic
assessment of patients' self-schemas based on the cognitive model; this model
focuses only on the patient's views of self. In an earlier study (Persons, Mooney, &
Padesky, 1991), we found that clinicians usually identified 65% or more of patients'
overt problems, and when ratings were averaged over groups of five judges, reliability
coefficients reflecting agreement on schema ratings averaged .76. Inter-rater reliability
of schema identification was poor for single judges (reliability coefficients averaged .46).
The present study was conducted with the hope of increasing the reliability and
validity ratings obtained in our earlier study. To improve therapists' ability to identify
patients' overt problems, we taught them to consider a specific list of problem domains
when making a problem list, using a list based on work by Nezu and Nezu (1993). The
problem domains were: psychiatric symptoms and problems (e.g., depressive
symptoms, panic attacks); interpersonal problems; work difficulties; financial difficulties;
health problems; housing problems; and recreational difficulties.
To improve schema ratings, we added anchor points to the rating scale and
provided more examples in our teaching. We also offered clinicians some specific
contexts to consider when they made their schema ratings; that is, we asked clinicians
to make schema ratings for a patient who had public speaking anxiety by considering
what the patient's views of self, others, and the world might be in that particular
situation. We predicted that clinicians would be more likely to agree on schema
ratings when ratings were made in a specific context than when no context was
provided. This prediction was based on the notion that the context, which was chosen
because it was problematic for the patient, might provide some initial hypotheses to
clinicians about the types of schemas that are commonly activated in that situation
(e.g., a public speaking situation commonly activates "self" schemas about
inadequacy and "other" schemas about criticism).
What determines a therapist's accuracy in identifying problems and agreement
with other clinicians on schema ratings? The answer to this question has implications
for training and selection of therapists. We expected that clinicians with Ph.D.-level
training might have more specialized training in a wide range of related tasks
and skills, and thus might perform better. We expected that clinicians with previous
training in case formulation of any type might perform better on this task. We also
expected that those with specialized training in cognitive, behavioral, or CB methods
or who use CBT methods more might find the tasks more familiar and easier and might,
therefore, perform better. We expected that clinicians with more experience might
have had more practice with these or similar tasks and might, therefore, perform
better. We collected demographic and training information from the therapists to
test these hypotheses.
In summary, we have developed a model of CB case formulation that calls for
the therapist to identify the patient's overt problems and schemas likely to underlie
those problems. In this study, we tested the hypotheses that, using this model and
the instruments developed here, and following a brief (2½ hours) training, clinicians
can accurately identify patients' overt problems and can agree with one another
on ratings of patients' schemas about themselves, others, and the world. We tested
the hypothesis that therapists would agree more on schemas when making schema ratings
while considering the patient in a specific context than when no context was
provided. We also tested the hypotheses that Ph.D.-level training, training in case
formulation, training in CBT, and clinical experience would improve clinicians'
performance on these tasks.
METHOD
Subjects
Clinician subjects were 47 mental health professionals who participated in a
day-long training/research workshop in CB case formulation conducted by the first
author. Nine subjects were clinicians who attended the workshop when it was given
at the annual convention of the Association for Advancement of Behavior Therapy
in Atlanta, Georgia, in November 1993. Thirty-eight subjects were clinicians who
attended the workshop when it was given at the V.A. Medical Center in Palo
Alto-Menlo Park in July 1994. Forty-seven mental health professionals attended
the Palo Alto session; data from four subjects were discarded because they had no
clinical experience (they were researchers or administrators) and data from five
subjects were discarded because they were incomplete; therefore, thirty-eight
clinicians provided complete data at the Palo Alto site. Because all clinicians received
the same training and provided the same measures, data from the Atlanta and
Palo Alto samples were combined. Demographic and training characteristics of the
clinicians are presented in Table I.
Patient subjects were two depressed and anxious women ("Megan" and "Lisa")
treated by the first author in her private practice. A third case served as a practice
case (this was the first case studied in Persons et al., 1995; "Megan" and "Lisa"
have not been studied before). All patients gave written permission allowing their
therapy sessions to be studied. The practice case was a 23-year-old student who
met Axis I criteria for Major Depression and Generalized Anxiety Disorder. Megan
was a 32-year-old inventory manager at a large department store who was living with
her boyfriend. She met criteria for Major Depression, Dysthymia, and Personality
Disorder NOS (avoidant and passive-aggressive features). Lisa was a 56-year-old
housewife who was living with her husband. She met criteria for Major Depression,
Dysthymia, Social Phobia, Undifferentiated Somatoform Disorder (multiple physical
complaints not fully explained by a known medical condition), Dependent
Personality Disorder, and Avoidant Personality Disorder. Patients are described
more fully in the Results section titled "Obtaining a Criterion Problem List."
Measures
Problem List
Clinicians were asked to list patients' overt problems and to provide a few
words of detail about each problem. Clinicians were given space to list a maximum
of eight problems for each case, in a free-response format.
Table I. Demographic and Training Characteristics of Clinicians (N = 47)
Characteristic                                             Mean or % (SD)
Percent female                                             66.7 a
Highest degree
  Percent Ph.D.                                            44.7
  Percent M.A. or M.S.W.                                   44.7
  Percent B.A.                                             10.6
Percent students                                           12.8
Unlicensed                                                 19.0 a
Percent with previous training in case formulation         63.0
Hours training in CBT case formulation                     173.9 (589.5) a
Hours training in cognitive therapy (CT), behavior
  therapy (BT), or CBT                                     1290.9 (3206.2) a
Hours/week doing CBT                                       6.0 (8.1) b
Years of clinical experience                               10.0 (7.8)
a n = 42.
b n = 45.
Schemas
A multiple-choice questionnaire assessed clinicians' judgments about each
patient's views of self, others, and the world. The questionnaire listed 15 adjectives
for each of the client's views of self, others, and the world. Clinicians were asked,
"Please rate the strength of (patient's pseudonym)'s belief in each item using this
scale from 0 to 10," where the 0 point on the scale was labeled "no belief" and 10
was labeled "very strong belief."
Adjectives describing self, others, and world were as follows. Self: defective;
wonderful; passive; special; weak, fragile; strong; inadequate; entitled; unimportant;
no good; responsible for others; bad; incompetent; unable to cope on my own;
undeserving, unworthy. Others: unsupportive; strong; weak; supportive, helpful;
dominating, controlling; important; critical; abusive; abandoning; treating me
unfairly; unavailable; stupid; passive; unconcerned about me; self-centered. The world:
bad; predictable; cruel; benevolent; dangerous; malevolent; overwhelming; negative;
unfair; unpredictable; empty, purposeless; potentially catastrophic; fulfilling;
unrewarding; challenging. These items were selected from a larger set of items used by
the first author in her formulations in a set of approximately 50 cases of depressed
outpatients treated in her practice and from items used in a previous study (Persons
et al., 1995).
Clinicians provided three sets of schema ratings for Megan and Lisa. Clinicians
rated these patients' schemas without any context instructions and in two specific
contexts. (Clinicians were not given any context instructions for the practice case.)
The two specific contexts for Megan were: "When Megan is at work, functioning
as a manager" and "When Megan is interacting with her boyfriend." The two
contexts for Lisa were: "When Lisa is in a public-speaking situation" and "When
Lisa is interacting with her husband."
Demographics and Training
A brief questionnaire asked clinicians to provide information about
demographic characteristics, training, and clinical experience.
Procedure
In the morning of the workshop day, the first author presented didactic material
on CB Case Formulation. Next, to practice the formulation process, clinicians
listened to an audiotape of the first 12 minutes of an initial session conducted with
a practice case by the first author and completed the case formulation measures
described previously. Then the first author provided some feedback about the case
and the formulation.
In the afternoon, clinicians listened to audiotapes of two initial sessions (Megan
and Lisa) of CBT conducted by the first author and completed the case formulation
measures described previously. Audiotapes were edited to delete identifying
information, segments in which the interviewer summarized the problem list or
formulation, and redundancies; each audiotaped segment was about 35 minutes long. When
listening to the audiotape, raters also had a typed transcript of the audiotape.
After receiving some feedback about the cases, participants completed
demographic and workshop evaluation questionnaires.
RESULTS
We tested four hypotheses: (1) clinicians can accurately identify patients' overt
problems; (2) clinicians can agree with one another on ratings of schemas
underpinning a patient's overt problems; (3) clinicians agree more on schema ratings when
ratings are made in a specific context than when no context is provided; (4) clinicians
with Ph.D.-level training, training in case formulation, training in CBT, or more
clinical experience perform better on these tasks than those without Ph.D.-level
training, with less training in case formulation, less CBT training, and with less
experience.
Identification of Overt Problems
To test the hypothesis that clinicians can accurately identify patients' overt
problems, we calculated the proportion of clinicians who recognized the problems
listed on a criterion problem list for each case.
Obtaining Criterion Problem Lists
The criterion problem list for the practice case was developed in a previous
study (Persons et al., 1995) and was based on judgments of two experts (J. Persons
and K. Mooney). Criterion problem lists for the cases of "Megan" and "Lisa" were
developed by three clinicians (the authors of the present study and a graduate
student). Information used to develop the criterion problem lists included the first
author's extensive knowledge of the cases based on her treatment of both patients
and pilot work in which six therapists in training and nine practicing clinicians
provided problem lists for both cases.
Criterion Problem Lists
The criterion problem list for the practice case had three items (family
problems, guilt, and social isolation). This list was shorter than the list for the other
cases because it was based on only the first 12 minutes of the initial interview rather
than on the entire interview, as was done for the cases of Megan and Lisa.
The criterion problem list for Megan had eight items: work difficulties;
difficulties in relationship with boyfriend; depression/anxiety; "escaping," procrastination;
not pursuing creative interests; difficulties in relationships with friends; smoking;
avoiding driving. The criterion problem list for Lisa had six items: fatigue, frequent
illnesses; depression/resentment; generalized anxiety; social anxiety; marital
difficulties; interpersonal difficulties (unassertiveness, conflict).
Scoring Clinicians' Problem Lists
For each problem on each criterion list, clinicians received a score of 1 if their
problem list included that problem and 0 if it did not. Scoring was generous;
clinicians received a score of 1 if the problem in question occurred anywhere (even
as a subproblem of another problem or as an aside) on the clinician's problem list.
The decision rule used by raters to determine whether the clinician had recognized
the problem was: "If I were supervising this clinician with this case, would I feel
that the clinician was 'getting' the problem?"
Inter-rater reliability of raters' scoring of clinicians' problem lists was high.
For the Atlanta sample, the two authors scored the problem lists for Lisa for the
first four subjects and then compared ratings and refined their scoring criteria. Then
they scored all the remaining subjects' problem lists for all cases; the raters agreed
on 87% of those ratings. For the Palo Alto sample, the two authors scored the problem
lists for all three cases provided by six randomly selected clinician subjects. The
two judges agreed on 93% of ratings and therefore the second author scored the
problem lists for the remaining clinician subjects.
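The percent-agreement figures for the two scorers can be illustrated with a minimal sketch. The paper does not report item-level scores, so the two score lists below are hypothetical, and the function name `percent_agreement` is an assumption introduced only for illustration.

```python
def percent_agreement(rater_a, rater_b):
    """Fraction of 0/1 scoring decisions on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


# Hypothetical scores (1 = problem recognized, 0 = not) given by each author to
# one clinician's problem list, for the eight criterion problems on Megan's list.
first_author = [1, 1, 0, 1, 0, 1, 1, 0]
second_author = [1, 1, 0, 1, 1, 1, 1, 0]

agreement = percent_agreement(first_author, second_author)
print(f"Agreement: {agreement:.1%}")
```

Simple percent agreement of this kind does not correct for agreement expected by chance.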
Clinicians' Identification of Criterion Problems
The practice case had three problems, Megan had eight, and Lisa six, for a
total of 17 problems across all three cases. On average, clinicians rated 16.23 (SD =
2.10) problems (a few clinicians did not rate one of the cases). On average, of the
16.23 problems they rated, clinicians correctly identified 10.94 (SD = 2.59) problems.
Of the problems rated, the average percentage correctly identified was 67.46%
(SD = 13%). Thus, clinicians correctly identified about two-thirds of the problems
they rated.
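The per-clinician accuracy score described above can be sketched as follows. The criterion list sizes (3, 8, and 6) come from the text; the two clinicians' 0/1 scores are hypothetical, and the names `CRITERION_COUNTS` and `percent_identified` are assumptions made for illustration.

```python
# Number of criterion problems per case, as reported in the text.
CRITERION_COUNTS = {"practice": 3, "Megan": 8, "Lisa": 6}


def percent_identified(scores_by_case):
    """Return (problems rated, problems identified, percent identified).

    scores_by_case maps a case name to a list of 0/1 scores, one per criterion
    problem. A case the clinician did not rate is omitted, so the denominator
    is the number of problems the clinician actually rated.
    """
    rated = 0
    identified = 0
    for case, scores in scores_by_case.items():
        if len(scores) != CRITERION_COUNTS[case]:
            raise ValueError(f"{case}: expected {CRITERION_COUNTS[case]} scores")
        rated += len(scores)
        identified += sum(scores)
    return rated, identified, 100.0 * identified / rated


# Hypothetical clinician who rated all three cases.
full = {
    "practice": [1, 1, 0],
    "Megan": [1, 1, 1, 0, 0, 1, 1, 0],
    "Lisa": [1, 0, 1, 1, 1, 0],
}
# Hypothetical clinician who did not rate the practice case.
partial = {
    "Megan": [1, 0, 1, 1, 0, 1, 0, 1],
    "Lisa": [1, 1, 1, 0, 1, 1],
}

for label, scores in (("full", full), ("partial", partial)):
    rated, identified, pct = percent_identified(scores)
    print(f"{label}: {identified}/{rated} problems identified ({pct:.1f}%)")
```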
Inter-Rater Reliability of Schema Ratings
To test the hypothesis that clinicians can agree on schema ratings, we assessed
inter-rater reliability of schema ratings by calculating intraclass correlation
coefficients (ICC; Shrout & Fleiss, 1979) separately for each case, for each category of
schema (views of self, other, and world), and for each context. The ICC is essentially
the proportion of variance in ratings due to "targets" divided by the sum of the
proportion of variance due to "targets" plus the proportion of variance due to
"judges." If the proportion of variance due to
judges is low and the proportion of variance due to targets is high, then the ICC
approaches 1 and inter-judge reliability is high.
In our analyses, the "judges" were the clinicians who provided ratings, and
the "targets" were not individuals, as in the usual ICC computation; instead, targets
were the individual items in each category (views of self, other, world). For example,
the "targets" in the ICC analysis for "self" are the 15 adjectives that describe the
self. Our ICC, thus, is essentially a ratio of the proportion of variance in ratings
due to "items" divided by the sum of the proportion of variance due to "items"
plus the proportion of variance due to "judges." If the variance due to judges is
low and the variance due to items is high, then the inter-rater reliability is high. If
the variance due to judges is high, then the inter-rater reliability is low.
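A minimal sketch of an items-by-judges ICC follows. The text does not say which Shrout and Fleiss form was used; the code assumes the two-way random-effects forms, ICC(2,1) for a single judge and ICC(2,k) for the mean of all k judges in the matrix, which is one reading of "a single, random judge." The function name is hypothetical, and the data are the six-target, four-judge worked example from Shrout and Fleiss (1979), not ratings from this study.

```python
def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a targets-by-judges rating matrix.

    ratings: list of rows, one row per target (here, per schema item);
    each row holds one rating per judge.
    """
    n = len(ratings)       # number of targets (items)
    k = len(ratings[0])    # number of judges
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_judges

    bms = ss_targets / (n - 1)             # between-targets mean square
    jms = ss_judges / (k - 1)              # between-judges mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    average = (bms - ems) / (bms + (jms - ems) / n)
    return single, average


# Worked example data from Shrout & Fleiss (1979): rows are targets, columns are judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

single_icc, average_icc = icc_two_way_random(example)
print(f"single judge: {single_icc:.2f}, mean of 4 judges: {average_icc:.2f}")
```

In the study's setup, each row would be one of the 15 adjectives in a category and each column one clinician's 0 to 10 rating.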
The repeated-measures analysis of variance that underlies the ICC
computations assumes independence from target to target. With items replacing
targets, it is likely that there is some correlation among items. However, the
independence assumption is needed only for statistical inference, and we are using the ICC
here as a descriptive statistic. The method used here is also the method adopted
by the Mount Zion researchers (Curtis et al., 1988; Rosenberg et al., 1986) and in
our own previous work (Persons et al., 1995).
Table II presents ICCs for each case and type of rating for a single, random
judge and for a mean of a random sample of five judges. The ICC for a single judge
is the estimated ratio of variance due to targets to the sum of variance due to
targets and judges (even though it is for a single judge). When more than one judge
is used, the variance due to judges goes down, so reliability goes up. To say this
another way: As the number of judges upon which a rating is based increases (from
one to five), the reliability of the rating increases (Horowitz et al., 1989). We chose
the figure five because clinical meetings held to discuss and formulate a case might
involve a group of that size.
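The gain in reliability from averaging over judges can be sketched with the Spearman-Brown formula. The text does not state how the five-judge values were computed, so this is an assumption; it is offered because the formula reproduces the practice-case five-judge values in Table II to two decimals. The function name `spearman_brown` is introduced here for illustration.

```python
def spearman_brown(single_judge_reliability, n_judges):
    """Projected reliability of the mean rating of n_judges judges."""
    r = single_judge_reliability
    return n_judges * r / (1 + (n_judges - 1) * r)


# Single-judge ICCs for the practice case, copied from Table II.
practice_single = {"self": 0.35, "others": 0.55, "world": 0.34}

practice_five = {
    view: round(spearman_brown(r, 5), 2) for view, r in practice_single.items()
}
print(practice_five)
```

For other rows of Table II the projected value can differ from the tabled value by 0.01, presumably because the tabled single-judge ICCs are themselves rounded.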
As Table II shows, inter-rater reliability coefficients were good for five judges
(ranging from 0.44 to 0.91 and averaging 0.72) and poor for single judges (ranging
from 0.13 to 0.66 and averaging 0.37). These figures were very similar to those
obtained in a previous study of the practice case (inter-rater reliability coefficients
averaged 0.46 for single judges and 0.80 when averaged over five judges).
Effects of Context on Schema Agreement
To test the hypothesis that inter-rater agreement would be higher when specific
contexts were provided than when they were not, an analysis of variance using
z-transformed ICC values was computed. Independent variables were CASE (practice,
Megan, Lisa), VIEW (self, other, world), and CONTEXT (specific context,
no context). An interaction variable for VIEW × CONTEXT was also entered in
the model. The overall R-square of the model was 0.59; none of the independent
variables or the interaction effect was statistically significant at the p < .05 level.
Thus, contrary to prediction, raters did not agree more often on schema ratings when
specific context ratings of schemas were made than when no context was provided.
Table II. Inter-Rater Reliability for Clinicians' (N = 47) Judgments of Schemas of
Self, Other, and World for Three Cases in General and Specific Contexts
                                                Single judge   Five judges
Practice Case
  Views of self                                 0.35           0.73
  Views of others                               0.55           0.86
  Views of world                                0.34           0.72
Megan
  Views of self: general                        0.50           0.83
  Views of self: manager context                0.28           0.66
  Views of self: boyfriend context              0.25           0.63
  Views of others: general                      0.24           0.61
  Views of others: manager context              0.17           0.51
  Views of others: boyfriend context            0.35           0.73
  Views of world: general                       0.31           0.70
  Views of world: manager context               0.20           0.56
  Views of world: boyfriend context             0.33           0.71
Lisa
  Views of self: general                        0.55           0.86
  Views of self: public speaking context        0.66           0.91
  Views of self: husband context                0.38           0.75
  Views of others: general                      0.38           0.75
  Views of others: public speaking context      0.39           0.76
  Views of others: husband context              0.62           0.89
  Views of world: general                       0.37           0.75
  Views of world: public speaking context       0.40           0.77
  Views of world: husband context               0.13           0.44
Note: Intraclass correlation coefficients (Shrout & Fleiss, 1979) are presented.
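The direction of the context effect can be inspected descriptively from Table II with a short sketch. This does not reproduce the analysis of variance: it ignores CASE and VIEW, omits the practice case (which had no context ratings), and assumes that the single-judge ICCs are the values being z-transformed, which the text does not specify. The function name `mean_fisher_z` is hypothetical.

```python
import math

# Single-judge ICCs for Megan and Lisa, copied from Table II.
GENERAL = [0.50, 0.24, 0.31,   # Megan: self, others, world
           0.55, 0.38, 0.37]   # Lisa: self, others, world
SPECIFIC = [0.28, 0.25, 0.17, 0.35, 0.20, 0.33,   # Megan: manager, boyfriend contexts
            0.66, 0.38, 0.39, 0.62, 0.40, 0.13]   # Lisa: public speaking, husband contexts


def mean_fisher_z(values):
    """Average coefficients on Fisher's z scale, then transform back."""
    z_mean = sum(math.atanh(v) for v in values) / len(values)
    return math.tanh(z_mean)


general_mean = mean_fisher_z(GENERAL)
specific_mean = mean_fisher_z(SPECIFIC)
print(f"no context: {general_mean:.2f}, specific context: {specific_mean:.2f}")
```

Under these assumptions the average single-judge coefficient is not higher with a specific context, which is in line with the reported absence of a context effect.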
Effects of Training and Experience
Problem Identification
We tested the hypothesis that clinicians with Ph.D.-level training, those with
more training in case formulation, those with more training in CBT, or those with
more clinical experience identify problems more accurately than clinicians without
Ph.D.-level training, with less training in case formulation, with less CBT training,
or with less experience. To test this hypothesis, we conducted a multiple regression
analysis, in which the dependent variable was the logit-transformed proportion of
problems correctly identified and the independent variables were: Ph.D. (coded
0 = no or 1 = yes), prior training in case formulation (coded 0 = no or 1 = yes), hours
of training in CB case formulation, hours of training in CBT, hours of weekly CBT provided,
and years of clinical experience.
Results of this analysis for an N of 38 subjects show that the overall model is
statistically significant (R-square = 0.34, p = 0.034) and only one independent
variable, Ph.D.-level training, was statistically significant (p = 0.019). The residuals
of this model were normally distributed (p = 0.46).
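The logit transformation of the dependent variable can be sketched as below. It maps a proportion between 0 and 1 onto an unbounded scale, which suits linear regression better than the raw proportion. The text does not say how proportions of exactly 0 or 1 were handled, so the sketch simply rejects them; the function names are assumptions for illustration.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is undefined for proportions of exactly 0 or 1")
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# Average proportion of rated problems correctly identified, from the text (67.46%).
mean_proportion = 0.6746
transformed = logit(mean_proportion)
print(f"logit({mean_proportion}) = {transformed:.3f}")
```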
Inter-Rater Reliability of Schema Ratings
We tested the hypotheses that therapists with Ph.D.-level training, with more
training in case formulation, more training in CBT, or with more years of experience
were more likely to agree with one another on schema ratings than those without
Ph.D.-level training, with less training in case formulation, less training in CBT, or
with fewer years of experience. To do this, we began by calculating, for each judge,
for each view (self/other/world), for each context, and for each case, an "agreement
index." The "agreement index" is a correlation (Pearson product-moment
correlation) between a particular judge's rating and the average of the other judges' ratings,
divided by the average correlation among all judges. Average correlations are
computed after transforming using Fisher's Z transformation. This method is
recommended by Williams (1976) and we used it in an earlier study (Persons et al., 1995).
We used the "agreement index" rather than the ICC because the ICC for a single
judge provides information about the degree to which a single judge agrees, on
average, with any other single judge; however, in order to examine predictors of
inter-rater reliability, we needed a figure that would measure the degree to which
the ratings of each particular judge agreed with the ratings of all of the other judges
in the sample.
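One reading of the agreement index can be sketched as follows. The sketch assumes that "the average correlation among all judges" means the mean of all pairwise Pearson correlations, averaged on Fisher's z scale and transformed back; the text does not spell this out. The ratings are hypothetical, and the function names are introduced only for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation (assumes neither list is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def agreement_indices(ratings_by_judge):
    """Return one agreement index per judge.

    ratings_by_judge: list of rating lists, one per judge, over the same items.
    Assumes no pair of judges correlates exactly +1 or -1, because Fisher's z
    is undefined there.
    """
    # Average pairwise correlation among all judges, computed on the z scale.
    pair_z = [math.atanh(pearson(a, b)) for a, b in combinations(ratings_by_judge, 2)]
    mean_r = math.tanh(sum(pair_z) / len(pair_z))

    indices = []
    for i, own in enumerate(ratings_by_judge):
        others = [r for j, r in enumerate(ratings_by_judge) if j != i]
        others_mean = [sum(vals) / len(vals) for vals in zip(*others)]
        indices.append(pearson(own, others_mean) / mean_r)
    return indices


# Hypothetical 0-10 ratings of five schema items by three judges.
judges = [
    [8, 6, 7, 2, 1],   # judge 0
    [7, 7, 6, 3, 2],   # judge 1, close to judge 0
    [3, 5, 2, 6, 4],   # judge 2, who rates the items quite differently
]

indices = agreement_indices(judges)
for i, value in enumerate(indices):
    print(f"judge {i}: agreement index {value:.2f}")
```

Dividing by the average correlation puts each judge's agreement on a scale relative to the typical agreement in the group, so the dissenting judge receives a much lower index than the other two.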
A multiple regression was conducted using the "agreement index" as the
dependent variable and the same independent variables as in the previous analysis.
The overall model was not statistically significant (R-square = 0.152, p = 0.49), and
residuals were normally distributed (p = 0.35). None of the independent variables was
statistically significant at the p < .05 level. Thus, none of the demographic or training
variables predicted clinicians' tendency to agree with the other clinicians on
schema ratings.
DISCUSSION
Clinician raters identified, on average, about two-thirds of patients' overt
problems. This figure is at first blush a bit disappointing. However, it proves to be quite
a bit superior to the figures obtained by other investigators. Hay et al. (1979) studied
problem areas rated by four interviewers, each of whom interviewed the same four
clients. The mean rate of agreement between interviewers on the presence of
specific problem areas was .55 [rate of agreement = agreements/(agreements +
disagreements)]. Wilson and Evans (1983) reported that 38.6% of judges selected the
most commonly agreed-upon priority target behavior when they reviewed written
descriptions of three cases of child psychopathology; a somewhat higher figure
(48.2%) was obtained when the proportion of judges identifying the patient's six
problems was calculated.
The problem identification rate we obtained in this study is similar to the rates
reported in our earlier study (Persons et al., 1995). Although in this study we taught
clinicians to consider a list of problem domains, this proved insufficient to increase
the problem identification rate. Clinicians might be more accurate at problem
identification if they completed a checklist of problem domains when assessing the
patient, or patients themselves might be asked to complete such a checklist. The
closest available measure of this sort that we are aware of is the Quality of Life
Inventory developed by Frisch (1992). The Quality of Life Inventory is a self-report
measure that asks individuals to rate their satisfaction in 16 life domains. A limitation
of the Quality of Life Inventory is that it measures satisfaction, not functioning.
The importance of comprehensive problem identification and assessment is
supported by the work of Linehan (1993); her manual for treating parasuicidal
women with borderline personality disorder stresses identification of the full range
of these patients' overt problems. Miranda (1995) also reported that assessment
and treatment of the multiple problems of disadvantaged depressed medical patients
produced better outcomes than treatment focused solely on depressive symptoms.
Thus, measures of presenting problems are urgently needed.
Inter-rater agreement of clinicians’ ratings of patients’ schemas was good when
ratings were averaged over five judges (mean inter-rater reliability coefficient of
0.72) but poor when single judges were considered (mean coefficient of 0.37). It is well
known that averaging ratings over multiple judges produces higher agreement than
is obtained when single judges are examined (cf. Horowitz et al., 1989). This finding suggests
that clinicians can benefit from consulting with one another when formulating
schema hypotheses about their patients. Consultation with the patient is also useful
to enhance reliability (and collaboration).
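The gain from averaging over judges can be illustrated with the Spearman-Brown formula, which predicts the reliability of a mean of k judges from the reliability of a single judge. The sketch below is an illustration only and was not part of the analyses reported here; it assumes that the mean single-judge coefficient (0.37) can be treated as one representative value, and the function name is hypothetical.

```python
def spearman_brown(single_judge_reliability: float, n_judges: int) -> float:
    """Predicted reliability of the mean rating of n_judges judges.

    Spearman-Brown formula: r_k = k * r / (1 + (k - 1) * r),
    where r is the single-judge reliability and k the number of judges.
    """
    r = single_judge_reliability
    return n_judges * r / (1 + (n_judges - 1) * r)


# Illustrative input: the mean single-judge coefficient reported in this study.
predicted_five_judges = spearman_brown(0.37, 5)
print(round(predicted_five_judges, 2))
```

Under these assumptions the predicted five-judge value is approximately 0.75, close to the observed mean of 0.72; exact agreement is not expected, because the reported figures are means of coefficients across several schema ratings.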
Judges did not agree more often when rating schemas in a specific context
than when no context was provided. Why not? And we were not able to improve
inter-rater reliability of schema ratings over our earlier study (Persons et al., 1995).
How can this be done? We address these two questions together.
To improve inter-rater reliability of schema ratings, we propose that teachers
must list, very explicitly, the typical schemas of patients who have particular
presenting problems that occur in particular situations. If this were done, clinicians
presented with those presenting problems and situations could agree more often
on schema ratings. We speculate that the relatively good inter-rater reliabilities
obtained for the Plan Formulation method (Curtis, Silberschatz, Sampson, & Weiss,
1994; Rosenberg et al., 1986) are due at least in part to the fact that the theory
underlying the method clearly states how to conceptualize the case (the theory
states that patients’ problems arise from survival or separation guilt relating to
parental figures).
One training variable, earning a Ph.D., predicted clinicians’ ability to identify
presenting problems. We did not obtain this result in our earlier study; therefore,
this finding deserves replication before it can be accepted without reservation. The
link between Ph.D.-level training and identification of presenting problems is not
straightforward, and the variable Ph.D.-level training most likely serves as a proxy
for a number of other factors, possibly including training in diagnostic and
psychological assessment of all types.
The present study has several limitations. Although a strength of the study is
that it examines data collected in a “real world” clinical setting, the study does not
completely reflect some of the processes that “real world” clinicians use to formulate
cases. Raters had access to transcripts in addition to the audiotape material;
therefore, if they paged backward or forward in the transcript, they processed material
differently from the way it is done in a therapy session, when material must be
processed in the sequence in which it is received from the patient. Audiotape
material does not provide therapists with the visual cues that are useful in assessing
patients’ problems and schemas, particularly interpersonal ones. In addition, as
they formulated the case, therapists were required to follow the interview sequence
pursued by the interviewer rather than asking the questions that would have allowed
them to develop and test their own clinical hypotheses. The three patients studied
were all female and were selected because good audiotapes were available, the
patients gave permission to be studied, and the cases seemed relatively
straightforward. Clinicians were a convenience sample. As a result, findings of this study do
not necessarily generalize to other patients and clinicians.
Overall, clinicians were moderately good at identifying presenting problems
and proposing schema hypotheses. An important next step in this line of work is
the demonstration that an accurate and reliable individualized formulation
contributes to treatment outcome. Some early studies of this question have been
disappointing. A study by Emmelkamp, Bouman, and Blaauw (1994) found no outcome
superiority for patients who were treated via an individualized formulation-driven
treatment as compared to a standardized treatment, and a study by Schulte, Kunzel,
Pepping, and Schulte-Bahrenberg (1992) found that standardized treatment was
superior to individualized treatment. Certainly the Emmelkamp et al. (1994) result
might be accounted for by the low power of the study, and both results may be
accounted for in part by the fact that patients in these two studies had relatively
homogeneous problems. Perhaps an individualized case formulation is particularly
important in the treatment of patients with multiple problems. Nevertheless, this
has not been shown as yet, and thus findings to date do not provide strong support
for the importance to outcome of an individualized formulation. More encouragement
can be obtained from the findings that depressed patients whose underlying schemas
were effectively treated relapsed less often than patients who ended acute treatment
for depression with high levels of dysfunctional schemas (Blackburn, Eunson, &
Bishop, 1986; Evans et al., 1992; Simons, Murphy, Levine, & Wetzel, 1986). These
findings remind us that attention to underlying core schemas may contribute more
to relapse prevention than to the outcome of acute treatment.
ACKNOWLEDGMENTS
We thank the patient subjects for giving permission to study their cases and
the clinicians for their time. We thank the Association for Advancement of Behavior
Therapy, and particularly Mary Jane Eimer, for allowing data to be collected at
the convention in 1993, and the Palo Alto Veterans Administration, particularly
Antonette Zeiss and Jacqueline Becker, for allowing data to be collected there.
We thank Michelle Hatzis and Joan Davidson for participating in the research
seminar that guided this work, Bert Epstein for assisting with data collection in
Palo Alto, and Alan Bostrom for statistical assistance. This work was supported
by grant MH50367 from the National Institute of Mental Health.
REFERENCES
Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New
York: Guilford.
Blackburn, I. M., Eunson, K. M., & Bishop, S. (1986). A two-year naturalistic follow-up of depressed
patients treated with cognitive therapy, pharmacotherapy and a combination of both. Journal of
Affective Disorders, 10, 67-75.
Curtis, J. T., Silberschatz, G., Sampson, H., Weiss, J., & Rosenberg, S. E. (1988). Developing reliable
psychodynamic case formulations: An illustration of the Plan Diagnosis method. Psychotherapy,
25, 256-265.
Curtis, J. T., Silberschatz, G., Sampson, H., & Weiss, J. (1994). The Plan Formulation Method.
Psychotherapy Research, 4, 197-207.
Emmelkamp, P. M. G., Bouman, T. K., & Blaauw, E. (1994). Individualized versus standardized therapy:
A comparative evaluation with obsessive-compulsive patients. Clinical Psychology and
Psychotherapy, 1, 95-100.
Evans, M. D., Hollon, S. D., DeRubeis, R. J., Piasecki, J. M., Grove, W. M., Garvey, M. J., & Tuason,
V. B. (1992). Differential relapse following cognitive therapy and pharmacotherapy for depression.
Archives of General Psychiatry, 49, 802-808.
Frisch, M. B., Cornell, J., Villanueva, M., & Retzlaff, P. J. (1992). Clinical validation of the Quality of
Life Inventory: A measure of life satisfaction for use in treatment planning and outcome assessment.
Psychological Assessment, 4, 92-101.
Hay, W. M., Hay, L. R., Angle, H. V., & Nelson, R. O. (1979). The reliability of problem identification
in the behavioral interview. Behavioral Assessment, 1, 107-118.
Horowitz, L. M., Rosenberg, S. E., Ureno, G., Kalehzan, B. M., & O’Halloran, P. (1989). Psychodynamic
formulation, consensual response method, and interpersonal problems. Journal of Consulting and
Clinical Psychology, 57, 599-606.
Liese, B. S. (1995). Cognitive Therapy Adherence Scale (CTAS). Unpublished manuscript.
Linehan, M. M. (1993). Cognitive-behavioral treatment of borderline personality disorder. New York:
Guilford.
Miranda, J. (1995). Treatment of depression for disadvantaged medical patients. Paper presented at
Society for Psychotherapy Research, Vancouver, British Columbia, Canada, June 22-25, 1995.
Muran, J. C., & Segal, Z. V. (1992). The development of an idiographic measure of self-schemas: An
illustration of the construction and use of self-scenarios. Psychotherapy, 29, 524-535.
Nezu, A. M., & Nezu, C. M. (1993). Identifying and selecting target problems for clinical interventions:
A problem-solving model. Psychological Assessment, 5, 254-263.
Persons, J. B. (1989). Cognitive therapy in practice: A case formulation approach. New York: Norton.
Persons, J. B. (1993a). Case conceptualization in cognitive-behavior therapy. In K. T. Kuehlwein & H.
Rosen (Eds.), Cognitive therapy in action: Evolving innovative practice (pp. 33-53). San Francisco:
Jossey-Bass.
Persons, J. B. (1993b). Outcome of psychotherapy for unipolar depression. In T. R. Giles (Ed.), Handbook
of effective psychotherapy (pp. 305-323). New York: Plenum.
Persons, J. B., Mooney, K. A., & Padesky, C. A. (1995). Inter-rater reliability of cognitive-behavioral
case formulations. Cognitive Therapy and Research, 19, 21-34.
Persons, J. B., & Tompkins, M. A. (1997). Cognitive-behavioral case formulation. In T. D. Eells (Ed.),
Handbook of psychotherapy case formulation. New York: Guilford.
Rosenberg, S. E., Silberschatz, G., Curtis, J. T., Sampson, H., & Weiss, J. (1986). A method for establishing
reliability of statements from psychodynamic case formulations. American Journal of Psychiatry,
143, 1454-1456.
Schulte, D., Kunzel, R., Pepping, G., & Schulte-Bahrenberg, T. (1992). Tailor-made versus standardized
therapy of phobic patients. Advances in Behaviour Research and Therapy, 14, 67-92.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological
Bulletin, 86, 420-428.
Simons, A. D., Murphy, G. E., Levine, J. L., & Wetzel, R. D. (1986). Cognitive therapy and
pharmacotherapy for depression. Archives of General Psychiatry, 43, 43-49.
Surber, R. W. (Ed.). (1994). Clinical case management: A guide to comprehensive treatment of serious
mental illness. Thousand Oaks, CA: Sage Publications.
Turkat, I. D., & Maisto, S. A. (1985). Personality disorders: Application of the experimental method
to the formulation and modification of personality disorders. In D. H. Barlow (Ed.), Clinical
handbook of psychological disorders: A step-by-step treatment manual (pp. 502-570). New York:
Guilford.
Williams, G. W. (1976). Comparing the joint agreement of several raters with another rater. Biometrics,
32, 619-627.
Wilson, F. E., & Evans, I. A. (1983). The reliability of target-behavior selection in behavioral assessment.
Behavioral Assessment, 5, 15-32.
