Benchmarking performance indices: pitfalls and solutions
John Maleyeff
The Authors
John Maleyeff, Lally School of Management and Technology, Rensselaer Polytechnic
Institute, Hartford, Connecticut, US
Abstract
Many organizations include benchmarking as a component of their performance
management system. Often, a performance index is used to quantify the ability of an
organizational entity to operate successfully. Benchmarking a performance index is done
inappropriately when statistical methods are employed that ignore sample size effects or
use aggregate performance data over a period during which changes occurred within the
organization. Benchmarking will also be ineffective when invalid targets are employed.
When benchmarking is done incorrectly, customer satisfaction may actually decline due
to gaming and poor morale among employees. Based on the philosophy of W. Edwards
Deming, the techniques of statistical process control (SPC), and basic undergraduate
statistics, a system is described for effectively benchmarking a performance index.
Examples are presented to illustrate the pitfalls that exist in many performance
management systems and to explain the system presented for effective benchmarking.
Article type: Theoretical with application in practice.
Keywords: Benchmarking, Statistics, Statistical process control, Quality, Performance,
Deming.
Content Indicators: Research Implications** Practice Implications*** Originality**
Readability**
Introduction
Many organizations include some form of benchmarking process as an integral part of
their performance management system. Such benchmarking is often performed based on
internal quality system requirements (e.g. six-sigma or total quality management (TQM)).
Benchmarking may be used, for example, to identify those manufacturing cells that
achieve consistently higher yields than cells making similar products, or to identify call
center operators who perform better or worse than their peers. External requirements
also motivate the need for the development of a performance benchmarking system.
These motivations may stem from competitive pressures or from requirements imposed
by certification or accreditation systems, governmental regulations, or customers.
For example, in the USA, The Joint Commission on Accreditation of Healthcare
Organizations (JCAHO) requires a benchmarking process for healthcare facilities.
With the exception of manufacturing entities, where measurement data are generally
used to analyze performance relative to design or customer specifications, performance
indices in the form of a ratio or proportion are very common in most non-manufacturing
organizations. For example, the technology employed by call centers may automatically
record the abandonment rate of calls received, expressed as the ratio of the number of
callers that hung up while on hold to the number of callers. A hospital may record the
mortality rate for a certain disease category (ratio of the number of deaths to the number
of patients). In manufacturing, this type of performance index is also used at times. A
common example is yield (number of parts conforming to specifications divided by the
number of parts manufactured).
This paper extends the work of Walsh (2000), who addressed the development of targets
and how to use statistical methods, including SPC, to assess performance. However,
Walsh presented methods of analysis that required a performance measure rather than a
performance index. Methods used to evaluate measurements are typically not
appropriate when dealing with an index and, in fact, are often misapplied in these cases.
This paper includes two main components. First, key concepts that involve the analysis of
a performance index are provided and pitfalls of some commonly used statistical
approaches are discussed. Second, a system is outlined that provides an effective
mechanism for benchmarking organizational entities based on a performance index.
Literature review
Performance benchmarking is the merging of two methodologies, benchmarking and
performance management. Benchmarking has been defined as "the search for and the
implementation of best practices" (Camp, 1995, p. 15), and includes the benchmarking of
products and services, business processes, and performance measures. The goal of
benchmarking performance measures is "to establish and validate objectives for the vital
few performance measures that guide the organization" (Camp, 1995, p. 16). Four types
of benchmarking exist:
internal, based on entities within the same organization;
industry, based on entities in the same type of business;
competitive, based on direct competitors; and
process, based on dissimilar companies employing similar processes (Elmuti and
Kathawala, 1997).
When data are used as part of a benchmarking process, the focus of the analysis is on
the "gaps" between the organization's data and the benchmark standard, often without
regard for random variations (Camp, 1995, Chapter 5).
Performance management has been defined as "the use of performance measurement
information to effect positive change in organizational culture, systems and processes, by
helping to set agreed upon performance goals, allocating and prioritizing resources,
informing managers to either confirm or change current policy or program directions to
meet these goals, and sharing results of performance in pursuing those goals"
(Procurement Executives' Association, 1999). Authors are careful to distinguish
performance management from performance measurement, which involves the
development of metrics that quantify the "efficiency and effectiveness of action" (Neely
et al., 1995).
Some authors have been critical of either performance management systems or
benchmarking systems. For example, performance management systems have been
criticized for the internal focus on measures that may not correlate with the satisfaction
of external customers (Swnde and Key, 2000). The balanced scorecard approach
appears to offer a solution by consolidating the various dimensions of performance, both
internal and external (Gautreau and Kleiner, 2001). However, the development of direct
cause-and-effect relationships that link performance measures with organizational
success remains a challenge (McKenzie and Shilling, 1998). Another methodology, quality
function deployment, has been recommended to help confirm that the customer
viewpoint is being considered during product design (Zairi and Youssef, 1995) or service
system development (Pun et al., 2000). Benchmarking has also been criticized. Some of
the criticisms are similar to concerns regarding performance management systems. For
example, Freytag and Hollensen (2001) mention the internal focus, without a direct link
to customer satisfaction, as a concern along with the tendency of some organizations to
use benchmarking as a short-term solution to a problem rather than an ongoing process.
Zairi and Ahmed (1999) listed cultural differences as being a concern when transferring
best practices in global organizations, and Browne (2000) addressed the difficulty in
effectively obtaining useful information from external entities.
Performance benchmarking has become a component of numerous certification and
accreditation systems. For example, to be accredited by the JCAHO, healthcare
organizations must implement a performance management system based in part on the
use of a common set of performance metrics (JCAHO, 2000). Guidelines for the Baldrige
National Quality Program also include performance benchmarking (NIST, 2002). In
conjunction with this requirement, independent organizations have evolved that allow
participating entities to submit performance data on a periodic basis, with the
organization returning statistical reports that compare the data to suitable benchmarks.
For example, Purdue University has designed extensive questionnaires for call center
managers to input their call center performance data, which have been consolidated into a
database. In return, call center managers get reports that compare their performance
with all participants (http://www.benchmarkportal.com).
To be effective, performance benchmarking must be statistically sound, must ensure that
performance metrics are correlated with customer needs, and must be part of a
continuous process that results in effective management action. The lack of a
standardized system for performance benchmarking stems from the difference among
industries regarding the nature of the benchmarking process as well as the complexity of
the statistical methods involved. For example, Shah and Singh (2001) present a
framework for performance benchmarking of supply chains that involves various metrics
keyed to supply chain performance. Maleyeff et al. (2001a) present a system for
benchmarking healthcare facilities using metrics that relate to patient care. The
statistical sophistication of these systems ranges from no statistical analysis (where
"gaps" are generally interpreted as a shortfall, without regard for random variation), to
more sophisticated methods such as data envelopment analysis (Madu and Kuei, 1998).
Implications for decision makers
When data are used to make decisions, managers must be aware that incentives will be
created and that the incentives may conflict with the company mission or the intention of
the manager. The effect of management decision making processes was addressed by W.
Edwards Deming (1900-1993). Consider Deming's "system of profound knowledge"
(Deming, 1993, chapter 4). The four underpinnings of this system are:
a business system is a series of interconnected processes;
every process generates data that behave randomly;
probability laws can form the basis of interpreting data; and
employees will behave in very predictable ways depending in part on how
decisions are made in the presence of uncertainty.
This last point, which Deming referred to simply as "psychology", is the source of many
organizational problems, especially when statistical methods are implemented
inappropriately.
Three examples will be used to illustrate how seemingly valid and well-intended efforts
on the part of management can backfire. In these examples, a compensation system that
provides incentive pay to employees based on the quality of their work is implemented:
(1) manufacturing cell workers are awarded an extra bonus on days where the yield
for the cell exceeds 95 percent;
(2) dealers of a large company's products and services are ranked quarterly based on
the rate of complaints received and cash payments are made to managers if they
appear in the top 25 percent of dealers; and
(3) teachers in a school district are awarded extra bonus pay when the standardized
test scores for students in their class improve over the scores of last year's class.
What is wrong with rewarding employees for superior performance and punishing them
for sub-standard performance? In Deming's 14 points, he warns against the practice of
"management by numbers", which can occur when managers reward or punish
employees based on performance data (Deming, 1986, p. 75). Deming's warning is based
on his observation that in these systems, it is rare for an individual employee to have a
high degree of control over the quality of the product made or the service provided. In
addition, it is rare for managers to account for random statistical variation in these
systems. In example (1) above, if a manufacturing cell has a yield of 95 percent, and a
worker in the cell receives extra compensation on days where yield equals or exceeds 95
percent, we can expect that the employee will receive the daily bonus on only about half
of the days. In this case, even though cell performance is consistent with the target yield,
due to random variation, bonuses are awarded based on what is essentially a flip of a
coin.
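The "flip of a coin" claim can be checked with the binomial model. The sketch below is illustrative only: the daily volume of 200 parts and the function name `prob_bonus` are assumptions made for the example, not values taken from the paper.

```python
from math import comb

def prob_bonus(parts_per_day: int, true_yield: float, target: float) -> float:
    """Probability that one day's observed yield meets or exceeds the target,
    assuming each part conforms independently with probability true_yield."""
    # Smallest count of conforming parts whose observed yield reaches the target
    needed = next(k for k in range(parts_per_day + 1)
                  if k / parts_per_day >= target)
    # Sum the binomial probabilities for all counts at or above that threshold
    return sum(
        comb(parts_per_day, k)
        * true_yield ** k
        * (1 - true_yield) ** (parts_per_day - k)
        for k in range(needed, parts_per_day + 1)
    )

# Hypothetical cell: 200 parts per day, true yield exactly on the 95 percent target
p = prob_bonus(parts_per_day=200, true_yield=0.95, target=0.95)
print(f"Chance of earning the daily bonus: {p:.2f}")
```

Under these assumptions the bonus is earned on a little more than half of the days, even though the cell is performing exactly at the target yield every day.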
Management by numbers can instill fear in the workplace when rewards and
punishments appear to be beyond the control of the individual. In example (3) above, a
teacher has no control over the quality and makeup of the incoming student class, no
control over whether or not the current year's test is similar to the previous year's test,
and no control over random statistical variations. As a result, documented cases exist of
teachers and administrators who divert their attention from other subjects in order to
spend an inordinate amount of time "teaching to the test" or outright helping students to
cheat (Treadway, 2000; Hartocollis, 2000).
In the end, management by numbers can cause more harm than good to quality and
performance. For example (2) above, if bonuses are awarded for having the lowest rate
of complaints received from customers, it would be difficult to imagine employees finding
ways to make it easier for customers to register a complaint. In the long term, the
company would be better off if it were made aware of unhappy customers before they
defected to other companies, but the management system would make this awareness
unlikely.
Key statistical concepts and potential pitfalls
The type of data addressed in this paper is classified as a proportion, defined as a
number between zero and one that contains a numerator (number of times the event
occurred) and a denominator (number of opportunities for the event to occur). Examples
include conformance rate of parts, complaint rate of customers, mortality rate of
patients, and abandonment rate of callers. For proportions, the event cannot occur more
than once for each opportunity. That is, a part is either conforming or nonconforming, a
customer complains or doesn't complain, etc. At times, other forms of data are assumed
to be proportions. One case involves situations where it is rare for an event to occur more
than once per opportunity, for example, hospital-acquired infections per patient-day. While
it is not impossible for a patient to acquire two infections in one day, the chance of this
occurring is remote and the rate of infections may be analyzed as if it were a proportion.
It is possible to convert any form of data to a proportion so that the methods described in
this paper would apply. For example, data from customer satisfaction surveys may be
recorded as the proportion of customers who are "satisfied" with a service, where all
customers who chose "good" or "excellent" on their survey are combined. Or, data
consisting of the number of errors made when processing a mortgage may be recorded
as the proportion of mortgages on which errors were made during processing. Finally,
data consisting of a part measurement may be recorded as the proportion of parts that
conformed to specifications (i.e. the production yield).
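As a minimal sketch of the survey conversion described above, the following collapses rating-scale responses into a proportion while keeping the numerator and denominator. The response labels other than "good" and "excellent", the sample data, and the function name are hypothetical.

```python
def proportion_satisfied(responses, satisfied_levels=("good", "excellent")):
    """Return (numerator, denominator, proportion) for survey responses,
    counting a customer as satisfied if the rating is in satisfied_levels."""
    numerator = sum(1 for r in responses if r in satisfied_levels)
    denominator = len(responses)
    if denominator == 0:
        raise ValueError("at least one response is required")
    return numerator, denominator, numerator / denominator

# Hypothetical survey results for one entity
responses = ["excellent", "good", "fair", "good", "poor",
             "excellent", "good", "fair"]
num, den, prop = proportion_satisfied(responses)
print(f"{num} of {den} customers satisfied ({prop:.1%})")
```

Returning the counts alongside the proportion matters because the denominator is the sample size that later analysis depends on.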
When analyzing a performance index, it is common for the analysis to focus on the
proportion without regard to the statistical effect of the sample size (e.g. number of
parts, number of calls, number of patients). This value appears as the denominator of the
index. For example, when comparing yields over a group of manufacturing cells making
the same product, a simple comparison of their yields is flawed if the number of parts
manufactured in each cell is not taken into account. A comparison of hospital mortality
rates would be flawed if the number of patients served were not considered as part of the
analysis.
Illustrative example
A simple example will be used to illustrate the concepts and methods described in the
remainder of this paper. The example is simplistic in form so that the focus remains on
the underlying concepts. The concepts illustrated, however, are applicable to
any real world situation where a performance index is expressed as a proportion. Assume
that twenty individuals are given one coin each and told to flip their coin a specified
number of times. If all of the coins were balanced, "heads" would be expected on about
50 percent of the tosses. We do not know, however, that every person has been given a
balanced coin. That is, one or more persons in the group may be given an unbalanced
coin that generates "heads" at a rate that differs from 50 percent. Of the 20 people, ten
are asked to flip their coin 100 times and ten are asked to flip their coin 500 times. Each
person is asked to count the number of times their coin showed "heads" and, upon
completion of their tosses, to calculate their proportion of "heads". A computerized
random number generator was used to simulate this exercise, with the summary results
provided in Table I.
Improper approaches
An analyst presented with the data shown in Table I may be asked to identify those
entities that operated in a fashion inconsistent with an appropriate target, which in this
case could be defined as a 50 percent probability of obtaining "heads". Alternatively, if a
fixed target were not available, an analyst may wish to identify those entities that
operated in a fashion inconsistent with the other entities. One type of invalid approach to
this analysis would be based on where each person ranked compared to the other
persons in the study. Typically, this approach would consist of the development of a
percentile score for each entity or the assignment of each entity to a quartile (the lowest
25 percent of entities, second lowest 25 percent of entities, etc.). Table II provides
percentile scores and quartile assignments for each person in the exercise. Based on this
information, an analyst may decide that the persons in the lower or upper 10 percentile
range appear to have unbalanced coins. In this case, those persons in the outer
percentiles would be persons 16 and 19 (too few "heads"), and persons 17 and 13 (too
many "heads").
Another popular, but equally improper, approach is the calculation of the mean and
standard deviation of the performance index, the development of an interval extending
two standard deviation units above and below the mean, and the highlighting of entities
that fall outside this interval. In this case, the mean proportion of "heads" is 50.4 percent
(the average of the data contained in Table I) and the standard deviation is 4.5 percent
(the standard deviation of the data contained in Table I). Using this approach, all those
persons falling outside of the interval extending from 41.4 percent to 59.4 percent (50.4
percent ± 9.0 percent) would be likely candidates for having coins that differed from
the norm. Hence, person 16 would be considered likely to have an unbalanced coin that
results in an unusually low probability of "heads".
There are at least two problems with the approaches discussed above. The first problem
is that it is inappropriate to compare proportions when the sample size varies across
entities. Statistical theory, as well as common sense, dictates that the larger the sample
size, the closer the resulting proportion will estimate the ability of a process to perform.
That is, the more the coin is tossed, the closer the resulting proportion will be to the
actual probability that the coin tossed is "heads". So, referring to person 16, could his or
her 41 percent rate be due to an unbalanced coin or due to fewer tosses (100 rather than
500)? It is impossible to answer this question using the data provided in Table I or the
analysis summarized in Table II. The second problem with the approaches is that the
methods essentially guarantee that some persons will be identified as unusual, even if all
of the entities are operating in essentially the same fashion, with variation just due to
normal randomness. For example, in every analysis, exactly 10 percent of entities will
have a percentile ranking less than or equal to 10, and exactly 10 percent of entities will
have a percentile ranking of at least 90, whether or not those entities vary from the other
entities due to real process differences or due to random variation. Similarly, in every
analysis, about 5 percent of entities will have a performance index that extends more
than two standard deviations beyond the mean.
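Both problems can be seen in a small simulation. The sketch below is an illustration, not the paper's own simulation: it gives all 20 people perfectly balanced coins, repeats the exercise many times, and records how often the lowest-ranked person is one of those who tossed only 100 times. The trial count, seed, and function name are arbitrary assumptions.

```python
import random

def simulate_lowest_ranked(trials: int = 500, seed: int = 1) -> float:
    """Fraction of trials in which the lowest proportion of heads belongs to a
    person who tossed only 100 times, when every coin is balanced."""
    rng = random.Random(seed)
    sample_sizes = [100] * 10 + [500] * 10  # same design as the coin exercise
    small_sample_lowest = 0
    for _ in range(trials):
        # Each person's observed proportion of heads from a balanced coin
        proportions = [
            sum(rng.random() < 0.5 for _ in range(n)) / n for n in sample_sizes
        ]
        # Index of the person who would be ranked last
        lowest = min(range(len(proportions)), key=proportions.__getitem__)
        if sample_sizes[lowest] == 100:
            small_sample_lowest += 1
    return small_sample_lowest / trials

share = simulate_lowest_ranked()
print(f"Lowest-ranked person had only 100 tosses in {share:.0%} of trials")
```

Someone is ranked last in every trial although no coin is unbalanced, and that person comes from the 100-toss group far more often than the 50 percent that a fair ranking would imply.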
The fallacies in the approaches can also be understood using basic probability theory.
When a performance index is a proportion, the binomial probability model can be used to
predict the behavior of data, such as the proportion of "heads" that would result when a
person tosses a balanced coin (Berenson et al., 2002). For this example, the proportion
of "heads" will vary and the range of the variation will depend on:
the chance that the coin generates "heads", which is 50 percent for a balanced
coin; and
the number of times the coin was flipped, which is either 100 or 500.
The following simple formula can be used to quantify the effect of random variation,
when data consist of proportions and the probability of an event occurring approximates
50 percent (Berenson et al., 2002, p. 282):

$$\text{expected random variation} \approx \pm\frac{1}{\sqrt{n}} \qquad (1)$$

where $n$ is the sample size. For example, the expected level of random variation for a
person tossing a balanced coin 100 times is ±10 percent and the expected level of random
variation for a person tossing a balanced coin 500 times is about ±4.5 percent. Taken in
this context, it is impossible to single out the person who had an unbalanced coin without
knowledge of the number of times the coin was flipped.
As a result of this phenomenon, it is likely that when a performance index is analyzed
without taking sample size into account, either:
an entity with a smaller sample size is identified as atypical even though it is
operating in a fashion that is consistent with other entities; or
an entity with a large sample size is identified as typical, even though it is
operating in a fashion that is inconsistent with other entities.
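The two figures quoted above for 100 and 500 tosses are consistent with two standard errors of a binomial proportion at a probability of 50 percent, which simplifies to one over the square root of the sample size. A minimal sketch under that assumption (the function name is hypothetical):

```python
from math import sqrt

def expected_variation(n: int) -> float:
    """Approximate +/- random variation of a proportion near 50 percent
    observed over n opportunities: 2 * sqrt(0.5 * 0.5 / n) = 1 / sqrt(n)."""
    return 1 / sqrt(n)

for n in (100, 500):
    print(f"n = {n}: about +/-{expected_variation(n):.1%}")
```

Because the variation shrinks only with the square root of the sample size, quintupling the number of tosses roughly halves the expected spread rather than cutting it to a fifth.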
Importance of statistical control
Often, individuals trained in classical statistics have difficulty applying the techniques
they know to make effective business decisions. In Out of the Crisis, Deming (1986, p.
132) states "... statistical techniques taught in books, however interesting, are
inappropriate because they provide no basis for prediction and because they bury the
information contained in the order of production". Even though this statement was made
over twenty years ago and appears to apply to manufacturing only, most business
statistics textbooks look remarkably unchanged since that time, and the importance of
variation over time applies to any business.
The heart of the problem with classical statistics lies in the notion that statistics involves
making a decision regarding a population, based on the information contained in a
random sample from that population. The population is assumed to be static, that is, the
characteristics of the population are not changing over time. This framework is entirely
appropriate in a statistical analysis of data from clinical trials of a new drug given to
hundreds of people or the analysis of a marketing survey to determine product
preference. However, when this framework is applied to the analysis of a business's
performance data, problems may occur since the performance occurs over time and we
cannot assume that the system being studied is stable (i.e. unchanged) during the data
collection period. A process that remains unchanged over time is said to be in statistical
control. In this case, its future performance is predictable to within a range of values
whose magnitude depends on the sample size. For the data provided in Table I, classical
statistics would assume that, over the course of the exercise, each person's coin remained
unchanged. But, how can this be assumed for business processes without specific
evidence that the dynamic process being analyzed is unchanged?
An alternative framework for applying statistical methods to data from business
processes has been suggested (Maleyeff and Kaminsky, 2002). This framework involves
viewing performance data as being generated from dynamic processes that may or may
not have changed during the period under study. With this framework, two very
important issues become apparent. First, an analysis of stability must be performed
before judgements are made regarding the acceptability of a process. Unstable processes
are changing for a reason and an astute manager will find those reasons before
attempting to compare an unpredictable process to other processes. Second, even if the
data consist of 100 percent of the activity during a period, this does not mean that
random variation can be ignored, as it would be in the classical framework if the entire
population were known. This seemingly abstract concept is highly critical, since data
collected for 100 percent of the activities for two similar processes will differ, even if the
processes themselves were identical.
Apples to oranges comparisons
The choice of appropriate targets is critical to the success of performance benchmarking
systems. Walsh (2000) considered several forms of targeting, including constant targets,
targets with step increases, and targets with seasonal or product cycle growth trends.
Persico and McLean (1994) provided a set of warnings regarding the use of inappropriate
targets, including the use of the process average as a target without consideration of
variation, stretch targets based on "hopes and wishes" of management, and targets set
on unstable processes. Deming, in his 14 points, also warned against the use of
inappropriate targets:
Eliminate slogans, exhortations, and targets for the work force (Deming, 1986, p. 65).
He believed that these practices led to adversarial relationships, since most of the
problems in an organization are caused by the "system" rather than the individual
employees. Here again, Deming addresses the fallacy of assuming that employees have
significant direct control over the processes they work within. Unless fundamental
changes are made to the processes, how can we expect performance to improve? It is
not difficult to anticipate the reaction of many workers to improper targeting.
In the context of managing performance, the concept of "comparing apples to oranges"
must be avoided. For example, in hospital administration, a key performance index is
mortality rate. These rates have been published in newspapers and other publications in
an effort to inform consumers regarding hospital performance. However, it has been
shown that factors readily available for analysis are known to affect mortality rates, and
must be accounted for when comparing medical facilities (Dubois et al., 1987). Moreover,
even when these factors are accounted for, other factors not readily available such as
morbidity (magnitude of an illness) and co-morbidity (existence of more than one illness)
will also affect the mortality rate. These factors can only be analyzed by checking each
patient record, which is often not practical. Hence, there may exist some circumstances
where performance should not be compared across organizations.
Avoiding an apples to oranges comparison is accomplished by ensuring that:
the performance metric is defined and measured in a consistent way;
the metric involves the same priority of customer service in each organization
compared; and
organizations would reasonably be expected to perform similarly given their
variety of customers, suppliers, location, etc.
For example, benchmarking call center abandonment rate would be appropriate only if
this metric were defined and measured in a consistent way, the call center performs a
similar service in each comparison organization, and the various locations, sizes,
customers, etc. do not preclude a reasonable expectation of similar performance.
Summary of key principles
To summarize the key points made so far regarding the analysis of a performance
index for benchmarking purposes, the following principles have been established:
Only organizational entities that are stable over the data collection period can be
compared to a target or to other stable entities:
    stable, or unchanged, entities will generate performance data that vary over time;
    if an entity's performance is not stable, then something changed during the data
    collection period and the reason for the change should be determined.
When comparing an entity's performance index to a target or to other entities,
random variation of the index, related to its sample size, must be taken into
account:
    a performance index cannot be compared without knowledge of the value of both
    the numerator and denominator;
    the amount of random variation in a performance index will be inversely related to
    the sample size (denominator) of the index.
When choosing a target against which to compare an entity's performance index,
care must be taken to choose a target that corresponds to the special
characteristics of the particular organization:
    just because an entity differs from a target does not necessarily mean that a
    problem exists or that an opportunity for improvement has occurred;
    it may be possible to develop an adjusted target based on the special
    characteristics of an organization.
The setting of targets, the method of comparing entities, and the reaction to
benchmark studies all impact how employees will behave within the organization:
    if done improperly, benchmarking can lead to poor morale among employees, and
    may cause workers to act in ways that degrade, rather than improve, customer
    satisfaction.
System for benchmarking proportions
Three distinct steps must be incorporated into any system that compares a performance
index to a target or that compares performance indices across organizations. In this
section, each step involving the analysis of a performance index is described using the
data provided earlier to illustrate the methods. For the sake of brevity, statistical details
are kept to a minimum. Thus, readers without knowledge of statistical basics may need
to review other sources before attempting implementation.
Analysis of statistical control
A control chart is a frequently used tool of statistical process control (SPC) to determine if
a manufacturing process is in statistical control. Control charts are also applicable in non-
manufacturing applications (MacCarthy and Wasusri, 2002). The basic structure of a
control chart involves organizing the entire sample of data into subgroups according to
the time frame during which the data were collected. For example, data collected during
September would be subgrouped into 30 daily increments. For each subgroup, a
summary statistic is calculated, such as the proportion of calls per day that were
abandoned. Each summary statistic is plotted on a display that shows the trend of
performance over the data collection period. Then, a center line that corresponds to
average performance over the entire study period is added to the display along with a
set of upper and lower control limits. These control limits are calculated based on a
statistical expectation that a stable process will generate summary statistics that fall
within the limits about 99.7 percent of the time.
Certain diagnostic rules are employed to determine, based on the pattern seen on the
control chart, if the process appeared stable (i.e. in statistical control). If the process
were not stable, then the special cause of the process change would be identified and
acted upon. In this way, problems are identified and opportunities for improvement are
highlighted. Only stable processes are predictable from one period to the next. Hence,
only stable processes are eligible for comparison with targets or with other processes.
The type of control chart used to analyze proportion data is referred to as a p chart.
Standard control limit formulas for a p chart are presented in . The p chart for person 1 in
the coin tossing example is shown as Figure 1. In this case, the 500 tosses were
organized as 20 subgroups of 25 tosses each. Each point plotted on the chart is the
proportion of "heads" in each subgroup. If the process were stable (i.e. the same coin
was tossed in the same manner) statistical theory would suggest that 99.7 percent of
subgroup proportions would be contained within the control limits and that, within these
limits, a random pattern consistent with a normal (bell curve) pattern would be expected.
A tutorial on p charts and their associated diagnostic rules is provided by Kaminsky et al.
(1997).
In Figure 1, the center line corresponds to the average proportion of "heads" (53 percent,
which is also included in Table I) for person 1. The upper control limit (83 percent) and
the lower control limit (23 percent) represent the range of expected proportions for 25
opportunities of an event whose probability of occurrence is 53 percent. Based on this p
chart, it would be reasonable to assume that the process defined by person 1 is stable
over the data collection period, implying that the characteristics of the coin and how it
was tossed were unchanged. The data for each person included in the coin tossing
exercise would be plotted in the same manner.
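The limits quoted for person 1 can be reproduced with the usual textbook three-sigma formula for a p chart, which is assumed here to be the formula the paper refers to. In this sketch the function name and the clipping of limits to the range zero to one are choices made for illustration.

```python
from math import sqrt

def p_chart_limits(p_bar: float, subgroup_size: int):
    """Return (lower, center, upper) three-sigma control limits for a p chart,
    given the overall proportion p_bar and a constant subgroup size."""
    sigma = sqrt(p_bar * (1 - p_bar) / subgroup_size)
    lower = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
    upper = min(1.0, p_bar + 3 * sigma)  # or rise above 1
    return lower, p_bar, upper

# Person 1: center line of 53 percent, subgroups of 25 tosses
lcl, center, ucl = p_chart_limits(p_bar=0.53, subgroup_size=25)
print(f"LCL = {lcl:.0%}, center line = {center:.0%}, UCL = {ucl:.0%}")
```

The width of the limits is driven by the subgroup size of 25, not by the 500 tosses in total, which is why they are so much wider than the variation expected in the aggregate proportion.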
The importance of the stability requirement is illustrated in the following example.
Consider the comparison of complaint rates, by week, for two departments in a company
that provide a similar service over a 26-week period. In both cases, performance is not
stable. The performance of department 1 is improving (fewer complaints), while the
performance of department 2 is degrading (more complaints). Overall, each department
averaged about one complaint per 200 customers over the 26-week period. An aggregate
comparison of these two departments would conclude that no performance difference
existed over the data collection period. The cost of this inaccurate analysis would be two-
fold. First, the company would have missed an opportunity to determine the cause of the
improvement experienced by department 1, which could have resulted in performance
improvement if implemented at department 2. Second, the performance of department 2
has degraded and will continue to degrade, which will likely result ultimately in lost
customers for this company.
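The two-department scenario can be made concrete with invented numbers. In the sketch below the weekly complaint counts and the volume of 1,000 customers per week are hypothetical, chosen only so that both departments average one complaint per 200 customers. Splitting the period in half is a crude stand-in for the p chart analysis the paper recommends.

```python
# Hypothetical data: 1,000 customers served per week in each department
CUSTOMERS_PER_WEEK = 1000
dept1 = [7] * 9 + [5] * 8 + [3] * 9  # weekly complaints falling over 26 weeks
dept2 = list(reversed(dept1))         # weekly complaints rising over 26 weeks

def rate(complaints):
    """Complaint rate (complaints per customer) over the weeks supplied."""
    return sum(complaints) / (len(complaints) * CUSTOMERS_PER_WEEK)

for name, data in (("Department 1", dept1), ("Department 2", dept2)):
    print(f"{name}: overall {rate(data):.4f}, "
          f"first 13 weeks {rate(data[:13]):.4f}, "
          f"last 13 weeks {rate(data[13:]):.4f}")
```

The overall rates are identical, so an aggregate comparison sees no difference, yet the half-period rates move in opposite directions.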
Performance comparison
The target used to benchmark a stable process will fall into one of two categories. An
absolute target is one that is derived without consideration of the process variation. That
is, in order to be acceptable, each entity must meet some target level of performance.
Examples include design specifications, regulatory standards, and goals set by
management corresponding to levels that must be reached to assure competitiveness. In
the coin tossing exercise, a fixed standard of 50 percent would be used if the goal was to
determine those persons tossing unbalanced coins. The second form of targeting is the
use of a relative target. In this case, the analyst would attempt to highlight those entities
that are operating in a fashion inconsistent with the other entities. Relative targets are
used when the standard of performance is defined by other organizations that provide a
similar product or service. Examples include abandonment rate of callers and mortality
rate for a certain illness. In the coin tossing exercise, a relative target would be used if
the goal were to identify those persons who appeared to have coins that differed from
the coins used by the remainder of the group.
When comparing performance of stable processes using data expressed as a proportion,
the focus becomes comparing the aggregate performance of each stable entity (defined
as the center line of the entity's p chart) to a suitable target, which would also be
expressed as a proportion. Assuming stability for each person, Table I shows the center
line values (aggregate proportion of "heads") for the 20 participants. At this point, a
standard statistical hypothesis test would be used to determine, for each entity, whether
or not its performance differed from a target proportion. These tests would take sample
size into account.
For proportion data, the appropriate hypothesis test is called a one-sample hypothesis
test for a proportion (Berenson et al., 2002, p. 329). The test involves the calculation of a
standard normal z-score that is compared with an appropriate set of limits. If the z-score
falls outside the limits, then it is assumed that the entity's performance differs from the
target. These limits are often set at ±2.00, which corresponds to an approximate 5
percent risk that a process operating in a way consistent with the target is identified as
being inconsistent with the target (widening the limits would reduce this risk). In these
tests, the performance of an entity is only considered to be differing from the target if the
analyst can be 95 percent confident that the difference exists.
Appendix B shows the normalized z-score calculation used to implement a one-sample
hypothesis test for proportions. When implementing this test, an entity's aggregate
performance index, aggregate sample size, and target proportion are needed. For this
exercise, an absolute target of 50 percent could be used, or alternatively, a relative target
of 50.1 percent (the average proportion of "heads" for the entire group) could be used.
If the absolute target of 50 percent is used, the analysis for person 1 (53.2 percent
"heads" on 500 tosses) would result in a z-score of 1.43. Since this z-score does not
exceed the 2.00 limit, we do not have sufficient evidence to consider this person's coin
to be unbalanced. If the relative target of 50.1 percent were used, the z-score for person
1 would be 1.39, supporting the assumption that the coin is similar to coins used by other
participants. Table III provides the z-scores for all 20 participants, using both the absolute
and the relative target. Only person 7 is identified as having an unbalanced coin (z-score
of -2.77). It should be noted at this time that the random number generator used to
develop the exercise assumed a balanced coin for all tossers except person 7, whose coin
was set up to generate "heads" only 45 percent of the time.
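The calculation for person 1 can be reproduced with a short Python sketch. The function names are illustrative, and the formula is the usual one-sample z-score for a proportion, with the standard error computed from the target; this form is assumed here because it reproduces the z-scores quoted above.

```python
from math import sqrt


def z_score(proportion, sample_size, target):
    """One-sample z-score of an observed proportion against a target proportion."""
    standard_error = sqrt(target * (1 - target) / sample_size)
    return (proportion - target) / standard_error


def differs_from_target(proportion, sample_size, target, limit=2.00):
    """True when the z-score falls outside the +/- limit band."""
    return abs(z_score(proportion, sample_size, target)) > limit


# Person 1: 53.2 percent "heads" on 500 tosses.
z_absolute = z_score(0.532, 500, 0.500)  # absolute target of 50 percent
z_relative = z_score(0.532, 500, 0.501)  # relative target of 50.1 percent
print(round(z_absolute, 2), round(z_relative, 2))  # 1.43 1.39
print(differs_from_target(0.532, 500, 0.500))      # False
```

Both values stay inside the 2.00 limit, so person 1 is not flagged under either target.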
The relative target of 50.1 percent was derived based on a total of 6,000 tosses over the
data collection period. This development of a relative target is only valid when two
conditions exist. First, the method is appropriate when most entities will conform to the
benchmark. Otherwise, the target may be biased according to especially strong or weak
performers and thus inappropriate for comparison. Second, the method is appropriate
when the total sample size (6,000 in this case) is large enough to preclude the effect of
random variation on the resulting target. When this requirement is not met, a two-sample
hypothesis test for proportions may be implemented (Berenson et al., 2002, p. 450).
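A minimal sketch of a two-sample test follows, assuming the common pooled-proportion form of the z statistic; the paper does not spell out the formula, so this is one textbook variant, not necessarily the one in the cited reference. The counts in the usage lines are illustrative, back-calculated from the rounded percentages quoted in the text (266 "heads" in 500 tosses for person 1, and 3,006 in 6,000 for the whole group).

```python
from math import sqrt


def two_sample_z(events_1, n_1, events_2, n_2):
    """Pooled two-sample z statistic for the difference of two proportions."""
    p_1 = events_1 / n_1
    p_2 = events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_1 - p_2) / standard_error


# Illustrative counts: one entity compared with the rest of the group.
entity_heads, entity_tosses = 266, 500
rest_heads, rest_tosses = 3006 - 266, 6000 - 500

z = two_sample_z(entity_heads, entity_tosses, rest_heads, rest_tosses)
print(round(z, 2), abs(z) > 2.00)
```

Comparing the entity with the remainder of the group, instead of with a target that includes its own data, treats both proportions as subject to random variation.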
It should be stressed that this method appropriately accounts for sample size when
dealing with a performance index expressed as a proportion. Note that in Table III, the 41
percent proportion for person 16 (100 tosses) resulted in a z-score of -1.80, while the 44
percent proportion for person 7 (500 tosses) resulted in a z-score of -2.77. Also, it is
important to recognize that the entities highlighted using the percentile/quartile method
referred to earlier (persons 16, 19, 17, 13) all had a balanced coin but the smaller sample
size of 100 tosses. Hence, these smaller entities were inappropriately identified as
differing from the others when using the percentile/quartile method. The hypothesis
testing method did not make this error. Using the percentile/quartile method, person 7
(with the unbalanced coin) was inappropriately not identified since, with 500 tosses, this
person's performance index was in effect masked by the random variation of smaller
entities. Again, the hypothesis testing method did not make this error. While coin tossing
may appear trivial and not related to real-world performance data, any proportion index
would behave in a similar way, except that the probability of an event occurring would
differ from 50 percent.
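The sample size effect can be made concrete by expressing the 2.00 limit as a band of proportions around the target. The sketch below is illustrative; the function name is hypothetical and the band is derived from the same one-sample z-score assumed for the test.

```python
from math import sqrt


def acceptance_band(target, sample_size, limit=2.00):
    """Range of observed proportions whose z-score stays within +/- limit."""
    half_width = limit * sqrt(target * (1 - target) / sample_size)
    return target - half_width, target + half_width


for tosses in (100, 500):
    low, high = acceptance_band(0.50, tosses)
    print(f"{tosses} tosses: {low:.3f} to {high:.3f}")
# 100 tosses: 0.400 to 0.600
# 500 tosses: 0.455 to 0.545
```

With 100 tosses, a result of 41 percent (person 16) lies inside the band, while with 500 tosses a result of 44 percent (person 7) lies outside it, even though 44 percent is closer to the target.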
Final analysis
In Table III, person 7 was effectively identified as the only person who differed from the
target of 50 percent "heads". In this case, since the process in question consisted of a
person tossing a coin, it can be correctly assumed that either the coin itself or the
manner in which the coin was tossed caused the discrepancy in performance. In actual
situations, processes are much more complex in terms of the products or services
offered, customer characteristics, materials used, facility, and environmental factors (e.g.
facility, weather, temperature). Hence, an entity that stands out in a group statistically
may be operating in a fashion entirely as expected considering that entity's unique
circumstances.
When an entity is identified as differing from a target, it would be wise to investigate the
appropriateness of the target prior to taking action, especially if the action involves some
form of reward or punishment. This analysis may result in the development of an
adjusted target, based on what is known about key factors affecting performance. The
methodology employed in this analysis would involve consideration of data and other
information available within the organization or external to the organization that help to
relate performance to factors unique to the entity. Data mining techniques (e.g.
regression analysis) may be helpful, as well as the consultation of professional literature.
Depending on the resources and skills possessed by an organization, at least four choices
appear to present themselves:
a comprehensive analysis be performed to determine the relationship of expected performance to factors affecting performance, resulting in an adjusted target;
an analysis be performed of the professional and trade literature that reports on factors affecting performance in similar organizations;
the use of publicly-available reference databases that include target adjustment capabilities; and
a casual judgement based on experience with the organization, its products and services, and its customers.
As an example, a recent analysis of hospital mortality rates supported the notion that
age, admissions from the emergency department, and admissions from nursing homes
contribute to an expectation that rates would differ across hospitals (Maleyeff et al.,
2001b). For example, a hospital that admits comparatively older patients would expect
higher mortality rates. However, the same study discounted other factors, including race,
gender, and economic status. The information in this study is transferable to other
similar hospitals, so it is not necessary for all hospitals to perform such a comprehensive
study. Some reference databases that provide the ability to retrieve target levels for
entities allow for the development of targets based on subsets of entities based on
special characteristics. An example of this type of system is the Maryland Hospital
Association's quality indicator project (www.qiproject.org).
Discussion and future research
This paper highlighted opportunities and potential pitfalls of using a performance index
for purposes of benchmarking organizational entities. It was shown that, when not
accounting for process changes that may occur during a period of data collection,
opportunities for improvement may be missed, problems may go unaccounted for, and
resulting comparisons may mislead analysts. Also, entities that operate on a smaller
scale run the risk of being identified as differing from targets, when in fact their
performance is consistent with their target level. Alternatively, larger entities operating in
a way inconsistent with a target will tend not to be highlighted, even when their
performance varies significantly from the target level. As a result of improper analyses,
quality will likely be degraded, given the way employees are known to react to arbitrary
performance management systems. The system presented in this paper provides an
effective framework for properly comparing the performance index of an organizational
entity to an appropriate target.
The enhanced ability to collect and disseminate performance data quickly and
inexpensively has allowed performance benchmarking to evolve to the point where the
means exist for most organizations to obtain comparison performance data for
benchmarking either internal or external entities. In conjunction with these events, many
independent organizations, both for-profit and not-for-profit, are routinely obtaining data
from participants and returning statistical reports. However, no standard approach exists
for the analysis of performance data in these systems. Additionally, examples can be
found of incorrect statistical analysis that exhibit some of the pitfalls described earlier in
this article. In turn, effective interpretation and appropriate action are likely to be
compromised when individual managers having little or no statistical expertise are
charged with the task of responding to these reports. Hence, an important focus of future
work would involve the standardization of the data analysis routines, with a focus on
methods that allow for reports to be easily followed by most managers. In addition,
managers must be more effectively educated in the basic statistical methods employed
within these standard systems. Not only would this work allow for more effective
performance benchmarking, but it would also lower the resources necessary for
implementation. This would allow small and medium sized firms, as well as other
organizations lacking sufficient resources, to benefit from performance benchmarking.
Equation 1
Equation 2
Equation 3
Equation 4
Equation 5
Equation 6
Equation 7
Equation 8
Figure 1 p chart for person 1
Table I Results of simulated exercise
Table II Percentile and quartile summaries
Table III Hypothesis test results
References
Berenson, M.L., Levine, D.M., Krehbiel, T.C., 2002, Basic Business Statistics: Concepts and
Applications, 8th ed., Prentice-Hall, Englewood Cliffs, NJ.
Brownlie, D., 2000, "Benchmarking your marketing process", Long Range Planning, 32, 1,
88-95.
Camp, R.C., 1995, Business Process Benchmarking, ASQC Quality Press, Milwaukee, WI.
Deming, W.E., 1986, Out of the Crisis, MIT Center for Advanced Engineering Study,
Cambridge, MA.
Deming, W.E., 1993, The New Economics for Industry, Government, Education, 2nd ed.,
MIT Center for Advanced Engineering Study, Cambridge, MA.
Dubois, R.W., Brook, R.H., Rogers, W.H., 1987, "Adjusted hospital death rates: a potential
screen for quality of medical care", American Journal of Public Health, 77, 9, 1162-6.
Elmuti, D., Kathawala, Y., 1997, "An overview of benchmarking process: a tool for
continuous improvement and competitive advantage", Benchmarking for Quality
Management & Technology, 4, 4, 229-43.
Freytag, P.V., Hollensen, S., 2001, "The process of benchmarking, benchlearning, and
benchaction", The TQM Magazine, 13, 1, 25-33.
Gautreau, A., Kleiner, B.H., 2001, "Recent trends in performance measurement systems -
the balanced scorecard approach", Management Research News, 24, 3/4, 153-6.
Hartocollis, A., 2000, "9 educators accused of encouraging students to cheat", New York
Times, 3 May, B4.
Joint Commission on Accreditation of Healthcare Organizations, 2000, Comprehensive
Accreditation Manual for Hospitals: The Official Handbook, JCAHO, Oakbrook Terrace, IL.
Kaminsky, F.C., Maleyeff, J., Providence, S., Purinton, E., Waryasz, M., 1997, "Using SPC to
analyze quality indicators in a healthcare organization", Journal of Healthcare Risk
Management, 17, 4, 14-22.
MacCarthy, B.L., Wasusri, T., 2002, "A review of non-standard applications of statistical
process control (SPC) charts", International Journal of Quality & Reliability Management,
19, 3, 295-320.
McKenzie, F., Shilling, M., 1998, "Avoiding performance measurement traps: ensuring
effective incentive design and implementation", Compensation and Benefits Review, 30,
4, 57-64.
Madu, C.N., Kuei, C., 1998, "Application of data envelop analysis for benchmarking",
International Journal of Quality Science, 3, 4, 320-7.
Maleyeff, J., Kaminsky, F.C., 2002, "Six Sigma and introductory statistics education",
Education + Training, 44, 2, 82-9.
Maleyeff, J., Kaminsky, F.C., Jubinville, A., Fenn, C.A., 2001a, "A guide to using
performance measurement systems for continuous improvement", Journal for Healthcare
Quality, 23, 4, 33-7.
Maleyeff, J., Kaminsky, F.C., Schwartz, E., Bombardier, G., Fenn, C.A., 2001b, "Analysis of
hospital mortality for continuous improvement", Journal of Healthcare Risk Management,
21, 2, 23-31.
National Institute of Standards and Technology, 2002, Criteria for Performance
Excellence, Baldrige National Quality Program, NIST, Gaithersburg, MD.
Neely, A., Gregory, M., Platts, K., 1995, "Performance measurement systems design: a
literature review and research agenda", International Journal of Operations & Production
Management, 15, 4, 80-116.
Persico, J. Jr, McLean, G.N., 1994, "Manage with valid rather than invalid goals", Quality
Progress, April, 49-53.
Procurement Executives' Association, 1999, Guide to a Balanced Scorecard Performance
Management Methodology, Procurement Executives' Association, available at:
http://oamweb.osec.doc.gov/bsc/guide.htm.
Pun, K.F., Chin, K.S., Lau, H., 2000, "A QFD/hoshin approach for service quality
deployment: a case study", Managing Service Quality, 10, 3, 156-70.
Shah, J., Singh, N., 2001, "Benchmarking internal supply chain performance:
development of a framework", The Journal of Supply Chain Management, 37, 1, 37-47.
Swindell, D., Kelly, J.M., 2000, "Linking citizen satisfaction data to performance
measures", Public Performance & Management Review, 24, 1, 30-52.
Treadway, J., 2000, "Inflated scores discussed at education conference", Times-Picayune,
April 25, B1.
Walsh, P., 2000, "Targets and how to assess performance against them", Benchmarking:
An International Journal, 7, 3, 183-99.
Zairi, M., Ahmed, P.Z., 1999, "Benchmarking maturity as we approach the millennium?",
Total Quality Management, 10, 4/5, 810-6.
Zairi, M., Youssef, M.A., 1995, "Quality function deployment: a main pillar for successful
total quality management and product development", International Journal of Quality &
Reliability Management, 12, 6, 9-23.
Appendix A: p chart formulae
For subgroup i, let x_i be the number of times an event occurred and let n_i be the number
of opportunities for the event to occur. For each of the k subgroups, calculate p_i to be the
ratio of x_i and n_i, and plot the values of p_i on the control chart. Then, plot the proportion
of time the event occurred over all subgroups, calculated as follows: Equation 2
Then, plot the lower control limit (LCL) and the upper control limit (UCL), calculated as
follows: Equation 3
For person 1 in the coin tossing example, the LCL for each subgroup is: Equation 4
and the UCL for each subgroup is: Equation 5
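For reference, the conventional three-sigma p chart expressions that fit this description are sketched below. They are the standard textbook form and are an assumption here, not a transcription of the paper's Equations 2 and 3; the numeric limits for person 1 (Equations 4 and 5) depend on the subgroup size and are not reproduced.

```latex
% Center line: overall proportion across the k subgroups
\bar{p} = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i}

% Control limits for subgroup i (assumed three-sigma form)
\mathrm{LCL}_i = \bar{p} - 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}}, \qquad
\mathrm{UCL}_i = \bar{p} + 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}}
```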
Appendix B: One-sample hypothesis test for proportions
For the entity, calculate N (the total sample size over all subgroups) as follows: Equation 6
Then, calculate the z-score for an entity as follows (where T is the target
proportion): Equation 7
For person 1 in the coin tossing example, N = 500 and the corresponding z-score is:
Equation 8
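A reconstruction of this calculation is sketched below, with N denoting the total sample size over all subgroups and \bar{p} the entity's aggregate proportion. The form of the z-score is assumed, although it does reproduce the value of 1.43 reported for person 1 with the absolute target of 50 percent.

```latex
% Total sample size over the k subgroups
N = \sum_{i=1}^{k} n_i

% z-score of the aggregate proportion against the target T
z = \frac{\bar{p} - T}{\sqrt{\dfrac{T\,(1-T)}{N}}}

% Person 1: \bar{p} = 0.532, T = 0.500, N = 500
z = \frac{0.532 - 0.500}{\sqrt{\dfrac{0.500 \times 0.500}{500}}}
  = \frac{0.032}{0.02236} \approx 1.43
```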
