
BEHAVIORAL RESEARCH IN ACCOUNTING
Volume 17, 2005
pp. 43–53

Relative Weighting of Common and Unique Balanced Scorecard Measures by Knowledgeable Decision Makers
William N. Dilla
Iowa State University
Paul John Steinbart
Arizona State University
ABSTRACT: Prior research has found that decision makers with limited experience in
using the Balanced Scorecard (BSC) ignored measures that reflect the unique strategy
of a business unit and based their performance evaluations solely on measures common across units. The purpose of this study is to investigate whether decision makers
who have had training and experience in designing BSCs exhibit the same behavior.
Results of an experiment show that decision makers who are knowledgeable about the
BSC attended to both common and unique measures, but placed greater emphasis on
the former. These results hold in both a performance evaluation judgment and in a
bonus allocation decision. We attribute these results to the knowledge participants
acquired through classroom training on the design of the BSC, but cannot rule out an
alternative explanation that our results differ from previous research because participants in our study were undergraduate accounting and information systems majors,
rather than M.B.A. students.

INTRODUCTION
The Balanced Scorecard (BSC) was introduced more than a decade ago (Kaplan and Norton
1992) and has been widely adopted by both large (Silk 1998) and small companies (Frigo and
Krumwiede 2000). A major attraction of the BSC is that it is designed to provide a multidimensional view of organizational performance. Moreover, BSC proponents argue that it can, and
should, be used not only to evaluate performance, but also as a tool for implementing and monitoring
strategy (Kaplan and Norton 1996a, 2001a, 2001b). This means that the BSC for each business unit
should typically include both measures that are common across business units (e.g., sales and income
targets, cycle time, etc.) and also measures that reflect the goals unique to each business unit.
Recent research, however, identifies a potential problem in using the BSC to evaluate performance. Lipe and Salterio (2000) found that M.B.A. students assigned the role of superior managers
ignored unique BSC measures when evaluating divisional performance. That finding is troubling
because managers may not give much attention to factors that they perceive as not affecting their compensation (Holmstrom and Milgrom 1991). If unique measures reflect key facets of a unit's strategy, then inattention to them undermines the usefulness of the BSC as a strategic management system.

Acknowledgments: We thank Marlys Lipe and Steven Salterio for sharing their experimental instruments with us. Thanks also to Diane Janvrin, Cynthia Jeffrey, Steve Kaplan, Casey Rowe, Steven Salterio, and workshop participants at Iowa State University and the 2002 ABO Midyear Meeting for comments on earlier versions of this paper.
Lipe and Salterio (2000, 295) acknowledge that their results may reflect their participants' lack of experience with the BSC. Indeed, BSC proponents argue that training and instruction about the theory and design of the BSC is essential for successful implementation (e.g., Niven 2002). Research addressing other accounting-related issues shows that task-related knowledge affects judgment performance (Bonner and Lewis 1990; Libby and Luft 1993). For example, more knowledgeable decision makers weight cues differently than do less knowledgeable decision makers, especially on complex, less structured tasks (Bonner 1990). The BSC typically contains four to seven measures in each of four categories. This large number of cues, coupled with the lack of normative guidelines to evaluate the accuracy of performance evaluations, makes use of the BSC to evaluate performance a complex, relatively unstructured task. Therefore, decision makers' level of knowledge and understanding of the BSC is likely to influence how they use common and unique measures to evaluate divisional performance.
Lipe and Salterio (2000) used participants who had no prior work experience with the BSC. This
design choice strengthened experimental control by ensuring that all participants had a common
level of prior knowledge. Yet, as noted above, we believe that participants' relative lack of exposure to the BSC may account for the finding that they ignored unique measures when using it to make decisions. Therefore, we extend Lipe and Salterio's (2000) research by using participants who have
more than a cursory understanding of the BSC. Participants in our study learned about the BSC
through lectures and readings and by developing actual BSCs for two different organizations. In
order to avoid creating demand effects, this training did not provide specific information about how
to weight and combine the various BSC measures to evaluate performance. It did, however, ensure
that participants in our study possessed the same basic level of knowledge about the BSC. To further
control for individual knowledge differences, participants in our study, like those used by Lipe and
Salterio (2000), did not have any prior work experience with the BSC.
BSC proponents also argue that it should be linked to compensation (Frigo and Krumwiede
2000; Kaplan and Norton 2001b; Niven 2002), and the importance of linking performance measurement systems to rewards is acknowledged in management accounting research (Otley 1999, 366). A
study of BSC use in a large manufacturing company (Malina and Selto 2001) concluded that
managers were more satisfied with the BSC when it was linked to meaningful rewards. Lipe
and Salterio (2000) only examined the use of the BSC to evaluate performance. Therefore, a second
contribution of this study is to investigate how knowledgeable decision makers use common and
unique BSC measures in their bonus allocation judgments. We predict that knowledgeable decision
makers will attend to both common and unique measures both when evaluating performance and
when allocating bonuses.
The next section of this paper presents the background for and theoretical development of this
study. The third and fourth sections describe the experimental method and present our results,
respectively. The final section discusses our findings and their implications for practice and future
research.
BACKGROUND AND HYPOTHESES
Judgments made using the BSC involve multiple attributes, as a BSC typically contains from 16
to 28 measures. Research in psychology and consumer behavior shows that on multi-attribute judgment tasks, decision makers use both measures that are common across alternatives and measures
unique to each alternative, but place greater weight on the former (Kivetz and Simonson 2000;
Markman and Medin 1995; Slovic and MacPhillamy 1974; Zhang and Markman 1998, 2001). In
contrast, Lipe and Salterio (2000) (hereafter Lipe and Salterio) found that decision makers totally
ignored unique measures when using the BSC to evaluate subordinates' performance.

Differences in task complexity and participants' knowledge may explain why Lipe and Salterio's cue utilization results differed from psychology and consumer behavior research. In the latter types of research, participants are typically provided a limited amount of information, presented in a familiar format such as a tabular matrix. They are asked to respond to familiar scenarios, such as making consumer choices or predicting student performance. In contrast, the BSC is a complex report that differs from traditional reports used to evaluate performance in that it includes both common and unique measures organized by financial and nonfinancial attributes. Moreover, Lipe and Salterio's participants had little prior experience with the BSC, so use of the BSC represented a relatively novel task even for those participants who had prior experience in evaluating subordinates' performance. Therefore, Lipe and Salterio's results may reflect how decision makers initially resort
to simplifying strategies when using the BSC. Decision makers who are more familiar with the BSC
are expected to behave differently.
Audit judgment research shows that greater knowledge results in better judgment performance
across a wide variety of audit tasks (Bonner 1990; Bonner and Lewis 1990; Libby and Tan 1994).
One benefit of additional task-specific knowledge is that it may facilitate comparing alternatives that
possess dissimilar features (Alba and Hutchinson 1987). This is particularly relevant to judgments
based on the BSC. BSCs for different departments are likely to contain some measures in common,
but other measures that are unique to a specific department, and thus possess both similar and
dissimilar features. The better a decision maker understands the theory and structure of the BSC, the
easier it should be to incorporate both common and unique measures when comparing and evaluating
each department's performance.
Formal training is one way to acquire knowledge. Audit judgment research has found that the
performance of experienced auditors is positively correlated with the extent of task-specific training
they received (Bonner 1990; Bonner and Pennington 1991, 32). This finding supports the claims of
BSC proponents that instruction on the theory underlying the BSC and its structure is a necessary
prerequisite to its successful implementation (Niven 2002). Multiple methods for acquiring that
knowledge exist: seminars (e.g., by The Balanced Scorecard Initiative, see http://www.bscol.com),
books (Kaplan and Norton 1996b, 2001c), and journal articles (Kaplan and Norton 1992, 1993,
1996a, 2001a, 2001b). These knowledge-delivery methods all provide explanations of the structure
of the BSC and examples of BSCs for a variety of different kinds of organizations. The preceding
discussion suggests that such generic knowledge about the BSC should enable decision makers to
more fully utilize all the information contained therein (i.e., both common and unique measures)
when using it to evaluate performance and allocate compensation.
Nevertheless, research on multi-attribute judgment tasks indicates that while decision makers
performing familiar tasks do attend to both common and unique measures, they place greater weight
on the former than the latter (Kivetz and Simonson 2000; Markman and Medin 1995; Slovic and
MacPhillamy 1974; Zhang and Markman 1998, 2001). One reason for this is that it is easier to
compare alternatives on common measures because the attribute values are represented on the same
scale. This enables the decision maker to directly evaluate the relative ranking of each alternative. In
contrast, comparing alternatives on unique measures is more complex. The decision maker needs an
absolute scale for each unique measure, in order to be able to evaluate performance on that dimension. In addition, the relative importance of each unique attribute must also be established in order to
evaluate whether superior performance on unique attribute A is worth more than superior performance on unique attribute B. In the case of judgments based on the BSC, this suggests that even
decision makers with experience in building BSCs and knowledgeable about its structure will still
place greater weight on common than on unique measures. This leads to our first hypothesis:
H1: Decision makers with experience in building BSCs and knowledgeable about its structure will use both common and unique measures when making performance evaluation decisions, but will place greater emphasis on common than unique measures.

The BSC was originally introduced as a performance measurement tool (Kaplan and Norton
1992), but has subsequently been offered as a means of implementing and monitoring strategy
(Kaplan and Norton 1996a). Moreover, BSC proponents argue that its usefulness as a strategic
management tool is enhanced when it is linked to compensation decisions (Frigo and Krumwiede
2000; Kaplan and Norton 2001b). One type of compensation decision involves allocating bonuses.
On the surface, bonus allocations appear to differ markedly from performance evaluations because
the former require explicit comparisons across ratees, whereas the latter do not. Managers, however,
usually are responsible for evaluating multiple subordinates and typically evaluate them simultaneously. Consequently, when reviewing information about each subordinate's individual performance, managers are likely to make implicit comparisons between subordinates. Thus, both
performance evaluation and bonus allocation decisions are likely to involve comparisons across
divisions, and therefore may be made in a similar manner. This leads to our second hypothesis:
H2: Decision makers with experience in building BSCs and knowledgeable about its structure will use both common and unique measures when making bonus allocation decisions, but will place greater emphasis on common than unique measures.

EXPERIMENTAL DESIGN AND PROCEDURE


Task
The experiment used Lipe and Salterio's (2000) experimental task, with minor modifications.¹ Participants were asked to assume the role of a senior executive whose task was to evaluate the performance of two divisions of WCS Incorporated, a firm specializing in women's apparel. They read a case that described the mission and organizational structure of WCS Incorporated, its desire to implement the BSC concept, and explanations of the specific strategies and goals of the company's
two largest divisions: RadWear, a retail division specializing in clothing for the urban teenager; and
WorkWear, a division that sells business uniforms directly to clients. The case provided separate
BSCs for each division. Each BSC consisted of 16 measures, four in each section. Eight of the
measures, two in each section, were common to both divisions; the other eight measures reflected the
unique strategy of each division. Table 1 displays a sample BSC for the RadWear division.
After reading the case, participants made two sets of judgments. First, they evaluated the
performance of each divisional manager, using a 101-point scale anchored at 0 = "reassign" and 100 = "excellent." Second, they allocated a $20,000 bonus pool between the two managers. After making both judgments, participants answered a series of questions designed to collect demographic information and
measure their perceptions about the task.
Design and Procedure
The experiment employed a 2 × 2 between-subjects design, along with a two-level within-subject factor, so that the complete design is 2 × 2 × 2. The first between-subjects factor is the division's relative performance on the measures it shared in common with the other division. This factor had two levels: RadWear could perform better than WorkWear on the common measures, or vice versa. The second between-subjects factor was each division's relative performance on the unique measures included in the BSC. This factor also had two levels: either RadWear or WorkWear could perform better than the other division on unique measures. Each subject evaluated the performance of the two divisions, RadWear and WorkWear; thus, division is the within-subject factor.
¹ We used Lipe and Salterio's (2000) case, with permission. We made a minor modification to the WorkWear division. In Lipe and Salterio's instrument, WorkWear had decided to print a catalog to support its sales of uniforms; in our case, WorkWear had established a website to support its sales of uniforms. The corresponding measures were changed in our instruments to agree with the change in context. Except for these minor changes in content, our display formats are identical to those used by Lipe and Salterio (2000).
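
For concreteness, the cells of this design can be enumerated as in the following Python sketch. It is purely illustrative; the labels are ours and no such code was part of the study.

# Illustrative sketch (not from the paper): enumerate the 2 x 2 x 2 design.
# Between-subjects factors: which division performs better on the common
# measures, and which performs better on the unique measures. Within-subject
# factor: each participant evaluates both divisions.
from itertools import product

divisions = ("RadWear", "WorkWear")

for common_favors, unique_favors in product(divisions, divisions):
    print(f"Between-subjects cell: common favors {common_favors}, "
          f"unique favors {unique_favors}")
    for evaluated in divisions:  # within-subject repetition
        print(f"  -> evaluate the {evaluated} manager")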


TABLE 1
Example of RadWear Divisional Balanced Scorecard

Measure                                      Target    Actual    % Better than Target
Financial
  1. Return on sales                           24%       26%            8.33
  2. New store sales                           30%      32.5%           8.33
  3. Sales growth                              35%       38%            8.57
  4. Market share relative to retail space     $80      $86.85          8.56
Customer-Related
  1. Mystery shopper program rating            85        96            12.94
  2. Repeat sales                              30%       34%           13.33
  3. Returns by customers as % of sales        12%      11.6%           3.33
  4. Customer satisfaction rating              92%       95%            3.26
Internal Business Processes
  1. Returns to suppliers                       6%        5%           16.67
  2. Average major brand names/store           32        37            15.63
  3. Average markdowns                         16%      13.5%          15.63
  4. Sales from new market leaders             25%       29%           16.00
Learning and Growth
  1. Average tenure of sales personnel          1.4       1.6          14.29
  2. Hours of employee training/employee       15        17            13.33
  3. Stores computerizing                      85%       90%            5.88
  4. Employee suggestions/employee              3.3       3.5           6.06
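
The "% Better than Target" column appears to be the percentage deviation of actual from target performance, with the sign reversed for measures on which lower values are better (customer returns, returns to suppliers, and markdowns). A minimal Python sketch under that interpretation, which is ours rather than stated in the case, reproduces several Table 1 values:

# Sketch (our interpretation): "% Better than Target" computed as
# 100 * (actual - target) / target, sign-flipped for measures where
# lower values indicate better performance.
def pct_better(target, actual, lower_is_better=False):
    pct = 100.0 * (actual - target) / target
    return -pct if lower_is_better else pct

# (measure, target, actual, lower_is_better) -- values from Table 1
rows = [
    ("Return on sales",                24.0, 26.0,  False),  # -> 8.33
    ("Market share vs. retail space",  80.0, 86.85, False),  # -> 8.56
    ("Returns by customers (% sales)", 12.0, 11.6,  True),   # -> 3.33
    ("Returns to suppliers",            6.0,  5.0,  True),   # -> 16.67
]

for name, target, actual, lower in rows:
    print(f"{name}: {pct_better(target, actual, lower):.2f}")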

The experiment took place during a regularly scheduled class session. Participation was voluntary. Participants could earn 10 points extra credit (approximately 1.5 percent of the course grade) by
participating; those who did not wish to participate could complete an alternative project to earn the
extra credit points.
Participants
Forty-three undergraduate students enrolled in a required accounting information systems course participated in the experiment. Twenty-four were accounting majors and 19 were information systems majors; 62.8 percent were male. Mean full-time work experience was 3.1 years, with a range from 0 to 19 years. The distribution of work experience is somewhat skewed, in that 45 percent of participants had one year or less of full-time work experience.
The experiment took place after the topic of the BSC had been covered in class and after
participants were tested on their knowledge about the BSC. Two class sessions were devoted to
the BSC. Participants read Kaplan and Norton's (2001a, 2001b) two Accounting Horizons articles, as well as an SAP white paper (Norton et al. 2001) that described and illustrated an oil company's experience in designing and implementing a BSC. As an in-class exercise, participants developed a BSC for a nonprofit organization. In addition, as part of a team project,
participants developed a BSC for a local business. Finally, the midterm exam included a problem in which participants had to use a mission statement and set of goals for a wholesale
distributor to develop a BSC for that organization.


Mean and median scores on the midterm exam BSC problem were 22.2 and 24 points out of 28, with a range from 14 to 28.² Participants also reported whether they were familiar with the concept of a BSC and had a good understanding of the BSC structure, on an 11-point scale ranging from −5 = "strongly disagree" to +5 = "strongly agree." Mean responses were 3.33 for familiarity and 3.52 for understanding, both greater than the response scale midpoint of 0 (p < .001).³ Thus, participants in our experiment appear to be familiar with and knowledgeable about the BSC.
Manipulation Checks
The post-experimental questionnaire contained three questions that served as manipulation
checks. Responses to these were on the same 11-point scale used for reporting understanding of and
familiarity with the BSC. Participants recognized that RadWear and WorkWear were targeting
different markets (mean = 3.26), employed some different performance measures (mean = 3.00), and
that it was appropriate for them to use some different performance measures (mean = 3.57). All three
means were greater than the scale midpoint (p < .001).⁴
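
The paper does not state which test was used to compare mean responses to the scale midpoint; a one-sample t-test is the standard choice. A Python sketch with hypothetical response data (individual responses are not reported in the paper):

# Sketch with HYPOTHETICAL data: testing whether mean agreement exceeds
# the midpoint (0) of the -5 to +5 scale, as in the manipulation checks.
# The paper reports only means (e.g., 3.26); these ratings are invented.
from scipy import stats

responses = [4, 3, 5, 2, 3, 4, 3, 2, 5, 4, 3, 3]  # hypothetical ratings

t_stat, p_two_sided = stats.ttest_1samp(responses, popmean=0)
p_one_sided = p_two_sided / 2  # directional test: mean > midpoint
print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.4f}")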
RESULTS
Performance Evaluations
To test H1, we conducted a 2 × 2 × 2 ANOVA on divisional performance evaluations with common and unique performance measures as between-subjects variables and division as a repeated measure. The results appear in Panel A of Table 2. The significant interactions between division and both common (F(1, 39) = 34.066, p < .001) and unique (F(1, 39) = 6.380, p = .016) measures indicate that both common and unique measures influenced participants' divisional evaluations. To assess the relative magnitude of the division by common and division by unique effects, we computed eta-squared values.⁵ Eta-squares are higher for the division by common effect (.419) than for the division by unique effect (.078). Further, Panel B of Table 2 shows that when common measures favor RadWear, the RadWear manager's evaluations are 9.04 points higher on average than the WorkWear manager's. When common measures favor WorkWear, the WorkWear manager's evaluations are 6.59 points higher on average than the RadWear manager's. When unique measures favor RadWear (WorkWear), the mean difference in manager evaluations is smaller, averaging 4.55 (2.42) points. The results therefore support H1, which predicts that decision makers with experience in building BSCs and knowledgeable about its structure will use both common and unique measures when making performance evaluation decisions, but will place greater emphasis on common than unique measures.
² Participants in the experimental conditions where the WorkWear division performed better on common measures had a higher mean score on the midterm exam BSC problem (23.3) than those in the conditions where the RadWear division performed better on common measures (20.9) (p = .08). We re-analyzed the divisional performance and bonus allocation data including performance on the midterm exam BSC problem as a covariate. None of the results in these analyses were substantively different from those reported in the results section.
³ There were no significant differences across experimental conditions for self-reported familiarity with and understanding of the BSC (p > .10).
⁴ Mean responses for the "targeting different markets" and "employed some different performance measures" questions did not differ across experimental conditions (p > .10). There was a significant common by unique factors interaction for the "appropriate to use different performance measures" question (p = .03). Participants in the experimental conditions where common and unique measures showed a mixed pattern of performance (e.g., RadWear performing better on common measures and WorkWear performing better on unique measures) had lower agreement with this statement on average than participants in conditions where the same division was better on both common and unique measures. We re-analyzed the divisional performance and bonus allocation data including response to the appropriateness of using different performance measures question as a covariate. None of the results in these analyses were substantively different from those reported in the results section.
⁵ Eta-squared values were calculated by dividing the sum of squares for each interaction by the total sum of squares for the division within-subjects effect (Kirk 1995).
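
The eta-squared calculation described in footnote 5 can be verified directly from the within-subjects sums of squares in Panel A of Table 2. A minimal Python sketch, with values transcribed from the table:

# Eta-squared per footnote 5: each effect's SS divided by the total SS
# for the division within-subjects effect (Kirk 1995).
# Sums of squares transcribed from Table 2, Panel A (within-subjects).
ss = {
    "Division": 29.978,
    "Division x Common": 1296.718,
    "Division x Unique": 242.838,
    "Division x Common x Unique": 42.616,
    "Error": 1484.545,
}
total_within = sum(ss.values())  # 3096.695

for effect in ("Division x Common", "Division x Unique"):
    print(f"eta-squared ({effect}) = {ss[effect] / total_within:.3f}")
# -> 0.419 and 0.078, matching the values reported in the text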


TABLE 2
Results for Managers' Performance Evaluations

Panel A: Results of a 2 × 2 × 2 Repeated Measures ANOVA of Evaluations of the Performance of RadWear and WorkWear Division Managers

Source                             df         SS         MS         F         p
Between-Subjects
  Common                            1     159.645    159.645     1.021     0.318
  Unique                            1      54.324     54.324     0.348     0.559
  Common × Unique                   1      70.953     70.953     0.454     0.504
  Error                            39
Within-Subjects
  Division                          1      29.978     29.978     0.788     0.380
  Division × Common                 1    1296.718   1296.718    34.066     0.000
  Division × Unique                 1     242.838    242.838     6.380     0.016
  Division × Common × Unique        1      42.616     42.616     1.120     0.297
  Error                            39    1484.545     38.065

Panel B: Evaluations of the Performance of RadWear and WorkWear Division Managersᵃ

                                                   Difference
                      RadWear        WorkWear      (RadWear − WorkWear)
Common Measures
  Favor RadWear    82.14ᵇ (9.02)   73.10 (8.87)         9.04
  Favor WorkWear   71.59 (9.56)    78.18 (11.71)       −6.59
Unique Measures
  Favor RadWear    77.73 (12.02)   73.18 (10.53)        4.55
  Favor WorkWear   75.91 (9.12)    78.33 (10.29)       −2.42

ᵃ Evaluations were made on a 101-point scale, with 0 labeled "Reassign" and 100 labeled "Excellent."
ᵇ Panel values are means (standard deviations). "Common" measures appear on both divisions' balanced scorecards; "Unique" measures appear on only one division's balanced scorecard. "Favor RadWear" indicates the measures were higher for the RadWear division than for the WorkWear division. "Favor WorkWear" indicates the measures were higher for the WorkWear division than for the RadWear division.

Bonus Allocations
To test H2, we conducted a 2 × 2 ANOVA on the RadWear bonus allocations. It was not necessary to analyze WorkWear bonuses, as the two bonuses summed to a fixed amount.⁶ Therefore, the ANOVA only incorporates the between-subjects factors of common and unique performance measures. The results appear in Panel A of Table 3. There are both significant common (F(1, 39) = 24.828, p < .001) and unique (F(1, 39) = 7.881, p = .008) main effects, indicating that participants incorporated both common and unique measures into their bonus allocations. Eta-squares are higher for the common (.341) than for the unique (.108) main effect.
⁶ All participants' bonus allocations correctly summed to the stated amount.


TABLE 3
Results for Bonus Allocations

Panel A: Results of a 2 × 2 ANOVA of RadWear Bonus Allocations

Source               df          SS          MS          F          p
Common                1  60,997,838  60,997,838     24.828      0.000
Unique                1  19,361,475  19,361,475      7.881      0.008
Common × Unique       1     922,450     922,450      0.375      0.544
Error                39  95,815,909   2,456,818

Panel B: RadWear and WorkWear Bonus Allocationsᵃ

                          RadWear                 WorkWear
Common Measures
  Favor RadWear    11,500.00ᵇ (1,732.05)     8,500.00 (1,732.05)
  Favor WorkWear    9,090.91 (1,637.63)     10,909.09 (1,637.63)
Unique Measures
  Favor RadWear    10,954.55 (1,889.25)      9,045.45 (1,889.25)
  Favor WorkWear    9,547.62 (2,030.42)     10,452.38 (2,030.42)

ᵃ Table values are in dollars. Participants allocated a $20,000 bonus pool.
ᵇ Panel values are means (standard deviations). "Common" measures appear on both divisions' balanced scorecards; "Unique" measures appear on only one division's balanced scorecard. "Favor RadWear" indicates the measures were higher for the RadWear division than for the WorkWear division. "Favor WorkWear" indicates the measures were higher for the WorkWear division than for the RadWear division.

Panel B of Table 3 shows that the mean RadWear bonus allocation is $11,500.00 when common measures favor RadWear, as opposed to $9,090.91 when common measures favor WorkWear, a difference of $2,409.09. In contrast, the mean difference in RadWear bonuses when unique measures favor RadWear as opposed to WorkWear is $1,406.93 ($10,954.55 minus $9,547.62). The results therefore support H2, which predicts that decision makers with experience in building BSCs and knowledgeable about its structure will use both common and unique measures when making bonus allocations, but will place greater emphasis on common than unique measures.
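
The footnote 5 calculation applied to the sums of squares in Panel A of Table 3 approximately reproduces the reported bonus eta-squares; the small differences from the reported .341 and .108 presumably reflect rounding in the published sums of squares. A minimal Python sketch:

# Eta-squared for the bonus allocation ANOVA, computed as in footnote 5
# from the SS values transcribed from Table 3, Panel A.
ss = {
    "Common": 60_997_838,
    "Unique": 19_361_475,
    "Common x Unique": 922_450,
    "Error": 95_815_909,
}
total = sum(ss.values())

for effect in ("Common", "Unique"):
    print(f"eta-squared ({effect}) = {ss[effect] / total:.3f}")
# -> roughly 0.344 and 0.109, close to the reported .341 and .108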
SUMMARY AND DISCUSSION
This paper reports the results of an experiment investigating judgments made with the BSC by decision makers who were knowledgeable about its design and structure. Our principal finding is that decision makers with training in BSC theory and development used both common and unique BSC measures to evaluate subordinates' performance and to allocate bonuses, but placed greater emphasis on common than on unique measures when making both judgments. This result is consistent with research in psychology and consumer behavior, but contrary to Lipe and Salterio's (2000) finding that decision makers ignored unique measures when using the BSC to evaluate the performance of two business units. To provide a further basis of comparison between Lipe and Salterio's results and ours, we used the sum-of-squares measures reported in their Table 3 to compute eta-squares for their performance evaluation data. Eta-squared for their division by common measures effect was .353, compared to .419 in our results. Eta-squared for their division by unique measures effect was only .012, compared to our observed value of .078. Thus, participants in our study appeared to rely more heavily on both common and unique BSC measures when making performance evaluation judgments than did participants in Lipe and Salterio's study.


There are two possible explanations for why participants in our study used unique BSC measures to evaluate performance, whereas Lipe and Salterio's did not. One is differences in participants' level of knowledge about the BSC. Participants in our study had studied the theory underlying the design and structure of the BSC, had practiced building BSCs for several different organizations, and had been tested on their understanding of the BSC. In contrast, Lipe and Salterio's participants had not received as extensive background training on the BSC. Thus, Lipe and Salterio's results may reflect how decision makers initially use the BSC, whereas our results show behavior after training in and experience with the BSC. Alternatively, the difference between our results and Lipe and Salterio's may reflect differences in participant demographics. Participants in our study were all either accounting or information systems majors, whereas Lipe and Salterio's participants were M.B.A. students from a variety of backgrounds. Thus, our results may not be due as much to a better understanding of the decision tool (the BSC) as to the fact that information preparers have a greater facility for working with numerical data and therefore are more likely to consider all the data presented in the BSC.
Our experimental design does not enable us to definitively choose between these alternative explanations of the difference between our results and Lipe and Salterio's. Nevertheless, there are
good reasons to favor the knowledge and training explanation. First, our results are consistent with
psychology and consumer behavior research findings that for multi-attribute decision tasks decision
makers use both common and unique attributes to evaluate and choose among a set of alternatives.
Second, audit judgment research suggests that even experienced decision makers may need training
in order to demonstrate superior performance on complex tasks (Bonner 1990).
Our finding that decision makers attended to, but underweighted, unique BSC measures both
when using the BSC to evaluate performance and to allocate bonuses suggests that this behavior may
be systemic to decision making with the BSC. Although such unequal weighting of common and
unique attributes is consistent with multi-attribute decision making in a variety of other settings, it is
problematic for the success of the BSC as a strategic management tool. Agency theory suggests that
managers will focus attention on those aspects of performance that most directly affect their compensation (Holmstrom and Milgrom 1991). Consequently, the results of both Lipe and Salterio and this
study suggest that managers will devote more effort to improving performance in the activities
related to common measures than to activities related to unique measures. If the latter reflect key
aspects of the organization's strategy, then this imbalance may undermine the effectiveness of using
the BSC as a strategic management tool.
In addition, unequal weighting of BSC measures may limit its effectiveness as a performance
measurement tool. Interestingly, the BSC literature (Kaplan and Norton 1996a, 1996b, 2001a;
Norton et al. 2001) suggests that the measures in the financial section of the BSC deserve greater
attention because they serve to validate the theory underlying the goals and measures included in the
other three sections. Gering and Mntambo (2002, 37) go further and argue that weighting all four
sections equally will cause the BSC to fail. Yet, Ittner et al. (2003) found that managers' dissatisfaction with the unequal weighting of BSC measures in compensation decisions at a major financial institution was so high that the organization quit linking the BSC to compensation. Although Ittner et al.'s (2003) results concerned the relative weighting given to different sections of the BSC, they
suggest that the unequal weighting of measures within a given section of the BSC found in this study
may create similar negative reactions to use of the BSC to evaluate managerial performance.
In conclusion, our results add to the evidence that decision makers do not equally weight all
measures in the BSC when making decisions. Although prescriptive discussions of the BSC in both
the academic and practitioner literature suggest that such behavior may be desirable, agency theory
and limited empirical evidence suggest otherwise. Clearly, there is need for additional research to
investigate how decision makers use the BSC and whether the manner in which it is used affects its
utility as both a performance evaluation and strategic management tool.

REFERENCES
Alba, J. W., and J. W. Hutchinson. 1987. Dimensions of consumer expertise. Journal of Consumer Research 13: 411–454.
Bonner, S. E. 1990. Experience effects in auditing: The role of task-specific knowledge. The Accounting Review 65 (1): 72–92.
Bonner, S. E., and B. L. Lewis. 1990. Determinants of auditor expertise. Journal of Accounting Research (Supplement): 1–20.
Bonner, S. E., and N. Pennington. 1991. Cognitive processes and knowledge as determinants of auditor expertise. Journal of Accounting Literature 10: 1–50.
Frigo, M. L., and K. R. Krumwiede. 2000. The balanced scorecard: A winning performance measurement system. Strategic Finance (January): 50–54.
Gering, M., and V. Mntambo. 2002. Parity politics. Financial Management (February): 36–37.
Holmstrom, B., and P. Milgrom. 1991. Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. Journal of Law, Economics, and Organization 7: 24–52.
Ittner, C. D., D. F. Larcker, and M. W. Meyer. 2003. Subjectivity and the weighting of performance measures: Evidence from a balanced scorecard. The Accounting Review 78 (3): 725–758.
Kaplan, R. S., and D. P. Norton. 1992. The balanced scorecard: Measures that drive performance. Harvard Business Review (January–February): 71–79.
Kaplan, R. S., and D. P. Norton. 1993. Putting the balanced scorecard to work. Harvard Business Review (September–October): 134–147.
Kaplan, R. S., and D. P. Norton. 1996a. Using the balanced scorecard as a strategic management system. Harvard Business Review (January–February): 75–85.
Kaplan, R. S., and D. P. Norton. 1996b. The Balanced Scorecard: Translating Strategy into Action. Boston, MA: Harvard Business School Press.
Kaplan, R. S., and D. P. Norton. 2001a. Transforming the balanced scorecard from performance measurement to strategic management: Part I. Accounting Horizons 15 (March): 87–104.
Kaplan, R. S., and D. P. Norton. 2001b. Transforming the balanced scorecard from performance measurement to strategic management: Part II. Accounting Horizons 15 (June): 147–160.
Kaplan, R. S., and D. P. Norton. 2001c. The Strategy-Focused Organization: How Balanced Scorecard Companies Thrive in the New Business Environment. Boston, MA: Harvard Business School Press.
Kirk, R. E. 1995. Experimental Design: Procedures for the Behavioral Sciences. 3rd edition. Pacific Grove, CA: Brooks/Cole.
Kivetz, R., and I. Simonson. 2000. The effects of incomplete information on consumer choice. Journal of Marketing Research 37 (November): 427–448.
Libby, R., and J. Luft. 1993. Determinants of judgment performance in accounting settings: Ability, knowledge, motivation, and environment. Accounting, Organizations and Society 18 (5): 425–450.
Libby, R., and H. Tan. 1994. Modeling the determinants of audit expertise. Accounting, Organizations and Society 19 (8): 701–716.
Lipe, M. G., and S. E. Salterio. 2000. The balanced scorecard: Judgmental effects of common and unique performance measures. The Accounting Review 75 (July): 283–298.
Malina, M. A., and F. H. Selto. 2001. Communicating and controlling strategy: An empirical study of the effectiveness of the balanced scorecard. Journal of Management Accounting Research 13: 47–90.
Markman, A. B., and D. L. Medin. 1995. Similarity and alignment in choice. Organizational Behavior and Human Decision Processes 63 (August): 117–130.
Niven, P. R. 2002. Balanced Scorecard Step-by-Step: Maximizing Performance and Maintaining Results. New York, NY: John Wiley & Sons.
Norton, D., The Balanced Scorecard Collaborative, Inc., and SEM Product Management, SAP AG. 2001. SAP Strategic Enterprise Management – Translating strategy into action: The balanced scorecard. White paper, SAP Strategic Enterprise Management.
Otley, D. 1999. Performance management: A framework for management control systems research. Management Accounting Research 10: 363–382.
Silk, S. 1998. Automating the balanced scorecard. Management Accounting 80 (May): 38–44.
Slovic, P., and D. MacPhillamy. 1974. Dimensional commensurability and cue utilization in comparative judgment. Organizational Behavior and Human Performance 11: 172–194.
Zhang, S., and A. B. Markman. 1998. Overcoming the early entrant advantage: The role of alignable and nonalignable differences. Journal of Marketing Research 35 (November): 413–426.
Zhang, S., and A. B. Markman. 2001. Processing product unique features: Alignability and involvement in preference construction. Journal of Consumer Psychology 11 (1): 13–27.
