Joan Luft

SYNOPSIS: Recent years have seen widespread interest in supplementing or replacing accounting information with nonfinancial information (NFI) in a variety of uses such as incentive compensation, prediction of costs and profits, and firm valuation. The joint use of NFI and accounting has had mixed results, however. Research has documented benefits to such use but has also documented significant challenges. This commentary summarizes research that addresses two particularly important challenges in using combinations of accounting and NFI: measuring nonfinancial performance accurately and weighting measures appropriately when multiple accounting and nonfinancial measures are used together. These challenges are related, in that the nature and magnitude of measurement error helps to determine appropriate weights on multiple measures. Two common themes appear in strategies for dealing successfully with these challenges. The first is that matching information properties to decision types can limit the need for costly or infeasible improvements in measurement. Measurement errors that have significant negative impact on some decisions can be innocuous when the information is used for other decisions. The second theme is a portfolio approach to measurement error: the negative decision effects of error in individual measures can be significantly mitigated by well-chosen combinations of NFI and accounting measures.

INTRODUCTION

Proposals to supplement conventional accounting with the use of nonfinancial information (NFI) have exerted a powerful appeal in recent years. Balanced scorecards and similar performance measurement systems have been advocated intensively and are widely used by organizations (e.g., Eccles et al. 2001; Kaplan and Norton 2001a, 2001b, 2001c, 2008).
Business-risk or strategic-systems audits, which rely on NFI to understand the client's business, have been put forward as a way to conduct efficient, high-quality audits in a challenging economic and regulatory environment (Bell et al. 2002; Peecher et al. 2007). Financial analysts use NFI to forecast earnings and stock prices (Dempsey et al. 1997; Chandra et al. 1999; Rajgopal, Venkatachalam, and Kotha 2003; Peecher et al. 2007), and the Financial Accounting Standards Board (FASB) has considered mandating the reporting of nonfinancial measures along with traditional financial statements (FASB 2001; Maines et al. 2002; Upton 2001).[1]

Joan Luft is a Professor at Michigan State University. The author is grateful to Karen Sedatole, Tyler Thomas, two anonymous reviewers, and Ella Mae Matsumura (editor) for helpful comments.

COMMENTARY. Accounting Horizons, Vol. 23, No. 3, pp. 307-325. American Accounting Association, 2009. DOI: 10.2308/acch.2009.23.3.307. Submitted: May 2007; Accepted: May 2009; Published Online: August 2009. Corresponding author: Joan Luft. Email: luftj@bus.msu.edu

Recent evidence, however, suggests that high initial expectations about the value of NFI were not fulfilled in many instances. NFI appeared particularly value relevant (that is, associated with stock prices) for Internet firms in the later 1990s, but this value relevance fell significantly (not to zero, however) after the end of the Internet bubble (Demers and Lev 2001; Rajgopal, Venkatachalam, and Kotha 2003). Many firms that adopted NFI-based incentive systems subsequently discarded them (e.g., 42 percent in the sample analyzed by HassabElnaby et al. 2005). Recent research on business risk audits has reported considerable unwillingness by auditors to rely on NFI-based approaches (Knechel 2007; Curtis and Turley 2007). After relatively intensive consideration at the beginning of the decade, the FASB has not acted to mandate NFI reporting.
Given the "retreat to the financial" that appears in this recent research, a number of questions arise for accountants. First, why, if at all, should accountants be involved with the development and use of NFI, rather than, for example, leaving customer-satisfaction measurement to marketers and employee-morale measurement to human resource specialists? Second, what has been learned from the experience of recent years about the actual benefits and challenges of using NFI in conjunction with accounting information? Third, if and when accountants are involved with NFI development and use, what assistance does accounting research provide to deal with the observed challenges in the development and use of NFI? The remainder of this commentary addresses these three issues in turn.

NFI AND ACCOUNTING

Most organizations use a wide range of data that is important to the organizational mission but falls outside the purview of the organization's financial function. Accountants typically have little to do with NFI documenting, for example, procedures for engineering experiments or biometric indicators for high-security employee IDs. What makes selected NFI the business of accountants or users of accounting information? Accounting research summarized below provides evidence that selected NFI can be used both to substitute for and to complement accounting information in tasks for which accounting is typically important, such as forecasts of future financial performance or evaluation of current performance. Accounting and NFI work together as a portfolio of measures, in which the value of using and refining accounting measures depends on the information properties of NF measures included in the portfolio, and the information value of any specific NF measure depends on the properties of accounting. In consequence, the accountant's tasks depend on the properties of NFI as well as of accounting information.
Whether accountants should, for example, devote significant effort to developing financial measures of intellectual capital as an input to the valuation of knowledge-intensive firms depends on how cost-effectively NF measures such as patents and publications can provide the same information. In this case, accounting and NF measures are substitutes, and more informative NFI means less need for accountants to develop, or users to seek out, financial measures.

In contrast, when NFI complements accounting, more use of NFI means more use of accounting, because accounting is more valuable when used together with NFI than when used alone. For example, in Amir and Lev (1996), accounting earnings alone appear irrelevant to stock prices for wireless communication firms; but when NF measures of growth potential are included in the model, earnings become significantly value relevant.[2] Similarly, in performance evaluation and reward systems, accounting earnings that are imperfect measures of employees' actions can be more heavily weighted (i.e., more dollars of reward can be provided for a given increase in earnings) when appropriate NFI is also included in the reward base (Feltham and Xie 1994; Datar et al. 2001).

[1] The definition of "financial" and "nonfinancial" varies across users. Some regard all measures denominated in dollars or other currency (e.g., cost of quality measures) as financial, while measures like defect counts or satisfaction ratings are nonfinancial (e.g., Nagar and Rajan 2001). Others regard financial measures as consisting primarily of GAAP earnings and its components and stock prices or returns, while measures like customer profitability or cost of quality are nonfinancial even though dollar-denominated (Kaplan and Norton 2001a; Upton 2001). In general, the observations in this paper apply to NFI identified by either definition.
In such cases, more informative NFI means that accountants can more confidently advocate the use of earnings or other accounting information in decision making, even though earnings is not a perfect measure of firm, business-unit, or individual performance. Thus accounting and nonfinancial measurement can usefully inform each other and do not, in principle, benefit from being performed in isolation. But as described in more detail below, results of recent experience with joint use of accounting and NFI have been mixed, and users of multiple-measure systems have expressed a number of specific frustrations with them.

BENEFITS AND CHALLENGES OF NFI USE

Benefits

Benefits of combining selected NFI with accounting measures have been documented in numerous studies in recent years. The incremental explanatory power for earnings and stock returns provided by NFI has been well established (e.g., Amir and Lev 1996; Ittner and Larcker 1998; Hughes 2000; Trueman et al. 2000; Nagar and Rajan 2001; Rajgopal, Venkatachalam, and Kotha 2002, 2003; Rajgopal, Shevlin, and Venkatachalam 2003; Smith and Wright 2004), although prior earnings is often a stronger predictor (e.g., for stock returns in Francis et al. 2003). Brazel et al. (2007) find that NFI, specifically inconsistency between patterns in financial and NF information, is a significant indicator of financial fraud.

The use of diverse financial (F) and NF measures to manage organizations (e.g., to allocate resources and reward employees) appears to be positively associated with organizational performance on average. The association is often weak, however; there is considerable variation in the experience of individual organizations, and attempts to predict which types of organizations will benefit more from NFI use than others have had mixed results (Hoque and James 2000; Banker et al. 2000; Ittner, Larcker, and Randall 2003; Said et al. 2003; Chenhall 2005; Van der Stede et al. 2006).
Another stream of studies focuses on associations between NFI and learning, which may not be captured clearly in tests of the direct (especially short-term) association between NFI use and organizational performance. Chenhall (2005), using survey data, finds that organizational learning mediates the effect of customer-measure use on strategically important customer outcomes. That is, how strongly the use of customer measures is associated with successful customer outcomes depends on how strongly the use of customer measures is associated with organizational learning. Campbell (2008), using archival data from a restaurant chain, finds that NF-based incentives increase performance, and a portion of the performance improvement remains after the incentive is reduced or eliminated, apparently because of nonreversible learning gains. Similarly, experimental results in Farrell et al. (2008) indicate that even in settings where leading NF indicators have no incentive value because employees have long horizons, employee performance is higher when compensation contracts include forward-looking NF measures than when they do not. The inclusion of forward-looking measures appears to induce more focused testing of task strategies, which increases performance over time.[3]

[2] A discussion of this study by Shevlin (1996) expresses some reservations about the analyses employed, but the general principle that accounting can be more informative when complemented by NFI remains valid. For example, Brazel et al. (2007) find that a measure developed by combining revenue growth and growth in NF measures (e.g., number of employees or facilities) is a significant predictor of fraud; revenue growth without the comparison to NF growth seems unlikely to be an equally valuable fraud indicator.
Sometimes incentives for current performance conflict with learning, but well-chosen NFI can contribute to resolving this conflict. Dye (2004) observes that the actions by managers that do the most to increase profit in the current period do not always provide the most valuable information about which actions will improve profits in the future. Thus the evaluation and reward system must carefully balance incentives for performance against incentives for experimentation and learning.[4] Analytic results in Dye (2004) indicate that the better an organization's information system tracks the intermediate outcomes (product quality, customer satisfaction, etc.) between managers' actions and financial performance, the more worthwhile it is for managers to experiment, because they can learn more from the outcomes of their experimentation.

Challenges to Effective Use of NFI

In spite of the benefits of NFI use documented above, attempts to implement systems of F and NF measures have had mixed success. Some of the challenges to effective NFI use are primarily management issues: for example, leadership failures in implementation (Kaplan 2006) and failure to link the NFI to strategy (Ittner and Larcker 2003) or even agree on a strategy to which NFI could be linked (Kasurinen 2002). Other challenges relate to information design and use, however, and thus fall more clearly in the domain of accounting.

Two key information-related challenges to effective NFI use are measuring important nonfinancial factors and weighting multiple F and NF indicators appropriately in decisions like resource allocation, planning, and performance evaluation and reward. A Deloitte (2007) survey of senior executives and board members identifies understanding of how to measure nonfinancial drivers of performance as the primary requirement for more successful use of NFI. According to Ittner and Larcker (2003), one of the most common mistakes organizations make in using NFI is incorrect measurement.
A number of studies have documented the importance of measurement problems in blocking the development and use of multiple-performance-measure systems (e.g., Malina and Selto 2001, 2004; Cavalluzzo and Ittner 2004; Andon et al. 2007). Incorrect weighting of multiple measures can also lead to disappointments with NFI, and determining appropriate weights is likely to be a difficult and conflict-ridden process (see Malina and Selto 2001, 2004, for examples).[5] Users of NFI are sometimes simply unsure about appropriate weights; for example, a manager quoted in one field study says: "I don't have a sense of which of these measures have the most leverage compared to others" (Malina and Selto 2004, 459). Even when users of NFI have more confidence about weights on multiple measures, the weights can be systematically mistaken, thus supporting decisions with disappointing outcomes. Daniel and Titman (2006) and Rajgopal, Shevlin, and Venkatachalam (2003) argue that the equity markets systematically overweight selected NFI.

[3] Results of an experiment by Webb (2004) indicate that incentive effects of contracting on forward-looking measures depend on the prima facie plausibility of the measures' effects on future performance, and learning might also be affected by this factor.

[4] Note that the tradeoff Dye (2004) describes is not the usual tradeoff between actions that improve current performance and actions that improve future performance, but between actions that improve current performance and actions that reduce managers' uncertainty about what will improve future performance.

[5] Weights can be explicit in formal prediction models or incentive-compensation formulas, or implicit in subjective evaluations or predictions that are more influenced by some measures than others. They can also be implicit in other elements of the control system; for example, the intensity with which managers respond to variances on different measures (Lillis 2002).

In single-firm studies, Ittner, Larcker, and Meyer
(2003) and Malina and Selto (2001) find that managers initially put significant weights on NFI when a new performance measurement system is introduced but soon redistribute the weight to more traditional financial or market-share measures, possibly because managers regard the initial weights on NFI as mistakes. Moers (2005), using proprietary information from a large European industrial firm, finds that the use of more diverse performance measures is associated with more lenient and more compressed evaluations. That is, it appears that supervisors weight multiple measures differently for different subordinates to avoid the unpleasant task of giving low evaluations or evaluations that differ much across individuals. Thus both measurement and weighting pose significant problems for organizations.

Popular understanding of measurement and weighting problems sometimes seems limited to a belief that accurate or reliable measures should be chosen, and that more important measures should be weighted more heavily. In this view, when performance is multidimensional (e.g., both innovation and cost management are important), an appropriate approach would be to choose for each dimension (innovation and cost management) the most accurate measure that can be acquired or constructed at a reasonable cost, and then to weight the measures based on the relative strategic importance of the performance dimensions.

Both analytical and empirical research indicate, however, that sometimes inaccurate measures are quite serviceable in decision making, and at other times important measures need to be weighted lightly (see below for examples). Hence a more refined approach to measurement and weighting can be helpful, and research suggests two important principles for making such refinements. The first principle is to match information properties appropriately to decisions.
The type and magnitude of error that make a measure virtually useless for one decision can be relatively harmless in another. Because error reduction is often costly, identifying the decision settings in which it is more valuable can make NFI use more cost effective.

The second principle is to take a portfolio approach in performing this matching: that is, to consider the information properties of the set of F and NF measures used, rather than evaluating the measures one by one. Important dimensions of performance are sometimes difficult to measure, but even a poor measure can be useful if other measures in the set can reduce the first measure's negative effects on decision quality.

The remainder of this commentary first provides brief definitions of decision and measurement error types as a basis for successful matching. The following sections then summarize recommendations for measurement of NFI that can be drawn from recent research, followed by recommendations for weighting of F and NF measures in multiple-measure portfolios.

DEFINITIONS

Decisions

Based on Demski and Feltham's (1976) distinction between decision-influencing and decision-facilitating uses of information, decisions are categorized as follows:

(1) Performance evaluation for purposes of reward is a decision-influencing use of information. Here NFI, in combination with accounting, is used as a basis for determining performance-based rewards such as bonuses, equity compensation, and promotions. A measure or portfolio of measures is better for this purpose, the better it captures employee efforts and talents that increase firm value or other relevant organizational objectives, and the less it responds to "window-dressing" or the occurrence of random external events that influence organizational outcomes.

(2) Provision of predictor variables is a decision-facilitating use of information.
Here NFI is used, for example, as a leading indicator to forecast future financial performance, and to estimate the expected return of alternative investment projects as a basis for choosing between them. A measure or portfolio of measures is better for this purpose when it supports more accurate predictions.

Measurement Errors

Measurement errors can be usefully divided into two categories, bias and random error. When NFI is upwardly or downwardly biased, on average it overstates or understates actual values. NFI can also, or instead, contain random error (noise). If the measure is noisy but unbiased, it is accurate on average but overstates or understates in particular instances.

MEASUREMENT

Measurement issues in the use of NFI take a number of forms. When multiple measures of a particular performance dimension (e.g., quality, customer satisfaction) are available, criteria are needed for choosing among the available measures. When accountants and managers are constructing portfolios of F and NF measures, they need to respond to users' concerns about the possible inaccuracy of the measures included. (See Malina and Selto 2001, 2004 for examples of these problems in a field study.) Methods of addressing these problems include not only reducing the error in a given measure, but also mitigating the negative decision consequences of irreducible error and identifying decision settings in which the particular errors are relatively innocuous and thus NFI can be useful in spite of measurement error. The recommendations below begin with identifying relevant errors and matching them to decisions, and then continue with means of mitigating errors and/or their effects.
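The bias/noise distinction defined above can be illustrated with a minimal simulation. All numbers here are hypothetical: a "true" customer-satisfaction level of 7 on a 10-point scale, an assumed constant bias, and an assumed noise magnitude.

```python
import random
import statistics

random.seed(1)

TRUE_VALUE = 7.0   # hypothetical true satisfaction on a 10-point scale
BIAS = 1.5         # hypothetical constant upward bias
NOISE_SD = 1.5     # hypothetical standard deviation of random error

# A purely biased measure: wrong on average, but perfectly repeatable.
biased = [TRUE_VALUE + BIAS for _ in range(10_000)]

# A purely noisy measure: right on average, wrong in any single reading.
noisy = [TRUE_VALUE + random.gauss(0, NOISE_SD) for _ in range(10_000)]

print(statistics.mean(biased))           # 8.5: overstates the construct on average
print(statistics.mean(noisy))            # close to 7.0: accurate on average
print(statistics.stdev(noisy))           # close to 1.5: individual readings scatter
```

The simulation makes the later argument concrete: which of the two error types matters depends on whether the decision uses levels, changes, or averages of the measure.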
Identify Error in Measuring the Construct, Not the Indicator

A measure such as patent counts is an indicator of an underlying construct such as firm-value-increasing innovation. In economic models of management accounting (see, e.g., Lambert 2007), both bias and random error are defined with respect to the underlying construct that the measure intends to capture, not with respect to the indicator. Thus a patent count that correctly reports the number of patents the organization actually received is not error-free in the sense that is relevant here. A focus on accuracy in the indicator, for example, choosing NF indicators that are precisely countable and discarding those that are not, can lead to disappointing experiences with NF measures that are "accurate" but are neither good predictors nor good motivators.

For predictions, a patent count would be error-free in this sense if it measured exactly the innovativeness that generates future expected revenues. For reward decisions, a patent count would be error-free if it measured exactly the employee's efforts that contributed to this innovativeness. Neither F nor NF measures are likely to be error-free in this sense. Sometimes "softer" or "fuzzier" measures such as knowledgeable subjective judgments of innovation quality can be more accurate in measuring the underlying construct and can provide better support for predictions and performance evaluations. Care needs to be taken about possible biases in subjective measures, for example, favoritism in subjective performance evaluations, but such measures should certainly not be discarded out of hand in favor of "harder" (more objectively countable) measures. Analytic research finds that including subjective measures of performance can improve overall performance evaluation and motivation (Prendergast 1999), and Gibbs et al.
(2004) find empirical evidence consistent with this prediction, particularly when employees have long tenure, perhaps because long tenure is an indicator of their trust in the subjective evaluation system, and/or subjective evaluation is more accurate for employees with longer track records.

Match Measures to Decisions: Different Errors in Performance Evaluation and Prediction

A measure that is inaccurate and unsatisfactory for one decision type can sometimes serve very well for another. For example, a division might generate a large number of high-value patents based largely on work that was done before the present divisional manager's arrival. This can be true for some years after the manager's arrival in settings where R&D is a slow and cumulative process. The patent count can thus be quite inaccurate as a measure of the current divisional manager's contributions to firm value but quite accurate as a predictor of future revenues. Careful matching of measures to decision types based on construct-measurement error avoids two mistakes in judgment about NFI measures. One mistake in this setting would be to regard the patent count as a "good measure" because it is an excellent predictor, and therefore to insist on using it as a basis for rewarding the manager. The other mistake would be to regard it as a "bad measure" because it works poorly as a basis for reward, and therefore to fail to use it in predictions.

Match Measures to Decisions: Random Error and Prediction Characteristics

The predictive power of NFI for future financial performance or for other NFI is often low, and measurement error in both predictor and predicted variables is one of the sources of this low predictive power.[6] In consequence, the errors of NFI-based predictions will be large; but both the magnitude of the prediction error and the seriousness of its consequences depend on specifics of the decision setting.
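The way measurement error in a predictor drags down predictive power can be sketched in a small simulation. The setup is entirely hypothetical: future profit is assumed to depend linearly on true customer satisfaction, but the analyst only observes a noisy survey measure of it.

```python
import random

random.seed(2)

# Hypothetical world: profit is driven by true (unobserved) customer satisfaction.
n = 5_000
true_sat = [random.gauss(7, 1) for _ in range(n)]
profit = [2.0 * s + random.gauss(0, 1) for s in true_sat]

# The analyst sees only a survey measure with substantial random error.
measured_sat = [s + random.gauss(0, 2) for s in true_sat]

def r_squared(x, y):
    """Squared correlation: the R-squared of a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

print(r_squared(true_sat, profit))      # high: the construct itself predicts well
print(r_squared(measured_sat, profit))  # much lower: noise attenuates R-squared
```

This is the attenuation point Lambert (1998) raises in footnote 6: a low R² for an NF predictor may reflect error in how the construct is measured rather than a weak underlying relation.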
A NF measure with considerable random error in it can still be valuable in making some predictions and need not be discarded; but conversely, the fact that the measure provides valuable predictions in some decision settings does not mean it will provide valuable predictions in other settings. Key elements of the decision setting are the number of observations to be predicted and the use to which the predictions are to be put. Holding the prediction model constant, the expected error in the prediction of mean future financial performance will be smaller for the mean of a large number of observations than for the mean of a small number or for the prediction of a single observation. Thus a model with an R² that provides tolerable error levels in predicting the future profits of a large portfolio of firms can be problematic for predicting the future profits of a small number of business units (cf. use of NFI in contemporary budgeting techniques, Hansen et al. 2003), or forming expectations as a basis for judging the plausibility of an audit client's unaudited numbers (cf. Bell et al. 2002).

The consequence of large prediction errors in small-sample predictions depends on whether the decision effects of the errors offset. Consider, for example, an auditor who is responsible for four clients. In this setting, error effects do not offset. Forming too low an expectation of earnings, resulting in unnecessary audit work and conflict with one client, does not make up for forming too high an expectation of earnings and failing to find a significant error or irregularity at another client. In contrast, consider a manager who forecasts sales of four different product lines and adds up the forecasts to get a total sales revenue forecast, and assume that accuracy of the total sales revenue forecast is the primary objective in this case. In this setting the errors tend to offset.
An overstatement in the revenue prediction for one product line is partly offset by an understatement in the revenue prediction for another product line. Holding constant the predictive ability of the model and the sample size (four clients or products), the overall error effect is less damaging in the case of the sales forecast. In consequence, different levels of measurement error can be tolerated in the two different settings.

[6] Although NFI is a significant predictor of future financial performance, the R²s of models relating NFI to financial performance are often low: 1 percent to 5 percent for customer satisfaction in Ittner and Larcker (1998), and single-digit or low double-digit incremental R²s for various NFI in Nagar and Rajan (2001), Francis et al. (2003), Amir and Lev (1996), Trueman et al. (2000), and Rajgopal, Venkatachalam, and Kotha (2002, 2003). Lambert (1998) makes the point that measurement error in the NF predictors could account for the low R²s of some models.

Match Measures to Decisions: Innocuous Measurement Biases

Bias can seem like a more serious problem than mean-zero random error, because a biased measure does not represent the true value of the underlying construct even on average, while a noisy but unbiased measure does. For some decisions, however, pure bias is a relatively innocuous type of measurement error. The presence of bias in measures need not always be an obstacle to implementing NFI-based performance measurement and prediction, and accepting some bias in return for a reduction in random error can be worthwhile when such tradeoffs are possible.[7] Situations in which bias is relatively harmless are identified separately below for performance evaluation and prediction.
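Before turning to those settings, the core intuition, that a stable bias cancels out of changes and out of target comparisons, can be sketched with hypothetical numbers (a two-point upward bias on a 10-point satisfaction scale):

```python
BIAS = 2.0  # hypothetical stable upward bias in a 10-point satisfaction survey

def reported(true_value):
    """The survey overstates the construct by a constant amount."""
    return true_value + BIAS

# Levels are overstated: true values 4.0 and 6.0 are reported as 6.0 and 8.0...
year1, year2 = reported(4.0), reported(6.0)

# ...but the period-to-period change equals the true improvement exactly.
print(year2 - year1)  # 2.0

# A target comparison works the same way: raise the target by the bias, and the
# comparison against an external benchmark (here 5.0) is restored.
adjusted_target = 5.0 + BIAS
print(year2 >= adjusted_target)  # True: the true value 6.0 beats the true benchmark
```

The key assumption, as the text below stresses, is stability: the users need not know the bias's size for change-based evaluation, only that it does not drift over time.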
Performance Evaluation

Performance evaluations and rewards are often based not on the observed NFI measure itself but on the change in the measure or a comparison between the measure and a target (Murphy 2000). These practices significantly mitigate the effect of bias. For example, suppose that a customer satisfaction measure has an upward bias of two points on a 10-point scale. (Perhaps the questions are designed to make customers reflect on positive more than negative experiences.) When the true value changes from four to six, the reported value changes from six to eight. In this case, the change of two units is an unbiased measure, although the absolute levels are biased. Users do not need to know the amount of the bias; they only need assurance that it is stable over time.[8]

Similarly, when performance evaluation is based on comparing a measure to a target, the comparison can do much to eliminate the effects of bias. If the target is based on past performance, then the comparison to target is very similar to a change measure. If the target is based on external benchmarks instead, and decision makers have some awareness of the existence and magnitude of the bias, they can set the target accordingly. The more upward bias there is likely to be in the measure, the more the target performance should exceed an unbiased external benchmark.

Providing Predictor Variables

When NFI is used to provide predictor variables, bias can be harmless as long as the predictive models (or individuals' subjective prediction strategies) were developed using previous data with the same bias. The bias will be captured in the intercept of the model, and both the coefficient representing the effect of NFI and the prediction itself will be unbiased.
In such settings, the profit increase associated with (for example) an increase in reported customer satisfaction from six to eight in the past provides a reasonable basis for estimating the profit increase associated with an increase in reported customer satisfaction from six to eight in the future, even though the measure itself is overstated.

[7] For example, in sample-based measures like customer satisfaction surveys or defect counts, sample composition and sample size choices can determine the magnitudes of noise and bias and the terms of tradeoff between the two.

[8] When, as is often the case, a NF measure captures the underlying construct with random error as well as bias, then a change in the reported measure or a comparison to target will still contain error. But it is important to realize in such cases that the random error, not the bias, is the problem that must be addressed.

Reduce Error by Aggregating Multiple Measures

It is not always intuitively obvious that total measurement error can be reduced by using more measures that contain error, but this is often a cost-effective way of decreasing random error. Such reductions can be worthwhile for organizations, because random error has significantly negative effects on decision making. Random error in predictor variables reduces the accuracy of predictions, and random error in employees' evaluations reduces the motivation that a given level of monetary incentive provides to risk-averse employees. Random error in performance measures can also reduce the ability of an organization to attract talented but risk-averse individuals, because it reduces their certainty that they will be rewarded for the exercise of their talents. The basic principle of reducing random error by averaging multiple observations is intuitive in some instances.
For example, average divisional earnings over several periods are likely to be regarded as a more reliable measure of a divisional manager's talents and efforts than a single period's earnings.

Research also provides examples of more sophisticated combinations of measures via statistical methods in prediction settings. Rajgopal et al. (2002) use factor analysis to reduce a large number of specific NF information items such as new product introductions and managerial team-building actions. Taken singly, the specific actions are too numerous, diverse, and ambiguous in their implications to be easily used as predictors. But combined into two factors, they explain a substantial portion of the cross-sectional variation in e-commerce firms' stock market returns post-IPO, even after controlling for reported earnings and analysts' forecasts of future earnings and revenues. Similarly, Demers and Lev (2001) and Dikolli and Sedatole (2007) use factor analysis to combine multiple measures of website performance into two factors that have significant explanatory power for stock returns and future profitability of e-commerce firms. Data-reduction techniques of this kind offer considerable promise for reducing random error in NFI measurement.

Mitigate Error Effects: Offsetting Deliberate Bias

Bias is particularly troubling when it is the result of deliberate actions (window-dressing or "gaming") on the part of individuals whose performance is being measured. In such cases it may not be stable across individuals or time, as motivations to introduce bias will vary across individuals and across time. A well-designed portfolio of measures for performance evaluation and reward can reduce intentional bias, however, by including measures on which window-dressing has countervailing effects.
That is, actions taken to window-dress one measure and increase the employee's reward will tend to make another measure look worse and thereby decrease the employee's reward, thus reducing the overall incentive to window-dress. (See Feltham and Xie 1994 and Datar et al. 2001 for analyses of the construction of multiple-measure evaluation and reward systems.)

For example, consider rewarding managers for a particular NF measure, high inventory turnover, as a measure of effective inventory management. Managers may game the measure, resulting in high values of reported turnover but not effective inventory management, as too-low inventories lead to stockouts, poor customer service, and even reduction in product innovation, because of managers' uncertainty about whether radically innovative products will move quickly enough to keep the inventory turnover measure high (see examples in Melnyk et al. 2005 and Melnyk et al. forthcoming). Thus gaming of the inventory management measure has negative effects if the measure is used alone. But when it is used as a component in a portfolio of measures that includes GAAP income, managers' incentives to game the measure are mitigated. Understocking will improve inventory turnover but reduce income by reducing sales. Conversely, the use of inventory turnover as a measure mitigates the tendency of managers to bias reported income upward through overproduction (see Roychowdhury 2006 for evidence of overproduction as an earnings management technique).[9] Overproduction will increase current GAAP (absorption costing) income, but will make inventory turnover worse. The combination of measures limits gaming and motivates decisions more congruent with organizational goals.[10]
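A toy payoff table (purely hypothetical numbers) makes the offsetting mechanism concrete: each gaming action flatters one measure while hurting the other, so a balanced portfolio score favors the honest policy even though neither single measure does:

```python
# Hypothetical (inventory turnover, income $M) outcomes for three policies.
policies = {
    "honest":      (6.0, 10.0),
    "understock":  (8.0,  7.0),   # high turnover, but stockouts cut sales and income
    "overproduce": (4.0, 11.0),   # absorption costing flatters income; turnover suffers
}

def best(metric):
    """Policy a manager would choose if rewarded on the given metric."""
    return max(policies, key=metric)

# Each measure alone invites its own form of gaming...
print(best(lambda p: policies[p][0]))                   # "understock"
print(best(lambda p: policies[p][1]))                   # "overproduce"
# ...but an equally weighted portfolio of the two rewards honest management.
print(best(lambda p: policies[p][0] + policies[p][1]))  # "honest"
```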
WEIGHTING MULTIPLE MEASURES

Just as the problem of measurement does not reduce to a problem of finding the single most accurate measure for a given construct, so the problem of weighting does not reduce to a question of which measures are more important in general. In prediction models using NFI, error in the measures as well as the predictive importance of the underlying constructs can influence weight estimation. In performance evaluation and reward systems, optimal weights on F and NF measures depend not only on the underlying constructs' strategic importance or contribution to firm value, but also on a complex array of factors such as contract length and individuals' time horizon (Dikolli 2001; Dutta and Reichelstein 2003), product architecture in supply chains (Baiman et al. 2001), whether the incentive contract is implicit or explicit (Budde 2007), how tasks are bundled together (for example, whether one employee is responsible for sales only and another for service only, or each is responsible for a mix of sales and service; Hughes et al. 2005), and whether the incentive compensation is simply paid out based on measured performance or a bonus pool is determined first and then divided among employees (Rajan and Reichelstein 2006).

This section focuses on how measurement properties (a key concern of accountants) affect the weighting of NFI and related financial measures. Measurement-property effects are not always intuitively obvious. For example, Hemmer (1996) shows analytically that adding a measure of customer satisfaction to an incentive system based on accounting earnings can either increase or decrease the optimal weight on earnings, depending on whether the customer satisfaction measure is the mean level of satisfaction or the number of customers that exceed a certain satisfaction threshold.
Because appropriate weighting is not always intuitive and weighting decisions are often made subjectively, an important element of effective use of NFI is avoiding common biases in subjective decision making. Hence the recommendations below include notice of potential biases in subjective weighting and techniques for mitigating these biases, insofar as recent research has addressed these issues. Weighting problems and solutions differ considerably between performance evaluation and prediction settings, and thus the two settings are presented separately.

Weighting in Performance Evaluation

Weight High-Random-Error Measures Cautiously, Even When Important

The effect of measurement properties on incentive weighting that has received the most attention in accounting research is the negative effect of random measurement error[11] on optimal weights when a measure is used as a basis for rewarding risk-averse employees (Lambert 2007). The larger the error typically is in a measure of employees' efforts and talents, the more uncertain and less motivating is the compensation based on the measure, and the less valuable it is for an organization to weight the measure heavily; that is, to pay large amounts for changes in the level of the unreliable measure.

Low incentive weights on strategically important NFI measures such as innovation or customer satisfaction of course reduce the motivational value of the NFI. If there is no way of mitigating the risk created by using unreliable high-error measures, then low weighting is often the lesser of two evils. However, as the following recommendations indicate, there are often ways of mitigating this risk by taking advantage of the portfolio properties of sets of F and NF measures. To the extent that irreducible random error remains, there are potential gains from avoiding common decision errors in dealing with this error, as described in the last set of recommendations for performance-evaluation uses.

[9] An alternative solution for the overproduction problem under absorption costing is of course to use variable costing instead, but this is not the best solution in all settings. For example, the public nature of GAAP income can mean that important rewards are attached to absorption-costing income (e.g., reputation, career concerns) even if variable costing is used internally. Or, if incentive compensation and its basis must be made public, the organization may prefer to use GAAP earnings as the basis because it is already public information. In such settings, adding NFI to the evaluation system may be preferred.

[10] Datar et al. (2001) point out that reducing gaming by creating combinations of measures is not always an equally feasible solution: it is more feasible when the number of different activities performed by the individual being evaluated is not large compared to the number of performance measures.

[11] In this context, random measurement error means error in the measure as a measure of employee actions, not as a measure of quality, innovation, etc. as such.

Use Measures with Negatively Correlated Errors to Allow Higher Weights

Random error in both accounting and NFI as measures of employee efforts and talents is often caused by external shocks such as macroeconomic changes. In a well-constructed portfolio of measures, the effects of these shocks on some measures will be negatively correlated with their effects on other measures, resulting in a lower error in the performance evaluation based on combining all the measures. For example, a plant manager might be responsible for both unit cost and quality of the product but might be unable to predict or significantly influence production volume.
(cf. managers' responsibility for unit costs, quality, and customer service, all of which are influenced by uncontrollable volume fluctuations, in the disk-drive manufacturer studied by Davila and Wouters 2005). In this case, an unexpected upward spike in volume adds positive error to the cost measure (unit costs go down, but not because of the manager's efforts or talents) and adds negative error to the quality measure (unexpected volume stresses the production system and increases defects, but not because of the manager's lack of effort or talent). The reverse happens when production volume spikes downward. In this case, if only cost or only quality was included as a basis for the manager's performance evaluation, the volume shocks could add considerably to errors in evaluation. But if the manager is evaluated on a weighted sum of the cost and quality measures, the positive and negative errors tend to offset, and the error in the overall evaluation is relatively small. In consequence, substantial errors in individual measures do not result in equally substantial errors in the overall evaluation on which compensation is based. Both measures can therefore be weighted relatively heavily (that is, significant monetary incentives can be offered for performance on both dimensions) without imposing excessively costly risk on the manager (example from Krishnan et al. 2005, based on Feltham and Xie 1994).

Leverage the Effects of Reducing Error in One Measure to Allow Higher Weights on Other Measures

Reducing the random error in an important NF measure to allow it to be weighted heavily is often costly; for example, customer satisfaction measures can be improved through more sophisticated survey design and the collection of larger samples. Cost-benefit analyses of such actions should not neglect the fact that improving one measure can improve overall performance evaluation and motivation by allowing heavier weights on other measures as well.
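The plant-manager example can be sketched numerically (a simulation with assumed error magnitudes, not data from the cited studies): a common volume shock enters the cost and quality measures with opposite signs, so the errors largely cancel in an equally weighted evaluation.

```python
import random
import statistics

random.seed(2)
cost_err, combined_err = [], []
for _ in range(20000):
    v = random.gauss(0, 1)              # uncontrollable volume shock
    e_cost = v + random.gauss(0, 0.3)   # volume spike flatters the unit-cost measure...
    e_qual = -v + random.gauss(0, 0.3)  # ...while stressing production and hurting quality
    cost_err.append(e_cost)
    combined_err.append(0.5 * e_cost + 0.5 * e_qual)  # equally weighted evaluation

# Each measure alone is noisy, but the volume shocks offset in the portfolio,
# so the combined evaluation carries far less random error.
print(round(statistics.stdev(cost_err), 2))      # ~1.04
print(round(statistics.stdev(combined_err), 2))  # ~0.21
```

The residual error in the combined evaluation comes only from the small idiosyncratic noise terms; the shared volume shock cancels exactly.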
Further development of the example from the previous recommendation illustrates this potential benefit. Suppose that a measure of product quality contributes significant random error to the overall performance evaluation of the plant manager in the example, and the error is not fully offset by negatively correlated error in other measures. In this case, the manager's compensation cannot depend too heavily on quality because the measure is too unreliable. In consequence, compensation also cannot depend too heavily on cost or other measures (that is, the manager's pay cannot be very performance-based) because a high weight on cost and a low weight on quality will skew the manager's efforts suboptimally toward cost. In such a case, lowering the measurement error in quality will allow not only quality but also cost to be weighted more heavily, and pay can be more strongly performance-based without skewing employees' allocation of attention and effort (Feltham and Xie 1994). Similarly, improvements in accounting that reduce the measurement error in cost (e.g., a well-designed activity-based costing system) allow compensation to depend more heavily not only on cost but also on quality, thus increasing employee motivation for both objectives.

Avoid Common Subjective Weighting Errors

Because weights on performance measures are often determined subjectively in incentive-compensation systems, avoiding common subjective decision errors can increase gains from using NFI. For example, Krishnan et al.
(2005) provide experimental evidence that nonexpert compensation system designers tend to incorporate negative error correlation effects into their weighting choices (Recommendation 2), but are less likely to realize that decreasing the independent random error in one measure means that the weights not only on that measure but also on other measures should be increased (Recommendation 3).

The basic principle that compensation for risk-averse employees should not depend heavily on high-error performance evaluations is often intuitively clear. But in some instances it is not, resulting in significant obstacles to successful implementation of portfolios of F and NF measures. One recurring problem appears to be weighting a NF measure heavily based on its strategic importance without discounting for its error. The overweighting can lead to unsatisfactory results: large compensation changes unconnected with changes in employee efforts and potentially an overreaction against the measure. Malina and Selto (2001), in a field study of a balanced scorecard adoption, find that initially heavy weights on learning-and-growth and corporate-citizenship measures were sharply reduced later because of the unreliability of the measures. Ittner, Larcker, and Meyer (2003) describe another large firm in which significant initial weights on NF measures were rapidly reduced, perhaps in part because of reliability concerns, and arguably representing too extreme a reaction, as the NF measures were given zero or near-zero weights in bonus determination, even though they were significantly associated with future financial performance and could be influenced by managers' actions.

An opposite problem is that evaluators can be sensitive to random error in performance measures but respond by deliberately increasing the incentive weight in response to higher random error. They believe that a larger amount of risky pay, instead of a fixed risk premium, is a good way to compensate employees for risk. Krishnan et al.
(2005) document this belief experimentally, noting that the practitioner literature sometimes expresses similar beliefs (e.g., Bloom 1999). Arguments for a larger amount of risky pay as an appropriate way of compensating for risk can sometimes be found in the justifications offered for high levels of risky executive compensation (e.g., justifications mentioned by Bettis et al. 2008). But in general, making employees' compensation more dependent on a measure when it is more unreliable as an indicator of their efforts and talents seems to be an unpromising basis for incentive compensation.

Another frequently observed problem in subjective weighting arises from comparative evaluation of multiple managers. Lipe and Salterio (2000), in a much-replicated experiment, find that when managers of two divisions are being evaluated subjectively, based on balanced scorecards tailored to the strategy of each division, evaluators tend to put more weight on the measures shared by both divisions than on those unique to each division, although the unique measures are meant to be equally important to divisional strategy.[12] Banker et al. (2004), Libby et al. (2004), and Dilla and Steinbart (2005) replicate this finding and identify ways of reducing (though not usually eliminating) the apparent overweighting of common measures. Providing additional training on the balanced scorecard, emphasizing the strategic relevance and reliability of the unique measures, or requiring that the evaluator explicitly justify the evaluation all increase relative weights on the measures unique to individual divisions.[13]

Weighting in Predictions

The accuracy of predictions based on portfolios of F and NF measures depends in part on the accuracy of the weights placed on individual predictors. These weights (coefficients in predictive models) can also play an important role in resource allocation.
For example, a higher weight on NF measure 1 than on NF measure 2 in a model predicting profits suggests that, if the cost of improving performance on either measure is the same, more resources should be devoted to improving 1 than to improving 2.

As noted in the previous section, stable bias is relatively innocuous when estimating and using weights on multiple predictors: weights in predictive models are unaffected by stable bias. Random error can be more problematic, but the nature and magnitude of the problems depend on the predictive decisions being made and on characteristics of the random error.

Consider a balanced scorecard, in which learning and growth is expected to lead to improvements in internal business processes, which lead in turn to customer-measure improvements and higher financial performance. Internal business process measures can be used both to test the effects of learning and growth initiatives (e.g., is a higher level of measured employee skills associated with higher product quality?) and to predict customer and/or financial measures. Recommendations for matching NFI characteristics with decisions are made in the context of this example.

Match Error Reduction Efforts to Prediction-Model Characteristics and Uses

The quality measure plays a dual role in the example given above. It is predicted by learning and growth measures in one model, and it is a predictor of customer and/or financial measures in another model. In some cases, a good estimate of one of these two predictive models may have higher priority than a good estimate of the other predictive model. There may, for example, be more ex ante uncertainty about the strategy component represented by one of these models than the other, or there may be more important managerial decisions dependent on the weights in one model than in the other.
Random error in the quality measure has different effects on determining the weights in these two models, and so reduction in random error may matter more or less depending on the relative importance of the two models. When quality is the dependent variable, predicted by learning and growth, random error in the quality measure is not an obstacle to estimating unbiased weights (coefficients) on these indicators. But when quality is the independent variable, predicting customer or financial measures, random error in the quality measure can be more problematic. Random error in predictors creates misweighting if the error is correlated with the reported predictor; for example, if instances of particularly high reported quality are likely to be overstated and particularly low instances are likely to be understated.

[12] Arya et al. (2005) point out that common measures can be more informative because they allow evaluators to remove common measurement error via relative performance evaluation. Thus underweighting unique measures can be appropriate because unique measures in effect contain more error. However, it is also possible that when divisions have radically different strategies, the error in their common measures may not be common: factors other than managers' actions may affect the same measure differently in the different divisions.

[13] Whether the increased weights on unique measures in these studies are better weights is unclear; but it is not unreasonable to suppose that modest positive weights on the unique measures are better than the zero weights that appear in some experimental settings.
This kind of error will result in biased weights on NFI in predictive models, not only with OLS regression but also with subjective judgments based on high-low comparisons: these analyses will tend to underestimate the actual effects of NF performance on financial performance. Suppose, for example, that managers compare the product-quality levels of business units with highest and lowest profits, and find that a difference of five points on a 10-point quality scale is associated with a $20 million profit difference in business units of similar size. It appears that, at least as a rough estimate, a one-point gain in quality is associated with a $4 million difference in profit. But suppose that measurement error in quality means that the real difference between the relevant observations of quality is only four points, perhaps because the extreme observations are the result of outright clerical errors or faults in the quality-measure construction. If this is the case, then a one-point gain in actual quality is associated with a $5 million difference in earnings rather than a $4 million difference. The measurement error in quality has downwardly biased the estimate of the effect of actual quality on earnings, possibly leading to mistaken judgments about the value of initiatives to promote quality.[14]

Moreover, random error in one of the NF predictors included in a model with multiple predictors (e.g., other NFI and past earnings) not only biases the estimate of the noisy predictor's coefficient; it also can bias the coefficient estimates of other measures in the model, in unknown directions and amounts, unless the other predictors in the model are uncorrelated with the true value of the noisy measure (Greene 2000). Because it is quite likely that NF predictors are correlated (consider innovation, quality, and customer satisfaction, for example, as predictors of financial performance), this can be a significant problem in identifying weights for predictive models.
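The attenuation effect described above is easy to reproduce in a simulation (illustrative parameters only; here the true effect is assumed to be $5 million per quality point, and the noisy measure has a reliability ratio of one-half):

```python
import random
import statistics

random.seed(3)
n = 20000
true_q = [random.gauss(5, 1) for _ in range(n)]           # true quality (sd = 1)
profit = [5.0 * q + random.gauss(0, 2) for q in true_q]   # $5M per true quality point
noisy_q = [q + random.gauss(0, 1) for q in true_q]        # classical measurement error (sd = 1)

def slope(x, y):
    """OLS slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Classical errors-in-variables: the estimated slope shrinks toward zero by the
# reliability ratio var(true) / (var(true) + var(error)) = 1/2 in this setup.
print(round(slope(true_q, profit), 1))   # ~5.0
print(round(slope(noisy_q, profit), 1))  # ~2.5
```

As the text notes, the attenuation is severe precisely when true cross-sectional variation in quality is small relative to the measurement noise.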
Whether the weights in such a setting are too unreliable for use in important decisions, and whether resources should be devoted to random-error reduction, depends in part on the properties of the F and NF measures employed and in part on the intended uses of the predictive model. If the variation in the true value of the measure is large relative to the variance of the measurement error (for example, if the product quality in different observations used in the estimate is actually radically different), then coefficient bias will not be large (Wooldridge 2006). But if the variation in the true value of the measure is not large (for example, if real differences in product quality across observations are modest and random measurement error is large), then the coefficient bias in regression analyses can be substantial.

Mitigate Scale-Compatibility Biases on Subjective Predictions When Needed

When predictions are made subjectively instead of based on regression models, a variety of common judgment biases can affect weighting. Jackson (2008) calls attention to scale-compatibility bias in a study of the use of NFI by nonprofessional investors in screening investments. Consistent with prior psychological research, these investors tend to weight information more heavily when it is scaled in the same way as the screening criterion than when it is scaled differently. The scaling differences in Jackson (2008) are relatively slight (ratings versus rankings). The wide variety of scales used in NFI (counts, ratios, seven-point scales, etc.) could exacerbate this problem: it could, for example, lead to underweighting of NF relative to F information in predicting financial performance. However, in Jackson's (2008) experiment, the scale-compatibility bias is eliminated when investors compare several firms at once rather than completing the screening evaluation of one firm before examining the next firm: the cognitive processes involved in simultaneous rather than sequential analysis counteract the judgment bias.

[14] The effect is intuitively clear for a pair of high-low observations, but it also occurs with OLS regression when the measurement errors and the reported values are correlated (i.e., high reported observations are likely to contain large positive errors). Random error in a NF predictor does not bias the coefficient, however, if the magnitude of the error is correlated with the true level of the underlying construct rather than with the reported measure (Wooldridge 2006).

Mitigate Self-Serving Biases in Weighting Multiple NF and F Measures

Another potential problem with subjective weighting is self-serving biases. Because NFI can be interpreted and weighted in a variety of ways, individuals can easily use NFI to support self-serving judgments, for example, judging that their favored management initiative has a stronger effect on profit than a nonfavored initiative. Often at least some part of the bias is not evident to the individual suffering from it; the judgment is sincere and thus not easily altered by incentives for greater truthfulness. For example, Tayler (2008) provides experimental evidence that managers using balanced scorecard information judge customer-value-creation initiatives with no financial value as more successful when they have chosen the initiatives, even though the evidence available (customer and financial measures) does not support this judgment, and the biased judgment generates no financial reward for them. This self-serving bias is reduced, however, when individuals are responsible for selecting the scorecard measures and the scorecard is explicitly framed as a causal model of performance (consistent with Kaplan and Norton 2001a), rather than simply a balanced set of four perspectives on performance.
It appears that the causal-model representation draws attention to the failure of the expected positive association between customer measures and financial measures, and responsibility for choosing the measures induces individuals to take more seriously the conclusions the measures suggest.

Especially When Complex Predictive Relations Are Likely, Supplement Subjective Weighting with Statistical Analysis

The relations of NF measures to each other and to financial performance often take complex forms. For example, Ittner and Larcker (1998) document strongly nonlinear effects of customer satisfaction on future revenues. Nagar and Rajan (2005), Dikolli and Sedatole (2007), and Chen (2007) find significant interactions among F and NF measures in predictive models of individual-firm performance: that is, the magnitude, or even the sign, of a NF measure's effect on future financial performance depends on the level of another measure. Nagar and Rajan (2005) find that a path model including both direct and indirect effects of NFI provides a different and stronger explanation of financial performance in retail banks than a standard multiple regression. Subjective weighting of predictors tends to be less accurate for nonlinear, interaction, and indirect relations than for linear additive relations (Karelaia and Hogarth 2008; Diehl and Sterman 1995). When it seems likely, based on managers' knowledge of causal processes in the firm, that a good predictive model will include nonlinearities, interactions, and mediated indirect relations, it may be time to call in the statisticians rather than rely on subjective estimation, if accurate weighting in predictive models is important to the organization.

CONCLUSION

Measurement and weighting of NFI are challenging problems, and the experience summarized in recent research does not provide complete solutions to these problems. It does, however, identify important features of potential solutions.
First, aiming at the highest possible accuracy in each measure is often not the most cost-effective approach to measurement. When multiple measures (F and NF) are used together, the portfolio characteristics of the measures (the way they offset random error and bias in each other) can offer important opportunities for effective use of imperfect measures.

Second, NF and F measures are not accurate or inaccurate as such: they are accurate with respect to particular decision requirements. The type, magnitude, and effect of measurement errors vary, depending on the decisions for which the measures are used. The fact that a particular NF measure is useful in predicting stock returns does not necessarily make it equally useful in managing the firm or auditing its financial statements, and vice versa.

Finally, weights on NF and F measures, both for performance evaluation and for prediction, depend on the error properties of the whole portfolio of measures as well as on the relevance or importance of the measure to organizational objectives. Optimal weighting is a particularly complex task, vulnerable both to statistical estimation problems and subjective judgment biases. Research has engaged frequently with these questions in recent years, but more remains to be done.

REFERENCES

Amir, E., and B. Lev. 1996. Value-relevance of nonfinancial information: The wireless communications industry. Journal of Accounting and Economics 22: 3-30.
Andon, P., J. Baxter, and W. F. Chua. 2007. Accounting change as relational drifting: A field study of experiments with performance measurement. Management Accounting Research 18 (2): 273-308.
Arya, A., J. Glover, B. Mittendorf, and L. Ye. 2005. On the use of customized versus standardized performance measures. Journal of Management Accounting Research 17: 7-21.
Baiman, S., P. E. Fischer, and M. V. Rajan.
2001. Performance measurement and design in supply chains. Management Science 47 (1): 173-188.
Banker, R. D., G. Potter, and S. Srinivasan. 2000. An empirical investigation of an incentive plan that includes nonfinancial measures. The Accounting Review 75 (1): 65-92.
Banker, R. D., M. Chang, and M. J. Pizzini. 2004. The balanced scorecard: Judgmental effects of performance measures linked to strategy. The Accounting Review 79 (1): 1-23.
Bell, T. B., M. Peecher, and I. Solomon. 2002. The 21st Century Public-Company Audit: Conceptual Elements of KPMG's Global Audit Methodology. Montvale, NJ: KPMG.
Bettis, C., J. Bizjak, J. Coles, and S. Kalpathy. 2008. Equity grants with performance-based vesting conditions. Working paper, SSRN.
Bloom, M. 1999. The art and context of the deal: A balanced view of executive incentives. Compensation and Benefits Review 31 (1): 25-31.
Brazel, J. F., K. L. Jones, and M. Zimbelman. 2007. Using nonfinancial measures to assess fraud risk. Working paper, North Carolina State University.
Budde, J. 2007. Performance measure congruity and the balanced scorecard. Journal of Accounting Research 45 (3): 515-539.
Campbell, D. 2008. Nonfinancial performance measures and promotion-based incentives. Journal of Accounting Research 46 (2): 297-332.
Cavalluzzo, K., and C. D. Ittner. 2004. Implementing performance measurement innovations: Evidence from government. Accounting, Organizations and Society 29 (3-4): 243-267.
Chandra, U., A. Procassini, and G. Waymire. 1999. The use of trade association disclosures by investors and analysts: Evidence from the semiconductor industry. Contemporary Accounting Research 16: 643-670.
Chen, C. X. 2007. Relevance of customer satisfaction measures in a setting with multiple customer groups: Evidence from a health insurance company. Working paper, University of Illinois.
Chenhall, R. 2005. Integrative strategic performance measurement systems, strategic alignment of manufacturing, learning and strategic outcomes: An exploratory study.
Accounting, Organizations and Society 30 (5): 395–422.
Curtis, E., and S. Turley. 2007. The business risk audit: A longitudinal case study of an audit engagement. Accounting, Organizations and Society 32 (4–5): 439–462.
Daniel, K., and S. Titman. 2006. Market reactions to tangible and intangible information. The Journal of Finance 61 (4): 1605–1643.
Datar, S., S. C. Kulp, and R. A. Lambert. 2001. Balancing performance measures. Journal of Accounting Research 39 (1): 75–92.
Davila, A., and M. Wouters. 2005. Managing budget emphasis through the explicit design of conditional budgetary slack. Accounting, Organizations and Society 30 (7–8): 587–608.
Deloitte. 2007. In the Dark II: What Many Boards and Executives Still Don't Know About the Health of Their Businesses. New York, NY: Deloitte Touche Tohmatsu.
Demers, E., and B. Lev. 2001. A rude awakening: Internet shakeout in 2000. Review of Accounting Studies 6 (2–3): 331–359.
Dempsey, S., J. D. Gatti, D. J. Grinnel, and W. Cats-Baril. 1997. The use of strategic performance variables as leading indicators in financial analysts' forecasts. Journal of Financial Statement Analysis 2 (4): 61–79.
Demski, J. S., and G. A. Feltham. 1976. Cost Determination: A Conceptual Approach. Ames, IA: Iowa State University Press.
Diehl, E., and J. Sterman. 1995. Effects of feedback complexity on dynamic decision-making. Organizational Behavior and Human Decision Processes 62 (2): 198–215.
Dikolli, S. S. 2001. Agent employment horizons and contracting demand for forward-looking performance measures. Journal of Accounting Research 39 (3): 481–494.
Dikolli, S. S., and K. D. Sedatole. 2007. Improvements in the information content of non-financial forward-looking performance measures: A taxonomy and empirical application. Journal of Management Accounting Research 19: 71–105.
Dilla, W. N., and P. J. Steinbart. 2005. Relative weighting of common and unique balanced scorecard measures by knowledgeable decision makers.
Behavioral Research in Accounting 17: 43–53.
Dutta, S., and S. Reichelstein. 2003. Leading indicator variables, performance measurement, and long-term versus short-term contracts. Journal of Accounting Research 41 (5): 837–866.
Dye, R. 2004. Strategy selection and performance measurement choice when profit drivers are uncertain. Management Science 50 (12): 1624–1638.
Eccles, R., R. Herz, E. Keegan, and D. M. H. Phillips. 2001. The Value Reporting Revolution. New York, NY: Wiley.
Farrell, A. M., K. Kadous, and K. L. Towry. 2008. Contracting on contemporaneous vs. forward-looking measures: An experimental investigation. Contemporary Accounting Research 25 (3): 773–802.
Financial Accounting Standards Board (FASB). 2001. Improving Business Reporting: Insights into Enhancing Voluntary Disclosures. Steering Committee Report, Business Reporting Research Project. Norwalk, CT: FASB.
Feltham, G., and J. Xie. 1994. Performance measure congruity and diversity in multi-task principal/agent settings. The Accounting Review 69: 429–453.
Francis, J., K. Schipper, and L. Vincent. 2003. The relative and incremental explanatory power of earnings and alternative to earnings performance measures for returns. Contemporary Accounting Research 20 (1): 121–164.
Gibbs, M., K. A. Merchant, W. A. Van der Stede, and M. E. Vargus. 2004. Determinants and effects of subjectivity in incentives. The Accounting Review 79 (2): 409–436.
Greene, W. H. 2000. Econometric Analysis. 4th edition. Upper Saddle River, NJ: Prentice Hall.
Hansen, S. C., D. T. Otley, and W. A. Van der Stede. 2003. Practice developments in budgeting: An overview and research perspective. Journal of Management Accounting Research 15: 95–116.
HassabElnaby, H. R., A. A. Said, and B. Wier. 2005. The retention of nonfinancial performance measures in compensation contracts. Journal of Management Accounting Research 17: 23–43.
Hemmer, T. 1996. On the design and choice of modern management accounting measures.
Journal of Management Accounting Research 8: 87–116.
Hoque, Z., and W. James. 2000. Linking the balanced scorecard measures to size and market factors: Impact on organizational performance. Journal of Management Accounting Research 12: 1–17.
Hughes, J. S., L. Zhang, and J. Xie. 2005. Production externalities, congruity of aggregate signals, and optimal task assignments. Contemporary Accounting Research 22 (2): 393–408.
Hughes, K. E., II. 2000. The value relevance of nonfinancial measures of air pollution in the electric utility industry. The Accounting Review 75 (2): 209–228.
Ittner, C. D., and D. F. Larcker. 1998. Are nonfinancial measures leading indicators of financial performance? An analysis of customer satisfaction. Journal of Accounting Research 36 (Supplement): 1–36.
Ittner, C. D., and D. F. Larcker. 2003. Coming up short on nonfinancial performance measurement. Harvard Business Review 81 (11): 88–95.
Ittner, C. D., D. F. Larcker, and M. W. Meyer. 2003. Subjectivity and the weighting of performance measures: Evidence from a balanced scorecard. The Accounting Review 78 (3): 725–758.
Ittner, C. D., D. F. Larcker, and T. Randall. 2003. Performance implications of strategic performance measurement in financial services firms. Accounting, Organizations and Society 28 (7–8): 715–741.
Jackson, K. L. 2008. Debiasing scale compatibility effects when investors use non-financial measures to screen potential investments. Contemporary Accounting Research 25 (3): 803–826.
Kaplan, R. S., and D. Norton. 2001a. The Strategy-Focused Organization. Boston, MA: Harvard Business School Press.
Kaplan, R. S., and D. Norton. 2001b. Transforming the balanced scorecard from performance measurement to strategic management. Part I. Accounting Horizons 15 (1): 87–104.
Kaplan, R. S., and D. Norton. 2001c. Transforming the balanced scorecard from performance measurement to strategic management. Part II. Accounting Horizons 15 (2): 147–161.
Kaplan, R. S. 2006. The competitive advantage of management accounting.
Journal of Management Accounting Research 18: 127–135.
Kaplan, R. S., and D. Norton. 2008. Mastering the management system. Harvard Business Review 86 (1): 62–77.
Karelaia, N., and R. M. Hogarth. 2008. Determinants of linear judgment: A meta-analysis of lens-model studies. Psychological Bulletin 134 (3): 404–426.
Kasurinen, T. 2002. Exploring management accounting change: The case of balanced scorecard implementation. Management Accounting Research 13 (3): 323–343.
Knechel, W. R. 2007. The business risk audit: Origins, obstacles and opportunities. Accounting, Organizations and Society 32 (4–5): 383–408.
Krishnan, R., J. Luft, and M. D. Shields. 2005. Effects of accounting-method choices on subjective performance-measure weighting: Experimental evidence on precision and error covariance. The Accounting Review 80 (4): 1163–1192.
Lambert, R. A. 1998. Customer satisfaction and future financial performance: Discussion of "Are nonfinancial measures leading indicators of financial performance? An analysis of customer satisfaction." Journal of Accounting Research 36 (Supplement): 37–46.
Lambert, R. A. 2007. Agency theory and management accounting. In Handbook of Management Accounting Research, Vol. 1, edited by C. Chapman, A. Hopwood, and M. Shields. Oxford, U.K.: Elsevier.
Libby, T., S. E. Salterio, and A. Webb. 2004. The balanced scorecard: The effects of assurance and process accountability on managerial judgment. The Accounting Review 79 (4): 1075–1095.
Lillis, A. 2002. Managing multiple dimensions of manufacturing performance: An exploratory study. Accounting, Organizations and Society 27 (6): 497–529.
Lipe, M. G., and S. E. Salterio. 2000. The balanced scorecard: Judgmental effects of common and unique performance measures. The Accounting Review 75 (3): 283–298.
Maines, L., E. Bartov, P. M. Fairfield, D. E. Hirst, T. E. Iannaconi, R. Mallett, C. M. Schrand, D. J. Skinner, and L. Vincent. 2002. Recommendations on disclosure of nonfinancial performance measures. Accounting Horizons 16 (4): 353–362.
Malina, M. A., and F. H.
Selto. 2001. Communicating and controlling strategy: An empirical study of the effectiveness of the balanced scorecard. Journal of Management Accounting Research 13: 47–90.
Malina, M. A., and F. H. Selto. 2004. Choice and change of measures in performance measurement models. Management Accounting Research 15 (4): 441–469.
Melnyk, S. A., R. J. Calantone, J. Luft, D. M. Stewart, G. A. Zsidisin, J. Hanson, and L. A. Burns. 2005. An empirical investigation of the metrics alignment process. International Journal of Productivity and Performance Management 54 (5/6): 312–324.
Melnyk, S. A., D. L. Stewart, R. J. Calantone, and C. Speier. Forthcoming. Metrics and the Supply Chain: An Exploratory Study. Alexandria, VA: APICS E&R Foundation.
Moers, F. 2005. Discretion and bias in performance evaluation: The impact of diversity and subjectivity. Accounting, Organizations and Society 30 (1): 67–80.
Murphy, K. J. 2000. Performance standards in incentive contracts. Journal of Accounting and Economics 30 (3): 245–278.
Nagar, V., and M. V. Rajan. 2001. The revenue implications of financial and operational measures of quality. The Accounting Review 76 (4): 495–513.
Nagar, V., and M. V. Rajan. 2005. Measuring customer relationships: The case of the retail banking industry. Management Science 51 (6): 904–920.
Peecher, M., R. Schwartz, and I. Solomon. 2007. It's all about audit quality: Perspectives on strategic-systems auditing. Accounting, Organizations and Society 32: 463–485.
Prendergast, C. 1999. The provision of incentives within firms. Journal of Economic Literature 37 (1): 7–63.
Rajan, M. V., and S. Reichelstein. 2006. Subjective performance indicators and discretionary bonus pools. Journal of Accounting Research 44 (3): 585–618.
Rajgopal, S., M. Venkatachalam, and S. Kotha. 2002. Managerial actions, stock returns, and earnings: The case of business-to-business Internet firms. Journal of Accounting Research 40 (2): 529–556.
Rajgopal, S., T. Shevlin, and M. Venkatachalam. 2003.
Does the stock market fully appreciate the implications of leading indicators for future earnings? Evidence from order backlog. Review of Accounting Studies 8 (4): 461–492.
Rajgopal, S., M. Venkatachalam, and S. Kotha. 2003. The value relevance of network advantages: The case of e-commerce firms. Journal of Accounting Research 41 (1): 135–162.
Roychowdhury, S. 2006. Earnings management through real activities manipulation. Journal of Accounting and Economics 42 (3): 335–370.
Said, A. A., H. R. HassabElnaby, and B. Wier. 2003. An empirical investigation of the performance consequences of nonfinancial measures. Journal of Management Accounting Research 15: 193–223.
Shevlin, T. 1996. The value-relevance of nonfinancial information: A discussion. Journal of Accounting and Economics 22 (1–3): 31–42.
Smith, R. E., and W. F. Wright. 2004. Determinants of customer loyalty and financial performance. Journal of Management Accounting Research 16: 183–206.
Tayler, W. 2008. The balanced scorecard as a strategy-evaluation tool: The effects of responsibility and causal-chain focus. Working paper, Emory University.
Trueman, B., M. H. F. Wong, and X.-J. Zhang. 2000. The eyeballs have it: Searching for the value in Internet stocks. Journal of Accounting Research 38 (Supplement): 137–162.
Upton, W. S. 2001. Business and Financial Reporting: Challenges from the New Economy. Norwalk, CT: FASB.
Van der Stede, W., C. W. Chow, and T. W. Lin. 2006. Strategy, choice of performance measures, and performance. Behavioral Research in Accounting 18: 185–206.
Webb, R. A. 2004. Managers' commitment to the goals contained in a strategic performance measurement system. Contemporary Accounting Research 21 (4): 925–958.
Wooldridge, J. M. 2006. Introductory Econometrics: A Modern Approach. 3rd edition. Mason, OH: Thomson/South-Western.
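A standard statistical result makes the concluding portfolio and weighting themes concrete. The following sketch is illustrative only and is not taken from the commentary: it assumes two unbiased measures of the same underlying performance whose errors are independent, whereas real NF and F measures may be biased and correlated.

```latex
Let $m_1 = t + e_1$ and $m_2 = t + e_2$ measure performance $t$,
with independent errors $e_i$ of variance $\sigma_i^2$. A weighted
combination has error variance
\[
\operatorname{Var}\!\left[w m_1 + (1-w) m_2 - t\right]
  = w^2 \sigma_1^2 + (1-w)^2 \sigma_2^2 ,
\]
which is minimized at the inverse-variance weight
\[
w^* = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2},
\qquad
\operatorname{Var}^* = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
  \le \min\left(\sigma_1^2, \sigma_2^2\right).
\]
```

Under these assumptions the noisier measure receives a smaller weight but, unless its error variance is unbounded, never a zero weight, and the combination is more precise than the best single measure; this is one simple sense in which appropriate weights depend on the error properties of the whole portfolio of measures.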