SUMMARY
The Australian Department of Defence's Defence Managers' Toolbox, a CD-ROM which
incorporates the Commonwealth Managers' Toolbox, contains tens of thousands of management
reports, reviews, guidelines, directives and laws. Of these, some 2,500 documents refer to
'measuring' (or grammatical variations of the term) in the context of evaluation. A great
number of these emphasise the critical importance of, or the difficulty in, measuring outputs
and outcomes. NOT ONE gives useful practical guidance on the task of measuring. The
presumption seems to be that, having defined what to measure, the actual task of
measurement is trivial and can be left to the technicians.
The field of data specification, collection, measurement and analysis, however, is a highly
specialised one. It is in this area that many evaluations are deficient.
This paper is a practical guide for managers on the subject of the measurement of program
outputs and outcomes. It does not pretend to be a comprehensive checklist. Rather it aims to
raise awareness of some of the key issues and to provide guidance on managing this
important facet of program evaluation.
Keywords: Public sector management; program evaluation; program performance
measurement; evaluation design.
Keith Linard, as Chief Finance Officer, Australian Department of Finance, was responsible
for the Machinery of Government Section and later the Financial Management Improvement
Section during the 1983-88 'reform' of the Australian Federal Public Service. Keith
currently runs the postgraduate system dynamics program at the Australian Defence Force
Academy and co-directs the postgraduate program in project management.
Introduction
Government documents on evaluation tend to have an air of a Lewis Carroll nonsense rhyme
about them. The thousands of such documents I have studied tend, in general, to
generality and platitude. This is especially the case in relation to the critical area of
measuring evaluation outputs and outcomes. It is apparent that most of the writers or editors
have little if any familiarity with either the statistical or mathematical side of measurement,
or with the data specification and measuring side of the issue.1
In preparing for this paper I scanned thousands of evaluation-related documents in the
Australian Department of Defence's Defence Managers' Toolbox,2 a CD-ROM which
incorporates the Commonwealth Managers' Toolbox. The CD-ROM contains some 2,500
documents which refer to 'measuring' (or grammatical variations of the term) in the context
of evaluation. A great number of these emphasise the critical importance of, or the difficulty
in, measuring outputs and outcomes. NOT ONE gives useful practical guidance on the task of
measuring. The presumption seems to be that, having defined what to measure, the actual
task of measurement is trivial and can be left to the technicians.
This paper does not address in any depth what measures of performance might be
appropriate. It assumes that the 'what' of measurement, i.e. the specification of the
performance criteria, has been determined. Rather, it focuses on the data aspects of
measurement:
- the nature of the data to be acquired;
- the sources of the data;
- the methods to be used in sampling;
- the methods of collection;
- the timing and frequency of data collection;
- the basis of comparing outcomes (to analyse cause and effect); and
- the data analysis methods to be applied.
1. It is possibly unfair to Lewis Carroll to compare him to the evaluation bureaucrats. Carroll, after all, was mathematically and statistically literate.
2. Department of Defence, Defence Managers' Toolbox. Directorate of Publishing, Defence Centre Canberra, June 1996.
Sage advice from the bureaucracy
The 'how to do it' manual, Doing Evaluations - A practical guide,3 emphasises that
evaluation is about measuring. It notes:
Evaluations of effectiveness are primarily concerned with:
- measuring outcomes;
- measuring factors that affect those outcomes . . .
Measurement should focus on the most important attributes (rather than those most
easily measured), especially those which are crucial to achieving higher level
outcomes.
But how should we undertake this task? Our practical guide gives impeccable advice: get
the consultant to figure it out.
Most evaluations, therefore, should also include terms of reference which relate to:
- the identification and measurement of unanticipated outcomes; and
- recommendations as to how unanticipated outcomes may be maximised if positive, or minimised if negative.
Quality for our Clients: Improvement for the Future,4 talks discursively on the subject of
measurement, but again with little enlightenment on the 'how'.
Some activities can be measured 'objectively', i.e. quantified, and others assessed
'qualitatively', i.e. subjectively. The idea that complex service delivery can only be
assessed qualitatively is only justified when quantitative measures are used
exclusively as unit cost measures or productivity measures (inputs/outputs), because these
are not always applicable to some activities (such as teaching or health).
The Management Advisory Board (MAB) was charged under the then Public Service Act
with advising the Commonwealth Government, through the Prime Minister, on significant
issues on the management of the Australian Public Service (APS). The MAB acknowledged
that measurement of outputs was critically important, but was difficult.
Measurement of the performance of individuals and of groups is difficult and
challenging in both the private and public sectors. The measurement of overall
performance in the public sector is, however, conceptually different. There is often,
for example, no market model to apply. In the public sector the term 'performance' is
taken to mean the achievement of planned outcomes or results, and the taking of
actions designed to stimulate such outcomes. The yardsticks of performance are many
and varied: there is generally no identifiable profit measure (or equivalent) which
can be applied.5
3. Department of Finance, Doing Evaluations - A practical guide. Commonwealth of Australia, Canberra, 1994.
4. Department of Finance, Quality for our Clients: Improvement for the Future. Internal Working Group Report, Canberra, 1995.
5. MAB-MIAC Publication Series No. 10, Performance Information and the Management Cycle. February 1993.
The MAB also seemed to think that we solve the problem of measurement with 'clear and
precise terms' in a contract specification.6 Unfortunately it gives no practical guidelines
on how to specify this difficult and challenging task.
116. The contract specification should define in clear and precise terms the scope of
the work to be undertaken, including the output in amount and quality,
standards to be met, response times, how and when the work will be measured,
the frequency and measurement of the work, the responsibilities of the agency
and contractor, key milestones and the contractor's continuing responsibilities.
In this paper I seek to redress some of this imbalance by focusing specifically on the 'how',
identifying the key issues involved in output and outcome measurement and suggesting
sources for more detailed understanding.
6. MAB-MIAC Publication Series No. 8, Contracting for the Provision of Services in Commonwealth Agencies. December 1992.
Figure 1: Characteristics of types of evaluation

Implementation analysis
  Purpose of evaluation: help decide how best to set up a new program
  Key users of evaluation: corporate, program and line managers
  Data collection: ad hoc, during the program planning phase
  Type of data used in evaluation: input and process variables
  Key evaluators: program and line managers
  Payoff expected: improved design and management of initial program implementation

Ex-ante evaluation
  Purpose of evaluation: assess needs for policy development; assess best options
  Key users of evaluation: Parliament, Minister, corporate management
  Data collection: ad hoc, during the program planning phase
  Type of data used in evaluation: environmental, input and output variables; guesses
  Key evaluators: evaluation unit; policy unit; program manager; task force
  Payoff expected: improved policy development; best choice of program options
In every evaluation there will be a trade-off between the theoretical purity of the
methodology and resource and timing constraints. Corporate management agreement should
be sought if such constraints are likely to endanger the evaluation's credibility or value.
The design is, of course, dependent on the nature of the evaluation questions being asked.
Figure 2 groups some common types of analytical tools according to the type of evaluation
question for which they are most relevant.
Based on the evaluation objectives, the evaluation design should specify the hypotheses
to be tested or the questions to be answered. The nature of these questions or
hypotheses will suggest which statistical techniques are relevant.
Figure 2: Analytical tools grouped by type of evaluation question

Evaluation question: what is the difference between what is and what should be?
  Method of analysis: gap analysis
  Analytical tools: statistical inference; analysis of variance; hypothesis testing; multi-dimensional scaling
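By way of illustration, here is one way the gap analysis question might be put to a statistical test: a one-sample t-test of measured performance against the performance standard. The sketch below is in Python; the data, the standard value and the 5% significance level are hypothetical illustrations, not prescriptions.

```python
# Gap analysis as a hypothesis test (illustrative sketch only).
# Question: does measured performance ("what is") differ from the
# performance standard ("what should be")? All figures are hypothetical.
from scipy import stats

standard = 85.0  # the performance standard: "what should be"

# Current performance measurements, e.g. sampled from program records
measures = [78.2, 81.5, 79.9, 84.1, 77.6, 80.3, 82.8, 79.0, 81.1, 78.7]

# One-sample t-test of the mean measured performance against the standard
t_stat, p_value = stats.ttest_1samp(measures, popmean=standard)

mean_measure = sum(measures) / len(measures)
print(f"mean measure = {mean_measure:.1f}, gap = {standard - mean_measure:.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("The gap between 'what is' and 'what should be' is statistically significant.")
else:
    print("No statistically significant gap at the 5% level.")
```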
What do we measure?
All evaluation involves measurement. Figure 3 illustrates some key measurement definitions
which we will call upon later.
[Figure 3: Evaluation measure or criterion.7 The figure shows the criterion as a vertical scale with, from top to bottom, the standard, the target, the current measure and the baseline; the discrepancy is the gap between target and measure, and the change is the gap between measure and baseline.]

The criterion (that is, the performance to be measured) is depicted as the vertical scale.
The performance standard indicates the desired (or acceptable) level of the criterion
towards which the program is aiming. The target (T) is the actual performance level being
aimed at, consistent with budget constraints.

When we come to measurement, two points are critical:
- the base-line measure (B): the state of performance when the program was initiated; and
- the current measure (M): the performance at some point in time after the program has commenced.

The difference between the two measurements (M - B) is a measure of the change as
operationally defined on this criterion. The performance discrepancy is the difference
between the target and the current measure (T - M).
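These definitions reduce to simple arithmetic, but keeping the terms straight matters when results are reported. A minimal sketch, with hypothetical figures:

```python
# The Figure 3 quantities, with hypothetical values on a single criterion.
baseline = 40.0   # B: performance when the program was initiated
measure = 55.0    # M: performance at some later point in time
target = 70.0     # T: level actually aimed at, given budget constraints
standard = 80.0   # desired (or acceptable) level of the criterion

change = measure - baseline      # M - B: change achieved to date
discrepancy = target - measure   # T - M: performance discrepancy

print(f"change achieved:       {change:+.1f}")
print(f"remaining discrepancy: {discrepancy:+.1f}")
print(f"gap to standard:       {standard - measure:+.1f}")
```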
Sources of data
Availability of data is often a critical constraint in evaluation.
7. Source: A Guide to Measuring Performance and Outcomes. Children Services Program Victoria, 1985.
Rarely is much thought given to the data necessary for a comprehensive assessment of impacts
until the program is up and running. In such cases effectiveness evaluations, in particular,
must rely on special post-program collection efforts to establish a base-line datum. This is
usually costly, and it is often difficult to achieve the desired accuracy.
Future evaluation data needs, the data sources, and the mechanisms for collecting and storing
the data should be addressed during the initial planning for the program. Data sources may be
considered under five broad categories: management information systems, special collection
efforts, existing records and statistics, simulation modelling, and expert judgement.
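To make the sampling point concrete: where existing records or a management information system already hold the raw data, estimating an outcome measure can reduce to a straightforward sampling exercise. The Python sketch below, with entirely hypothetical record values, draws a simple random sample and attaches a 95% confidence interval to the estimated mean.

```python
# Simple random sampling from existing program records (illustrative only).
import random
import statistics

random.seed(42)  # reproducible illustration

# Stand-in for a population of program records, e.g. case outcome scores
records = [random.gauss(mu=60, sigma=12) for _ in range(5000)]

sample = random.sample(records, k=200)  # simple random sample, n = 200

mean = statistics.mean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se  # normal-approximation 95% CI

print(f"estimated mean outcome: {mean:.1f}")
print(f"95% confidence interval: ({ci_low:.1f}, {ci_high:.1f})")
```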