
The Planning of Data Editing and IT Support Requirements
Elmar WEIN
Federal Statistical Office
Gustav-Stresemann-Ring 11
65189 Wiesbaden
Germany
e-mail: Elmar.Wein@statistik-bund.de

Abstract: A process management approach combined with project
management techniques seems to be suitable for the planning of data editing.
IT tools for the planning of data editing should support planning activities like
the collection and judgement of relevant information and overall and detailed
planning. An intensive use of metadata should help to reduce the planning
effort. In addition it opens new prospects particularly for the analysis of edit
specifications and the estimation of the time effort. This may improve the
quality of plans and provide better bases for the management of data editing
processes.

Keywords: project management techniques, process management, survey
preparation, planning of data editing, metadata

1. Introduction
Increasing user demands for statistical results and continuing budget cuts create a
permanent need to improve survey activities. The efficiency of survey activities can
be profoundly influenced by planning activities. While a lot of research has been done on
the performance of data editing, research on the planning of data editing activities and
on the requirements for adequate information technology (IT) support is still missing, in
spite of the progress which has been achieved in the areas of management techniques and IT.

The aims of this contribution are therefore to provide:

- an overview of the planning of data editing and
- reflections on the changing subject-matter requirements for adequate IT support.

The contents of the following paragraphs have not been tested in surveys; however, they are
influenced by the discussions of a German task force which is developing guidelines for
data editing.

2. Overview of the Planning of Data Editing


The planning of data editing is part of the primary process "survey preparation". It aims
at developing a consistent sequence of data editing processes, based on the existing
resources, to improve data quality, which users express in terms of "timeliness",
"accuracy" and "clarity, accessibility". It needs information from other processes, like
the specification of the requirements for statistical results and questionnaire design,
and methodological knowledge which is available in data editing guidelines. Information
created by the planning of data editing, i.e. the specification of checks, is needed for
electronic data processing. Documentation of the plans and specifications is needed for
managing and optimising data editing processes.

The planning of data editing consists of various stages as illustrated in figure 1:

Figure 1: The Flow of the Planning of Data Editing

[Figure: stages of planning, accompanied throughout by documentation: Collection and
Judgement of Relevant Information; Overall Planning; Detailed Planning (Specification of
Data Editing Activities; Work, Personnel, Time and Milestones; Equipment and Materials;
Costs and Investments); Description of Risks; Final Check]

The planning of data editing starts with the collection and judgement of relevant
information. Specialists should gain knowledge about the main tasks and relevant
conditions of the data editing to be planned. In addition an adequate documentation should
make inconsistencies in judgements obvious and enable colleagues to bring in their
experience.

On the basis of the evaluated information, a consistent data editing strategy is developed
by a top-down approach in the context of overall planning. It divides data editing into
processes, sets preconditions, for instance concerning data editing methods, process
duration and costs, and is documented in a structural plan. A data editing process contains
the checking, imputation or modification of statistical data as a result of logically
connected activities. Data editing processes are designed in such a way as to contribute to
the dissemination of statistical results by short runtimes, low consumption of resources and
user-friendly documentation. They possess the strictly necessary interfaces to other
survey processes, and complex data editing processes can be divided into logically
separated sub-processes. A process owner, who needs information, methodological and
subject-matter knowledge and adequate equipment, is defined for every data editing process.

The ensuing detailed planning should finally lead to consistent process descriptions and
should be performed in a bottom-up approach accompanied by reviews to promote internal
monitoring. It begins with the specification of data editing activities like checks which form
the basis for the detailed planning of work, time, personnel, equipment, costs and
investments. These procedures are terminated when a balance is achieved between the
"demand for statistical results", "resources", "survey organisation", "available time" and
"data editing effort", provided the different detailed plans are consistent.

After detailed planning, the description of risks should indicate the conditions which may
cause the planned conduct of data editing to fail, and it should contain possible
countermeasures. During the final check, as a last step, inconsistencies within and between
the different detailed plans should be detected. Important documents of detailed planning
are the specification of data editing activities, work manuals, the time and milestone
table, cost and investment plans and the description of risks.

3. Subject-Matter Demand on an IT Support


3.1 Overview

It is assumed that specialists have access to an office-wide database which permits
multiple use of metadata, i.e.:
· Users' demands for statistical results.
· The survey contents in terms of question and answer texts, their sequence and the
necessary record descriptions.
· Basic information about a survey, i.e. name of the survey, periodicity, participating
statistical offices, sample size.

IT tools for the planning of data editing use this information, continue to process it, store
additional information in a database when it occurs and support relations between different
metadata. Several IT modules are necessary to support the planning of data editing:
· The collection and judgement of relevant information requires the documentation of
requests for the statistical data needed and the display of relevant questions concerning
the complexity of survey contents, measurement techniques, expected errors, pre-
information, knowledge, existing data editing specifications, available resources and
expected positive and negative influences on data editing. Other main functions of an
IT tool are the documentation of judgements and the provision of information for
subsequent planning activities.
· A tool for overall planning should support the development of a structure of data
editing and document it by a structural plan. The tool should offer complete sequences
of data editing processes for typical survey types, i.e. primary and secondary statistics
and data collection techniques. Specialists should be able to choose the most
appropriate data editing strategy and to adapt it to the specific requirements of their
surveys. The tool should check the consistency of the structure and give access to the
information required for all subsequent planning activities.
· A tool for the specification of data editing activities should support the specification
of checks and other data editing methods and also the description of relevant computer
assisted activities. Specified checks should be analysed and linked to data editing
processes. The specifications are used for software development and subsequent
planning activities.
· The planning of work, personnel assignment, time scheduling, resources, materials,
costs and investments should be supported in the sequence mentioned. The main
requirement for IT support is the use of links between the different plans for the
documentation of changes, the access to and use of existing plans and their analysis.
Important outputs include documents of the respective plans and the provision of data
for comparisons with real data for managing data editing processes.

3.2 Specification of Checks

The integration of an IT tool for the specification of checks into an office-wide metadata
system and the proposed planning activities cause some modifications of traditional
statistical work. First, the object-oriented software development approach should be used
to integrate permitted lower and upper bounds in datafield objects which are able to
perform range checks. Due to this combination, the multiple use of datafield objects brings
about a multiple use of range checks and limits the amount of check specification needed.
Check specifications should be arranged together with datafield objects in blocks
reflecting parts of questionnaires to facilitate their multiple use. In addition, an IT
tool should promote a structure of checks referring to completeness, wholeness, structure
plausibility and interplausibility, and enable views and navigation based on datafields,
blocks or types of checks.
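
The idea of datafield objects that carry their own bounds can be illustrated with a
minimal Python sketch. The class names `DataField` and `Block`, the method names and the
example fields are hypothetical and serve only as illustration; no particular design or
programming language is prescribed here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataField:
    """A datafield that carries its own permitted bounds (hypothetical design)."""
    name: str
    lower: Optional[float] = None  # None means "no lower bound"
    upper: Optional[float] = None  # None means "no upper bound"

    def range_check(self, value: float) -> bool:
        """Return True if value lies within the permitted bounds (inclusive)."""
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


@dataclass
class Block:
    """A reusable group of datafields reflecting one part of a questionnaire."""
    name: str
    fields: list[DataField] = field(default_factory=list)

    def failed_range_checks(self, record: dict[str, float]) -> list[str]:
        """Names of datafields present in the record whose values are out of range."""
        return [f.name for f in self.fields
                if f.name in record and not f.range_check(record[f.name])]


# Illustrative datafields; every survey that reuses the block reuses the range checks.
AGE = DataField("age", lower=0, upper=120)
HOURS = DataField("weekly_hours", lower=0, upper=168)
PERSON_BLOCK = Block("person", [AGE, HOURS])
```

Because the bounds live in the datafield object, a survey that reuses `PERSON_BLOCK`
obtains the range checks without specifying them again; only combination checks would
still have to be specified per survey.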

Besides permanent syntax checking, an IT tool should analyse specifications and
document datafields not involved in combination checks. Check specifications with manual
instructions for corrections decisively determine the work effort. To provide
information about this effort, an IT tool should analyse checks and calculate informative
benchmarks, which should facilitate comparisons between "similar" surveys. As not all
effects can be explained by benchmarks, rather homogeneous groups of surveys should be
created. They may be classified by periodicity, type of data collection and the attribute
characteristic relation $AC$, which defines the relation between the sum of all attributes
$y_s$ and the sum of all characteristics $x_s$. Benchmarks should make differences between
the checks of various surveys apparent. A top-level benchmark for the analysis of checks
may be the specification scope $SC$, which is defined as the ratio between the sum of all
checks $z_s$ and the sum of all characteristics $x_s$ (including characteristics needed
from other surveys and computed characteristics, excluding information given on open-ended
questions):

\[
SC =
\begin{cases}
\dfrac{z_s}{x_s} & ;\ x_s > 0 \\
0 & ;\ \text{else}
\end{cases}
\]
The sum of all attributes may be used instead of the sum of all characteristics if this
benchmark is not informative enough. The specification scope should be explained by
further benchmarks like the share of manual checks and interplausibility, which expresses
relations between characteristics.
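
The benchmarks described above can be sketched in a few lines of Python. The function
names are hypothetical. Two points are assumptions of this sketch rather than statements
of the text: that the attribute characteristic relation is the ratio of attributes to
characteristics, and that the share of manual checks is the number of manual checks
divided by the number of all checks. A combination check is taken here to be a check that
involves at least two datafields.

```python
def ratio_or_zero(numerator: float, denominator: float) -> float:
    """Ratio with the convention used for SC: 0 if the denominator is not positive."""
    return numerator / denominator if denominator > 0 else 0.0


def specification_scope(n_checks: int, n_characteristics: int) -> float:
    """SC = z_s / x_s if x_s > 0, else 0."""
    return ratio_or_zero(n_checks, n_characteristics)


def attribute_characteristic_relation(n_attributes: int, n_characteristics: int) -> float:
    """AC, assumed here to be y_s / x_s."""
    return ratio_or_zero(n_attributes, n_characteristics)


def share_of_manual_checks(n_manual_checks: int, n_checks: int) -> float:
    """Assumed definition: manual checks divided by all checks."""
    return ratio_or_zero(n_manual_checks, n_checks)


def fields_without_combination_checks(all_fields: set[str],
                                      checks: dict[str, set[str]]) -> set[str]:
    """Datafields that appear in no check involving two or more datafields."""
    covered: set[str] = set()
    for involved in checks.values():
        if len(involved) >= 2:
            covered |= involved
    return all_fields - covered


# Illustrative figures only.
checks = {
    "age_range": {"age"},
    "age_vs_school": {"age", "school_type"},
}
unused = fields_without_combination_checks({"age", "school_type", "income"}, checks)
```

In this example `unused` contains only `"income"`, which is the kind of list an IT tool
could document for the specialist.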

3.3 IT Support for Personnel Assignment and Time Scheduling


Based on check specifications the planning of work for manual activities and the estimation
of process time efforts follow. Time efforts can be estimated for data editing sub-processes
like optical character recognition (OCR), data capture, coding and computer assisted error
detection and correction. The estimation of the process effort requires a lot of information
as demonstrated for data capture:

The estimation of the time effort for data capture $\hat{t}_D$ is based on the average time
needed for capturing an attribute $i$ of a characteristic, $\hat{t}_{Di}$. It is determined
by the length of the attributes, their number and the speed of data capture. If more than
one attribute is captured, it is necessary to weight them with their probabilities of
occurrence $\hat{p}_{Di}$ and to add the weighted values. If the occurrence of a
characteristic $c$ depends on routings, the respective probability $\hat{p}_c$ shall be
considered. Based on an estimation of the average time needed for checking the captured
data per questionnaire $\hat{t}_Q$ and the expected number of questionnaires for data
capture $\hat{n}_D$, the basic time effort can be estimated. It should generally be
increased by the time needed for nonspecific activities, like personal absence. This is
done by a factor of 15 percent, so that the time effort for data capture $\hat{t}_D$ can
finally be estimated as follows:
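\[
\hat{t}_D = 1.15 \times \hat{n}_D \times
\left( \left( \sum_{c=1}^{C} \hat{p}_c \times \sum_{i=1}^{I_c} \hat{p}_{Di} \times \hat{t}_{Di} \right) + \hat{t}_Q \right)
\]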
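
A small Python sketch may help to read the formula. It assumes that the inner sum runs
over the attributes of each characteristic $c$ and is weighted by that characteristic's
probability; the class and function names, the time unit (seconds) and all figures are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    p: float  # estimated probability of occurrence of the attribute (p_Di)
    t: float  # estimated average capture time of the attribute in seconds (t_Di)


@dataclass
class Characteristic:
    p: float                     # probability that routing reaches the characteristic (p_c)
    attributes: list[Attribute]  # the I_c attributes of this characteristic


def capture_time(n_questionnaires: float,
                 characteristics: list[Characteristic],
                 t_check: float,
                 overhead: float = 0.15) -> float:
    """Estimated time effort for data capture, including the nonspecific overhead."""
    # Expected capture time per questionnaire: weighted sum over characteristics
    # and, within each characteristic, over its attributes.
    per_questionnaire = sum(
        c.p * sum(a.p * a.t for a in c.attributes) for c in characteristics
    )
    return (1 + overhead) * n_questionnaires * (per_questionnaire + t_check)


# Illustrative figures: one characteristic that always occurs and one that is
# reached by half of the respondents and has two possible attributes.
example = [
    Characteristic(p=1.0, attributes=[Attribute(p=1.0, t=4.0)]),
    Characteristic(p=0.5, attributes=[Attribute(p=0.6, t=2.0), Attribute(p=0.4, t=6.0)]),
]
total_seconds = capture_time(1000, example, t_check=4.2)
```

With these figures the expected capture time per questionnaire is 5.8 seconds, the
checking time is 4.2 seconds, and the estimate for 1000 questionnaires is roughly
11,500 seconds, i.e. a little over three hours.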

The main advantage of this procedure is the efficient use of existing metadata of the survey
contents and assumed sample size. In addition the formula should also be used for the
calculation of real process duration on the basis of real numbers to obtain better
information for the management of data editing processes.

An IT tool for the estimation of process efforts should use the information provided by
check specifications. Estimated time efforts are determined by amounts – like assumed
number of records, characteristics and errors – and the time needed for work. Estimations
are needed for these basic variables to find out the reasons for deviations during the
performance of data editing. Especially in the case of voluminous survey contents it will
be hard to estimate the factors mentioned for every characteristic or check. An IT tool
should therefore facilitate estimations for shortest and longest questionnaire routes by
user-friendly interfaces and should break overall estimations down to the level of checks and
characteristics. In addition an IT tool should display control variables, like the average
duration for different types of checks or characteristics, which enables a judgement of the
estimated process efforts.
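
Why separate estimates of the amounts and of the time needed per unit help to explain
deviations can be shown with a simple decomposition. The decomposition used in this
Python sketch (an amount effect valued at the planned time per unit plus a time effect
valued at the actual amount) is an assumption of the sketch and not a method given in the
text; the function name and the figures are hypothetical.

```python
def effort_deviation(planned_amount: float, planned_time_per_unit: float,
                     actual_amount: float, actual_time_per_unit: float) -> dict[str, float]:
    """Split the deviation between planned and actual effort into two effects."""
    planned = planned_amount * planned_time_per_unit
    actual = actual_amount * actual_time_per_unit
    # Effect of handling more or fewer units than planned, at the planned speed.
    amount_effect = (actual_amount - planned_amount) * planned_time_per_unit
    # Effect of working faster or slower than planned, for the actual number of units.
    time_effect = actual_amount * (actual_time_per_unit - planned_time_per_unit)
    return {
        "total": actual - planned,
        "amount_effect": amount_effect,
        "time_effect": time_effect,
    }


# Illustrative figures: 2000 errors at 3 minutes each were planned,
# 2600 errors at 2.5 minutes each were observed.
result = effort_deviation(2000, 3.0, 2600, 2.5)
```

In this example the total deviation of 500 minutes results from 1800 additional minutes
caused by more errors than expected and 1300 minutes saved by faster corrections.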

The next step towards the estimation of process durations is the assignment of personnel
to data editing processes. Relevant information in this context is the availability of a person
expressed by a starting and a finishing date and the individual work capacity per week - full
time or part time. On the basis of this additional information and the calculated time effort
an IT tool should calculate process durations. In the case of conflicts with given deadlines
an adaptation of captured data and recalculations should be possible – down to specified
checks, if necessary.
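
The calculation of a process duration from the time effort and the availability of
personnel can be sketched as follows. The sketch assumes a five-day working week, ignores
public holidays and leave, and spreads each person's weekly capacity evenly over the
working days; the names `Person` and `finishing_date` and all figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Person:
    name: str
    available_from: date   # starting date of availability
    available_to: date     # finishing date of availability
    hours_per_week: float  # e.g. 40 for full time, 20 for part time


def finishing_date(effort_hours: float, start: date,
                   staff: list[Person]) -> Optional[date]:
    """Day on which the effort is covered, or None if the staff cannot cover it."""
    if effort_hours <= 0:
        return start
    if not staff:
        return None
    last_day = max(p.available_to for p in staff)
    remaining = effort_hours
    day = start
    while day <= last_day:
        if day.weekday() < 5:  # Monday to Friday
            remaining -= sum(p.hours_per_week / 5 for p in staff
                             if p.available_from <= day <= p.available_to)
            if remaining <= 1e-9:
                return day
        day += timedelta(days=1)
    return None


# Illustrative figures: one full-time and one part-time person, 120 hours of effort.
STAFF = [
    Person("A", date(2024, 1, 1), date(2024, 3, 31), 40),
    Person("B", date(2024, 1, 1), date(2024, 3, 31), 20),
]
end = finishing_date(120, date(2024, 1, 1), STAFF)
deadline = date(2024, 1, 10)
conflict = end is None or end > deadline
```

Here the two persons provide 12 hours per working day, so the process ends on
12 January 2024 and conflicts with the assumed deadline of 10 January; the planner would
then adapt the entered data (staff, effort or deadline) and recalculate.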

Finally, time scheduling should discover and document critical paths, i.e. sequences of
dependent data editing processes that determine the overall duration. These steps require
the setting of dependencies between data editing processes. Based on this additional and
the existing information, an IT tool should compute the earliest and latest starting dates
of data editing processes and document the results by lists, informative Gantt charts or,
if necessary, network plans.
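
The computation of earliest and latest starts can be illustrated with the standard
critical path method (a forward and a backward pass over the dependency graph). This is a
minimal Python sketch under stated assumptions: durations are given in working days,
starts are expressed as offsets from the beginning of data editing rather than as
calendar dates, and the process names and durations are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(durations: dict[str, int], predecessors: dict[str, list[str]]):
    """Return earliest starts, latest starts and the critical processes."""
    graph = {p: predecessors.get(p, []) for p in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: a process can start when all its predecessors have finished.
    earliest: dict[str, int] = {}
    for p in order:
        earliest[p] = max((earliest[q] + durations[q] for q in graph[p]), default=0)
    project_end = max(earliest[p] + durations[p] for p in order)

    # Backward pass: a process must finish before its successors' latest starts.
    successors: dict[str, list[str]] = {p: [] for p in durations}
    for p, preds in graph.items():
        for q in preds:
            successors[q].append(p)
    latest: dict[str, int] = {}
    for p in reversed(order):
        latest_finish = min((latest[s] for s in successors[p]), default=project_end)
        latest[p] = latest_finish - durations[p]

    # Processes without slack form the critical path(s).
    critical = [p for p in order if earliest[p] == latest[p]]
    return earliest, latest, critical


# Illustrative plan: coding and error detection both follow data capture,
# correction follows both of them.
durations = {"data_capture": 10, "coding": 4, "error_detection": 6, "correction": 5}
predecessors = {
    "coding": ["data_capture"],
    "error_detection": ["data_capture"],
    "correction": ["coding", "error_detection"],
}
earliest, latest, critical = schedule(durations, predecessors)
```

In this example coding may start between day 10 and day 12 without delaying the end of
data editing, whereas data capture, error detection and correction have no slack and
form the critical path.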

The data of this planning step form the basis for cost calculations. In addition they are
needed for comparisons with real data resulting from data editing and they contribute to
discovering the reasons for deviations between the plans and the performance.

4. Conclusions
The complete view of data editing caused by the process management approach and the use
of project management techniques may improve data quality and the efficiency of survey
processing. Research on that specific subject area is rare though necessary to ensure a good
quality of processes. Research should be strengthened in the area of analysis of metadata
and survey specific planning techniques.

The efficient use of metadata and the progress made in the area of software development
create new and more powerful opportunities which should be used for the development and
judgement of data editing plans. They also provide better preconditions for the
management of data editing processes and therefore are of great relevance to the
improvement of data quality.

IT tools for the planning of data editing should restrict the planning effort by multiple
use of available metadata and support relations between different plans. In addition they
should promote a harmonisation of survey contents with positive effects on software
development.

Powerful IT tools for the planning of data editing require an enhanced use of IT equipment
and set higher demands on scarce IT personnel. Modern software consists of components,
which offer various services. Against this background modern project management
applications are no longer "closed shops". It should be investigated how they can be
adapted to meet the needs of the planning of data editing.

