Ulrike Baumöl,
Reinhard Jung, and
Robert Winter
University of St. Gallen (Switzerland)
control over the entire holding structure. Moreover, we have to define concisely what is meant by management information and by performance indicators. Management information can be a detailed, non-aggregate figure, an aggregate figure, or a qualitative statement; all of these serve as a basis for the decision-making process. Performance indicators constitute a subset of management information with specific characteristics: they are always aggregate, and their components can be either aggregate or non-aggregate. This differentiation is important because information is lost during the aggregation process. An information system, therefore, has to support the disaggregation of such information, both to enable analyses (e.g. of deviations) and to enable the communication of performance indicator calculations from the top management level to lower organizational units (Reichmann, 1997).

Given these tasks, the reporting system has to be organized as follows:
• At the top management (holding) level, we have the highest degree of generalization and thus also need the highest level of standardization. This means that the performance indicators used at this level must be defined clearly and concisely for all the management levels below. A typical performance indicator at this level is, for example, profit. In order to calculate corporate profit, the individual profits of all business units must be calculated consistently. Otherwise, aggregating the individual figures would yield a meaningless value derived from heterogeneous definitions (e.g. a mix of gross and net profits).
• At the business unit management level, various performance indicators are needed in addition to those created for the management holding. These performance indicators have to meet less strict requirements with regard to generalization and standardization. Here, standards are needed only for the lower management levels, for the same reasons as mentioned above.
• At the lowest management level, operational management, no standards are needed as far as company-wide management is concerned. The only standard needed is the one for the individual operational unit itself.
However, all the lower management levels must nevertheless obey the standardization rules of the management holding. Thus, an information system is needed that supports the information flow between the different and often geographically distributed levels (business units), provides a meta model of performance indicator definitions, and enables the integration of performance indicators from different levels into aggregate performance indicators.

The data warehousing process

Many publications deal with data warehousing from a rather technical point of view (e.g. Bontempo and Zagelow, 1998). We consider data warehousing to be a never-ending process that aims at an efficient and effective information supply. Hence, in this paper we take the business perspective in order to describe the characteristics and limitations of data warehousing.

Characteristics of data warehousing

In today's large companies, the operational IT environment is usually the result of one or more decades of changing development paradigms and a long line of technological innovations. Therefore, the IT environment is in most cases very heterogeneous. As far as data or information supply is concerned, we face a variety of proprietary data sources, ranging from flat files and hierarchical databases to relational or even object-oriented databases; some companies have already created "legacy" data warehouses or data marts. On the one hand, these heterogeneous data sources contain transactional data from daily operations, i.e. they are mission critical. On the other hand, they are not appropriate for supporting management decision processes because they do not provide historical data, they are implemented by unsuitable means (e.g. sequential files, highly normalized tables, detailed data) and, most importantly, they lack data integration.

From a management perspective, two main reasons for the significance of data warehousing can be identified:
• Today's markets require an immediate response to new trends as regards management and control. Hence, the time needed to provide management with current, aggregate data and information becomes crucial, if not mission critical. Manual information supply by a hierarchy of specialized personnel is far too slow to meet this requirement.
• People responsible for the information supply spend huge amounts of time gathering detailed data from various sources for reports and decision support. Especially in large and decentralized companies, these processes comprise many identical steps, often gathering the same data. As a consequence, integrating the data into a single database (the core data warehouse) leads to a more efficient information supply, provided that the semantics of the data elements are unambiguously defined.

Figure 2 illustrates a data warehouse architecture which represents an abstract view of the various architectures of our research partners' data warehouses (see
Baumöl, U, Jung, R., Winter, R.: Adapting the Data Warehouse
Concept for the Management of Decentralized Heterogeneous Corporations;
Journal of Data Warehousing 5 (2000), 1, pp.35-43
acknowledgements). Other authors (e.g. Bontempo and Zagelow, 1998; Kimball, Reeves, Ross, and Thornthwaite, 1998) present quite similar architectures. The basis of every data warehouse architecture is the operational IT environment and especially its data sources. The next layer deals with the so-called ETL processes, i.e. the extraction, transformation, and loading of detailed data into the core data warehouse. In contrast to transactional databases, the core data warehouse comprises both current and historical data in order to support all kinds of analyses. Since the data within this central component of the architecture is detailed rather than aggregate, it is necessary to define specific views on the core data warehouse for the business units. If these views are materialized, i.e. if controlled redundancy is introduced, we call the set of views for one business unit a data mart. These views usually provide aggregate data and denormalized or even multidimensional data structures. The top layer of the architecture, called business intelligence, comprises end-user tools for ad hoc queries, online analytical processing (OLAP), and data mining.

Figure 2: The data warehouse architecture

As regards the characteristics of the core data warehouse, it should be mentioned that other approaches exist. Some authors favor
(1) a core data warehouse which already contains aggregate data, or
(2) so-called independent data marts (Gardner, 1998), i.e. data integration takes place in independent, small-scale data warehouses.
We did not integrate these approaches into our architecture for the following reasons. Most of our partner companies are not able to anticipate all future requirements as regards the granularity of the data. Therefore, it is advisable to preserve the degrees of freedom for future data mart projects by populating the core data warehouse with detailed data only (Kelly, 1997). Moreover, aggregate data at the core data warehouse level restricts the drill-down depth of business intelligence tools. The idea of independent data marts is also problematic, because independent data marts cannot efficiently serve top management's information requirements owing to their lack of data integration, and they may lead to complex integration problems in the long run (Chaudhuri and Dayal, 1997).

Limitations and potentials of data warehousing

The idea of an integrated data supply for management support is not new. Management Information Systems and Executive Information Systems have been discussed extensively for decades. However, some good concepts could not be implemented until powerful database management systems and business intelligence tools became available. Today, tools are available for every layer of the architecture and for data transformations between the layers. Only with regard to metadata is there still a lot of technological work to be done. Almost every component of the data warehouse architecture "produces" metadata and also needs metadata from other components (cf. Figure 2). The challenge is to integrate not only data but also metadata in order to support all user groups of the data warehouse (developers, users, administrators, etc.).

From our point of view, the most challenging tasks in the area of data warehousing are methodological ones. One of these tasks is data integration at the level of the conceptual schema, i.e. the design of the core data warehouse. The data elements as they are used in the operational IT environment represent the business units' views on data. As soon as more than one business unit takes part in a discussion about "their" definition of performance indicators or ratios, such as contribution margins and revenues (sometimes they even refuse to support the IT department), the real challenge becomes obvious. The problem becomes more difficult if different divisions or even daughter companies from completely different industries are involved. Data integration means integrating similar data elements from different sources (and, therefore, definitions of different business units!) into a common definition, either
(1) by agreeing on a suitable definition and modifying the affected operational systems, or
(2) by preserving individual definitions and integrating the data elements on a more abstract level.
The first alternative will obviously face serious resistance, because the operational systems are almost always mission critical and cannot easily be modified. The second alternative means consensus on an abstract level, which is much easier to accomplish.
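Alternative (2) above, preserving individual definitions and integrating on a more abstract level, can be sketched in a few lines of Python. The sketch assumes two hypothetical business units, A and B, whose operational systems record revenue and variable costs under different field names and definitions. Their records stay untouched; each unit only contributes a derivation rule that maps its local definition onto an agreed abstract element. All field names, figures, and the definition of contribution margin as net revenue minus variable cost are illustrative assumptions, not definitions taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, float]

# Hypothetical operational records: each business unit keeps its own field
# names and semantics (alternative 2 leaves these systems unmodified).
UNIT_A = {"gross_sales": 1200.0, "discounts": 200.0, "var_costs": 600.0}
UNIT_B = {"net_sales": 950.0, "material": 300.0, "direct_labour": 150.0}


@dataclass
class AbstractElement:
    """A common, abstract data element plus one derivation rule per unit."""
    name: str
    rules: Dict[str, Callable[[Record], float]]

    def value_for(self, unit: str, record: Record) -> float:
        # Apply the unit-specific rule that maps the local definition
        # onto the agreed abstract definition.
        return self.rules[unit](record)


NET_REVENUE = AbstractElement("net_revenue", {
    "A": lambda r: r["gross_sales"] - r["discounts"],  # A records gross sales
    "B": lambda r: r["net_sales"],                     # B already records net
})
VARIABLE_COST = AbstractElement("variable_cost", {
    "A": lambda r: r["var_costs"],
    "B": lambda r: r["material"] + r["direct_labour"],
})


def contribution_margin(unit: str, record: Record) -> float:
    # Defined once, on the abstract level, and therefore comparable across units.
    return NET_REVENUE.value_for(unit, record) - VARIABLE_COST.value_for(unit, record)


if __name__ == "__main__":
    margins = {"A": contribution_margin("A", UNIT_A),
               "B": contribution_margin("B", UNIT_B)}
    print(margins, sum(margins.values()))
```

With these figures both units report a margin (400.0 and 500.0) that rests on the same abstract definition, although neither operational schema was changed. Adding another unit means adding one rule per abstract element; only the abstract definitions themselves require consensus.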
Ideally, the primary function of a data warehouse is to provide top management with appropriate information. This goal cannot be achieved in a short-term project. Instead of building an enterprise-wide data warehouse in a big-bang approach, most companies build up their warehouse over a longer period of time by implementing smaller increments, i.e. by providing information supply to individual business units. This makes it much easier to obtain project budget approval, for example through short-term cost savings. However, this strategy leads to a warehouse that grows bottom-up. In order to meet top management requirements later in the development process, it is necessary to ensure a goal-driven design, i.e. the solution is a combined top-down/bottom-up approach.

Reference models

The basic idea of a reference model is to provide a generalized schema plus adaptation rules so that specialized schemas can be derived consistently, i.e. without violating integrity constraints. Hence, the adaptation of reference models can be compared to configuration processes, and thereby differs significantly from other information modeling paradigms such as component-based modeling (reuse of partial detailed schemas) or object-oriented modeling (inheritance of abstract schema components).

In information systems development, the use of reference models was initially discussed for the data view of integrated systems (e.g. Scheer and Hars, 1992). Subsequently, software vendors proposed reference schemas not only for data structures (e.g. SAP AG, 1994) but also for business processes (e.g. Curran, Keller, and Ladd, 1997; van Es and Post, 1996). Recently, researchers as well as software vendors have proposed reference models (e.g. Lindemann and Schmid, 1998, for electronic markets).

Various other approaches also claim to be based on reference models. For example, the Workflow Management Coalition created a workflow reference model (Workflow Management Coalition, 1996), and ISO defined the well-known OSI reference model (ISO, 1983). But while the former is so abstract that adapting it to actual workflow schemas is not an easy task, the latter is more a regulation of protocol and service definitions than a template for actual systems development. To be usable for systems development, reference models should allow for the configuration of "executable" schemas.

When available reference models are grouped by process type, another problem becomes apparent: while reference schemas for transactional business processes (e.g. order entry, materials management, financials) are widely available, informational processes seem to have evaded this kind of standardization. As a consequence, executive information systems are usually implemented as individual software, while for transactional systems, standardized business packages are used whenever they are available and adaptable to specific needs.

This may be because management processes are automatically regarded as core competencies of any company, so that standardization would be more harmful than helpful. If properly designed, however, reference models do not restrict systems development to some predefined standard. Instead, the model integrates multiple perspectives that provide a framework for individual adaptation, thereby preserving some basic integrity constraints and basing systems development on a common terminological and architectural foundation.

For informational processes, the most important component of a reference model is a complete data view, i.e. a data schema that implies all potential aggregations, thereby allowing consistent schema clustering and refinement operations to be applied (Winter, 1996). If only certain validated aggregation rules have been used to derive aggregate schemas, schema refinements can also be formally validated. Since adaptations cannot violate overall integrity in such an environment, the predominant problem of disaggregation (as discussed e.g. in Ritzmann et al., 1979) is solved. Based on an abstract "reference" schema and a set of adaptation rules, specialized schemas can be derived without having to guarantee consistency by special disaggregation procedures (e.g. Bitran and Hax, 1977).

As a consequence, management middleware should be based on an abstract schema that is derived from the detailed data view by selected, integrity-preserving aggregation procedures.

Management middleware

The management middleware is an adapted data warehouse architecture designed to enable easy integration of additional companies into a management holding as regards the supply of management information.

In the following, we describe both the technological and the business view of the management middleware concept. Furthermore, we present some means that may help to implement the concept as regards organizational integration.

Technological view

Today's data warehouse architectures are designed as read-only systems, i.e. data and information flow from the operational systems to the core data warehouse and onwards to management or analytical systems (upstream). In order to serve as management middleware, the warehouse architecture has to be adapted. It is especially important that both the management holding and the daughter companies are able to push data from their individual data marts into the core data warehouse (downstream). Furthermore, the data warehouse must contain not only historical data but also forecasts and plans. Figure 3 depicts an adapted data warehouse architecture serving as management middleware. The daughter company, for example, uses its actual data to calculate the actual ROI and the planned ROIs for subsequent periods and pushes the resulting values into the warehouse, where they are later analyzed by the management holding. The management holding then communicates ROI targets to all its daughter companies by writing the values into the warehouse.

From a technological point of view, it is appropriate to implement all front-end tools of the management middleware as intranet-based web applications in order to enable a seamless software introduction, especially in newly acquired companies.

The basic concept of management middleware is only feasible if data integration within the core data warehouse is achieved as regards management data. Therefore, we advocate using a hierarchy of reference models of performance indicators which enables a seamless integration of the management information of a certain level into the reporting schema of the next higher level, because management information at each higher level needs to fulfill a higher degree of generalization. Figure 4 depicts a schematic hierarchy of reference models. Once available, these models enable data integration among heterogeneous companies either through performance indicator consolidation within the same level or through consolidation on the next higher level, i.e. through generalization. As a consequence, upstream data flow will be possible. In order to facilitate downstream data flow as well, either disaggregation mechanisms or agreements have to be established.

As far as time series analyses are concerned, changing aggregation structures, which usually accompany changing company structures, impose some requirements on the way management information is generated and presented:
(1) Generation: Comparability of, for example, area-related total turnover figures requires applying either the former or the current structure to the complete time series.
(2) Presentation: The front-end tools should be able to indicate that the aggregation structure has changed within the period under consideration and which structure was actually applied.
As far as the generation of aggregated figures is concerned, an obvious approach is to deploy related metadata. Chamoni and Stock suggested using matrices to assign detailed data to aggregated figures (Chamoni and Stock, 1999). The cells contain the periods of time in which the assignment is valid.

Each merger or acquisition evokes a certain degree of resistance on the part of the employees. As a consequence, the integration of the management systems as well as the information systems is affected by this resistance. For example, people fear losing control over certain data and therefore their influence on the decision process. Thus, two steps are mandatory:
(1) communicating the benefits of an integrated information architecture through "roadshows" and a thorough communication policy;
(2) establishing an incentive system in order to achieve both the pushing of data into the system and the pulling of data from the system.
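The write-back scenario described in the technological view, in which a daughter company pushes actual and planned ROI values into the warehouse and the management holding writes ROI targets back, can be sketched with a minimal in-memory store. The store class, its key layout, the company names, and the figures are all hypothetical. The sketch further assumes that ROI is defined as profit divided by invested capital and that subsidiaries push the components alongside the ratio, so that a consolidated ROI is computed from summed components rather than by averaging the subsidiaries' ratios; this mirrors the requirement that aggregate figures be derived from consistently calculated individual figures.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# Hypothetical key layout: (company, period, indicator, scenario), where the
# scenario is "actual", "plan" or "target".
Key = Tuple[str, int, str, str]


@dataclass
class CoreWarehouse:
    """Minimal in-memory stand-in for a core data warehouse that, unlike a
    read-only warehouse, also accepts values written by its users."""
    facts: Dict[Key, float] = field(default_factory=dict)

    def push(self, company: str, period: int, indicator: str,
             scenario: str, value: float) -> None:
        self.facts[(company, period, indicator, scenario)] = value

    def get(self, company: str, period: int, indicator: str,
            scenario: str) -> float:
        return self.facts[(company, period, indicator, scenario)]


def roi(profit: float, capital: float) -> float:
    # Assumed definition: return on investment = profit / invested capital.
    return profit / capital


def consolidated_roi(dw: CoreWarehouse, companies: Iterable[str],
                     period: int, scenario: str) -> float:
    # Sum the components first and only then form the ratio; averaging the
    # subsidiaries' ROIs would ignore differences in invested capital.
    members = list(companies)
    profit = sum(dw.get(c, period, "profit", scenario) for c in members)
    capital = sum(dw.get(c, period, "capital", scenario) for c in members)
    return roi(profit, capital)


if __name__ == "__main__":
    dw = CoreWarehouse()
    # Daughter companies push actual components and their own ROI ...
    for company, profit, capital in [("sub_a", 10.0, 100.0), ("sub_b", 30.0, 200.0)]:
        dw.push(company, 2000, "profit", "actual", profit)
        dw.push(company, 2000, "capital", "actual", capital)
        dw.push(company, 2000, "roi", "actual", roi(profit, capital))
    # ... and planned values for a subsequent period.
    dw.push("sub_a", 2001, "profit", "plan", 12.0)
    dw.push("sub_a", 2001, "capital", "plan", 100.0)

    print(round(consolidated_roi(dw, ["sub_a", "sub_b"], 2000, "actual"), 4))
    # The management holding writes an ROI target back into the warehouse.
    dw.push("sub_a", 2001, "roi", "target", 0.12)
```

With these figures the consolidated ROI is 40 / 300, about 0.133, whereas the unweighted mean of the two subsidiaries' ROIs (0.10 and 0.15) would be 0.125. This is why the sketch keeps the components, not just the ratios, in the shared store.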
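The assignment matrices of Chamoni and Stock (1999) discussed above, together with the two time-series requirements on generation and presentation, can be approximated as follows. This is a sketch under stated assumptions, not their implementation: the matrix is stored sparsely as a dictionary from (detail unit, aggregate) pairs to validity periods, periods are whole years, and the office and region names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical sparse assignment matrix: rows are detail units, columns are
# aggregates, and each cell holds the (first, last) period in which the
# assignment is valid; None as the last period means "still valid".
Validity = Tuple[int, Optional[int]]
ASSIGNMENT: Dict[Tuple[str, str], Validity] = {
    ("office_zurich", "region_east"): (1997, 1998),
    ("office_zurich", "region_central"): (1999, None),
    ("office_bern", "region_central"): (1997, None),
}


def is_valid(period: int, validity: Validity) -> bool:
    first, last = validity
    return first <= period and (last is None or period <= last)


def parent_of(unit: str, period: int) -> str:
    # Look up the aggregate to which a detail unit is assigned in a period.
    for (detail, aggregate), validity in ASSIGNMENT.items():
        if detail == unit and is_valid(period, validity):
            return aggregate
    raise KeyError(f"no valid assignment for {unit} in {period}")


def area_totals(turnover: Dict[Tuple[str, int], float],
                structure_period: Optional[int] = None) -> Dict[Tuple[str, int], float]:
    """Aggregate detail turnover per (area, period).

    With structure_period=None each period uses the structure valid at that
    time; otherwise the structure of structure_period is applied to the
    complete time series, which keeps the totals comparable (generation)."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for (unit, period), value in turnover.items():
        ref = period if structure_period is None else structure_period
        totals[(parent_of(unit, ref), period)] += value
    return dict(totals)


def structure_changed(units: Iterable[str], periods: Iterable[int]) -> bool:
    """Presentation hint: did any unit change its aggregate in these periods?"""
    period_list = list(periods)
    return any(len({parent_of(u, p) for p in period_list}) > 1 for u in units)


if __name__ == "__main__":
    turnover = {("office_zurich", 1998): 100.0, ("office_zurich", 1999): 120.0,
                ("office_bern", 1998): 50.0, ("office_bern", 1999): 60.0}
    print(area_totals(turnover))        # structure valid in each period
    print(area_totals(turnover, 1999))  # current structure on the whole series
    print(structure_changed(["office_zurich", "office_bern"], [1998, 1999]))
```

Passing the most recent period as `structure_period` restates the history in the current structure, while passing an earlier period keeps the former structure; these are the two options named under (1) Generation. `structure_changed` supplies the flag that a front-end tool would need for (2) Presentation.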