Pallas
Lidwine van As
Wouter van Aerle
October 2004
Table of Contents:
1. Introduction ______________________________________________________________ 3
1.1 Work Method and Organisation of Documents ____________________________________ 3
1.2 Explaining the Choices _______________________________________________________ 3
1.3 Reference___________________________________________________________________ 4
1.4 About the Authors ___________________________________________________________ 4
2. Rationale _________________________________________________________________ 5
2.1 Defining the Problem _________________________________________________________ 5
2.2 Solution ____________________________________________________________________ 5
3. A Bird’s-eye View of Pallas __________________________________________________ 6
4. Functional Capability_______________________________________________________ 7
4.1 Point of Departure ___________________________________________________________ 7
4.2 Process Orientation __________________________________________________________ 8
4.3 Quality of the Data ___________________________________________________________ 9
4.4 Performance Capability for Reporting _________________________________________ 10
4.5 Other Performance Capabilities _______________________________________________ 10
5. Technology ______________________________________________________________ 12
5.1 Infrastructure ______________________________________________________________ 12
5.2 Architecture _______________________________________________________________ 13
5.2.1 Overall Architecture _____________________________________________________________ 13
5.2.2 Transaction Repository ___________________________________________________________ 14
5.2.3 Data Marts_____________________________________________________________________ 15
5.3 Data Staging Areas (DSA) ____________________________________________________ 16
5.3.1 Processing _____________________________________________________________________ 17
5.4 Interfaces and Interaction with Other Systems___________________________________ 18
6. Approach________________________________________________________________ 20
6.1 Organisational Measures_____________________________________________________ 20
6.1.1 Current Services ________________________________________________________________ 20
6.1.2 Metadata ______________________________________________________________________ 20
6.1.3 New Services __________________________________________________________________ 21
6.2 Methods, Techniques and Documentation_______________________________________ 21
7. Costs and Benefits ________________________________________________________ 22
7.1 General Situation ___________________________________________________________ 22
7.2 Costs _____________________________________________________________________ 23
7.3 Benefits ___________________________________________________________________ 23
7.4 Future Prospects ___________________________________________________________ 24
In compliance with IEEE 1471, the primary stakeholders of the data warehouse and their
concerns have been described. These concerns are then translated into, and recapitulated
under, four angles of approach:
The chapter on ‘Rationale’ explains the reason for the system’s existence; after that the
chapter on ‘A Bird’s-eye View of Pallas’ provides an overall description of the system.
Then the chapters on ‘Functional Capability’, ‘Technology’, ‘Approach’ and ‘Costs and
Benefits’ elaborate further from the various angles of approach.
Considering that the original architectural design for Pallas, written in 2000, filled
73 pages, the present document is necessarily brief. Not all subjects can be treated
in the same depth, so a selection had to be made from the available information.
The most important principle applied here is that the unique was preferred over the
trivial. For that reason, little attention is paid to the privacy and security aspects
of the data warehouse, since no measures have been taken in that area that differ
essentially from what one may expect of any system of this type.
[1] "The Data Warehouse Life Cycle Toolkit", Ralph Kimball et al., Wiley, 1998,
ISBN 0-471-25547-5
[2] Kimball Design Tip #59: "The Surprising Value of Data Profiling", September 2004
[3] "Corporate Information Factory, 2nd edition", W.H. Inmon et al., Wiley, 2000,
ISBN 0-471-39961-2
In 1999 it was established that the quality of the management information then provided at
Albert Heijn was far from satisfactory. The situation was characterised by poor
controllability, incompleteness, contradictory figures, delayed availability and the lack of
a single, integrated way of accessing the sources of information.
That is why the Pallas project was launched in 1999 with the following objective:
“To create a solution extending over the whole of Albert Heijn for the problem relating to
the provision of information within the organisation, to raise this to a higher level and to
lay the foundation for further expansion”.
An additional objective derived from this was to add value by protecting and supporting
the complete value chain and by facilitating comprehensive use of the available
information. In addition, the new solution had to put an end to the Babel-like confusion
that resulted from using several types of information environments side by side.
Establishing one central source of historical data would guarantee a common frame of
reference for information and business definitions: "one copy of the truth".
A final objective was to derive cost advantages by reusing the environment for other
Ahold operating companies.
2.2 Solution
The following starting points and supplementary conditions for developing the
environment were formulated to increase the chances that the new environment would
be successful:
• The data warehouse was to be business-driven: the information that the business
needs was the primary driver for developing and expanding the data warehouse.
• A pragmatic approach was to be used without losing sight of the theory.
One way in which this was done was to seek a connection with already
existing best practices in the field of data warehousing.
• Think big, act small: The final objective is an enterprise-wide data warehouse,
but it is an enormous chore to get this set up all at once. Instead of doing that, the
first step was to lay the foundation (the architecture), and then build in small
increments.
• Proven technology was to be the preferred choice.
Pallas has a delay of a day when compared to the source systems. The source information
is supplied every night and is loaded and processed in the TR and the data marts so that
they are available the next day for reporting and analysis. Every week approximately 600
million records are loaded into the data warehouse. The feeding systems will gradually
switch over to supplying data in real time, making the data warehouse's latency ever
shorter. In the meantime, reports are already available that can display the turnover as
of ten minutes ago.
After the foundation for the architecture was laid in 2000, Pallas was developed and
elaborated incrementally. Although the enlargements were triggered in the first place by,
and were implemented for the benefit of, the solutions to specific business issues, the idea
of integration always lay at the basis of the work: the added information was to be
reintegrated into the information already available. In this way it would be possible to
procure extra insight into the underlying business processes. The kinds of information
now available extend over nearly all elements of the retail value chain, from the
distribution centre (DC) deliveries to check-out counter transactions. The information is
used to support nearly all of the business processes in the company. Thanks to the rapid
availability of the information and the high degree of detail, the data warehouse can
provide both management and operational reports.
Since the start of the project, the Pallas user group has grown steadily. Every week,
1800 individual users at all levels, from the board of directors to the shop floor and the DC
employees, use the reports and the analysis environments.
As can be seen on this display sheet, all end-user functional capability has been put into
operation. The choice was made to achieve each type of functional capability (reporting,
analysis, and the like) with standard front-end tools that could offer both out-of-the-box
reporting capability as well as modules for constructing one’s own specific types of
reports (such as score cards). This approach determines a priori what is and what is not
functionally possible. After all, the available functional components and modalities of the
standard tools determine the way in which data can be reported and made available. As
well as this, the classification is a good means of communication when co-ordinating
information needs with users and the way in which these needs will be made operational.
Each type of business (merchandising, fulfilment, replenishment, etc) has its own unique
information requirements. This has been anticipated in the architectural options made at
the TR and data mart levels. Relevant choices in this context include such items as the
application of conformed dimensions and taking “one data mart per business process” as
a starting point. (In some cases, applying this principle meant that exactly the same data
had to be included in several different data marts. Take, for instance, the example of sales
information that is required for the management of several different primary processes
such as store operation, merchandising, and the like.) The tooling supports the
creation of several different reporting environments that can all be accessed via one
portal. Thanks to this, a dedicated reporting environment, fed with its own underlying
data, could be created for each supported process or focus area as a component of the
chosen tool.
The following bus matrix [1] gives an impression of the primary processes that Pallas
opens up and supports, and the data mart dimensions that are related to this.
Figure 2: Bus matrix: achieved sourcing and support with regard to primary processes
(rows: Format mgmt, Merchandising, Replenishment, Fulfilment store, Fulfilment logistics).
Formal and navigation characteristics have been standardised across the various reporting
environments, giving each report and each environment the same look and feel. This
affords the user a uniform and thus "calm" presentation, and also makes it easy to
recognise which information does and does not originate from the data warehouse.
A primary driver for the added value and effectiveness of the data warehouse is the
quality of the data stored in it¹. High data quality was a fundamental design principle
from the very beginning. The following measures were applied to accomplish this:
1. Drafting KDDs² with respect to the source systems and source data: since a data
warehouse is fed from source systems, the quality of its content is determined to a
large degree by the quality of the source systems. The KDDs then encompass the
following:
• Original source: data is only transferred from the source that created the data, and
not from other systems that, for efficiency reasons, also have access to this data but
not to its source.
• The highest possible level of detail: information is opened up and stored at the
highest level of detail available. Thanks to this, the potential of the data warehouse
is not unnecessarily restricted by the data that has been incorporated, since more
detailed data may be required in the future.
• Complete source: when a source table is opened up, the whole source table is
sourced and not only the data that is required at that moment.
• A control is built into the ETL process to prevent such issues from arising. Each
prevention is logged, so that a report can be generated on, for instance, the
number of times a specific issue has been resolved.
3. Because the TR is the single repository for all data, duplications are immediately
apparent. When the same type of information arises in various sources, it is collected
in the TR into one and the same table where applicable. Here the TR proves its added
value as data integrator: stock levels of articles in distribution centres and in
branches may be stored separately in the operational systems, but the TR provides the
means to combine this data so that comprehensive reports can be drafted on an article's
total stock position.
¹ Efficiency aspects are established in advance by the shape of the output used. It could
be stated that the same decision or steering measure could be made with either the data
sheet or a dashboard/exception report, provided that the underlying data is the same and
of the same quality. The latter, however, is not determined by the tools, but by the
back-end of the data warehouse.
² Key Design Decisions: fundamental decisions with respect to the functional capability
and the architecture. See the chapter on 'Approach'.
³ Reject, Ignore, Correct, Suspend
Pallas - The Albert Heijn Data Warehouse Page: 9
4. In each data mart, reference data has the same modelling and the same content. This
principle of conformed dimensions [1] imposes a uniform pattern on all information.
Moreover, conformed dimensions offer the prospect of integrating all data relating to
the same subject (article, branch, customer, and the like). One challenge on this score
is the way of tackling informal stratifications of things like articles or branches.
There appears to be a variety of alternative groupings that are not supported by any
source system. A strict policy is pursued here: such data is included in the data
warehouse only when a formal definition and a formal source are available for that type
of stratification.
5. Once the software has been constructed to open up new information, a verification is
performed on the data processing. This check examines whether everything that has
been supplied has successfully arrived in the data warehouse and the data mart.
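The verification in the last measure can be sketched as a simple reconciliation: compare
the number of records the source supplied with what actually arrived in the TR and the
data mart. The function and counts below are illustrative, not Pallas code:

```python
# Sketch of a load-completeness check: compare the record count a source
# system reported with what actually arrived downstream. Names are invented.

def verify_load(supplied: int, loaded_tr: int, loaded_mart: int) -> list:
    """Return a list of discrepancies; an empty list means the load is complete."""
    issues = []
    if loaded_tr != supplied:
        issues.append(f"TR received {loaded_tr} of {supplied} supplied records")
    if loaded_mart != loaded_tr:
        issues.append(f"data mart received {loaded_mart} of {loaded_tr} TR records")
    return issues

# A complete load yields no issues; a partial one is flagged.
assert verify_load(600, 600, 600) == []
assert verify_load(600, 598, 598) == ["TR received 598 of 600 supplied records"]
```

In practice such a check would run per main flow after each nightly load, with the
discrepancies logged for follow-up.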
In conformity with the display sheet, output in the form of reports on the data
warehouse ('Reporting') makes up the bulk of the functional capability supplied by the
Pallas architecture. This type of functional capability ranges from Excel-style
spreadsheets with large amounts of detailed information up to and including pushed
exception reports, balanced scorecards and management dashboards with aggregated
information. All these options are supported by one and the same standard tool. To keep
the number and variety of reports manageable, a taxonomy of reports has been drafted;
newly developed reports must be classifiable within this taxonomy, otherwise they are
not developed. It nevertheless remains difficult to fit specific functional requirements
into these standard forms. This is due more to procedural than to technological reasons:
to a certain degree, standard report forms require standardised and structured
procedures, but the procedures are far from always standardised. One positive effect is
that the use of the BI environment has given the business an occasion to address this
subject.
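The taxonomy gate described above can be sketched as a simple membership test; the
report classes shown here are invented, not the actual Pallas taxonomy:

```python
# Sketch of the report-taxonomy gate: a requested report is only developed
# if it fits one of the defined report classes. The classes are invented.

TAXONOMY = {"exception report", "scorecard", "dashboard", "detail spreadsheet"}

def may_develop(report_class: str) -> bool:
    """Newly developed reports must be classifiable within the taxonomy."""
    return report_class in TAXONOMY

assert may_develop("scorecard") is True
assert may_develop("free-form extract") is False
```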
The second level in the display sheet is Analysis. Within Pallas, this is understood as
posing a series of ad-hoc questions to the database, where each successive question is
determined by the answers to the preceding ones. This requires a first-rate
performance from the environment; the functional capability in question is made
operational within Pallas with the aid of (M)OLAP capability (multi-dimensional cubes).
Target groups for this type of functional capability are the knowledge employees in the
Planning & Control, Market Research and similar support departments.
As well as this, a limited amount of statistical analysis also takes place, and a
proof of concept has been performed with data-mining technology.
• The delivery of information: Pallas is not only the platform for management
information within Albert Heijn, the chosen data warehouse architecture has also
made it the central storage place for historical information. This characteristic,
combined with the fact that the data is of a high quality, makes Pallas the ideal
supplier of historical information. This is effected through interfaces leading back to
the operational systems. We can thus speak of closing the loop: Pallas supplies the
required information to operational applications that need historical information for
the implementation of their task. This means that such tasks as predicting volumes
for automatic deliveries to stores, predicting campaigns and designing shelf plans for
stores can be supported.
• Operational reporting: because the selected standard tool for reporting can also be
used on operational databases, it was decided to use this front-end tool to standardise
all reporting (not only management information, but also operational reports).
This makes it possible to dismantle the many varieties of other reporting tools that are
often included in standard packages. In practice this means creating a supplementary
reporting environment, this time, however, in the operational system instead of in the data
warehouse. One example of the use of such environments is in the logistics systems. A
high degree of transparency is guaranteed for the user: with the same portal, tools and
layout, he/she can retrieve both management and operational reports.
5.1 Infrastructure
The largest part of the Pallas data warehouse runs under AIX on an IBM p670 with
sixteen 1.45-GHz processors. Besides this, one other performance-intensive data mart
runs, also under AIX, on an IBM S85 with eight 750-MHz processors. All data is stored in
an EMC disk cabinet that can hold more than 11 TB.
Oracle is used as the RDBMS. An ETL tool is used to transform and load the data into the
data warehouse; given the expected number of processing jobs, this was preferable to
hand-coded procedures for manageability reasons. We chose Informatica PowerCenter. The
co-ordination and scheduling of PowerCenter workflows is established and managed in
Control-M (BMC); part of the scheduling and workflow management will gradually be
shifted to the PowerCenter Workflow Manager.
The end users do not retrieve information from the data warehouse directly, but via end-
user tools. Microstrategy is used for standard reporting. Most users view their reports in a
web browser via Microstrategy Web, or have these delivered as PDF files, Excel
spreadsheets or SMS messages sent by Microstrategy Narrowcaster. A few web servers
have been set up using Microsoft IIS to support Microstrategy Web. These web servers
can balance the load of user requests among one another.
Hyperion Essbase is available for ad-hoc analyses, with Hyperion Analyzer and Temtec
Executive Viewer as front-ends.
The Pallas architecture is constructed around a central data warehouse, the Transaction
Repository (TR). The architecture is in fact a hybrid between Inmon's Corporate
Information Factory [3] on one side and Kimball's data warehouse [1], based on
dimensional data marts, on the other. Source information first arrives in the front
portal (data staging area), from where it is processed further and loaded into the TR.
The data marts are built from the TR, and multi-dimensional cubes for ad-hoc analysis
can be generated from the data marts.
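The overall data flow can be sketched as a small pipeline; all functions and data below
are illustrative stand-ins for the real DSA, TR and data mart processing:

```python
# Sketch of the overall flow: source files enter the staging area, are loaded
# into the Transaction Repository, and data marts (and then cubes) are derived
# from the TR. Everything here is a simplified stand-in.

def stage(source_rows):
    """DSA: parse raw source lines into the TR record format."""
    return [row.strip().split(",") for row in source_rows]

def load_tr(tr, staged):
    """TR: append detailed history; end users do not query this directly."""
    tr.extend(staged)

def build_mart(tr):
    """Data mart: aggregate the detailed facts for one business process."""
    mart = {}
    for article, qty in tr:
        mart[article] = mart.get(article, 0) + int(qty)
    return mart

tr = []
load_tr(tr, stage(["A100,3", "A100,2", "B200,1"]))
mart = build_mart(tr)
assert mart == {"A100": 5, "B200": 1}
```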
The Transaction Repository (TR) functions as the central ‘one copy of the truth’, the
storage place for historical information at the most detailed possible level. The TR is
optimised for bulk loading and queries on detailed information and, in principle, end users
do not retrieve information directly from it. The ultimate horizon for the TR is 5 years,
with a database size of approximately 2.5 TB (raw data).
With a view to controllability, it is not desirable to change the TR model frequently:
the central source of historical information should not need to be completely
redesigned, implemented and migrated every six months. When the TR model was designed,
therefore, we aimed at the greatest possible independence from the source systems on one
side and from the business requirements of the moment on the other. A generic manner of
modelling, based on the relational model, was used for the reference data. Thanks to
this way of modelling, hierarchies and hierarchy levels can be added and expanded
without consequences for the model.
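A minimal sketch of such a generic reference-data model follows; the entity types,
keys and names are invented for illustration. The point is that hierarchy levels and
parent-child relations are stored as data rather than as columns, so adding a level
requires no remodelling:

```python
# Sketch of generic reference-data modelling: entities and hierarchy relations
# are rows, not columns, so a new hierarchy level is just more data, not a
# schema change. All names are illustrative.

entities = {
    ("ARTICLE", "A100"): "Semi-skimmed milk 1L",
    ("GROUP",   "G10"):  "Dairy",
    ("DEPT",    "D1"):   "Fresh",
}

# (child_type, child_id) -> (parent_type, parent_id)
relations = {
    ("ARTICLE", "A100"): ("GROUP", "G10"),
    ("GROUP",   "G10"):  ("DEPT",  "D1"),
}

def lineage(key):
    """Walk up the hierarchy from an entity to the top level."""
    path = [key]
    while path[-1] in relations:
        path.append(relations[path[-1]])
    return path

# An article resolves through its group up to the department.
assert lineage(("ARTICLE", "A100")) == [
    ("ARTICLE", "A100"), ("GROUP", "G10"), ("DEPT", "D1")
]
```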
While the TR has a permanent character, the data marts are oriented exclusively to
fulfilling a demand for information from the business, without regard for requirements
such as future-proofing or reusability. The idea behind this is that the TR serves
as an unchangeable historical source, while the way in which the end user regards the
information can change, depending on the support that he/she needs to perform his/her
task. If the end user’s way of looking at the data changes, this can mean that a data mart
must be completely remodelled and regenerated. The data marts thus have a much more
transitory character than does the TR. As was indicated in the chapter on ‘Functional
Capability’, when modelling is done to suit the users’ needs, an attempt is made to
construct the data marts in such a way that they can act as a whole when supporting a
given business process.
Another difference between the TR and the data marts is that the data marts usually
contain no detailed information, but primarily aggregated information. The detailed
information in the TR serves only as a basis for composing the views that the user needs
to support his/her work. In this way the details of check-out transactions stored in the
TR can serve as the basis for composing turnover information per branch per quarter. The
user is only interested in the aggregated information, so the data mart need only
contain the aggregation. (One exception is the data mart used for bonus card analysis,
where analysis takes place at the level of individual check-out transactions.)
The time horizon of each data mart is likewise determined by the wishes of the user for
whom it was developed. The data mart with the broadest historical coverage contains
information (aggregated) going back 3 years, but there is also a data mart with a 6-week
history. Besides this, the cubes have technical restrictions on the amount of
information they can store.
In addition to the business demand, the requirements imposed by the end-user tool are
also relevant to the modelling of the data marts. Most end-user tools require
dimensional models (star schemas), so the Kimball [1] dimensional modelling method is
followed for the data marts. Besides this, Microstrategy places several specific demands
on the dimensional model. For instance, Microstrategy's performance improves
significantly when the hierarchies of the dimensions are explicitly modelled
('snowflaking'). On this point there is a divergence from Kimball's modelling, which
prescribes far-reaching de-normalisation. The performance of the
snowflake is improved still further by allowing the keys from the higher levels to be
inherited by lower levels in the hierarchy so that the number of joins needed to reach a
hierarchical level is kept to a minimum. For instance, the entity ARTICLE contains the
keys of all the higher-situated levels. (Should Microstrategy ever be replaced by
another end-user tool, that tool would probably impose such different requirements on
the modelling of the data marts that these would have to be remodelled and regenerated.
Thanks to the separation between the TR and the data marts and the independent modelling
of the TR, this could be achieved without much difficulty.) In the data marts, the users
always see the facts from 'today's perspective',
whether these are yesterday’s facts or facts from two years ago. This means that the facts
in the data mart are always collated according to the most recently known version of the
dimensions (in Kimball-terms: a type 1 approach of slowly changing dimensions is
followed). Although the TR can be used to generate other ‘perspectives’, there has thus
far been no business demand for this. Thus, the data marts have not yet made any use of
the history dimension available in the TR.
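The type 1 approach can be sketched as follows, with invented names and not the actual
Pallas implementation: attribute changes simply overwrite the previous values, so all
facts, old and new, are reported against the most recent version of the dimension:

```python
# Sketch of a type 1 slowly changing dimension: changes overwrite the previous
# attribute values, so no history of the dimension is kept and all facts roll
# up under today's perspective. Names are illustrative.

dim_article = {
    "A100": {"name": "Milk 1L", "group": "Dairy"},
}

def apply_type1_update(dim: dict, key: str, attrs: dict) -> None:
    """Overwrite the dimension row in place; no history is kept."""
    dim.setdefault(key, {}).update(attrs)

# After a regrouping, historical facts for A100 also roll up under "Breakfast".
apply_type1_update(dim_article, "A100", {"group": "Breakfast"})
assert dim_article["A100"] == {"name": "Milk 1L", "group": "Breakfast"}
```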
Cubes are generated from the data marts, and not from the TR, because the dimensions in
the cubes are constructed in the same way as those in the data marts: once generated in
the data marts, they are immediately in the correct format for the cubes.
All the dimensions in the data marts are designed as conformed dimensions, so that
the information in the different star schemas can be compared and related to one
another ('drilling across').
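Drilling across can be sketched as follows, with invented data: because both fact tables
are keyed on the same conformed dimension, their measures can be merged into one report:

```python
# Sketch of 'drilling across': two fact tables share the same conformed branch
# dimension, so their measures can be combined per branch. Data is illustrative.

sales = {"B1": 1000, "B2": 750}     # turnover per branch (sales star)
deliveries = {"B1": 40, "B2": 31}   # DC deliveries per branch (logistics star)

def drill_across(*fact_tables):
    """Merge measures keyed on the shared conformed dimension."""
    branches = set().union(*fact_tables)
    return {b: tuple(t.get(b) for t in fact_tables) for b in sorted(branches)}

report = drill_across(sales, deliveries)
assert report == {"B1": (1000, 40), "B2": (750, 31)}
```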
Source files are received in the first DSA, and the source data is transformed into the
model of the TR format. This is where quality controls, cleaning and enrichment of the
data take place. The initial DSA consists both of files and of tables. In the second DSA, it
is decided which factual information must be loaded onward (see the section on Deltas),
and the conformed dimensions are constructed on the basis of the last state of the
reference data in the TR and are transferred as a whole to the data marts.
The information from the source systems is usually batch-processed at night. This keeps
processor capacity available during the day for producing reports on the data marts and
making analyses in the cubes; moreover, some information can only be supplied after the
close of day. The processing is subdivided into several main flows, each of which
loads a particular type of data: check-out transactions, logistic movements, condition of
the stock, etc. The main flows are, in their turn, composed of approximately 1500
Powercenter jobs. Managing the underlying dependencies and starting these flows is done
in Control-M.
Information can be used in several data marts; there is thus an m:n relationship between
data flows and data marts. Initially, source information was loaded onward in one flow
from the source via the TR to the data marts where it was needed. Sometimes, for
performance reasons, information was loaded from other data marts instead of from the
TR, because the tables needed were already present there in the desired basic form. This
meant that the various flows became increasingly interwoven, which had an unfavourable
effect on the underlying dependencies and the expandability of the environment.
That is why a new approach was developed in which it is possible to work with semi-
finished products. This means that the information flows are cut in two: first the
information from all flows is processed in the TR and the underlying DSA, where the so-
called semi-finished products are produced. These semi-finished products then serve as
the basis for constructing the separate data marts. Shared basic tables now arrive in the
DSA.
A procedure for undoing this interweaving and partially reorganising the existing
environment according to the semi-finished product principle has already started and
will run until the end of 2004.
Some sources can deliver changes (corrections) to factual information that has already
been loaded. A 'smart' delta mechanism is used to process these corrections: it detects
when 'movements' in the factual information have taken place, ascertaining which
information has changed since the last processing run and which has been newly supplied.
From this a To Do list is created that indicates which facts must be updated. The ETL
processing uses the To Do list to determine which information must be passed on for
loading into the data marts. Altered information is first removed from the tables and
then entered anew; the fact tables are thus constructed incrementally instead of being
completely recreated each time.
This not only reduces the total amount of processing in comparison with 'dumb' bulk
loading, it also provides an administrative record and log of the information that has
changed.
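The delta mechanism can be sketched as follows, with invented names and structures:
compare the newly supplied facts with the previous run, build a To Do list of changed or
new keys, then delete and re-insert only those rows:

```python
# Sketch of the 'smart' delta mechanism: detect changed or new facts, build a
# To Do list, and apply it incrementally (delete then re-insert) instead of
# reloading everything. Names and data are illustrative.

def build_todo(previous: dict, supplied: dict) -> set:
    """Keys that are new, or whose fact values changed since the last run."""
    return {k for k, v in supplied.items() if previous.get(k) != v}

def apply_deltas(fact_table: dict, supplied: dict, todo: set) -> None:
    """Remove altered rows and enter them anew (incremental, not full reload)."""
    for key in todo:
        fact_table.pop(key, None)        # remove the old version, if any
        fact_table[key] = supplied[key]  # insert the corrected or new version

previous = {"t1": 10, "t2": 20}
supplied = {"t1": 10, "t2": 25, "t3": 5}   # t2 corrected, t3 newly supplied

todo = build_todo(previous, supplied)
assert todo == {"t2", "t3"}

fact_table = dict(previous)
apply_deltas(fact_table, supplied, todo)
assert fact_table == {"t1": 10, "t2": 25, "t3": 5}
```

The To Do list doubles as the log of what changed, which is the administrative benefit
mentioned above.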
Re-aggregation
The use of today's perception raises an issue in the case of aggregated information.
After all, an aggregate is computed on the basis of the perception at the time of
aggregation; factual information that has not changed is not reloaded, so changes in the
corresponding reference information (adjustments of the 'perception') are not carried
through into the aggregates. The picture of reality that the aggregates provide thus
gradually becomes less and less accurate. This applies to aggregates built on dimensions
in which entities can be assigned a different grouping, for instance the article
dimension, when articles are shifted from one assortment group to another.
That is why these aggregates must regularly be reconstructed. Given the magnitude of the
history available in some data marts, it is not practicable to include re-aggregation in
the nightly processing: it would simply take too much time. When the time for
re-aggregation arrives, a copy of the base tables is made and the re-aggregation process
is run in the background. When it has completed, the old and new aggregate tables are
swapped.
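The periodic background rebuild and table swap described above can be sketched as
follows, with illustrative names and data:

```python
# Sketch of periodic rebuilding: recompute the aggregate from a copy of the
# base tables 'in the background', then swap the old and new tables. The
# grouping reflects the current (today's) dimension view. Names are invented.

def rebuild_aggregate(base_rows):
    """Re-aggregate all facts using the current dimension groupings."""
    totals = {}
    for group, amount in base_rows:
        totals[group] = totals.get(group, 0) + amount
    return totals

tables = {"agg_sales": {"Dairy": 100}}              # stale aggregate
base_copy = [("Breakfast", 60), ("Breakfast", 40)]  # snapshot of base facts
                                                    # after articles regrouped

new_table = rebuild_aggregate(base_copy)            # runs in the background
tables["agg_sales"], _old = new_table, tables["agg_sales"]  # the swap

assert tables["agg_sales"] == {"Breakfast": 100}
```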
By definition, a data warehouse that has been created to integrate information from other
systems has links to these other systems.
In integrating information from different systems, Pallas has had an important advantage
over many of its fellow data warehouses from the very start: within Albert Heijn the
source systems were already in a highly standardised and uniform state because a
corporate data model was in use. That meant that for each type of source data there was,
in principle, only one source system; delivery of similar types of data from source
systems with differing modelling and dissimilar data definitions was thus not an issue.
In 2000, it was decided to remodel the AH application landscape. To respond better to
the flexibility and differentiation demanded by the business, it was decided to switch
over to a service-oriented approach. Applications were henceforth set up as independent
services, each with its own data store, communicating with other applications
asynchronously, in an event-driven manner, by means of messages sent over a message
broker.
Within Albert Heijn, the development and maintenance of Pallas is entrusted to the
Competence Centre Business Intelligence (CC-BI). This is a component unit of AH IT,
Albert Heijn's IT department. In addition, the CC-BI carries out BI activities for
various other Ahold operating companies on the basis of the existing infrastructure and
working methods. The CC-BI is divided into two sub-departments: Operations &
Improvements (O&I) and Projects. Its activities range from DBA work to end-user support;
only the management of the machinery is situated elsewhere. Depending on the number of
current projects, between 30 and 50 people work in the department. The CC-BI supplies
services to its users. A
service can be seen as a logical whole of information on offer, including the complete
flows leading up to it, from source to report.
The delivery of existing services is provided by O&I. O&I is primarily responsible for the
daily operation and maintenance of the standing architecture and BI solutions; this
also includes implementing cost-saving improvements to the architecture and
infrastructure. To keep each service manageable, a Service Level Agreement is entered
into with each user group for the service it receives. This establishes agreements on
such items as the expected time at which the requested reports will be available each
morning and the maximum downtime of a service. In addition, O&I supplies support and
training for the end users of the data warehouse and provides Quick Services: upon
request, information that users cannot access themselves but which is stored in the
data warehouse can be supplied via reports or cubes.
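An SLA of the kind mentioned above boils down to a few checkable figures per service. The sketch below shows one way such a check might look; the service name, cut-off time and downtime budget are invented for illustration and are not the actual agreements.

```python
from datetime import time

# Hypothetical SLA entries: the time by which morning reports must be
# available, and the maximum tolerated downtime per incident.
sla = {
    "sales_reports": {"available_by": time(7, 0), "max_downtime_hours": 4},
}

def meets_sla(service, delivered_at):
    """True when the reports were delivered before the agreed cut-off."""
    return delivered_at <= sla[service]["available_by"]

print(meets_sla("sales_reports", time(6, 45)))  # delivered before 07:00
print(meets_sla("sales_reports", time(7, 30)))  # delivered too late
```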
Finally, standards and directives for developing new solutions are drafted and monitored
within O&I and the Pallas Delivery Framework is also managed there. The development
processes and deliverables for new projects are defined within this framework.
The Projects sub-department is organised into project teams that develop new services on
the basis of the existing environment and architecture. These project teams do not work in
isolation: O&I employees are involved in projects as reviewers to guarantee the tie-in of
newly developed solutions with the existing environment. Ultimately O&I determines
whether and when a new solution is put into production.
The central framework for each project team is the Pallas Delivery Framework. Because
each project or release usually has the same basic elements – data logistics and front-end
functional capability – a high degree of standardisation in the activities is attainable.
Although a lot of room must still be reserved for specific solutions, everyone operates on
the basis of the same reference framework. This is also necessary because everyone is in fact
working on the same environment.
• Fundamental decisions with respect to the functional capability and the architecture of
the data warehouse are recorded as Key Design Decisions (KDDs). Originally created
as a medium for formalising decision-making, they form a good record and reference
for the principles that lie at the foundation of Pallas.
• Besides this, directives and best practices are formulated and established for daily
development work. After experimenting with various forms of documentation, design
patterns have recently been introduced for this purpose. Design patterns have been
drafted for processing facts with an unknown reference, for logging data quality
problems and for processing messages delivered in real time.
• In any case, experience has shown that the analysis methods used for ‘normal’ system
development are inadequate. For instance, it has been established that Use Case
Engineering (RUP/UML) offers hardly any purchase, since each use case would amount to
‘Print Report’. Experience with function point analysis has likewise shown that the
character of information analysis and functional design for BI applications is
fundamentally different from that for transactional systems.
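One of the design patterns mentioned earlier, processing facts with an unknown reference, can be sketched as follows. A fact that arrives before its dimension row is never rejected: a placeholder dimension member is registered and repaired once the real reference arrives. The data structures and names below are illustrative, not the actual Pallas implementation.

```python
# Dimension table: business key -> attributes; fact table: list of rows.
dimension = {}
facts = []

def load_fact(fact):
    """Accept a fact even when its dimension reference is not yet known."""
    key = fact["product_key"]
    if key not in dimension:
        # Register a placeholder so the fact is never lost or rejected.
        dimension[key] = {"name": "<unknown>", "placeholder": True}
    facts.append(fact)

def load_dimension_row(key, attrs):
    """A late-arriving reference overwrites the placeholder in place."""
    dimension[key] = {**attrs, "placeholder": False}

load_fact({"product_key": "P1", "qty": 3})       # reference not yet known
load_dimension_row("P1", {"name": "Olive oil"})  # reference arrives later

print(dimension["P1"]["name"])  # the placeholder has been repaired
```

Because the fact keeps pointing at the same key throughout, no fact rows need to be revisited when the reference finally arrives; only the dimension member changes.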
While drafting a general cost/benefit analysis for IT is difficult, the job is particularly
thorny for BI applications. After all, how do you determine the benefits of a new,
improved BI environment? Ultimately it comes down to quantifying the effect of better
decision making and steering, which seems an impossible exercise. Added to this,
investments must often be assessed against one another. Consider the trade-off that a
company's management must make when deciding whether to invest in logistics or in BI:
it is probably easier to make a credible case for tangible benefits in the first than
in the second.
As was already indicated in the rationale, several considerations played a role in Albert
Heijn’s decision to invest in BI:
• Necessity: every company needs the capability to provide its management and steering
information, and as the complexity of the organisation increases, so does that demand.
It was evident that future developments within Albert Heijn (including the
differentiation strategy) would pose demands that could no longer be met by the then
current solutions, so something new was necessary. Within this framework it was
important to set an acceptable investment level.
• Ideology: it was believed – also at the level of the board of directors – that creating
one integrated information environment would provide many advantages for the
company in terms of data quality, integration and combination of information,
availability of detailed data, the ‘one copy of the truth’ principle and the like. Without
being able to quantify this in hard figures, there was a strong conviction that this
solution could lay the foundation for future benefits. This last consideration in
particular contributed to the decision to make the actual, substantial investment.
It is understandable that the combination of considerations was decisive in the ultimate
choice to make the investment.
In practice, the rule of thumb was used that more than 70% of the development costs
would be incurred in back-end development, specifically in developing the ETL
processing (excluding hardware and licence costs). This rule of thumb also applied to
Pallas; for this reason much management attention was focused on this part of the
system development. One of the measures taken to control costs was, when opening up
sources, to read in the entire source table and not only the attributes for which
there were specific requirements: the cost of adding further attributes later proved
to be substantially higher than the extra cost of including them in the initial
development. In addition, considerable attention was devoted to data quality analysis,
ETL design and integration testing. Inevitably, there were costs attached to learning
from experience, and in some sub-areas increments were initially chosen that were too
large. In general, however, practice showed that the foundation that had been laid was
solid and the future secure.
Just as with any other information system, Pallas is continually expanding and there is a
permanent demand for changes. To prevent uncontrolled proliferation, a steering group
was created. In addition to keeping the total cost of Pallas under control, it also had the
task of keeping watch over the quality and coherence of the content of the BI
environment as a whole.
7.3 Benefits
On the benefit side, the advantages were mainly found in the reusability of elements
of the data warehouse architecture. Relevant elements in this context are specifically:
• knowledge
• standard tools
• approach and methodology.
Several architectural variants were developed as spin-offs from Pallas by combining the
above-mentioned components. These could be used to serve different types of demand for
management information. By ‘different types’ we mean smaller-scale needs, with a
smaller scope and lower demands on integration and business processes. Within the CC-
BI, these were referred to as ‘the Ferrari and the Volkswagen’: the competence centre has
the architecture, the knowledge, the tools and the approach in house to build a Ferrari, but
there is no reason why these could not also be used to build a Volkswagen.
These alternative variants have thus far been used for other Ahold operating
companies, including Gall&Gall, Ahold Vastgoed, albert.nl and the holding company
itself. For instance, the data warehouse solution for Gall&Gall was created at
comparatively very low cost. The management costs are proportionately low, certainly
when compared with a scenario in which Gall&Gall had kept development and management
in its own hands. Strictly speaking, these benefits accrue not to Albert Heijn but to
Ahold as a whole. This certainly also played a role in the considerations when
deciding to make the initial investments.
Another important advantage from which Albert Heijn benefited directly was the short
lead time for delivering information on an ad-hoc basis. Many examples could be given
in which the integrated collection of information, and the speed with which it could
be accessed, have proven their value:
• Clearly mapping hoarding behaviour after major calamities (11 September, the
Iraq war)
• Supporting the initial price offensive in the fall of 2003
• Interim evaluation of specific campaigns
In addition to this, it has become evident that preserving the history of both reference and
factual information offers considerable added value. This not only produces better insight
into the historical behaviour of campaigns, the condition of the stock and the turnover, for
instance, but the historical behaviour of reference data, such as the development of
purchasing prices over the years, can also be analysed.
Now that a large part of the information available in source systems is accessible from the
data warehouse, attention on the supply side will shift more to ‘filtering out’ the golden
nuggets from the enormous amount of information that is available: ‘less is more’. Users
will, for instance, be more frequently informed via automatic alerts and proactive
exception reports based on business rules. It is as if intelligence were being added to the
information.
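The business-rule-driven alerting mentioned above can be sketched as a set of predicates evaluated over the daily figures; any rule that fires produces a proactive alert instead of waiting for a user to query the data. The rules, thresholds and field names below are illustrative assumptions.

```python
# Business rules as simple named predicates over a daily record.
rules = [
    ("stock below safety level", lambda r: r["stock"] < r["safety_stock"]),
    ("turnover drop > 20%",      lambda r: r["turnover"] < 0.8 * r["turnover_prev"]),
]

def evaluate(record):
    """Return the names of all rules that fire for this record."""
    return [name for name, pred in rules if pred(record)]

record = {"stock": 5, "safety_stock": 20, "turnover": 700, "turnover_prev": 1000}
alerts = evaluate(record)
print(alerts)  # both rules fire for this record
```

In a real environment such alerts would be pushed to users (for example by e-mail or on a dashboard) rather than printed, which is exactly the shift from pull to push that ‘adding intelligence to the information’ refers to.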
On the demand side, initiatives have been launched to give third parties, such as
suppliers, access to elements of the data warehouse. In house, the integration aspect will
come to play a greater role: while in past years the primary objective in opening up most
subject areas was to support separate business processes, in the future the demand for
comprehensive information will increase over the entire chain. This movement is visible,
for instance, on the replenishment side: after separate business requirements for
supporting DC replenishment and store supply, the demand now arises for information on
stock movements throughout the whole chain. Thanks to the
chosen approach and architecture, this type of information is in most cases already
present in an integrated form in the enterprise data warehouse. On this point, too, the
choices made with regard to the approach and architecture will continue to yield an
increasing benefit.