
DW 2.0
The Architecture for the Next Generation of Data Warehousing

W. H. Inmon
Forest Rim Technology

Derek Strauss
Gavroshe

Genia Neushloss
Gavroshe

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann Publishers is an imprint of Elsevier.


Contents

Preface .... xvii
Acknowledgments .... xx
About the Authors .... xxi

CHAPTER 1 A brief history of data warehousing and first-generation data warehouses
Database management systems .... 1
Online applications .... 2
Personal computers and 4GL technology .... 3
The spider web environment .... 4
Evolution from the business perspective .... 5
The data warehouse environment .... 6
What is a data warehouse? .... 7
Integrating data: a painful experience .... 7
Volumes of data .... 8
A different development approach .... 8
Evolution to the DW 2.0 environment .... 9
The business impact of the data warehouse .... 11
Various components of the data warehouse environment .... 11
ETL: extract/transform/load .... 12
ODS: operational data store .... 13
Data mart .... 13
Exploration warehouse .... 13
The evolution of data warehousing from the business perspective .... 14
Other notions about a data warehouse .... 14
The active data warehouse .... 15
The federated data warehouse approach .... 16
The star schema approach .... 18
The data mart data warehouse .... 20
Building a "real" data warehouse .... 21
Summary .... 22

CHAPTER 2 An introduction to DW 2.0 .... 23
DW 2.0: a new paradigm .... 24
DW 2.0: from the business perspective .... 24
The life cycle of data .... 27
Reasons for the different sectors .... 30
Metadata .... 31
Access of data .... 33
Structured data/unstructured data .... 34
Textual analytics .... 35
Blather .... 38
The issue of terminology .... 38
Specific text/general text .... 40
Metadata: a major component .... 40
Local metadata .... 43
A foundation of technology .... 45
Changing business requirements .... 47
The flow of data within DW 2.0 .... 48
Volumes of data .... 50
Useful applications .... 51
DW 2.0 and referential integrity .... 52
Reporting in DW 2.0 .... 53
Summary .... 53

CHAPTER 3 DW 2.0 components: about the different sectors .... 55
The Interactive Sector .... 55
The Integrated Sector .... 62
The Near Line Sector .... 71
The Archival Sector .... 76
Unstructured processing .... 86
From the business perspective .... 90
Summary .... 92

CHAPTER 4 Metadata in DW 2.0 .... 95
Reusability of data and analysis .... 96
Metadata in DW 2.0 .... 96
Active repository/passive repository .... 99
The active repository .... 100
Enterprise metadata .... 101
Metadata and the system of record .... 102
Taxonomy .... 104
Internal taxonomies/external taxonomies .... 104
Metadata in the Archival Sector .... 105
Maintaining metadata .... 106
Using metadata: an example .... 106
From the end-user perspective .... 109
Summary .... 110

CHAPTER 5 Fluidity of the DW 2.0 technology infrastructure
The technology infrastructure .... 112
Rapid business changes .... 114
The treadmill of change .... 114
Getting off the treadmill .... 115
Reducing the length of time for IT to respond .... 115
Semantically temporal, semantically static data .... 115
Semantically temporal data .... 116
Semantically stable data .... 117
Mixing semantically stable and unstable data .... 118
Separating semantically stable and unstable data .... 118
Mitigating business change .... 119
Creating snapshots of data .... 120
A historical record .... 120
Dividing data .... 121
From the end-user perspective .... 121
Summary .... 122

CHAPTER 6 Methodology and approach for DW 2.0 .... 123
Spiral methodology: a summary of key features .... 124
The seven streams approach: an overview .... 129
Enterprise reference model stream .... 129
Enterprise knowledge coordination stream .... 129
Information factory development stream .... 133
Data profiling and mapping stream .... 133
Data correction stream .... 133
Infrastructure stream .... 133
Total information quality management stream .... 134
Summary .... 137

CHAPTER 7 Statistical processing and DW 2.0 .... 141
Two types of transactions .... 141
Using statistical analysis .... 143
The integrity of the comparison .... 144
Heuristic analysis .... 145
Freezing data .... 146
Exploration processing .... 146
The frequency of analysis .... 147
The exploration facility .... 147
The sources for exploration processing .... 149
Refreshing exploration data .... 149
Project-based data .... 150
Data marts and the exploration facility .... 152
A backflow of data .... 152
Using exploration data internally .... 155
From the perspective of the business analyst .... 155
Summary .... 156

CHAPTER 8 Data models and DW 2.0 .... 157
An intellectual road map .... 157
The data model and business .... 157
The scope of integration .... 158
Making the distinction between granular and summarized data .... 159
Levels of the data model .... 159
Data models and the Interactive Sector .... 161
The corporate data model .... 162
A transformation of models .... 163
Data models and unstructured data .... 164
From the perspective of the business user .... 166
Summary .... 167

CHAPTER 9 Monitoring the DW 2.0 environment .... 169
Monitoring the DW 2.0 environment .... 169
The transaction monitor .... 169
Monitoring data quality .... 170
A data warehouse monitor .... 171
The transaction monitor: response time .... 171
Peak-period processing .... 172
The ETL data quality monitor .... 174
The data warehouse monitor .... 176
Dormant data .... 177
From the perspective of the business user .... 178
Summary .... 179

CHAPTER 10 DW 2.0 and security .... 181
Protecting access to data .... 181
Encryption .... 181
Drawbacks .... 182
The firewall .... 182
Moving data offline .... 182
Limiting encryption .... 184
A direct dump .... 184
The data warehouse monitor .... 185
Sensing an attack .... 185
Security for near line data .... 187
From the perspective of the business user .... 187
Summary .... 188

CHAPTER 11 Time-variant data .... 191
All data in DW 2.0: relative to time .... 191
Time relativity in the Interactive Sector .... 192
Data relativity elsewhere in DW 2.0 .... 192
Transactions in the Integrated Sector .... 193
Discrete data .... 194
Continuous time span data .... 194
A sequence of records .... 196
Nonoverlapping records .... 197
Beginning and ending a sequence of records .... 197
Continuity of data .... 198
Time-collapsed data .... 198
Time variance in the Archival Sector .... 199
From the perspective of the end user .... 200
Summary .... 200

CHAPTER 12 The flow of data in DW 2.0 .... 203
The flow of data throughout the architecture .... 203
Entering the Interactive Sector .... 203
The role of ETL .... 205
Data flow into the Integrated Sector .... 205
Data flow into the Near Line Sector .... 207
Data flow into the Archival Sector .... 209
The falling probability of data access .... 209
Exception-based flow of data .... 210
From the perspective of the business user .... 213
Summary .... 214

CHAPTER 13 ETL processing and DW 2.0 .... 215
Changing states of data .... 215
Where ETL fits .... 215
From application data to corporate data .... 216
ETL in online mode .... 216
ETL in batch mode .... 217
Source and target .... 218
An ETL mapping .... 219
Changing states: an example .... 219
More complex transformations .... 221
ETL and throughput .... 222
ETL and metadata .... 223
ETL and an audit trail .... 223
ETL and data quality .... 224
Creating ETL .... 224
Code creation or parametrically driven ETL .... 225
ETL and rejects .... 225
Changed data capture .... 226
ELT .... 226
From the perspective of the business user .... 227
Summary .... 228

CHAPTER 14 DW 2.0 and the granularity manager .... 231
The granularity manager .... 231
Raising the level of granularity .... 232
Filtering data .... 232
The functions of the granularity manager .... 234
Home-grown versus third-party granularity managers .... 236
Parallelizing the granularity manager .... 237
Metadata as a by-product .... 237
From the perspective of the business user .... 238
Summary .... 238

CHAPTER 15 DW 2.0 and performance .... 239
Good performance: a cornerstone for DW 2.0 .... 239
Online response time .... 240
Analytical response time .... 241
The flow of data .... 241
Queues .... 242
Heuristic processing .... 243
Analytical productivity and response time .... 243
Many facets to performance .... 244
Indexing .... 245
Removing dormant data .... 245
End-user education .... 246
Monitoring the environment .... 246
Capacity planning .... 247
Metadata .... 249
Batch parallelization .... 249
Parallelization for transaction processing .... 250
Workload management .... 250
Data marts .... 251
Exploration facilities .... 253
Separation of transactions into classes .... 253
Service level agreements .... 254
Protecting the Interactive Sector .... 254
Partitioning data .... 255
Choosing the proper hardware .... 255
Separating farmers and explorers .... 256
Physically group data together .... 257
Check automatically generated code .... 257
From the perspective of the business user .... 258
Summary .... 259

CHAPTER 16 Migration .... 261
Houses and cities .... 261
Migration in a perfect world .... 262
The perfect world almost never happens .... 262
Adding components incrementally .... 262
Adding the Archival Sector .... 264
Creating enterprise metadata .... 265
Building the metadata infrastructure .... 266
"Swallowing" source systems .... 266
ETL as a shock absorber .... 267
Migration to the unstructured environment .... 267
From the perspective of the business user .... 269
Summary .... 270

CHAPTER 17 Cost justification and DW 2.0 .... 271
Is DW 2.0 worth it? .... 271
Macro-level justification .... 271
A micro-level cost justification .... 272
Company has DW 2.0 .... 273
Creating new analysis .... 273
Executing the steps .... 274
So how much does all of this cost? .... 276
Consider company .... 276
Factoring the cost of DW 2.0 .... 277
Reality of information .... 278
The real economics of DW 2.0 .... 279
The time value of information .... 279
The value of integration .... 280
Historical information .... 280
First-generation DW and DW 2.0: the economics .... 281
From the perspective of the business user .... 282
Summary .... 282

CHAPTER 18 Data quality in DW 2.0 .... 285
The DW 2.0 data quality tool set .... 287
Data profiling tools and the reverse-engineered data model .... 288
Data model types .... 289
Data profiling inconsistencies challenge top-down modeling .... 294
Summary .... 296

CHAPTER 19 DW 2.0 and unstructured data .... 299
DW 2.0 and unstructured data .... 299
Reading text .... 299
Where to do textual analytical processing .... 300
Integrating text .... 301
Simple editing .... 302
Stop words .... 302
Synonym replacement .... 303
Synonym concatenation .... 303
Homographic resolution .... 303
Creating themes .... 304
External glossaries/taxonomies .... 304
Stemming .... 305
Alternate spellings .... 305
Text across languages .... 305
Direct searches .... 306
Indirect searches .... 306
Terminology .... 307
Semistructured data/VALUE = NAME data .... 307
The technology needed to prepare the data .... 308
The relational data base .... 309
Structured/unstructured linkage .... 309
From the perspective of the business user .... 310
Summary .... 310

CHAPTER 20 DW 2.0 and the system of record .... 31
Other systems of record .... 319
From the perspective of the business user .... 319
Summary .... 321

CHAPTER 21 Miscellaneous topics .... 323
Data marts .... 323
The convenience of a data mart .... 324
Transforming data mart data .... 325
Monitoring DW 2.0 .... 326
Moving data from one data mart to another .... 327
Bad data .... 329
A balancing entry .... 330
Resetting a value .... 330
Making corrections .... 330
The speed of movement of data .... 331
Data warehouse utilities .... 332
Summary .... 337

CHAPTER 22 Processing in the DW 2.0 environment .... 339
Summary .... 345

CHAPTER 23 Administering the DW 2.0 environment .... 347
The data model .... 347
Architectural administration .... 348
Defining the moment when an Archival Sector will be needed .... 348
Determining whether the Near Line Sector is needed .... 349
Metadata administration .... 351
Database administration .... 352
Stewardship .... 353
Systems and technology administration .... 355
Management administration of the DW 2.0 environment .... 358
Prioritization and prioritization conflicts .... 358
Budget .... 358
Scheduling and determination of milestones .... 359
Allocation of resources .... 359
Managing consultants .... 359
Summary .... 361

Index .... 363
