2.
3.
What is an ODS?
An operational data store (ODS) is the database from which a business operates on an ongoing basis.
4.
5.
to function without at least some metadata. Indeed, the use of metadata, which
enable data access through names and logical relationships rather than physical
locations, is fundamental to the very concept of a DBMS.
Metadata are essential to any database, not just a data warehouse. (See answer to
Review Question 2 of this section above.)
3.
Data sources. Data are sourced from operational systems and possibly
from external data sources.
Data loading. Data are loaded into a staging area, where they are
transformed and cleansed. The data are then ready to load into the data
warehouse.
What are the key similarities and differences between a two-tiered and a three-tiered architecture?
Both provide the same user visibility through a client system that accesses a
DSS/BI application remotely. The difference is behind the scenes and is invisible
to the user: in a two-tiered architecture, the application and data warehouse reside
on the same machine; in a three-tiered architecture, they are on separate machines.
2.
3.
4.
2.
3.
1.
Indirect benefits arise when end users take advantage of these direct benefits.
2.
List several criteria for selecting a data warehouse vendor and describe why they
are important.
Six criteria listed in the text are: financial strength, ERP linkages, qualified
consultants, market share, industry experience, and established partnerships.
These are important to indicate that a vendor is likely to be in business for the
long term, to have the support capabilities its customers need, and to provide
products that interoperate with other products the potential user has or may
obtain.
One could add others, such as product functionality (Does it do what we need?),
vendor strategic vision (Does their direction make sense for our future plans and
is it consistent with industry trends?) and quality of customer references (What do
their existing customers think of them?). These may be so obvious that the authors
(or the author of the cited reference from which this list is taken) did not feel they
needed to be mentioned, but they are still valid answers to this question.
3.
4.
5.
Slice: A slice is a subset of a multidimensional array (usually a
two-dimensional representation) corresponding to a single value set for one (or
more) of the dimensions not in the subset.
Dice: The dice operation is a slice on more than two dimensions of a data
cube.
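The two operations can be illustrated with a minimal sketch. The cube contents, the dimension names (product, region, quarter), and the helper names slice_cube and dice_cube are all hypothetical, chosen only to make the definitions concrete.

```python
# Toy sales cube keyed by (product, region, quarter). All values are made up.
cube = {
    ("widget", "east", "Q1"): 10,
    ("widget", "east", "Q2"): 12,
    ("widget", "west", "Q1"): 7,
    ("widget", "west", "Q2"): 9,
    ("gadget", "east", "Q1"): 4,
    ("gadget", "east", "Q2"): 6,
    ("gadget", "west", "Q1"): 3,
    ("gadget", "west", "Q2"): 5,
}


def slice_cube(cube, quarter):
    """Fix the quarter dimension to one value, leaving a product x region table."""
    return {(p, r): v for (p, r, q), v in cube.items() if q == quarter}


def dice_cube(cube, products, regions, quarters):
    """Keep only the cells whose coordinates fall in the chosen value sets
    on all three dimensions, producing a smaller cube."""
    return {
        key: v
        for key, v in cube.items()
        if key[0] in products and key[1] in regions and key[2] in quarters
    }


q1_slice = slice_cube(cube, "Q1")
sub_cube = dice_cube(cube, {"widget"}, {"east", "west"}, {"Q1"})
```

Here q1_slice is a two-dimensional table for a single quarter, while sub_cube restricts more than two dimensions at once, as in the definition of dice above.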
What are ROLAP, MOLAP, and HOLAP? How do they differ from OLAP?
ROLAP stands for Relational Online Analytical Processing. ROLAP is an
alternative to the MOLAP (Multidimensional OLAP) technology. While both
ROLAP and MOLAP analytic tools are designed to allow analysis of data through
the use of a multidimensional data model, ROLAP differs in that it does not
require the pre-computation and storage of information. Instead, ROLAP tools
access the data in a relational database and generate SQL queries to calculate
information at the appropriate level when an end user requests it. MOLAP is an
alternative to the ROLAP technology. MOLAP differs from ROLAP significantly
in that it requires the pre-computation and storage of information in the cube,
an operation known as preprocessing. MOLAP stores this data in optimized
multidimensional array storage, rather than in a relational database.
HOLAP (Hybrid OLAP) combines the two approaches: part of the data is kept in a
MOLAP store and another part of the data in a ROLAP store. The degree of control that the cube
designer has over this partitioning varies from product to product.
All of these are variations of OLAP.
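The contrast between computing on request and precomputing can be sketched as follows. This is only an illustration under stated assumptions: the sales table, its columns, and the function names rolap_total and molap_total are hypothetical, and an in-memory SQLite database stands in for the relational store.

```python
import sqlite3
from collections import defaultdict

# Made-up fact rows: (product, region, amount).
rows = [
    ("widget", "east", 10),
    ("widget", "west", 7),
    ("gadget", "east", 4),
    ("gadget", "west", 3),
]

# Relational store used by the ROLAP-style query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def rolap_total(product):
    """ROLAP style: issue SQL and aggregate when the user asks."""
    cur = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE product = ?", (product,)
    )
    return cur.fetchone()[0]


# MOLAP style: aggregate once up front (preprocessing) and store the results.
molap_totals = defaultdict(int)
for product, region, amount in rows:
    molap_totals[product] += amount


def molap_total(product):
    """MOLAP style: look up a precomputed aggregate."""
    return molap_totals[product]
```

Both functions return the same totals; the difference is when the work is done. A HOLAP design would serve some aggregates from the precomputed store and fall back to SQL for the rest.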
Section 2.6 Review Questions
1.
What are the major DW implementation tasks that can be performed in parallel?
Reeves (2009) and Solomon (2005) provided some guidelines regarding the
critical questions that must be asked, some risks that should be weighed, and
some processes that can be followed to help ensure a successful data warehouse
implementation. They compiled a list of 11 major tasks that could be performed in
parallel:
1. Establishment of service-level agreements and data-refresh requirements
2. Identification of data sources and their governance policies
3. Data quality planning
4. Data model design
5. ETL tool selection
6. Relational database software and platform selection
7. Data transport
8. Data conversion
9. Reconciliation process
10. Purge and archive planning
11. End-user support
2.
The project must fit with corporate strategy and business objectives.
3.
Only data that are relevant to decision analysis, have been cleansed, and
are from a known/trusted source (both internal as well as external to the
organization) should be loaded.
Proven tools and methodologies that fit nicely into the existing
infrastructure should be chosen.
When developing a successful data warehouse, what are the most important risks
and issues to consider and potentially avoid?
Data warehouse projects have many risks. Most of them are also found in other IT
projects, but data warehousing risks are more serious because data warehouses are
expensive, time- and resource-demanding, large-scale projects. Each risk should
be assessed at the inception of the project. When developing a successful data
warehouse, it is important to carefully consider various risks and avoid the
following issues:
Setting expectations that you cannot meet. You do not want to frustrate
executives at the moment of truth. Every data warehousing project has two
phases: Phase 1 is the selling phase, in which you internally market the
project by selling the benefits to those who have access to needed
resources. Phase 2 is the struggle to meet the expectations described in
Phase 1. For a mere $1 to $7 million, hopefully, you can deliver.
Believing that your problems are over when the data warehouse is up
and running. DSS/BI projects tend to evolve continually. Annual budgets
must be planned for because data warehousing is a continuous process.
Managers are busy and need time to read reports. Alert systems are better
than periodic reporting systems and can make a data warehouse mission
critical. Alert systems monitor the data flowing into the warehouse and
inform all key people who have a need to know as soon as a critical event
occurs.
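A minimal sketch of such an alert check is shown below. The row layout, the metric name, the threshold, and the function name check_alerts are hypothetical; a real alert system would also route each message to the people who need to know.

```python
def check_alerts(incoming_rows, threshold):
    """Return one message for each incoming row whose value exceeds the threshold."""
    alerts = []
    for row in incoming_rows:
        if row["value"] > threshold:
            alerts.append(
                f"ALERT: {row['metric']} = {row['value']} exceeds {threshold}"
            )
    return alerts


# Made-up rows arriving at the warehouse.
incoming = [
    {"metric": "returns_per_hour", "value": 12},
    {"metric": "returns_per_hour", "value": 57},
]
messages = check_alerts(incoming, threshold=50)
```

Only the second row produces a message, so a manager is notified about the critical event rather than having to find it in a periodic report.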
4.
What is an RDW?
A real-time data warehouse, in which decision making data are updated on an
ongoing basis as business transactions occur; same as an active data warehouse
(ADW).
2.
3.
What are the major differences between a traditional data warehouse and an RDW?
1 An attempt by Amazon.com in 2000 to study price sensitivity of demand, by varying the price of the same
movie to different customers and tracking the statistical effect on sales, was criticized as unethical and
withdrawn. They subsequently offered refunds to customers who paid more than the lowest price offered to
anyone.
In cases of huge nightly batch loads, the ETL setup and processing
might take too long. An EAI with real-time data collection can reduce or
eliminate the nightly batch processes.
What steps can an organization take to ensure the security and confidentiality of
customer data in its data warehouse?
Effective security in a data warehouse should focus on four main areas:
3.
What are the recent technologies that may shape the future of data warehousing?
Why?
Following are some of the recently popularized concepts and technologies that
will play a significant role in defining the future of data warehousing.
site. In essence, SaaS is the new and improved version of the ASP model.
For data warehouse customers, finding SaaS based software applications
and resources that meet specific needs and requirements can be
challenging. As these software offerings become more agile, the appeal
and the actual use of SaaS as the choice of data warehousing platform will
also increase.
Cloud computing. Cloud computing is perhaps the newest and the most
innovative platform choice to come along in years, where numerous
hardware and software resources are pooled and virtualized, so that they
can be freely allocated to applications and software platforms as resources
are needed. This enables information systems applications to dynamically
scale up as workloads increase.
Although cloud computing and similar virtualization techniques are fairly
established for operational applications today, they are just now starting to
be used as data warehouse platforms of choice. The dynamic allocation of
a cloud is particularly useful when the data volume of the warehouse
varies unpredictably, making capacity planning difficult.
In-memory processing (64-bit computing) super computing. Sixty-four-bit systems typically offer faster CPUs and more power-efficient
hardware than older systems. But, for data warehousing, the most
compelling benefit of 64-bit systems is the large space of addressable
memory, allowing the deployment of an in-memory database for reporting
or analytic applications that need very fast query response. In-memory
databases provide such speed because they don't have disk input/output to
slow them down. The in-memory database is usually a function of a
DBMS, but some BI platforms for reporting and analysis also support
in-memory data stores and related processing.
Tools for ETL commonly support in-memory processing in a 64-bit
environment, so that complex joins and transformations are executed in a
large memory space without the need to land data to disk in temporary
tables. This makes an ETL data flow a true pipe, which means the ETL
tool can scale up to large data volumes that are processed in relatively
short time periods.
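The idea of an ETL flow as a pipe can be sketched with Python generators, where each row streams through every stage in memory and nothing is landed in a temporary table. The field names, the sample rows, and the list standing in for a warehouse table are all assumptions made for illustration.

```python
def extract(source_rows):
    """Yield rows one at a time from the source."""
    for row in source_rows:
        yield row


def transform(rows):
    """Cleanse each row as it streams past; no temporary table is written."""
    for row in rows:
        yield {
            "customer": row["customer"].strip().title(),
            "amount_cents": int(row["amount_cents"]),
        }


def load(rows, target):
    """Append transformed rows to the target and return its new size.
    The target is a plain list standing in for a warehouse table."""
    for row in rows:
        target.append(row)
    return len(target)


# Made-up source data with inconsistent formatting.
source = [
    {"customer": " alice ", "amount_cents": "1050"},
    {"customer": "BOB", "amount_cents": "200"},
]
warehouse_table = []
loaded = load(transform(extract(source)), warehouse_table)
```

Because the stages are chained generators, only one row is in flight at a time between extract and load, which is the sense in which the flow behaves like a pipe.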
Advanced analytics. There are different analytic methods users can choose
as they move beyond basic OLAP-based methods and into advanced
analytics. Some users choose advanced analytic methods based on data
mining, predictive analytics, statistics, artificial intelligence, and so on.
Still, the majority of users seem to be choosing SQL-based methods. SQL-based
or not, advanced analytic methods seem to be among the most
important promises of next-generation data warehousing.
1.
2.
3.
6.
A question involving the word "higher" (or any other comparative, for that
matter) requires asking "higher than what?" In this case, we can take it to mean
"higher than we would have for the same data, but without a formal data
integration process."
Without a data integration process to combine data in a planned and structured
manner, data might be combined incorrectly. That could lead to misunderstood
data (a measurement in meters taken as being in feet) and to inconsistent data
(data from one source applying to calendar months, data from another to
four-week or five-week fiscal months). These are aspects of low-quality data that
can be avoided, or at least reduced, by data integration.
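The meters-versus-feet case can be made concrete with a small sketch. The record layout, the unit codes "m" and "ft", and the function names to_meters and integrate are hypothetical; the only fixed fact used is that one foot is 0.3048 meters.

```python
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def to_meters(value, unit):
    """Convert a length to meters, rejecting units the process does not know."""
    if unit == "m":
        return value
    if unit == "ft":
        return value * METERS_PER_FOOT
    raise ValueError(f"unknown unit: {unit}")


def integrate(records):
    """Combine records from different sources into one consistent unit."""
    return [
        {"site": r["site"], "length_m": to_meters(r["length"], r["unit"])}
        for r in records
    ]


# Made-up records from two sources that report in different units.
records = [
    {"site": "A", "length": 10, "unit": "ft"},
    {"site": "B", "length": 10, "unit": "m"},
]
combined = integrate(records)
```

Without the conversion step, both sites would appear to have the same length. Raising an error on an unknown unit keeps misunderstood data from entering the warehouse silently.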
7.
8.
The large amount of valuable corporate data in a data warehouse can make
it an attractive target.
3.
2 Privacy added to the answer. The two concerns are different, but are often grouped, as they are in the
relevant portion of this text. Privacy is one reason to want security. Without security, one cannot ensure
privacy.
9.
1.
2.
for award. If it is likely to sell out, the seat isn't offered, even to this select group.
To make this decision, telephone agents (and the Yield Management staff, which
agents can consult) need up-to-the-minute, or at least up-to-the-hour, information.
3.
4.
Identify the major differences between the traditional data warehouse and a real-time data warehouse, as was implemented at Continental.
A traditional data warehouse moves data from operational databases to the DW on
a scheduled basis, typically daily or weekly. This provides consistent data for
analyses performed during one update cycle, but does not make current
information available for decisions that require it. A real-time DW, as was
implemented at CO, moves data into the DW on an hourly or even more frequent
basis.
5.
What strategic advantage can Continental derive from the real-time system as
opposed to a traditional information system?
By having real-time data available through its data warehouse, CO can make
decisions using up-to-date information. While data warehousing applications
which focus on long-term decisions aren't affected much by the last hour's, day's,
or even week's data, lower-level short-term decisions are. As the use of the DW is
extended to these decisions and down in the organization, current data become
necessary. By having real-time (or near-real-time) data in the system, CO obtains
a strategic advantage by making better decisions.