Questions on OLAP Services

Q. Do I need to use SQL Server in order to build cubes?
A. No. You can build local cubes from Microsoft Excel, for
example, or by using MSQuery.
Q. Must I use DTS in order to build cubes?
A. No. In a data mart solution, however, you will need to use some
sort of ETL utility. DTS provides a set of graphical tools and
programmable objects that let you extract, transform, and
consolidate data from disparate sources into single or multiple
destinations. However, if you already have an ETL utility that you
use, you can continue to use it instead.
Q. Where do cubes live: on a server or on the client?
A. Cubes are deployed on a server. It does not have to be a
dedicated server, but a dedicated server may be advisable if you
will have very large cubes or many users. However, users can take
a subset of the cube and download it to a laptop so they can do
ad-hoc analysis without having to be connected to the network.
Q. Can I use a cube in Excel? Can I access cubes over the
Web?
A. Yes to both questions. In fact, a user could connect to a cube
over the Web and still be in Excel. There are many solutions for
delivering cubes and OLAP data over the Web. Among these,
Microsoft ActiveX controls, Active Server Pages (ASP)
scripting, and ActiveX Data Objects (ADO) Application
Programming Interfaces (APIs) provide a variety of solutions for
querying OLAP data over the Web. Additionally, Data Analyzer
can access cube data by using the Hypertext Transfer Protocol
(HTTP).

Q. What kind of licensing do I need for end users to access
cubes?
A. Cubes built with Analysis Services are licensed the same way
(and using the same license) as SQL Server. For more information
on licensing options, see the SQL Server 2000 Web site or refer to
the End User License Agreement (EULA) that comes with SQL
Server 2000.
Q. How are cubes secured?
A. Analysis Services has a robust security model that allows
setting permissions for groups on dimensions all the way down to
the cell level. In addition, you can limit the administrators that are
permitted to access Analysis Services data through Analysis
Manager and perform administrative functions. You can also
restrict end users who access data on the Analysis server through
client applications as well as specify which end users can access
data and the types of operations they can perform. Administrator
security is controlled using the Microsoft Windows NT 4.0 or
Microsoft Windows 2000 group named OLAP Administrators.
End-user security is controlled by a combination of:

Authentication during connection to the Analysis server.

Database, cube, and mining model roles defined in Analysis
Manager.

Each role defines a set of users and the access they all share. A role
is defined at the Analysis Services database level and then assigned
to cubes that the users in the role are permitted to access. For more
information, see the article Creating Security Roles.
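
The role idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the Analysis Services security API; the role names, user names, and the can_access function are all made up for the example.

```python
# Hypothetical roles: each role names a set of users and the cubes
# that the role has been assigned to.
roles = {
    "SalesAnalysts": {"users": {"alice", "bob"}, "cubes": {"Sales"}},
    "Finance": {"users": {"carol"}, "cubes": {"Sales", "Budget"}},
}

def can_access(user, cube):
    """A user may read a cube if any role contains both the user and the cube."""
    return any(
        user in role["users"] and cube in role["cubes"]
        for role in roles.values()
    )

print(can_access("alice", "Sales"))   # alice is in a role assigned to Sales
print(can_access("alice", "Budget"))  # no role gives alice the Budget cube
```

The point of the design is that access is granted to the role, not to each user individually, so adding a user to a role gives that user everything the role already shares.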
Analysis Services supports the Windows integrated security system.
If you want to deliver cubes on the Web, Analysis Services also
supports HTTP or secure HTTP (HTTPS) authentication in
conjunction with Microsoft Internet Information Services (IIS) to
establish connections to an Analysis server. For more information,
see the article How to configure Analysis Services for the Web.
Q. Some competitors say that Microsoft cubes don't scale. Is
that true?
A. Analysis Services provides the following options to improve
scalability:

Customized Aggregation Options: By using the Storage
Design Wizard, you can optimize the tradeoff between system
performance and the disk space allocated to store aggregations.
Analysis Services uses a sophisticated algorithm to determine
the optimum set of aggregations from which other aggregations
can be derived. This allows you to optimize the efficiency of
your queries while maintaining reasonable limits on the size of
your databases.

Usage-Based Optimization: You can tune the performance
of a cube to provide quick response to the queries most often
executed by using the Usage-Based Optimization Wizard to
design aggregations appropriate to those queries while
maintaining reasonable storage requirements.

Data Compression and Storage Optimization: In
multidimensional OLAP (MOLAP) and hybrid OLAP
(HOLAP) storage modes, Analysis Services stores all or some
of the cube information in multidimensional structures. In these
structures, storage is not used for empty cells, and a data
compression algorithm is applied to data that is stored. This can
greatly speed up access to your data and reduce the size of your
database.

Distributed Calculation: Microsoft PivotTable Service
incorporates functionality from the server so that calculations
can often be performed on the client instead of the server.
Because this distributes the computational load between the
server and the client, it increases the capacity of the server,
reduces network traffic, and improves performance for the
clients.
These and other scalability options are discussed in detail in the
SQL Server Books Online, which can be installed with the product.
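
The first option above relies on the fact that some aggregations can be derived from others instead of being stored. The following Python sketch shows that idea only; it is not the algorithm Analysis Services uses, and the sample figures and function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stored aggregation: sales totals by (year, month).
sales_by_month = {
    (2000, 1): 10.0, (2000, 2): 20.0, (2000, 3): 30.0,
    (2000, 4): 40.0, (2000, 5): 50.0, (2000, 6): 60.0,
}

def quarter_of(month):
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1

def derive_quarter_totals(by_month):
    """Build the quarter-level totals from the month-level totals,
    without going back to the detailed fact rows."""
    by_quarter = defaultdict(float)
    for (year, month), amount in by_month.items():
        by_quarter[(year, quarter_of(month))] += amount
    return dict(by_quarter)

sales_by_quarter = derive_quarter_totals(sales_by_month)
print(sales_by_quarter)
```

Because the quarter totals can be computed cheaply from the month totals, only the month-level aggregation needs to take up disk space.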
Q. What is the difference between standard and enterprise
editions?
A. There are two important differences. First, the enterprise
edition allows users to access the cube over HTTP. Second, the
enterprise edition allows you to break a large cube into smaller
pieces (called partitions) that may be distributed over multiple
servers. Partitioning can yield very large performance gains.
The OLAP Process
Q. What are cubes and how will they benefit me?
A. A cube is a specialized database that is optimized to combine,
process, and summarize large amounts of data in order to provide
answers to questions about that data in the shortest amount of time.
This allows users to analyze, compare, and report on data in order
to spot business trends, opportunities, and problems. A cube uses
pre-aggregated data instead of aggregating the data at the time the
user submits a query.
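
As a rough illustration of what "pre-aggregated" means, here is a minimal Python sketch. It does not reflect how Analysis Services stores data internally; the sample rows and the names fact_rows, build_aggregates, and query are invented for the example.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, quarter, sales_amount)
fact_rows = [
    ("East", "Q1", 100.0),
    ("East", "Q2", 150.0),
    ("West", "Q1", 200.0),
    ("West", "Q2", 50.0),
]

def build_aggregates(rows):
    """Summarize the detail rows once, ahead of any user query."""
    totals = defaultdict(float)
    for region, quarter, amount in rows:
        totals[(region, quarter)] += amount  # detail cell
        totals[(region, "All")] += amount    # rolled up over all quarters
        totals[("All", quarter)] += amount   # rolled up over all regions
        totals[("All", "All")] += amount     # grand total
    return dict(totals)

aggregates = build_aggregates(fact_rows)

def query(region="All", quarter="All"):
    """Answer a question with a single lookup instead of rescanning the rows."""
    return aggregates.get((region, quarter), 0.0)

print(query("East"))        # total sales for the East region
print(query(quarter="Q1"))  # total sales for the first quarter
```

The work of summing is done once in build_aggregates; every later query is a lookup, which is why answers come back quickly even when the detail data is large.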
Q. What are the general steps involved in creating an OLAP
database and building a cube?
A. Typically, the process of building a cube can be broken down
into these general steps:
1. Data is transformed and loaded into a data warehouse or
a series of data marts. The operational data of the business is
copied from the original data sources into a data warehouse or
data marts. During this process, the data is "cleansed" to remove
erroneous data and formatted to be consistent. This process is
usually accomplished by an ETL utility such as DTS in
SQL Server. The data warehouse typically consists of one
or more fact tables joined to a number of dimension tables in a
star schema. The fact tables contain the numerical data
(that is, measures) and the dimension tables contain categories
by which the measures can be separated for analysis, such as
customer information, product information, or time periods. The
data in the dimension tables is sometimes further subdivided
into additional tables that are joined to other dimension tables,
resulting in a snowflake schema.
2. Hierarchies and levels can be defined for the dimensions.
Hierarchies typically display the same data in different
formats; for example, time data can appear as months or quarters.
Levels typically allow the data to be "rolled up" into increasingly
less detailed information, such as in a Region dimension where
cities roll up into states, which roll up into regions, which roll
up into countries, and so forth. This allows the user to "drill up"
or "drill down" to see the data in the desired detail. Levels and
hierarchies for a star schema are derived from the columns in a
dimension table. In a snowflake schema, they are typically
derived from the data in related tables.
3. The cube is created. Once the data has been loaded into a
data warehouse or series of data marts, the cube can be built. A
cube is essentially a graphical representation of the data defined
in the data warehouse. As such, it is also defined as a set of
dimensions and measures. In addition, because a cube is used
for data analysis and decision support, the data in the cube can
be further aggregated to provide a more summarized view of the
data than is available from the data warehouse.
4. The storage mode for the cube is selected. Physical
storage options affect the performance, storage requirements,
and storage locations of the data used by the cube. The three
options available are MOLAP, ROLAP, and HOLAP. For
further information on these three storage mode options, see
the question "What is the difference between MOLAP, ROLAP,
and HOLAP?"
5. The cube is processed. When you process a cube, the
aggregations designed for the cube are calculated and the cube
is loaded with the calculated aggregations and data. Processing
a cube involves reading the dimension tables to populate the
levels with members from the actual data, reading the fact table,
calculating specified aggregations, and storing the results in the
cube. After a cube has been processed, users can query it.
6. The cube is now ready to be used by users. Users can
view the cube data by using the Cube Browser in the Analysis
Manager, by using Microsoft Excel, or by using other specialty
applications such as the Microsoft Data Analyzer. Cube
Browser allows you to quickly browse multidimensional data in
a flattened, two-dimensional grid format. The Data Analyzer
provides a complete overview of your data on one screen so that
you can quickly find hidden problems, opportunities, and
trends.
Of course, depending on the complexities and structure of your
data and the types of analysis your users will be doing, other, more
complex steps may be necessary to complete the process.
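
The steps above can be sketched end to end with Python's built-in sqlite3 module. This is a toy model under stated assumptions, not Analysis Services itself: the table names (dim_time, dim_product, fact_sales), the sample rows, and the process_cube function are all hypothetical, and "processing" is reduced to one GROUP BY query.

```python
import sqlite3

# Step 1: a tiny star schema, with one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT, month TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE fact_sales (time_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_time VALUES (1, 'Q1', 'Jan'), (2, 'Q1', 'Feb'), (3, 'Q2', 'Apr');
INSERT INTO dim_product VALUES (1, 'Bikes', 'Road bike'), (2, 'Helmets', 'Sport helmet');
INSERT INTO fact_sales VALUES (1, 1, 500.0), (2, 1, 700.0), (2, 2, 40.0), (3, 2, 60.0);
""")

def process_cube(connection):
    """Steps 2-5 in miniature: read the dimension and fact tables, roll the
    amounts up to the Quarter and Category levels, and keep the results."""
    rows = connection.execute("""
        SELECT t.quarter, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time AS t ON t.time_id = f.time_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY t.quarter, p.category
    """).fetchall()
    return {(quarter, category): total for quarter, category, total in rows}

# Step 6: users query the processed results rather than the raw tables.
cube = process_cube(conn)
print(cube[("Q1", "Bikes")])
```

Here the quarter and category columns play the role of levels, and the amount column is the measure. A real cube would hold many such aggregations, not just one.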
Q. Which is better: a star schema or a snowflake schema?
A. It depends on your situation. In most cases, a star schema will
give you better performance and is easier to maintain. Why?
Because a star schema usually has fewer tables containing fewer
links than a snowflake schema. That means that your cube has
fewer tables to navigate in order to populate each dimension.
Fewer tables equate to fewer links, which should result in less
database maintenance.
A star schema consists of a fact table linked to one or more
dimension tables. This linking in a multidimensional database is
similar to linked tables in a relational database. However, the
biggest difference between the two types of databases is the
emphasis on the fact table in a star schema. The fact table
contains a row for each transaction that will be analyzed in your
cube. So for a Sales cube, the fact table will contain a row for
each sales transaction. Thus
the level of detail that you see in your data when you view a cube
is determined by the granularity of the data in the fact table. Also
in a star schema, the levels in a dimension are usually derived from
the columns in the dimension table. For example, a Time
dimension table might have a column of data for each quarter,
week, and day of a year. These columns would translate to the
Quarter, Week, and Day levels in the Time dimension in a cube.
A snowflake schema also has a fact table. However, the
dimensions are spread across two or more related tables. Referring
back to our example with the Time dimension, a snowflake might
contain a table with quarterly data linked to a table of weekly data,
and so forth. In this case, the levels in the dimension would be
derived from the data in the different time-related tables. One
reason why you might want to use a snowflake schema is if the
volume of data makes storing it in one table too unwieldy.
However, for most cases, you'll want to stay with a star schema.
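
The difference between the two layouts of the Time dimension can be shown with plain Python dictionaries standing in for tables. The keys, sample dates, and function names are assumptions made for this sketch; they do not come from any real schema.

```python
# Star: one Time dimension table, where each level is simply a column.
star_time = {
    1: {"quarter": "Q1", "week": "W01", "day": "2000-01-03"},
    2: {"quarter": "Q1", "week": "W02", "day": "2000-01-10"},
}

# Snowflake: each level lives in its own table and points to its parent.
snow_day = {
    1: {"day": "2000-01-03", "week_id": 10},
    2: {"day": "2000-01-10", "week_id": 11},
}
snow_week = {
    10: {"week": "W01", "quarter_id": 100},
    11: {"week": "W02", "quarter_id": 100},
}
snow_quarter = {100: {"quarter": "Q1"}}

def star_levels(time_id):
    """One lookup returns every level of the dimension."""
    row = star_time[time_id]
    return row["quarter"], row["week"], row["day"]

def snowflake_levels(day_id):
    """Each level requires following a link to another table."""
    day = snow_day[day_id]                      # lookup 1
    week = snow_week[day["week_id"]]            # lookup 2
    quarter = snow_quarter[week["quarter_id"]]  # lookup 3
    return quarter["quarter"], week["week"], day["day"]

print(star_levels(1))
print(snowflake_levels(1))
```

Both functions return the same levels, but the snowflake version has to navigate three tables to do it, which is the extra work the answer above refers to.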
Q. What is the difference between a data mart and a data
warehouse?
A. Data marts are specialized databases designed to handle the
reporting needs of a single department or single line-of-business
application. A data warehouse is typically several data marts
"rolled-up" into one giant database so reporting can be done
enterprise-wide instead of on a departmental level. Warehouses are
often expensive and time-consuming to build while marts can be
built quickly and inexpensively. Typically, an organization will
build a mart for its most important department first and later (if
ever) build the warehouse.
Q. Do I need to have a data mart in order to build cubes?
A. No. However, Microsoft cube software is optimized for
building on data marts or data warehouses and, specifically,
multidimensional databases with a star schema configuration.
Without pulling the data from a multidimensional data mart or data
warehouse, it can be more difficult to build cubes. In most
cases, the best long-term solution will be found by using a data
mart or data warehouse consisting of an OLAP database
configured as a star schema.
Q. A data mart (or star schema) seems redundant and a waste
of hard disk space. Why can't I just use my operational data
without also storing it in a data mart?
A. The fact is that the best reporting solutions require some amount
of redundant data. In addition, OLTP systems have inherent
problems that severely limit their effectiveness for business
intelligence:

OLTP data can be very inconsistent. For example, customer
name fields may be formatted as last name, first name & middle
initial in one table, as first name, middle initial & last name in
another table, or contained all in one field in another table.
Cleansing the data prior to loading it into a data warehouse can
remove many of these inconsistencies.

OLTP data typically changes frequently. For example, the
number of available units of a particular product can change
very rapidly in the course of an hour. An analysis of the number
of units sold could vary greatly from one analysis to the next.
Refreshing the data in a data warehouse can be scheduled so
that the data used for analysis is relatively constant.

The data might be located in multiple data sources. Data
warehouses provide a way to consolidate data from various
sources into a single data source.

Schemas for OLTP databases are usually optimized for
entering groups of records (also known as transactions) and,
therefore, tend to contain large numbers of individual records.
Summarizing large numbers of records can take a long time. In
contrast, data warehouses contain more summary data, which
tends to give better performance for reporting.

Servers hosting OLTP databases are usually busy with
transactional processes. Summarizing large groups of records
can rapidly tax a server hosting OLTP databases, resulting in
poor reporting performance or poor transactional processing.
Data warehouses are optimized for reporting.

If you want to reduce or eliminate redundancy, then your reporting
efforts will have to be based on your source OLTP systems. Not
only will that increase complexity and decrease performance, it
will reduce the performance of the OLTP systems used to run your
business day-to-day. Spending money on additional hard disk
systems and some time to create a data mart (or data warehouse) is
well worth the benefit that a cube provides.
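
To make the cleansing point concrete, here is a small Python sketch that brings the two customer-name layouts mentioned above into one consistent form. It assumes only those two layouts ("Last, First M." and "First M. Last") and would need more rules for real data; the function name normalize_name is invented for the example.

```python
def normalize_name(raw):
    """Return (first, middle_initial, last) from either assumed layout."""
    raw = " ".join(raw.split())  # collapse stray whitespace
    if "," in raw:
        # Layout 1: "Last, First M."
        last, rest = [part.strip() for part in raw.split(",", 1)]
        parts = rest.split()
    else:
        # Layout 2: "First M. Last"
        parts = raw.split()
        last = parts.pop() if parts else ""
    first = parts[0] if parts else ""
    middle = parts[1].rstrip(".") if len(parts) > 1 else ""
    return first, middle, last

print(normalize_name("Smith, John Q."))
print(normalize_name("John Q. Smith"))
```

An ETL step like this runs once while loading the data mart, so every cube and report built on the mart sees the same customer name.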
Q. Can Analysis Services accept other types of schemas in a
data mart?
A. Analysis Services will allow variations on the star schema such
as snowflake schemas and parent-child dimensions. Normally,
these variations should only be used to accommodate unique
relationships in your data and not as a first choice for data mart
design.
Q. How long does it take, on average, to build a cube?
A. Building a cube is a relatively simple process that could take
only a couple of hours, maybe even minutes. Usually, the hardest
part is building the data mart and populating it with data. By using
the tools and wizards in the Analysis Manager, building cubes is
easy and straightforward.
Q. It seems to me that cubes could be a real challenge to
manage on the hard drive. I've heard of "cube explosion"
where they get real big in a hurry. How am I going to manage
them?
A. Analysis Services has a utility called the Usage Analysis
Wizard that helps you manage hard disk space when you build a cube.
Remember that cubes are pre-aggregated data. Usually, it turns out
that you don't need to build 100% of all possible aggregates when
processing a cube because it is unlikely that your users will need
all aggregates. The Usage Analysis Wizard can help you identify
the aggregates you need and help you remove the ones you don't,
which helps you make the most of your storage space. Analysis
Services also employs other solutions such as data compression to
help reduce "cube explosion".
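
A quick calculation shows why the number of possible aggregates grows so fast. The sketch below assumes that an aggregate is defined by picking one level from each dimension, where each dimension also has an implicit "All" level; the dimension names and level counts are made up for the example.

```python
from math import prod

# Hypothetical cube: number of levels in each dimension, not counting "All".
levels_per_dimension = {"Time": 3, "Product": 2, "Customer": 4}

def possible_aggregations(levels):
    """Each aggregate picks one level per dimension, or "All" for that
    dimension, so the choices multiply: (levels + 1) for every dimension."""
    return prod(count + 1 for count in levels.values())

print(possible_aggregations(levels_per_dimension))  # (3+1) * (2+1) * (4+1)
```

Adding one more dimension with four levels would multiply the count by five, which is why building only the aggregates that users actually need saves so much space.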
Q. What is the difference between MOLAP, ROLAP, and
HOLAP?
A. Remember that a cube consists of pre-calculated summaries
(aggregates) of your data mart data. MOLAP, ROLAP, and HOLAP are
different methods of storing these aggregates on disk. Here is a
summary of their major features:

MOLAP (Multidimensional OLAP): Stores the aggregates
and a copy of the base-level data from your data mart in a number
of proprietary files. While this requires a bit more hard disk
space, it gives the fastest cube access and it greatly reduces
the strain on your data mart.

ROLAP (Relational OLAP): Stores the aggregates in your
data mart as tables alongside your base-level data. While this
minimizes the hard disk space needed, it is the slowest type of
cube access.

HOLAP (Hybrid OLAP): A compromise between MOLAP
and ROLAP. The aggregates are stored in a MOLAP file, while
the base-level detail is kept in the data mart. This provides
excellent performance while browsing aggregates, but is slow
when a user "drills down" to base-level detail.

MOLAP is usually the best choice.
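
The three storage modes can be summarized as a small lookup table in Python. This is a simplification of the descriptions above, assuming a query is answered either entirely from aggregates or entirely from base-level detail; the names STORAGE_MODES and reads_data_mart are invented for the sketch.

```python
# Where each storage mode keeps its data, per the descriptions above.
STORAGE_MODES = {
    "MOLAP": {"aggregates": "cube files", "detail": "cube files"},
    "ROLAP": {"aggregates": "data mart", "detail": "data mart"},
    "HOLAP": {"aggregates": "cube files", "detail": "data mart"},
}

def reads_data_mart(mode, needs_detail):
    """Return True if answering the query has to go back to the data mart."""
    kind = "detail" if needs_detail else "aggregates"
    return STORAGE_MODES[mode][kind] == "data mart"

for mode in STORAGE_MODES:
    print(mode, reads_data_mart(mode, False), reads_data_mart(mode, True))
```

Queries that never touch the data mart are the fast ones, which is why MOLAP is fastest overall and HOLAP is fast only until a user drills down to detail.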