
1) ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology.

While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through a multidimensional data model, ROLAP differs significantly in that it does not require the precomputation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) that summarize the data at any desired combination of dimensions.

While ROLAP uses a relational database as its source, the database generally must be carefully designed for ROLAP use; a database designed for OLTP will not function well as a ROLAP database. ROLAP therefore still involves creating an additional copy of the data. However, since the copy is an ordinary relational database, a variety of technologies can be used to populate it.

Logically, OLAP servers present business users with multidimensional data from data warehouses or data marts, without concern for how or where the data are stored. The physical architecture and implementation of OLAP servers, however, must consider data storage issues. Relational OLAP (ROLAP) servers are intermediate servers that stand between a relational back-end server and client front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology. The DSS server of MicroStrategy, for example, adopts the ROLAP approach.
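
The summary tables mentioned above are ordinary relational tables built ahead of time from the detail data. A minimal SQL sketch of how such an aggregate might be materialized, assuming hypothetical sales_fact and date_dim tables (these names are illustrative, not from the text):

```sql
-- Build a summary (aggregate) table at the month/product level.
-- All table and column names here are illustrative.
CREATE TABLE sales_month_product_agg AS
SELECT d.year,
       d.month,
       f.product_key,
       SUM(f.sales_amount) AS total_sales,
       COUNT(*)            AS row_count
FROM   sales_fact f
JOIN   date_dim  d ON d.date_key = f.date_key
GROUP  BY d.year, d.month, f.product_key;
```

A ROLAP engine can then answer queries at the month/product level from this much smaller table instead of scanning the detail fact table.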

The ROLAP Model:

Figure 7.5: ROLAP Architecture

Advantages of ROLAP

ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions of very high cardinality (i.e., millions of members). With a variety of data-loading tools available, and the ability to fine-tune the ETL code to the particular data model, load times are generally much shorter than with automated MOLAP loads. The data is stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool). ROLAP tools are better at handling non-aggregatable facts (e.g., textual descriptions), whereas MOLAP tools tend to suffer from slow performance when querying these elements. By decoupling the data storage from the multidimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model. Finally, the ROLAP approach can leverage database authorization controls such as row-level security, whereby query results are filtered depending on preset criteria applied, for example, to a given user or group of users (a SQL WHERE clause), as sketched below.
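
A minimal sketch of that last point, assuming the hypothetical sales_fact table from above plus an illustrative region_access table mapping users to the regions they may see:

```sql
-- Row-level security via an ordinary view: each user sees only
-- the regions granted to them. All names here are illustrative.
CREATE VIEW sales_fact_secure AS
SELECT f.*
FROM   sales_fact f
JOIN   region_access a
       ON  a.region_key = f.region_key
       AND a.user_name  = CURRENT_USER;
```

Reporting tools query the view instead of the base table, so the WHERE-style filter is enforced by the database rather than by the OLAP tool.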

Disadvantages of ROLAP

There is a consensus in the industry that ROLAP tools have slower query performance than MOLAP tools (though see the discussion below about ROLAP performance). The loading of aggregate tables must be managed by custom ETL code; the ROLAP tools do not help with this task, which means additional development time and more code to support. When the step of creating aggregate tables is skipped, query performance suffers because the larger detail tables must be queried. This can be partially remedied by adding further aggregate tables, but it is still not practical to create aggregate tables for all combinations of dimensions and attributes. ROLAP relies on the general-purpose database for querying and caching, so several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of the latest improvements in the SQL language, such as the CUBE and ROLLUP operators, DB2 Cube Views, and other SQL OLAP extensions; an example follows. These SQL improvements can mitigate the benefits of the MOLAP tools.
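
To illustrate, ROLLUP and CUBE are standard SQL grouping extensions that compute several aggregation levels in one pass. A sketch against the same hypothetical sales_fact and date_dim tables used above:

```sql
-- ROLLUP produces subtotals along a hierarchy:
-- (year, month), (year), and a grand total.
SELECT d.year, d.month, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim d ON d.date_key = f.date_key
GROUP  BY ROLLUP (d.year, d.month);

-- CUBE produces subtotals for every combination of the
-- grouping columns, i.e. a small data cube in one query.
SELECT d.year, f.product_key, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim d ON d.date_key = f.date_key
GROUP  BY CUBE (d.year, f.product_key);
```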

Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations that do not translate well into SQL. Examples of such models include budgeting, allocations, financial reporting, and similar scenarios.

2) MOLAP: MOLAP stands for Multidimensional Online Analytical Processing. MOLAP is an alternative to the ROLAP (Relational OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through a multidimensional data model, MOLAP differs significantly in that (in some software) it requires the precomputation and storage of information in the cube, an operation known as processing. Most MOLAP solutions store this data in an optimized multidimensional array store rather than in a relational database (as ROLAP does). There are many methodologies and algorithms for efficient data storage, aggregation, and implementation-specific business logic within a MOLAP solution; as a result, there are many misconceptions about what the term specifically implies.

Multidimensional OLAP (MOLAP) servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data. Notice that with multidimensional data stores, storage utilization may be low if the data set is sparse; in such cases, sparse matrix compression techniques should be explored. Many MOLAP servers adopt a two-level storage representation to handle dense and sparse data sets: denser subcubes are identified and stored as array structures, whereas sparse subcubes employ compression technology for efficient storage utilization.

Figure 7.6: MOLAP Architecture

Advantages of MOLAP

MOLAP offers fast query performance due to optimized storage, multidimensional indexing, and caching. The on-disk size of the data is smaller than the same data stored in a relational database, thanks to compression techniques. Higher-level aggregates of the data are computed automatically. The storage is very compact for low-dimension data sets, array models provide natural indexing, and effective data extraction is achieved through the pre-structuring of aggregated data.

Disadvantages of MOLAP

Within some MOLAP solutions the processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data that have changed (usually new data) instead of reprocessing the entire data set; a sketch of the idea follows. MOLAP tools traditionally have difficulty querying models with dimensions of very high cardinality (i.e., millions of members). Some MOLAP products have difficulty updating and querying models with more than ten dimensions; this limit differs depending on the complexity and cardinality of the dimensions in question, and on the number of facts or measures stored. Other MOLAP products can handle hundreds of dimensions. Some MOLAP methodologies introduce data redundancy.
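
The incremental idea is the same whether the target is a MOLAP cube or a ROLAP aggregate table. A minimal SQL sketch of the ROLAP variant, assuming the illustrative tables from earlier plus a hypothetical load_date column and etl_watermark table:

```sql
-- Incremental processing: aggregate only rows newer than the last
-- successful load, instead of reprocessing the full data set.
-- In practice new rows may fall into existing groups, so a real
-- load would MERGE, or delete and reload the affected periods.
INSERT INTO sales_month_product_agg
       (year, month, product_key, total_sales, row_count)
SELECT d.year, d.month, f.product_key,
       SUM(f.sales_amount), COUNT(*)
FROM   sales_fact f
JOIN   date_dim d ON d.date_key = f.date_key
WHERE  f.load_date > (SELECT last_load_date FROM etl_watermark)
GROUP  BY d.year, d.month, f.product_key;

UPDATE etl_watermark SET last_load_date = CURRENT_DATE;
```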

3) Hybrid OLAP (HOLAP) servers: The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store. Microsoft SQL Server 2000 supports a hybrid OLAP server.

4) STAR SCHEMA: A star schema classifies the attributes of an event into facts (measured numeric/time data) and descriptive dimension attributes (product ID, customer name, sale date) that give the facts a context. A fact record is the nexus between the specific dimension values and the recorded facts. Facts are grouped together by grain (level of detail) and stored in the fact table; dimension attributes are organized into affinity groups and stored in a minimal number of dimension tables. For example, a weather star schema that records weather data may have facts of temperature, barometric pressure, wind speed, precipitation, cloud cover, and so on, and dimensions of location, date/time, reporter, and so on. Star schemas are designed to optimize user ease-of-use and retrieval performance by minimizing the number of tables that must be joined to materialize a transaction. The star schema is so named because it resembles a constellation of stars: generally several bright stars (facts) surrounded by dimmer ones (dimensions).
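
A minimal DDL sketch of the weather example above, with illustrative names and the surrogate integer keys discussed further below:

```sql
-- Dimension tables: small, descriptive, surrogate integer keys.
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    city         VARCHAR(50),
    state        VARCHAR(50),
    region       VARCHAR(50)
);

CREATE TABLE date_dim (
    date_key  INTEGER PRIMARY KEY,
    full_date DATE,
    month     INTEGER,
    year      INTEGER
);

-- Fact table: one row per observation, at the chosen grain.
CREATE TABLE weather_fact (
    weather_key  INTEGER PRIMARY KEY,  -- surrogate key for the fact row
    location_key INTEGER REFERENCES location_dim (location_key),
    date_key     INTEGER REFERENCES date_dim (date_key),
    temperature  DECIMAL(5,2),
    pressure     DECIMAL(6,2),
    wind_speed   DECIMAL(5,2)
);
```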

The fact table holds the metric values recorded for a specific event. Because of the desire to hold atomic-level data, there is generally a very large number of records (billions). Special care is taken to minimize the number and size of attributes in order to constrain the overall table size and maintain performance. Fact tables generally come in three flavors: transaction (facts about a specific event, e.g., a sale), snapshot (facts recorded at a point in time, e.g., account details at month end), and accumulating snapshot (e.g., month-to-date sales for a product). Dimension tables usually have few records compared to fact tables, but may have a very large number of attributes that describe the fact data.

Often there can be dozens to hundreds of dimension attributes describing the various facets of a fact. Dimension attributes are organized into tables of loosely related attributes that share a known or unknown affinity. Attributes of color, style, size, and texture can describe a product and would be included in a product dimension table. Dimension tables include attributes that would typically be normalized into separate tables (as in a snowflake schema). For example, in the US a location can be identified by a zip code that exists within a neighborhood, city, state, and region; all of these attributes would be included in a location dimension table. On an entity-relationship (ER) diagram, fact tables often appear small because of their few distinct columns, while dimension tables appear large because of their many columns. The diagram hides the reality that roughly 75% or more of the storage is used by the fact table.

Dimension tables are assigned a surrogate primary key: a simple integer assigned to the combination of low-level attributes that form the natural key. Fact tables should also have a single surrogate primary key to allow for situations where there may be two or more facts having the exact same set of dimension keys. Star schemas that have more than a dozen or so dimensions are called centipede schemas (Kimball, p. 393). Having dimensions of only a few attributes, while simpler to maintain, results in queries with 20, 30, or 40 table joins and defeats the ease-of-use and performance goals of the star schema.

Benefits

The primary benefit of a star schema is its simplicity for users to write queries against and for databases to process: queries are written with simple inner joins between the fact table and a small number of dimensions, as in the sketch below. Star joins are simpler than those possible in a snowflake schema: WHERE conditions need only filter on the desired attributes, and aggregations are fast. The star schema is a way to implement multidimensional database (MDDB) functionality using a mainstream relational database: given most organizations' commitment to relational databases, a specialized multidimensional DBMS is likely to be both expensive and inconvenient.
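
A sketch of such a star join against the hypothetical weather schema defined above:

```sql
-- Typical star join: fact table inner-joined to two dimensions,
-- filtered on dimension attributes, aggregated on a measure.
SELECT l.region,
       d.year,
       AVG(f.temperature) AS avg_temperature
FROM   weather_fact f
JOIN   location_dim l ON l.location_key = f.location_key
JOIN   date_dim     d ON d.date_key     = f.date_key
WHERE  l.state = 'Colorado'
GROUP  BY l.region, d.year;
```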

5) DATA CUBE: Users of decision support systems often see data in the form of data cubes. The cube is used to represent data along some measure of interest. Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional. Each dimension represents some attribute in the database, and the cells in the data cube represent the measure of interest; for example, they could contain a count of the number of times that attribute combination occurs in the database, or the minimum, maximum, sum, or average value of some attribute. Queries are performed on the cube to retrieve decision support information.

Example: We have a database that contains transaction information relating company sales of a part to a customer at a store location. The data cube formed from this database is a 3-dimensional representation, with each cell (p, c, s) of the cube representing a combination of values from part, customer, and store location. A sample data cube for this combination is shown in Figure 1. The content of each cell is the count of the number of times that specific combination of values occurs together in the database. Cells that appear blank in fact have a value of zero. The cube can then be used to retrieve information from the database about, for example, which store should be given a certain part to sell in order to make the greatest sales.
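
Such a cube is easy to derive in a ROLAP setting. A sketch assuming a hypothetical sales table with part, customer, and store columns:

```sql
-- Base cells of the cube: one row per (part, customer, store)
-- combination, holding the co-occurrence count. Combinations
-- that never appear are the "blank" (zero) cells.
SELECT part, customer, store, COUNT(*) AS occurrences
FROM   sales
GROUP  BY part, customer, store;

-- GROUP BY CUBE additionally produces the summary cells
-- (the "ANY" totals discussed under rollup below).
SELECT part, customer, store, COUNT(*) AS occurrences
FROM   sales
GROUP  BY CUBE (part, customer, store);
```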

Figure 1(a): Front View of Sample Data Cube

Figure 1(b): Entire View of Sample Data Cube

Operations on Data Cubes

Summarization or Rollup

Rollup or summarization of the data cube can be done by traversing upwards through a concept hierarchy. A concept hierarchy maps a set of low-level concepts to higher-level, more general concepts, and can be used to summarize information in the data cube. As the values are combined, cardinalities shrink and the cube gets smaller. Generalizing can be thought of as computing some of the summary total cells that contain ANYs, and storing those in favour of the original cells.

Drill-down

Drill-down is the reverse of rollup: it goes from less detailed data to more detailed data. To drill down, we can either traverse down a concept hierarchy or add another dimension to the data cube. For example, given the data shown in Figure 8, a drill-down on the Province attribute would result in more detailed information about the location: the value Prairies would be replaced by the more detailed values Alberta, Saskatchewan, and Manitoba. The result is the data cube shown in Figure 7, before summarization. This is a reversal of the summarization process.
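
In SQL terms, rolling up along a concept hierarchy is simply grouping by a coarser column. A sketch assuming a hypothetical location_dim that carries both province and region levels of the hierarchy:

```sql
-- Detail level: counts per province.
SELECT l.province, COUNT(*) AS n
FROM   sales_fact f
JOIN   location_dim l ON l.location_key = f.location_key
GROUP  BY l.province;

-- Rollup: re-grouping by the higher-level region column collapses
-- Alberta, Saskatchewan, and Manitoba into Prairies. Drill-down is
-- the reverse: grouping by the finer province column again.
SELECT l.region, COUNT(*) AS n
FROM   sales_fact f
JOIN   location_dim l ON l.location_key = f.location_key
GROUP  BY l.region;
```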
