
Online analytical processing and Multidimensional data analysis

OLAP is an application architecture, not intrinsically a data warehouse or a DBMS. Whether or not it uses a data warehouse or DBMS, OLAP is an architecture that an increasing number of enterprises are implementing to support analytical applications. The majority of OLAP applications use specialized multidimensional DBMS (MDDBMS) technology. An OLAP system allows users to interactively query and aggregate the data contained in a data warehouse.

4 Main Characteristics:
1. Multidimensional data analysis
2. Advanced database support
3. Easy-to-use end-user interfaces
4. Support for client/server architecture

Need for OLAP: Solving modern business problems such as market analysis and financial forecasting requires query-centric database schemas that are array-oriented and multidimensional in nature. These business problems are characterized by the need to retrieve large numbers of records from very large data sets. The multidimensional nature of the problems it is designed to address is the key driver for OLAP.

Although all the necessary data can be represented in a relational DB and accessed via SQL, the two-dimensional relational model of data and SQL have some serious limitations for such complex real-world problems. A query may translate into a number of complex SQL statements, each of which may require full table scans, multiple joins, aggregations, etc. The resulting query may require significant computing resources that may not be available at all times, and even then may take a long time to complete. Another drawback of SQL is its weakness in handling time series data and complex mathematical functions.

OLAP Architecture: 3 Main Modules
1. GUI
2. Analytical processing logic
3. Data-processing logic

OLAP Operations:

OLAP consists of 3 basic operations.
1. Consolidation: Aggregation of data that can be accumulated and computed in one or more dimensions.
2. Drill down: Allows users to navigate through the details.
3. Slicing and dicing: A feature whereby users can take out a specific set of data from the cube and view the slices from different viewpoints.

Multidimensional Data Model

The multidimensional nature of business is reflected in the fact that marketing managers are interested in questions such as: How much revenue did the product generate by month, in the north-eastern division, broken down by user demographic, by sales office, relative to the previous version of the product, compared with the plan? This is a six-dimensional question. One way to look at the multidimensional data model is to view it as a hypercube.

The multidimensional model views data as consisting of facts and dimensions. A fact represents the focus of analysis (ex: analysis of sales in a store) and typically includes attributes called measures.

Dimensions: the various perspectives used to analyze data.
Members: instances of dimensions (ex: Market: Boston, Denver, ..).
Measures: numeric values that allow quantitative evaluation of various aspects of the organization to be performed.

Each dimension has associated attributes (ex: product id, name, price, ..).

Product   Market   Time   Units
Camera    Boston   Q1     1200
Camera    Boston   Q2     1500
..        ..       ..     ..
Tuner     Denver   Q1     250
Tuner     Denver   Q2     300

Fig: Relational tables and multidimensional cubes (cube axes: M: market, P: product, time: Q1 to Q4; cube cells shown: 1200, 1500, 1800, 2100)
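The relationship between the relational rows and the cube, and the three basic operations, can be illustrated with a minimal Python sketch. It assumes the four sales rows from the figure; the names `cube`, `consolidate` and `slice_cube` are hypothetical helpers introduced only for this illustration and do not belong to any OLAP product.

```python
from collections import defaultdict

# Relational rows (product, market, quarter, units), taken from the figure.
rows = [
    ("Camera", "Boston", "Q1", 1200),
    ("Camera", "Boston", "Q2", 1500),
    ("Tuner", "Denver", "Q1", 250),
    ("Tuner", "Denver", "Q2", 300),
]

DIMENSIONS = ("product", "market", "time")

# The cube: one cell per (product, market, time) coordinate.
# Empty cells are simply not stored (illustrative choice).
cube = {(p, m, t): units for p, m, t, units in rows}


def consolidate(cube, keep):
    """Consolidation: sum the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, units in cube.items():
        totals[tuple(coords[i] for i in idx)] += units
    return dict(totals)


def slice_cube(cube, **fixed):
    """Slicing: keep only the cells whose coordinates match the fixed members."""
    return {
        coords: units
        for coords, units in cube.items()
        if all(coords[DIMENSIONS.index(d)] == member for d, member in fixed.items())
    }


by_product = consolidate(cube, keep=("product",))               # consolidation
by_product_time = consolidate(cube, keep=("product", "time"))   # drill down into time
q1_slice = slice_cube(cube, time="Q1")                          # slice on one quarter
```

Drilling down is expressed here as consolidating on a finer set of dimensions: going from `("product",)` to `("product", "time")` exposes the quarterly detail behind each product total.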

The table on the left contains detailed sales data by product, market and time. The cube on the right associates sales numbers (amount sold) with the dimensions product type, market and time, with the unit variable organized as cells in an array. The response time of a multidimensional query still depends on how many cells have to be added on the fly. The caveat here is that as the number of dimensions increases, the number of cube cells increases exponentially. The majority of multidimensional queries deal with summarized, high-level data. The solution for building an efficient multidimensional DB is therefore to consolidate (preaggregate) all logical subtotals and totals along all dimensions. This preaggregation is especially valuable since typical dimensions are hierarchical in nature. For example, the time dimension may contain hierarchies of years, quarters, months, weeks and days.

Another way to reduce the size of the cube is to handle sparse data properly. Not all cells have meaning across all dimensions (many marketing DBs may have more than 95% of cells empty). Another kind of sparse data is created when many cells contain duplicate data.

Multi-dimensional Vs. Multi Relational OLAP:

The relational implementations of multidimensional DB systems are sometimes referred to as multirelational DB systems. Such multidimensional models use two schemas:

Star schema: Uses a unique table for each dimension. Tables are denormalized.

Snowflake schema: Extension of the star schema that supports multiple fact tables and joins between them. Tables are normalized.
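A star schema can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table and column names (`fact_sales`, `dim_product`, `dim_market`, `dim_time`) are assumptions made for the example, and the data reuses the camera and tuner sales figures from the earlier figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One central fact table plus one table per dimension (hypothetical names).
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_market  (market_id  INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    market_id  INTEGER REFERENCES dim_market(market_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units      INTEGER
);
INSERT INTO dim_product VALUES (1, 'Camera'), (2, 'Tuner');
INSERT INTO dim_market  VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO dim_time    VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO fact_sales  VALUES (1, 1, 1, 1200), (1, 1, 2, 1500),
                               (2, 2, 1, 250),  (2, 2, 2, 300);
""")

# A multidimensional question becomes a join from the fact table to the
# dimension tables it needs, followed by an aggregation.
units_by_product = conn.execute("""
    SELECT p.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

Each dimension is reached from the fact table through a single join, which is what gives the schema its star shape.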

Guidelines for OLAP


1. Multidimensional conceptual view - Query will be multidimensional
2. Transparency - System complexity should be hidden from user
3. Accessibility - Access only required data - Retrieve heterogeneous data for analysis
4. Consistent reporting performance
5. Client server architecture
6. Multi user support
7. Generic dimensionality
8. Dynamic sparse matrix handling
9. Unrestricted cross dimensional operations
10. Intuitive data manipulation

Categorization of OLAP Tools


OLAP tools are based on the concept of multidimensional DBs and allow a sophisticated user to analyze data using elaborate, multidimensional, complex views.

MOLAP
MOLAP (Multidimensional OLAP) analytic tools are designed to allow analysis of data through the use of a multidimensional data model. MOLAP differs significantly from ROLAP in that (in some software) it requires the pre-computation and storage of information in the cube, an operation known as processing. Most MOLAP solutions store these data in optimized multidimensional array storage rather than in a relational database.

Architecture:

Fig: MOLAP architecture (components: front-end tools, MOLAP server with metadata and request processing, RDBMS on the database server; labelled flows: info request, SQL, result set)

Advantages of MOLAP

Fast query performance due to optimized storage, multidimensional indexing and caching.

Smaller on-disk size of data compared to data stored in a relational database, due to compression techniques.

Automated computation of higher-level aggregates of the data.

It is very compact for low-dimension data sets.

Array models provide natural indexing.

Effective data extraction achieved through the pre-structuring of aggregated data.

Disadvantages of MOLAP

Within some MOLAP solutions the processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data which have changed (usually new data) instead of reprocessing the entire data set.

MOLAP tools traditionally have difficulty querying models with dimensions of very high cardinality (i.e., millions of members).

Some MOLAP products have difficulty updating and querying models with more than ten dimensions. This limit differs depending on the complexity and cardinality of the dimensions in question. It also depends on the number of facts or measures stored. Other MOLAP products can handle hundreds of dimensions.

Some MOLAP methodologies introduce data redundancy.
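The processing step and its incremental variant can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular MOLAP product stores data: the `ALL` marker and the functions `process` and `process_increment` are hypothetical names, and incremental processing here assumes that only new facts arrive (no corrections to existing cells).

```python
from itertools import product as cartesian

ALL = "*"  # hypothetical marker meaning "aggregated over this dimension"


def _fold(aggregates, cells):
    """Add each base cell to every aggregate cell that covers it."""
    for coords, value in cells.items():
        # For each dimension, either keep the member or replace it with ALL,
        # so one base cell contributes to 2**n aggregate cells.
        for key in cartesian(*[(member, ALL) for member in coords]):
            aggregates[key] = aggregates.get(key, 0) + value
    return aggregates


def process(cells):
    """Full processing: precompute every subtotal and total from scratch."""
    return _fold({}, cells)


def process_increment(aggregates, new_cells):
    """Incremental processing: fold only the new facts into existing aggregates."""
    return _fold(aggregates, new_cells)


cells = {
    ("Camera", "Boston", "Q1"): 1200,
    ("Camera", "Boston", "Q2"): 1500,
    ("Tuner", "Denver", "Q1"): 250,
    ("Tuner", "Denver", "Q2"): 300,
}

aggregates = process(cells)
camera_total = aggregates[("Camera", ALL, ALL)]   # answered by lookup, no scan
grand_total = aggregates[(ALL, ALL, ALL)]

process_increment(aggregates, {("Tuner", "Denver", "Q3"): 100})
```

After processing, any subtotal is a single dictionary lookup, which mirrors the fast query performance listed above. The price is that the aggregate store holds more entries than the base data, one simple form of the data redundancy mentioned among the disadvantages.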

ROLAP
ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the pre-computation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions.

Architecture:

Fig: ROLAP architecture (components: front-end tools, ROLAP server with metadata and request processing, RDBMS on the database server; labelled flows: info request, SQL, result set)
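The idea of generating SQL at request time can be sketched with Python's `sqlite3` module. The table `sales`, its columns, the summary table `sales_by_product`, and the helpers `build_query` and `answer` are all hypothetical names chosen for this illustration; a real ROLAP server would derive them from its metadata.

```python
import sqlite3

# Dimension columns the sketch knows about; names are interpolated into SQL,
# so anything outside this whitelist is rejected.
ALLOWED_DIMENSIONS = {"product", "market", "quarter"}


def build_query(dimensions):
    """Translate a list of requested dimensions into a GROUP BY query."""
    unknown = set(dimensions) - ALLOWED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if not dimensions:
        return "SELECT SUM(units) FROM sales"
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"


def answer(conn, dimensions):
    """Compute the aggregate when the user asks for it; nothing is precomputed."""
    return conn.execute(build_query(dimensions)).fetchall()


rows = [
    ("Camera", "Boston", "Q1", 1200),
    ("Camera", "Boston", "Q2", 1500),
    ("Tuner", "Denver", "Q1", 250),
    ("Tuner", "Denver", "Q2", 300),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, market TEXT, quarter TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

by_product = answer(conn, ["product"])

# Optional summary table for one frequently requested combination of dimensions.
conn.execute(
    "CREATE TABLE sales_by_product AS "
    "SELECT product, SUM(units) AS units FROM sales GROUP BY product"
)
```

The summary table trades load-time work for query-time speed, and it has to be refreshed by separate code whenever `sales` changes.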

Advantages of ROLAP
ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality (i.e., millions of members).

With a variety of data loading tools available, and the ability to fine tune the ETL code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.

The data are stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).

ROLAP tools are better at handling non-aggregatable facts (e.g., textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.

By decoupling the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.

The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users (SQL WHERE clause).

Disadvantages of ROLAP
ROLAP tools are generally considered in the industry to have slower query performance than MOLAP tools, because aggregates are computed when they are requested rather than in advance.

The loading of aggregate tables must be managed by custom ETL code. The ROLAP tools do not help with this task. This means additional development time and more code to support.

When the step of creating aggregate tables is skipped, query performance suffers because the larger detailed tables must be queried. This can be partially remedied by adding aggregate tables; however, it is still not practical to create aggregate tables for all combinations of dimensions/attributes.

ROLAP relies on the general-purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of the latest improvements in the SQL language such as the CUBE and ROLLUP operators, DB2 Cube Views, as well as other SQL OLAP extensions. These SQL improvements can narrow the advantage of the MOLAP tools.

Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don't translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.
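To make the CUBE and ROLLUP operators mentioned above more concrete, the following sketch lists the grouping sets each one stands for: ROLLUP aggregates over every prefix of the column list, while CUBE aggregates over every subset. The function names are hypothetical, and the sketch only enumerates groupings; it does not execute SQL.

```python
from itertools import combinations


def rollup_groupings(columns):
    """ROLLUP(a, b, c) groups by each prefix: (a, b, c), (a, b), (a), ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_groupings(columns):
    """CUBE(a, b) groups by every subset: (a, b), (a), (b), ()."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]


time_rollup = rollup_groupings(["year", "quarter", "month"])
sales_cube = cube_groupings(["product", "market"])
```

ROLLUP over n columns yields n + 1 groupings, which suits a single hierarchy such as time, whereas CUBE yields 2**n groupings and corresponds to computing subtotals along all dimensions.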

Managed query environment (MQE)


This style of OLAP gives users the ability to perform limited analysis, either directly against RDBMS products or by leveraging an intermediate MOLAP server. Some products have developed features to provide data cube and slice-and-dice analysis capabilities.

Fig: MQE architecture (components: front-end tools, MOLAP server, RDBMS on the database server; labelled flows: info request, SQL query, result set, load, request set)

Examples of OLAP tools include Cognos Software's PowerPlay, Andyne Software's Pablo, etc.

Cognos PowerPlay

Cognos PowerPlay is an open OLAP solution that can interoperate with a wide variety of third-party software tools, DBs and applications. The analytical data used by PowerPlay is stored in multidimensional data sets called PowerCubes. Specifically, starting with version 5, the Cognos PowerPlay client offers:

Support for enterprise-size data sets of 20+ million records, 100,000 categories and 100 measures

Powerful 3D charting capabilities

Linked displays that give users multiple views of the same data in a report

Faster and easier ranking of data

A 32-bit architecture for Windows NT, Windows 95 and Windows 3.1

Unlimited undo levels and customizable toolbars

Advanced security control by dimension

Remote analysis

Complete integration with relational database security and data management features
