
OBIEE Data Modelling Best Practices and New Features

Introduction

This white paper aims to set out some basic good and best practices for data modelling using Oracle Business Intelligence Enterprise Edition. The basic good and best practices apply to all versions of Oracle Business Intelligence, whilst the latter half of this paper sets out some new features available with the 11g release of Oracle Business Intelligence.

The OBIEE Semantic Model

Oracle Business Intelligence, like a number of enterprise business intelligence tools, has a metadata layer that aims to hide the complexity of underlying data sources and present information using business terminology. This metadata layer is called the Semantic Model and is stored in a repository, held in a single file usually referred to as an RPD file, after the file extension that it uses. This metadata layer works alongside a separate repository, called the Web Catalog, that holds reports, dashboards and other presentation objects. Conceptually, these two repositories together with the source data can be thought of as a three-layer data architecture, as shown in Figure 1 below.

Figure 1 : The Oracle Business Intelligence Data Architecture

The Oracle Business Intelligence Semantic Model is administered using the Oracle BI Administration tool, a Windows application that ships with Oracle Business Intelligence. This tool presents the semantic model as having three layers, as shown in Figure 2 below.

Figure 2 : The Oracle Business Intelligence Administration Tool

The three layers are as follows:

1. The Physical Layer, which contains metadata on data sources, such as relational databases, OLAP cubes, files and XML documents.
2. The Business Model and Mapping Layer, which contains a dimensional model of your business information, including calculations, drill paths, business definitions of data items and, in OBIEE 11g, lookup tables.
3. The Presentation Layer, made up of one or more subject areas based around single fact tables and linked dimensions.

When creating an Oracle Business Intelligence semantic model, there are a number of design objectives that you should bear in mind. Starting with the physical layer and moving through the business model and then presentation layers, the first objective should be to simplify the underlying data and present it as a logical dimensional model, with measures held in one or more fact tables linked to one or more conformed dimensions. The federation capabilities of Oracle Business Intelligence should be used to link disparate but related data sets, so that users can write queries that span the organization's data. The calculation capabilities of the business model and mapping layer should be used to create and expose common calculations, whilst logical dimensions can be used to formalize the drill paths within the data set. Finally, data should be presented to users in the form of subject areas (akin to data marts in physical data warehouses), with common links between them in the form of conformed dimensions. Security, both row-level and subject area, should be set up so that users only see the data that is relevant to them, with access made available through application roles linked to external directories such as Oracle Internet Directory or Microsoft Active Directory.

Basic OBIEE Data Modelling Good Practices

So what are some basic good practices for designing data models using any version of Oracle Business Intelligence Enterprise Edition? Whilst this paper does not intend to teach Oracle Business Intelligence data modelling from first principles, here are a number of basic good practices that have worked for the author on a number of projects.

1. If Possible, Use a Data Warehouse as the Data Source

Whilst it is possible to have Oracle Business Intelligence report against normalized, OLTP-style data, the SQL generated by the BI Server will become complex and will not perform as well as it would against physical source data already optimized into a star schema. The data modelling process is more straightforward when your source data is stored as a dimensional data warehouse, and your queries will generally run faster. If possible, use a data warehouse as the data source for your Oracle Business Intelligence semantic model.

2. Think in Terms of Dimensional Modelling

Again, whilst your source data does not have to be stored as a physical star schema, the business model that you build as part of the semantic model has to be organized dimensionally, with measures in one or more fact tables that then reference one or more (ideally conformed) dimensions. These dimensions should ideally be denormalized, though it is possible to snowflake the business model, and hierarchies and levels should be defined within this model to formalize the drill paths within your data. If you have not done so already, read up on any of the books by Ralph Kimball, and think dimensionally for your business model.

3. Define Keys in the Physical Layer, Ideally Against Aliased Tables

The physical layer within your semantic model generally needs primary keys on all tables that will provide dimension source data, and foreign keys between tables that provide fact data and tables that provide dimension data. In addition, if you wish to create reports that span multiple databases, foreign key links between tables in these databases are used by the federation capabilities of Oracle Business Intelligence to join data from these sources, either at the database level or within the BI Server.

An optional good practice is to create aliases for each of the physical tables in the physical layer, prefixing them with their role (Dim_ for dimension, Fact_ for fact or Lkp_ for lookup) and adding a suffix if they play multiple roles. For example, if a table TIMES has metadata imported into the physical layer and is used for both Order Date and Ship Date dimensions, it should be aliased as Dim_TIMES_Order_Date and Dim_TIMES_Ship_Date to make it clear to other data modelers what purpose the table plays. Keys should then be defined against these aliases in accordance with the above instructions.

4. Create the Business Model Using Business Terminology and Logic

The business model and mapping layer within the semantic model is used for creating a conformed, dimensional representation of the enterprise data set. As such, all measures (and only measures) should be contained in one or more logical fact tables, the number of which should be determined by the distinct set of granularities in the data set. Reference data used to slice and dice the measures should be held in logical dimension tables, which should have logical dimensions together with hierarchies, logical levels and logical level keys. Logical dimension tables should have logical keys associated with them, and logical joins should be created between the logical fact and logical dimension tables. The logical columns within these logical tables should be renamed to use business terminology, changing for example REV_AMT to Revenue Amount.

When creating the business model, good practice is to start by defining, at a high level, the logical fact and then logical dimension tables, after which you can drag and drop individual physical columns into the relevant logical tables, renaming as necessary, until the business model is complete. Figure 3 below shows a simple initial business model, with one logical fact table and two logical dimension tables with accompanying logical dimensions. Note how the fact table does not contain any ID columns for linked dimensions, and how the logical dimensions are annotated to describe their structure (the purpose of this will be explained later in this paper).

Figure 3 : A Simple Business Model
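To make the role-based aliasing in item 3 concrete, here is a minimal sketch that uses Python's standard sqlite3 module as a stand-in physical database. The SALES and TIMES tables and their columns are hypothetical. The point is that the two roles of TIMES become two independently joined aliases, which is roughly the shape of SQL the BI Server has to generate against a star schema when Order Date and Ship Date appear in the same analysis. It is an illustration, not SQL captured from the BI Server.

```python
import sqlite3


def build_demo_db():
    """Hypothetical star schema: one fact table, one TIMES table used in two roles."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE times (time_id INTEGER PRIMARY KEY, month_name TEXT);
        CREATE TABLE sales (
            order_date_id INTEGER REFERENCES times (time_id),
            ship_date_id  INTEGER REFERENCES times (time_id),
            rev_amt       REAL
        );
        INSERT INTO times VALUES (1, 'Jan'), (2, 'Feb');
        INSERT INTO sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 2, 25.0);
    """)
    return conn


def revenue_by_order_and_ship_month(conn):
    # Each alias has its own join condition, just as each physical-layer alias
    # carries its own foreign key from the fact table.
    sql = """
        SELECT Dim_TIMES_Order_Date.month_name AS order_month,
               Dim_TIMES_Ship_Date.month_name  AS ship_month,
               SUM(f.rev_amt)                  AS revenue_amount
        FROM sales AS f
        JOIN times AS Dim_TIMES_Order_Date
          ON f.order_date_id = Dim_TIMES_Order_Date.time_id
        JOIN times AS Dim_TIMES_Ship_Date
          ON f.ship_date_id = Dim_TIMES_Ship_Date.time_id
        GROUP BY order_month, ship_month
        ORDER BY order_month, ship_month
    """
    return conn.execute(sql).fetchall()
```

In the semantic model you would not write this SQL yourself; defining Dim_TIMES_Order_Date and Dim_TIMES_Ship_Date as separate aliases, each with its own foreign key from the fact table, is what lets the BI Server produce it.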

5. Ensure Dimension Element Counts are (Reasonably) Accurate

The Number of Elements at this Level setting in the properties dialog for each logical level gives the BI Server useful additional information about how many rows a query against a particular logical dimension level will return. This information is used when deciding between multiple aggregate logical table sources mapped into a logical table, but is not populated automatically (Figure 4 shows where this information should be entered by the developer).

Figure 4 : The Location of the Number of Elements at this Level Setting

To calculate the number of elements at a particular logical dimension level, use SQL*Plus (or the equivalent for your data source) to run queries similar to the following:

select count(distinct(prod_name)) from products;
select count(distinct(prod_type_desc)) from products;
select count(distinct(prod_cat_desc)) from products;
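If you have many levels to count, these queries can be scripted. The sketch below assumes a hypothetical PRODUCTS table whose level columns match the queries, and uses Python's sqlite3 module in place of SQL*Plus. It also adds a rough sanity check: in a simple, non-ragged hierarchy each level should have no more elements than the level below it, so a higher count further up usually points to a data or modelling problem. The results still have to be entered into the Administration tool by hand.

```python
import sqlite3

# Level columns of a hypothetical PRODUCTS table, listed from leaf level to top.
LEVEL_COLUMNS = ["prod_name", "prod_type_desc", "prod_cat_desc"]


def level_element_counts(conn, table, columns):
    """Run count(distinct ...) for each level column, mirroring the queries above."""
    counts = {}
    for column in columns:
        # Identifiers cannot be bound as SQL parameters; these come from a fixed list.
        (count,) = conn.execute(
            f"SELECT COUNT(DISTINCT {column}) FROM {table}"
        ).fetchone()
        counts[column] = count
    return counts


def counts_look_consistent(counts, columns):
    """In a simple (non-ragged) hierarchy, each level should have no more
    elements than the level below it."""
    values = [counts[column] for column in columns]
    return all(lower >= upper for lower, upper in zip(values, values[1:]))


def build_demo_products():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (prod_name TEXT, prod_type_desc TEXT, prod_cat_desc TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            ("Cola", "Soft Drink", "Beverages"),
            ("Lemonade", "Soft Drink", "Beverages"),
            ("Lager", "Beer", "Beverages"),
            ("Crisps", "Snack", "Food"),
        ],
    )
    return conn
```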

Ensure that these counts are updated if the number of elements changes significantly, particularly if a level's count changes in proportion to the counts of other levels in the same logical dimension, or of comparable levels in other logical dimensions.

6. Publish One Subject Area Per Logical Star

It is typically good practice to include just a single fact table, plus its associated dimension tables, within each presentation layer subject area, particularly now that, with Oracle Business Intelligence 11g, analyses can span multiple subject areas as long as there are common conformed dimensions. The advantage of having just a single fact table per subject area is that the purpose of the subject area is then very clear, and it is harder to write analyses that contain incompatible presentation columns.

The 11g release of Oracle Business Intelligence makes it simple to create one or more subject areas for a related business model, with one subject area per logical fact table, through the use of the Create Subject Areas for Logical Stars and Snowflakes feature, as shown in Figure 5 below:

Figure 5 : Create Subject Areas for Logical Stars and Snowflakes

Using this feature with a business model containing two logical tables of measures results in the following subject areas in the presentation layer (Figure 6).

Figure 6 : Subject Areas Based on Individual Fact Tables
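The grouping rule behind one subject area per logical star can be modelled in a few lines. This is a conceptual sketch, not how the Administration tool implements the feature: it assumes plain stars, represents logical joins as hypothetical (fact, dimension) name pairs, and ignores snowflaked dimension-to-dimension joins. The second function lists the dimensions every subject area shares, which are the conformed dimensions an analysis needs in order to span subject areas.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectArea:
    fact: str
    dimensions: list = field(default_factory=list)


def subject_areas_per_star(facts, logical_joins):
    """One subject area per logical fact table, containing that fact table plus
    every logical dimension table joined to it. logical_joins holds
    (fact, dimension) pairs; snowflaked dimension-to-dimension joins are ignored."""
    return [
        SubjectArea(fact, sorted({dim for f, dim in logical_joins if f == fact}))
        for fact in facts
    ]


def conformed_dimensions(areas):
    """Dimensions shared by every subject area; these are what allow a single
    analysis to span the subject areas."""
    if not areas:
        return []
    common = set(areas[0].dimensions)
    for area in areas[1:]:
        common &= set(area.dimensions)
    return sorted(common)
```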

7. Develop Offline Initially, then Publish Online. Consider MUDE.

Whilst it is possible for a single developer to work online with a repository, checking individual objects in and out whilst developing the semantic model, this approach does not generally work well with teams of developers. It also introduces overheads during development because of the check-out mechanism the BI Administration tool uses to avoid conflicting changes. Good and best practice is for significant amounts of development work to be carried out with offline copies of repository files, with individual developers' changes being merged together either using the three-way merge feature in the BI Administration tool or, more formally, using the Multi-User Development Environment (MUDE). Working offline with your own copy of the repository file avoids the need to continuously check changes in and out, and avoids the disruption to user or test queries that might occur when working with an online repository that is also being reported against at the same time. Note, however, that whilst multiple developers working online with a repository was unsupported in OBIEE 10g, the 11gR1 release has been re-engineered to allow up to five developers to work online with a repository, making online editing a valid way to make ad-hoc changes to a repository after it has been published.

8. Pay Attention to Consistency Checks

When verifying the integrity of your semantic model (File > Check Global Consistency from the BI Administration application menu), pay attention to the warnings and errors that are reported. Whilst you will not normally be able to place online a repository that has errors, warnings are allowed, but you may find that analyses return unexpected results, or return errors under particular circumstances. It is general good practice to resolve all errors and warnings before placing a repository online, particularly ones such as:

[39008] Logical dimension table Dim Survey Organizations has a source Dim_SURVEY_ORGANIZATION that does not join to any fact source.

which indicates that, whilst your model is not invalid, there are logical tables within the business model that have logical table sources mapping to physical tables without the required foreign key links (this will cause analyses to fail with an error when you include these tables in a report), and

[39024] Dimension '"Sales"."Stores (Ragged, SkipLevel)"' has defined inconsistent values in its levels' property 'Number of elements'.

which indicates logical inconsistencies in your model (caused, in this instance, by dimension member counts being incorrect). In general, most warnings will also lead to errors in your reports at some point; in particular, if you are new to Oracle Business Intelligence data modelling, look to resolve all warnings and errors before making a repository available for general use.

What is a Logical Table Source?

A concept that has been mentioned several times in this paper is the logical table source. Logical table sources are a key feature within the semantic model but are often misunderstood, with common misconceptions including: a logical table source always equates to one physical table; a logical table source IS a physical table; and every physical table referenced by a logical table requires its own logical table source. So what is a logical table source, and how is it used effectively in an Oracle Business Intelligence semantic model?

Consider a simple business model with three logical tables, one used for fact data and two for dimensions. Each has one logical table source mapped to it, as shown in Figure 7 below.

Figure 7 : A Sample Business Model, Showing Logical Table Sources

Each one of these logical table sources represents a set of mappings between the logical columns in the business model and physical columns in the physical layer of the semantic model. For example, the CUSTOMERS logical table source that is held in the Sources folder within the Customers logical table contains the column mappings shown in Figure 8 below.

Figure 8 : A Logical Table Source Mapping

In this mapping, all of the logical columns map to physical columns in the same physical table. Now, consider a situation where there are columns in tables joined to the CUSTOMERS physical table that you would also like to include in this mapping, as shown in Figure 9 below.

Figure 9 : Additional Joined Physical Tables Containing Customer Data

To make it possible for the logical table source mapping to reference these additional tables, firstly foreign key joins must exist between the tables in the physical layer of the semantic model. Once the joins are in place, the logical table source used for this logical table can then be extended, so that it references these additional tables, as shown in Figure 10 below:

Figure 10 : Extending a Logical Table Source to Include Related Tables

Then, the logical table source mapping can reference these additional physical columns, as shown in Figure 11 below:

Figure 11 : A Logical Table Source Mapping Referencing Multiple Physical Tables
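One way to picture a logical table source is as a dictionary from logical columns to physical columns, with the set of physical tables it touches derived from that mapping. The sketch below, using hypothetical CUSTOMERS and COUNTRIES tables, also checks the precondition described above: every physical table in an extended source must be connected to the others by foreign key joins. The class and function names are illustrative and are not part of any Oracle API.

```python
from dataclasses import dataclass


@dataclass
class LogicalTableSource:
    name: str
    # logical column -> (physical table, physical column)
    mappings: dict

    def physical_tables(self):
        return {table for table, _column in self.mappings.values()}


def tables_are_joined(tables, fk_joins):
    """True if every physical table in the source can be reached from the others
    through foreign key joins, treating each join as an undirected edge and
    only walking through tables that belong to the source."""
    tables = set(tables)
    if not tables:
        return True
    start = next(iter(tables))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for a, b in fk_joins:
            for here, there in ((a, b), (b, a)):
                if here == current and there in tables and there not in seen:
                    seen.add(there)
                    stack.append(there)
    return seen == tables
```

A source that references a table with no foreign key path back to the others fails this check, which corresponds to the situation the paper says must be fixed in the physical layer before the source can be extended.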

Each logical table in the business model will have one or more logical table sources associated with it; at the start, there will normally be just one. When the BI Server then comes to generate a physical SQL or MDX query to return data for a query using these logical tables, it will look at the total set of physical tables referenced in the logical table sources and determine whether they can be accessed via a single SQL statement containing joins between the tables (in effect, pushing the join down to the underlying database). If this is not possible, for example because the physical tables pointed to by one of the logical table sources are located in a database separate from the other tables, it will instead generate two or more physical SQL or MDX queries, joining the results together in memory in what is known as horizontal federation.

So if logical table sources can be extended to include related tables, why would a logical table require more than one logical table source? The first reason refers back to the horizontal federation concept, and is where individual logical columns in a logical table are actually sourced from different databases, making it impossible to write a single SQL statement that covers both sources. In Figure 12 below, the Total Price measure is mapped to a physical table in the ORCL Oracle database, whilst the Quotas measure is mapped to a Microsoft Excel source called Quotas. As both of these sources are at the same level of granularity, but the BI Server is being asked to run queries against more than one data source, this is again referred to as horizontal federation.

Figure 12 : Multiple Logical Table Sources in a Single Logical Table
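The decision just described can be reduced, in a deliberately simplified form, to grouping the physical tables an analysis needs by the database they live in. The sketch assumes each logical column resolves to a single (database, table) pair and uses hypothetical table names; it ignores factors such as connection pools and database feature support that the real BI Server also takes into account.

```python
from collections import defaultdict


def plan_queries(column_sources):
    """column_sources maps each requested logical column to the
    (database, physical table) its logical table source points at.
    Returns the physical tables grouped by database."""
    by_database = defaultdict(set)
    for database, table in column_sources.values():
        by_database[database].add(table)
    return {database: sorted(tables) for database, tables in by_database.items()}


def needs_horizontal_federation(plan):
    # One database: one SQL statement with the joins pushed down.
    # Several databases: one query per database, stitched together in memory.
    return len(plan) > 1
```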

Figure 13 shows the different column mappings in the two logical table sources, with one logical table source pointing towards the Oracle database, and one towards the Microsoft Excel spreadsheet. The BI Server will combine these two data sets together at run time through a full-outer stitch join, a full outer join between the two data sets.

Figure 13 : Two Logical Table Source Mappings
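The full outer stitch join can be illustrated on two small, already-aggregated result sets. This Python sketch assumes each result set has been reduced to a dictionary keyed on the shared dimension value (a month here, which is hypothetical). The BI Server's own in-memory implementation is not described in this paper, so treat this as a model of the result rather than of the mechanism.

```python
def stitch_join(left, right):
    """Full outer join of two aggregated result sets on their shared dimension key.
    left and right map a key (for example a month) to a measure value; a key
    missing from one side gets None for that side's measure."""
    keys = sorted(set(left) | set(right))
    return [(key, left.get(key), right.get(key)) for key in keys]
```

For example, stitching Total Price values for January and February with Quotas values for February and March yields three rows, with None for the January quota and the March total price, which is exactly what distinguishes a full outer join from an inner one.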

The second reason is where the same measures are available at more than one level of aggregation. For example, in Figure 14 below, we can see one logical table source for the Fact Sales logical table mapping to a database schema containing aggregate data, whilst another points to a schema containing detail-level data.

Figure 14 : Logical Table Sources Mapping to Detail, and Aggregate-Level Data Sources

The Fact_A_SALES_AGG logical table source is differentiated from the Fact_SALES logical table source through the Logical Dimension level mappings held in the logical table source properties, which in this case indicate that the logical table source contains aggregated data, as shown in Figure 15 (Product Category and Quarter are logical levels above the leaf level of their respective dimensions).

Figure 15 : Logical Table Source Logical Dimension Level Mappings
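The level mappings in Figure 15 amount to a qualification rule: an aggregate source can answer a query only if every requested level is at or above the level the source is stored at. A minimal Python sketch follows, assuming hypothetical level names for the Product and Time hierarchies. It only lists the sources that qualify; choosing between several qualifying sources is a separate step for the BI Server.

```python
# Levels listed from the top of each hierarchy down to the leaf (hypothetical names).
HIERARCHIES = {
    "Product": ["Total", "Category", "Type", "Product"],
    "Time": ["Total", "Year", "Quarter", "Month"],
}


def can_answer(source_levels, query_levels):
    """A source stored at source_levels can answer a query grouped at query_levels
    if, for every dimension in the query, the requested level is at or above the
    level the source is stored at (data can be rolled up, never broken down)."""
    for dimension, wanted in query_levels.items():
        order = HIERARCHIES[dimension]
        if order.index(wanted) > order.index(source_levels[dimension]):
            return False
    return True


def qualifying_sources(sources, query_levels):
    """Names of the logical table sources that could satisfy the query."""
    return [name for name, levels in sources.items() if can_answer(levels, query_levels)]


# Level mappings in the spirit of Figure 15.
SOURCES = {
    "Fact_A_SALES_AGG": {"Product": "Category", "Time": "Quarter"},
    "Fact_SALES": {"Product": "Product", "Time": "Month"},
}
```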

Only one of these logical table sources would be accessed by the BI Server when a physical SQL or MDX query needed to be generated, and the decision would be based on the level and type of aggregation requested by the analysis. This is referred to as vertical federation and, together with horizontal federation, these two scenarios are the major reasons why logical tables have more than one logical table source. Logical table sources are simply sets of mappings between logical and physical columns that may or may not be referenced, depending on the context of the user's analysis.

New Features for Data Modelling in OBIEE 11gR1

So now that we have gone through some basic good practices with previous and current releases of Oracle Business Intelligence, it is worth looking at some new features introduced with the 11g release of Oracle Business Intelligence that may be of use to data modelers. These new features include:

1. Enhanced Support for Logical Dimensions, Including Skip-Level, Ragged and Parent-Child Hierarchies

When creating level-based hierarchies (the only option with Oracle Business Intelligence 10g), the situation may arise where the hierarchy that you need to create is either ragged (where each leaf member is not necessarily at the same level in the hierarchy), or skip-level (where some members may not have an immediate parent, instead skipping to the level above). Figure 16 below shows a level-based hierarchy that is both ragged, and skip-level.

Figure 16 : A Sample Ragged, and Skip-Level Dimension Hierarchy

Ragged and skip-level hierarchies are represented in the underlying dataset through NULLs, either at what would normally be the leaf level or at levels that would normally hold a dimension member ID. The 11g release of Oracle Business Intelligence gives developers the ability to specify dimensions as having ragged and/or skip-level hierarchies when editing the properties of the logical dimension, as shown in Figure 17 below:

Figure 17 : Specifying Ragged and Skip-Level Dimension Structure
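Before ticking the ragged or skip-level options, it can help to check the data for the NULL patterns described above. The sketch below assumes each dimension row has been flattened into a tuple of level values from the top of the hierarchy down, with None standing in for NULL; the store and region names are hypothetical, and the code only inspects data, it does not touch the repository.

```python
def classify_member_path(path):
    """path: level values from top to leaf, with None where a level is empty.
    Returns 'ragged' if trailing levels are empty (the member's leaf sits above
    the lowest level) and 'skip-level' if an empty level sits between two
    populated ones."""
    labels = set()
    populated = [i for i, value in enumerate(path) if value is not None]
    if not populated:
        return labels
    last = populated[-1]
    if last < len(path) - 1:
        labels.add("ragged")
    if any(path[i] is None for i in range(populated[0], last)):
        labels.add("skip-level")
    return labels


def dimension_flags(rows):
    """Union of the labels over every row: which options the dimension needs."""
    flags = set()
    for row in rows:
        flags |= classify_member_path(row)
    return flags
```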

Specifying Ragged and/or Skip-Level means that the BI Server will anticipate NULLs for dimension member IDs, and when used in combination with hierarchical columns (described below), either stop drilling down at that point (for ragged hierarchies) or jump up or down two levels (for skip-level hierarchies). One point should be noted when working with ragged and skip-level hierarchies. When working with a ragged hierarchy, as each leaf member of the ragged dimension may not be at the lowest level of the hierarchy, it is necessary to add a surrogate key to the dimension table to provide a consistent column on which to join to the fact table. This surrogate key should be used as the key for the logical dimension table, and used as the lowest level in the logical dimension, as shown in Figure 18 below.

Figure 18 : Adding a Surrogate Key Level to the Logical Dimension Hierarchy
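The reason for the surrogate key can be shown with a small sqlite3 example. In this hypothetical STORES dimension the Online region has no store-level member, so its leaf column is NULL. Because NULL never equals NULL in SQL, a join on that leaf column silently drops the Online sales, whereas a join on a surrogate key keeps every row. The fact table carries both candidate join columns only so that the two joins can be compared side by side.

```python
import sqlite3


def build_ragged_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical ragged STORES dimension: the 'Online' region has no
        -- store-level member, so its leaf column is NULL.
        CREATE TABLE stores (
            store_key  INTEGER PRIMARY KEY,  -- surrogate key, one per row
            region     TEXT,
            store_name TEXT                  -- NULL for the ragged member
        );
        INSERT INTO stores VALUES
            (1, 'North', 'Store A'), (2, 'North', 'Store B'), (3, 'Online', NULL);
        -- The fact table carries both candidate join columns purely for comparison.
        CREATE TABLE sales (store_key INTEGER, store_name TEXT, amount REAL);
        INSERT INTO sales VALUES (1, 'Store A', 10), (2, 'Store B', 20), (3, NULL, 5);
    """)
    return conn


def revenue_by_region(conn, join_column):
    # Only known column names may be formatted into the SQL text.
    if join_column not in ("store_key", "store_name"):
        raise ValueError(f"unexpected join column: {join_column}")
    sql = f"""
        SELECT s.region, SUM(f.amount)
        FROM sales AS f
        JOIN stores AS s ON f.{join_column} = s.{join_column}
        GROUP BY s.region
        ORDER BY s.region
    """
    return conn.execute(sql).fetchall()
```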

When creating an associated hierarchical column for the logical dimension, as mentioned in point 2 below, this additional level can be deleted as a presentation level from the hierarchical column, so that the user can only drill down as far as the level immediately above it, which is the maximum depth at which a leaf member could normally exist.

The 11g release of Oracle Business Intelligence also introduces support for parent-child, or value-based, hierarchies. Now, when a developer chooses to create a logical dimension within the business model, the option is given to create the logical dimension as either level-based or parent-child, as shown in Figure 19 below:

Figure 19 : Specifying a new Logical Level as Having a Parent-Child Hierarchy

Then, when the new logical dimension's properties are specified, the developer has to specify the logical column that contains the Parent ID to go with the Member ID, creating the parent-child link. Once this is done, a closure table has to be created which contains, for each dimension member, a list of all of its ancestors (its parent, grandparent and so on), avoiding the need for the BI Server to issue recursive SQL when the dimension is traversed. Figure 20 shows the wizard that can be used to create this closure table. It generates two SQL scripts, one to create the closure table and one to populate it; the second of these needs to be re-run whenever the underlying dimension data is refreshed.

Figure 20 : Creating the Closure Table
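The wizard writes its own scripts, but the idea behind a closure table can be sketched with a recursive common table expression, here run through Python's sqlite3 module. The EMPLOYEES and EMP_CLOSURE tables, their column names, and the convention of including a distance-0 row for each member are assumptions for illustration. The create and populate steps are kept separate to mirror the two generated scripts, and the populate step is the one you would re-run after each refresh.

```python
import sqlite3


def build_org_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical parent-child dimension: manager_id holds the Parent ID.
        CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, manager_id INTEGER);
        INSERT INTO employees VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
    """)
    return conn


def create_closure_table(conn):
    # Run once, like the wizard's "create" script.
    conn.execute(
        "CREATE TABLE emp_closure (member_key INTEGER, ancestor_key INTEGER, distance INTEGER)"
    )


def populate_closure(conn):
    """Re-run after every dimension refresh, like the wizard's "populate" script.
    Writes one row per member for itself (distance 0) plus one row per ancestor.
    Assumes the hierarchy contains no cycles, otherwise the recursion never ends."""
    conn.executescript("""
        DELETE FROM emp_closure;
        WITH RECURSIVE walk (member_key, ancestor_key, distance) AS (
            SELECT emp_id, emp_id, 0 FROM employees
            UNION ALL
            SELECT w.member_key, e.manager_id, w.distance + 1
            FROM walk AS w
            JOIN employees AS e ON e.emp_id = w.ancestor_key
            WHERE e.manager_id IS NOT NULL
        )
        INSERT INTO emp_closure (member_key, ancestor_key, distance)
        SELECT member_key, ancestor_key, distance FROM walk;
    """)


def descendants(conn, ancestor):
    # A plain, non-recursive query: the recursion was paid for at populate time.
    rows = conn.execute(
        "SELECT member_key FROM emp_closure"
        " WHERE ancestor_key = ? AND distance > 0 ORDER BY member_key",
        (ancestor,),
    ).fetchall()
    return [member for (member,) in rows]
```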

So, one question that developers new to the 11g release of Oracle Business Intelligence may have is whether to use a ragged level-based hierarchy or a parent-child hierarchy when they wish to create a ragged hierarchy within the business model (as parent-child hierarchies are, by their nature, ragged). The answer to this question is determined by the nature of the data in the source dimension table. If the table naturally has levels (e.g., columns that hold clearly identifiable levels of data) then you should create the dimension hierarchy as level-based and ragged. If the underlying source data does not naturally have named levels (perhaps in an organizational hierarchy, or a hierarchy of trading books) then it would be better suited to a parent-child dimension hierarchy.

2. Hierarchical Columns

The 11g release of Oracle Business Intelligence allows you to drag and drop logical dimensions from the business model into presentation layer subject areas, to create hierarchical columns. These columns are made up of presentation levels and, when included in an analysis, allow the user to perform in-column drilling on their data set. Hierarchical columns can exist alongside regular dimension columns (now called attribute columns) and measure columns. Figure 21 shows a hierarchical column being created from a logical dimension in the business model.

Figure 21 : Creating a Hierarchical Column

When included in an analysis, the hierarchical column allows the user to drill into the hierarchy within the same column, coping with ragged, parent-child and skip-level members without problems, as shown in Figure 22.

Figure 22 : Displaying a Hierarchical Column in an Analysis

As described in the previous item regarding ragged hierarchies, these hierarchical columns can implement all, or a subset, of the logical levels in their related logical dimension. Each hierarchical column represents a single hierarchy, so that, for example, two are created when a logical dimension with two hierarchies is dragged into a presentation layer subject area.

3. Lookup Tables

Whilst most lookup data will be referenced in a semantic model by extending logical table sources to reference associated data, in some cases developers may wish to import the metadata for standalone lookup tables that may or may not contain lookup values for a particular dimension member.

Figure 23 : Defining a Logical Lookup Table

Typical scenarios where this may be useful include:

- Currency conversions that you wish to perform as a separate, standalone calculation
- Accessing datatypes such as CLOBs that cannot be included in a GROUP BY clause
- Avoiding the mandatory inclusion of joins and tables in an SQL query relating to a logical table source that uses an outer join

The 11g release of Oracle Business Intelligence allows developers either to reference lookup data in a physical table, mapped through a logical table source, using a new LOOKUP function, or to define a logical table as a lookup table and again reference it through the LOOKUP function. In both cases, the lookup can be specified as being either DENSE (where there will always be a corresponding lookup value for a dimension member, analogous to an inner join in SQL), or SPARSE (where values may not be present, analogous to a left outer join in SQL). The syntax for SPARSE and DENSE lookups is as follows:

Lookup(DENSE <<lookupColumn>>, <<sourceKeyorExpression>>)
Lookup(SPARSE <<lookupColumn>>, <<alternateColumn>>, <<sourceKeyorExpression>>)

When a physical table is referenced through the LOOKUP function, the BI Server will generally push the inner or outer join required to access it into the main SQL statement used for the query. For logical tables accessed via the LOOKUP function, the BI Server will generate a separate SQL statement for the lookup, which it will join back to the main data set using an in-memory join.

To create a lookup involving a physical table, first import the physical table's metadata into the physical layer of the semantic model, ensuring that you define a primary key for the table. Then, extend the relevant logical table source to include an entry for the new lookup column. Finally, use the LOOKUP function to reference the lookup column, with a function call such as:

Lookup(DENSE "orcl".""."GCBC_SALES"."Lkp_SEASONS"."SEASON_DESC", "orcl".""."GCBC_SALES"."Dim_TIMES"."MONTH_YYYYMM" )

where SEASON_DESC is the column being looked up, and MONTH_YYYYMM is the column in the logical table that corresponds to the key of the lookup table. To define a logical table as a lookup table, tick the Lookup Table tickbox (as shown in Figure 23, above) to define it as such in the business model. It will then pass repository consistency checks even though it does not have a logical join to a fact table, and you can reference it using the regular LOOKUP function syntax.
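The difference between DENSE and SPARSE lookups can be approximated in plain SQL as an inner join versus a left outer join. The following sqlite3 sketch is an analogy only, not SQL generated by the BI Server: it borrows the month and season idea from the example above with simplified, hypothetical tables, and it models the SPARSE alternate column as a constant supplied through COALESCE.

```python
import sqlite3


def build_lookup_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE times (month_yyyymm TEXT PRIMARY KEY);
        CREATE TABLE lkp_seasons (month_yyyymm TEXT PRIMARY KEY, season_desc TEXT);
        INSERT INTO times VALUES ('201101'), ('201107'), ('201112');
        -- No season row for 201112, so only the sparse lookup keeps that month.
        INSERT INTO lkp_seasons VALUES ('201101', 'Winter'), ('201107', 'Summer');
    """)
    return conn


# DENSE: every member is expected to have a match, so an inner join suffices.
DENSE_SQL = """
    SELECT t.month_yyyymm, l.season_desc
    FROM times AS t
    JOIN lkp_seasons AS l ON l.month_yyyymm = t.month_yyyymm
    ORDER BY t.month_yyyymm
"""

# SPARSE: keep every member; the bound parameter stands in for <<alternateColumn>>.
SPARSE_SQL = """
    SELECT t.month_yyyymm, COALESCE(l.season_desc, ?)
    FROM times AS t
    LEFT OUTER JOIN lkp_seasons AS l ON l.month_yyyymm = t.month_yyyymm
    ORDER BY t.month_yyyymm
"""


def dense_lookup(conn):
    return conn.execute(DENSE_SQL).fetchall()


def sparse_lookup(conn, alternate="Unknown"):
    return conn.execute(SPARSE_SQL, (alternate,)).fetchall()
```

The DENSE form quietly drops 201112 because it has no season row, which is why DENSE should only be used when a lookup value is guaranteed to exist.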

4. ID and Descriptive Double Columns

Another new feature introduced with OBIEE 11g is the ability to associate ID columns with descriptive columns in the repository, so that filters applied to analyses can use these IDs. This is useful, for example, when you wish to display product descriptions as text but filter on the ID, and also when you wish to create multiple logical columns containing translations but filter on a common ID column. To define this feature, called Double Columns in the OBIEE documentation, edit the properties of the descriptive column in the repository and use the Descriptor ID setting to select the associated ID column, as shown in Figure 24 below.

Figure 24 : Setting an ID Column for a Descriptive Column

To make use of the new ID column, create a dashboard prompt as normal, and you will see the ID column listed as the Included Code Column under the Prompt For Column setting, as shown in Figure 25 below.

Figure 25 : Creating a Dashboard Prompt using Double Columns

The Enable user to select by Code Column tickbox then allows the user to optionally display the code alongside the descriptions when using the dashboard prompt. Figure 26 shows a typical dashboard prompt where the user has chosen to display the IDs (this is optional, and the feature still works without the ID being displayed).

Figure 26 : Displaying the ID along with Description in a Dashboard Prompt

To make use of the double column, create the analysis as normal, adding an Is Prompted filter against the description column, as shown in Figure 27 below.

Figure 27 : Filtering Against the Descriptive Column
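Conceptually, double columns mean that whatever descriptions a user picks are translated into their IDs before the filter reaches the database. The Python sketch below models that translation with a hypothetical description-to-ID dictionary, including a second-language description that shares an ID. It illustrates the behaviour described in this section and is not how the BI Server is implemented.

```python
def id_filter(selected_descriptions, descriptor_ids, id_column):
    """Translate the descriptions chosen in a prompt into a parameterised filter
    on the associated ID column. descriptor_ids maps each description (in any
    language) to its ID; duplicate IDs are collapsed."""
    ids = sorted({descriptor_ids[description] for description in selected_descriptions})
    if not ids:
        raise ValueError("at least one value must be selected")
    placeholders = ", ".join("?" for _ in ids)
    return f"{id_column} IN ({placeholders})", ids
```

Selecting "Lemonade" and its translation "Limonade" produces a single-ID filter, which is why translated descriptive columns can share one consistent filter.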

Then, when a descriptive column has an associated ID column, the SQL issued by the BI Server will filter against the ID values only, regardless of whether the user chose to display the ID values alongside the descriptive values in the dashboard prompt.

5. Logical Table Source Priority Group Ordering

When more than one logical table source could provide data for a logical column, the source that is chosen is typically determined by the level in the logical dimension at which the source is mapped, so that, for example, a logical table source mapping to aggregated data is chosen in preference to a detail-level one when an analysis involves aggregation. Where multiple sources refer to aggregated data, the Number of Elements at this Level value is considered, for each applicable dimension, to determine which source is used. As it can sometimes be unclear which source will be used in these circumstances, the 11g release of Oracle Business Intelligence introduces a feature called priority group ordering that allows the developer to explicitly prioritise data sources in this situation. This setting is then generally the favoured determinant in selecting a logical table source for a logical column. Figure 28 shows this setting, with an aggregate logical table source initially set to Priority Group 0 (so that it has the same priority as the detail-level logical table source, and is therefore used), and then set to 1 (making it lower priority than the detail-level source, so that it is not used even though it would otherwise qualify).

Figure 28 : LTS Priority Group Settings, and Effect on Generated SQL
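The selection order described above can be summarised as a sort key: priority group first, then the size implied by the Number of Elements at this Level values. The sketch assumes the candidate sources have already passed the level check, and it uses the product of element counts as a rough row-count estimate. That heuristic, like the class and field names, is an assumption for illustration rather than a documented BI Server formula.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Source:
    name: str
    priority_group: int   # lower number = higher priority
    level_elements: dict  # dimension -> Number of Elements at the mapped level


def choose_source(candidates):
    """Pick among sources that can already answer the query: lowest priority
    group first, then the smallest estimated row count (product of element counts)."""
    return min(
        candidates,
        key=lambda source: (source.priority_group, prod(source.level_elements.values())),
    )
```

With both sources in group 0 the smaller aggregate wins; moving the aggregate to group 1 hands the query to the detail-level source, which matches the two states shown in Figure 28.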

6. Changes to Repository Handling and Maintenance

The final major change to data modelling with Oracle Business Intelligence 11g is in the handling of the actual RPD file. As users and groups are no longer stored in the repository (in the current release they are held in WebLogic Server instead), there is no longer a mandatory Administrator account with an Administrator group. So that repository files can still be opened offline securely, RPD files are now encrypted and have a repository password, which can be set using the BI Administration tool menu. In addition, Oracle Fusion Middleware Control is now used to take repositories online, rather than the developer editing the NQSConfig.INI file by hand, as shown in Figure 29 below.

When a new repository needs to be taken online, Fusion Middleware Control is used to browse to the required RPD file, and these changes are then activated, which in turn moves a copy of the file to the relevant instance directory and updates the NQSConfig.INI file. After the changes have been activated, the Restart to apply recent changes button is pressed, which gives the developer the option to restart all services or just the BI Server, so that the changes in the configuration file become active.

Summary

There are a number of good practices that apply when working with Oracle Business Intelligence data modelling, including working with data stored in a data warehouse, and building the business model using business terminology and logic, organized as a dimensional model.
