
APPLIED SOLUTIONS #1 TECH 2 TECH

An added dimension
Leveraging the Teradata aggregate join index feature optimizes online analytical processing performance.
by Carlos Bouloy and Rupal Shah

Executives and front-line employees often make decisions based on data that has multiple independent attributes or dimensions. Online analytical processing (OLAP) is the process of analyzing this type of dimensional data. Two primary methods of implementing OLAP are through multi-dimensional OLAP (MOLAP) and relational OLAP (ROLAP). Though the purpose of analyzing the data is the same for both methods, the architecture and processes differ: MOLAP stores the information in a cube, or a specialized pre-calculated data store, while ROLAP uses a standard relational data store.

MOLAP storage generally provides more rapid query response, but it might not be feasible to regularly move enormous amounts of data to populate the MOLAP store. As an alternative, you can define your cubes to use ROLAP mode, which enables a scalable solution. Another advantage to using ROLAP is that it can be built in a fraction of the time it takes to populate a MOLAP cube.

Implementing ROLAP cubes offers end users a simple solution that enables more dimensions, history, detail and faster deployments, while providing fast query responses. Through ROLAP, cubes can provide access to an enormous amount of data and perform queries that meet or exceed most business requirements.

Figure: Schema/semantic layer database. The aggregate join index (AJI) defined in this article will use the star/snowflake schema/semantic layer database.

The MOLAP challenge
Most of the time it takes to process a MOLAP cube is spent transferring data and populating the MOLAP cache (including aggregates). The data is transferred to a cube-building process that resides on a middle server or on a complex of servers. However, transferring large amounts of data can take a significant amount of time, and moving data from one server to another can introduce challenges. These challenges are applicable in any cube implementation as the MOLAP environment matures to deliver deeper and wider analytics.

Leveraging the Teradata aggregate join index (AJI) feature will optimize ROLAP performance. An AJI is an aggregated result set saved as an index in the database. It is transparent to end users and business intelligence (BI) administrators, and it is used automatically by the Teradata Optimizer.

By building AJIs on the Teradata Database, the data transfer and cube build are replaced with high-speed index builds. These indexes build in a fraction of the time it takes to build a MOLAP cube.
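To make the transparency point concrete, here is a minimal sketch of the kind of request a ROLAP tool might send. It is written only against base tables and never names an AJI. Whether the Optimizer actually answers it from an AJI depends on the index definitions, referential integrity and statistics in place. The table and column names (Fact, Brand, Time, Sales, Year, Brand_Category_Id) are borrowed from the AJI example later in this article and are illustrative only. On a real system, names such as Time or Year may need quoting or renaming because they overlap with SQL keywords.

```sql
-- Illustrative request only: object names are assumptions taken from the
-- article's later AJI example, not from a real schema.
-- Note that nothing here references an AJI; any rewrite to use one is
-- decided by the Optimizer, not by the person or tool writing the query.
SELECT ae.Brand_Category_Id,
       al.Year,
       SUM(ad.Sales) AS Total_Sales
FROM Fact ad,
     Brand ae,
     Time al
WHERE ad.brand_id = ae.brand_id
  AND ad.day = al.day
GROUP BY ae.Brand_Category_Id,
         al.Year;
```

Because the request groups at the Year and Brand_Category_Id levels, it is the sort of query that an index pre-aggregated at or below those levels could satisfy without scanning the detail rows.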

PAGE 1 | Teradata Magazine | December 2007 | ©2007 Teradata Corporation | AR-5472



Building AJIs for ROLAP solutions
The cube uses a dimension map such as the one shown in table 1 to define the dimensions and levels accessible in the cube. The dimensional map is based on the schema/semantic layer database (see the figure).

Table 1: Dimensional map. This is a map, based on the star/snowflake schema in the figure, of the dimensional model for a cube solution, defining all of the dimensions that will be accessible in the cube and their hierarchies.

To deliver a timely, optimal and performant ROLAP solution, the following Teradata physical database design is recommended in a virtual or physical semantic layer database as shown in the figure. These physical table stipulations within the database should be considered:
> Snowflake or third normal form (3NF) models are recommended, but the solution can be implemented on a star schema.
> Primary and foreign keys are not compressible.
> Foreign keys are all defined as NOT NULL.
> Keep your AJI lean. Only place foreign key columns in the AJI. Name and/or description columns will result in a larger AJI that takes longer to build and maintain.
> Statistics should be collected on all primary key and foreign key relationship columns. This will assist in the AJI build and optimizer query plans.
> Dimension table primary key columns are defined as unique by either the UNIQUE constraint, unique primary index or unique secondary index.
> Implement referential integrity (RI). RI can be defined with the no-check option, given that integrity exists within your data. Most data warehouses implement integrity checks within their load processes:

    ORGS: Business-->Unit-->Division-->Area-->Sales Center

    ALTER TABLE FACT ADD FOREIGN KEY (SALES_CENTER_ID)
    REFERENCES WITH NO CHECK OPTION SALE_CENTER (SALES_CENTER_ID);

(Note: The above RI relationship is for a star design, i.e., FACT to one dimension table with lowest level. Otherwise, for snowflake design, RI(s) must be defined for each higher-level roll up where each level is in its own table.)

Address the lowest levels in your cube that are not in your AJI by utilizing primary index (PI), secondary index and partitioned primary index (PPI) to optimize detail data access.

Secondary indexes on columns that correspond to low-level members in the dimensional model will provide fast access to rows in your transaction/FACT table. This will enable you to eliminate these columns/values from the AJI, thus making the AJI smaller while still providing access to this level. An example of this would be PRODUCT_ID. The PRODUCT_ID column is normally used in filtering and slicing cubes, and is often a high cardinality column, which makes it a good candidate for a secondary index.

PPIs allow a table to be partitioned on columns of interest while retaining the traditional use of the PI for data distribution and efficient access when PI values are specified in the query. A good candidate for PPI would be the day level within your time dimension. Most cubes do not provide access to day-level data since it is too costly to bring that level of detail into a cube. The same holds true with AJIs; it may be too costly to include day within the AJI, but fast access can be provided to day-level detail using PPI. This will enable larger cubes to be defined.

Another candidate for PPI may be a regional ID such as Branch_ID. If your geographic dimension has many members that can be expressed within a PPI clause, then it may be better to use it for the partitioning scheme. This geographic ID also fits in well if you want to provide regional access to users, such as branch managers, from a single relational cube.

AJI strategy
Determining the columns to participate in the AJI is very important, but can be challenging. A good start would be to draw a red line across your dimensional model one level up from the lowest level of each dimension. (See table 2.) This is called a broad AJI. Single-level dimensions, such as Channel type in this example, are the exception. Since there is no higher level than Channel type in the dimension, it should be included in the broad AJI definition.

Table 2: Dimensional map with broad AJI for Teradata. The columns above the red line should be used in creating a broad AJI.

A good rule for selecting columns in the AJI definition is to include low cardinality columns in the AJI. High cardinality columns are good candidates for secondary indexes on the FACT table and should be excluded from the AJI. High cardinality columns that are defined within the AJI will increase the size of the AJI, thus affecting performance.

For larger cubes that contain more than 40 dimensions, it may be necessary to eliminate seldom-used dimensions from the AJIs. This will ensure that the highest performance is given to navigations that are most often used, as seldom-used navigations will run more slowly. Most business users are willing to accept this trade-off given that they are most likely getting more detail, more dimensions and timelier data with a ROLAP solution.

This is a good initial approach to creating an AJI. As DBAs gain more experience and better understanding of the types of analyses end users are requesting, they can determine more appropriate AJIs to create: whether to build an AJI on a specific subset of dimensions, for instance, or whether to build an AJI across all dimensions and at what dimensional level.

Defining AJI
As an AJI is created, the semantic layer database should be referenced (see the figure) and the following considerations understood:
> Once the SQL is determined, wrap it in the CREATE JOIN INDEX and PRIMARY INDEX syntax and execute the DDL statement via Teradata Queryman or Winddi. Creation time for an AJI will depend on the size of the tables and system usage.
> Notice that the ak.Area_Id referenced in the data definition language (DDL) is from the foreign key value from the SALES_CENTER table (the lowest dimension), not from the AREA table. Hence, unlike star dimensions, in the semantic layer database the higher levels in the AJI definition do not need to be included for the Optimizer to rewrite/use the AJI.
> Whether star, snowflake or 3NF is being used, it is recommended to only place foreign key values (IDs) in the AJI. Placing other values, such as descriptions and attributes, may speed up query performance, but it will increase the size and time to build your AJI. The Teradata Optimizer will take care of SQL requests with Desc or Name by joining to the dimension table on the foreign key value and then aggregating on the attribute or description field.
> The Channel and Business dimensions are star single tables with a one-level hierarchy. Single-level dimensions such as these are the exception. Since there is no higher level in the dimension, it is included in the DDL previously mentioned. Notice that ad.Channel_Id is from the foreign key value from the FACT table.
> Drop and recreate is one method for rebuilding AJIs after the data warehouse, FACT and dimension tables have been refreshed. AJIs rebuild in a fraction of the time it takes to build a MOLAP cube. It is also possible to update the base data with the AJIs in place. The Optimizer will update the base tables and the AJI(s) at the same time. These types of updates can be implemented by TPump or by FastLoading into a staging table, then insert/select into the base table.

The following code is the SQL for building this particular AJI based on the corresponding FACT table:

    -- Broad AJI: one level above the lowest level of each dimension,
    -- carrying only ID columns plus the aggregated measures.
    CREATE JOIN INDEX AJI_Example, NO FALLBACK, CHECKSUM = DEFAULT AS
    SELECT COUNT(*) (FLOAT, NAMED CountStar),
           ae.Brand_Category_Id,
           ac.Product_Category_Id,
           ad.Business_Type_Id,
           ad.Channel_Id,
           ak.Area_Id,
           al.Year,
           al.Quarter,
           al.Month,
           SUM(ad.Sales) (FLOAT, NAMED SALES)
    FROM Product ac,
         Fact ad,
         Brand ae,
         Sales_Center ak,
         Time al
    WHERE ad.product_id = ac.product_id
      AND ad.brand_id = ae.brand_id
      AND ad.sale_center_id = ak.sale_center_id
      AND ad.day = al.day
    GROUP BY ae.Brand_Category_Id,
             ac.Product_Category_Id,
             ad.Business_Type_Id,
             ad.Channel_Id, ak.Area_Id, al.Year,
             al.Quarter, al.Month
    PRIMARY INDEX (Brand_Category_Id,
                   Product_Category_Id, Business_Type_Id,
                   Channel_Id, Area_Id, Year, Quarter, Month);

(Note: In the example, the physical design is made of star and snowflake dimensions to show the various options in selecting an appropriate broad AJI definition.)
> The TIME dimension is a star single table with a four-level hierarchy. The DDL includes higher dimension levels (e.g., Year, Quarter and Month). This is done to ensure the Optimizer will use the AJI for higher-level queries (e.g., Year) and provide optimal performance in the pure star model or dimension.
> The ORG, PRODUCT and BRAND dimensions are snowflake multiple tables. These multiple tables make up each level of the hierarchy. For example, in the ORG dimension (four-level hierarchy), higher-level roll ups are handled via RI between parent and child tables.

Verifying relational queries
It’s always a good idea to check your relational queries against your defined AJI. To do so, capture the SQL request via Teradata Database Query Log access logs. Then check the request using the Teradata Explain command to ensure the AJI is called in the query plan. (See the Explain plan at the end of this article.)

If the AJI does not provide the desired performance, other AJIs can be built to provide faster performance. This can be accomplished by creating AJIs at higher levels than the first AJI, or by removing less-often used dimensions from broad AJIs. Note: Higher-level AJIs, or AJIs on a subset of the dimensions of a broad AJI, will benefit from the first AJI and build in a fraction of the time it took to build the first one.

After all indexes are created, these structures will provide fast query performance for a variety of OLAP queries:
> The broad AJI for most frequently used access paths
> The secondary indexes on high cardinality FACT columns
> The PPI on the DATE column in the FACT table

The only types of queries that are not accounted for in these structures are those that select low-level dimension members across multiple dimensions without qualifying values. An example is, “Give me the SUM of sales by DAY, by PRODUCT, by SALE CENTERS with no qualifications (WHERE criteria).” This request would result in many rows being returned to the client and would not be considered an OLAP query. The client would most likely be transferring a bulk amount of data to a PC for analysis using another tool.

Many opportunities with AJI
This is one approach to using the Teradata Database AJI feature. While MOLAP cubes present some challenges, leveraging AJI in ROLAP enables users to create deeper and wider analytics. By expanding on the possibilities presented through AJI, including the use of secondary indexes and PPIs, the many combinations and various AJI constructs can greatly improve your OLAP experience.

Carlos Bouloy, a senior consultant, has been with Teradata for 16 years and specializes in optimizing applications. His most recent activities have focused on building ROLAP solutions on Teradata.

Rupal Shah is a technical consultant who has been with Teradata for 15 years. Besides working with several Teradata OLAP and business intelligence partners, Rupal has provided database counseling to various Teradata application organizations.

What is an AJI?
An aggregate join index (AJI) is an aggregated result set saved as an index in the database. The AJI will be used automatically by the Teradata Optimizer when like columns and aggregates are made frequently within a query plan.

Explain
1) First, we lock a distinct EXAMPLE."pseudo table" for read on a RowHash to prevent global deadlock for EXAMPLE.AJI_EXAMPLE.
2) Next, we lock EXAMPLE.AJI_EXAMPLE for read.
3) We do an all-AMPs SUM step to aggregate from EXAMPLE.AJI_EXAMPLE by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The size of Spool 3 is estimated with no confidence to be 1,040 rows. The estimated time for this step is 0.18 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1,040 rows. The estimated time for this step is 0.01 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.19 seconds.
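The two-step check described under "Verifying relational queries" can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' exact procedure: it assumes query logging is enabled, that the DBC.DBQLogTbl log table (or an equivalent view, which varies by release) is readable, and that OLAP_USER is a hypothetical logon used by the BI tool. The request being explained reuses table and column names from this article's AJI example, which may need quoting or renaming on a real system where they overlap with SQL keywords.

```sql
-- Step 1: capture recent requests from the query log.
-- 'OLAP_USER' is a hypothetical user name; adjust the filter as needed.
SELECT UserName,
       StartTime,
       QueryText
FROM DBC.DBQLogTbl
WHERE UserName = 'OLAP_USER'
ORDER BY StartTime DESC;

-- Step 2: prefix a captured request with EXPLAIN and read the plan.
-- If the Optimizer chose the AJI, the plan text names the index
-- (AJI_Example in this article) rather than the base tables.
EXPLAIN
SELECT al.Year,
       SUM(ad.Sales) AS Total_Sales
FROM Fact ad,
     Time al
WHERE ad.day = al.day
GROUP BY al.Year;
```

If the plan still scans the base tables, the points raised earlier in the article (RI definitions, NOT NULL foreign keys and collected statistics) are reasonable first things to review before building additional AJIs.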
