You are on page 1of 3

How to use Join Index (JI) in Teradata

About JIs
Join index is most like a "materialized view", say it is a stored result of an SQL SELECT , like a
table: you can define the primary index (PI) of the stored result.
What are the main differences between a JI and a Secondary Index?

Different internal structure and purposes

Appears as separate object (not a sub-table) in the DBC tables/views

Can reside in different database than the base table(s)

Maintenance is logged in separate DBQL STEP, therefore cost can be easily measured
What is the difference between JI and a regular table in which I can store the same query result?

You can't access it directly from an SQL query

Optimizer automatically decides to be used or not:

By logic: if the JI can support the query with its data content

By cost: if usage of JI will result a "cheaper" execution (always collect statistics!)

Maintained automatically if the content of the base table(s) change


"Locks" the base table(s) against DROP, RENAME, MLOAD, FASTLOAD, RESTORE and ALTER the
indexed column(s).

JI types
The Join Index types I will list below are not differentiated by SQL phrase, but the structure of the
SQL SELECT used in the JI definition.
They can be combined also in reasonable ways, eg. <single table - aggregate - sparse> or <multi table aggregate>, etc.
Let's take these base tables for our examples:
CREATE TABLE TBL_A
(
Col1 integer
, Col2 integer
, Col3 integer
) PRIMARY INDEX (Col1)
;
CREATE TABLE TBL_B
(
Col1 integer
, Col2 integer
) PRIMARY INDEX (Col1)
;

Single table JI
This is the most simple case of a join index. We include only one table, and typically choose different
PI than the base table has. There are two significantly different kinds of usage:

Non-covering
We select only the filtering column(s) (those will be the PI of the JI also) and the "Rowid" pseudo
column in the JI definition. In this case the filter is strongly selective, and the rowids will be put to a

spool to be joined to the base table's appropriate records. The JI can be very small this way, but note
that we have an additional "join phase".

Covering
The JI is selecting all columns (or all columns required by the SQL to be supported) . That means that
the base table is not required to satisfy the query at all. This will result very fast operation. This case
is typically used for eliminating frequent table redistributions of some "central" tables.
This example shows a non-covering index for the query below:
create join index JI_1
as
SELECT Col1,Col2
FROM TBL_A
PRIMARY INDEX(Col2)
;
select Col3 from TBL_A where Col2=1;

Multi table JI
This kind of JI is for accelerating frequent join statements. Technically it stores the result of a join. It
can cover all the columns of just store the key-pairs, or somewhere between.
create join index JI_2
as
SELECT a.Col3,b.Col1
FROM TBL_A a
join
TBL_B b on a.Col2=b.Col2
PRIMARY INDEX(Col3);

Aggregate JI
The JI's SELECT contains GROUP BY clause. This case is for caching frequent aggregations. Typically can
be very useful for supporting those OLAP applications, that do not have internal aggregation-caching
methods Teradata's optimizer is quite clever, because it can recognize "intermediate" aggregate JIs for
further aggregation instead using the base table. Example:
create join index JI_3
as
SELECT Col1,Col2,sum(Col3) X
FROM TBL_A
GROUP BY 1,2
PRIMARY INDEX(Col2);
All three SELECTs can be served from the JI_3:
SQL1: select Col1
, sum(X) from TBL_A group by 1;
SQL1: select Col2
, sum(X) from TBL_A group by 1;
SQL1: select Col1,Col2, sum(X) from TBL_A group by 1,2;

Sparse JI
Thees JIs contain where clause, say the indexing is not for the whole table. If the where condition of
the JI is a logical subset of the supported SQL's where condition than the JI can support the query.
Typical usage is on the transactional tables: we have frequent accessed on a transaction table by
Customer_id, but have PI of Trx_id. We can observe that 90% of the SELECTs fall onto the last 30 days.
We can put a sparse non covering single table JI on the table with PI:Customer_id
90% of the selects will finish in no time, and 10% of the queries will result in a full table scan,
meanwhile our JI size remains pretty small: we index only 30 days instead of eg. 365 days.
Please note that where condition of a sparse JI can not contain "changing" values like current_date.

That means JI must be regularly redefined if a moving window sparse JI is to be maintained.


create join index JI_4
as
SELECT Col1,Col2,Col3 X
FROM TBL_A
WHERE Col3 between 1 and 10
PRIMARY INDEX(Col2);
Serveable query:
select Col1 from TBL_A where Col2=20 and Col3=4;
Non serveable query:
select Col1 from TBL_A where Col2=20 and Col3=11;

When will join index help?


Join indices typically serve two purposes (or the combination of them):

Store the result of frequently produced (part-)results

Provide alternative access path to the data

Practical examples:
Frequent aggregation or join operation

Frequent redistribution of a table

Non primary index based filtering on a table (causing full table scan)

Non primary index based join of a table (causing redistribution)

Important things about JIs


Statistics
Always collect statistics on the PI of the JIs, it is essential for pushing optimizer to use them

Choosing PI
Te PI of the JI will be the most effective access path to the JI's records, like in case of tables, use the
regular PI choosing methodology. Mind frequent access (b which column(s) are filtered or joined),
distribution(skewness) and hash collision.

You might also like