Professional Documents
Culture Documents
About JIs
Join index is most like a "materialized view", say it is a stored result of an SQL SELECT , like a
table: you can define the primary index (PI) of the stored result.
What are the main differences between a JI and a Secondary Index?
Maintenance is logged in separate DBQL STEP, therefore cost can be easily measured
What is the difference between JI and a regular table in which I can store the same query result?
By logic: if the JI can support the query with its data content
JI types
The Join Index types I will list below are not differentiated by SQL phrase, but the structure of the
SQL SELECT used in the JI definition.
They can be combined also in reasonable ways, eg. <single table - aggregate - sparse> or <multi table aggregate>, etc.
Let's take these base tables for our examples:
CREATE TABLE TBL_A
(
Col1 integer
, Col2 integer
, Col3 integer
) PRIMARY INDEX (Col1)
;
CREATE TABLE TBL_B
(
Col1 integer
, Col2 integer
) PRIMARY INDEX (Col1)
;
Single table JI
This is the most simple case of a join index. We include only one table, and typically choose different
PI than the base table has. There are two significantly different kinds of usage:
Non-covering
We select only the filtering column(s) (those will be the PI of the JI also) and the "Rowid" pseudo
column in the JI definition. In this case the filter is strongly selective, and the rowids will be put to a
spool to be joined to the base table's appropriate records. The JI can be very small this way, but note
that we have an additional "join phase".
Covering
The JI is selecting all columns (or all columns required by the SQL to be supported) . That means that
the base table is not required to satisfy the query at all. This will result very fast operation. This case
is typically used for eliminating frequent table redistributions of some "central" tables.
This example shows a non-covering index for the query below:
create join index JI_1
as
SELECT Col1,Col2
FROM TBL_A
PRIMARY INDEX(Col2)
;
select Col3 from TBL_A where Col2=1;
Multi table JI
This kind of JI is for accelerating frequent join statements. Technically it stores the result of a join. It
can cover all the columns of just store the key-pairs, or somewhere between.
create join index JI_2
as
SELECT a.Col3,b.Col1
FROM TBL_A a
join
TBL_B b on a.Col2=b.Col2
PRIMARY INDEX(Col3);
Aggregate JI
The JI's SELECT contains GROUP BY clause. This case is for caching frequent aggregations. Typically can
be very useful for supporting those OLAP applications, that do not have internal aggregation-caching
methods Teradata's optimizer is quite clever, because it can recognize "intermediate" aggregate JIs for
further aggregation instead using the base table. Example:
create join index JI_3
as
SELECT Col1,Col2,sum(Col3) X
FROM TBL_A
GROUP BY 1,2
PRIMARY INDEX(Col2);
All three SELECTs can be served from the JI_3:
SQL1: select Col1
, sum(X) from TBL_A group by 1;
SQL1: select Col2
, sum(X) from TBL_A group by 1;
SQL1: select Col1,Col2, sum(X) from TBL_A group by 1,2;
Sparse JI
Thees JIs contain where clause, say the indexing is not for the whole table. If the where condition of
the JI is a logical subset of the supported SQL's where condition than the JI can support the query.
Typical usage is on the transactional tables: we have frequent accessed on a transaction table by
Customer_id, but have PI of Trx_id. We can observe that 90% of the SELECTs fall onto the last 30 days.
We can put a sparse non covering single table JI on the table with PI:Customer_id
90% of the selects will finish in no time, and 10% of the queries will result in a full table scan,
meanwhile our JI size remains pretty small: we index only 30 days instead of eg. 365 days.
Please note that where condition of a sparse JI can not contain "changing" values like current_date.
Practical examples:
Frequent aggregation or join operation
Non primary index based filtering on a table (causing full table scan)
Choosing PI
Te PI of the JI will be the most effective access path to the JI's records, like in case of tables, use the
regular PI choosing methodology. Mind frequent access (b which column(s) are filtered or joined),
distribution(skewness) and hash collision.