TD Join Index

Optimization is the technique of selecting the least expensive plan (fastest plan) for the query to fetch results.
How well a query can be optimized depends on the available resources:
1. CPU resources
2. System resources such as AMPs and PEs

If one table in a join is small, it can be duplicated to all the AMPs so that the join is AMP-local. Joins are always performed AMP-locally. When the matching rows are not already on the same AMP, they must first be redistributed or duplicated into spool, which raises the chance of running out of spool space. Performance tuning and optimization of a query involves collecting statistics on join columns, avoiding cross (Cartesian) product joins, selecting an appropriate primary index (to avoid skew in storage), and using secondary indexes. Avoiding unnecessary NUSIs is advisable, because every index adds maintenance overhead.
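The small-table rule above can be illustrated with a toy cost model. This is a hypothetical sketch, not how the Teradata Optimizer actually costs plans: it assumes the only cost is the number of rows written to spool, ignores the case where rows are already co-located on matching primary indexes, and `choose_join_geography` is a made-up name.

```python
def choose_join_geography(small_rows: int, large_rows: int, num_amps: int) -> str:
    """Pick the cheaper way to make a join AMP-local under a toy cost model.

    Assumption (not Teradata's real costing): cost = rows written to spool.
    - Duplicating the smaller table copies every one of its rows to every AMP.
    - Redistributing both tables by the join column moves each row once.
    """
    if small_rows > large_rows:
        small_rows, large_rows = large_rows, small_rows
    duplicate_cost = small_rows * num_amps
    redistribute_cost = small_rows + large_rows
    return "duplicate" if duplicate_cost < redistribute_cost else "redistribute"


# A 10-row lookup table joined to a million-row table on 100 AMPs:
print(choose_join_geography(10, 1_000_000, 100))         # duplicate
# Two large tables: copying either one to every AMP is far too expensive.
print(choose_join_geography(1_000_000, 2_000_000, 100))  # redistribute
```

Under this assumption, duplication wins only while the small table's row count times the AMP count stays below the combined size of both tables, which is why the rule is stated for small tables.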
For a contiguous range of values, the query

SELECT customer_number, customer_name
FROM customer
WHERE customer_number IN (1000, 1001, 1002, 1003, 1004);

can be much less efficient than:

SELECT customer_number, customer_name
FROM customer
WHERE customer_number BETWEEN 1000 AND 1004;

The actual difference depends on how customer_number is indexed.
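Whatever the performance difference, the two predicates are only interchangeable when the IN-list values form an unbroken integer range. A minimal sketch of that check, assuming integer keys; `in_list_as_range` is a hypothetical helper, and it says nothing about which form the Optimizer will execute faster:

```python
from typing import Optional, Sequence, Tuple


def in_list_as_range(values: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (low, high) if the IN-list values form a contiguous integer range.

    Only then does BETWEEN low AND high select exactly the same rows as the IN list.
    """
    if not values:
        return None
    distinct = sorted(set(values))
    low, high = distinct[0], distinct[-1]
    if high - low + 1 != len(distinct):
        return None  # gaps in the list: BETWEEN would return extra rows
    return low, high


rng = in_list_as_range([1000, 1001, 1002, 1003, 1004])
print(rng)  # (1000, 1004)
if rng:
    print(f"WHERE customer_number BETWEEN {rng[0]} AND {rng[1]}")
print(in_list_as_range([1000, 1002, 1004]))  # None
```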
How does indexing improve query performance?
Indexing is a way to physically reorganize the records so that frequently used queries run faster. An index acts as a pointer into a large table: it helps locate the required rows quickly and return them to the user. Alternatively, frequently used queries may not need to touch the large table at all, because they can get everything they need from the index itself (covering queries).

Indexes come with maintenance overhead. Teradata maintains its indexes automatically: each time an INSERT, UPDATE, or DELETE is done on the table, the indexes must also be updated. Users cannot access indexes directly; only the Optimizer has access to them.

The EXPLAIN facility is a Teradata extension that provides an English translation of the steps chosen by the Optimizer to execute an SQL statement. It can be used on any valid Teradata SQL statement by prefixing the statement with the keyword EXPLAIN.

How does Teradata make sure that no duplicate rows are inserted into a SET table?
Teradata sends the newly inserted row to its target AMP based on the row hash of its primary index. If that AMP already holds rows with the same row hash (genuine duplicates or hash synonyms), Teradata compares the whole row to determine whether it is a duplicate. In an INSERT...SELECT, a duplicate row is silently discarded without raising an error; a single-row INSERT of a duplicate, by contrast, returns a duplicate-row error.
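The duplicate check described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: `zlib.crc32` stands in for Teradata's row-hash function, `row_hash % num_amps` stands in for the hash map, rows are Python dicts, and `SetTableSketch` is a hypothetical name. It models the silent discard of an INSERT...SELECT, not the error returned by a single-row INSERT.

```python
import zlib
from collections import defaultdict


class SetTableSketch:
    """Toy model of SET-table duplicate-row checking.

    Assumptions: zlib.crc32 stands in for Teradata's row-hash function, and
    `row_hash % num_amps` stands in for the hash map that assigns hashes to AMPs.
    """

    def __init__(self, pi_columns, num_amps=4):
        self.pi_columns = pi_columns
        self.num_amps = num_amps
        # AMP number -> row hash -> list of rows stored with that hash
        self.amps = defaultdict(lambda: defaultdict(list))

    def row_hash(self, row: dict) -> int:
        key = "|".join(str(row[c]) for c in self.pi_columns)
        return zlib.crc32(key.encode())

    def insert(self, row: dict) -> bool:
        h = self.row_hash(row)
        amp = h % self.num_amps             # only this AMP needs to be checked
        candidates = self.amps[amp][h]      # same row hash: duplicates or hash synonyms
        if any(existing == row for existing in candidates):
            return False                    # full-row match: duplicate, discard it
        candidates.append(row)
        return True


t = SetTableSketch(["emp_no"])
print(t.insert({"emp_no": 1, "name": "Smith"}))  # True
print(t.insert({"emp_no": 1, "name": "Smith"}))  # False (duplicate row)
print(t.insert({"emp_no": 1, "name": "Jones"}))  # True (same PI, different row)
```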
A clique is a set of Teradata nodes that share a common set of disk arrays; cabling a subset of nodes to the same disk arrays creates a clique. A vdisk is a group of disks virtually connected together. Each AMP is assigned one vdisk to work with, so an AMP is responsible for exactly one vdisk.

How do you find the number of records on each AMP for a given table through SQL?

SELECT HASHAMP(HASHBUCKET(HASHROW(PRIMARY_INDEX_COLUMNS))) AS AMP_NO, COUNT(*)
FROM DATABASENAME.TABLE_NAME
GROUP BY 1;

Types of INNER JOINS
1. Ordinary Inner Join
2. Cross Join
3. Self-Join
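A minimal sketch of the three join types using Python lists of dicts as stand-in tables. The sample rows are hypothetical, and the nested loops only show which row pairs each join returns; they are not how Teradata executes joins.

```python
employees = [
    {"emp_no": 1, "name": "Ann", "dept_no": 10, "mgr_no": None},
    {"emp_no": 2, "name": "Bob", "dept_no": 20, "mgr_no": 1},
    {"emp_no": 3, "name": "Cy", "dept_no": 10, "mgr_no": 1},
]
departments = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 30, "dept_name": "Research"},
]

# 1. Ordinary inner join: keep only pairs whose join columns match.
inner = [(e["name"], d["dept_name"])
         for e in employees for d in departments
         if e["dept_no"] == d["dept_no"]]

# 2. Cross join: every employee paired with every department (no condition).
cross = [(e["name"], d["dept_name"]) for e in employees for d in departments]

# 3. Self-join: the table joined to itself, here employee -> manager.
self_join = [(e["name"], m["name"])
             for e in employees for m in employees
             if e["mgr_no"] == m["emp_no"]]

print(inner)       # [('Ann', 'Sales'), ('Cy', 'Sales')]
print(len(cross))  # 6
print(self_join)   # [('Bob', 'Ann'), ('Cy', 'Ann')]
```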
A join index is useful for queries where the index structure contains all of the columns referenced by one or more joins in the query. Join indexes were developed so that frequently executed join queries could be processed more efficiently. Unlike other index types, a join index stores the actual column data. It can optionally also store ROWIDs pointing to the associated base table rows, so that it can still be used when it does not cover every column a query needs.
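Whether a join index can answer a query comes down to which columns it stores. The function below is a simplified, hypothetical model (`join_index_usability` is not a Teradata API): it ignores predicates, aggregation, and cost, and only compares column sets. `Hire_Date` in the usage lines is an invented column that EMP_DEPT_IDX does not carry, and the ROWID case assumes a join index defined with ROWID, which EMP_DEPT_IDX is not.

```python
def join_index_usability(index_columns, query_columns, has_rowid=False):
    """Classify whether a join index can answer a query (simplified).

    - "covering": every column the query references is stored in the index.
    - "partial": some columns are missing, but the index stores ROWIDs, so the
      missing columns could be fetched by joining back to the base table.
    - "not usable": columns are missing and there is no way back to the base rows.
    """
    missing = set(query_columns) - set(index_columns)
    if not missing:
        return "covering"
    return "partial" if has_rowid else "not usable"


emp_dept_idx = ["Employee_No", "Dept_No", "First_Name", "Last_Name",
                "Salary", "Department_Name", "Mgr_No", "Budget"]
print(join_index_usability(emp_dept_idx, ["Last_Name", "Department_Name"]))  # covering
print(join_index_usability(emp_dept_idx, ["Last_Name", "Hire_Date"]))        # not usable
```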
Collect Statistics
COLLECT STATISTICS collects demographic data for one or more columns of a base table, hash index, or join index, computes a statistical profile of the collected data, and stores the synopsis in the data dictionary. The Optimizer uses the synopsis data when it generates its table access and join plans.

Syntax:
COLLECT STATISTICS ON table_name COLUMN column_name;
COLLECT STATISTICS ON table_name COLUMN (column_name, ...);
COLLECT STATISTICS ON table_name INDEX index_name;
COLLECT STATISTICS ON table_name INDEX (column_name, ...);

This list leaves out some details and some other ways of collecting statistics. Statistics can also be collected on join indexes and hash indexes.
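To give a feel for what a statistics "synopsis" contains, here is a deliberately simplified profile of one column. It is an assumption-laden sketch, not Teradata's format: real collected statistics are stored as interval histograms, whereas `column_synopsis` (a hypothetical helper) just counts rows, NULLs, distinct values, and the most frequent values.

```python
from collections import Counter


def column_synopsis(values, top_n=3):
    """Compute a tiny demographic profile of one column (simplified).

    Shows the kind of facts an optimizer wants: row count, NULLs,
    number of distinct values, and the most frequent (skewed) values.
    """
    non_null = [v for v in values if v is not None]
    freq = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_values": len(freq),
        "most_frequent": freq.most_common(top_n),
    }


dept_nos = [10, 10, 10, 20, 20, 30, None, 10]
print(column_synopsis(dept_nos))
# {'row_count': 8, 'null_count': 1, 'distinct_values': 3, 'most_frequent': [(10, 4), (20, 2), (30, 1)]}
```

From such a profile an optimizer could estimate, for example, that `WHERE dept_no = 10` returns about 4 of the 8 rows.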
INDEX
Join Index: The following page shows a pictorial of a Join Index, including the SQL used to CREATE the Join Index and both base tables, the Employee_Table and the Department_Table. The Join Index joins the two tables together and keeps the result set in Perm space. Users never query the Join Index directly; they query only the base tables. When a user joins these two tables, the Parsing Engine decides whether it is faster to build the answer set from the Join Index or to actually perform the join on the base tables. Because the Join Index already holds the result of joining the two tables, the Parsing Engine will usually choose the Join Index rather than join the tables again. When the base tables are updated, so is the Join Index, which keeps everything in sync. This chapter explains the concept behind Join Indexes. In Oracle, these are called Materialized Views.

CREATE JOIN INDEX EMP_DEPT_IDX AS
SELECT Employee_No, E.Dept_No, First_Name, Last_Name, Salary,
       Department_Name, Mgr_No, Budget
FROM Employee_Table AS E
INNER JOIN Department_Table AS D
  ON E.Dept_No = D.Dept_No
PRIMARY INDEX (Employee_No);

Fundamentals of Join Indexes
* Not pointers: actual data is stored
* Users never query them directly
* The PE accesses them for faster access to data
* Updated when the underlying tables are updated
* Take up Perm space
* FastLoad/MultiLoad won't load tables that have them
* Can have Non-Unique Primary Indexes
* Can have Non-Unique Secondary Indexes
* Statistics can be collected on their primary and secondary indexes

Types of Join Index
* Multi-Table Join Index
* Multi-Table Compressed Join Index
* Single-Table Join Index
* Aggregate Join Index
* Sparse Join Index
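The "actual data, kept in sync" behavior of a join index can be sketched as a stored join result that is refreshed whenever a base table changes. This is a hypothetical in-memory model of EMP_DEPT_IDX with only a few columns (`EmpDeptJoinIndex` is an invented name); real join index maintenance happens inside Teradata as part of the base-table update.

```python
class EmpDeptJoinIndex:
    """Toy model of EMP_DEPT_IDX: a stored join result kept in sync with its bases.

    Hypothetical sketch: base tables are dicts keyed by their primary keys, and
    the index is a dict keyed by Employee_No, like PRIMARY INDEX (Employee_No).
    """

    def __init__(self):
        self.employees = {}    # Employee_No -> {"Dept_No": ..., "Last_Name": ...}
        self.departments = {}  # Dept_No -> {"Department_Name": ...}
        self.index = {}        # Employee_No -> joined row (actual data, not pointers)

    def _refresh_employee(self, emp_no):
        emp = self.employees.get(emp_no)
        dept = self.departments.get(emp["Dept_No"]) if emp is not None else None
        if emp is not None and dept is not None:  # INNER JOIN: both sides must exist
            self.index[emp_no] = {"Employee_No": emp_no, **emp, **dept}
        else:
            self.index.pop(emp_no, None)

    def upsert_employee(self, emp_no, row):
        self.employees[emp_no] = row
        self._refresh_employee(emp_no)  # base-table change propagates to the index

    def upsert_department(self, dept_no, row):
        self.departments[dept_no] = row
        for emp_no, emp in self.employees.items():
            if emp["Dept_No"] == dept_no:
                self._refresh_employee(emp_no)

    def delete_employee(self, emp_no):
        self.employees.pop(emp_no, None)
        self.index.pop(emp_no, None)


ji = EmpDeptJoinIndex()
ji.upsert_department(10, {"Department_Name": "Sales"})
ji.upsert_employee(1, {"Dept_No": 10, "Last_Name": "Smith"})
ji.upsert_employee(2, {"Dept_No": 99, "Last_Name": "Jones"})  # no department 99 yet
print(ji.index[1]["Department_Name"])  # Sales
print(2 in ji.index)                   # False
ji.upsert_department(10, {"Department_Name": "Field Sales"})
print(ji.index[1]["Department_Name"])  # Field Sales
```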
Hash Indexes are used much like a Join Index, but Hash Indexes are maintained in AMP-local tables and are used to quickly find certain key columns in a base table. Examples:

CREATE HASH INDEX EMP_Hash_IDX
(Dept_No, First_Name, Last_Name)
ON Employee_Table;  -- Ordered by hash of primary index

CREATE HASH INDEX EMP_Hash_Val
(Dept_No, First_Name, Last_Name)
ON Employee_Table;

Join Index Details to Remember
* Max 64 columns per table per Join Index
* BLOB and CLOB types cannot be defined
* Triggers with Join Indexes allowed (V2R6.2)
* After restoring a table, drop and recreate the Join Index
* Automatically updated as tables change
* FastLoad/MultiLoad won't load tables that have them
* Can have Non-Unique Primary Indexes
* Can have Non-Unique Secondary Indexes
* Statistics can be collected on their primary and secondary indexes
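A hash index can be pictured as a projection of a few columns plus a pointer back to the base row, stored AMP-locally and ordered by row hash. The sketch below is a simplified model under stated assumptions (crc32 in place of Teradata's hash, `hash % num_amps` in place of the hash map, list positions in place of ROWIDs); `build_hash_index` is a hypothetical name.

```python
import zlib
from collections import defaultdict


def build_hash_index(rows, pi_column, index_columns, num_amps=4):
    """Build per-AMP hash index subtables, each ordered by row hash.

    Assumptions (simplified model): zlib.crc32 stands in for the hash of the
    base table's primary index, `hash % num_amps` stands in for the hash map,
    and a row's list position stands in for its ROWID.
    """
    subtables = defaultdict(list)
    for row_id, row in enumerate(rows):
        row_hash = zlib.crc32(str(row[pi_column]).encode())
        projected = tuple(row[c] for c in index_columns)
        # Under the same hash map, the entry sits on the same AMP as its base row.
        subtables[row_hash % num_amps].append((row_hash, projected, row_id))
    for entries in subtables.values():
        entries.sort(key=lambda e: e[0])  # ordered by hash of the primary index
    return dict(subtables)


employee_rows = [
    {"Employee_No": 1, "Dept_No": 10, "First_Name": "Ann", "Last_Name": "Smith"},
    {"Employee_No": 2, "Dept_No": 20, "First_Name": "Bob", "Last_Name": "Jones"},
    {"Employee_No": 3, "Dept_No": 10, "First_Name": "Cy", "Last_Name": "Davis"},
]
idx = build_hash_index(employee_rows, "Employee_No", ["Dept_No", "First_Name", "Last_Name"])
for amp, entries in sorted(idx.items()):
    print(amp, entries)
```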
New in V13: Compression in Your Join Index from the Base Tables
Because many columns contain frequently repeated values, Teradata offers compression. You can compress up to 255 values plus NULL in a column of a table. In the past you could not compress VARCHAR columns, but you can in Teradata V13. Also in the past, when a base table used Multi-Value Compression and you created a Join Index on it, the Join Index did not automatically use the compressed values from the base table. That has changed: if a base table uses Multi-Value Compression and a Join Index is created on that base table, the Join Index is compressed automatically as well. This can save a great deal of space.

V13.10 New! Compression in Join Indexes Taken Automatically from the Base Tables
During the CREATE of a Join Index, columns that are compressed in the base table are automatically carried over from the base table, so the Join Index rows are also compressed. This feature should encourage wider adoption of Join Indexes and Aggregate Join Indexes because of the increased query performance and space savings.
How Compression is implemented:

CREATE TABLE Employee_Table
( Employee_No INTEGER
 ,Dept_No     INTEGER
 ,First_Name  VARCHAR(20)
 ,Last_Name   CHAR(20) COMPRESS ('Smith', 'Wilson', 'Davis', 'Jones')
 ,Salary      DECIMAL(10,2)
) UNIQUE PRIMARY INDEX (Employee_No);

Up to 255 values per column can be compressed (plus NULLs).
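The space savings from a COMPRESS list like the one on Last_Name can be estimated with simple arithmetic. This sketch assumes a fixed-width CHAR column, treats every NULL or listed value as costing zero column bytes, and ignores the presence bits that compression adds to each row header, so it slightly overstates the savings; `mvc_bytes_saved` is a hypothetical helper.

```python
MAX_COMPRESSED_VALUES = 255  # per column, plus NULL


def mvc_bytes_saved(values, compress_list, column_bytes):
    """Rough estimate of bytes saved by multi-value compression on one fixed-width column.

    Simplifications: a row whose value is NULL or in the COMPRESS list stores no
    column bytes, and the presence bits MVC adds to each row header are ignored.
    """
    if len(set(compress_list)) > MAX_COMPRESSED_VALUES:
        raise ValueError("at most 255 values per column can be compressed")
    compressed = set(compress_list)
    return sum(column_bytes for v in values if v is None or v in compressed)


last_names = ["Smith", "Jones", "Garcia", "Smith", None, "Davis", "Lee"]
saved = mvc_bytes_saved(last_names, ["Smith", "Wilson", "Davis", "Jones"], column_bytes=20)
print(saved)  # 100
```

Five of the seven sample rows (four listed names and one NULL) drop their 20-byte Last_Name value, giving 100 bytes.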
MultiLoad
=========
* Loads data into Teradata from a mainframe or LAN flat file.
* Multiple tables can be loaded in the same MultiLoad job.
* Up to 20 INSERTs, UPDATEs, or DELETEs on up to 5 tables.
* UPSERT is supported.
* Target tables cannot have Unique Secondary Indexes (USIs), triggers, Referential Integrity, or Join Indexes.
* Duplicate rows are allowed.
* Each IMPORT task can perform multiple INSERT, UPDATE, and DELETE functions.
* Non-Unique Secondary Indexes (NUSIs) are allowed.
* Locks at the table level.
* Transfers data at the block level.

TPump
=====
* Loads data into Teradata from a mainframe or LAN flat file.
* Processes INSERTs, UPDATEs, or DELETEs.
* Target tables are usually already populated.
* Tables can have secondary indexes and RI.
* Does not support MULTISET tables.
* Locks at the row-hash level.
* Uses time-based checkpoints, not count-based ones.
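The MultiLoad restrictions listed above can be checked mechanically before choosing a utility. A minimal sketch, assuming the table's features are available as a set of hypothetical labels (`usi`, `trigger`, `referential_integrity`, `join_index`); the TPump fallback is only a rough rule of thumb drawn from the two lists, not an official recommendation.

```python
# Restrictions taken from the MultiLoad list; the feature labels are hypothetical.
MULTILOAD_UNSUPPORTED = {"usi", "trigger", "referential_integrity", "join_index"}


def multiload_blockers(table_features):
    """Return the target-table features that would prevent a MultiLoad job (sorted)."""
    return sorted(MULTILOAD_UNSUPPORTED & set(table_features))


def suggest_utility(table_features):
    """Very rough rule of thumb: fall back to TPump if MultiLoad is blocked."""
    return "TPump" if multiload_blockers(table_features) else "MultiLoad"


print(multiload_blockers({"nusi", "usi", "join_index"}))  # ['join_index', 'usi']
print(suggest_utility({"nusi"}))                          # MultiLoad
```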
