Teradata SQL Tuning
Contents
1. Introduction
Reduce the Workload
Balance the Workload
Parallelize the Workload
Upgrade
2. Improve SQL statement tuning
2.1 Reviewing the Execution Plan
2.2 Restructuring SQL statement
2.2.1 AND or ‘=’ clause
2.2.2 IN or BETWEEN clauses
2.2.3 LIKE clause
2.2.4 IN and EXISTS clauses
2.2.5 DISTINCT clause
2.2.6 UNION or UNION ALL clauses
2.2.7 CASE statement
2.2.8 SELECT DISTINCT clause
AND has precedence over OR
2.3 Eliminate the use of Temporary Tables
2.4 Avoid Mixed Type Expressions
3. Indexes
3.1 Secondary Indexes
3.2 Join Index
3.2.1 Single table Join Index
3.2.2 Aggregate Join Index
3.2.3 Sparse Index
3.2.4 Global (Join) Index
3.2.5 Join Index performance
3.3 Partitioned Primary Index
3.4. Index Usage
1. Introduction
The performance of a Teradata SQL-based application depends on several factors, including database
design, network latency, query optimization, and hardware specifications. Poorly tuned queries
often cause performance problems. The objective of tuning a system is either to reduce the
response time for end users of the system or to reduce the resources used to process the same
work. Both objectives can be achieved in several ways:
1. If a commonly executed query needs to access a small percentage of the data in a table,
then it can be executed more efficiently by using an index. Creating and using
such an index reduces the amount of resources used.
2. If a user is looking at the first twenty rows of the 10,000 rows returned in a specific sort
order, and if the query (and sort order) can be satisfied by an index, then the user does
not need to access and sort the 10,000 rows to see the first 20 rows.
Upgrade
A software or hardware upgrade is another means of attaining the desired performance levels. It
should be considered only after SQL tuning, data model restructuring, and database tuning have
been exhausted.
To ensure performance after a major or minor software or hardware upgrade, perform the following
tasks on the production system.
• Recollect statistics where possible.
• Save the EXPLAINs for the base queries.
• Run the test bed again after the upgrade and get the new EXPLAINs.
• Compare the two EXPLAINs.
• Check for faster or slower response on any of the test queries.
• If a query is slower, compare its EXPLAIN output from before and after the upgrade.
2. Improve SQL statement tuning
2.1 Reviewing the Execution Plan
When examining the optimizer execution plan, check the following:
• Whether the driving table has the best filter.
• Whether the join order in each step returns the fewest possible rows to the next step
(that is, the join order should reflect, where possible, going to the best not-yet-used
filters).
• Whether there are any unintentional Cartesian products (even with small tables).
• Whether each table is being accessed efficiently:
Consider the predicates in the SQL statement and the number of rows in the table. Look for
suspicious activity, such as full table scans on tables with a large number of rows that have
predicates in the WHERE clause. Determine why an index is not used for such a selective
predicate.
A full table scan does not necessarily mean inefficiency. It might be more efficient to perform a
full table scan on a small table, or to perform a full table scan to leverage a better join method
(for example, a hash join) for the number of rows returned.
If any of these conditions are not optimal, then consider restructuring the SQL statement or the
indexes available on the tables.
2.2 Restructuring SQL statement
2.2.2 IN or BETWEEN clauses
Assuming there is a useful index on customer_number, the Query Optimizer can locate a
range of numbers (using BETWEEN) much faster than it can find a series of numbers
using the IN clause.
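As a minimal sketch of the two forms, the following uses Python's sqlite3 module as a stand-in for Teradata. The customer table, its contents, and the index name are hypothetical; only the column name customer_number comes from the text. The sketch checks that both predicates select the same rows when the IN list is a run of consecutive values. It does not measure the performance difference, which depends on the Teradata optimizer.

```python
import sqlite3

# SQLite stands in for Teradata here; the table and its data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_number INTEGER, customer_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(n, f"customer {n}") for n in range(995, 1010)],
)
conn.execute("CREATE INDEX idx_customer_number ON customer (customer_number)")

# A series of consecutive values listed one by one.
in_sql = """
    SELECT customer_number FROM customer
    WHERE customer_number IN (1000, 1001, 1002, 1003, 1004)
    ORDER BY customer_number
"""

# The same values expressed as a single range.
between_sql = """
    SELECT customer_number FROM customer
    WHERE customer_number BETWEEN 1000 AND 1004
    ORDER BY customer_number
"""

in_rows = conn.execute(in_sql).fetchall()
between_rows = conn.execute(between_sql).fetchall()
print(in_rows == between_rows)
```

The rewrite only applies when the listed values form a contiguous range; a scattered list of values still needs IN.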
2.2.3 LIKE clause
If LIKE is used in a WHERE clause, try to supply one or more leading characters
in the pattern, if at all possible. For example, use:
If the pattern starts with a leading character, the Query Optimizer can
potentially use an index to perform the query, thereby speeding up performance.
But if the pattern starts with a wildcard, the Query Optimizer will not be
able to use an index and a table scan must be run, which takes more time.
2.2.4 IN and EXISTS clauses
A subquery used with an IN clause can take advantage of the selectivity specified in
the subquery. This is most beneficial when the most selective filter appears in the
subquery and there are indexes on the join columns. Conversely, using EXISTS is
beneficial when the most selective filter is in the parent query. This allows the selective
predicates in the parent query to be applied before filtering the rows against the EXISTS
criteria.
Below are two examples that demonstrate the benefits of IN and EXISTS. Both examples
use the same schema with the following characteristics:
Example 1:
This example demonstrates how rewriting a query to use IN can improve performance.
This query identifies all employees who have placed orders on behalf of customer 144.
Rewriting the statement using IN results in significantly fewer resources being used. The SQL
statement using IN:
Explanation:
In the query using EXISTS, the parent query performs an extra, unnecessary step.
The subquery produces a table that equijoins employees and orders
on employee_id and sales_rep_id and is then filtered for
customer_id = 144. That table is already the required result, but
employee_id is compared once again when the parent query is performed.
In the query using IN, however, the subquery returns only a table filtered on
the specified customer_id, and the join is performed by the parent query. Thus,
the query using the IN clause does much less work than the query
using the EXISTS clause.
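The two statements for Example 1 can be sketched as follows. This is an illustration under stated assumptions, not the original listing: it runs on SQLite through Python's sqlite3 module as a stand-in for Teradata, and the table layouts and sample rows are hypothetical. The column names employee_id, sales_rep_id, and customer_id, and the value 144, come from the explanation above. The sketch checks that both forms return the same employees; the relative cost has to be judged from the Teradata EXPLAIN output.

```python
import sqlite3

# SQLite stands in for Teradata; table layouts and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        sales_rep_id INTEGER
    );
    INSERT INTO employees VALUES (1, 'Abel'), (2, 'Baker'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 144, 1), (11, 144, 1), (12, 200, 2), (13, 144, 3);
""")

# EXISTS form: the selective filter (customer_id = 144) sits in a correlated subquery.
exists_sql = """
    SELECT e.employee_id, e.last_name
    FROM employees e
    WHERE EXISTS (
        SELECT 1 FROM orders o
        WHERE e.employee_id = o.sales_rep_id
          AND o.customer_id = 144
    )
    ORDER BY e.employee_id
"""

# IN form: the subquery is filtered on its own, then matched by the parent query.
in_sql = """
    SELECT e.employee_id, e.last_name
    FROM employees e
    WHERE e.employee_id IN (
        SELECT o.sales_rep_id FROM orders o
        WHERE o.customer_id = 144
    )
    ORDER BY e.employee_id
"""

exists_rows = conn.execute(exists_sql).fetchall()
in_rows = conn.execute(in_sql).fetchall()
print(exists_rows == in_rows)
```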
Example 2:
This example demonstrates how rewriting a query to use EXISTS can improve
performance. This query identifies all employees from department 80 who are sales reps
who have placed orders. The following SQL statement uses IN:
Explanation:
In the query using IN, the subquery returns the entire orders table with the required
column data. The rest of the filtering and the joining are done in the parent query.
In the query using EXISTS, however, the subquery returns a table equijoined between
the employees and orders tables. The parent query then filters it according to
the specified conditions. Thus, the query using the EXISTS clause does
much less work than the query using the IN clause.
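A sketch of the two statements for Example 2 follows, again on SQLite through Python's sqlite3 module as a stand-in for Teradata. The table layouts, the sample rows, and the job_id value 'SA_REP' used to mark sales reps are assumptions made for illustration; department 80 and the employees and orders tables come from the text. Here the selective filters are in the parent query, which is the case the text describes as favoring EXISTS.

```python
import sqlite3

# SQLite stands in for Teradata; layouts, rows and the 'SA_REP' code are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name TEXT,
        department_id INTEGER,
        job_id TEXT
    );
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, sales_rep_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Abel', 80, 'SA_REP'),
        (2, 'Baker', 80, 'SA_REP'),
        (3, 'Chen', 50, 'SA_REP'),
        (4, 'Diaz', 80, 'SA_MAN');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3), (13, 4);
""")

# IN form: the unfiltered subquery returns every sales_rep_id in orders.
in_sql = """
    SELECT e.employee_id, e.last_name
    FROM employees e
    WHERE e.department_id = 80
      AND e.job_id = 'SA_REP'
      AND e.employee_id IN (SELECT o.sales_rep_id FROM orders o)
    ORDER BY e.employee_id
"""

# EXISTS form: the parent filters can be applied before probing orders.
exists_sql = """
    SELECT e.employee_id, e.last_name
    FROM employees e
    WHERE e.department_id = 80
      AND e.job_id = 'SA_REP'
      AND EXISTS (SELECT 1 FROM orders o WHERE e.employee_id = o.sales_rep_id)
    ORDER BY e.employee_id
"""

in_rows = conn.execute(in_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
print(in_rows == exists_rows)
```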
2.2.5 DISTINCT clause
At times the DISTINCT clause is added to every SELECT statement, even when it is not necessary.
It should be used in a SELECT statement only if it is known that
duplicate rows can be returned and that duplicates in the result set
would violate the requirements. The DISTINCT clause creates a lot of extra
work and reduces the physical resources that other SQL statements have at their
disposal, so use it only when it is necessary.
2.2.6 UNION or UNION ALL clauses
When using the UNION statement, keep in mind that, by default, it performs the
equivalent of a SELECT DISTINCT on the final result set. In other words, UNION takes
the results of two like recordsets, combines them, and then performs a SELECT
DISTINCT in order to eliminate any duplicate rows. This process occurs even if there are
no duplicate records in the final recordset. Selecting a distinct result requires building a
temporary worktable, storing all rows in it, and sorting before producing the output. If it is
known that there are duplicate records, and this presents a problem for the application,
then the UNION statement should be used to eliminate the duplicate rows.
On the other hand, if it is known that there will never be any duplicate rows, or if there
are, and this presents no problem to your application, then the UNION ALL statement
should be used instead of the UNION statement. The advantage of UNION ALL is
that it does not perform the SELECT DISTINCT step, which avoids using a lot of
resources unnecessarily. UNION ALL requires no worktable and no
sorting. In most cases it is much more efficient.
Another potential problem with UNION is the danger of flooding the temporary database
with a huge worktable. This may happen if a large result set is expected from a UNION
query.
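The difference in result sets can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in for Teradata, and the two customer tables are hypothetical. It shows only the duplicate handling described above; the worktable and sort cost is specific to the Teradata execution plan and is not measured here.

```python
import sqlite3

# SQLite stands in for Teradata; both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_customers (customer_id INTEGER);
    CREATE TABLE archived_customers (customer_id INTEGER);
    INSERT INTO current_customers VALUES (1), (2), (3);
    INSERT INTO archived_customers VALUES (3), (4);
""")

# UNION removes the duplicate row for customer 3.
union_rows = conn.execute("""
    SELECT customer_id FROM current_customers
    UNION
    SELECT customer_id FROM archived_customers
    ORDER BY customer_id
""").fetchall()

# UNION ALL simply concatenates both inputs and keeps the duplicate.
union_all_rows = conn.execute("""
    SELECT customer_id FROM current_customers
    UNION ALL
    SELECT customer_id FROM archived_customers
    ORDER BY customer_id
""").fetchall()

print(len(union_rows), len(union_all_rows))
```

If the two inputs are known to be disjoint, both forms return the same rows and UNION ALL avoids the duplicate-elimination step.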
2.2.7 CASE statement
Multiple scans can be combined into one scan by moving the WHERE condition of
each scan into a CASE statement, which filters the data for the aggregation.
Example:
The following example asks for the count of all employees who earn less than 2000,
between 2000 and 4000, and more than 4000 each month. This can be done with three
separate queries:
However, it is more efficient to run the entire query in a single statement. Each number is
calculated as one column. The count uses a filter with the CASE statement to count only
the rows where the condition is valid. For example:
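The technique can be sketched as follows. This is an illustration rather than the original listing: it runs on SQLite through Python's sqlite3 module as a stand-in for Teradata, and the employees table with its salary column and sample values is hypothetical. It runs the three separate counts and the single CASE-based statement and compares the results.

```python
import sqlite3

# SQLite stands in for Teradata; the table and salaries are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (salary) VALUES (?)",
    [(1500,), (1800,), (2500,), (3999,), (4000,), (4500,), (6000,)],
)

# Three separate queries: each one scans the table again.
separate_sql = [
    "SELECT COUNT(*) FROM employees WHERE salary < 2000",
    "SELECT COUNT(*) FROM employees WHERE salary BETWEEN 2000 AND 4000",
    "SELECT COUNT(*) FROM employees WHERE salary > 4000",
]
separate_counts = tuple(conn.execute(sql).fetchone()[0] for sql in separate_sql)

# One query: each CASE yields 1 for matching rows and NULL otherwise,
# and COUNT ignores NULLs, so each column counts only its own band.
combined_sql = """
    SELECT
        COUNT(CASE WHEN salary < 2000 THEN 1 END) AS low_band,
        COUNT(CASE WHEN salary BETWEEN 2000 AND 4000 THEN 1 END) AS mid_band,
        COUNT(CASE WHEN salary > 4000 THEN 1 END) AS high_band
    FROM employees
"""
combined_counts = conn.execute(combined_sql).fetchone()

print(separate_counts == combined_counts)
```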
2.2.8 SELECT DISTINCT clause
The DISTINCT option is used in a SELECT statement to filter out duplicate results from a
query's output. In a simple SELECT from one table this is the easiest and quickest way of
doing things. However, with a more complex query, the query can be recoded to gain a
performance advantage. Consider the following example:
Example:
The query returns authors that have a book already published.
The same result can be obtained by writing the query in the following manner:
The second example gives slightly better performance than the first one. The reason is
that the EXISTS clause causes a name to be returned when the first book is found, and
no further books for that author are considered (we already have the author's name and
want to see it only once).
On the other hand, the DISTINCT query returns one copy of the author's name for each
book the author has worked on, and the list of authors generated subsequently needs to
be examined for duplicates to satisfy the DISTINCT clause.
The DISTINCT clause involves a worktable, which the EXISTS clause does not.
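The two query shapes can be sketched as follows, using Python's sqlite3 module as a stand-in for Teradata. The authors and books tables, their columns, and the sample rows are hypothetical, and the sketch assumes author names are unique so that both forms return the same list.

```python
import sqlite3

# SQLite stands in for Teradata; tables and rows are hypothetical.
# Author names are assumed unique, otherwise DISTINCT would merge namesakes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Austen'), (2, 'Borges'), (3, 'Carver');
    INSERT INTO books VALUES (1, 1, 'First'), (2, 1, 'Second'), (3, 2, 'Third');
""")

# DISTINCT form: the join yields one row per book, then duplicates are removed.
distinct_sql = """
    SELECT DISTINCT a.author_name
    FROM authors a
    JOIN books b ON b.author_id = a.author_id
    ORDER BY a.author_name
"""

# EXISTS form: each author qualifies as soon as one matching book is found.
exists_sql = """
    SELECT a.author_name
    FROM authors a
    WHERE EXISTS (SELECT 1 FROM books b WHERE b.author_id = a.author_id)
    ORDER BY a.author_name
"""

distinct_rows = conn.execute(distinct_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
print(distinct_rows == exists_rows)
```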
2.3 Eliminate the use of Temporary Tables
A derived table is the result of using another SELECT statement in the FROM clause of a
SELECT statement. Using derived tables instead of temporary tables can boost the application's
performance. Temporary tables can slow performance dramatically because of the amount of
overhead that goes along with using them. To get
the fastest queries possible, the goal must be to make them do as little work as possible. The
biggest benefit of using derived tables over using temporary tables is that they require fewer
steps, and everything happens in memory instead of a combination of memory and disk. The fewer
the steps involved and the less the I/O, the faster the performance.
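A sketch of the two approaches follows, run on SQLite through Python's sqlite3 module as a stand-in for Teradata. The orders table, its columns, the order_totals name, and the threshold of 100 are hypothetical. The first approach creates and populates a temporary table before querying it; the second obtains the same result in one statement with a derived table.

```python
import sqlite3

# SQLite stands in for Teradata; the orders table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 60), (1, 70), (2, 30), (3, 150);
""")

# Temporary-table approach: two statements, and an extra object to create and fill.
conn.execute("""
    CREATE TEMP TABLE order_totals AS
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
""")
temp_rows = conn.execute("""
    SELECT customer_id, total
    FROM order_totals
    WHERE total > 100
    ORDER BY customer_id
""").fetchall()

# Derived-table approach: the same aggregation as a SELECT in the FROM clause.
derived_rows = conn.execute("""
    SELECT t.customer_id, t.total
    FROM (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    ) AS t
    WHERE t.total > 100
    ORDER BY t.customer_id
""").fetchall()

print(temp_rows == derived_rows)
```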
2.4 Avoid Mixed Type Expressions
Example:
Assume a table in which the column charcol is defined as VARCHAR(2) and is also the primary
index of the table. The following SQL query is run on this table:
where numexpr is an expression of numeric type. In SQL, the default implicit type conversion is
from the character to the numeric data type. Thus, in the above query charcol is converted to
the numeric data type.
Since the primary index is defined on a column of VARCHAR data type and the implicit
conversion makes it numeric, a full table scan is performed. This degrades the performance
of the query.
This can be avoided by specifying an explicit conversion. The rewritten query is:
In this case, since the column on which the primary index is defined remains of the same
type, a single-AMP data retrieval takes place. This improves the performance of the query.
3. Indexes
3.1 Secondary Indexes
Secondary Indexes (SI) supply alternate access paths and the use of appropriate secondary
indexes can increase retrieval performance. For best results, secondary indexes should be based
on frequently used set selections and on equality search.
A table can have up to 32 Secondary Indexes that can be created and dropped dynamically.
However, it is not a good idea to create a number of SIs for each table just to speed up set
selection because SIs consume the following extra resources:
• SIs require additional storage to hold their subtables. In the case of a Fallback table, the
SI subtables are also Fallback, so twice the additional storage space is required.
• SIs require additional I/O to maintain these subtables.
When deciding whether or not to define a NUSI, there are other considerations. The
Optimizer may choose to do a Full Table Scan rather than utilize the NUSI in two cases:
As a guideline, choose only frequently accessed columns as NUSI candidates. After the
table has been loaded, create the NUSI indexes, COLLECT STATISTICS on the indexes, and
then do an EXPLAIN referencing each NUSI. If the Parser chooses a Full Table Scan over using
the NUSI, drop the index.
3.2 Join Index
3.2.1 Single table Join Index
A single-table join index is very useful for resolving joins on large tables without having to
redistribute the joined rows across the AMPs. A single-table join index partitions all or a
subset of a base table using a primary index based on the table’s foreign key (preferably
the primary index of the table to which it is to be joined).
3.2.2 Aggregate Join Index
An aggregate join index can be defined on two or more tables, or on a single table. A
single-table aggregate join index includes:
• SUM function
• COUNT function
• GROUP BY clause
3.2.3 Sparse Index
Sparse indexes index a portion of the table, using WHERE clause predicates to limit the
rows indexed. Allowing constant expressions in the WHERE clause of the CREATE JOIN
INDEX statement limits the rows that are included in the join index to a subset of the rows
in the table based on an SQL query result. This capability in effect allows creation of
sparse indexes. When base tables are large and the typical query references only a portion
of the rows, this feature can be used to reduce the content of the join index to only the
portion of the table that is frequently used.
A sparse index can focus on the portions of the tables that are most frequently used. This
capability:
3.2.4 Global (Join) Index
A global index is a join index that contains the Row IDs of the base
table rows. Queries may use the join index to qualify a few rows, then refer to the base
tables to obtain requested columns that aren't stored in the join index. Such queries are
said to be partially covered by the index, and the index is referred to as a partially
covering global index.
Because the RDBMS supports multi-table, partially covering join indexes, all types of join
indexes, except the aggregate join index, can be joined to their base tables to retrieve
columns that are referenced by a query but are not stored in the join index.
A partially covering join index takes less space than a covering join index. Not all columns
that are involved in a query selection condition have to be stored in a partially covering join
index. The benefits are:
• Disk storage space for the join index decreases when fewer columns are
stored in the join index.
• Performance increases when the number of selection conditions that can
be evaluated on the join index increases.
3.2.5 Join Index performance
To provide the Optimizer with the information needed to generate the best plans,
collect statistics on the primary index columns of each join index.
Consider collecting statistics to improve performance during:
Join indexes, like secondary indexes, incur both space and maintenance costs. For
example, insert, update, and delete operations must be performed twice, once for the
base table and once for the join index. However, if join indexes are suited to your
applications, the improvements in query performance can far outweigh the costs.
MultiLoad and FastLoad utilities cannot be used to load data into base tables
that have an associated join index defined on them, because join indexes are not
maintained during the execution of these utilities. If an error occurs for this reason, the
join index must be dropped and recreated after the table has been loaded. The TPump utility,
which performs standard SQL row inserts and updates, can be used because join indexes are
properly maintained during the execution of such utilities.
3.3 Partitioned Primary Index
A Partitioned Primary Index (PPI) allows the data rows of a table to be:
• Hash partitioned to the AMPs by the hash of the primary index columns
• Partitioned on some set of columns on each AMP
• Ordered by the hash of the primary index columns within that partition
Two functions, RANGE_N and CASE_N, can be used to simplify the specification of a partitioning
expression.
PPI improves performance as follows:
• Automatic optimization occurs for queries that specify a restrictive condition on the
partitioning column.
• Uses partition elimination to improve the efficiency of range searches when, for example,
the searches are range partitioned.
• Only the rows of the qualified partitions in a query need to be accessed, which avoids full
table scans.
• Provides an access path to the rows in the base table while still providing efficient join
strategies
• If the same partition is consistently targeted, the part of the table updated may be able to
fit largely in cache, significantly boosting performance
Benefits that are the result of using PPI vary based on:
• The PI access disadvantage occurs only when the partitioning column is not part of the
PI. In this situation, a query specifying a PI value, but no value for the partitioning column,
must look in each partition for that value, instead of positioning directly to the first row for
the PI value.
• The direct join disadvantage occurs when another table with the same PI is joined with an
equality condition on every PI column. For two non-PPI tables, the rows of the two tables
will be ordered the same, and the join can be performed directly. If one of the tables is
partitioned, the rows won't be ordered the same, and the task, in effect, becomes a set of
sub-joins, one for each partition of the PPI table.
3.4. Index Usage
…with no confidence
• Conditions outside the above.
The execution strategy provided by the EXPLAIN facility gives direct feedback on the steps
the optimizer chooses. Studying EXPLAIN output is an excellent way to learn the inner
workings of the Teradata DBS and how it handles SQL. Using the EXPLAIN facility regularly saves
a lot of time and computing resources by pointing out problems in SQL statements before they
are actually run.