
Teradata Performance Tuning - Basic Tips

Performance tuning rules of thumb.



Here are the basic steps used to performance-tune any given query in a given environment. As a prerequisite, make sure
- the user has the proper SELECT rights and appropriate profile settings
- enough spool space is available to run and test the queries

1. Run the EXPLAIN plan (press F6 in SQL Assistant, or prefix the query with EXPLAIN; a sketch follows this list).
Then look for potential problem indicators such as
- "no confidence" or "low confidence" estimates
- product join conditions
- "by way of an all-rows scan" - a full table scan (FTS)
- translate steps (character-set conversion)

Also check for
- DISTINCT or GROUP BY keywords in the SQL query
- IN / NOT IN keywords, and check the list of values generated for them
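
As a minimal illustration of running an EXPLAIN (the database, table, and column names here are hypothetical):

    -- Produce the optimizer's plan without executing the query
    EXPLAIN
    SELECT store_id, SUM(sale_amt)
    FROM sales_db.daily_sales
    WHERE txn_date = DATE '2024-01-15'
    GROUP BY store_id;

Read the output for the phrases listed above; "with no confidence" on a step usually means stats are missing on the columns that step uses.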

APPROACHES

A. In case of product join scenarios, check for
- proper usage of aliases
- joining on matching columns
- usage of join keywords, i.e. specifying the type of join (e.g. inner or outer)
- use of UNION in case of "OR" scenarios (a sketch follows this list)
- statistics collected on the join columns; this is especially important if the columns you are joining on are not unique
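
For the "OR" scenario, a hedged sketch (table and column names are hypothetical): splitting an OR across join conditions into a UNION lets the optimizer plan each branch separately instead of falling back to a product join:

    -- Instead of: ON a.cust_id = b.cust_id OR a.alt_id = b.cust_id
    SELECT a.order_id, b.cust_name
    FROM orders a
    INNER JOIN customers b ON a.cust_id = b.cust_id
    UNION
    SELECT a.order_id, b.cust_name
    FROM orders a
    INNER JOIN customers b ON a.alt_id = b.cust_id;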

B. Collect stats
- Run the command DIAGNOSTIC HELPSTATS ON FOR SESSION; (subsequent EXPLAINs will then append suggested statistics)
- Gather information on the columns on which stats have to be collected
- Collect stats on the suggested columns
- Also check for stats missing on the PI, SI, or columns used in joins - HELP STATISTICS <databasename>.<tablename>;
- Make sure stats are re-collected when at least 10% of the data changes
- Remove unwanted stats, or stats which hardly improve the performance of the queries
- Collect stats on columns instead of indexes, since dropping an index drops its stats as well!
- Collect stats on indexes having multiple columns; this can be helpful when those columns are used in join conditions
- Check that stats are re-created for tables whose structures have changed
A sketch of these commands follows.
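
A minimal sketch of the stats commands above (database, table, and column names are hypothetical):

    -- Ask the optimizer to append stats suggestions to EXPLAIN output
    DIAGNOSTIC HELPSTATS ON FOR SESSION;

    -- See which stats already exist on the table
    HELP STATISTICS sales_db.daily_sales;

    -- Collect stats on a join column and on the PI
    COLLECT STATISTICS ON sales_db.daily_sales COLUMN (store_id);
    COLLECT STATISTICS ON sales_db.daily_sales COLUMN (txn_date);

    -- Refresh all previously defined stats after large data changes
    COLLECT STATISTICS ON sales_db.daily_sales;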

C. Full table scan scenarios
- Try to avoid FTS scenarios, as it may take a very long time to access all the data on every AMP in the system
- Make sure an SI is defined on the columns which are used as part of joins or as an alternate access path
- Collect stats on the SI columns, else there is a chance the optimizer will go for an FTS even when an SI is defined on that particular column (a sketch follows)
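
A hedged sketch (hypothetical names) of giving the optimizer an alternate access path and the stats to cost it:

    -- Define a secondary index on a frequently filtered column
    CREATE INDEX (region_cd) ON sales_db.daily_sales;

    -- Without stats the optimizer may still estimate an FTS as cheaper
    COLLECT STATISTICS ON sales_db.daily_sales COLUMN (region_cd);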

2. If intermediate tables are used to store results, make sure that
- they have the same PI as the source and destination tables (a sketch follows)
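
A minimal sketch (hypothetical names): a volatile intermediate table declared with the same PI as its source keeps the rows on the same AMPs, so the follow-on join or insert avoids redistribution:

    CREATE VOLATILE TABLE stage_sales AS (
      SELECT store_id, txn_date, sale_amt
      FROM sales_db.daily_sales
    ) WITH DATA
    PRIMARY INDEX (store_id)
    ON COMMIT PRESERVE ROWS;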

3. Tune so that the optimizer joins on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed across the AMPs.

4. For a large list of values, avoid using IN / NOT IN in SQLs. Write the large list of values to a temporary table and use that table in the query (a sketch follows).
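
A hedged sketch of the pattern (hypothetical names): load the values into a volatile table and join to it instead of coding a long IN list:

    CREATE VOLATILE TABLE wanted_stores (store_id INTEGER)
    PRIMARY INDEX (store_id)
    ON COMMIT PRESERVE ROWS;

    INSERT INTO wanted_stores VALUES (101);
    INSERT INTO wanted_stores VALUES (102);
    -- ... one row per value, or a bulk load ...

    SELECT s.*
    FROM sales_db.daily_sales s
    INNER JOIN wanted_stores w ON s.store_id = w.store_id;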

5. Know when to use EXISTS / NOT EXISTS conditions, since they ignore unknown comparisons (e.g. a NULL value in the column results in unknown), whereas IN / NOT IN do not. Hence the two forms can lead to inconsistent results (a sketch follows).
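
The classic case, as a sketch (hypothetical names): if ref_table.cust_id contains even one NULL, the NOT IN form returns no rows at all, while NOT EXISTS ignores the NULL and returns the expected rows:

    -- Returns zero rows if any ref_table.cust_id is NULL
    SELECT c.cust_id
    FROM customers c
    WHERE c.cust_id NOT IN (SELECT r.cust_id FROM ref_table r);

    -- Ignores NULLs and returns the non-matching customers
    SELECT c.cust_id
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM ref_table r
                      WHERE r.cust_id = c.cust_id);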

6. Inner vs. outer joins
Check which join works efficiently in the given scenario. Some examples:
- Outer joins can be used when a large table joins with small tables (like a fact table joining with a dimension table on a reference column)
- Inner joins can be used when we need only the matching data, so no extra data is loaded into spool for processing
Please note, for outer join conditions (a sketch follows):
1. A filter condition on the inner table should be placed in the ON clause
2. A filter condition on the outer table should be placed in the WHERE clause
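
A hedged sketch of that ON-vs-WHERE rule (hypothetical names): filtering the inner (dimension) table in ON preserves unmatched fact rows, while a filter on the outer (fact) table belongs in WHERE; putting an inner-table filter in WHERE would silently turn the outer join into an inner join:

    SELECT f.order_id, d.store_name
    FROM fact_sales f
    LEFT OUTER JOIN dim_store d
      ON  f.store_id = d.store_id
      AND d.region_cd = 'WEST'              -- inner-table filter in ON
    WHERE f.txn_date = DATE '2024-01-15';   -- outer-table filter in WHERE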
Teradata SQL Query Optimization / Performance Tuning

SQL and Indexes:

1) Primary indexes: Use primary indexes for joins whenever possible, and specify in the WHERE clause all the columns of the primary index (a sketch follows).
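
A minimal sketch (hypothetical names): when both tables are defined with the same PI column and the join is on that column, the matching rows already live on the same AMPs, so no redistribution is needed:

    -- Both tables defined with PRIMARY INDEX (cust_id)
    SELECT c.cust_name, o.order_amt
    FROM customers c
    INNER JOIN orders o
      ON c.cust_id = o.cust_id   -- join on the PI of both tables
    WHERE c.cust_id = 12345;     -- PI column fully specified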

2) Secondary indexes (the 10% rule rumor): The optimizer does not actually use a 10% rule to determine whether a secondary index will be used, but it is a good estimation: if less than 10% of a table would be accessed via the secondary index, assume the SQL will use it; otherwise, the SQL execution will do a full table scan.


The optimizer actually uses a least-cost method: it determines whether the cost of using a secondary index is cheaper than the cost of doing a full table scan. The cost involves CPU usage and disk I/O counts.

3) Constants: Use constants to specify index column contents whenever possible, instead of specifying the constant once and joining the tables. This may provide a small savings in performance.

4) Mathematical operations: Mathematical operations are faster than string operations (e.g. concatenation), if both can achieve the same result.

5) Variable-length columns: The use of variable-length columns should be minimized and should be the exception. Fixed-length columns should always be used to define tables.

6) Union: The UNION command can be used to break up a large SQL process or statement into several smaller SQL processes or statements, which then run in parallel, but these can cause spool-space limit problems. UNION ALL, by contrast, executes the SQLs single-threaded.

7) WHERE IN / WHERE NOT IN (subquery): The SQL WHERE IN is more efficient than WHERE NOT IN. It is more efficient to specify constants in these, but if a subquery is specified, then the subquery has a direct impact on the SQL time.


If there is an SQL-time problem with the subquery, the subquery could be separated from the original query. This would require two SQL statements and an intermediate table: 1) a new SQL statement which performs the previous subquery's function and inserts into the temporary table, and 2) a modified original SQL statement which drops the subquery and reads the temporary table instead (a sketch follows).
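
A hedged sketch of the split (hypothetical names):

    -- Step 1: materialize the subquery result once
    CREATE VOLATILE TABLE recent_custs AS (
      SELECT DISTINCT cust_id
      FROM orders
      WHERE order_date > DATE '2024-01-01'
    ) WITH DATA
    PRIMARY INDEX (cust_id)
    ON COMMIT PRESERVE ROWS;

    -- Step 2: the original query joins to it instead of re-running
    -- the subquery
    SELECT c.*
    FROM customers c
    INNER JOIN recent_custs r ON c.cust_id = r.cust_id;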

8) Strategic semicolon: At the end of every SQL statement there is a semicolon. In some cases the strategic placement of this semicolon can improve the SQL time of a group of SQL statements, though it will not improve an individual SQL statement's time. A couple of cases: 1) the group's SQL time could be improved if the group of SQL statements shares the same tables (or spool files), 2) the group's SQL time could be improved if several SQL statements use the same Unix input file (a sketch follows).
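
A hedged sketch of the BTEQ convention behind this: when the semicolon is placed at the start of the next statement's line, BTEQ sends both statements as one multi-statement request, letting them share spool and be dispatched together (table names hypothetical):

    -- Two separate requests (semicolons at line ends):
    UPDATE daily_sales SET sale_amt = 0 WHERE store_id = 101;
    UPDATE daily_sales SET sale_amt = 0 WHERE store_id = 102;

    -- One multi-statement request (semicolon starts the next line):
    UPDATE daily_sales SET sale_amt = 0 WHERE store_id = 101
    ;UPDATE daily_sales SET sale_amt = 0 WHERE store_id = 102;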

Reducing Large SQLs:

The following methods can be used to scope down the size of SQLs.

1) Table denormalization: Duplicating data in another table. This provides faster access to the duplicated data, but requires
more update time.

2) Table summarization: The data from one/many table(s) is summarized into commonly used summary tables. This provides
faster access to the summarized data, but requires more update time.

3) SQL union: The DBC/SQL Union can be used to break up a large SQL process or statement into several smaller SQL
processes or statements, which would run in parallel.

4) Unix split: A large input Unix file could be split into several smaller Unix files, which could then be input in series, or in parallel, to create smaller SQL processing steps.

5) Unix concatenation: A large query could be broken up into smaller independent queries, whose output is written to several smaller Unix files. Then these smaller files are concatenated together to provide a single Unix file.

6) Trigger tables: A group of tables, each containing a subset of the keys of the index of an original table. The tables could be created based on some value in the index of the original table. This provides the ability to break up a large SQL statement into multiple smaller SQL statements, but creating the trigger tables requires more update time.

7) Sorts (ORDER BY): Although sorts take time, they are always done at the end of the query, and the sort time is directly dependent on the size of the solution set. Unnecessary sorts could be eliminated.

8) Export/load: Table data could be exported (BTEQ, FastExport) to a Unix file, updated there, and then reloaded into the table (BTEQ, FastLoad, MultiLoad).

9) C programs/Unix scripts: Some data manipulation is very difficult and time-consuming in SQL. It could be replaced with C programs or Unix scripts. See the C/Embedded SQL tip.

Reducing Table Update Time:

1) Table update time can be improved by dropping the table's indexes first and then doing the updates. After the updates complete, rebuild the indexes and recollect the table's statistics on those indexes. The best improvement is obtained when the volume of table updates is large in relation to the size of the table, e.g. when more than 5% of a large table is changed (a sketch follows).
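
A hedged sketch of the pattern (hypothetical names; this applies to secondary indexes, since a table's PI cannot be dropped):

    -- Drop the secondary index before the bulk update
    DROP INDEX (region_cd) ON sales_db.daily_sales;

    -- Run the large update without index maintenance overhead
    UPDATE sales_db.daily_sales SET sale_amt = sale_amt * 1.05;

    -- Rebuild the index and recollect its statistics
    CREATE INDEX (region_cd) ON sales_db.daily_sales;
    COLLECT STATISTICS ON sales_db.daily_sales COLUMN (region_cd);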


2) Try to avoid dropping a table; instead, delete the table's rows. Table-related statements (i.e. CREATE TABLE, DROP TABLE) are single-threaded through a system permissions table and become a bottleneck. They can also cause deadlocks on the dictionary tables. Also, any user permissions specific to the table are dropped when the table is dropped, and these permissions must be recreated (a sketch follows).
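
A minimal sketch (hypothetical name): emptying the table with DELETE keeps the table definition, permissions, and stats definitions in place:

    -- Instead of DROP TABLE followed by CREATE TABLE:
    DELETE FROM sales_db.stage_sales ALL;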
