Teradata FAQ

1. What are the reasons for product joins?
   1. Stale or missing stats, causing the optimizer to use a product join
   2. Improper usage of aliases in the query
   3. Missing WHERE clause (or a Cartesian product join such as 1=1)
   4. Non-equality conditions like >, <, or BETWEEN (for example, on a date)
   5. Too few join conditions
   6. OR conditions used in the join
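As a hedged illustration of the alias problem listed above, the sketch below shows how mixing an alias with the full table name can leave two tables without a join condition. The EMPLOYEE and DEPARTMENT tables and the Enum column are the ones used in the join examples of this FAQ; their exact definitions are assumptions.

```sql
-- Problem: EMPLOYEE is aliased as EMP, but the WHERE clause uses the full
-- table name. Teradata treats EMPLOYEE as an additional table reference,
-- so EMP is left without any join condition and is product-joined.
SELECT EMP.Ename, DEP.Deptno
FROM EMPLOYEE EMP, DEPARTMENT DEP
WHERE EMPLOYEE.Enum = DEP.Enum;

-- Fix: use the alias consistently so the equality condition joins EMP to DEP.
SELECT EMP.Ename, DEP.Deptno
FROM EMPLOYEE EMP, DEPARTMENT DEP
WHERE EMP.Enum = DEP.Enum;
```

Running EXPLAIN on both versions is a quick way to confirm whether the product join has gone away.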
2. What are advantages of compression on tables?
   - Compressed columns take less physical space than uncompressed columns, hence reducing space cost.
   - They improve system performance, as less data is retrieved per row fetched and more data is fetched per data block, thus increasing data loading speed.
   - They reduce overall I/O.

3. How many error tables are there in FastLoad and MultiLoad, and what is their significance/use?
   1. FastLoad uses 2 error tables:
      ET TABLE 1: rows where the format of the data is not correct. It maintains only the error field name, error code, and data parcel.
      ET TABLE 2: violations of the UPI.
   2. MultiLoad also uses 2 error tables (ET and UV), 1 work table, and 1 log table:
      1. ET TABLE - Data errors
         MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.
      2. UV TABLE - UPI violations
         MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task.
      Apart from the error tables, it also has work and log tables:
      3. WORK TABLE - WT
         MultiLoad loads the selected records into the work table.
      4. LOG TABLE
         A log table maintains a record of all checkpoints related to the load job. It is mandatory to specify a log table in a MultiLoad job. This table is useful if the job aborts or restarts for any reason.

What are the different types of temporary tables in Teradata?
Answers:
   a. Global temporary tables
   b. Volatile temporary tables
   c. Derived tables
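A minimal sketch of the three temporary table types, one statement each. The table names gtt_emp_stage and vt_emp_stage and their columns are hypothetical; EMPLOYEE, Enum, and salary are taken from the join examples in this FAQ.

```sql
-- Global temporary table: the definition is permanent, the rows are
-- private to each session that uses the table.
CREATE GLOBAL TEMPORARY TABLE gtt_emp_stage (
    emp_no  INTEGER,
    emp_nm  VARCHAR(50),
    sal_amt DECIMAL(10,2)
) PRIMARY INDEX (emp_no)
ON COMMIT PRESERVE ROWS;

-- Volatile table: both definition and rows last only for the session.
CREATE VOLATILE TABLE vt_emp_stage (
    emp_no  INTEGER,
    sal_amt DECIMAL(10,2)
) PRIMARY INDEX (emp_no)
ON COMMIT PRESERVE ROWS;

-- Derived table: exists only while this one query runs.
SELECT d.Enum, d.salary
FROM (SELECT Enum, salary FROM EMPLOYEE WHERE salary > 50000) AS d;
```

ON COMMIT PRESERVE ROWS keeps the rows after each transaction ends; without it, the default is to delete them at commit.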
Global Temporary tables (GTT)
1. When a GTT is created, its definition goes into the Data Dictionary.
2. When materialized, the data goes into temp space.
3. That is why the data is active until the session ends, and the definition remains until it is dropped using a DROP TABLE statement. If dropped from some other session, it should be DROP TABLE ... ALL;
4. You can collect stats on a GTT.
5. Defined with the CREATE GLOBAL TEMPORARY TABLE SQL statement.

Volatile Temporary tables (VTT)
1. Local to a session (deleted automatically when the session terminates).
2. The table definition is stored in system cache. A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables).
3. Data is stored in spool space.
4. That is why both data and table definition are active only until the session ends.
5. No collect stats for VTT. If you are using a volatile table, you cannot put default values at column level (while creating the table).
6. Created by the CREATE VOLATILE TABLE SQL statement.

Derived tables
1. Derived tables are local to an SQL query.
2. Not included in the DBC data dictionary database; the definition is kept in cache.
3. They are specified at query level with an AS keyword in an SQL statement.

Explain plan
1. English version of the optimizer plan
2. To identify the objects used and the kind of locks applied on those objects
3. To identify the number of AMPs in an operation, e.g. single or group AMP
4. To identify data conversion when join columns have different data types
5. To identify translation of character set on join columns
6. To identify the level of confidence
7. To identify the estimated output rows
8. To identify the kind of joins used to process the request
9. To identify the duplication/redistribution of tables
10. To identify dynamic partition elimination if the table has a PPI
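To make the explain plan list above concrete, here is a minimal sketch of requesting a plan. The EMPLOYEE and DEPARTMENT tables and the Enum join column come from the join examples later in this FAQ and are assumptions about the schema. The plan text depends on the system, so none is shown.

```sql
-- Prefix any request with EXPLAIN to get the optimizer's plan in English
-- instead of running the query.
EXPLAIN
SELECT EMP.Ename, DEP.Deptno
FROM EMPLOYEE EMP
INNER JOIN DEPARTMENT DEP
    ON EMP.Enum = DEP.Enum
WHERE EMP.salary > 50000;
```

In the output, look for the lock steps, "single-AMP" versus "all-AMPs" operations, phrases such as "redistributed by the hash code of" or "duplicated on all AMPs", the join type, and the confidence phrase attached to each row estimate.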
High confidence: the optimizer knows the number of rows that will be returned by that step. Examples: PI statistics exist, column or range stats exist, or no join is involved.
Low confidence: some stats are available. A join is involved and stats are available on both sides of the join.
No confidence: no stats are available. A join is involved.

diagnostic helpstats on for session; -- database recommends which statistics to collect

What is the difference between MultiLoad and FastLoad in terms of performance?
Answers:
If you want to load an empty table, use FastLoad. It is more useful than MultiLoad in that case because FastLoad loads the data in 2 phases and does not need a work table. It is therefore faster, and it follows the steps below to load the data into the table:
Phase 1 - It moves all the records to all the AMPs first, without any hashing.
Phase 2 - After the END LOADING command is given, each AMP hashes its records and sends them to the appropriate AMPs.

MultiLoad: it does the loading in 5 phases:
Phase 1 - It gets the import file and checks the script.
Phase 2 - It reads the records from the base table and stores them in the work table.
Phase 3 - In this acquisition phase, it locks the table header.
Phase 4 - In this phase, the DML operations are done on the tables.
Phase 5 - In this phase, the table locks are released and the work tables are dropped.
Types of Teradata Joins

When we join two or more tables on a column or set of columns, joining takes place. This returns the data from matching records in both tables. This universal concept remains the same for all databases.
In Teradata, we have the Optimizer (a very smart interpreter), which determines the type of join strategy to use based on user input, keeping performance in mind.
In Teradata, some of the common join types used are:
- Inner join (can also be a "self join" in some cases)
- Outer join (Left, Right, Full)
- Cross join (Cartesian product join)
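The join types listed above can be sketched as follows. This is only an illustration: it assumes the EMPLOYEE and DEPARTMENT tables used elsewhere in this FAQ, joined on Enum.

```sql
-- Inner join: only rows with a match in both tables.
SELECT EMP.Ename, DEP.Deptno
FROM EMPLOYEE EMP
INNER JOIN DEPARTMENT DEP ON EMP.Enum = DEP.Enum;

-- Left outer join: every EMPLOYEE row; DEP columns are NULL when unmatched.
SELECT EMP.Ename, DEP.Deptno
FROM EMPLOYEE EMP
LEFT OUTER JOIN DEPARTMENT DEP ON EMP.Enum = DEP.Enum;

-- Cross join: every EMPLOYEE row paired with every DEPARTMENT row.
SELECT EMP.Ename, DEP.Deptno
FROM EMPLOYEE EMP
CROSS JOIN DEPARTMENT DEP;
```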
When the user provides a join query, the optimizer comes up with join plans to perform the joins. These join strategies include:
- Merge Join
- Nested Join
- Hash Join
- Product Join
- Exclusion Join
Merge Join
--------------------
Merge join is a concept in which the rows to be joined must be present on the same AMP. If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen, based on the row hash of the columns involved in the join's WHERE clause.
If the two tables to be joined have the same primary index, the records are already present on the same AMP and redistribution of records is not required.
There are four scenarios to consider for redistribution in a Merge Join:
Case 1: If the joining columns are UPI = UPI, the records to be joined are present on the same AMP and redistribution is not required. This is the most efficient and fastest join strategy.
Case 2: If the joining columns are UPI = non-index column, the records in the 2nd table have to be redistributed across the AMPs based on the data corresponding to the first table.
Case 3: If the joining columns are non-index column = non-index column, both tables have to be redistributed so that matching data lies on the same AMP and the join can happen on the redistributed data. This strategy is time consuming, since complete redistribution of both tables takes place across all the AMPs.
Case 4: For a join happening on the primary index, if the referenced table (second table in the join) is very small, then this table is duplicated/copied onto every AMP.

Nested Join
-------------------
Nested Join is one of the most precise join plans suggested by the Optimizer. Nested Join works on a UPI/USI used in the join statement and is used to retrieve a single row from the first table. It then checks for one or more matching rows in the second table, using an index (primary or secondary) on the join column, and returns the matching results.
Example:
    Select EMP.Ename, DEP.Deptno, EMP.salary
    from EMPLOYEE EMP, DEPARTMENT DEP
    Where EMP.Enum = DEP.Enum
      and EMP.Enum = 2345; -- this results in nested join

Hash Join
----------------
Hash join is one of the plans suggested by the Optimizer based on the joining conditions. We can say Hash Join is a close relative of Merge Join based on its functionality. In a merge join, joining happens on the same AMP. In a Hash Join, one or both tables, which are on the same AMP, fit completely inside the AMP's memory. The AMP chooses to hold the small tables in its memory for joins happening on the row hash.
Advantages of Hash joins are:
1. They are faster than Merge joins since the large table doesn't need to be sorted.
2. Since the join happens between a table in AMP memory and a table in unsorted spool, it happens very quickly.

Exclusion Join
-------------------------
This type of join is suggested by the optimizer when the following are used in queries:
- NOT IN
- EXCEPT
- MINUS
- SET subtraction operations
    Select EMP.Ename, EMP.salary
    from EMPLOYEE EMP
    WHERE EMP.Enum NOT IN
          ( Select DEP.Enum from DEPARTMENT DEP where DEP.Enum is NOT NULL );

Please make sure to add an additional WHERE filter with <column> IS NOT NULL, since a NULL in a NOT IN <column> list will return no results.
Exclusion join for the NOT IN query above has 3 scenarios:
Case 1: Matched data in the "NOT IN" subquery will disqualify that row.
Case 2: Non-matched data in the "NOT IN" subquery will qualify that row.
Case 3: Any unknown result in "NOT IN" will disqualify that row ('NULL' is a typical example of this scenario).
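The three cases above can be seen by comparing the unfiltered form of the query with a NOT EXISTS rewrite. This is a sketch that assumes DEPARTMENT.Enum is nullable and contains at least one NULL. The table and column names follow the example above.

```sql
-- Unfiltered NOT IN: for every EMPLOYEE row that has no match, the
-- comparison against the NULL value is unknown (Case 3), so the row is
-- disqualified. Matching rows are disqualified by Case 1. Result: no rows.
SELECT EMP.Ename
FROM EMPLOYEE EMP
WHERE EMP.Enum NOT IN (SELECT DEP.Enum FROM DEPARTMENT DEP);

-- NOT EXISTS: a NULL DEP.Enum never satisfies DEP.Enum = EMP.Enum, so it
-- is simply ignored and unmatched employees are returned.
SELECT EMP.Ename
FROM EMPLOYEE EMP
WHERE NOT EXISTS (
    SELECT 1
    FROM DEPARTMENT DEP
    WHERE DEP.Enum = EMP.Enum
);
```

The two forms agree only when the subquery column has no NULLs, which is why the IS NOT NULL filter matters for NOT IN.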
Performance tuning thumb rules

Here are the very basic steps used to performance-tune (PT) any given query in a given environment.
As a prerequisite, make sure that:
- The user has proper SELECT rights and the actual profile settings
- Enough space is available to run and test the queries

1. Run the explain plan (pressing F6, or EXPLAIN sel * ...). Then look for potential information like:
   - No or low confidence
   - Product join conditions
   - "By way of an all-rows scan" (FTS)
   - Translate
   Also check for:
   - DISTINCT or GROUP BY keywords in the SQL query
   - IN / NOT IN keywords, and check the list of values generated for them

   APPROACHES
   A. In case of product join scenarios, check for:
   - Proper usage of aliases
   - Joining on matching columns
   - Usage of join keywords, like specifying the type of join (e.g. inner or outer)
   - Use UNION in case of "OR" scenarios
   - Ensure statistics are collected on join columns. This is especially important if the columns you are joining on are not unique.
   B. Collect stats
   - Run the command "diagnostic helpstats on for session"
   - Gather information on the columns on which stats have to be collected
   - Collect stats on the suggested columns
   - Also check for stats missing on PI, SI, or columns used in joins: "help stats <databasename>.<tablename>"
   - Make sure stats are re-collected when at least 10% of the data changes
   - Remove unwanted stats, or stats which hardly improve the performance of the queries
   - Collect stats on columns instead of indexes, since dropping an index will drop its stats as well
   - Collect stats on indexes having multiple columns; this might be helpful when these columns are used in join conditions
   - Check that stats are re-created for tables whose structures have changed
   C. Full table scan scenarios
   - Try to avoid FTS scenarios, as it might take a very long time to access all the data on every AMP in the system
   - Make sure an SI is defined on the columns which are used as part of joins or as an alternate access path
   - Collect stats on SI columns, else there is a chance that the optimizer might go for an FTS even when an SI is defined on that particular column
2. If intermediate tables are used to store results, make sure that they have the same PI as the source and destination tables.
3. Tune to get the optimizer to join on the primary index of the largest table, when possible, to ensure that the large table is not redistributed across the AMPs.
4. For a large list of values, avoid using IN / NOT IN in SQL. Write the large list of values to a temporary table and use this table in the query.
5. Be careful when using EXISTS / NOT EXISTS conditions, since they ignore unknown comparisons (e.g. a NULL value in the column results in unknown). This can lead to inconsistent results.
6. Inner vs outer joins
   Check which join works efficiently in the given scenario. Some examples are:
   - Outer joins can be used when a large table joins with small tables (like a fact table joining with a dimension table on a reference column)
   - Inner joins can be used when we get the actual data and no extra data is loaded into spool for processing
   Please note for outer join conditions:
   1. The filter condition for the inner table should be present in the "ON" condition
   2. The filter condition for the outer table should be present in the "WHERE" condition

What are the scenarios in which Full Table Scans occur?
1. The WHERE clause in the SELECT statement does not use either a primary index or a secondary index.
2. The SQL statement uses a partial value (LIKE or NOT LIKE) in the WHERE clause.
3. The SQL statement does not contain a WHERE clause.
4. The SQL statement uses a range in the WHERE clause. Ex. (col1 > 40 or col1 <= 10000)
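The four full table scan scenarios above can be sketched as queries. This is an illustration only: it assumes EMPLOYEE has Enum as its primary index and no secondary index on Ename or salary, which this FAQ does not state.

```sql
-- 1. WHERE clause on a column that is neither a primary nor a secondary index.
SELECT * FROM EMPLOYEE WHERE salary = 50000;

-- 2. Partial value match.
SELECT * FROM EMPLOYEE WHERE Ename LIKE '%son';

-- 3. No WHERE clause.
SELECT * FROM EMPLOYEE;

-- 4. Range condition.
SELECT * FROM EMPLOYEE WHERE salary > 40 OR salary <= 10000;

-- For contrast: an equality condition on the primary index column lets
-- Teradata go to a single AMP instead of scanning every row.
SELECT * FROM EMPLOYEE WHERE Enum = 2345;
```

Prefixing each statement with EXPLAIN and checking for "all-rows scan" shows which access path the optimizer actually picked.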
How many types of index are present in Teradata?
Answer: There are 5 different indexes present in Teradata:
1. Primary Index
   a. Unique primary index
   b. Non-unique primary index
2. Secondary Index
   a. Unique secondary index
   b. Non-unique secondary index
3. Partitioned Primary Index
   a. Case partition (ex. age, salary...)
   b. Range partition (ex. date)
4. Join index
   a. Single-table join index
   b. Multiple-table join index
   c. Sparse join index (constraint applied on the join index in a WHERE clause)
5. Hash index
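A hedged sketch of how several of the index types above are declared. All table, column, and index names here (emp_idx_demo, sales_ppi_demo, emp_case_ppi_demo, emp_high_sal_ji, and their columns) are hypothetical; a hash index is not shown.

```sql
-- 1a. Unique primary index (UPI)
CREATE TABLE emp_idx_demo (
    emp_no   INTEGER NOT NULL,
    badge_no INTEGER NOT NULL,
    dept_no  INTEGER,
    sal_amt  DECIMAL(10,2)
) UNIQUE PRIMARY INDEX (emp_no);

-- 2a. Unique secondary index (USI) and 2b. non-unique secondary index (NUSI)
CREATE UNIQUE INDEX (badge_no) ON emp_idx_demo;
CREATE INDEX (dept_no) ON emp_idx_demo;

-- 1b + 3b. Non-unique primary index with range partitioning on a date
CREATE TABLE sales_ppi_demo (
    sale_id   INTEGER NOT NULL,
    sale_date DATE NOT NULL,
    sale_amt  DECIMAL(12,2)
) PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
    EACH INTERVAL '1' MONTH
);

-- 3a. Case partitioning on a numeric column
CREATE TABLE emp_case_ppi_demo (
    emp_no  INTEGER NOT NULL,
    sal_amt DECIMAL(10,2)
) PRIMARY INDEX (emp_no)
PARTITION BY CASE_N (sal_amt < 30000, sal_amt < 60000, NO CASE, UNKNOWN);

-- 4a + 4c. Single-table sparse join index: the WHERE clause limits which
-- rows are kept in the index.
CREATE JOIN INDEX emp_high_sal_ji AS
SELECT emp_no, dept_no, sal_amt
FROM emp_idx_demo
WHERE sal_amt > 100000
PRIMARY INDEX (dept_no);
```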
