Introduction To Teradata and Teradata Architecture

Introduction to Teradata and Teradata
Architecture
Agenda
Teradata Overview
Teradata Components, Architecture and Teradata Features
Teradata Data Dictionary and Objects
Teradata Database, User and Space
Teradata Data Recovery and Protection
Teradata Overview
Teradata Overview
Teradata is a Relational Database Management System (RDBMS)
Designed specifically for DW and DSS
Open, UNIX-MP-RAS or NT-based system platforms
Compliant with ANSI industry standards
Runs on single (SMP) or multiple (MPP) nodes
Database server to client applications throughout the enterprise
Uses parallelism to manage terabytes of data
Shared nothing Architecture
Teradata Overview
Enormous capacity
Billions of rows
Terabytes of data
High-performance parallel processing
Single database server for multiple clients Single Version of the Truth
Network and mainframe connectivity
Industry standard access language (SQL)
Manageable growth via modularity
Fault tolerance at all levels of hardware and software
Teradata Components, Architecture and Features
Components and Architecture
Teradata Functional Overview
Teradata Features
Teradata Data Dictionary and Objects
The DD/D ...
is an integrated set of system tables
contains definitions of and information about all objects in the system
is entirely maintained by the RDBMS
is data about the data or metadata
is distributed across all AMPs like all tables
may be queried by administrators or support staff
is accessed via Teradata supplied views
Examples of DD/D views:

DBC.Tables - information about all tables
DBC.Users - information about all users
DBC.AllRights
- information about access rights
DBC.AllSpace
- information about space utilization
HELP Commands Teradata SQL Extension

Databases and Users:
HELP
DATABASE
HELP
USER
Customer_Service ;
Dave_Jones ;
Tables, Views, and Macros:

HELP
TABLE
Employee ;
HELP
VIEW
Emp;
HELP
MACRO Payroll_3;
HELP
COLUMN Employee.*;
Employee.last_name;
Emp.* ;
Emp.last;
HELP
INDEX
Employee;
HELP
STATISTICS
HELP
CONSTRAINT
Employee;
Employee.over_21;
Example of HELP DATABASE

Command:
HELP DATABASE Customer_Service;

*** Help information returned. 15 rows.
*** Total elapsed time was 1 second.
Table/View/Macro name
Contact T
?
Customer T
?
Cust_Comp_Orders V
Cust_Pend_Orders V
Cust_Order_ix I
?
Department
T
?
Employee T
?
Employee_Phone
T
Job T
?
Location T
?
Location_Employee T
Location_Phone
T
Orders
T
?
Set_Ansidate_on
M
Set_Integerdate_on M
Kind Comment
?
?
?
?
?
?
?
SHOW Command Teradata SQL Extension

SHOW commands display how an object was created.
Command
Returns statement
SHOW TABLE tablename;
CREATE TABLE statement
SHOW VIEW
viewname;CREATE VIEW ...
SHOW MACRO macroname;
CREATE MACRO ...
SHOW TRIGGER
triggername;
CREATE TRIGGER
SHOW PROCEDURE procedurename;
CREATE PROCEDURE
SHOW TABLE Employee;
CREATE SET TABLE CUSTOMER_SERVICE.Employee, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
employee_number INTEGER,
manager_employee_number INTEGER,
department_number INTEGER,
job_code INTEGER,
:
salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );
EXPLAIN Facility Teradata SQL Extension

The EXPLAIN modifier in front of any SQL statement generates an English translation of the Parsers
plan.
The request is fully parsed and optimized, but not actually executed.
EXPLAIN returns:
Text showing how a statement will be processed (a plan)
An estimate of how many rows will be involved
A relative cost of the request (in units of time)
This information is useful for:

predicting row counts
predicting performance
testing queries before production
analyzing various approaches to a problem EXPLAIN
EXPLAIN SELECT last_name, department_number FROM Employee ;

Explanation (partial):
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.Employee by way of an all-rows
scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of
Spool 1 is estimated to be 24 rows. The estimated time for this step is 0.15 seconds.
Teradata Database, User and Space
Storing and Accessing Data Rows
Storing Rows
AMP
AMP
AMP
AMP
Table A rows
Table B rows
The rows of every table are distributed among all AMPs
Each AMP is responsible for a subset of the rows of each table.
Ideally, each table will be evenly distributed among all AMPs.
Evenly distributed tables result in evenly distributed workloads.
The uniformity of distribution of the rows of a table depends on the choice of the
Primary Index.
Creating a Primary Index
A Primary Index is defined at table creation.
It may consist of a single column, or a combination of columns
Limit of 16 columns with V2R4.1 and prior releases
Limit of 64 columns with V2R5.
UPI
NUPI
CREATE TABLE sample_1

(col_a
INTEGER
,col_b
INTEGER
,col_c
INTEGER)
UNIQUE PRIMARY INDEX (col_b);
If the index choice of column(s) is unique,

we call this a UPI (Unique Primary Index).

(col_x
INTEGER
,col_y
INTEGER
,col_z
INTEGER)
PRIMARY INDEX (col_x);
If the index choice of column(s) isnt unique,

we call this a NUPI (Non-Unique Primary
Index).
Note: Changing the choice of Primary Index

requires dropping and recreating the table.
A UPI choice will result in even distribution

of the rows of the table across all AMPs.
A NUPI choice will result in even distribution

of the rows of the table proportional to the
degree of uniqueness of the index.
Primary Index Values
The value of the Primary Index for a specific row determines the AMP assignment for
that row.
This is done using a hashing algorithm.
PE
Row assignment
Row access
AMP
Secondary index access

Full table scans
Hashing
Algorithm
AMP
Accessing the row by its Primary Index value is:
Other table access techniques:
PI Value
always a one-AMP operation

the most efficient way to access a row
AMP
Accessing Via a Unique Primary Index

A UPI access is a one-AMP operation which may access at most a single row.
(col_a
INTEGER
,col_b
INTEGER
,col_c
INTEGER)
UNIQUE PRIMARY INDEX (col_b);
PE
SELECT col_a
,col_b
,col_c
FROM
sample_1
WHERE col_b = 345;
UPI = 345
Hashing
Algorithm
AMP
AMP
col_a col_b
col_c
AMP
col_a col_b
col_c
col_a col_b
123
345
567
234
456
678
col_c
Accessing Via a Non-Unique Primary Index

A NUPI access is a one-AMP operation which may access multiple rows.
(col_x
INTEGER
,col_y
INTEGER
,col_z
INTEGER)
PRIMARY INDEX (col_x);
PE
NUPI = 25
SELECT col_x
,col_y
,col_z
FROM
sample_2
WHERE col_x = 25;
Hashing
Algorithm
AMP
Both UPI and NUPI

accesses are one
AMP operations.
AMP
col_x col_y
col_z
AMP
col_x col_y
col_z
col_x col_y
col_z
10
30
20
50
70
10
30
25
55
30
80
35
40
25
60
30
80
Primary Keys and Primary Indexes
Indexes are conceptually different from keys.
A PK is a relational modeling convention which allows each row to be uniquely identified.
A PI is a Teradata convention which determines how the row will be stored and accessed.
A significant percentage of tables may use the same columns for both the PK and the PI.
A well-designed database will use a PI that is different from the PK for some tables.
Primary Key
Primary Index
Logical concept of data modeling
Physical mechanism for access and storage
Teradata doesnt need to recognize
Each table must have exactly one primary index
No limit on number of columns
16 column limit (V2R4.1); 64 column limit (V2R5)
Documented in data model
Defined in CREATE TABLE statement
(Optional in CREATE TABLE)

Must be unique
May be unique or non-unique
Identifies each row
May be unique or non-unique
Values should not change
Values may be changed (Delete + Insert)
May not be NULL requires a value
May be NULL
Does not imply an access path
Defines most efficient access path
Chosen for logical correctness
Chosen for physical performance
Duplicate Rows
A duplicate row is a row of a table
whose column values are all identical to
another row in the same table.
Duplicate Rows
col_a col_b
col_c
20
50
25
50
25
50
Because a PK uniquely identifies each row, ideally a relational table should not have
duplicate rows!
The ANSI standard, however, permits duplicate rows for specialized situations, thus
Teradata permits them as well.
You may select whether your table will or will not allow them.
The Teradata default
The ANSI default
CREATE SET TABLE table_A

:
:
CREATE MULTISET TABLE table_B

:
:
Checks for * and disallows duplicate rows.
Doesnt check for and allows duplicate rows.
* Note: If a UPI is selected on a SET table, the duplicate row check is replaced by a
check for duplicate index values.
Row Distribution Using a UPI Case 1
Order
Notes:
Often, but not always, the PK column(s) will be
used as a UPI.
PI values for Order_Number are known to be
unique (its a PK).
Teradata will distribute different index values
evenly across all AMPs.
Resulting row distribution among AMPs is very
uniform.
Assures maximum efficiency for parallel
operations.
AMP
o_#
c_#
7202
7415
AMP
o_dt o_st
o_#
c_#
4/09
7325
4/13
AMP
o_dt o_st
o_#
c_#
4/13
7188
7103
4/10
7225
7402
4/16
AMP
o_dt o_st
o_#
c_#
o_dt o_st
4/13
7324
4/13
4/15
7384
4/12
Row Distribution Using a NUPI Case 2
Order
Notes:
Customer_Number may be the preferred access
column for ORDER table, thus a good index
candidate.
Values for Customer_Number are somewhat nonunique.
Choice of Customer_Number is therefore a NUPI.
Rows with the same PI value distribute to the same
AMP.
Row distribution is less uniform or skewed.
AMP
o_#
c_#
7325
AMP
o_dt o_st
o_#
c_#
4/13
7384
7202
4/09
7225
4/15
AMP
AMP
o_dt o_st
o_#
c_#
o_dt o_st
4/12
7402
4/16
7103
4/10
7324
4/13
7415
4/13
7188
4/13
Row Distribution Using a Highly Non-Unique Primary Index

(NUPI) Case 3
Order
Notes:
Values for Order_Status are highly nonunique.
Choice of Order_Status column is a NUPI.
Only two values exist, so only two AMPs
will ever be used for this table.
Table will not perform well in parallel
operations.
Highly non-unique columns are poor PI
choices generally.
The degree of uniqueness is critical to
efficiency.
AMP
AMP
AMP
AMP
o_dt o_st
o_#
c_#
4/16
7103
4/10
7202
4/09
7324
4/13
7225
4/15
7325
4/13
7415
4/13
7188
4/13
7384
4/12
o_#
c_#
7402
o_dt o_st
Primary Index
Mechanics
Hashing Primary Index Values

SQL with primary index values
and data.
PARSER
PI value = 38
Hashing
Algorithm
For example:
Assume PI value is 38
Hashing
Algorithm
Row Hash
DSW
PI values
and data
1177 7C3C
DSW
Message Passing Layer (Hash Maps)

AMP 0
AMP 1
...
...
AMP x
AMP n - 1
Hash Maps
AMP n
AMP #
Data Table
Row ID
Row Hash
Row Data
Uniq Value
The MPL uses the DSW of 1177

and uses this value to locate
bucket #1177 in the Hash Map.
x '00000000'
RH
Data
x'1177 7C3C' 0000 0001
x 'FFFFFFFF'
Summary
38
Bucket# 1177 contains the AMP

number that has this hash
value effectively the AMP with
this row.
Hashing Down to the AMPs

Index value(s)
hashing algorithm
Row Hash
DSW or
Hash Bucket #
Hash Map
AMP #
{
{
{
{
The hashing algorithm is designed to insure even distribution of

unique values across all AMPs.
Different hashing algorithms are used for different international
character sets.
A Row Hash is the 32-bit result of applying a hashing algorithm to

an index value.
The DSW or Hash Bucket is represented by the high order 16 bits
of the Row Hash.
A Hash Map is uniquely configured for each system.

It is a array of 65,536 entries (buckets) which associates bucket
numbers with specific AMPs.
Two systems with the same number of AMPs will have the same
Hash Map.
Changing the number of AMPs in a system requires a change to
the Hash Map.
Identifying Rows
A row hash is not adequate to uniquely identify a row.
Consideration #1
1254
A Row Hash = 32 bits = 4.2 billion possible values
Because there is an infinite number of possible
data values, some data values will have to share
the same row hash.
7769
Data values input
Hash Algorithm
10A2 2936
10A2 2936
Hash Synonyms
(Dave)
'Smith'
NUPI Duplicates
Consideration #2
A Primary Index may be non-unique (NUPI).
(John)
'Smith'
Different rows will have the same PI value and thus

the same row hash.
Hash Algorithm
0016 5557
Conclusion
A row hash is not adequate to uniquely identify a row.
0016 5557
Rows have
same hash
The Row ID
TO UNIQUELY IDENTIFY A ROW, WE ADD A 32-BIT UNIQUENESS VALUE.
THE COMBINED ROW HASH AND UNIQUENESS VALUE IS CALLED A ROW ID.
Row ID
Each stored row
has a Row ID as a
prefix.
Rows are logically

maintained in Row
ID sequence.
Row Hash
(32 bits)
Uniqueness Id
(32 bits)
Row ID
Row Data
Row ID
Row Data
Row Hash
Unique ID
Emp_No
3B11 5032
3B11 5032
3B11 5032
3B11 5033
3B11 5034
3B11 5034
:
0000 0001
0000 0002
0000 0003
0000 0001
0000 0001
0000 0002
:
1018
1020
1031
1014
1012
1021
:
Last_Name
Reynolds
Davidson
Green
Jacobs
Chevas
Carnet
:
First_Name
Jane
Evan
Jason
Paul
Jose
Jean
:
Secondary Indexes and Table Scans
Secondary Indexes
There are 3 general ways to access a table:
Primary Index access (one AMP access)
Secondary Index access
Full Table Scan
(two or all AMP access)
(all AMP access)
A secondary Index provides an alternate path to the rows of a table.

A table can have from 0 to 32 secondary indexes.
Secondary Indexes:
Do not effect table distribution.
Add overhead, both in terms of disk space and maintenance.
May be added or dropped dynamically as needed.
Are chosen to improve table performance.
Choosing a Secondary Index

A Secondary Index may be defined ...
at table creation
(CREATE TABLE)
following table creation
(CREATE INDEX)
may be up to 16 columns (V2R4.1); 64 columns (V2R5.0)

NUSI
USI
If the index choice of column(s) is
unique, it is called a USI.
Unique Secondary Index)
If the index choice of column(s) is nonunique, it is called a NUSI.

Non-Unique Secondary Index
Accessing a row via a USI is a 2 AMP

operation.
Accessing row(s) via a NUSI is an all AMP

operation.
CREATE UNIQUE INDEX
CREATE INDEX
(Employee_Number) ON Employee;
(Last_Name) ON Employee;
Notes:
Secondary Indexes cause an internal sub-table to be built.
Dropping the index causes the sub-table to be deleted.
Unique Secondary Index (USI) Access

Message Passing Layer
Create USI
CREATE UNIQUE INDEX
(Cust) ON Customer;
AMP 1
AMP 2
USI Subtable
Access via USI
RowID
244, 1
505, 1
744, 4
757, 1
SELECT *
FROM
Customer
WHERE Cust = 56;
Cust
74
77
51
27
RowID
884, 1
639, 1
915, 9
388, 1
AMP 3
USI Subtable
RowID
135, 1
296, 1
602, 1
969, 1
Cust
98
84
56
49
AMP 1
778
AMP 2
Base Table
Table ID
100
Row Hash USI Value

602
to MPL
56
Cust
31
40
45
95
RowID
638, 1
640, 1
471, 1
778, 3
RowID
175, 1
489, 1
838, 4
919, 1
Cust
37
72
12
62
RowID
107, 1
717, 2
147, 2
822, 1
USI Value = 56
Hashing
Algorithm
RowID
288, 1
339, 1
372, 2
588, 1
USI Subtable
Row Hash Unique Val
100
Customer
Table ID = 100
USI Subtable
RowID
555, 6
536, 5
778, 7
147, 1
Table ID
PE
AMP 4
RowID Cust Name

USI
107, 1 37 White
536, 5 84 Rice
638, 1 31 Adams
640, 1 40 Smith
Phone
NUPI
555-4444
666-5555
111-2222
222-3333
AMP 3
Base Table
RowID Cust Name
USI
471, 1 45 Adams
555, 6 98 Brown
717, 2 72 Adams
884, 1 74 Smith
Phone
NUPI
444-6666
333-9999
666-7777
555-6666
AMP 4
Base Table
RowID Cust Name
USI
147, 1 49 Smith
147, 2 12 Young
388, 1 27 Jones
822, 1 62 Black
Phone
NUPI
111-6666
777-4444
222-8888
444-5555
Base Table
RowID Cust Name
USI
639, 1 77 Jones
778, 3 95 Peters
778, 7 56 Smith
915, 9 51 Marsh
Phone
NUPI
777-6666
555-7777
555-7777
888-2222
Non-Unique Secondary Index (NUSI) Access

Create NUSI
CREATE INDEX (Name) ON
Customer;
AMP 1
AMP 2
NUSI Subtable
Access via NUSI

SELECT *
FROM
Customer
WHERE Name = 'Adams';
RowID
432, 8
448, 1
567, 3
656, 1
Name
Smith
White
Adams
Rice
RowID
640, 1
107, 1
638, 1
536, 5
AMP 3
NUSI Subtable
RowID Name
432, 3 Smith
567, 2 Adams
852, 1
Brown
RowID
884, 1
471, 1
717, 2
555, 6
AMP 4
NUSI Subtable
RowID
432, 1
448, 4
567, 6
770, 1
Name
Smith
Black
Jones
Young
RowID
147, 1
822, 1
338, 1
147, 2
NUSI Subtable
RowID
155, 1
396, 1
432, 5
567, 1
Name
Marsh
Peters
Smith
Jones
RowID
915, 9
778, 3
778, 7
639, 1
PE
Customer
NUSI Value = 'Adams'
Table ID = 100
Hashing
Algorithm
Table ID
100
to MPL
AMP 2
Base Table
Row Hash NUSI Value

567
AMP 1
Adams
RowID Cust Name

NUSI
107, 1 37 White
536, 5 84 Rice
638, 1 31 Adams
640, 1 40 Smith
Phone
NUPI
555-4444
666-5555
111-2222
222-3333
AMP 3
Base Table
RowID Cust Name
NUSI
471, 1 45 Adams
555, 6 98 Brown
717, 2 72 Adams
884, 1 74 Smith
Phone
NUPI
444-6666
333-9999
666-7777
555-6666
AMP 4
Base Table
RowID Cust Name
NUSI
147, 1 49 Smith
147, 2 12 Young
388, 1 27 Jones
822, 1 62 Black
Phone
NUPI
111-6666
777-4444
222-8888
444-5555
Base Table
RowID Cust Name
NUSI
639, 1 77 Jones
778, 3 95 Peters
778, 7 56 Smith
915, 9 51 Marsh
Phone
NUPI
777-6666
555-7777
555-7777
888-2222
Full Table Scans

Every row of the table must be read.
All AMPs scan their portion of the table in parallel.
Fast and efficient on Teradata due to parallelism.
Full table scans typically occur when either:
An index is not used in the query
An index is used in a non-equality test
Customer
Cust_ID
USI
Cust_Name
Cust_Phone
NUPI
Examples of Full Table Scans:

SELECT * FROM Customer WHERE Cust_Phone LIKE '524-_ _ _ _';
SELECT * FROM Customer WHERE Cust_Name = 'Davis';
SELECT * FROM Customer WHERE Cust_ID > 1000;
Teradat Data Recovery and Protection
Data Protection Features

Facilities that provide system-level protection
Disk Arrays
RAID data protection (e.g., RAID 1)
Redundant SCSI buses and array controllers
Cliques and Vproc Migration
SMP or O.S. failures - Vprocs can migrate to other nodes within the clique.
Facilities that provide Teradata DB protection
Locks
provides data integrity
Fallback
provides data access with a down AMP
Down AMP Recovery Journal
fast recovery of fallback rows for AMPs
Transient Journal
automatic rollback of aborted transactions
Permanent Journal
optional before and after-image journaling
RAID Technologies
RAID - Redundant Array of Independent Disks
RAID technology provides data protection at the disk drive level. With RAID 1
and RAID 5 technologies, access to the data is continuous even if a disk
drive fails.
RAID technologies available with Teradata:
RAID 1
For every disk there is a mirror disk available.
RAID 5
For every 3 disks there is a parity disk available.If any failure

happened at the disk level the parity disk will be used.
RAID 1 Mirroring
2 Drive Groups each with 1 mirrored pair of disks
Operating system sees 2 logical disks (LUNs) or volumes
If LUN 0 has more activity , more disk I/Os occur on the first two drives in
the array.
Disk Array Controller
LUN 1
LUN 0
Disk 1
Mirror 1
Disk 3
Mirror 3
Block A0
Block A0
Block B0
Block B0
Block A1
Block A1
Block B1
Block B1
Block A2
Block A2
Block B2
Block B2
Block A3
Block A3
Block B3
Block B3
2 Drive Groups each with 1 pair of

mirrored disks
If physical drives are 36 GB each, then each logical unit

(LUN) or volume is effectively 36 GB.
Characteristics
RAID 1
Summary
data is fully replicated

striped mirroring is possible with multiple pairs of disks in a drive group
transparent to operating system
Advantages
maximum data availability

read performance gains
no performance penalty with write operations
fast recovery and restoration
Disadvantages
50% of disk space is used for mirrored data
Summary
RAID 1 provides high data availability and performance, but storage costs are higher
RAID 5 Data Parity Protection

Sometimes referred to as 3 + 1 RAID 5.
When data is updated, parity is also updated.
new_data XOR current_data XOR current_parity = new_parity
Disk Array Controller
LUN 0
Disk 4
Disk 1
Disk 2
Disk 3
Block 0
Block 1
Block 2
Parity
Block 3
Block 4
Parity
Block 5
Block 6
Parity
Block 7
Block 8
Parity
Block 9
Block 10
Block 11
Block 12
Block 13
Block 14
Parity
If physical drives are 36 GB each, then each logical unit

(LUN) or volume is effectively 108 GB.
RAID 5 Summary
Characteristics
data and parity is striped and interleaved across multiple disks
XOR logic is used to calculate parity
transparent to operating system
Advantages
provides high availability with minimum disk space (e.g., 25%) used for
parity overhead
Disadvantages
write performance penalty
performance degradation during data recovery and reconstruction
Summary
High data availability with minimum storage cost
Good choice when majority of I/Os are reads and storage space is at a
premium
Cliques
Clique a set of SMPs that share a common set of disk arrays.
SMP001-4 AMPs
0
4 . 36
DAC-A
DAC-B
SMP001-5 AMPs
1
5 . 37
DAC-A
DAC-B
SMP002-4 AMPs
2
6 . 38
DAC-A
DAC-B
SMP002-5 AMPs
3
7 . 39
DAC-A
DAC-B
Teradata Vproc Migration

Vproc Migration vprocs in the failed node are started in the remaining
nodes within the clique.
SMP001-4 AMPs
SMP Fails
DAC-A
DAC-B
SMP001-5 AMPs
0
3 36
DAC-A
39
DAC-B
SMP002-4 AMPs
1
4 . 37
DAC-A
DAC-B
SMP002-5 AMPs
2
5 .
DAC-A
38
DAC-B
Fallback
A Fallback table is fully available in the event of an unavailable AMP.
A Fallback row is a copy of a Primary row which is stored on a different AMP.
AMP
Primary
rows
Fallback
rows
AMP
2
3
11
6
8
AMP
3
5
Benefits of Fallback
Permits access to table data during AMP off-line period.
Adds a level of data protection beyond disk array RAID.
Automatic restore of data changed during AMP off-line.
Critical for high availability applications.
Cost of Fallback
Twice the disk space for table storage.
Twice the I/O for Inserts, Updates and Deletes.
AMP
7
1
12
11
1
6
Note:
Loss of any two
AMPs in a cluster
causes RDBMS to
halt!
12
Fallback Clusters
A Fallback cluster is a defined set of AMPs across which fallback is implemented.
All Fallback rows for AMPs in a cluster must reside within the cluster.
Loss of one AMP in the cluster permits continued table access.
Loss of two AMPs in the cluster causes the RDBMS to halt.
Cluster 0
AMP 1
AMP 2
AMP 3
AMP 4
Primary
rows
62
27
34
22
50
78
19
14
38
Fallback
rows
34
14
19
38
22
62
50
27
78
Cluster 1
AMP 5
AMP 6
AMP 7
AMP 8
Primary
rows
41
66
58
93
20
88
45
17
37
72
Fallback
rows
93
72
88
45
17
37
58
41
20
66
Recovery Journal for Down AMPs

Recovery Journal is:
Automatically activated when an AMP is taken off-line.

Maintained by other AMPs in the cluster.
Totally transparent to users of the system.
While AMP is off-line
Journal is active.
Table updates continue as normal.
Journal logs Row IDs of changed rows for down-AMP.
When AMP is back on-line
Restores rows on recovered AMP to current status.

Journal discarded when recovery complete.
AMP 1
AMP 2
AMP 3
AMP 4
Vdisk
Primary
rows
62
27
34
22
50
78
19
14
38
Fallback
rows
34
14
19
38
22
62
50
27
78
Recovery
Journal
Row ID for 34
Row ID for 62
Row ID for 14
Transient Journal
Transient Journal provides transaction integrity
A journal of transaction before images.

Provides for automatic rollback in the event of TXN failure.
Is automatic and transparent.
Before images are reapplied to table if TXN fails.
Before images are discarded upon TXN completion.
Successful TXN
BEGIN TRANSACTION
UPDATE Row A
Before image Row A recorded (Add $100 to checking)
UPDATE Row B
Before image Row B recorded (Subtract $100 from savings)
END TRANSACTION
Discard before images
Failed TXN
BEGIN TRANSACTION
UPDATE Row A
Before image Row A recorded
UPDATE Row B
Before image Row B recorded
(Failure occurs)
(Rollback occurs) Reapply before images
(Terminate TXN)
Discard before images
Permanent Journal
The Permanent Journal is an optional, user-specified, system-maintained
journal which is used for recovery of a database to a specified point in time.
The Permanent Journal:
Is used for recovery from unexpected hardware or software disasters.
May be specified for ...
a.) One or more tables
b.) One or more databases
Permits capture of Before Images for database rollback.
Permits capture of After Images for database rollforward.
Permits archiving change images during table maintenance.
Reduces need for full table backups.
Provides a means of recovering NO FALLBACK tables.
Requires additional disk space for change images.
Requires user intervention for archive and recovery activity.

Introduction To Teradata and Teradata Architecture

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To Teradata and Teradata Architecture

Uploaded by

Copyright:

Available Formats

Introduction to Teradata and Teradata

Teradata Components, Architecture and Features

Components and Architecture

Components and Architecture

Components and Architecture

Teradata Functional Overview

Teradata Data Dictionary and Objects

The DD/D ...

is an integrated set of system tables

contains definitions of and information about all objects in the system

is entirely maintained by the RDBMS

is data about the data or metadata

is distributed across all AMPs like all tables

may be queried by administrators or support staff

is accessed via Teradata supplied views

Examples of DD/D views:

- information about access rights

- information about space utilization

HELP Commands Teradata SQL Extension

Tables, Views, and Macros:

Example of HELP DATABASE

HELP DATABASE Customer_Service;

SHOW Command Teradata SQL Extension

EXPLAIN Facility Teradata SQL Extension

A relative cost of the request (in units of time)

This information is useful for:

testing queries before production

analyzing various approaches to a problem EXPLAIN

EXPLAIN SELECT last_name, department_number FROM Employee ;

Teradata Database, User and Space

Storing and Accessing Data Rows

The rows of every table are distributed among all AMPs

Each AMP is responsible for a subset of the rows of each table.

Ideally, each table will be evenly distributed among all AMPs.

Evenly distributed tables result in evenly distributed workloads.

Creating a Primary Index

A Primary Index is defined at table creation.

It may consist of a single column, or a combination of columns

Limit of 16 columns with V2R4.1 and prior releases

Limit of 64 columns with V2R5.

CREATE TABLE sample_1

If the index choice of column(s) is unique,

CREATE TABLE sample_2

If the index choice of column(s) isnt unique,

Note: Changing the choice of Primary Index

A UPI choice will result in even distribution

A NUPI choice will result in even distribution

Primary Index Values

This is done using a hashing algorithm.

Secondary index access

Accessing the row by its Primary Index value is:

Other table access techniques:

always a one-AMP operation

Accessing Via a Unique Primary Index

Accessing Via a Non-Unique Primary Index

Both UPI and NUPI

Primary Keys and Primary Indexes

Indexes are conceptually different from keys.

A PK is a relational modeling convention which allows each row to be uniquely identified.

Logical concept of data modeling

Physical mechanism for access and storage

Teradata doesnt need to recognize

Each table must have exactly one primary index