Teradata Architecture
LEVEL LEARNER
Icons Used: Hands-on Exercise, Coding Standards, Reference, Lend A Hand, Questions, Summary, Points To Ponder, Test Your Understanding
Teradata Components
Parsing Engine (PE)
BYNET (BanYan NETwork)
AMP (Access Module Processor)
Disk
What is a Node?
Each node is attached via a network to a disk farm.
Each Teradata AMP is assigned a virtual disk (Vdisk) on which it stores
the rows of its tables.
Only the AMP assigned to a virtual disk can read or write
to that disk.
A node typically holds 40-50 AMPs.
SMP Node
MPP
Two SMP nodes connected via the BYNETs form one
Massively Parallel Processing (MPP) system.
Parsing Engine
When a user logs into Teradata, a PE logs them in and is
responsible for their entire session.
The PE checks the SQL syntax.
The PE checks security (the user's access rights) and builds the
plan for the AMPs to follow; EXPLAIN displays this plan. For this reason
the PE is also known as the Optimizer.
The PE converts EBCDIC (from mainframe queries) to
ASCII on the way in, and the AMPs are responsible for
converting from ASCII to EBCDIC on the way out.
The PE always delivers the final answer set to the user.
The Parsing Engine's biggest responsibility is
building a parallel-aware, cost-based plan for the AMPs to follow
to retrieve the data.
AMP
AMPs are responsible for storing and retrieving rows from their
assigned disk (Vdisk).
AMPs lock the tables and rows.
AMPs sort rows and do all aggregation.
AMPs handle all space management and space accounting.
AMPs convert ASCII to EBCDIC when returning answer sets to the
mainframe.
In Teradata 13, the number of AMP Worker Tasks (AWTs) per AMP was increased for better
performance.
The rows of every Teradata table are spread across all AMPs.
Disk Array
Teradata Components
Questions
Summary
This chapter gave an overview of how Teradata processes
a query:
The PE checks the syntax of the query and the
security rights of the user submitting it.
The PE comes up with the best optimized plan for execution
of the query.
The PE passes this plan through the BYNET to the AMPs.
The AMPs follow the plan to retrieve data from their disks.
The AMPs pass the data to the PE through the BYNET.
The PE then passes the data to the user.
Introduction to RDBMS
Logical/Relational Model
The Logical Model
Should be designed without regard to usage
Can accommodate a wide variety of front-end tools
Allows the database to be created more quickly
Should be the same regardless of data volume
Represents the real-world business in a tabular (relational) form
Includes all the data definitions within the scope of
the enterprise or application
Is generic: the logical model is the template for physical
implementation on any RDBMS platform
Teradata supports fully normalized logical models:
Ability to perform 64-table joins
Ability to perform large aggregations
Logical/Relational Model
A relational database contains a set of logically related tables.
A table is a two-dimensional representation of data consisting of
rows and columns.
A column always contains like data.
A row is one instance of all the columns in a table.
In a relational database, a table is defined as a named collection of
one or more named columns that can have zero or many rows of
related information.
Each row represents an occurrence of the entity defined by the table. An
entity is defined as a person, place, thing or event about which the
table contains information.
In relational math, the following stand true
Relational Advantage
Advantages of relational database:
Ease of use: Information represented as tables consisting of rows and columns is much easier to
understand.
Flexibility: Tables from which information has to be linked and extracted can be easily
manipulated by operators such as project and join to give the information in the form in which it is desired.
Security: Security control and authorization can be implemented more easily by moving sensitive
attributes in a given table into a separate relation with its own authorization controls. If the authorization
requirements permit, a particular attribute can be joined back with the others to enable full information
retrieval.
Data Independence: Data independence is achieved more easily with the normalized structure used in
a relational database than with the more complicated tree or network structures.
Data Manipulation Language: Responding to a query by means of a language based
on relational algebra and relational calculus, e.g. SQL, is easy in the relational database approach. For data
organized in other structures, the query language either becomes complex or is extremely limited in its
capabilities.
Catering for future requirements: With data held in separate tables, it is simple to add records
that are not yet needed but may be in the future. For example, the city table could be expanded to
include every city and town in the country, even though no other records use them all as yet. A flat-file
database cannot do this.
Indexing
An index is a physical mechanism used to store and access data rows.
NO Primary Index
To retrieve a row from a table with no Primary Index, Teradata performs a full table scan, because
there is no Primary Index value to hash.
Primary Index
The Teradata Parsing Engine takes the Primary Index value of a row and
runs a mathematical calculation called the Hash Formula on that Primary Index
column value.
The result is a 32-bit Row Hash, which can be read as an integer.
The Row Hash points to a bucket in the Hash Map, and that bucket assigns the row to an
AMP.
32-bit Row Hash:
0000 0000 0000 0000 0000 0000 0000 1101 = 13
Every Teradata system has one Hash Map with about a million buckets. Inside the
buckets are AMP numbers.
In this example, Emp_No 1001 (the Primary Index value) is hashed and the
output is a Row Hash of 13. Teradata counts over to bucket 13 in the
Hash Map, which has the number one (1) inside it. This means
that this row will go to AMP 1.
Likewise, Emp_No 1002 (the Primary Index value) is hashed and the output is a Row Hash of 5.
Teradata counts over to bucket 5 in the Hash Map, which has the number
two (2) inside it. This means that this row will go to AMP 2.
There is one Hashing Formula in Teradata, and it is consistent.
Hash the Primary Index Value for a row with the Hash
Formula.
The output of the Hash Formula is a 32-bit Row Hash.
Take the Row Hash and find its corresponding bucket in the
Hash Map.
Send the row and its Row Hash to the AMP listed in the
Hash Map Bucket.
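The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CRC-32 stands in for Teradata's Hash Formula (which is not described in this course), the Hash Map is shrunk to 16 buckets, and the names `NUM_AMPS`, `NUM_BUCKETS`, `HASH_MAP`, `row_hash`, and `amp_for` are hypothetical.

```python
import zlib

NUM_AMPS = 4        # hypothetical system size
NUM_BUCKETS = 16    # tiny Hash Map for illustration; a real map has far more buckets

# Hash Map: each bucket holds an AMP number (dealt out round-robin here, an assumption).
HASH_MAP = [bucket % NUM_AMPS + 1 for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """Stand-in for the Hash Formula: returns a 32-bit integer (CRC-32, not Teradata's algorithm)."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF


def amp_for(pi_value):
    """Steps 1-4: hash the Primary Index value, find its bucket, read the AMP number."""
    rh = row_hash(pi_value)          # steps 1 and 2: 32-bit Row Hash
    bucket = rh % NUM_BUCKETS        # step 3: corresponding bucket in the Hash Map
    return rh, bucket, HASH_MAP[bucket]  # step 4: AMP listed in that bucket


for emp_no in (1001, 1002):
    rh, bucket, amp = amp_for(emp_no)
    print(f"Emp_No {emp_no}: row hash {rh:032b}, bucket {bucket}, AMP {amp}")
```

Because the function is deterministic, hashing the same Primary Index value always returns the same AMP, which is the property the slides rely on.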
Skew Factor
NULL values in the Primary Index are the main reason for skew. A
table with a Unique Primary Index can have only one NULL value,
but a NUPI table can have many NULL values, and every NULL
value hashes to the same AMP.
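The effect of many NULLs in a NUPI can be illustrated with a small simulation. It is a sketch, not Teradata's implementation: CRC-32 replaces the real Hash Formula, `NUM_AMPS` is a hypothetical system size, and the skew formula used here (100 x (1 - average rows per AMP / maximum rows per AMP)) is one common convention that this course does not itself define.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_value):
    """Toy row placement. Every NULL (None) hashes identically, so all NULLs share one AMP."""
    key = "NULL" if pi_value is None else str(pi_value)
    return zlib.crc32(key.encode("utf-8")) % NUM_AMPS + 1


def rows_per_amp(pi_values):
    """Count how many rows land on each AMP, in AMP-number order."""
    counts = Counter(amp_for(value) for value in pi_values)
    return [counts.get(amp, 0) for amp in range(1, NUM_AMPS + 1)]


def skew_factor(counts):
    """0 means perfectly even; values near 100 mean one AMP holds almost everything."""
    if not counts or max(counts) == 0:
        return 0.0
    average = sum(counts) / len(counts)
    return 100.0 * (1 - average / max(counts))


# 40 distinct values plus 60 NULLs in a Non-Unique Primary Index column
nupi_values = list(range(1000, 1040)) + [None] * 60
counts = rows_per_amp(nupi_values)
print(counts, round(skew_factor(counts), 1))
```

Whichever AMP receives the NULLs ends up with at least 60 of the 100 rows, so the other AMPs finish early and wait for it.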
Uniqueness Value
Each AMP will place a Uniqueness Value after the row hash
to track duplicate values
The Hash Formula is consistent so every Smith has the
same Row Hash and the same goes for each Jones and each
Patel. Therefore, duplicate values land on the same AMP.
Row ID
UNIQUE PRIMARY INDEX
With a Unique Primary Index, the Uniqueness Value on
each Row-ID is 1.
Each AMP sorts its rows by
the Row-ID.
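How an AMP builds Row-IDs can be sketched as follows. The row hashes in `TOY_HASH` are made-up values and `store_rows` is a hypothetical helper; the only behavior taken from the slides is that the Row-ID is the Row Hash followed by a Uniqueness Value, and that rows are kept sorted by Row-ID.

```python
# Made-up row hashes for three last names (illustration only).
TOY_HASH = {"Jones": 3, "Smith": 7, "Patel": 9}


def store_rows(pi_values):
    """Simulate one AMP: give each row a Row-ID (row hash, uniqueness value), sorted by Row-ID."""
    next_uniq = {}   # last uniqueness value handed out per row hash
    stored = []
    for value in pi_values:
        rh = TOY_HASH[value]
        next_uniq[rh] = next_uniq.get(rh, 0) + 1   # duplicates get 1, 2, 3, ...
        stored.append(((rh, next_uniq[rh]), value))
    return sorted(stored)                          # the AMP keeps rows in Row-ID order


# Non-unique Primary Index: three Smiths share a row hash but not a Row-ID.
for row_id, name in store_rows(["Smith", "Jones", "Smith", "Patel", "Smith"]):
    print(row_id, name)
```

With unique values, such as `store_rows(["Smith", "Jones", "Patel"])`, every Uniqueness Value stays at 1.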
Example
SELECT * FROM Employee_table WHERE
last_name = 'Smith';
Plan:
1. The PE sees that last_name is the Primary Index.
2. It hashes 'Smith' to get the Row Hash.
3. Row Hash = 7.
4. It counts over to bucket 7 in the Hash Map,
which contains AMP 1.
5. It passes a message to AMP 1 through the
BYNET to retrieve the rows with Row Hash 7.
6. AMP 1 brings back all columns for the rows with Row Hash 7
(Smith).
A Non-Unique Primary Index will NOT spread the data perfectly evenly.
A Multi-Column Primary Index is often used to fix a data skew problem.
Secondary Index
Emp_no is a USI.
The PE hashes 1004 to find which AMP holds the matching row in the USI subtable (AMP 3).
The PE has the BYNET contact AMP 3, which retrieves the subtable row for 1004 (a single-AMP operation).
That AMP passes the real Row-ID of the base table row (1,4) back up to the PE.
The PE uses the Row-ID to find the base table row with another single-AMP retrieve.
A USI is a Two-AMP Operation
The first AMP reads the subtable and the second reads the base table.
Two binary searches are performed in total, and one row is returned.
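The two-AMP USI access path can be modeled with sorted lists and the standard `bisect` module. Everything below is a toy: `toy_hash`, `amp_of`, `build`, and `usi_lookup` are hypothetical names, the hash is not Teradata's, and the sample rows are invented. The point is only the shape of the lookup: one binary search in the USI subtable, then one in the base table.

```python
from bisect import bisect_left

NUM_AMPS = 3  # hypothetical system size


def toy_hash(value):
    """Toy stand-in for the Hash Formula (not Teradata's algorithm)."""
    return sum(str(value).encode("utf-8")) % 97


def amp_of(row_hash):
    return row_hash % NUM_AMPS + 1


def build(rows):
    """rows: (emp_no, first_name, last_name). last_name is the Primary Index, emp_no the USI."""
    base = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    usi = {amp: [] for amp in range(1, NUM_AMPS + 1)}
    next_uniq = {}
    for row in rows:
        emp_no, _, last_name = row
        rh = toy_hash(last_name)
        next_uniq[rh] = next_uniq.get(rh, 0) + 1
        row_id = (rh, next_uniq[rh])
        base[amp_of(rh)].append((row_id, row))            # base row lives where its PI hashes
        uh = toy_hash(emp_no)
        usi[amp_of(uh)].append((uh, emp_no, row_id))      # subtable row lives where the USI hashes
    for amp in base:
        base[amp].sort()
        usi[amp].sort()
    return base, usi


def usi_lookup(base, usi, emp_no):
    """Return (row, AMPs touched): first the subtable AMP, then the base-table AMP."""
    uh = toy_hash(emp_no)
    amp1 = amp_of(uh)
    sub = usi[amp1]
    i = bisect_left(sub, (uh, emp_no))                    # binary search 1: USI subtable
    if i == len(sub) or sub[i][:2] != (uh, emp_no):
        return None, [amp1]
    row_id = sub[i][2]
    amp2 = amp_of(row_id[0])
    j = bisect_left(base[amp2], (row_id,))                # binary search 2: base table
    return base[amp2][j][1], [amp1, amp2]


base, usi = build([(1001, "Ann", "Smith"), (1002, "Raj", "Patel"),
                   (1003, "Lee", "Jones"), (1004, "Kyle", "Smith")])
print(usi_lookup(base, usi, 1004))
```

At most two AMPs are ever touched, regardless of how many AMPs the system has.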
Syntax
The NUSI subtable rows get their own Row-ID, but they are not
hashed to different AMPs; they stay AMP-local.
First_name is a NUSI.
The PE orders every AMP to check whether it has Kyle in its NUSI subtable.
Each AMP simultaneously performs a binary search on its NUSI subtable.
If an AMP has Kyle, the PE orders it to retrieve the base row.
If there are 50 AMPs, then all 50 AMPs perform a binary search simultaneously, and
those that find Kyle perform another binary search on the base table.
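The all-AMP NUSI access path can be sketched the same way. The data layout and the names `amps` and `nusi_lookup` are hypothetical, and the loop below visits the AMPs one after another, whereas real AMPs search in parallel.

```python
from bisect import bisect_left

# Each AMP holds its own base rows and an AMP-local NUSI subtable on first_name.
# Row-IDs and rows are invented for the illustration.
amps = {
    1: {"base": [((7, 1), (1001, "Kyle", "Smith")), ((7, 2), (1005, "Ann", "Smith"))],
        "nusi": [("Ann", (7, 2)), ("Kyle", (7, 1))]},
    2: {"base": [((3, 1), (1002, "Raj", "Patel"))],
        "nusi": [("Raj", (3, 1))]},
    3: {"base": [((9, 1), (1004, "Kyle", "Jones"))],
        "nusi": [("Kyle", (9, 1))]},
}


def nusi_lookup(amps, first_name):
    """Every AMP searches its own subtable; only AMPs with a hit go on to the base table."""
    hits = []
    amps_searched = 0
    for amp_no, amp in sorted(amps.items()):
        amps_searched += 1
        nusi = amp["nusi"]
        i = bisect_left(nusi, (first_name,))              # binary search on the local subtable
        while i < len(nusi) and nusi[i][0] == first_name:
            row_id = nusi[i][1]
            j = bisect_left(amp["base"], (row_id,))       # binary search on the local base table
            hits.append((amp_no, amp["base"][j][1]))
            i += 1
    return hits, amps_searched


hits, amps_searched = nusi_lookup(amps, "Kyle")
print(hits, amps_searched)
```

Unlike the USI case, the number of AMPs searched always equals the number of AMPs in the system, even when only a few of them hold a matching row.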
USI: No, No, 2, "0-32", 64, Y, N, Y, Y, Y, Sub-table, Y, N, N
NUSI: No, No, Many, "0-32", 64, N, N, Y, Y, Y, Sub-table, Y, Y, N
Questions
Summary
An index is a physical mechanism used to store and access data rows.
A Primary Key (PK) is a relational modeling convention that allows each row to be
uniquely identified.
The Primary Index is defined when the table is created.
A table can have only one Primary Index, but you can combine up to 64
columns to form one Multi-Column Primary Index.
Hash the Primary Index Value for a row with the Hash Formula.
The output of the Hash Formula is a 32-bit Row Hash.
The Row-ID is the Row Hash of the Primary Index column plus the
Uniqueness Value.
A Secondary Index can be created and dropped dynamically.
A Secondary Index subtable contains two columns:
the indexed column value (Emp_No for the USI, First_Name for the NUSI)
and the Row-ID of the base table row, derived from its real Primary Index.
NUSIs are AMP-local.
Module 4: Space
Objectives:
After completing this chapter, you will be able to answer the
following questions
What are a Teradata database and a Teradata user?
How is space allocated to Teradata objects?
What is the hierarchy of objects in a Teradata system?
Space
There are three types of space in Teradata
Perm Space: PERM space houses permanent tables,
Secondary Indexes, Join Indexes and Permanent Journals.
Temp Space: TEMP space is used to store temporary tables.
Spool Space: Spool space is used by each AMP in order to
build the answer set for the user.
A Teradata Database (Example)
A Teradata database is a logical repository for:
Tables (require perm space)
Views (use no perm space)
Macros (use no perm space)
When a system arrives, there is only one user, called DBC.
USER DBC
System user DBC contains all Teradata Database software components and all system
tables.
Syntax:
CREATE DATABASE new_db FROM existing_db
AS
PERMANENT = 20000000
,SPOOL = 50000000
,TEMPORARY = 20000000;
new_db is owned by existing_db.
A database is empty until objects are created within it.
A database with no PERM space can have views and macros but not tables.
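The ownership and PERM rules can be modeled with a small class. This is a sketch with hypothetical names (`Database`, `create`), and it assumes the standard Teradata behavior that the PERM space given to a new database is taken out of its owner's PERM space; the 100,000,000-byte starting figure for existing_db is invented.

```python
class Database:
    """Toy model of a Teradata database: a name, an owner, and a PERM allocation in bytes."""

    def __init__(self, name, perm, owner=None):
        if owner is not None:
            if perm > owner.perm:
                raise ValueError(f"{owner.name} has only {owner.perm} bytes of PERM space left")
            owner.perm -= perm      # assumption: the child's PERM space comes out of the owner's
        self.name = name
        self.perm = perm
        self.owner = owner
        self.objects = []           # a database is empty until objects are created within it

    def create(self, kind, object_name):
        """Views and macros need no PERM space; tables do."""
        if kind == "TABLE" and self.perm == 0:
            raise ValueError("a database with no PERM space cannot hold tables")
        self.objects.append((kind, object_name))


existing_db = Database("existing_db", 100_000_000)
new_db = Database("new_db", 20_000_000, owner=existing_db)
views_only = Database("views_only", 0, owner=existing_db)

new_db.create("TABLE", "Employee_table")
views_only.create("VIEW", "Employee_v")
print(existing_db.perm, new_db.objects, views_only.objects)
```

Calling `views_only.create("TABLE", ...)` raises an error, mirroring the rule that a database with no PERM space can hold views and macros but not tables.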
A Teradata User
A Teradata user is a database with an assigned password.
A Teradata user may also own tables, views, macros, and triggers, but users with no
perm space may not own tables.
A user may log on to Teradata and access objects within:
Itself
Other databases for which it has access rights
Syntax:
CREATE USER new_user FROM existing_user
AS
PERMANENT = 10000000
,PASSWORD = Acdmy
,SPOOL = 50000000
,TEMPORARY = 20000000;
new_user is owned by existing_user.
A user is empty until objects are created within it.
Locks
There are four types of locks:
Exclusive Lock: This is placed only on a database or table when the
object is going through a structural change. It prevents any other type of
concurrent access. It applies to databases or tables, never to rows.
Write Lock: This is placed in response to an INSERT, DELETE, or UPDATE request. It
blocks other Read, Write and Exclusive locks.
Read Lock: This is placed in response to a SELECT request. It restricts
access by users who require Exclusive or Write locks. If you have a multiuser environment with updates occurring and you need to keep data
consistent, you want a Read lock.
Access Lock (Dirty Read or Stale Read): An Access lock permits the
user to read an object that may already be locked for Read or
Write. An Access lock does not restrict access by another user except
when an Exclusive lock is required. It is placed in response to a user-defined LOCKING FOR ACCESS phrase. A user requesting an Access lock must be
willing to accept data that may not be consistent.
Locks
Locks are applied at three levels:
1. Database: applies to all tables and views in the database
2. Table/View: applies to all rows in a table
3. Row Hash: applies to all rows with the same Row Hash
Rule:
A lock request is queued behind all outstanding incompatible
lock requests for the same object.
Row Hash Lock Syntax:
LOCKING ROW FOR ACCESS SELECT
* FROM TABLE_A;
Compatibility:
A Read lock is compatible with other Read locks and with Access locks.
A Write lock is compatible only with Access locks.
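The compatibility rules and the queueing rule can be written down as a small table and two functions. The matrix is derived from the lock descriptions in this module (Exclusive is compatible with nothing, Access with everything except Exclusive); the names `COMPATIBLE`, `can_grant`, and `request` are hypothetical, and the sketch models a single object only.

```python
# For each requested lock: the set of other locks it can coexist with.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, others):
    """True if the requested lock is compatible with every lock in others."""
    return all(lock in COMPATIBLE[requested] for lock in others)


def request(requested, held, queue):
    """Grant the lock, or queue it behind outstanding incompatible requests on the same object."""
    if can_grant(requested, held + queue):
        held.append(requested)
        return True
    queue.append(requested)
    return False


held, queue = ["READ"], []
print(request("WRITE", held, queue))    # must wait for the Read lock
print(request("READ", held, queue))     # compatible with the held Read, but queued behind the Write
print(request("ACCESS", held, queue))   # compatible with everything outstanding
print(held, queue)
```

The second request shows why the rule says "outstanding" rather than "held": a new Read request waits behind the queued Write even though it is compatible with the lock currently held.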
Cliques
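When a node in a clique goes down (Node 1 in this example):
1. Teradata resets.
2. On the restart, the AMPs from Node 1 migrate to the other nodes in the clique.
3. The system is degraded but still able to function.
4. The down node is fixed.
5. Another reset is done and the AMPs return home.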
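The migration sequence can be sketched as a function over a node-to-AMP layout. The node names, the AMP numbers, and the round-robin placement of the migrating AMPs are all assumptions made for the illustration; the slides do not say how Teradata chooses the target nodes.

```python
def migrate(clique, failed_node):
    """Return the degraded layout: the failed node's AMPs are spread over the surviving nodes."""
    survivors = [node for node in clique if node != failed_node]
    if not survivors:
        raise ValueError("no surviving node in the clique to host the AMPs")
    layout = {node: list(clique[node]) for node in survivors}
    for i, amp in enumerate(clique[failed_node]):
        layout[survivors[i % len(survivors)]].append(amp)   # round-robin placement (assumption)
    return layout


# Home layout of a hypothetical three-node clique with four AMPs per node.
home = {"Node1": [1, 2, 3, 4], "Node2": [5, 6, 7, 8], "Node3": [9, 10, 11, 12]}

degraded = migrate(home, "Node1")   # after the first reset
print(degraded)
restored = home                     # after the node is fixed and the second reset
print(restored)
```

Every AMP is still running in the degraded layout, which is why the system keeps functioning, but the surviving nodes each carry more AMPs than they were sized for.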
Fallback
(Emp_No INTEGER
, Dept_No SMALLINT
, First_Name VARCHAR(12)
, Last_Name CHAR(20)
, Salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX
( Emp_No );
Fallback Clusters
RAID
RAID: Redundant Array of Independent Disks
Two types of disk array protection:
RAID 1 (Mirroring)
RAID 1 provides each AMP with two disks for storing data and two disks
for mirroring.
A data disk and its mirror disk are called a mirrored pair.
RAID 1 costs 50% of the disk space, but it ensures 99% uptime for
customers.
If a single disk goes down, it is easily replaced and Teradata isn't
even affected.
RAID
RAID 5 (Parity):
For every 3 blocks of data, there is a parity block on a 4th disk.
If a disk fails, any missing block may be reconstructed using the
other three disks.
Reconstruction of a failed disk by the array controller takes longer than with RAID 1.
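Parity reconstruction can be demonstrated with XOR, the usual parity operation in RAID 5. The block contents and the helper name `xor_blocks` are made up for the illustration, and a real array controller works on disk stripes rather than 4-byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three disks, plus a parity block on a fourth disk.
data = [b"AMP1", b"AMP2", b"AMP3"]
parity = xor_blocks(data)

# The disk holding the second block fails: rebuild it from the other three disks.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Rebuilding requires reading every surviving block in the stripe, which is one reason reconstruction is slower than with a RAID 1 mirror, where the missing data is simply copied from the mirror disk.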
Summary:
RAID 1: Good performance with disk failures. Higher cost in terms of disk space.
RAID 5: Reduced performance with disk failures. Lower cost in terms of disk space.
Questions
Summary
Source
Disclaimer: Parts of the content of this course are based on the materials available from the
websites and books listed above. The materials that can be accessed from the linked sites
are not maintained by Cognizant Academy and we are not responsible for the contents
thereof. All trademarks, service marks, and trade names in this course are the marks of the
respective owner(s).
Change Log
V1.0: Initial Version
V1.1: Slide No. 1-86; Changed By: Bhuvanya.M (221634); Effective Date: 05/05/2015; Changes Effected: Base line content
Introduction to Teradata
You have successfully completed the
session on Teradata Architecture