Teradata Basics and Architecture

1 2012 WIPRO LTD | WWW.WIPRO.COM

Agenda
1 Teradata Product Overview
2 Primary Index and Data Distribution
3 Teradata Space
4 Introduction to Data Protection
5 Teradata Tools and Objects
6 Introduction to Active Data Warehouse

1. Teradata Product Overview

Objectives
After completing this lesson, you will be able to:

Describe the purpose of the Teradata product
List major architectural features of the product
SMP
MPP
BYNET
Distinguish Between Shared Nothing and Shared Everything
Architecture
Describe Teradata Parallelism and Linear Scalability Features
Describe the purpose of the PEs, AMPs, PDE
LAN & Mainframe Connection to Teradata
List Teradata Strengths
Major Difference between Teradata and Other Databases

What is Teradata
DB designed for DWH implementations
Massively Parallel Processing(MPP), Shared Nothing Architecture
Each node is self sufficient and independent of other nodes
Born Parallel
Linear Scalable
[Figure: four SMP nodes (Node1 to Node4), each with its own CPUs, memory, and disks, connected by the BYNET interconnect]

Parallel Shared Nothing Architecture
[Figure: Teradata V2 virtual processors (vprocs). PE vprocs handle session management, parsing, optimizing, and dispatching. The message passing layer consists of BYNET 0 and BYNET 1. AMP vprocs are the execution engines and run in parallel. Each AMP has its own vdisk, which no other AMP shares. PDE, which replaces TOS, runs on top of the operating system.]

Shared Vs. Shared Nothing

Quick Recap
Oracle is Shared Everything architecture
True
Teradata is Shared Nothing architecture
True
Teradata is scalable and born parallel
True
PE vproc manages Disk I/O
False
AMP vproc manages session
False
Each AMP vproc has its own dedicated vdisk
True
Teradata Optimizer is CBO
True
All communication between nodes and vprocs is through BYNET
True
PDE replaces Ver 1 TOS
True

Parsing Engines (PEs)
The Parsing Engine is responsible for:

Managing individual sessions (up to 120)
Parsing and Optimizing your SQL requests
Dispatching the optimized plan to the AMPs over BYNET
ASCII / EBCDIC input conversion (if necessary)
Sending the answer set response back to the requesting client
If the user doesn't have proper access rights, the query is rejected

Access Module Processors (AMPs)
Each AMP is always connected to a single virtual disk (vdisk)
AMPs follow the instructions of the PEs
AMPs store and retrieve rows to and from disk
AMPs pass the data to the PE over BYNET
Each AMP holds a portion of every table
Each AMP keeps its portion of a table on its own vdisk
The AMPs are also responsible for other db functions
Each AMP can perform up to 80 tasks in parallel
All AMPs can work together in parallel to service any request
Each AMP can work on several requests in parallel

BYNET
The BYNET is a dual redundant, bi-directional network interconnect
BYNET
Enables communication between AMPs and PEs
Enables multiple SMP nodes to communicate
Automatic load balancing of message traffic
Fully operational dual BYNETs provide fault tolerance
Scalable bandwidth as nodes are added

Parallel Database Extension (PDE)
A software interface layer on top of the operating system that enables the RDBMS to operate in a parallel environment.
PDE provides the ability to:
Execute vprocs.
Run the Teradata RDBMS in a parallel environment
An application which runs under the control of PDE is called a Trusted Parallel Application (TPA).
The Teradata RDBMS is the TPA

Linear Scalability
Start small and grow forever without losing any power
Adding AMPs, PEs, or nodes to the system improves performance linearly
Capability for increased workload without decreased performance
[Figure: nodes Node1 to Node4 connected by the BYNET; more nodes support more work, more users, and more data]

Linear Scalability (Contd.)
Shared nothing MPP platform
Software can scale linearly with hardware
The parallel units (vprocs) act as self-contained mini DBMSs.
AMPs do sorting, locking, journaling, loading, backup, and recovery functions independently.
Adding AMPs, PEs, or nodes to the system improves performance linearly. For example, for an unchanged workload, doubling the number of nodes is expected to roughly double query execution speed.

Linear Scalability (Contd.)
[Figure: total work accomplished plotted against increasing CPU power. Teradata shows linear scalability; traditional transaction processing systems show non-linear scalability.]

LAN & Mainframe Connection to Teradata
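[Figure: two client connection paths into Teradata.
Network-attached system: client request -> CLI -> MTDP -> MOSI -> Ethernet adapter -> gateway software -> Parsing Engine.
Channel-attached system: client request -> CLI -> TDP -> ESCON channel or Bus/Tag cables -> host adapter -> Parsing Engine.
From the Parsing Engines, both paths continue over the BYNET to the AMPs and their vdisks, all running under PDE.]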

Teradata Strengths
Parallel Processing
Linear Scalability - manageable growth via modularity
Experienced Optimizer - cost-based (CBO)
Load Utilities: TPT, FastLoad, MultiLoad, TPump, FastExport
Active Data Warehousing
Easy Database Administration
Designed to run the world's largest enterprise data warehouse databases
Executes on UNIX, Linux and Windows operating systems
Runs on single or multiple nodes
Provides Network and Mainframe connectivity
Supports Industry standard access language (SQL)
Fault tolerant at all levels of hardware and software

Teradata Implementations
World's Largest Retailers
World's Largest Banks
World's Largest Global Telecommunication Companies
World's Largest Airlines
World's Largest Insurance Companies
The two largest commercial data warehouses in the world use Teradata:
AT&T
Walmart

Teradata vs. the Traditional Databases
Teradata was designed to accommodate data warehouse implementations.
Traditional database systems were designed for transaction processing.
Data warehousing features in traditional databases are largely patches on top of the core database.
Teradata has data warehousing features embedded into the core of the database.
Teradata uses a Shared Nothing Architecture and hence eliminates resource contention.
Linear scalability, unconditional parallelism, multi-faceted parallelism, intelligent data distribution, and a parallel-aware optimizer make Teradata capable of handling large data volumes and complex queries.

Teradata: A Brief History
Year  Events
1979  Teradata Corp founded in Los Angeles, California
      Development begins on a massively parallel computer
1982  YNET technology is patented
1984  Teradata markets the first database computer, the DBC/1012
      First system purchased by Wells Fargo Bank of California
1987  First public offering of stock
1989  Teradata and NCR partner on next generation of DBC
1991  NCR Corporation is acquired by AT&T
1992  Teradata is merged into NCR
1996  AT&T spins off NCR Corp with Teradata product
1997  Teradata database becomes industry leader in data warehousing

teradata.com
http://teradata.com/enterprise-data-warehousing/
Click on Teradata Purpose-Built Platform Family and download the PDF
Read it and discuss the various Teradata models and their features

Caselet - AMP
You are the administrator for a Teradata system. What happens to the performance of the system if you double the number of AMPs while the number of users and the amount of data remain the same?
A. Performance gets doubled
B. Performance remains the same
C. Performance goes down
D. You cannot add AMPs

Review Questions
1) When was the first Teradata product sold? 1984

2) How many zeroes are there in a trillion? 12
3) Which feature of Teradata permits high performance even against
enormous databases? Parallelism
4) Name the operating systems that Teradata can execute on.
Unix, Windows and Linux
5) Is Teradata a client or a server? Server
6) What is the purpose of the PE? Session, Parser, Optimizer, Dispatcher
7) What is the purpose of the AMP? I/O and other db functions
8) How many sessions can a PE support? 120

Match Quiz
1. CLI (F) A. Does aggregating, sorting and joining

2. MTDP (I) B. Handles up to 120 sessions
3. MOSI (E) C. Message passing layer
4. PE (B) D. Balances sessions in a mainframe environment
5. AMP (A) E. Provides OS independence
6. BYNET (C) F. Lowest level interface to Teradata
7. TDP (D) G. Chooses most efficient access path
8. Optimizer (G) H. PE software that sends planned step to AMP
9. Dispatcher (H) I. Library of service routine for session management
10. Parallelism (J) J. Foundation of Teradata architecture

Summary
In this lesson you learnt about:

Terms associated with relational databases
Advantage of a relational database
Purpose of the Teradata product
Major architectural features of the product
Overall Teradata parallel architecture.
Major components of the Teradata architecture.
Purpose of the PE and the AMP.
Relationship of the Teradata to its client side applications.
Brief history, strengths and implementations of the product

2. Primary Index and Data Distribution

Objectives
At the end of this lesson, you will be able to:

Explain the purpose of the Primary Index
Explain Primary Index types
Explain Data Distribution across VDISKS of AMPs
Explain the role of the hashing algorithm and the hash map in locating a
row.
Explain the makeup of the Row ID and its role in row storage.
State the reasons for selecting a UPI vs. a NUPI
Understand Parallelism

Storing Rows
AMP AMP AMP AMP
Customer Table rows

Vendor Table rows
The uniformity of distribution of the rows of a table depends on the choice of the Primary Index

Primary Index (PI)
Each table in Teradata is required to have a Primary Index
The Primary Index will determine on which AMP a row will reside
It is a Physical mechanism used to store and access rows
The Primary Index plays 3 roles:
Data Distribution
Fastest Way to Retrieve Data
Incredibly important for Joins
It may consist of a single column, or a combination of up to 64 columns
Defined in CREATE TABLE statement
Changing the choice of Primary Index requires dropping and recreating
the table
May be unique or non-unique, Values may be changed
May be NULL
One AMP operation

Types of PI
Unique Primary Index (UPI)
Does not allow duplicates
Provides even distribution of rows of the table across all AMPs
Does not require duplicate row checking
Retrieves 0 or 1 row
Non-Unique Primary Index (NUPI)
Allows duplicates
Distribution of rows is only as even as the PI values are unique
May be chosen over a UPI because it may be more efficient for query access and joins
Retrieves 0 to many rows
Both UPI and NUPI access are always one-AMP operations

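Hash Map
[Figure: the example PI value 7225 is passed through the hashing algorithm, which produces a 32-bit row hash: 0000 0001 0000 1010 1100 0111 0101 1011. The first 16 bits (0000 0001 0000 1010, hexadecimal 010A) are the hash bucket number; the remaining 16 bits complete the row hash. The bucket number is looked up in the hash map to obtain the AMP number.]

Hashing PI Value
[Figure: the Parsing Engine picks up the Primary Index value, which is defined by the user and used for data distribution and access. The hashing algorithm converts it to a 32-bit row hash. The first 16 bits are the bucket number, which is looked up in the hash map, a 2-D array that associates hash buckets with AMPs. The row then travels over the Message Passing Layer (BYNET) to the owning AMP on Node-1 or Node-2.]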

Data Retrieval
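SELECT * FROM ORDER_MASTER;
[Figure: the client sends the request to the Parsing Engine; the Dispatcher sends a RET (retrieve) step over the Message Passing Layer to all four AMPs, which hold the sample values 29, 25, 10, 50, and 75.]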
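Row Distribution
[Figure: example row distribution across AMP 1 to AMP 4 for three choices of Primary Index.
NUPI on CUSTOMER_NAME: names such as VIJAY, HARISH, MANJU, KAVITHA, VASANTH, RAVI, NIRMALA, SAVITHA, and PRASHANTH are spread across the AMPs; rows with the same name (marked *, for example the two VIJAY rows) land on the same AMP.
NUPI on CUSTOMER_SEX_CODE: only two distinct values (M and F) exist, so all rows land on at most two AMPs.
UPI on CUSTOMER_ID: values 10001 to 10008 spread evenly, two rows per AMP.]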
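
The path from PI value to AMP described above (PI value, 32-bit row hash, 16-bit hash bucket, hash map, AMP) can be sketched in Python. This is an illustrative simulation only: Teradata's actual hashing algorithm and hash map layout are not given in this lesson, so `zlib.crc32` and the round-robin `HASH_MAP` are stand-in assumptions, and the names `row_hash`, `amp_for`, and `distribution` are hypothetical.

```python
import zlib
from collections import Counter

NUM_AMPS = 4
NUM_BUCKETS = 2 ** 16  # a 16-bit bucket number gives 65,536 hash buckets

# Stand-in hash map: bucket number -> AMP number (assumed round-robin layout).
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """Return a 32-bit hash of the PI value (CRC32 is only a stand-in)."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF


def amp_for(pi_value):
    """Use the first 16 bits of the row hash as the bucket and look up its AMP."""
    bucket = row_hash(pi_value) >> 16
    return HASH_MAP[bucket]


def distribution(pi_values):
    """Count how many rows each AMP would receive for the given PI values."""
    return Counter(amp_for(value) for value in pi_values)


# A unique PI (customer ids) versus a highly non-unique PI (sex code).
upi_rows = distribution(range(10001, 20001))
nupi_rows = distribution(["M", "F"] * 5000)
print("UPI rows per AMP :", dict(sorted(upi_rows.items())))
print("NUPI rows per AMP:", dict(sorted(nupi_rows.items())))
```

Because equal PI values always produce the same row hash, the 10,000 NUPI rows in this sketch can occupy at most two AMPs, no matter how many AMPs the system has.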

Identifying Rows
A row hash is not adequate to uniquely identify a row.
Duplicate PI values result in the same row hash
Different PI values may result in the same row hash; this is called a Hash Synonym
To uniquely identify a row, the AMP appends a 32-bit uniqueness value to the 32-bit row hash.
It assigns 1 to the first row with a given row hash, 2 to the second, 3 to the third, and so on.
The combination of the 32-bit Row Hash + 32-bit Uniqueness Value is called the Row ID (64 bits)
Each stored row is prefixed with the Row ID
Rows are logically maintained in Row ID sequence
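
The Row ID rule above can be sketched as follows. This is a toy model, not Teradata's file system: the `Amp` class, its fields, and the sample rows are hypothetical, and the row hash `0x010AC75B` is simply the bit pattern from the Hash Map slide written in hexadecimal.

```python
from collections import defaultdict


class Amp:
    """Toy AMP that assigns Row IDs; the structure is assumed for illustration."""

    def __init__(self):
        # Highest uniqueness value handed out so far for each row hash.
        self._last_uniqueness = defaultdict(int)
        self.rows = []  # list of (row_id, row) pairs

    def insert(self, row_hash, row):
        """Store a row and return its 64-bit Row ID (row hash + uniqueness value)."""
        self._last_uniqueness[row_hash] += 1
        uniqueness = self._last_uniqueness[row_hash]
        row_id = (row_hash << 32) | uniqueness
        self.rows.append((row_id, row))
        self.rows.sort(key=lambda pair: pair[0])  # keep rows in Row ID sequence
        return row_id


amp = Amp()
# Two rows with the same NUPI value hash to the same row hash.
first = amp.insert(0x010AC75B, {"name": "VIJAY", "city": "Pune"})
second = amp.insert(0x010AC75B, {"name": "VIJAY", "city": "Chennai"})
print(hex(first), hex(second))
```

The two Row IDs share the upper 32 bits (the row hash) and differ only in the lower 32 bits (uniqueness values 1 and 2), which is what lets the AMP tell duplicate PI values and hash synonyms apart.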

Explain plan - FTS
EXPLAIN SELECT * FROM contract;
Explanation
1) First, we lock a distinct RETAIL."pseudo table" for read on a RowHash to
prevent global deadlock for RETAIL.CONTRACT.
2) Next, we lock RETAIL.CONTRACT for read.
3) We do an all-AMPs RETRIEVE step from RETAIL.CONTRACT
by way of an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of Spool 1 is
estimated with high confidence to be 15,000 rows (1,320,000
bytes). The estimated time for this step is 0.21 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.21 seconds.

Explain Plan - UPI
EXPLAIN SELECT c_custkey,c_name,c_acctbal

FROM client
WHERE c_custkey = 993;
Explanation
1) First, we do a single-AMP RETRIEVE step from
RETAIL.CLIENT by way of the unique primary
index "RETAIL.CLIENT.C_CUSTKEY = 993" with no
residual conditions. The estimated time for this step is 0.01
seconds.
-> The row is sent directly back to the user as the result of

Explain plan - NUPI
EXPLAIN SELECT
l_orderkey,l_partkey,l_linenumber,l_linestatus
FROM item
WHERE l_orderkey = 54528;
Explanation
1) First, we do a single-AMP RETRIEVE step from
RETAIL.item by way of the primary index
"RETAIL.item.L_ORDERKEY = 54528" with no residual conditions
into Spool 1 (one-amp), which is built locally
on that AMP. The size of Spool 1 is estimated with high
confidence to be 4 rows (120 bytes). The estimated time for this
step is 0.02 seconds.
-> The contents of Spool 1 are sent back to the user as the result of

Locating a Row
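[Figure: the PE supplies the 48-bit Table ID, the 32-bit Row Hash, and the index value. The DSW portion of the row hash selects the AMP number. The AMP file system then uses a logical block identifier to find the data block and a logical row identifier to find the data row.]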

Locating a Row (Contd.).
The DSW (Destination Selection Word) part of the row hash is fed to the Hash Map, which identifies the target AMP number.
The AMP accesses its Master Index.
The Master Index identifies the Cylinder Index.
The Cylinder Index identifies the data block.
A search of the data block locates the row.
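
The lookup chain on the AMP (Master Index, then Cylinder Index, then data block) can be sketched as a series of sorted-index searches. Everything here is an illustrative assumption: the index layouts, the cylinder and block names, the table id 100, and the short 16-bit sample hashes are all hypothetical.

```python
from bisect import bisect_right

# Each index level is a sorted list of (lowest_key, target) pairs, where a key
# is (table_id, row_hash). Layout and sample values are assumptions.
MASTER_INDEX = [((100, 0x0000), "cyl-1"), ((100, 0x8000), "cyl-2")]
CYLINDER_INDEX = {
    "cyl-1": [((100, 0x0000), "blk-A"), ((100, 0x4000), "blk-B")],
    "cyl-2": [((100, 0x8000), "blk-C")],
}
DATA_BLOCKS = {
    "blk-A": [((100, 0x010A), "row for PI 7225")],
    "blk-B": [((100, 0x4242), "another row")],
    "blk-C": [((100, 0x9000), "yet another row")],
}


def lookup(index, key):
    """Return the target of the last entry whose lowest key is <= key."""
    lows = [low for low, _ in index]
    pos = bisect_right(lows, key) - 1
    if pos < 0:
        return None
    return index[pos][1]


def locate_row(table_id, row_hash):
    """Walk Master Index -> Cylinder Index -> data block and return the row."""
    key = (table_id, row_hash)
    cylinder = lookup(MASTER_INDEX, key)
    if cylinder is None:
        return None
    block = lookup(CYLINDER_INDEX[cylinder], key)
    if block is None:
        return None
    for row_key, row in DATA_BLOCKS[block]:
        if row_key == key:
            return row
    return None


print(locate_row(100, 0x010A))
```

Each level narrows the search, so only the final data block has to be scanned row by row.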

Locating a Row (Contd.).
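[Figure: the Master Index is searched with Table ID + Row Hash to find the cylinder number; the Cylinder Index is searched with Table ID + Row Hash to find the data block; the data block is searched with Row Hash + PI value to find the target row.]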

Data Blocks
A data block contains one or more rows of the same table.
Block sizes range between 512 and 130560 bytes.
Block sizes within an individual table can vary; the file system adjusts them dynamically as required.
The system maintains rows within the block in logical Row ID sequence.
Tables used for data warehousing and decision support usually have larger block sizes to accommodate more rows per block.

Teradata Parallelism
Each PE supports up to 120 sessions in parallel.
Each session may handle multiple requests concurrently, up to 16 requests.
The MPL (Message Passing Layer) is designed to avoid becoming a bottleneck for the system.
Each AMP can handle up to 80 tasks in parallel.

Teradata Parallelism (Contd.).
Multiple sessions can be established by a client utility to perform multiple tasks in parallel.
The Optimizer may concurrently perform more than one step on behalf of the same request.
The Teradata DBMS is supported by a set of parallel client tools to achieve optimum throughput.

Teradata Parallelism (Contd.).
Query Parallelism
Within-a-Step Parallelism
Multi-Step Parallelism
Common sub-expression elimination.

Query Parallelism
Query parallelism is enabled by hash partitioning the data across all the AMPs defined in the system.
An AMP provides all the database services on its allocation of data blocks.
Table scans, index scans, projections, selections, joins, aggregations, and sorts execute in parallel across all AMPs.

Query Parallelism (within-a-step)
The Optimizer generates steps to execute a SQL request.
A step is often a large chunk of multiple database operations.
Multiple relational operations are processed in parallel by pipelining.
For example, while a table scan is taking place, selected rows can be pipelined into a join process.

Query Parallelism (Multi-Step)
Multiple steps of a query execute simultaneously across all units of parallelism in the system.
One or more processes are invoked for each step on each AMP to perform a database operation.

Review Questions
For each statement, indicate whether it applies to:

UPIs, NUPIs, Either, or Neither
_______ a) Specified in CREATE TABLE statement

_______ b) Provides uniform distribution via the hashing algorithm
_______ c) May be up to 64 columns
_______ d) One AMP operation always
_______ e) Access will return (at most) a single row
_______ f) Used to assign a row to a specific AMP
_______ g) Allows a null
_______ h) Values cannot be changed
_______ i) Required on every table
_______ j) Would permit duplicate rows

Review Exercises
Fill in the Blanks
1. The output of the hashing algorithm is called the _____ _____.

2. To determine the target AMP, the Message Passing Layer
must lookup an entry in the Hash Map based on the ________
number.
3. Two different PI values which hash to the same value are
called Hash ___________ .
4. A Row ID consists of a row hash plus a ____________ value.
5. A uniqueness value is required to produce a unique Row ID
because of _______ _________ and ______ ___________ .

Try on machine
1. Make RETAIL your default database for the current session.
2. Get information about the indexes on the CLIENT, ITEM and CONTRACT tables
3. Generate and observe the execution plans for rows retrieved through FTS, UPI and NUPI
4. Make TDUSER your default database
5. Create your own tables, one with a UPI and the other with a NUPI

Summary

Purpose of the Primary Index
Types of Primary Index
Role of the hashing algorithm and the hash map in locating a row.
Row ID and its role in row storage.
The reasons for selecting a UPI vs. a NUPI
Parallelism

3. Teradata Space

Objectives
After completing this lesson, you will be able to:

Define Teradata Database
Define Teradata User
Distinguish between Database and User
Define Perm Space and its use
Define Spool and Temp Space and their uses

Teradata Database
A Teradata database is a defined logical repository for Tables, Views, Macros and Triggers.
CREATE DATABASE new_db FROM existing_db AS
PERMANENT = 20000000
,SPOOL = 50000000;
new_db is owned by existing_db
Perm Space - maximum amount of space available for tables
Spool Space - maximum amount of work space available for requests

Teradata User
A Teradata user is a database with an assigned password
A user may log on to Teradata and access objects within:
Itself
Other databases for which it has access rights
A user is an active repository while a database is a passive repository
CREATE USER new_user FROM existing_user AS
PERMANENT = 10000000
,PASSWORD = lucky_day
,SPOOL = 20000000;
new_user is owned by existing_user

Demo
Demonstrate creating Databases and Users using the Teradata Administrator client
Explain the hierarchical structure of Databases
Observe the change in Max Perm of the parent DB when a child DB is created and dropped

Perm Space
Perm Space is the maximum amount of space available for storing Tables, Secondary Indexes (SI), and Permanent Journals
Perm Space defines an upper limit; it is not allocated at table creation time
Perm Space is released when data is deleted or when objects are dropped
The following require no Perm Space:
Views, Triggers, Macros
All Perm Space specifications are subtracted from the immediate owner (the FROM database or user)
Perm Space is a zero-sum game: the total of all Perm Space allocations must equal the total amount of disk space available
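
The zero-sum behaviour of Perm Space can be sketched with a small accounting model. The `Space` class, its methods, and the starting sizes for DBC and existing_db are hypothetical; only the 20,000,000-byte figure comes from the CREATE DATABASE example in this lesson. The rule that a space must have no children before it is dropped is a simplification assumed here.

```python
class Space:
    """Toy database/user that tracks Max Perm only; names are illustrative."""

    def __init__(self, name, max_perm, owner=None):
        self.name = name
        self.max_perm = max_perm
        self.owner = owner
        self.children = []

    def create_child(self, name, perm):
        """Create a child whose Max Perm is subtracted from this owner."""
        if perm > self.max_perm:
            raise ValueError(f"{self.name} has only {self.max_perm} bytes of Perm")
        self.max_perm -= perm
        child = Space(name, perm, owner=self)
        self.children.append(child)
        return child

    def drop_child(self, child):
        """Drop a childless child; its Perm returns to the owner."""
        if child.children:
            raise ValueError(f"{child.name} still owns other databases or users")
        self.children.remove(child)
        self.max_perm += child.max_perm


def total_perm(space):
    """Sum Max Perm over a space and everything below it in the hierarchy."""
    return space.max_perm + sum(total_perm(c) for c in space.children)


dbc = Space("DBC", 1_000_000_000)  # assumed system size
existing_db = dbc.create_child("existing_db", 100_000_000)
new_db = existing_db.create_child("new_db", 20_000_000)
print(existing_db.max_perm, total_perm(dbc))
```

Creating new_db lowers existing_db's Max Perm by the same amount, so the total over the whole hierarchy never changes.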

Spool and Temp Space
Spool Space is the maximum amount of work space available for requests
Spool Space is used to hold intermediate and final query result sets
Spool Space is literally unused Perm Space
The Spool Space specified is the upper limit for a query's intermediate and answer sets
If a query exceeds the limit, it is aborted immediately
Spool limits are not added to or subtracted from the owner when spool space is given to someone else.
Temp Space is also unused Perm Space; it is used for Global Temporary Tables

Caselet - Perm Space
You are a DBA for a Teradata system that has 200 GB of Perm Space. You create a user MKRT with 60 GB of Perm Space. User MKRT creates a user SALES with 40 GB of Perm Space. How much Perm Space is user MKRT left with?
A. 20 GB
B. 60 GB
C. 100 GB
D. 160 GB

Caselet - Spool Space
You are a DBA for a Teradata system that has 200 GB of Perm Space and you have reserved 100 GB for Spool. The system currently has 60 GB of user data. How much is left for spool?
A. 140 GB
B. 40 GB
C. 100 GB
D. 200 GB

Exercise - Perm and Spool Space
User A starts with 10 MB of Perm and 10 MB of Spool. User A creates User B and User C with 1 MB of Perm and 10 MB of Spool each. How much Perm and Spool does User A have now?
User C creates User D with 1 MB of Perm. Can User C create tables?
User B is dropped. How much Perm and Spool does User A have now?

Summary
Teradata Database
Teradata User
Difference between Database and User
Perm Space and its use
Spool and Temp Space and their uses

4. Introduction to Data Protection

Objectives
Define a transaction and transaction modes

Describe transient journal and its function.
Explain the concept of FALLBACK tables.
Describe cliques and their purpose
Describe permanent journal and its function.
Describe RAID levels.
List the types and levels of locking.

Transaction
A transaction is a logical unit of work.
Statements nested within the transaction either execute successfully as a group or do not execute.
There are two transaction modes:
ANSI Mode
Teradata Mode
ANSI Mode:
Transactions are always implicit in ANSI session mode.
A transaction initiates when:
The first SQL statement in a session executes
The first statement following the close of a transaction executes
COMMIT or ROLLBACK/ABORT statements close a transaction.
If a transaction includes a DDL statement, it must be the last statement in the transaction.

Teradata Mode
A Teradata SQL transaction can be a single SQL statement, or a sequence of SQL statements (MACRO), treated as a single unit of work.
A transaction can be:
Implicit
Explicit
Starts with a BEGIN TRANSACTION statement.
Ends with an END TRANSACTION statement.
If a transaction includes a DDL statement, it must be the last statement in the transaction.
An error rolls back the entire transaction.
Transactions are atomic - either all requests are performed, or none are.

Transient Journal
Is an automatic feature that provides data integrity
Automatic rollback of changed rows in the event of transaction failure
Data is always returned to its original state after a transaction failure.
Takes a Before Image (BI) of changes for rollback purposes
The BI is stored in the AMP's transient journal
The AMPs' transient journals are maintained in user DBC's Perm Space.
When the transaction is committed, the BI in the transient journal is purged automatically
When a transaction fails:
User receives a failure message
Transaction is rolled back
Locks are released
Spool files are discarded
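
The before-image mechanism above can be sketched in a few lines. This is a minimal model under stated assumptions, not Teradata's implementation: the `Table` class, its method names, and the sample account rows are hypothetical.

```python
_MISSING = object()  # marks "the row did not exist before the change"


class Table:
    """Toy table with a transient journal of before images (illustrative only)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self._journal = []  # (key, before_image) pairs in the order changes occurred
        self._in_txn = False

    def begin(self):
        self._in_txn = True
        self._journal.clear()

    def update(self, key, value):
        if self._in_txn:
            # Record the before image ahead of making the change.
            self._journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = value

    def commit(self):
        # Once committed, the before images are no longer needed and are purged.
        self._journal.clear()
        self._in_txn = False

    def rollback(self):
        # Undo changes in reverse order by restoring each before image.
        for key, before in reversed(self._journal):
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self._journal.clear()
        self._in_txn = False


accounts = Table({"A": 100, "B": 50})
accounts.begin()
accounts.update("A", 70)
accounts.update("C", 30)
accounts.rollback()
print(accounts.rows)
```

After the rollback the table holds exactly the rows it had before the transaction began, including the removal of the newly inserted row.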

Fallback
A Fallback table remains available in the event of an unavailable AMP (single AMP failure)
A Fallback row is a copy of a Primary row which is stored on a different AMP.
Automatic restore of data changed while the AMP was off-line.
Create table with or without Fallback
Add or drop the Fallback feature at any time
Cost of Fallback:
Twice the disk space for table storage.
Twice the I/O for Inserts, Updates and Deletes.
               AMP 1    AMP 2
Primary Rows   2 8 6    4 9 5
Fallback Rows  4 9 5    2 8 6

Fallback Cluster
Fallback is always associated with clusters
A Fallback cluster is a defined number of AMPs which are
treated as a single fault tolerant unit.
All Fallback rows for AMPs in a cluster must reside within the
cluster.
Loss of one AMP in the cluster permits continued table access.
Loss of two AMPs in the cluster causes the RDBMS to halt.
Two clusters with 2 AMPs each:
Cluster 1      AMP 1      AMP 2
Primary Rows   2 8 6      4 9 5
Fallback Rows  4 9 5      2 8 6
Cluster 2      AMP 3      AMP 4
Primary Rows   12 18 16   14 19 15
Fallback Rows  14 19 15   12 18 16
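
The cluster rules above can be sketched as two small functions. The placement rule used here (the fallback copy goes to the next AMP in the same cluster) is an assumption chosen for illustration, and AMPs are numbered from 0 rather than 1; `fallback_amp` and `table_available` are hypothetical names.

```python
def fallback_amp(primary_amp, cluster_size):
    """Return the AMP holding the fallback copy for a row on primary_amp.

    AMPs are numbered from 0 and grouped into consecutive clusters of
    cluster_size AMPs. The fallback copy goes to the next AMP in the same
    cluster, wrapping around at the end of the cluster (assumed rule).
    """
    if cluster_size < 2:
        raise ValueError("a fallback cluster needs at least two AMPs")
    cluster_start = (primary_amp // cluster_size) * cluster_size
    offset = (primary_amp - cluster_start + 1) % cluster_size
    return cluster_start + offset


def table_available(down_amps, num_amps, cluster_size):
    """A fallback table stays accessible unless some cluster has lost two or more AMPs."""
    down = set(down_amps)
    for start in range(0, num_amps, cluster_size):
        cluster = set(range(start, min(start + cluster_size, num_amps)))
        if len(cluster & down) >= 2:
            return False
    return True


# Four AMPs in two clusters of two, as in the slide (AMP 1..4 -> 0..3 here).
print([fallback_amp(amp, 2) for amp in range(4)])
print(table_available({0, 2}, num_amps=4, cluster_size=2))
print(table_available({0, 1}, num_amps=4, cluster_size=2))
```

Losing one AMP in each cluster still leaves every row reachable through its fallback copy, whereas losing both AMPs of one cluster removes the primary and fallback copies of the same rows.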

Down AMP Recovery Journal (DARJ)
DARJ is started on all AMPs in a cluster when an AMP is down
DARJ keeps track of all changes that would have been written
to the failed AMP.
When the AMP comes back online, the DARJ will catch-up the
AMP by applying the missed transactions.
Once everything is caught up, the DARJ is dropped

Cliques
A clique provides protection against the failure of an entire node
[Figure: eight AMPs connected by the BYNET and grouped into CLIQUE-1 and CLIQUE-2, with four disk arrays attached below the two cliques]

Permanent Journal
The Permanent Journal is an optional, user-specified, system-maintained journal which is used for recovery of a database to a specified point in time.
The Permanent Journal:
Is used for recovery from unexpected hardware or software disasters.
May be specified for one or more tables
Permits capture of Before Images for database rollback.
Permits capture of After Images for database roll forward.
Permits archiving change images during table maintenance.
Reduces need for full table backups.
Provides a means of recovering NO FALLBACK tables.
Requires additional disk space for change images.
Requires user intervention for archive and recovery activity.

RAID
RAID (Redundant Array of Independent Disks) provides protection against a disk failure
Teradata uses RAID-1
RAID 1 Transparent Mirroring
Provides high data availability and performance, but storage costs are high.
Characteristics:
Data is fully replicated
Mirrored striping is possible with multiple pairs of disks in a drive group
Transparent to operating system
Advantages:
Maximum data availability, read performance gains
No performance penalty with write operations
Fast recovery and restoration
Disadvantages:
50% of disk space is used for mirrored data

RAID 5
Data Parity Protection, Interleaved Parity

Characteristics
Data and parity is striped and interleaved across multiple disks
XOR logic is used to calculate parity
Data is reconstructed on a disk failure
Transparent to operating system
Advantages
Provides high availability with minimum disk space (e.g., 25%) used for
parity overhead
Disadvantages
Write performance penalty
Performance degradation during data recovery and reconstruction
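
The XOR parity idea behind RAID 5 can be sketched as follows. This models a single stripe only and ignores how parity is rotated across disks; the function names and the sample block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks):
    """Compute the parity block for one stripe."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


stripe = [b"TERA", b"DATA", b"RAID"]  # three data blocks on three disks
p = parity(stripe)                    # parity block on a fourth disk
rebuilt = reconstruct([stripe[0], stripe[2]], p)  # the second disk has failed
print(rebuilt)  # prints b'DATA'
```

With three data blocks and one parity block per stripe, parity takes one quarter of the raw capacity, which corresponds to the 25% example above. Every write must also update the parity block, which is the source of the write penalty.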

Locks
Locking prevents multiple users who are trying to change the same
data at the same time from violating the data's integrity. This
concurrency control is implemented by locking the desired data.
There are four types of locks:

Exclusive - prevents any other type of concurrent access
Write - prevents other reads, writes, exclusives
Read - prevents writes and exclusives
Access - prevents exclusive only
Locks may be applied at three database levels:

Database - applies to all tables/views in the database
Table/View - applies to all rows in the table/views
Row Hash - applies to all rows with same row hash
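
The four lock types above imply a compatibility check that can be sketched as a lookup table. The `BLOCKS` mapping is transcribed from the list above; the function name `can_grant` is hypothetical, and the sketch ignores lock levels and queuing.

```python
# For each held lock, the set of requested lock types it blocks.
BLOCKS = {
    "EXCLUSIVE": {"ACCESS", "READ", "WRITE", "EXCLUSIVE"},
    "WRITE": {"READ", "WRITE", "EXCLUSIVE"},
    "READ": {"WRITE", "EXCLUSIVE"},
    "ACCESS": {"EXCLUSIVE"},
}


def can_grant(requested, held_locks):
    """Return True if the requested lock is compatible with every lock already held."""
    return all(requested not in BLOCKS[held] for held in held_locks)


print(can_grant("READ", ["READ", "ACCESS"]))   # readers can share
print(can_grant("WRITE", ["READ"]))            # a writer must wait for readers
print(can_grant("ACCESS", ["WRITE"]))          # access locks read through a write lock
```

The last case shows why an Access lock is useful for reporting queries: it is granted even while another user holds a Write lock, at the cost of possibly reading data that is being changed.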

Locks
Implicit locking based on the SQL command:

SELECT - applies a Read lock
UPDATE - applies a Write lock
CREATE TABLE - applies an Exclusive lock
Explicit locking using LOCKING modifier

LOCKING FOR ACCESS SELECT * FROM CUSTOMER;
LOCKING FOR EXCLUSIVE UPDATE CUSTOMER SET LID = 2000;
LOCKING FOR WRITE NOWAIT UPDATE CUSTOMER SET LID = 2001;
Locks would be covered in-depth in Teradata SQL Tuning Module

Caselet - Protection
You are a DBA for a Teradata system and you need to protect your system against the failure of an entire node. Which protection feature would you choose?
A. Clique
B. Fallback Cluster
C. RAID
D. Database Locks

Match Quiz
1. Provides for TXN rollback in case of failure a. Database locks

2. Protects all rows of a table b. Table locks
3. Logs changed rows for down AMP c. Row Hash locks
4. Provides for recovery to a point in time d. FALLBACK
5. Applies to all tables and views within e. Cluster
6. Lowest level of protection granularity f. Recovery journal
7. Protects tables from AMP failure g. Transient journal
8. Protects database from a physical drive failure h. Permanent journal
9. Fault tolerant unit used by Fallback i. RAID
10. Provides protection against failure of a node j. Clique

Summary
Transaction and its modes

Transient journal and its function.
Concept of FALLBACK tables.
Cliques and their purpose
Permanent journal and its function.
RAID levels.
Types and levels of locking.

5. Teradata Tools and Objects

Objectives

Identify Teradata Client Tools and their applications
Identify Teradata Object Types and their purpose

BTEQ
BTEQ (Basic Teradata Query, pronounced BEE-teek) is a general-purpose, command-driven utility
Used to access and manipulate data on the Teradata Database
Generate reports
Perform data movement - Export and Import (suitable for small
volume)
BTEQ is a transparent interface to CLI, to transmit textual SQL to
Teradata server, and deliver response to the user
Limited ability to branch using LABEL
Runs on every supported platform LAN and Channel Attached
Clients
Online Demonstration of using BTEQ
BTEQ would be covered in-depth in Teradata BTEQ and Utilities
Module

Teradata SQL Assistant
Teradata SQL Assistant is an information discovery/query tool
that runs on Microsoft Windows.
Teradata SQL Assistant enables you to access the Teradata
Database as well as other ODBC-compliant databases. Some
of its features include:
Ability to save data in PC-based formats, such as Microsoft Excel,
Microsoft Access, and text files.
History of submitted SQL syntax, to help you build scripts for data
mining and knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from ODBC-
compliant databases.
Online Demonstration of using Teradata SQL Assistant by creating a DSN

Teradata Administrator
Teradata Administrator provides a comprehensive Windows-
based graphical interface to perform database administration
tasks
Create, Modify and Drop Databases, Users, Roles, Profiles, and User-
Defined Types.
Create Tables (using ANSI or Teradata syntax)
Grant or Revoke access and system rights
Copy Table, View or Macro definitions to another database, or to
another system
Drop or Rename Tables, Views or Macros
Move space from one database to another
Display information about a Database or Users
Display information about a Table, View or Macro
Set up the rules for Query and Access Logging
Online Demo of using Teradata Administrator

Other Client Tools
Index Wizard
Statistics Wizard
Visual Explain
All these would be covered in Teradata SQL Tuning Module
Load and Unload Utilities

FastExport
FastLoad
MultiLoad
TPump
TPT
All these would be covered in Teradata BTEQ and Utilities
Module

Tables
A two-dimensional structure of columns and rows of data
Permanent Tables - require Perm Space
SET: no duplicate rows
MULTISET: duplicate rows allowed
Temporary Tables
Derived Tables - require Spool Space
Volatile Tables - require Spool Space
Global Temporary Tables - require Temp Space
All These would be covered in-depth in Teradata SQL Module

Indexes
Apart from PI that we covered in this module, Other Indexes that
Teradata Supports are
Partitioned Primary Indexes
Single Level
Multi Level
Secondary Indexes
USI
NUSI
Value-Ordered NUSI
Join Index
Hash Index
All these would be covered in-depth in Teradata SQL Tuning Module

Views
A view is a window into the data contained in relational tables.
A view is sometimes called a virtual table.
It may define a subset of rows of a table.
It may define a subset of columns of a table.
It may reference more than one table.
Data is neither duplicated nor stored separately for a view.
Data can be accessed directly via a table or indirectly via a view,
based on privileges held.
View definitions are stored in the Data Dictionary, not in the user's own space.
Help restrict which rows and columns are visible from base tables.
Help Simplify Query Complexity
VIEWS would be covered in-depth in Teradata SQL Module

Macros
Macros contain one or more prewritten SQL statements.

Macros are a Teradata extension to ANSI SQL.
Macros are stored in the Teradata Data Dictionary.
Macros can be executed from any viable SQL front-end,
including:
Teradata SQL Assistant, BTEQ, LOGON Startup
Another macro
To execute a macro requires the user to have the EXEC
privilege on the macro.
Explicit privileges on the tables or views used by the macro are
not needed by the executing user.
One Macro is One Transaction
MACROS would be covered in-depth in Teradata SQL
Module

Trigger
A trigger is built to perform an action when a DML statement occurs on a table.
Triggers fire automatically
Triggering statement and trigger action constitute a single transaction, so if the trigger fails, the transaction fails.
Trigger is a database object of type G
Example: Trigger built on EMP table to insert a record into the AUDIT table when any employee salary is updated.

Trigger definition:
CREATE TRIGGER EMP_SAL_TRIGGER
AFTER UPDATE OF (SAL) ON EMP
REFERENCING OLD AS BI NEW AS AI
FOR EACH ROW
(INSERT INTO EMP_AUDIT_TABLE
VALUES (BI.EMPNO, BI.SAL, AI.SAL, DATE); );

Statement that fires the trigger, followed by a check of the audit table:
UPDATE EMP
SET SAL = SAL * 1.1
WHERE DEPTNO = 10;

SELECT *
FROM EMP_AUDIT_TABLE;

Stored Procedures
Stored procedures consist of a set of control and condition handling
statements that make SQL a computationally complete programming
language.
A single-statement stored procedure body can contain one
control statement, such as LOOP or WHILE, or one SQL DDL,
DML, or DCL statement. Some statements are not allowed,
including:
Any declaration (local variable, cursor, or condition handler) statement
A cursor statement (OPEN, FETCH, or CLOSE)

Stored Procedures (Contd.).
A compound-statement stored procedure body consists of a
BEGIN-END statement enclosing a set of declarations and
statements, including:
Local variable declarations
Cursor declarations
Condition handler declaration statements
Control statements
SQL DML, DDL, and DCL statements supported by stored procedures
Compound statements can also be nested.
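
The compound-statement form described above might look like the following sketch. The procedure name sp_raise_dept_sal, its parameters, and the EMP table are hypothetical; the body shows a local variable declaration, a condition handler declaration, a control statement (SET), and an SQL DML statement inside one BEGIN-END block.

```sql
CREATE PROCEDURE sp_raise_dept_sal (
    IN  p_deptno INTEGER,
    IN  p_pct    DECIMAL(5,2),
    OUT p_rows   INTEGER
)
BEGIN
    -- Local variable declarations come first.
    DECLARE v_factor DECIMAL(7,4);

    -- Condition handler: on any SQL error, report -1 and leave the block.
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
        SET p_rows = -1;

    SET v_factor = 1 + (p_pct / 100);

    UPDATE emp
    SET sal = sal * :v_factor
    WHERE deptno = :p_deptno;

    -- ACTIVITY_COUNT holds the number of rows affected by the last statement.
    SET p_rows = ACTIVITY_COUNT;
END;

-- Example call; the third argument is a placeholder for the OUT parameter.
CALL sp_raise_dept_sal (10, 10.00, p_rows);
```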

User-Defined Functions
User-defined functions (UDFs) allow you to extend SQL by
writing your own functions in the C programming language,
installing them on the database, and then using them like
standard SQL functions.
You can also install UDF objects or packages from third-party
vendors, without providing the source code.
UDFs run in parallel, as required, on all AMPs.
Scalar functions take input parameters and return a single
value result.
Aggregate functions produce summary results. They differ from
scalar functions in that they take grouped sets of relational
data, make a pass over each group, and return one result for
the group.
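
The difference between scalar and aggregate UDFs shows up in how they are called. In the sketch below, to_fahrenheit and temp_spread are hypothetical UDF names and READINGS is a hypothetical table; the example assumes both functions have already been written in C and installed.

```sql
-- Scalar UDF: called once per row, returns one value per row.
SELECT city, reading_time, to_fahrenheit(temp_c)
FROM readings;

-- Aggregate UDF: makes a pass over each group, returns one value per group.
SELECT city, temp_spread(temp_c)
FROM readings
GROUP BY city;
```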

Summary

Teradata Client Tools and their applications
Teradata Object Types and their purpose

6. Introduction to Active Data Warehouse

Objectives
Describe Online Transaction Processing (OLTP)
Describe Decision Support System (DSS)
Describe Data Warehouse
Describe Active Data Warehouse

OLTP
Transactions typically complete in seconds, not minutes
The number of rows per transaction is small
Only a few of many possible tables are accessed.
Very little I/O processing is required to complete a transaction
Examples of OLTP transactions
Updating a checking or saving account to reflect a deposit or withdrawal
ATM money withdrawal from a bank
OLTP queries run quickly and are called tactical queries.
Examples:
Altering a campaign based on current status
Determining the best offer for a specific customer

Decision Support System (DSS)
Typically used for strategic long-range planning and answering
what-if questions.
Transactions take minutes to hours.
Many users ask a wide variety of questions
Transactions usually involve multiple tables and millions of rows
Examples of DSS transactions/queries
Creating a report to show a comparison of sales from this week to last week
Creating a report that shows the top ten selling items across all stores for one
year
Determine monthly sales of shoes
The following aspects of the DSS environment have gained
importance as technology has improved:
the ability to use detailed data
the ability to do adhoc queries
the decreased need to use summary data
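
To contrast the two workloads, the sketch below pairs an OLTP-style statement with a DSS-style query taken from the examples above (an account withdrawal and monthly sales of shoes). The ACCOUNT, SALES, and ITEM tables and their columns are assumptions made up for illustration.

```sql
-- OLTP / tactical: touches a single row of one table and finishes quickly.
UPDATE account
SET balance = balance - 200.00
WHERE account_id = 1001;

-- DSS: joins tables and scans detailed data to produce a summary.
SELECT EXTRACT(YEAR FROM s.sale_date)  AS sale_year,
       EXTRACT(MONTH FROM s.sale_date) AS sale_month,
       SUM(s.amount)                   AS total_sales
FROM sales s
INNER JOIN item i ON s.item_id = i.item_id
WHERE i.category = 'SHOES'
GROUP BY 1, 2
ORDER BY 1, 2;
```

The DSS query works from detailed sales rows rather than a pre-built summary table, which matches the trend toward detailed data and ad hoc queries noted above.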
Data Warehouse
A data warehouse is a central, enterprise wide database that
contains information obtained from operational systems,
designed around DSS.
The data warehouse has evolved through these stages:
Primarily batch / pre-defined reports
Increase in ad hoc queries
Analytical modeling
Continuous update and time-sensitive queries became important
Event-based triggering
In the beginning the warehouse is used for analyzing, which
over time evolves into predicting and finally into
operationalizing.
Active Data Warehouse
Allows companies to take their OLTP transactions and load
them into the data warehouse in near real time so users can
analyze the data and make real-time decisions before their
competitors.
Supports short tactical OLTP type queries mixed with large
DSS queries
Provides scalability in order to
Support large amounts of detail data
Update operational data store directly
Support an integrated environment with wide mix of queries
Some characteristics of an active data warehouse environment
are
Mission critical application
Tactical queries
24/7 availability and reliability
Caselet - DWH
You are a designer at Wipro. You are designing a data
warehouse which Wipro's customers will utilize for strategic
long-range planning and answering what-if questions. What
kind of system has to be designed?
A. OLTP
B. DSS
C. OLDB
D. RDDB
Summary

Online Transaction Processing (OLTP)
Decision Support System (DSS)
Data Warehouse
Active Data Warehouse
References
Teradata 12 Basics (An Authorized Teradata Certified
Professional Program Study Guide)
Thank You