
Teradata Basics & Architecture

1 2012 WIPRO LTD | WWW.WIPRO.COM


Agenda
1 Teradata Product Overview
2 Primary Index and Data Distribution
3 Teradata Space
4 Introduction to Data Protection

5 Teradata Tools and Objects

6 Introduction to Active Data Warehouse



1. Teradata Product Overview



Objectives

After completing this lesson, you will be able to:


Describe the purpose of the Teradata product
List major architectural features of the product
SMP
MPP
BYNET
Distinguish between Shared Nothing and Shared Everything architecture
Describe Teradata parallelism and linear scalability features
Describe the purpose of the PEs, AMPs and PDE
Describe LAN and mainframe connections to Teradata
List Teradata strengths
Explain the major differences between Teradata and other databases



What is Teradata
A relational database designed for data warehouse (DWH) implementations
Massively Parallel Processing (MPP), Shared Nothing architecture
Each node is self-sufficient and independent of other nodes
Born parallel: parallelism is built into the design from the start
Linearly scalable

[Figure: Four SMP nodes (Node1 to Node4), each with its own CPUs, memory and disks, connected by the BYNET interconnect.]



Parallel Shared Nothing Architecture

[Figure: Teradata V2 virtual processors (vprocs). PE vprocs perform session management, parsing, optimizing and dispatching. The Message Passing Layer (BYNET 0 and BYNET 1) connects them to the AMP vprocs, the execution engines that run in parallel. Each AMP vproc has its own vdisk that no one else shares. PDE, which replaces TOS, runs on top of the operating system.]



Shared Everything vs. Shared Nothing



Quick Recap
Oracle is a Shared Everything architecture - True
Teradata is a Shared Nothing architecture - True
Teradata is scalable and born parallel - True
The PE vproc manages disk I/O - False
The AMP vproc manages sessions - False
Each AMP vproc has its own dedicated vdisk - True
The Teradata Optimizer is a cost-based optimizer (CBO) - True
All communication between nodes and vprocs is through the BYNET - True
PDE replaces the Version 1 TOS (Teradata Operating System) - True



Parsing Engines (PEs)

The Parsing Engine is responsible for:


Managing individual sessions (up to 120)
Parsing and optimizing your SQL requests
Dispatching the optimized plan to the AMPs over the BYNET
ASCII/EBCDIC input conversion (if necessary)
Sending the answer set response back to the requesting client
Checking access rights: if the user does not have the proper access rights, the query is rejected



Access Module Processors (AMPs)

Each AMP is always connected to a single virtual disk (vdisk)


AMPs execute the steps dispatched by the PEs
AMPs store and retrieve rows to and from disk
AMPs pass the data back to the PE over the BYNET
Each AMP holds a portion of every table
Each AMP keeps its portion of each table on its own vdisk
AMPs are also responsible for other database functions (such as sorting, aggregating, joining, locking and journaling)
Each AMP can perform up to 80 tasks in parallel
All AMPs can work together in parallel to service any request
Each AMP can work on several requests in parallel



BYNET

The BYNET is a dual-redundant, bi-directional network interconnect. The BYNET:
Enables communication between AMPs and PEs
Enables multiple SMP nodes to communicate
Automatically load-balances message traffic
Provides fault tolerance through dual, fully operational BYNETs
Offers bandwidth that scales as nodes are added



Parallel Database Extension (PDE)

A software interface layer on top of the operating system that enables the RDBMS to operate in a parallel environment.

PDE provides the ability to:
Execute vprocs
Run the Teradata RDBMS in a parallel environment

An application that runs under the control of PDE is called a Trusted Parallel Application (TPA).

The Teradata RDBMS is the TPA.



Linear Scalability
Start small and grow without losing performance
Adding AMPs/PEs/nodes to the system improves performance linearly
Capacity for increased workload without decreased performance

[Figure: Nodes 1 to 4 connected by the BYNET; more nodes support more work, more users and more data.]
Linear Scalability (Contd.)

Shared nothing MPP platform

Software can scale linearly with hardware

Each vproc (the parallel unit) acts as a self-contained mini DBMS.

AMPs perform sorting, locking, journaling, loading, backup and recovery functions independently.

Adding AMPs/PEs/nodes to the system improves performance linearly. For example, for the same volume of data, doubling the number of nodes doubles query execution speed (roughly halving response time).
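A back-of-the-envelope sketch of the arithmetic behind this claim, assuming rows are spread perfectly evenly, every AMP scans at the same fixed rate, and doubling the nodes doubles the AMPs. The function name and all numbers are hypothetical:

```python
def full_scan_seconds(total_rows: int, amps: int, rows_per_sec_per_amp: float) -> float:
    """Idealized full-table-scan time when every AMP scans only its own share in parallel."""
    rows_per_amp = total_rows / amps  # even distribution, e.g. from a good UPI
    return rows_per_amp / rows_per_sec_per_amp  # all AMPs scan at the same time


# Hypothetical numbers: 1 billion rows, each AMP scans 1 million rows per second.
before = full_scan_seconds(1_000_000_000, amps=100, rows_per_sec_per_amp=1_000_000)
after = full_scan_seconds(1_000_000_000, amps=200, rows_per_sec_per_amp=1_000_000)
print(before, after)  # 10.0 5.0
```

Real systems only approach this ideal: data skew, BYNET traffic and the serial parts of a request (parsing, returning the answer set) are not modelled.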



Linear Scalability (Contd.)

[Figure: Total work accomplished vs. increasing CPU power. Teradata shows linear scalability; traditional transaction processing systems show non-linear scalability.]



LAN & Mainframe Connection to Teradata

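[Figure: Network-attached systems: client request -> CLI -> MTDP -> MOSI -> Ethernet adapter -> gateway software -> Parsing Engine. Channel-attached systems: client request -> CLI -> TDP -> ESCON channel or bus/tag cables -> host adapter -> Parsing Engine. The Parsing Engines communicate over the BYNET with the AMPs, each of which owns a vdisk; the PE, BYNET and AMP layers run under PDE.]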



Teradata Strengths

Parallel Processing
Linear Scalability - manageable growth via modularity
Mature optimizer - cost-based (CBO)
Load utilities: TPT, FastLoad, MultiLoad, TPump, FastExport
Active Data Warehousing
Easy database administration
Designed to run the world's largest enterprise data warehouse databases
Executes on UNIX, Linux and Windows operating systems
Runs on single or multiple nodes
Provides network and mainframe connectivity
Supports the industry-standard access language (SQL)
Fault tolerant at all levels of hardware and software



Teradata Implementations

World's Largest Retailers
World's Largest Banks
World's Largest Global Telecommunications Companies
World's Largest Airlines
World's Largest Insurance Companies
The two largest commercial data warehouses in the world use Teradata:
AT&T
Walmart



Teradata vs. the Traditional Databases

Teradata was designed to accommodate data warehouse implementations.

Traditional database systems were designed for transaction processing.

In traditional databases, data warehousing features are rather patches on top of the core database.

Teradata has data warehousing features embedded into the core of the database.

Teradata uses a Shared Nothing architecture and hence eliminates resource contention.

Linear scalability, unconditional parallelism, multi-faceted parallelism, intelligent data distribution and a parallel-aware optimizer make Teradata capable of handling large data volumes and complex queries.



Teradata: A Brief History

Year  Events
1979  Teradata Corp. founded in Los Angeles, California; development begins on a massively parallel computer
1982  YNET technology is patented
1984  Teradata markets the first database computer, the DBC/1012; the first system is purchased by Wells Fargo Bank of California
1987  First public offering of stock
1989  Teradata and NCR partner on the next generation of the DBC
1991  NCR Corporation is acquired by AT&T
1992  Teradata is merged into NCR
1996  AT&T spins off NCR Corp. with the Teradata product
1997  Teradata database becomes the industry leader in data warehousing



teradata.com

http://teradata.com/enterprise-data-warehousing/

Click on Teradata Purpose-Built Platform Family, download the PDF and read it.

Discuss the various Teradata models and their features.



Caselet - AMP

You are the administrator for a Teradata system. What happens to the performance of the system if you double the number of AMPs while the number of users and the volume of data remain the same?

A. Performance gets doubled
B. Performance remains the same
C. Performance goes down
D. You cannot add AMPs



Review Questions

1) When was the first Teradata product sold? 1984
2) How many zeroes are there in a trillion? 12
3) Which feature of Teradata permits high performance even against enormous databases? Parallelism
4) Name the operating systems that Teradata can execute on. UNIX, Windows and Linux
5) Is Teradata a client or a server? Server
6) What is the purpose of the PE? Session management, parsing, optimizing and dispatching
7) What is the purpose of the AMP? I/O and other database functions
8) How many sessions can a PE support? 120



Match Quiz

1. CLI (F)            A. Does aggregating, sorting and joining
2. MTDP (I)           B. Handles up to 120 sessions
3. MOSI (E)           C. Message passing layer
4. PE (B)             D. Balances sessions in a mainframe environment
5. AMP (A)            E. Provides OS independence
6. BYNET (C)          F. Lowest-level interface to Teradata
7. TDP (D)            G. Chooses the most efficient access path
8. Optimizer (G)      H. PE software that sends plan steps to the AMPs
9. Dispatcher (H)     I. Library of service routines for session management
10. Parallelism (J)   J. Foundation of the Teradata architecture



Summary

In this lesson you learnt about:


Terms associated with relational databases
Advantage of a relational database
Purpose of the Teradata product
Major architectural features of the product
Overall Teradata parallel architecture.
Major components of the Teradata architecture.
Purpose of the PE and the AMP.
Relationship of the Teradata to its client side applications.
Brief history, strengths and implementations of the product



2. Primary Index and Data Distribution



Objectives

At the end of this lesson, you will be able to:


Explain the purpose of the Primary Index
Explain the Primary Index types
Explain data distribution across the AMPs' vdisks
Explain the role of the hashing algorithm and the hash map in locating a row
Explain the makeup of the Row ID and its role in row storage
State the reasons for selecting a UPI vs. a NUPI
Describe Teradata parallelism



Storing Rows

[Figure: Customer table rows and Vendor table rows spread across four AMPs.]

The uniformity of distribution of the rows of a table depends on the choice of the Primary Index.



Primary Index (PI)
Each table in Teradata is required to have a Primary Index
The Primary Index determines on which AMP a row will reside
It is a physical mechanism used to store and access rows
The Primary Index plays 3 roles:
Data distribution
Fastest way to retrieve data
Critical for join performance
It may consist of a single column, or a combination of up to 64 columns
Defined in the CREATE TABLE statement
Changing the choice of Primary Index requires dropping and recreating the table
May be unique or non-unique; PI values may be changed
May contain NULLs
Access by PI value is a one-AMP operation



Types of PI

Unique Primary Index (UPI)
Does not allow duplicates
Provides even distribution of rows of the table across all AMPs
Does not require duplicate row checking
Retrieves 0 or 1 row

Non-Unique Primary Index (NUPI)
Allows duplicates
Does not guarantee even distribution of rows (all rows with the same PI value go to the same AMP)
May be chosen over a UPI because it may be more efficient for query access and joins
Retrieves 0 to many rows

Both UPI and NUPI access is always a one-AMP operation



Hash Map

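[Figure: A PI value (7225) goes through the hashing algorithm to produce a 32-bit row hash, for example 0000 0001 0000 1010 1100 0111 0101 1011. The first 16 bits (hex 010A) are the hash bucket number and the remaining 16 bits complete the row hash; the bucket number indexes the hash map (shown in hexadecimal) to find the AMP number.]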
Hashing PI Value
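[Figure: The Parsing Engine picks up the Primary Index value (defined by the user and used for data distribution and access) and feeds it to the hashing algorithm. The first 16 bits of the resulting 32-bit row hash are the bucket number, which is looked up in the hash map (a 2-D array that associates hash buckets with AMPs). The Message Passing Layer (BYNET) then delivers the request to that AMP on Node-1 or Node-2.]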
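The hashing path above can be sketched in Python. Teradata's real hashing algorithm and hash map contents are not given in this module, so the sketch substitutes CRC-32 (`zlib.crc32`) as a stand-in hash and a hypothetical round-robin hash map for a 4-AMP system; only the 16-bit bucket split follows the slides:

```python
import zlib

NUM_AMPS = 4  # hypothetical system size
HASH_BUCKETS = 1 << 16  # 16-bit bucket number, as described on the slides

# Hypothetical hash map: bucket i is owned by AMP (i mod NUM_AMPS).
HASH_MAP = [bucket % NUM_AMPS for bucket in range(HASH_BUCKETS)]


def row_hash(pi_value) -> int:
    """Stand-in 32-bit row hash (CRC-32), NOT Teradata's real hashing algorithm."""
    return zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF


def hash_bucket(rh: int) -> int:
    """The first (high-order) 16 bits of the 32-bit row hash form the bucket number."""
    return rh >> 16


def target_amp(pi_value) -> int:
    """PI value -> row hash -> hash bucket -> hash map lookup -> AMP number."""
    return HASH_MAP[hash_bucket(row_hash(pi_value))]


# The bit pattern from the Hash Map slide: bucket 0x010A, remaining bits 0xC75B.
example = 0b0000_0001_0000_1010_1100_0111_0101_1011
print(hex(hash_bucket(example)))  # 0x10a
print(hex(example & 0xFFFF))  # 0xc75b
print(target_amp(7225) == target_amp(7225))  # True: same PI value, same AMP
```

Because the mapping is deterministic, every row with the same PI value is always sent to the same AMP, which is what makes PI access a one-AMP operation.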



Data Retrieval

SELECT * FROM ORDER_MASTER;

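[Figure: The client sends the request to the Parsing Engine; its Dispatcher sends a RET (retrieve) step over the Message Passing Layer to all four AMPs, and each AMP retrieves its own rows of ORDER_MASTER in parallel.]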
Row Distribution

[Figure: Row distribution across AMP 1 to AMP 4 for three choices of PI. NUPI on CUSTOMER_NAME: duplicate names such as VIJAY (marked *) always hash to the same AMP. NUPI on CUSTOMER_SEX_CODE: every M row lands on one AMP and every F row on another, leaving the remaining AMPs empty. UPI on CUSTOMER_ID: the values 10001 to 10008 are spread evenly, two rows per AMP.]
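To see why the choice of PI matters, the following sketch counts rows per AMP for a unique column and for a two-valued column. It is a toy model under stated assumptions: CRC-32 instead of Teradata's hashing algorithm, buckets assigned round-robin to 4 AMPs, and made-up customer rows:

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_value) -> int:
    """Stand-in hashing: CRC-32 bucket (top 16 bits) assigned round-robin to AMPs."""
    bucket = (zlib.crc32(str(pi_value).encode("utf-8")) & 0xFFFFFFFF) >> 16
    return bucket % NUM_AMPS


def rows_per_amp(rows, pi_column):
    """Count how many rows land on each AMP for a given PI column."""
    counts = Counter(amp_for(row[pi_column]) for row in rows)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


# Hypothetical customer rows: unique IDs, only two distinct sex codes.
customers = [
    {"customer_id": 10001 + i, "sex_code": "M" if i % 3 else "F"}
    for i in range(1000)
]

print(rows_per_amp(customers, "customer_id"))  # spread over all AMPs
print(rows_per_amp(customers, "sex_code"))  # at most 2 AMPs receive any rows
```

With only two distinct values, no hashing scheme can use more than two AMPs, which is why a low-cardinality column makes a poor NUPI.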



Identifying Rows

A row hash is not adequate to uniquely identify a row:
Duplicate PI values result in the same row hash
Different PI values may also result in the same row hash; this is called a hash synonym
To uniquely identify a row, the AMP appends a 32-bit uniqueness value to the 32-bit row hash.
It assigns 1 to the first row with a given row hash, 2 to the second, 3 to the third, and so on.
The combination of the 32-bit row hash and the 32-bit uniqueness value is called the Row ID (64 bits)
Each stored row is prefixed with its Row ID
Rows are logically maintained in Row ID sequence
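A toy model of how an AMP could build Row IDs as described above. The class and method names are hypothetical, and the sketch simply hands out the next uniqueness value for each row hash:

```python
from collections import defaultdict


class AmpRowStore:
    """Toy model of one AMP building 64-bit Row IDs (row hash + uniqueness value)."""

    def __init__(self):
        self._last_uniqueness = defaultdict(int)  # highest uniqueness value per row hash
        self.rows = {}  # Row ID -> row data

    def insert(self, rh: int, row) -> int:
        self._last_uniqueness[rh] += 1  # 1 for the first row with this hash, 2 for the second, ...
        uniqueness = self._last_uniqueness[rh]
        row_id = (rh << 32) | uniqueness  # row hash in the high 32 bits, uniqueness in the low 32
        self.rows[row_id] = row
        return row_id

    def rows_in_row_id_order(self):
        """Rows are logically kept in Row ID sequence."""
        return [self.rows[rid] for rid in sorted(self.rows)]


amp = AmpRowStore()
# Two rows with the same row hash (duplicate PI values or a hash synonym) plus one other.
a = amp.insert(0x010AC75B, "VIJAY #1")
b = amp.insert(0x010AC75B, "VIJAY #2")
c = amp.insert(0x00000001, "HARISH")
print(hex(a), hex(b))  # 0x10ac75b00000001 0x10ac75b00000002
print(amp.rows_in_row_id_order())  # ['HARISH', 'VIJAY #1', 'VIJAY #2']
```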



Explain plan - FTS

EXPLAIN SELECT * FROM contract;

Explanation
1) First, we lock a distinct RETAIL."pseudo table" for read on a RowHash to
prevent global deadlock for RETAIL.CONTRACT.
2) Next, we lock RETAIL.CONTRACT for read.
3) We do an all-AMPs RETRIEVE step from RETAIL.CONTRACT
by way of an all-rows scan with no residual conditions into Spool 1
(group_amps), which is built locally on the AMPs. The size of Spool 1 is
estimated with high confidence to be 15,000 rows (1,320,000
bytes). The estimated time for this step is 0.21 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.21 seconds.



Explain Plan - UPI

EXPLAIN SELECT c_custkey, c_name, c_acctbal
FROM client
WHERE c_custkey = 993;

Explanation
1) First, we do a single-AMP RETRIEVE step from
RETAIL.CLIENT by way of the unique primary
index "RETAIL.CLIENT.C_CUSTKEY = 993" with no
residual conditions. The estimated time for this step is 0.01
seconds.
-> The row is sent directly back to the user as the result of
statement 1. The total estimated time is 0.01 seconds.



Explain plan - NUPI

EXPLAIN SELECT l_orderkey, l_partkey, l_linenumber, l_linestatus
FROM item
WHERE l_orderkey = 54528;

Explanation
1) First, we do a single-AMP RETRIEVE step from
RETAIL.item by way of the primary index
"RETAIL.item.L_ORDERKEY = 54528" with no residual conditions
into Spool 1 (one-amp), which is built locally
on that AMP. The size of Spool 1 is estimated with high
confidence to be 4 rows (120 bytes). The estimated time for this
step is 0.02 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.02 seconds.



Locating a Row

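[Figure: The PE builds the request from the 48-bit Table ID, the 32-bit row hash and the index value. The DSW portion of the row hash selects the AMP number. In the AMP file system, a logical block identifier locates the data block and a logical row identifier locates the data row within it.]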



Locating a Row (Contd.)

The DSW (Destination Selection Word) portion of the row hash is fed to the hash map, which identifies the target AMP number.

The AMP accesses its Master Index.

The Master Index identifies the cylinder, whose Cylinder Index is read next.

The Cylinder Index identifies the data block.

A search of the data block locates the row.



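Locating a Row (Contd.)

[Figure: The Master Index is searched with Table ID + Row Hash to find the cylinder; the Cylinder Index is searched with Table ID + Row Hash + Cylinder # to find the data block; within the block, Row Hash + PI value locate the target row.]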
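The Master Index to Cylinder Index to data block search can be pictured as two sorted-range lookups followed by a scan of one block. The layout below (cylinder numbers, block IDs, small row hash values) is entirely hypothetical and far smaller than a real file system:

```python
from bisect import bisect_right

# Hypothetical layout for one AMP. Keys are (table_id, row_hash) tuples, which sort
# the way rows are ordered on disk (by table ID, then by row hash).
MASTER_INDEX = [  # (lowest key on the cylinder, cylinder number)
    ((1, 0x0000), 0),
    ((1, 0x8000), 1),
    ((2, 0x0000), 2),
]
CYLINDER_INDEXES = {  # cylinder -> [(lowest key in the block, block id)]
    0: [((1, 0x0000), "b0"), ((1, 0x4000), "b1")],
    1: [((1, 0x8000), "b2")],
    2: [((2, 0x0000), "b3")],
}
DATA_BLOCKS = {  # block id -> rows as (key, PI value, row data)
    "b0": [((1, 0x0100), 7225, "row A")],
    "b1": [((1, 0x4A00), 42, "row B"), ((1, 0x4A00), 99, "row C")],
    "b2": [((1, 0x9000), 5, "row D")],
    "b3": [((2, 0x0100), 7225, "row E")],
}


def _floor_lookup(index, key):
    """Return the entry whose lowest key is the greatest one <= key."""
    keys = [low for low, _ in index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        raise KeyError(key)
    return index[pos][1]


def locate_row(table_id, rh, pi_value):
    key = (table_id, rh)
    cylinder = _floor_lookup(MASTER_INDEX, key)  # Master Index -> cylinder
    block = _floor_lookup(CYLINDER_INDEXES[cylinder], key)  # Cylinder Index -> data block
    for row_key, value, data in DATA_BLOCKS[block]:  # search the block
        # The row hash alone is not enough (hash synonyms), so the PI value is compared too.
        if row_key == key and value == pi_value:
            return data
    return None


print(locate_row(1, 0x4A00, 99))  # row C
print(locate_row(2, 0x0100, 7225))  # row E
```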



Data Blocks
A data block contains one or more rows of the same table.

Block sizes range between 512 and 130,560 bytes.

Block sizes can vary within an individual table; the file system adjusts them dynamically as required.

The system maintains rows within a block in logical Row ID sequence.

Tables used for data warehousing and decision support usually have larger block sizes, to accommodate more rows per block.



Teradata Parallelism

Each PE supports up to 120 sessions in parallel.

Each session may handle multiple requests concurrently, up to 16 requests.

The Message Passing Layer (MPL) is designed to avoid any bottleneck in the system.

Each AMP can handle up to 80 tasks in parallel.



Teradata Parallelism (Contd.)

Multiple sessions can be established by a client utility to perform multiple tasks in parallel.

The Optimizer may generate a plan in which more than one step is performed concurrently on behalf of the same request.

The Teradata DBMS is supported by a set of parallel client tools to achieve optimum throughput.



Teradata Parallelism (Contd.)

Query Parallelism

Within-a-Step Parallelism

Multi-Step Parallelism

Common sub-expression elimination.



Query Parallelism

Query parallelism is enabled by hash partitioning the data across all the AMPs defined in the system.

An AMP provides all the database services on its allocation of data blocks.

Table scans, index scans, projections, selections, joins, aggregations and sorts execute in parallel across all AMPs.
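A minimal sketch of the idea that each AMP works only on its own rows and the partial results are then combined. Python threads stand in for AMPs purely to show the data flow (not real parallel speed-up), and the rows and function names are made up:

```python
from concurrent.futures import ThreadPoolExecutor


def amp_partial_sum(amp_rows):
    """Each 'AMP' aggregates only the rows stored on its own vdisk."""
    return sum(amount for _, amount in amp_rows)


def parallel_total(rows_by_amp):
    """Run one partial aggregation per AMP concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(rows_by_amp)) as pool:
        partials = list(pool.map(amp_partial_sum, rows_by_amp))
    return sum(partials)


# Hypothetical ORDER rows (order_id, amount), already hash-distributed over 4 AMPs.
rows_by_amp = [
    [(1, 10.0), (5, 20.0)],
    [(2, 5.0)],
    [(3, 7.5), (7, 2.5)],
    [(4, 5.0), (8, 50.0)],
]
print(parallel_total(rows_by_amp))  # 100.0
```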



Query Parallelism (within-a-step)

The Optimizer generates steps to execute a SQL request.

A step is often a large chunk made up of multiple database operations.

Multiple relational operations within a step are processed in parallel by pipelining.

For example, while a table scan is taking place, selected rows can be pipelined into the join process.
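Python generators give a compact picture of pipelining: the join consumes each qualifying row as soon as the scan produces it, without waiting for the scan to finish. This is an analogy only; the hash join and the function names are illustrative assumptions, not how Teradata's step execution is implemented:

```python
def scan(table, predicate):
    """Table scan that yields qualifying rows one at a time instead of building a full result."""
    for row in table:
        if predicate(row):
            yield row


def hash_join(probe_rows, build_rows, key):
    """Join that consumes probe rows as soon as the scan produces them."""
    build = {}
    for row in build_rows:
        build.setdefault(row[key], []).append(row)
    for row in probe_rows:  # pulls rows from the scan lazily (pipelining)
        for match in build.get(row[key], []):
            yield {**match, **row}


orders = [{"cust": 1, "amt": 50}, {"cust": 2, "amt": 5}, {"cust": 1, "amt": 70}]
customers = [{"cust": 1, "name": "VIJAY"}, {"cust": 2, "name": "HARISH"}]

pipeline = hash_join(scan(orders, lambda r: r["amt"] > 10), customers, "cust")
print(list(pipeline))
# [{'cust': 1, 'name': 'VIJAY', 'amt': 50}, {'cust': 1, 'name': 'VIJAY', 'amt': 70}]
```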



Query Parallelism (Multi-Step)

Multiple steps of a query are executed simultaneously across all units of parallelism in the system.

One or more processes are invoked for each step on each AMP to perform a database operation.



Review Questions

For each statement, indicate whether it applies to UPIs, NUPIs, Either, or Neither:

_______ a) Specified in CREATE TABLE statement
_______ b) Provides uniform distribution via the hashing algorithm
_______ c) May be up to 64 columns
_______ d) One AMP operation always
_______ e) Access will return (at most) a single row
_______ f) Used to assign a row to a specific AMP
_______ g) Allows a null
_______ h) Values cannot be changed
_______ i) Required on every table
_______ j) Would permit duplicate rows



Review Exercises

Fill in the Blanks

1. The output of the hashing algorithm is called the _____ _____.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the Hash Map based on the ________ number.
3. Two different PI values which hash to the same value are called Hash ___________.
4. A Row ID consists of a row hash plus a ____________ value.
5. A uniqueness value is required to produce a unique Row ID because of _______ _________ and ______ ___________.



Try on machine

1. Make RETAIL your default database for the current session.

2. Get information about the indexes on the CLIENT, ITEM and CONTRACT tables.

3. Generate and observe the execution plans for rows retrieved through an FTS (full table scan), a UPI and a NUPI.

4. Make TDUSER your default database.

5. Create your own tables: one with a UPI and the other with a NUPI.



Summary

In this lesson you learnt about:


Purpose of the Primary Index
Types of Primary Index
Role of the hashing algorithm and the hash map in locating a row.
Row ID and its role in row storage.
The reasons for selecting a UPI vs. a NUPI
Parallelism



3. Teradata Space



Objectives

After completing this lesson, you will be able to:


Define Teradata Database
Define Teradata User
Distinguish between Database and User
Define Perm Space and its use
Define Spool and Temp Space and their uses



Teradata Database

A Teradata database is a defined logical repository for tables, views, macros and triggers.

CREATE DATABASE new_db FROM existing_db AS
PERMANENT = 20000000
,SPOOL = 50000000;

new_db is owned by existing_db (space values are in bytes)

Perm Space - maximum amount of space available for tables

Spool Space - maximum amount of work space available for requests



Teradata User

A Teradata user is a database with an assigned password

A user may log on to Teradata and access objects within:
Itself
Other databases for which it has access rights
A user is an active repository, while a database is a passive repository

CREATE USER new_user FROM existing_user AS
PERMANENT = 10000000
,PASSWORD = lucky_day
,SPOOL = 20000000;

new_user is owned by existing_user



Demo

Demonstrate creating databases and users using Teradata Administrator

Explain the hierarchical structure of databases

Observe the change in Max Perm of the parent database when a child database is created and dropped



Perm Space

Perm Space is the maximum amount of space available for storing tables, secondary indexes (SIs) and permanent journals
Perm Space defines an upper limit; it is not allocated at table creation time
Perm Space is released when data is deleted or when objects are dropped
The following require no Perm Space:
Views, triggers, macros
All Perm Space specifications are subtracted from the creator
Perm Space is a zero-sum game: the total of all Perm Space allocations must equal the total amount of disk space available
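The Perm Space bookkeeping can be modelled as a tree in which creating a child subtracts Perm Space from its owner and dropping the child gives it back. A minimal sketch, assuming sizes in MB and ignoring the space actually used by tables; the class name and sizes are hypothetical:

```python
class SpaceNode:
    """Toy model of a database or user in the Teradata ownership hierarchy (sizes in MB)."""

    def __init__(self, name, perm, owner=None):
        self.name = name
        self.max_perm = perm  # perm space this node still holds (tables are not modelled)
        self.owner = owner
        self.children = []

    def create_child(self, name, perm):
        # The new database/user's perm space is subtracted from its creator.
        if perm > self.max_perm:
            raise ValueError(f"{self.name} has only {self.max_perm} MB of perm space")
        self.max_perm -= perm
        child = SpaceNode(name, perm, owner=self)
        self.children.append(child)
        return child

    def drop(self):
        # Dropping a database that owns no other databases returns its perm space to the owner.
        if self.children:
            raise ValueError(f"{self.name} still owns other databases or users")
        self.owner.max_perm += self.max_perm
        self.owner.children.remove(self)


# Hypothetical sizes, deliberately different from the exercises that follow.
dbc = SpaceNode("DBC", perm=1000)
finance = dbc.create_child("FINANCE", perm=300)
payroll = finance.create_child("PAYROLL", perm=100)
print(dbc.max_perm, finance.max_perm, payroll.max_perm)  # 700 200 100
payroll.drop()
print(finance.max_perm)  # 300
```

The totals always add back up to the starting 1000 MB, which is the zero-sum property stated above.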



Spool and Temp Space

Spool Space is the maximum amount of work space available for requests
Spool Space is used to hold intermediate and final query result sets
Spool Space is literally unused Perm Space
The Spool Space specified is the upper limit for a query's intermediate results and answer set
If a query exceeds the limit, the query is aborted immediately
Unlike Perm Space, Spool Space is not added to or subtracted from the owner when it is given to someone else
Temp Space is also unused Perm Space; it is used for Global Temporary Tables
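A minimal sketch of two spool rules above: spool comes from Perm Space that is not holding data, and a request that needs more than the user's SPOOL limit is aborted. The function names and figures are hypothetical:

```python
def system_spool_available(total_perm_gb: float, data_stored_gb: float) -> float:
    """Spool comes out of perm space that is not currently holding data."""
    return total_perm_gb - data_stored_gb


class SpoolExceeded(Exception):
    """Raised when a request needs more spool than the user's limit (the query is aborted)."""


def run_request(spool_needed_gb: float, user_spool_limit_gb: float) -> str:
    # The user's SPOOL value is a ceiling, not a reservation.
    if spool_needed_gb > user_spool_limit_gb:
        raise SpoolExceeded(f"needs {spool_needed_gb} GB, limit is {user_spool_limit_gb} GB")
    return "completed"


# Hypothetical figures: 500 GB of perm space system-wide, 320 GB holding table data.
print(system_spool_available(500, 320))  # 180
print(run_request(spool_needed_gb=2, user_spool_limit_gb=5))  # completed
```

The sketch treats the limit as a single total; in a real system the limit is divided among the AMPs, so a skewed request can be aborted on one AMP before the total is reached.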



Caselet - Perm Space

You are a DBA for a Teradata system that has 200 GB of Perm Space. You create a user MKRT with 60 GB of Perm Space. User MKRT creates a user SALES with 40 GB of Perm Space. How much Perm Space is User MKRT left with?

A. 20 GB
B. 60 GB
C. 100 GB
D. 160 GB



Caselet - Spool Space

You are a DBA for a Teradata system that has 200 GB of Perm Space, and you have reserved 100 GB for spool. The system currently has 60 GB of user data. How much is left for spool?

A. 140 GB
B. 40 GB
C. 100 GB
D. 200 GB



Exercise - Perm and Spool Space

User A starts with 10 MB of Perm and 10 MB of Spool. User A creates User B and User C with 1 MB of Perm and 10 MB of Spool each. How much Perm and Spool does User A have now?

User C creates User D with 1 MB of Perm. Can User C create tables?

User B is dropped. How much Perm and Spool does User A have now?



Summary

In this lesson you learnt about:

Teradata Database
Teradata User
Difference between Database and User
Perm Space and its use
Spool and Temp Space and their uses



4. Introduction to Data Protection



Objectives

At the end of this lesson, you will be able to:

Define a transaction and transaction modes
Describe the transient journal and its function
Explain the concept of FALLBACK tables
Describe cliques and their purpose
Describe the permanent journal and its function
Describe RAID levels
List the types and levels of locking



Transaction

A transaction is a logical unit of work.

Statements nested within the transaction either execute successfully as a group or do not execute at all.
Teradata supports two transaction modes:
ANSI mode
Teradata mode

ANSI Mode:
Transactions are always implicit in ANSI session mode.
A transaction begins when:
The first SQL statement in a session executes
The first statement following the close of a transaction executes
COMMIT or ROLLBACK/ABORT statements close a transaction.
If a transaction includes a DDL statement, it must be the last statement in the transaction.



Teradata Mode

A Teradata SQL transaction can be a single SQL statement, or a sequence of SQL statements (for example, a MACRO) treated as a single unit of work.
Implicit
Explicit
Starts with a BEGIN TRANSACTION statement.
Ends with an END TRANSACTION statement.
If a transaction includes a DDL statement, it must be the last statement in the transaction.
An error rolls back the entire transaction.
Transactions are atomic: either all requests are performed, or none are.



Transient Journal
An automatic feature that provides data integrity
Automatic rollback of changed rows in the event of a transaction failure
Data is always returned to its original state after a transaction failure
Takes a Before Image (BI) of changes for rollback purposes
The BI is stored in the AMP's transient journal
The AMPs' transient journals are maintained in user DBC's Perm Space
When the transaction is committed, the BI in the transient journal is purged automatically
When a transaction fails:
The user receives a failure message
The transaction is rolled back
Locks are released
Spool files are discarded
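A toy illustration of how before-images make rollback possible. The `Table` class and `transfer` function are hypothetical, and a real transient journal is kept on each AMP rather than in one Python list:

```python
_MISSING = object()  # marks "row did not exist before the change"


class Table:
    """Toy table whose updates record before-images (BIs) in a transient journal."""

    def __init__(self, rows):
        self.rows = dict(rows)  # key -> value
        self.journal = []  # BIs for the currently open transaction

    def update(self, key, value):
        self.journal.append((key, self.rows.get(key, _MISSING)))  # save the BI first
        self.rows[key] = value

    def commit(self):
        self.journal.clear()  # BIs are purged once the transaction commits

    def rollback(self):
        # Apply before-images newest-first to return the data to its original state.
        for key, before in reversed(self.journal):
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before
        self.journal.clear()


def transfer(table, src, dst, amount):
    """Both updates are kept together, or neither is."""
    try:
        table.update(src, table.rows[src] - amount)
        if table.rows[src] < 0:
            raise ValueError("insufficient balance")
        table.update(dst, table.rows[dst] + amount)
        table.commit()
    except Exception:
        table.rollback()
        raise


accounts = Table({"A": 100, "B": 0})
try:
    transfer(accounts, "A", "B", 150)  # fails after the first update
except ValueError:
    pass
print(accounts.rows)  # {'A': 100, 'B': 0}
```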



Fallback

A Fallback table remains available in the event of an unavailable AMP (a single AMP)
A Fallback row is a copy of a Primary row which is stored on a different AMP
Data changed while an AMP is offline is automatically restored to it
Create a table with or without Fallback
Add or drop the Fallback feature at any time
Cost of Fallback:
Twice the disk space for table storage
Twice the I/O for inserts, updates and deletes

                 AMP 1     AMP 2
Primary Rows     2 8 6     4 9 5
Fallback Rows    4 9 5     2 8 6



Fallback Cluster
Fallback is always associated with clusters
A Fallback cluster is a defined number of AMPs which are treated as a single fault-tolerant unit.
All Fallback rows for AMPs in a cluster must reside within the cluster.
Loss of one AMP in the cluster permits continued table access.
Loss of two AMPs in the cluster causes the RDBMS to halt.

Two clusters with 2 AMPs each:

                 AMP 1      AMP 2
Primary Rows     2 8 6      4 9 5
Fallback Rows    4 9 5      2 8 6

                 AMP 3      AMP 4
Primary Rows     12 18 16   14 19 15
Fallback Rows    14 19 15   12 18 16
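The cluster rules can be sketched as follows. The placement rule (pick another AMP in the same cluster based on the row number) is an assumption made for illustration only; in Teradata the system itself decides where fallback rows go:

```python
CLUSTERS = [[1, 2], [3, 4]]  # two clusters of two AMPs each, as in the figure


def place(row_id, primary_amp):
    """Toy placement rule: the fallback copy goes to another AMP in the same cluster."""
    cluster = next(c for c in CLUSTERS if primary_amp in c)
    others = [amp for amp in cluster if amp != primary_amp]
    return primary_amp, others[row_id % len(others)]


def read_row(row_id, primary_amp, down_amps):
    primary, fallback = place(row_id, primary_amp)
    if primary not in down_amps:
        return f"row {row_id} from primary on AMP {primary}"
    if fallback not in down_amps:
        return f"row {row_id} from fallback on AMP {fallback}"
    # Both copies live in the same cluster, so losing both AMPs makes the row unavailable.
    raise RuntimeError("two AMPs down in one cluster: the system halts")


print(read_row(8, primary_amp=1, down_amps={1}))  # row 8 from fallback on AMP 2
print(read_row(14, primary_amp=4, down_amps={1}))  # row 14 from primary on AMP 4
```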



Down AMP Recovery Journal (DARJ)

DARJ is started on all AMPs in a cluster when an AMP is down.

DARJ keeps track of all changes that would have been written to the failed AMP.

When the AMP comes back online, the DARJ brings the AMP up to date by applying the missed changes.

Once everything is caught up, the DARJ is dropped.
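A toy DARJ for a single cluster, assuming changes arrive in order and ignoring the fallback copies that the surviving AMPs keep up to date; the class name is hypothetical:

```python
class ClusterWithDarj:
    """Toy Down AMP Recovery Journal (DARJ) for one fallback cluster."""

    def __init__(self, amps):
        self.data = {amp: {} for amp in amps}  # AMP -> {row: value}
        self.down = set()
        self.darj = []  # (amp, row, value) changes the down AMP missed

    def write(self, amp, row, value):
        if amp in self.down:
            # Only the missed change is journaled here; updating the fallback copy on
            # another AMP, as a real system would, is not modelled.
            self.darj.append((amp, row, value))
        else:
            self.data[amp][row] = value

    def amp_down(self, amp):
        self.down.add(amp)

    def amp_up(self, amp):
        # Catch the AMP up by replaying its missed changes in order, then drop those entries.
        for target, row, value in self.darj:
            if target == amp:
                self.data[amp][row] = value
        self.darj = [entry for entry in self.darj if entry[0] != amp]
        self.down.discard(amp)


cluster = ClusterWithDarj([1, 2])
cluster.write(1, "row 2", "v1")
cluster.amp_down(1)
cluster.write(1, "row 2", "v2")  # AMP 1 is offline: the change is recorded in the DARJ
cluster.write(1, "row 8", "v1")
cluster.amp_up(1)
print(cluster.data[1], cluster.darj)  # {'row 2': 'v2', 'row 8': 'v1'} []
```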



Cliques
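A clique provides protection against the failure of an entire node

[Figure: Two cliques (CLIQUE-1 and CLIQUE-2) connected by the BYNET. Within each clique, the nodes and their AMPs share access to common disk arrays, so if a node fails its AMPs can migrate to another node in the same clique.]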



Permanent Journal

The Permanent Journal is an optional, user-specified, system-maintained journal which is used for recovery of a database to a specified point in time.
The Permanent Journal:
Is used for recovery from unexpected hardware or software disasters.
May be specified for one or more tables
Permits capture of Before Images for database rollback.
Permits capture of After Images for database roll forward.
Permits archiving change images during table maintenance.
Reduces need for full table backups.
Provides a means of recovering NO FALLBACK tables.
Requires additional disk space for change images.
Requires user intervention for archive and recovery activity.



RAID
RAID (Redundant Array of Independent Disks) provides protection against disk failure
Teradata uses RAID 1
RAID 1: Transparent Mirroring
Provides high data availability and performance, but storage costs are high.
Characteristics:
Data is fully replicated
Mirrored striping is possible with multiple pairs of disks in a drive group
Transparent to operating system
Advantages:
Maximum data availability, read performance gains
No performance penalty with write operations
Fast recovery and restoration
Disadvantages:
50% of disk space for mirrored data



RAID 5

Data Parity Protection, Interleaved Parity

Characteristics
Data and parity are striped and interleaved across multiple disks
XOR logic is used to calculate parity
Data is reconstructed on a disk failure
Transparent to operating system
Advantages
Provides high availability with minimum disk space (e.g., 25%) used for
parity overhead
Disadvantages
Write performance penalty
Performance degradation during data recovery and reconstruction
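The XOR parity idea behind RAID 5 fits in a few lines. This sketch shows a single stripe of three data blocks and one parity block (hence the 25% overhead with four disks); real RAID 5 also rotates the parity block across disks from stripe to stripe, which is not shown:

```python
from functools import reduce


def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across four disks: three data blocks plus one parity block.
d0, d1, d2 = b"TERA", b"DATA", b"AMP1"
parity = xor_bytes(d0, d1, d2)

# The disk holding d1 fails: XOR of the surviving blocks and the parity rebuilds it.
rebuilt = xor_bytes(d0, d2, parity)
print(rebuilt)  # b'DATA'
```

Every write must also update the parity block, which is the source of the write penalty listed above.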



Locks

Locking prevents multiple users who are trying to change the same
data at the same time from violating the data's integrity. This
concurrency control is implemented by locking the desired data.

There are four types of locks:
Exclusive - prevents any other type of concurrent access
Write - prevents other reads, writes and exclusives
Read - prevents writes and exclusives
Access - prevents exclusives only

Locks may be applied at three database levels:
Database - applies to all tables/views in the database
Table/View - applies to all rows in the table/view
Row Hash - applies to all rows with the same row hash
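The four rules above amount to a lock compatibility matrix. A minimal sketch covering lock types only (not lock levels or lock queuing); the names are hypothetical:

```python
# COMPATIBLE[held][requested] is True when the requested lock can be granted while
# `held` is in place on the same object; the entries follow the four rules above.
COMPATIBLE = {
    "ACCESS": {"ACCESS": True, "READ": True, "WRITE": True, "EXCLUSIVE": False},
    "READ": {"ACCESS": True, "READ": True, "WRITE": False, "EXCLUSIVE": False},
    "WRITE": {"ACCESS": True, "READ": False, "WRITE": False, "EXCLUSIVE": False},
    "EXCLUSIVE": {"ACCESS": False, "READ": False, "WRITE": False, "EXCLUSIVE": False},
}


def can_grant(requested, held_locks):
    """Grant only if the request is compatible with every lock already held on the object."""
    return all(COMPATIBLE[held][requested] for held in held_locks)


print(can_grant("ACCESS", ["WRITE"]))  # True: an ACCESS reader is not blocked by a writer
print(can_grant("READ", ["WRITE"]))  # False
print(can_grant("EXCLUSIVE", []))  # True: nothing is held
```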



Locks

Implicit locking based on the SQL command:
SELECT - applies a Read lock
UPDATE - applies a Write lock
CREATE TABLE - applies an Exclusive lock

Explicit locking using the LOCKING modifier:
LOCKING TABLE CUSTOMER FOR ACCESS SELECT * FROM CUSTOMER;
LOCKING TABLE CUSTOMER FOR EXCLUSIVE UPDATE CUSTOMER SET LID = 2000;
LOCKING TABLE CUSTOMER FOR WRITE NOWAIT UPDATE CUSTOMER SET LID = 2001;

Locks will be covered in depth in the Teradata SQL Tuning module



Caselet - Protection

You are a DBA for a Teradata system and you need to protect
your system against the failure of an entire node. Which
protection feature would you choose ?

A. Clique
B. Fallback Cluster
C. RAID
D. Database Locks



Match Quiz

1. Provides for TXN rollback in case of failure          a. Database locks
2. Protects all rows of a table                           b. Table locks
3. Logs changed rows for down AMP                         c. Row Hash locks
4. Provides for recovery to a point in time               d. FALLBACK
5. Applies to all tables and views within                 e. Cluster
6. Lowest level of protection granularity                 f. Recovery journal
7. Protects tables from AMP failure                       g. Transient journal
8. Protects database from a physical drive failure        h. Permanent journal
9. Fault tolerant unit used by Fallback                   i. RAID
10. Provides protection against failure of a node         j. Clique



Summary

In this lesson you learnt about:

Transaction and its modes
Transient journal and its function
Concept of FALLBACK tables
Cliques and their purpose
Permanent journal and its function
RAID levels
Types and levels of locking



5. Teradata Tools and Objects



Objectives

At the end of this lesson, you will be able to:


Identify Teradata client tools and their applications
Identify Teradata object types and their purpose



BTEQ

BTEQ (Basic Teradata Query, pronounced "BEE-teek") is a general-purpose, command-driven utility
Used to access and manipulate data on the Teradata Database
Generates reports
Performs data movement - export and import (suitable for small volumes)
BTEQ is a transparent interface to CLI: it transmits textual SQL to the Teradata server and delivers the response to the user
Limited ability to branch using LABEL
Runs on every supported platform: LAN-attached and channel-attached clients
Online demonstration of using BTEQ
BTEQ will be covered in depth in the Teradata BTEQ and Utilities module



Teradata SQL Assistant
Teradata SQL Assistant is an information discovery/query tool
that runs on Microsoft Windows.
Teradata SQL Assistant enables you to access the Teradata
Database as well as other ODBC-compliant databases. Some
of its features include:
Ability to save data in PC-based formats, such as Microsoft Excel,
Microsoft Access, and text files.
History of submitted SQL statements, to help you build scripts for data
mining and knowledge discovery.
Help with SQL syntax.
Import and export of small amounts of data to and from
ODBC-compliant databases.

Online demonstration: using Teradata SQL Assistant, including
creating an ODBC DSN (Data Source Name)



Teradata Administrator
Teradata Administrator provides a comprehensive Windows-based
graphical interface to perform database administration tasks:
Create, Modify and Drop Databases, Users, Roles, Profiles, and
User-Defined Types
Create Tables (using ANSI or Teradata syntax)
Grant or Revoke access and system rights
Copy Table, View or Macro definitions to another database, or to
another system
Drop or Rename Tables, Views or Macros
Move space from one database to another
Display information about a Database or User
Display information about a Table, View or Macro
Set up the rules for Query and Access Logging

Online demonstration: using Teradata Administrator


Other Client Tools

Index Wizard
Statistics Wizard
Visual Explain
All of these will be covered in the Teradata SQL Tuning module

Load and Unload Utilities

FastExport
FastLoad
MultiLoad
TPump
TPT (Teradata Parallel Transporter)
All of these will be covered in the Teradata BTEQ and Utilities
module



Tables
A two-dimensional structure of columns and rows of data

Permanent tables: require Perm space
SET: duplicate rows are not allowed
MULTISET: duplicate rows are allowed

Temporary tables
Derived tables: require Spool space
Volatile tables: require Spool space
Global Temporary tables: require Temp space

All of these will be covered in depth in the Teradata SQL module
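The difference between SET and MULTISET can be pictured with a toy model. The sketch below is plain Python, not Teradata code: the `ToyTable` class and the sample rows are hypothetical, and the toy simply reports a rejected duplicate through its return value rather than modeling how Teradata itself reports it.

```python
from typing import Literal


class ToyTable:
    """Toy in-memory model of SET vs MULTISET row semantics (not Teradata code)."""

    def __init__(self, kind: Literal["SET", "MULTISET"]):
        if kind not in ("SET", "MULTISET"):
            raise ValueError("kind must be 'SET' or 'MULTISET'")
        self.kind = kind
        self.rows: list[tuple] = []

    def insert(self, row: tuple) -> bool:
        """Store one row; return False when a SET table rejects a duplicate."""
        if self.kind == "SET" and row in self.rows:
            return False  # every column value matches an existing row
        self.rows.append(row)
        return True


# Hypothetical input: the last row is an exact duplicate of the first.
incoming = [(1, "Asha", 10), (2, "Ravi", 20), (1, "Asha", 10)]
set_tbl, multiset_tbl = ToyTable("SET"), ToyTable("MULTISET")
for r in incoming:
    set_tbl.insert(r)
    multiset_tbl.insert(r)

print(len(set_tbl.rows), len(multiset_tbl.rows))  # 2 3
```

The SET table keeps two rows because the duplicate is rejected; the MULTISET table keeps all three.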



Indexes
Apart from the Primary Index (PI) covered in this module, Teradata
supports the following indexes:

Partitioned Primary Indexes (PPI)
Single-level
Multi-level

Secondary Indexes
USI (Unique Secondary Index)
NUSI (Non-Unique Secondary Index)
Value-ordered NUSI

Join Index

Hash Index

All of these will be covered in depth in the Teradata SQL Tuning
module



Views
A view is a window into the data contained in relational tables.
A view is sometimes called a virtual table.
It may define a subset of rows of a table.
It may define a subset of columns of a table.
It may reference more than one table.
Data is neither duplicated nor stored separately for a view.
Data can be accessed directly via a table or indirectly via a view,
based on privileges held.
View definitions are stored in the Data Dictionary, not in the user's
own space.
Views help restrict which rows and columns of the base tables are visible.
Views help simplify complex queries.
Views will be covered in depth in the Teradata SQL module
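A minimal sketch of these points, using Python's built-in `sqlite3` module as a stand-in for Teradata (an assumption made so the example runs anywhere). The `emp` table and `emp_dept10` view are hypothetical. The view exposes a subset of columns and rows, and because only its definition is stored, a row added to the base table shows up through the view at once.

```python
import sqlite3

# SQLite stands in for Teradata here; CREATE VIEW is plain SQL in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal INTEGER, deptno INTEGER);
    INSERT INTO emp VALUES (1, 'Asha', 5000, 10), (2, 'Ravi', 6000, 20), (3, 'Meena', 5500, 10);

    -- Subset of columns (no sal) and subset of rows (deptno 10 only).
    CREATE VIEW emp_dept10 AS
        SELECT empno, ename FROM emp WHERE deptno = 10;
""")


def view_rows():
    """Read through the view; the rows come from emp at query time."""
    return conn.execute("SELECT * FROM emp_dept10 ORDER BY empno").fetchall()


print(view_rows())  # [(1, 'Asha'), (3, 'Meena')]

# A new base-table row is visible through the view immediately,
# because the view stores only its definition, not a copy of the data.
conn.execute("INSERT INTO emp VALUES (4, 'John', 4000, 10)")
print(view_rows())  # [(1, 'Asha'), (3, 'Meena'), (4, 'John')]
```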



Macros

Macros contain one or more prewritten SQL statements.
Macros are a Teradata extension to ANSI SQL.
Macros are stored in the Teradata Data Dictionary.
Macros can be executed from any viable SQL front end,
including:
Teradata SQL Assistant, BTEQ, a user's LOGON startup string
Another macro
Executing a macro requires the user to have the EXEC
privilege on the macro.
Explicit privileges on the tables or views used by the macro are
not needed by the executing user.
One macro executes as one transaction
Macros will be covered in depth in the Teradata SQL
module
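The "one macro is one transaction" rule can be sketched as follows. SQLite has no macros, so this Python sketch models a macro as a hypothetical named list of parameterised statements (`RAISE_AND_HIRE`, run by a hypothetical `exec_macro`) inside one transaction; in Teradata itself the statements would be stored with CREATE MACRO and run with EXEC. If any statement fails, the work of the earlier statements is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER NOT NULL, deptno INTEGER);
    INSERT INTO emp VALUES (1, 5000, 10), (2, 6000, 20);
""")

# Hypothetical "macro": a named, stored list of parameterised statements.
# :dept, :raise_amt, :new_empno and :new_sal play the role of macro parameters.
RAISE_AND_HIRE = [
    "UPDATE emp SET sal = sal + :raise_amt WHERE deptno = :dept",
    "INSERT INTO emp (empno, sal, deptno) VALUES (:new_empno, :new_sal, :dept)",
]


def exec_macro(conn, statements, **params):
    """Run all statements as ONE transaction: all persist or none do."""
    with conn:  # commits on success, rolls back if any statement raises
        for sql in statements:
            conn.execute(sql, params)  # unused dict keys are ignored by sqlite3


exec_macro(conn, RAISE_AND_HIRE, dept=10, raise_amt=500, new_empno=3, new_sal=4000)

try:
    # empno 1 already exists, so the INSERT fails ...
    exec_macro(conn, RAISE_AND_HIRE, dept=10, raise_amt=500, new_empno=1, new_sal=4500)
except sqlite3.IntegrityError:
    pass  # ... and the UPDATE that ran before it is rolled back too

print(conn.execute("SELECT empno, sal FROM emp ORDER BY empno").fetchall())
# [(1, 5500), (2, 6000), (3, 4000)]
```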



Trigger

A trigger is built to perform an action when a DML event (INSERT, UPDATE or DELETE) occurs on a table.

Triggers fire automatically
The triggering statement and the trigger action constitute a single transaction, so if the
trigger fails, the whole transaction fails.
A trigger is a database object of type G
Example: a trigger built on the EMP table inserts a record into an audit table whenever any
employee's salary is updated.

Trigger definition:
CREATE TRIGGER EMP_SAL_TRIGGER
  AFTER UPDATE OF (SAL) ON EMP
  REFERENCING OLD AS BI
              NEW AS AI
  FOR EACH ROW
  (INSERT INTO EMP_AUDIT_TABLE
     VALUES (BI.EMPNO, BI.SAL, AI.SAL, DATE); );

Statement that fires the trigger:
UPDATE EMP
SET SAL = SAL * 1.1
WHERE DEPTNO = 10;

Checking the audit rows written by the trigger:
SELECT *
FROM EMP_AUDIT_TABLE;
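The EMP_SAL_TRIGGER example can be approximated in SQLite through Python's `sqlite3` module. This is a sketch under assumptions: SQLite spells the trigger differently (OLD and NEW instead of REFERENCING ... AS BI/AI), `date('now')` stands in for Teradata's DATE, and the CHECK rule on the audit table is hypothetical, added only so that a pay cut makes the trigger action fail. That failure shows the triggering UPDATE and the trigger action succeeding or failing together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER, deptno INTEGER);
    -- Hypothetical audit rule: only salary increases may be logged.
    CREATE TABLE emp_audit_table (
        empno INTEGER, old_sal INTEGER, new_sal INTEGER, changed_on TEXT,
        CHECK (new_sal >= old_sal)
    );
    INSERT INTO emp VALUES (1, 5000, 10), (2, 6000, 20), (3, 5500, 10);

    -- SQLite spelling of EMP_SAL_TRIGGER: OLD/NEW replace REFERENCING ... AS BI/AI.
    CREATE TRIGGER emp_sal_trigger AFTER UPDATE OF sal ON emp
    FOR EACH ROW
    BEGIN
        INSERT INTO emp_audit_table VALUES (OLD.empno, OLD.sal, NEW.sal, date('now'));
    END;
""")

with conn:  # the UPDATE fires the trigger once per updated row
    conn.execute("UPDATE emp SET sal = sal + sal / 10 WHERE deptno = 10")

try:
    with conn:  # a pay cut violates the audit CHECK, so the trigger action fails ...
        conn.execute("UPDATE emp SET sal = sal - 100 WHERE empno = 2")
except sqlite3.IntegrityError:
    pass  # ... and the triggering UPDATE is undone with it

print(conn.execute(
    "SELECT empno, old_sal, new_sal FROM emp_audit_table ORDER BY empno").fetchall())
# [(1, 5000, 5500), (3, 5500, 6050)]
print(conn.execute("SELECT sal FROM emp WHERE empno = 2").fetchone())  # (6000,)
```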



Stored Procedures

Stored procedures consist of a set of control and condition-handling
statements that make SQL a computationally complete programming
language.

A single-statement stored procedure body can contain one
control statement, such as LOOP or WHILE, or one SQL DDL,
DML, or DCL statement. Some statements are not allowed,
including:
Any declaration (local variable, cursor, or condition handler) statement
A cursor statement (OPEN, FETCH, or CLOSE)



Stored Procedures (Contd.)

A compound-statement stored procedure body consists of a
BEGIN-END statement enclosing a set of declarations and
statements, including:
Local variable declarations
Cursor declarations
Condition handler declaration statements
Control statements
SQL DML, DDL, and DCL statements supported by stored procedures

Compound statements can also be nested.
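SQLite has no stored procedures, so the sketch below is only an analogy, not Teradata SPL syntax: a hypothetical Python function, `give_dept_raise`, whose parts mirror the elements of a compound BEGIN-END body listed above (a local variable, a cursor over a query, control statements, DML, and a condition handler). The `emp` table and the 10000 salary cut-off are also hypothetical.

```python
import sqlite3


def give_dept_raise(conn: sqlite3.Connection, dept: int, pct: int) -> int:
    """Python analogue of a compound (BEGIN-END) stored-procedure body.

    Returns the number of rows updated, or -1 if the handler caught an error.
    """
    updated = 0  # local variable declaration
    try:  # condition handler: covers every statement below
        with conn:  # one transaction for the whole body
            cur = conn.execute(  # cursor over a query
                "SELECT empno, sal FROM emp WHERE deptno = ?", (dept,))
            for empno, sal in cur.fetchall():  # loop over the cursor's rows
                if sal >= 10000:  # IF ... THEN: skip high earners
                    continue
                conn.execute(  # DML statement
                    "UPDATE emp SET sal = ? WHERE empno = ?",
                    (sal + sal * pct // 100, empno))
                updated += 1
    except sqlite3.Error:  # handler action: signal failure to the caller
        return -1
    return updated


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER, deptno INTEGER);
    INSERT INTO emp VALUES (1, 5000, 10), (2, 12000, 10), (3, 6000, 20);
""")
raised = give_dept_raise(conn, dept=10, pct=10)
print(raised)  # 1
```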



User-Defined Functions

User-defined functions (UDFs) allow you to extend SQL by
writing your own functions in the C programming language,
installing them on the database, and then using them like
standard SQL functions.
You can also install UDF objects or packages from third-party
vendors, without providing the source code.
UDFs run in parallel, as required, on all AMPs.
Scalar functions take input parameters and return a single
value result.
Aggregate functions produce summary results. They differ from
scalar functions in that they take grouped sets of relational
data, make a pass over each group, and return one result for
the group.
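Teradata UDFs are written in C and installed on the database; that step is not reproduced here. As a stand-in, Python's `sqlite3` module can register Python callables as SQL functions (`create_function` for scalar, `create_aggregate` for aggregate), which is enough to show the scalar-versus-aggregate contract described above. The function names `sal_band` and `sal_range`, the 6000 threshold and the sample data are hypothetical.

```python
import sqlite3


def sal_band(sal):
    """Scalar UDF stand-in: one input value in, one value out per row."""
    if sal is None:
        return None
    return "HIGH" if sal >= 6000 else "STANDARD"


class SalRange:
    """Aggregate UDF stand-in: one pass over each group, one result per group."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, sal):  # called once for every row in the group
        if sal is None:
            return
        self.lo = sal if self.lo is None else min(self.lo, sal)
        self.hi = sal if self.hi is None else max(self.hi, sal)

    def finalize(self):  # called once per group, after the last row
        return None if self.lo is None else self.hi - self.lo


conn = sqlite3.connect(":memory:")
conn.create_function("sal_band", 1, sal_band)
conn.create_aggregate("sal_range", 1, SalRange)
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER, deptno INTEGER);
    INSERT INTO emp VALUES (1, 5000, 10), (2, 6000, 20), (3, 5500, 10), (4, 7000, 20);
""")

print(conn.execute("SELECT empno, sal_band(sal) FROM emp ORDER BY empno").fetchall())
# [(1, 'STANDARD'), (2, 'HIGH'), (3, 'STANDARD'), (4, 'HIGH')]
print(conn.execute(
    "SELECT deptno, sal_range(sal) FROM emp GROUP BY deptno ORDER BY deptno").fetchall())
# [(10, 500), (20, 1000)]
```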



Summary

In this lesson you learnt about:


Teradata Client Tools and their applications
Teradata Object Types and their purpose



6. Introduction to Active Data Warehouse



Objectives
At the end of this lesson, you will be able to:
Describe Online Transaction Processing (OLTP)
Describe Decision Support System (DSS)
Describe Data Warehouse
Describe Active Data Warehouse



OLTP

Transactions typically complete in seconds, not minutes
The number of rows per transaction is small
Only a few of many possible tables are accessed.
Very little I/O processing is required to complete a transaction
Examples of OLTP transactions
Updating a checking or savings account to reflect a deposit or withdrawal
ATM money withdrawal from a bank
OLTP queries run quickly and are called tactical queries.
Examples:
Altering a campaign based on current status
Determining the best offer for a specific customer
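A minimal sketch of an OLTP-style ATM withdrawal, using SQLite via Python's `sqlite3` module as a stand-in. The `account` table and `atm_withdraw` function are hypothetical; the point is that the transaction touches one row of one table, located by its key, and finishes almost immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    INSERT INTO account VALUES (42, 1000), (43, 250);
""")


def atm_withdraw(conn: sqlite3.Connection, acct_id: int, amount: int) -> bool:
    """Short OLTP transaction: one table, one row, located by its key."""
    with conn:  # commit immediately; the transaction stays tiny
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE acct_id = ? AND balance >= ?",
            (amount, acct_id, amount))
    return cur.rowcount == 1  # False: no such account or insufficient funds


print(atm_withdraw(conn, 42, 300))  # True
print(atm_withdraw(conn, 43, 300))  # False
```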



Decision Support System (DSS)
Typically used for strategic long-range planning and answering
what-if questions.
Transactions take minutes to hours.
Many users ask a wide variety of questions
Transactions usually involve multiple tables and millions of rows
Examples of DSS transactions/queries
Creating a report to show a comparison of sales from this week to last week
Creating a report that shows the top ten selling items across all stores for
one year
Determining monthly sales of shoes
The following aspects of the DSS environment have gained
importance as technology has improved:
The ability to use detailed data
The ability to do ad hoc queries
The decreased need to use summary data
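By contrast, a DSS query such as "top ten selling items across all stores for one year" scans and aggregates many rows. The sketch below runs it against a tiny hypothetical `sales` table in SQLite via Python's `sqlite3` module; Teradata SQL would typically express the top-N with TOP n or QUALIFY rather than SQLite's LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store_id INTEGER, item TEXT, sale_date TEXT, qty INTEGER);
    INSERT INTO sales VALUES
        (1, 'shoes', '2011-03-02', 5), (2, 'shoes', '2011-07-15', 7),
        (1, 'socks', '2011-03-02', 9), (3, 'hats',  '2011-11-30', 4),
        (2, 'hats',  '2012-01-05', 20);
""")

TOP_N = 10
# Aggregate over all stores for one year, then rank items by quantity sold.
rows = conn.execute("""
    SELECT item, SUM(qty) AS total_qty
    FROM sales
    WHERE sale_date BETWEEN '2011-01-01' AND '2011-12-31'
    GROUP BY item
    ORDER BY total_qty DESC
    LIMIT ?
""", (TOP_N,)).fetchall()

print(rows)  # [('shoes', 12), ('socks', 9), ('hats', 4)]
```

The 2012 sale of hats falls outside the date range, so only 4 hats are counted.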

Data Warehouse
A data warehouse is a central, enterprise-wide database that
contains information obtained from operational systems and is
designed around DSS.

The evolution of the data warehouse:

Primarily batch / pre-defined reports
Increase in ad hoc queries
Analytical modeling
Continuous update and time-sensitive queries became important
Event-based triggering

In the beginning, the warehouse is used for analyzing; over time,
its use evolves into predicting and finally into
operationalizing.

Active Data Warehouse

Allows companies to take their OLTP transactions and load
them into the data warehouse in near real time, so users can
analyze the data and make real-time decisions before their
competitors.
Supports short tactical OLTP-type queries mixed with large
DSS queries
Provides scalability in order to
Support large amounts of detail data
Update the operational data store directly
Support an integrated environment with a wide mix of queries
Some characteristics of an active data warehouse environment
are:
Mission critical application
Tactical queries
24/7 availability and reliability
Caselet - DWH

You are a designer at Wipro. You are designing a data
warehouse that Wipro's customers will use for strategic
long-range planning and answering what-if questions. What
kind of system has to be designed?

A. OLTP
B. DSS
C. OLDB
D. RDDB

Summary

In this lesson you learnt about:


Online Transaction Processing (OLTP)
Decision Support System (DSS)
Data Warehouse
Active Data Warehouse

References

Teradata 12 Basics (An Authorized Teradata Certified Professional Program Study Guide)

Thank You
