
Teradata Architecture Overview

Eric Prestegard
Senior Solution Architect


November 30, 2005
2 > 10/16/2014
Teradata Warehouse Environment
Teradata Tools & Utilities Foundation (TTU 8.0)
Teradata Database Engine (V2R6.0)
Database Applications and Tools
> Query, OLAP, & Reporting Tools
> ETL Tools & Middleware Adapters
> Teradata CRM
> Teradata Industry Applications
Platform & Enabling Technologies
Teradata Data Warehouse Environment
NCR MPP Platform (5400)
Unix & Windows
Teradata Warehouse 8.0
3 > 10/16/2014
Teradata's Unique Architecture
Parallelism built in from the ground up
Architected for unlimited scalability
Dedicated to automatic management and operation
Easy and fast data movement, in and out
High concurrency and mixed workloads
Committed to the highest levels of reliability and availability
Supports SMP and MPP hardware
Architected to deliver business value
4 > 10/16/2014
[Figure: Parallel RDBMS stages: Session, Security, Access, Qualify, Parse, Plan, Dispatch, Merge, Sort]
Teradata Design Philosophy
The Teradata Data Warehouse RDBMS:
> Executes all steps in parallel for maximum performance
> Hides its complexity from the applications
> Manages itself so DBAs can focus on other things
> Scales effortlessly to meet business need
[Figure: Teradata runs across multiple SMP nodes joined by the BYNET to form one MPP system]
5 > 10/16/2014
Scalable Hardware
4 Node Teradata System
Nodes
> Latest Intel SMP CPUs
> Configured in 2 to 8 node cliques
> Windows or Unix
Storage
> Independent I/O
> Scales per node
BYNET Interconnect
> Fully scalable bandwidth
> 1 to 1024 nodes
Connectivity
> Fully scalable
> Channel - ESCON
> LAN, WAN
Server Management
> One console to view the entire system
[Figure: SMP Nodes 1 to 4, each with CPU1, CPU2, HBA1, and HBA2, connected by Dual BYNET Interconnects and Server Management]
6 > 10/16/2014
NCR 5400 MPP Server
Continued rapid adoption of latest Intel technology
> 2-way ~3.6 GHz processors with Hyper-Threading
> 800 MHz front side bus
> 64-bit Extension Technology
Industry Standard Form Factor
> Up to 10 nodes per cabinet
> Integrated BYNET
> Integrated Server Management
> N+1 UPS
> Dual AC
Multi-Generation Coexistence
> Investment protection
7 > 10/16/2014
[Figure: cabinet layout with up to 10 nodes per cabinet, Server Mgmt, two BYNET units, and UPS modules]
8 > 10/16/2014
Teradata Parallel Architecture
Dividing the Work with Parallel Units
[Figure: SMP Nodes 1 to 4, each running PEs and AMPs, joined by BYNET Interconnects]
Parsing Engine (PE)
- SQL Parser & Optimizer
- Query Step Dispatcher
Network Distribution
- via BYNET
Access Module Processors (AMP)
Disk Partitions
9 > 10/16/2014
The Teradata Software Approach
Dividing the Work
Shared Nothing
> Take a big task
> Slice it vertically into a (large) number of smaller tasks
> Perform those tasks independently
> Balance the work so all the tasks complete simultaneously
> Assign the tasks evenly among the physical resources
> Communicate only at the beginning and end of a task
Benefits
> Large task completes in a short elapsed time
> Maximizes use of resources
> Minimizes communication bottlenecks
10 > 10/16/2014
Rows are distributed evenly by hash partitioning
> Done in real-time as data are loaded, appended, or changed.
> No reorgs, repartitioning, space management
Shared nothing software:
> Each VAMP owns an equal slice of the data.
> Each VAMP works exclusively & independently on its rows
> Nothing centralized: No single point of control for any operation (I/O,
Buffers, Locking, Logging, Dictionary)
Teradata Data Distribution
Dividing the Work
[Figure: rows of Table A, Table B, and Table C pass through the Teradata Parallel Hash Function on the Prime Index and are spread across VAMP1 to VAMPn; each stored row carries a RowHash (Hash Bucket) followed by its Data Fields]
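The hash distribution described on this slide can be sketched as below. This is a minimal illustration, not Teradata's implementation: the slides do not describe the actual hash function or bucket map, so MD5 is used as a stand-in, and `NUM_BUCKETS`, `row_hash`, `amp_for`, and `distribute` are hypothetical names chosen for the example.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 65536  # assumed bucket count, for illustration only


def row_hash(primary_index_value) -> int:
    """Stand-in row hash: first 32 bits of MD5 over the value's text form."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(primary_index_value, num_amps: int) -> int:
    """Map row hash -> hash bucket -> owning AMP (simple modulo bucket map)."""
    bucket = row_hash(primary_index_value) % NUM_BUCKETS
    return bucket % num_amps


def distribute(primary_index_values, num_amps: int) -> Counter:
    """Count how many rows land on each AMP."""
    return Counter(amp_for(v, num_amps) for v in primary_index_values)


if __name__ == "__main__":
    counts = distribute(range(100_000), num_amps=16)
    # With a well-mixed hash, the smallest and largest AMP hold similar row counts.
    print(min(counts.values()), max(counts.values()))
```

Because the owning AMP is computed from the row's own index value, no central directory is consulted on insert or lookup, which is the property the slide calls "nothing centralized".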
11 > 10/16/2014
Define Logically not Physically
Teradata
> Define Max Size for Database
> Create a Table
> Add Rows
> Repeat
Teradata manages all data placement
All databases share all space
No Re-orgs, ever!
Other
> Define Stripes
> Add Stripes to Table Spaces
> Create Tables in Table Space
> Create indexes in Table Space
> Create Logs in Table Space
> Repeat for next Database
Allows fine tuning of data placement for OLTP Performance.
12 > 10/16/2014
Tactical and Strategic Queries
Strategic queries
> Complex
> Often multiple joins, multiple steps
> SLA expectation is to complete within a given time frame
Tactical queries
> Fall into multiple categories
> Tighter SLA expectations
> OLTP type
- Single AMP operations
- Typically sub-second
> Varied levels of complexity
- May involve multiple AMPs for resolution
- Short queries longer than sub-second
[Figure: a complex analytical query scans all AMPs; a tactical query reads one row from one AMP, or from a few AMPs]
13 > 10/16/2014
Tactical Queries
[Figure: VPROCS (AMP & PE) on four nodes connected by the BYNET]
Reduce AMPs Work
> Single AMP (UPI, NUPI)
> Few AMP (USI)
> All AMP (NUSI)
> Covered Index
> Value Ordered
> Join Index, Aggregate JI, Single Table
Reduce Locking
> Access Locking
> Row Level Locking
Views, Macros, Stored Procedures
Explain, Explain, Explain
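The difference between single-AMP and all-AMP access can be sketched with a toy model. It assumes a table hash-distributed on its primary index, with MD5 as a stand-in hash; `ToyTable`, `NUM_AMPS`, and the method names are hypothetical, and secondary indexes (USI, NUSI) are not modeled.

```python
import hashlib

NUM_AMPS = 8  # hypothetical system size


def amp_for(value) -> int:
    """Stand-in hash that picks the AMP owning a primary index value."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


class ToyTable:
    """Rows are stored per AMP, keyed by a unique primary index value."""

    def __init__(self):
        self.amps = [dict() for _ in range(NUM_AMPS)]

    def insert(self, key, row):
        self.amps[amp_for(key)][key] = row

    def get_by_primary_index(self, key):
        """Single-AMP access: hash the key and probe only the owning AMP."""
        amp = amp_for(key)
        return self.amps[amp].get(key), 1  # (row, AMPs touched)

    def find_by_column(self, column, value):
        """All-AMP access: with no index to route by, every AMP scans its rows."""
        hits = [row for amp in self.amps
                for row in amp.values() if row[column] == value]
        return hits, NUM_AMPS  # (rows, AMPs touched)


if __name__ == "__main__":
    table = ToyTable()
    for i in range(100):
        table.insert(i, {"id": i, "city": "Dayton" if i % 10 == 0 else "Other"})
    print(table.get_by_primary_index(42)[1])          # AMPs touched by a key lookup
    print(table.find_by_column("city", "Dayton")[1])  # AMPs touched by a scan
```

In this model a tactical lookup costs the same number of AMP probes no matter how large the system grows, which is why the slide's first advice is to reduce the AMPs' work.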
14 > 10/16/2014
Teradata
Unconditional Parallelism
Optimizer Automatic Parallelization
Cost based: considers the cost of each step
> Cardinality and skew
> Redistribution vs. duplication
> No Hints
Rewrites built-in and cost based
Parallelism is automatic
Parallelism is unconditional
Each query step fully parallelized
No single threaded operations
> Scans, Joins, Index access, Aggregation, Sort, Insert, Update, Delete
No single point of reference (e.g. single node catalog partition)
Visual Explain
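The "redistribution vs. duplication" trade-off can be illustrated with a deliberately simplified model that only counts rows shipped between AMPs. It is an assumption for illustration, not Teradata's cost model: the real optimizer also weighs cardinality, skew, and other factors the slide lists, and the function names here are hypothetical.

```python
def rows_moved_by_redistribution(large_rows: int, small_rows: int) -> int:
    """Re-hash both tables on the join column: each row is shipped once."""
    return large_rows + small_rows


def rows_moved_by_duplication(small_rows: int, num_amps: int) -> int:
    """Copy the small table to every AMP; the large table stays in place."""
    return small_rows * num_amps


def cheaper_strategy(large_rows: int, small_rows: int, num_amps: int) -> str:
    """Pick the plan that ships fewer rows under this toy model."""
    redistribute = rows_moved_by_redistribution(large_rows, small_rows)
    duplicate = rows_moved_by_duplication(small_rows, num_amps)
    return "duplicate" if duplicate < redistribute else "redistribute"


if __name__ == "__main__":
    # A tiny lookup table joined to a large fact table.
    print(cheaper_strategy(10_000_000, 1_000, num_amps=16))
    # Two tables of comparable size.
    print(cheaper_strategy(1_000_000, 500_000, num_amps=16))
```

Under these assumptions, duplication wins when the small table times the AMP count is still less than the rows that would otherwise be re-hashed, so the better choice shifts as the system grows.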
15 > 10/16/2014
[Figure: sixteen blocks, each labeled VPROCs / Amps]
Shared Nothing Software
Delivers linear scalability
> Maximizes utilization of SMP resources
> To any size configuration
> Allows flexible configurations
> Incremental upgrades
Linear with a slope of 1 at any size
16 > 10/16/2014
Growing Parallelism via System Expansion
Begin with a 4 Node Teradata System
AMPs
> Each AMP owns an equal amount of data
> Growing the system adds more AMPs
> Data per AMP is proportionately reduced
> Performance is proportionately increased
Re-Config Utility
> Runs in parallel on each AMP
> Re-distributes appropriate data to new AMPs
[Figure: SMP Nodes 1 to 4, each running PEs and four AMPs, joined by BYNET Interconnects]
Table A - 32M Rows - 2M Rows/Amp
17 > 10/16/2014
Growing Performance and Capacity
[Figure: the original SMP nodes plus added SMP nodes, each running PEs and AMPs, all joined by BYNET Interconnects]
Table A - 32M Rows - 1M Rows/Amp
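The arithmetic behind the two expansion slides is worked through below. The AMP counts of 16 and 32 follow from the slides' own figures (32M rows at 2M and 1M rows per AMP); the function names are hypothetical, and even distribution is assumed.

```python
TABLE_A_ROWS = 32_000_000


def rows_per_amp(total_rows: int, num_amps: int) -> float:
    """Rows each AMP owns when the table is spread evenly."""
    return total_rows / num_amps


def fraction_moved_to_new_amps(old_amps: int, new_amps: int) -> float:
    """Minimum share of rows that must move so new AMPs hold an equal slice."""
    return (new_amps - old_amps) / new_amps


if __name__ == "__main__":
    print(rows_per_amp(TABLE_A_ROWS, 16))      # before expansion
    print(rows_per_amp(TABLE_A_ROWS, 32))      # after doubling the AMPs
    print(fraction_moved_to_new_amps(16, 32))  # share of rows relocated
```

Doubling the AMPs halves the rows each AMP has to scan, which is the basis for the slide's claim that performance increases proportionately.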
18 > 10/16/2014
Scalability: It's Not Just about Size
[Figure: sample data model with entities CUSTOMER (CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX), ORDER (ORDER NUMBER, ORDER DATE, STATUS), ORDER ITEM BACKORDERED (QUANTITY), ITEM (ITEM NUMBER, QUANTITY, DESCRIPTION), and ORDER ITEM SHIPPED (QUANTITY, SHIP DATE)]
Simple Direct at the start
Moderate Multi-table Join
Regression analysis
Query tool support
Complex, 56-way table join: 15 Pages, 37 From Clauses, 7 UNIONs (Largest table >1 B rows, < 43 minutes)
Amount of Detailed Data
Concurrent Users
Data Model Complexity Query Complexity
19 > 10/16/2014
The Teradata Advantage - Scalability
[Figure: chart comparing Teradata with Other for Active Data Warehousing along the axes below]
> Data Storage (raw, user data): 5 TB, 10 TB, 15 TB, 20 TB
> Query Complexity: 3-5 Way Joins; 5-10 Way Joins; 15+ way Joins + OLAP operations + Aggregation + Complex Where constraints + Views
> Schema Sophistication: Simple Star; Multiple, Integrated Stars; Multiple, Integrated Stars and Normalized
> Workload Mix: Batch Reporting, Repetitive Queries; Iterative, Ad Hoc Queries; Data Analysis/Mining; Near Real Time Data Feeds
> Query Data Volumes: MBs, GBs, TBs
> # of Concurrent Queries: 1,000
> Also labeled: Normalized, Parallelism
20 > 10/16/2014
Workload Management
TDQM and Priority Scheduling Facility
[Figure: Teradata Dynamic Query Manager applies pre-execution, the Priority Scheduler applies while the query executes, and the Database Query Log applies post-execution]
21 > 10/16/2014
Integrated High Availability
Vproc migration
> Move the work to functioning resources
> Node Failure Protection
> OS failure Protection
> 2 to 8 Node Cliques
> May Leverage Hot Standby Node
Parallel recovery and rollback
Online utilities
> Load, Export, Backup
> Purge
> Checkpoint - Restart
Eliminate many operations
> Reorg, Index rebuild
Fallback
> Covers catastrophic failures

[Figure: SMP Nodes 1 to 4, each running PEs and AMPs, joined by BYNET Interconnects, illustrating Vproc migration]
22 > 10/16/2014
Fallback - High Level Overview
Fallback is a DATA AVAILABILITY FEATURE used to
maximize the availability of a single instance of TERADATA.

A customer should use Fallback when their ADW requires:
> Extremely high levels of data availability.
> Rapid recovery from failure scenarios.

When Fallback is enabled
> The system can tolerate a large set of catastrophic failure
scenarios that would bring most other databases to a
complete halt with a lengthy recovery procedure.
> Hardware maintenance or upgrades may be performed
while the system remains active.

23 > 10/16/2014
Fallback additional data protection
[Figure: sixteen nodes, each with its AMPs]
Fallback: a second copy of the table, automatically maintained by Teradata Database on separate hardware resources
Protects against unexpected catastrophic failure scenarios:
> Multiple drive failures in same RAID group
> Unrecoverable bit error during RAID reconstruction
> Localized disk array cabinet level catastrophe (fire, power, etc.)
> OS kernel, device driver, disk array SW or FW failures
> Teradata Database PDE or File System SW failures
> Administrative or human error
Initiated through a simple DDL command (Create table with Fallback)
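The idea of a second copy on separate hardware can be sketched as follows. The placement rule here (put the fallback copy on the AMP in the same position on the next node) is a hypothetical stand-in, since the slides do not describe how Teradata chooses the fallback location; the node and AMP counts and the MD5 hash are also assumptions.

```python
import hashlib

AMPS_PER_NODE = 4  # assumed layout for illustration
NUM_NODES = 4
NUM_AMPS = AMPS_PER_NODE * NUM_NODES


def primary_amp(key) -> int:
    """AMP that owns the primary copy (stand-in hash)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def fallback_amp(key) -> int:
    """Hypothetical rule: same AMP position, but on the next node."""
    return (primary_amp(key) + AMPS_PER_NODE) % NUM_AMPS


def node_of(amp: int) -> int:
    return amp // AMPS_PER_NODE


def readable_after_node_failure(key, failed_node: int) -> bool:
    """A row stays readable if either copy lives on a surviving node."""
    return (node_of(primary_amp(key)) != failed_node
            or node_of(fallback_amp(key)) != failed_node)


if __name__ == "__main__":
    keys = range(1000)
    survives = all(readable_after_node_failure(k, n)
                   for k in keys for n in range(NUM_NODES))
    print(survives)
```

Because the two copies never share a node in this model, losing any single node leaves every row with one readable copy, at the cost of storing each row twice.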



24 > 10/16/2014
Teradata Tools and Utilities (TTU)
[Figure: Teradata Database]
Teradata Utility Pak
> Teradata Administrator
> Teradata SQL Assistant
> Teradata SQL Assistant/Web Edition
> BTEQ
> ODBC
> JDBC
> CLI
> OLE DB Provider
> .Net Provider
Database Management
> Teradata Manager
> Teradata Dynamic Query Manager
> Teradata System Emulation Tool
> Teradata Visual Explain
> Teradata Index Wizard
> Teradata Statistics Wizard
Meta Data Services
Multi-System Availability
> Teradata Query Director
Load/Unload
> FastLoad, MultiLoad & FastExport
> Teradata TPump
> Access Modules
Mainframe Connectivity
> Mainframe Channel Connect
> TS/API, CICS, HUTCNS & IMS/DC
25 > 10/16/2014
Teradata utilities are fully parallel
Teradata utilities have checkpoint restart capability
With Teradata, data loads directly from the source into the database
> no file splitting
> no intermediary file transfers
> no manual data conversion
> Fully checkpoint restart-able
Teradata Utilities - Robust and Mature
[Figure: Data Warehouse with parallel in and parallel out]
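Checkpoint restart, as a general technique, can be sketched as below. This is not how the Teradata utilities are implemented (the slides give no internals); `FlakySink`, `load`, and `load_with_restart` are hypothetical names, and the failure is simulated.

```python
class FlakySink:
    """Hypothetical load target that fails once, on a chosen write call."""

    def __init__(self, fail_on_call: int):
        self.rows = []
        self.calls = 0
        self.fail_on_call = fail_on_call

    def write(self, batch):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise IOError("simulated outage")  # nothing from this batch is stored
        self.rows.extend(batch)


def load(records, sink, checkpoint, batch_size=100):
    """Load from the last checkpoint; advance it only after a batch succeeds."""
    start = checkpoint.get("next", 0)
    for i in range(start, len(records), batch_size):
        batch = records[i:i + batch_size]
        sink.write(batch)
        checkpoint["next"] = i + len(batch)


def load_with_restart(records, sink, batch_size=100) -> int:
    """Retry until every record is loaded; return the number of attempts."""
    checkpoint = {}
    attempts = 0
    while checkpoint.get("next", 0) < len(records):
        attempts += 1
        try:
            load(records, sink, checkpoint, batch_size)
        except IOError:
            continue  # restart from the saved checkpoint, not from zero
    return attempts


if __name__ == "__main__":
    records = list(range(1000))
    sink = FlakySink(fail_on_call=4)
    print(load_with_restart(records, sink), len(sink.rows))
```

The restart resumes at the first unloaded batch, so work completed before the failure is neither repeated nor duplicated.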
26 > 10/16/2014
Robust Parallel Utilities for
All Your Data Needs
Utility      Purpose                                        Parallel
ARC          Archive and Restore                            Yes
FastExport   Fast Data Unload                               Yes
FastLoad     Fast Initial Data Load                         Yes
MultiLoad    Insert, Update, Upsert, Delete (high volume)   Yes
TPump        Insert, Update, Upsert, Delete (trickle)       Yes
27 > 10/16/2014
TERADATA is an Open System
[Figure: TERADATA integration points]
> CORBA, ODBC, IIOP, .NET, OLE-DB, ASP, WEB
> JAVA, JDBC, JSP, EJB, JMS
> Message Bus, Publish & Subscribe
> TERADATA Utilities, Queues, Messages, Adapter(s)
Virtually any application or middleware framework can be integrated with TERADATA
28 > 10/16/2014
Automation and Elimination
Database Administration Task    Other   Teradata
Change Management               HIGH    LOW
Workload Management             HIGH    LOW
Query Tuning                    MED     LOW
Workspace Management            HIGH    AUTO
Index Reorganizing              MED     NONE
Data Reorganizing               MED     NONE
Data Balancing Control          NONE    NONE
Free Space Management           HIGH    AUTO
Data Placement Definition       HIGH    AUTO
Data Partitioning Definition    AUTO    AUTO
Physical Data Modeling          HIGH    LOW
Logical Data Modeling           HIGH    HIGH
High Availability Failover      HIGH    AUTO
Teradata provides large Total Cost of Ownership advantages
29 > 10/16/2014
The Teradata Mission

Teradata Active Data Warehousing
Strategic, tactical, and event-driven decision making in a single, centralized, mission-critical, up-to-date version of the enterprise data
Any Question, By Any User, At Any Time
All Decision Making from One Copy of the Data.
[Figure: Sources feed the Active Data Warehouse, which serves strategic and tactical Users]
30 > 10/16/2014
Questions......
eric.prestegard@ncr.com
