Teradata Overview V 070503

Teradata Overview
7th May 2003
Agenda
Technical Summary of Teradata Database
Our Development Priorities
Teradata Database
Is a Relational Database Management System
Client Server architecture
Support for open standards (ODBC, OLE-DB, ANSI)
Support for emerging interoperability
Built-in automatic parallel processing
Enables SHARED NOTHING architecture
Special purpose data loads
Special purpose backup utilities
Runs on Intel platforms
> NCR hardware (UNIX SVR4, W2K), non-NCR hardware (W2K) and 64-bit Intel (HP-UX)
Teradata is a S/W RDBMS
Hardware Platform
Operating System
Operating System Interfaces

Database Engine and Task Mgt
DB Services - Locking - Memory mgt - Data buffers - Optimiser
Disk/File Access Interfaces
User data
Journal
Teradata is a DB Server
Teradata is run as the only application on the hardware platform
Hardware Platform - INTEL

UNIX SVR4 or W2K
Parallel Data Extensions

(V)AMP = (Virtual) Access Module Processor
Point-to-point SCSI Interface
DB Services - Locking - Memory mgt - Data buffers - Optimiser
User Data
Journal
Teradata - the VAMP

SHARED NOTHING
The VAMP is an autonomous copy of the RDBMS
Each VAMP owns a set of logical disks
Multiple VAMPs run concurrently on the hardware node
VNET (see later)
Typically 6-10 VAMPs per node
[Diagram: one Intel node running VAMP1-VAMP4 on the VNET, each VAMP with its own user data]
The Teradata Optimiser

The Teradata Optimiser (Parsing Engine)
Talks SQL
No compiled plans
One PE per external data source connection
Optimiser produces the data access plan using advanced statistics
> Cost based
> Not sensitive to sequence
> No hints
> No overrides
SELECT CustName, CustAddress FROM Customer WHERE City = 'Altrincham' ORDER BY 1;
[Diagram: PE1 and PE2 (with cache) on the VNET with VAMP1-VAMP4, each VAMP with its own user data]
Partitioning the Data

Every table MUST be defined with a primary index (PI)
Teradata partitions the table automatically using the PI column as each row is inserted; this is called hashing
Rows of every table are spread across every VAMP; the spread is even when the PI values are well distributed
[Diagram: UNIX SVR4 or NT node with VAMP1-VAMP6 on the VNET]
Automatic Data Partitioning

The Parsing Engine compiles the INSERT SQL
The HASH routine generates a hash bucket value in the range 0-65535
> The HASH Map locates a VAMP within the system
INSERT INTO Employee (Name,EmpNo,DeptNo,DOB,Sex,EdLev) VALUES ('SMITH T',10021,700,460729,'F',16);
Note how the VNET is used for message passing - to pass the row to its destination VAMP
[Diagram: UNIX SVR4 or NT node with PE1, PE2 and VAMP1-VAMP3 on the VNET]
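The hash-and-map lookup described on this slide can be sketched in a few lines of Python. This is an illustration only, not Teradata's implementation: `zlib.crc32` stands in for the proprietary hashing routine, the round-robin hash map is an assumption, and `NUM_BUCKETS`, `build_hash_map`, `bucket_for` and `amp_for` are hypothetical names.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 65536  # assumed bucket count; bucket numbers run 0..65535


def build_hash_map(num_amps):
    """Assign every hash bucket to an AMP, round-robin (illustrative only)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def bucket_for(pi_value):
    """Hash a primary index value to a bucket. crc32 is a stand-in hash."""
    return zlib.crc32(str(pi_value).encode("utf-8")) % NUM_BUCKETS


def amp_for(pi_value, hash_map):
    """Look the bucket up in the hash map to find the owning AMP."""
    return hash_map[bucket_for(pi_value)]


hash_map = build_hash_map(num_amps=4)

# Distribute 10,000 employee numbers (the primary index) across 4 AMPs.
rows_per_amp = Counter(amp_for(emp_no, hash_map) for emp_no in range(10000, 20000))
for amp in sorted(rows_per_amp):
    print(f"AMP{amp + 1}: {rows_per_amp[amp]} rows")
```

Because the target AMP depends only on the primary index value, the same lookup that places a row on INSERT can later find it again without consulting any other AMP.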
Automatic Parallel Reads

SQL request is optimised by the PE
> PE issues an All-AMPs broadcast to the VNET
SELECT * FROM Employee WHERE DeptNo = 700 ORDER BY EmpNo;
Each AMP qualifies its rows autonomously
PE waits on each AMP to broadcast completion
> PE issues an All-AMPs send broadcast to the VNET
> Each AMP sends a row to the VNET
> VNET merges the qualifying rows
[Diagram: UNIX SVR4 or NT node with PE1, PE2 and AMP1-AMP3 on the VNET]
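As a rough sketch of the read path on this slide, the Python below has each AMP filter and sort its own rows, then merges the sorted streams. It runs sequentially in one process, so it shows the data flow rather than real parallelism; the sample rows and the names `amp_rows`, `amp_scan` and `all_amps_select` are hypothetical, and `heapq.merge` merely plays the part of the VNET merge.

```python
import heapq

# Hypothetical rows held by three AMPs: (EmpNo, Name, DeptNo).
amp_rows = {
    "AMP1": [(10021, "SMITH T", 700), (10007, "JONES A", 300)],
    "AMP2": [(10003, "BROWN K", 700), (10050, "PATEL R", 700)],
    "AMP3": [(10012, "EVANS M", 500)],
}


def amp_scan(rows, dept_no):
    """Each AMP qualifies its own rows and sorts them locally by EmpNo."""
    return sorted(row for row in rows if row[2] == dept_no)


def all_amps_select(dept_no):
    """Send the request to every AMP, then merge the sorted streams."""
    local_results = [amp_scan(rows, dept_no) for rows in amp_rows.values()]
    # heapq.merge stands in for the VNET merging already-sorted streams.
    return list(heapq.merge(*local_results))


result = all_amps_select(700)
for row in result:
    print(row)
```

Each AMP only ever sorts its own share of the data; the final ORDER BY is satisfied by merging streams that are already in order.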
Scalability - Multiple Nodes

Teradata can utilise up to 512 loosely coupled hardware nodes
BYNET is the hardware/software interconnect
> PCI card and cabling
> The BYNET performs the inter-node messaging
The VAMPs appear as a single database image
[Diagram: two UNIX SVR4 or NT nodes, one with PE1 and PE2 and one with PE3 and PE4, joined by the BYNET]
Teradata Core Client Tools
Utility     Parallel?  Purpose
FastLoad    Yes        Fast initial data load into new table. Secondary indexes built later.
MultiLoad   Yes        Fast Update, Insert, Upsert, Delete into 1-5 tables for 1 input pass.
TPump       Yes        Continuous Update, Insert, Upsert, Delete.
FastExport  Yes        Fast unload of data from tables.
BTEQ        Yes        More traditional execution of SQL for creating tables, reports, tiny updates.
Teradata is Teradata
Automatic self-balancing data placement
Automatic load balancing of client sessions
Automatic parallelism for data load/update/archive
Automatic transaction back-out and control
Automatic checkpoint/restart of load/update/archive
Automatic RAID disk transparency
Automatic node recovery transparency
Automatic workload management
Automatic re-start of database after abort
Automatic data connectors for pipes, messaging, queuing
No Files, no TableSpaces, no Extents, no Datasets
No single point of failure = Very High RAS
Teradata Speaks Many Languages
Desktop: Windows 9x, NT, XP, W2K
Internet: Network Computers, MS Internet Explorer, Netscape, Java
UNIX: NCR, Solaris, HP, AIX
Mainframes: IBM, Bull and more...
Teradata Warehouse
Query Tools Client Server
ODBC standard connection from user workstation

Nominate all IP addresses in set-up for workload balancing
[Diagram: LAN connecting to PE1 and PE2, with VAMP1-VAMP3 behind the VNET]
User Tier: ODBC connect - BO Server - DSS Agent Server
Queryman, TeraMiner
Middle Tier: ODBC connect - BO Server (Universe) - MSI Server
FastLoad
Single, empty target table only
TCP/IP Call Level Interface
LOGON TDP0/Vic,Winch;
DROP TABLE INVOICELINE_ERROR1;
DROP TABLE INVOICELINE_ERROR2;
BEGIN LOADING INVOICELINE
   ERRORFILES INVOICELINE_ERROR1, INVOICELINE_ERROR2;
DEFINE ORDERNO (CHAR(08)),
   ORDERQTY (DEC(05)),
   CUSTOMERNO (CHAR(08)),
   ITEMNO (CHAR(08))
   FILE = /Custdata;
SHOW;
INSERT INTO INVOICELINE (OrderNumber, OrderQuantity, CustomerId, ProductId)
   VALUES (:ORDERNO, :ORDERQTY, :CUSTOMERNO, :ITEMNO);
END LOADING;
[Diagram: FastLoad client feeding PE1 and PE2 on a UNIX SVR4 node, with AMP1-AMP3 behind the VNET]
FastLoad
Disables transient journals for this job (= fast)
BIG history loads (several files = several jobs)
Do checkpoint
Can re-start a job
Do check the Error Tables as you go
Each job moves one file's worth of data to Teradata
The table is not usable until END LOADING, which initiates Step 2; after that the table is usable
A single job can be aborted (drop all its tables) and started again
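The checkpoint and restart behaviour noted above follows a general pattern that can be sketched in Python. This is a generic illustration under stated assumptions, not FastLoad's internal mechanism: `load_with_checkpoints`, the `state` dictionary and the `fail_at` switch are all hypothetical.

```python
def load_with_checkpoints(records, target, state, checkpoint_every=2, fail_at=None):
    """Append records to `target`, recording progress in state['checkpoint'].

    A restart resumes from the last checkpoint rather than from record 0.
    `fail_at` simulates a job failure at that record index.
    """
    start = state.get("checkpoint", 0)
    del target[start:]  # discard rows loaded after the last checkpoint
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at record {i}")
        target.append(records[i])
        if (i + 1) % checkpoint_every == 0:
            state["checkpoint"] = i + 1
    state["checkpoint"] = len(records)


records = ["r0", "r1", "r2", "r3", "r4"]
target, state = [], {}
try:
    load_with_checkpoints(records, target, state, fail_at=3)
except RuntimeError:
    pass  # the job failed; the last checkpoint was taken after record r1
load_with_checkpoints(records, target, state)  # restart from the checkpoint
print(target)
```

More frequent checkpoints mean less rework after a failure, at the price of more bookkeeping during the load.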
MultiLoad
Multiple input files
Multiple target tables
Logic for control of SQL processing
TCP/IP Call Level Interface
.BEGIN IMPORT MLOAD TABLES ACC_DATA
   WORKTABLES ACC_LOAD_DELTA_WT
   ERRORTABLES ACC_LOAD_DELTA_ET ACC_LOAD_DELTA_UV;
.LAYOUT ACCDELTA;
.FIELD ACC_NO * INTEGER;
.FIELD CONTROL_CDE * CHAR(1);
...
.DML LABEL INSACC;
INSERT INTO ACC_DATA ( ...
.DML LABEL UPDACC DO INSERT FOR MISSING UPDATE ROWS;
UPDATE ACC_DATA SET ...
INSERT INTO ACC_DATA ...
.IMPORT INFILE MLOADIN LAYOUT ACCDELTA
   APPLY INSACC WHERE CONTROL_CDE = 'I'
   APPLY UPDACC WHERE CONTROL_CDE = 'U';
.END MLOAD;
[Diagram: MultiLoad client feeding PE1 and PE2 on a UNIX SVR4 node, with AMP1-AMP3 behind the VNET]
MultiLoad
Uses purpose-built MLOAD journals (not the Transient Journal)
Sorts to the sequence processed from the input file(s)
UPSERT processing
Must think in SET processing terms
MultiLoad places an MLOAD lock on the table
> The table is not accessible (dirty read only)
NEVER delete the restart log table which is generated by MultiLoad

NEVER abort a job - the table is still not accessible
ALWAYS re-submit the script and allow it to finish
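The APPLY routing and the DO INSERT FOR MISSING UPDATE ROWS (upsert) option in the MultiLoad script can be mimicked with a small Python sketch. It models only the logic, not MultiLoad itself: the target table is a dictionary keyed by ACC_NO, the `BALANCE` column is a hypothetical stand-in for the columns elided on the slide, and collecting duplicate inserts in a list is an assumption that loosely mirrors the role of the error tables.

```python
def apply_delta(table, delta_rows):
    """Apply input records to `table` (a dict keyed by ACC_NO).

    'I' records are inserted; 'U' records update an existing row, or are
    inserted when the row is missing (DO INSERT FOR MISSING UPDATE ROWS).
    """
    errors = []
    for rec in delta_rows:
        acc_no, code = rec["ACC_NO"], rec["CONTROL_CDE"]
        if code == "I":
            if acc_no in table:
                errors.append(rec)  # duplicate insert: keep for inspection
            else:
                table[acc_no] = rec["BALANCE"]
        elif code == "U":
            table[acc_no] = rec["BALANCE"]  # update, or insert if missing
        # Records with any other control code match no APPLY clause.
    return errors


acc_data = {1001: 50}
delta = [
    {"ACC_NO": 1002, "CONTROL_CDE": "I", "BALANCE": 75},
    {"ACC_NO": 1001, "CONTROL_CDE": "U", "BALANCE": 60},
    {"ACC_NO": 1003, "CONTROL_CDE": "U", "BALANCE": 10},  # missing row: upsert
    {"ACC_NO": 1001, "CONTROL_CDE": "I", "BALANCE": 99},  # duplicate insert
]
errors = apply_delta(acc_data, delta)
print(acc_data)
print(errors)
```

The sketch processes one record at a time for readability; the slide's advice to think in SET processing terms still applies to the real utility.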
Teradata Development Priorities
@ctive Data Warehousing
Teradata EDW Positioning
CRM
Better, Faster Customer Communications
Front-Office Operational
Enterprise Data Warehouse Environment
Better, Faster Operational Actions

Customer Relationships
Demand Chain
Supply Chain
Financial Operations
Business Process Management
E-commerce
Industry-specific operations
ERP / SCM
Back-Office Operational
Marketing E-Commerce
Enterprise Resource Management Billing & Collections
Sales
Customer Service
Service Provisioning
A single view of the business
Analysis of detail-level data
Unlimited ability to grow
Real-time access to the data from front or back office operational systems
Near real-time data feeds from operational systems
Eliminate expensive, inefficient data marts and Operational Data Stores
Demand for Mixed Workload

Strategic Decision Support
Tactical Decision Support
Complex, Strategic Queries
Continuous Updates
Short, Tactical Queries
Batch Updates
Complex Queries
Query Manager
Integrated, Strategic Decision Support Data
All Decision-Making Data Integrated
Teradata Data Warehouse Solution

IT Users
Operational Data
Data Transformation
Data Staging
Centralised Data Warehouse & Management
Logical Data Mart
[Diagram labels: DM, DM, ODS, Data]
Single Shared Teradata EDW Database
Business Users
Traditional Data Warehousing

Users construct Business Questions, which are used to query the Data Warehouse
Information is returned to the User Application
Users make decisions based on the information, and then take Action
[Diagram labels: Input, Business Question, Data Warehouse, SQL, User Interface / Application, Basis of Decision, Enterprise Information Repository, Information, Action, Streams of information, Data]
@ctive Data Warehousing

Triggers are identified, and optimized by the Data Warehouse
Application continuously queries the Data Warehouse to analyze real-time information (which is continuously refreshed)
Results are compared to trigger points
If a threshold is reached, an automatic action is initiated, or a User is alerted
[Diagram labels: Business Application, TRIGGER, FREQUENCY?, SQL Query, Data Warehouse, Continuous Query Process, Business Questions, Application Triggers, Triggers defined, Enterprise Information Repository, ACTION, Basis of Decision, TRIGGER POINT?, Information, Data, Automatic Action, Continuous streams of information]
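The continuous query loop on this slide (query, compare with a trigger point, act or alert) can be sketched as below. This is a minimal illustration with assumed names: `run_continuous_query`, `trigger_point` and `on_trigger` are hypothetical, and a canned list of readings stands in for a SQL query against continuously refreshed warehouse data.

```python
import time


def run_continuous_query(query_fn, trigger_point, on_trigger, polls, interval_s=0.0):
    """Poll query_fn; call on_trigger whenever a result reaches the trigger point."""
    fired = []
    for _ in range(polls):
        value = query_fn()
        if value >= trigger_point:
            on_trigger(value)  # automatic action, or an alert to a user
            fired.append(value)
        time.sleep(interval_s)  # the polling frequency is a design decision
    return fired


# Stand-in for a SQL query against continuously refreshed data.
readings = iter([120, 180, 240, 90])
alerts = []
fired = run_continuous_query(
    query_fn=lambda: next(readings),
    trigger_point=200,
    on_trigger=lambda v: alerts.append(f"ALERT: value {v} reached the trigger point"),
    polls=4,
)
print(alerts)
```

The two questions on the diagram, FREQUENCY? and TRIGGER POINT?, correspond to the `interval_s` and `trigger_point` parameters in this sketch.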
Teradata Scalability
Amount of Detailed Data
Concurrent Users
Complexity of Data Model
> ORDER: ORDER NUMBER, ORDER DATE, STATUS
> ORDER ITEM BACKORDERED: QUANTITY
> ORDER ITEM SHIPPED: QUANTITY, SHIP DATE
> ITEM: ITEM NUMBER, QUANTITY, DESCRIPTION
> CUSTOMER: CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX
Query Complexity
> Simple: Direct at the start
> Moderate: Multi-table Join, Regression analysis, Query tool support
> Complex: 58-way table join; 15 Pages, 37 FROM Clauses, 7 UNIONs (Largest table >1 B rows)
