
Teradata Overview

7th May 2003

Agenda
Technical Summary of Teradata Database
Our Development Priorities

Teradata Database
Is a Relational Database Management System
Client Server architecture
Support for open standards (ODBC, OLE-DB, ANSI)
Support for emerging interoperability
Built-in automatic parallel processing
Enables SHARED NOTHING architecture
Special purpose data loads
Special purpose backup utilities
Runs on Intel platforms
> NCR hardware (UNIX SVR4, W2K), non-NCR hardware (W2K) and 64-bit Intel (HP-UX)

Teradata is a S/W RDBMS

Hardware Platform

Operating System

Operating System Interfaces


Database Engine and Task Mgt

DB Services - Locking - Memory mgt - Data buffers - Optimiser

Disk/File Access Interfaces

User data

Journal

Teradata is a DB Server

Teradata is run as the only application on the hardware platform

Hardware Platform - INTEL


UNIX SVR4 or W2K

Parallel Data Extensions


(V)AMP = (Virtual) Access Module Processor
Point-to-point SCSI Interface

DB Services - Locking - Memory mgt - Data buffers - Optimiser

User Data

Journal

Teradata - the VAMP


SHARED NOTHING
The VAMP is an autonomous copy of the RDBMS
Each VAMP owns a set of logical disks
Multiple VAMPs run concurrently on the hardware node
VNET (see later)
Typically 6-10 VAMPs per node

[Diagram: one Intel node running VAMP1 to VAMP4 over the VNET, each VAMP with its own User Data]

The Teradata Optimiser


The Teradata Optimiser (Parsing Engine)
Talks SQL
No compiled plans
One PE per external data source connection
Optimiser produces the data access plan using advanced statistics
> Cost based
> Not sensitive to sequence
> No hints
> No overrides

SELECT CustName, CustAddress FROM Customer WHERE City = 'Altrincham' ORDER BY 1;
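
A minimal sketch of what "cost based" and "not sensitive to sequence" could look like, assuming the phrase refers to the order in which tables are written in the SQL. The table names, row counts, selectivity and cost formula are all hypothetical; the slides do not describe the real cost model.

```python
from itertools import permutations

# Hypothetical statistics (rows per table) and a single join selectivity.
ROW_COUNTS = {"Customer": 1_000_000, "Orders": 50_000_000, "Region": 50}
JOIN_SELECTIVITY = 0.001


def plan_cost(join_order):
    """Toy cost: rows read from the first table plus every intermediate result."""
    rows = ROW_COUNTS[join_order[0]]
    cost = rows
    for table in join_order[1:]:
        rows = rows * ROW_COUNTS[table] * JOIN_SELECTIVITY
        cost += rows
    return cost


def best_plan(tables):
    """Pick the cheapest join order. Sorting the names first removes any
    dependence on the sequence in which the tables were written."""
    return min(permutations(sorted(tables)), key=plan_cost)


plan_a = best_plan(["Orders", "Customer", "Region"])
plan_b = best_plan(["Region", "Orders", "Customer"])
print(plan_a)
print(plan_b)
```

Both calls choose the same plan because the choice is driven by the statistics alone, which is the property the slide claims for the optimiser.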

[Diagram: PE1, PE2 and Cache above the VNET, which connects VAMP1 to VAMP4, each with its own User Data]

Partitioning the Data


Every Table MUST be defined with a primary index (PI)
Teradata partitions the table automatically by HASHing the PI column as each row is inserted
Every table is evenly distributed to every VAMP
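
A minimal sketch of hash distribution, assuming six VAMPs as in the diagram. SHA-256 is only a stand-in for Teradata's hash routine, which the slides do not show, and the PI values are made-up integers.

```python
import hashlib
from collections import Counter

NUM_AMPS = 6  # VAMP1..VAMP6, as in the diagram


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Hash a primary index value to an AMP number (0-based).

    SHA-256 is an assumption used for illustration only.
    """
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows each AMP would own."""
    counts = Counter(amp_for(value, num_amps) for value in pi_values)
    return [counts[amp] for amp in range(num_amps)]


distribution = rows_per_amp(range(60_000))
print(distribution)
```

With unique PI values each VAMP should end up with close to one sixth of the rows, with no placement decision made by a DBA.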

[Diagram: a UNIX SVR4 or NT node with VAMP1 to VAMP6 on the VNET]

Automatic Data Partitioning


The Parsing Engine compiles INSERT SQL
The HASH routine will generate a value in the range 0-65535
> The HASH Map locates a VAMP within the system

INSERT INTO Employee (Name,EmpNo,DeptNo,DOB,Sex,EdLev) VALUES ('SMITH T',10021,700,460729,'F',16);

Note how the VNET is used for message passing - to pass the row to its destination VAMP

[Diagram: PE1 and PE2 on a UNIX SVR4 or NT node, connected by the VNET to VAMP1, VAMP2 and VAMP3]
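
A minimal sketch of routing the Employee row above to its destination VAMP, assuming a 16-bit bucket number (65,536 buckets). The choice of EmpNo as the primary index, the SHA-256 stand-in hash and the round-robin hash map are all assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 65536  # assumed 16-bit hash bucket number
AMPS = ["VAMP1", "VAMP2", "VAMP3"]

# Hash map: bucket number -> owning VAMP. Round-robin assignment is an assumption.
HASH_MAP = [AMPS[bucket % len(AMPS)] for bucket in range(NUM_BUCKETS)]


def hash_bucket(pi_value):
    """Return a bucket number in 0..65535 (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def route_row(row, pi_column):
    """Return (bucket, VAMP) for a row, as the PE would before sending it on the VNET."""
    bucket = hash_bucket(row[pi_column])
    return bucket, HASH_MAP[bucket]


row = {"Name": "SMITH T", "EmpNo": 10021, "DeptNo": 700,
       "DOB": 460729, "Sex": "F", "EdLev": 16}
bucket, amp = route_row(row, "EmpNo")  # EmpNo as PI is hypothetical
print(bucket, amp)
```

Because the same PI value always hashes to the same bucket, a later read by EmpNo can go straight to the owning VAMP.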

Automatic Parallel Reads


SQL request is optimised by the PE
> PE issues an All-AMPs broadcast to the VNET

SELECT * FROM Employee WHERE DeptNo = 700 ORDER BY EmpNo;

Each AMP qualifies its rows autonomously
PE waits on each AMP to broadcast completion
> PE issues an All-AMPs send broadcast to the VNET
> Each AMP sends a row to the VNET
> VNET merges the qualifying rows
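
The read steps above can be sketched as a scatter-gather: each AMP filters and sorts only its own rows, then the sorted partial results are merged. The row contents are made up, and threads stand in for the AMPs; this is an illustration of the idea, not Teradata's implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Each AMP owns its own rows (shared nothing). Contents are illustrative.
AMP_ROWS = {
    "AMP1": [{"EmpNo": 10021, "DeptNo": 700}, {"EmpNo": 10004, "DeptNo": 500}],
    "AMP2": [{"EmpNo": 10002, "DeptNo": 700}, {"EmpNo": 10030, "DeptNo": 700}],
    "AMP3": [{"EmpNo": 10011, "DeptNo": 300}, {"EmpNo": 10015, "DeptNo": 700}],
}


def qualify_on_amp(rows, dept_no):
    """Work done independently by one AMP: filter its rows, then sort them locally."""
    matching = [row for row in rows if row["DeptNo"] == dept_no]
    return sorted(matching, key=lambda row: row["EmpNo"])


def select_by_dept(dept_no):
    """All-AMPs request: run the step on every AMP, then merge the sorted streams."""
    with ThreadPoolExecutor(max_workers=len(AMP_ROWS)) as pool:
        partials = list(pool.map(lambda rows: qualify_on_amp(rows, dept_no),
                                 AMP_ROWS.values()))
    return list(heapq.merge(*partials, key=lambda row: row["EmpNo"]))


result = select_by_dept(700)
print([row["EmpNo"] for row in result])
```

The final merge only has to interleave streams that are already sorted, so the ORDER BY costs no extra full sort after the AMPs have finished.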

[Diagram: PE1 and PE2 on a UNIX SVR4 or NT node, connected by the VNET to AMP1, AMP2 and AMP3]

Scalability - Multiple Nodes


Teradata can utilise (up to 512) loosely coupled hardware nodes
BYNET is the hardware/software interconnect
> PCI card and cabling
> The BYNET performs the inter-node messaging
The VAMPs appear as a single database image


[Diagram: two UNIX SVR4 or NT nodes joined by the BYNET; the first runs PE1, PE2 and VAMP1 to VAMP3 on its VNET, the second runs PE3, PE4 and VAMP4 to VAMP6 on its VNET]

Teradata Core Client Tools

Utility | Purpose | Parallel?
FastLoad | Fast initial data load into new table. Secondary indexes built later. | Yes
MultiLoad | Fast Update, Insert, Upsert, Delete into 1-5 tables for 1 input pass. | Yes
TPump | Continuous Update, Insert, Upsert, Delete. | Yes
FastExport | Fast unload of data from tables. | Yes
BTEQ | More traditional execution of SQL for creating tables, reports, tiny updates. | Yes

Teradata is Teradata

Automatic self-balancing data placement
Automatic load balancing of client sessions
Automatic parallelism for data load/update/archive
Automatic transaction back-out and control
Automatic checkpoint/restart of load/update/archive
Automatic RAID disk transparency
Automatic node recovery transparency
Automatic workload management
Automatic re-start of database after abort
Automatic data connectors for pipes and message queuing

No Files, no TableSpaces, no Extents, no Datasets
No single point of failure = Very High RAS

Teradata Speaks Many Languages

Desktop
Windows 9x, NT, XP, W2K

Internet
Network Computers, MS Internet Explorer, Netscape, Java

UNIX
NCR, Solaris, HP, AIX

Mainframes
IBM, Bull and more...

Teradata Warehouse

Query Tools Client Server

ODBC standard connection from user workstation


Nominate all IP addresses in set-up for workload balancing
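
A minimal sketch of why every address should be nominated: new sessions can then be spread across all of them. Round-robin rotation and the addresses shown are assumptions; the slides do not say how the ODBC driver actually picks an address.

```python
from itertools import cycle


class AddressRotation:
    """Hands out the nominated addresses in turn, one per new session."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one address is required")
        self._addresses = cycle(list(addresses))

    def next_address(self):
        return next(self._addresses)


# Hypothetical addresses (documentation range), one per node.
NODE_ADDRESSES = ["192.0.2.11", "192.0.2.12"]

rotation = AddressRotation(NODE_ADDRESSES)
sessions = [rotation.next_address() for _ in range(4)]
print(sessions)
```

If only one address were nominated, every session would land on the same node and its PEs would carry the whole client workload.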

[Diagram: query tools reach PE1 and PE2 over the LAN; the PEs connect through the VNET to VAMP1, VAMP2 and VAMP3]

User Tier ODBC connect
- BO Server
- DSS Agent Server

Queryman, TeraMiner

Middle Tier ODBC connect
- BO Server (Universe)
- MSI Server

FastLoad
Single, empty target table only
TCP/IP Call Level Interface
LOGON TDP0/Vic,Winch;
DROP TABLE INVOICELINE_ERROR1;
DROP TABLE INVOICELINE_ERROR2;
BEGIN LOADING INVOICELINE
   ERRORFILES INVOICELINE_ERROR1, INVOICELINE_ERROR2;
DEFINE ORDERNO    (CHAR(08)),
       ORDERQTY   (DEC(05)),
       CUSTOMERNO (CHAR(08)),
       ITEMNO     (CHAR(08))
FILE = /Custdata;
SHOW;
INSERT INTO INVOICELINE (OrderNumber, OrderQuantity, CustomerId, ProductId)
VALUES (:ORDERNO, :ORDERQTY, :CUSTOMERNO, :ITEMNO);
END LOADING;

[Diagram: the FastLoad client feeds PE1 and PE2 on a UNIX SVR4 node; the VNET passes rows to AMP1, AMP2 and AMP3]

FastLoad
Disables transient journals for this job (= fast)
BIG History loads (several files = several jobs)
Do Checkpoint
Can re-start a job
Do check the Error Tables as you go
Each job moves one file's worth of data to Teradata
Table is not usable until END LOADING (which initiates Step 2) has run; the table is then usable
Can abort a single job (Drop all Tables) and start again
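
A minimal sketch of the checkpoint and re-start idea, assuming a checkpoint is simply a saved row position. The function name, the interval and the simulated failure are hypothetical; FastLoad's own checkpoint mechanics are not described in the slides.

```python
def load_file(records, table, checkpoint, interval=2, fail_at=None):
    """Append records to table, saving progress every `interval` rows.

    On re-start, rows written after the last checkpoint are discarded and
    loading resumes from the checkpointed position.
    """
    start = checkpoint.get("rows_done", 0)
    del table[start:]  # drop rows that were never checkpointed
    for position in range(start, len(records)):
        if position == fail_at:
            raise RuntimeError("simulated failure at row %d" % position)
        table.append(records[position])
        if (position + 1) % interval == 0:
            checkpoint["rows_done"] = position + 1
    checkpoint["rows_done"] = len(records)


records = ["row1", "row2", "row3", "row4", "row5"]
table, checkpoint = [], {}

try:
    load_file(records, table, checkpoint, fail_at=3)  # first attempt fails
except RuntimeError:
    pass
rows_done_at_failure = checkpoint["rows_done"]

load_file(records, table, checkpoint)  # re-submit the same job
print(rows_done_at_failure, table == records)
```

The re-submitted job skips the rows already checkpointed, which is why a frequent checkpoint matters on big history loads.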

MultiLoad
Multiple input files
Multiple target tables
Logic for control of SQL processing

TCP/IP Call Level Interface

.BEGIN IMPORT MLOAD TABLES ACC_DATA
   WORKTABLES ACC_LOAD_DELTA_WT
   ERRORTABLES ACC_LOAD_DELTA_ET ACC_LOAD_DELTA_UV;

.LAYOUT ACCDELTA
.FIELD ACC_NO INTEGER,
.FIELD CONTROL_CDE CHAR(1),
...

.DML LABEL INSACC;
INSERT INTO ACC_DATA (...

.DML LABEL UPDACC DO INSERT FOR MISSING UPDATE ROWS;
UPDATE ACC_DATA SET ...
INSERT INTO ACC_DATA SET ...

.IMPORT INFILE MLOADIN LAYOUT ACCDELTA
   APPLY INSACC WHERE CONTROL_CDE = 'I'
   APPLY UPDACC WHERE CONTROL_CDE = 'U';

.END MLOAD;

[Diagram: the MultiLoad client feeds PE1 and PE2 on a UNIX SVR4 node; the VNET passes rows to AMP1, AMP2 and AMP3]

MultiLoad
Uses purpose-built MLOAD journals (not the Transient Journal)
Sorts to the sequence processed from the input file(s)
UPSERT processing
Must think in SET processing terms
MultiLoad places an MLOAD lock on the Table
> The table is not accessible (dirty read only)
NEVER delete the restart log table which is generated by MultiLoad
NEVER abort a job - the Table is still not accessible
ALWAYS re-submit the script and allow it to finish
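
A minimal sketch of the UPSERT logic selected by CONTROL_CDE in the MultiLoad script: 'I' rows are inserted, 'U' rows are updated, and a 'U' row with no match is inserted instead (DO INSERT FOR MISSING UPDATE ROWS). The BALANCE column, the sample data and the handling of duplicate inserts as errors are assumptions made for illustration.

```python
def apply_deltas(acc_data, deltas):
    """Apply delta records to acc_data (a dict keyed by ACC_NO).

    Returns the records rejected as duplicate inserts, standing in for
    rows that would be written to an error table.
    """
    rejected = []
    for delta in deltas:
        acc_no = delta["ACC_NO"]
        code = delta["CONTROL_CDE"]
        values = {k: v for k, v in delta.items() if k != "CONTROL_CDE"}
        if code == "I":
            if acc_no in acc_data:
                rejected.append(delta)      # duplicate insert
            else:
                acc_data[acc_no] = values
        elif code == "U":
            if acc_no in acc_data:
                acc_data[acc_no].update(values)
            else:
                acc_data[acc_no] = values   # insert for missing update row
    return rejected


acc_data = {1: {"ACC_NO": 1, "BALANCE": 100}}
deltas = [
    {"ACC_NO": 1, "CONTROL_CDE": "U", "BALANCE": 150},
    {"ACC_NO": 2, "CONTROL_CDE": "U", "BALANCE": 20},
    {"ACC_NO": 3, "CONTROL_CDE": "I", "BALANCE": 5},
    {"ACC_NO": 3, "CONTROL_CDE": "I", "BALANCE": 7},
]
errors = apply_deltas(acc_data, deltas)
print(acc_data, len(errors))
```

The loop is row-at-a-time only for readability; the slide's advice is to think of the whole input file as one set of changes applied to the table.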

Teradata Development Priorities

@ctive Data Warehousing

Teradata EDW Positioning

CRM

Better, Faster Customer Communications

Front-Office Operational

Enterprise Data Warehouse Environment

Better, Faster Operational Actions

Customer Relationships
Demand Chain
Supply Chain
Financial Operations
Business Process Management
E-commerce
Industry-specific operations

ERP / SCM
Back-Office Operational

Marketing E-Commerce

Enterprise Resource Management Billing & Collections

Sales

Customer Service

Service Provisioning

A single view of the business
Analysis of detail-level data
Unlimited ability to grow
Real-time access to the data from front or back office operational systems
Near real-time data feeds from operational systems
Eliminate expensive, inefficient data marts and Operational Data Stores

Demand for Mixed Workload


Strategic Decision Support
Tactical Decision Support

Complex, Strategic Queries
Continuous Updates
Short, Tactical Queries
Batch Updates
Complex Queries

Query Manager

Integrated, Strategic Decision Support Data

All Decision-Making Data Integrated

Teradata Data Warehouse Solution


IT Users
Operational Data
Data Transformation
Data Staging
Centralised Data Warehouse & Management
Logical Data Mart

[Diagram: data marts (DM), ODS and Data held together in one warehouse]

Single Shared Teradata EDW Database

Business Users

Traditional Data Warehousing


Users construct Business Questions, which are used to query the Data Warehouse
Information is returned to the User Application
Users make decisions based on the information, and then take Action

[Diagram: a Business Question is input to the User Interface / Application and sent as SQL to the Data Warehouse (Enterprise Information Repository); Information comes back as the Basis of Decision, leading to action. Streams of information and Data feed the warehouse.]

@ctive Data Warehousing


Triggers are identified, and optimized by the Data Warehouse
Application continuously queries the Data Warehouse to analyze real-time information (which is continuously refreshed)
Results are compared to trigger points
If a threshold is reached, an automatic action is initiated, or a User is alerted
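
The loop described above can be sketched as: run the query, compare the result to a trigger point, act when the threshold is reached. SQLite stands in for the warehouse, and the Complaint table, the trigger point and the alert list are hypothetical.

```python
import sqlite3

TRIGGER_POINT = 3  # hypothetical threshold: complaints per customer

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.execute("CREATE TABLE Complaint (CustNo INTEGER)")
alerts = []  # stands in for the automatic action or user alert


def poll(cust_no):
    """One pass of the continuous query: compare the result to the trigger point."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Complaint WHERE CustNo = ?", (cust_no,)
    ).fetchone()
    if count >= TRIGGER_POINT:
        alerts.append((cust_no, count))
        return True
    return False


# Simulate a continuous feed: one new row arrives before each poll.
fired = []
for _ in range(4):
    conn.execute("INSERT INTO Complaint VALUES (?)", (42,))
    fired.append(poll(42))

print(fired, alerts)
```

How often to poll is the FREQUENCY question on the slide: it trades query load on the warehouse against how quickly a threshold is noticed.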

[Diagram: Business Applications define Application Triggers; a Continuous Query Process turns Business Questions into SQL Queries against the Data Warehouse (Enterprise Information Repository) at a chosen FREQUENCY; when a TRIGGER POINT is reached, the Information becomes the Basis of Decision for an ACTION or an Automatic Action. Continuous streams of information and Data feed the warehouse.]

Teradata Scalability
Amount of Detailed Data
Concurrent Users
Complexity of Data Model
Query Complexity
> Simple Direct at the start
> Moderate Multi-table Join
> Regression analysis
> Query tool support
> Complex, 58-way table join: 15 Pages, 37 FROM Clauses, 7 UNIONs (largest table >1 B rows)

[Diagram: sample data model with ORDER (ORDER NUMBER, ORDER DATE, STATUS), ORDER ITEM BACKORDERED (QUANTITY), ORDER ITEM SHIPPED (QUANTITY, SHIP DATE), ITEM (ITEM NUMBER, QUANTITY, DESCRIPTION) and CUSTOMER (CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX)]
