
Module 14: System Restarts

After completing this module, you will be able to:
- List three different ways to restart the Teradata database.
- Use the RESTART command.
- Describe the impact of:
    - Disk(s) failure
    - Disk array controller(s) failure
    - BYNET(s) failure
    - Node failure
    - AWS failure
    - VPROC failure
- Explain the difference between a PDE dump and a UNIX panic dump.

Types of Restarts
Scheduled restarts:
- Changing system parameters (e.g., a DBS Control parameter is updated)
- Software upgrades
- Configuration changes (addition of new AMPs and/or PEs)

Unscheduled restarts:
- Power failure (e.g., the August 14, 2003 blackout in the northeastern U.S. and parts of Canada)
- Hardware failure
- Software failure
- Accidents
Restart Processes
1. Spool cylinders are returned to the free cylinder list (unused cylinder pool).
2. Before logons are enabled, uncommitted work is rolled back. In a cold start:
   1st  Tables are re-locked for background recovery.
   2nd  Logons are enabled.

Scheduled Restarts
Restart Teradata with      Use this command          Options
-------------------------  ------------------------  --------------------------
Command line               tpareset <comment>        -f, -x, -y, -d, -l, -Q, -P
DB Console - Supervisor    restart tpa <comment>     cold, coldwait
vprocmanager               restart                   cold, coldwait
MultiTool (Windows 2000)   reset (via GUI choices)   GUI menu choices

Example:

# tpareset -f Change of system parameters

To see when restarts occurred during the last week, with a brief explanation of how and why:

LOGON tdpid/systemfe,service;
EXEC ALLRESTARTS (DATE - 7,);
LOGOFF;


The tpatrace command may also be used to see information about restarts.

# tpatrace 3        (shows the last 3 restarts)

Restarting Teradata from DB Window

RESTART TPA [, NODUMP | , DUMP = YES | , DUMP = NO] [, COLD | , COLDWAIT] , comment

Restart using the tpareset Command


Example of using the tpareset command:

# tpareset -f Change of DBSControl parameters
You are about to restart the database
on the system 'u4455'
Do you wish to continue (yes/no) [no]: yes
tpareset: TPA reset submitted.

Example of using the tpatrace command:

# tpatrace
TPA Initialization Trace for Node 001-01
02/16/2004 08:25:33 -------------------- PDE starting
02/16/2004 08:25:35.06 (346) ---- PDE starting.
02/16/2004 08:25:35.07 (346) State is NOTPA/START.
   :
02/16/2004 08:25:36.38 (346) State is NOTPA/NETREADY.
   :
02/16/2004 08:25:47.15 (346) State is TPA/START.
   :
02/16/2004 08:25:48.05 (346) State is TPA/VPROCS.
   :
02/16/2004 08:25:49.57 (346) State is TPA/READY.
02/16/2004 08:25:49.65 (346) State is TPA/DONE.
02/16/2004 08:25:49.66 (346) Crash ceiling/count = 3/0
02/16/2004 08:25:49.66 (346) PDE started in 15 seconds.
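The tpatrace sample above has a regular layout (date, time, process ID, "State is ..."), so a short script can pull out the state sequence and time how long startup took. This is a minimal Python sketch, assuming every state line has the `mm/dd/yyyy hh:mm:ss.ss (pid) State is X.` shape seen in this one sample; the function names are hypothetical, and tpatrace output from other releases may be laid out differently.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the single sample above:
#   02/16/2004 08:25:35.07 (346) State is NOTPA/START.
STATE_LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"\(\d+\)\s+State is (?P<state>[A-Z]+(?:/[A-Z]+)?)\."
)


def state_transitions(trace: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, PDE state) pairs in the order they appear."""
    transitions = []
    for m in STATE_LINE.finditer(trace):
        stamp = datetime.strptime(f"{m['date']} {m['time']}", "%m/%d/%Y %H:%M:%S.%f")
        transitions.append((stamp, m["state"]))
    return transitions


def seconds_to_tpa_done(trace: str) -> float:
    """Seconds from the first recorded state to TPA/DONE."""
    transitions = state_transitions(trace)
    if not transitions:
        raise ValueError("no state lines found")
    start = transitions[0][0]
    for stamp, state in transitions:
        if state == "TPA/DONE":
            return (stamp - start).total_seconds()
    raise ValueError("trace never reached TPA/DONE")


SAMPLE_TRACE = """\
02/16/2004 08:25:35.07 (346) State is NOTPA/START.
02/16/2004 08:25:36.38 (346) State is NOTPA/NETREADY.
02/16/2004 08:25:47.15 (346) State is TPA/START.
02/16/2004 08:25:48.05 (346) State is TPA/VPROCS.
02/16/2004 08:25:49.57 (346) State is TPA/READY.
02/16/2004 08:25:49.65 (346) State is TPA/DONE.
"""

print([state for _, state in state_transitions(SAMPLE_TRACE)])
print(f"{seconds_to_tpa_done(SAMPLE_TRACE):.2f} s")  # 14.58 s
```

The computed 14.58 seconds is roughly consistent with the trace's own "PDE started in 15 seconds" line.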

Restart Messages and Information


Recovery status information is logged to numerous locations:
- Software_Event_Log
- SMP console display
- /var/adm/streams (UNIX)

SMP console output following a tpareset:

   :
Event number 33-10198-00 (severity 40, category 10)
Force a TPA restart.
   :
NOTICE: fsgsync.c: PDE: A primary fsg flush started.
xcmn_err: Message Date 02/16 - Time 08:25(mm/dd hh:mm)
   :
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:49 Running DBS Version: 05.01.00.00
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:49 Running PDE Version: 05.01.00.00
   :
04/02/16 08:25:50 Initializing DBS Vprocs
   :
04/02/16 08:25:56 Configuration is operational
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:56 Starting AMP partitions
   :
04/02/16 08:25:59 Voting for transaction recovery
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:26:00 Recovery session 1 contains 43 rows on AMP 00000
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:26:11 Starting PE partitions
   :
04/02/16 08:26:15 Logons are enabled
Feb 16 08:26:15 Teradata DBS Gateway: [455]: error logging started
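A common question after a restart is how long users were locked out. The console sample above timestamps each recovery milestone, so the gap between the first message and "Logons are enabled" can be measured. This is a Python sketch under stated assumptions: the `yy/mm/dd hh:mm:ss` layout is inferred from this one sample (04/02/16 read as February 16, 2004), and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout from the sample: "yy/mm/dd hh:mm:ss message" at the start of a line.
# Lines such as "Event number ..." or "Feb 16 ..." do not match and are ignored.
CONSOLE_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.+)$", re.MULTILINE)


def milestones(console: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, message) for each timestamped console line."""
    return [
        (datetime.strptime(stamp, "%y/%m/%d %H:%M:%S"), message.strip())
        for stamp, message in CONSOLE_LINE.findall(console)
    ]


def seconds_until_logons(console: str) -> float:
    """Seconds from the first timestamped message to 'Logons are enabled'."""
    events = milestones(console)
    if not events:
        raise ValueError("no timestamped lines found")
    start = events[0][0]
    for stamp, message in events:
        if message.startswith("Logons are enabled"):
            return (stamp - start).total_seconds()
    raise ValueError("logons were not enabled in this output")


SAMPLE_CONSOLE = """\
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:49 Running DBS Version: 05.01.00.00
04/02/16 08:25:50 Initializing DBS Vprocs
04/02/16 08:25:56 Configuration is operational
04/02/16 08:25:59 Voting for transaction recovery
04/02/16 08:26:11 Starting PE partitions
04/02/16 08:26:15 Logons are enabled
Feb 16 08:26:15 Teradata DBS Gateway: [455]: error logging started
"""

print(seconds_until_logons(SAMPLE_CONSOLE))  # 26.0
```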

PDE States
The pdestate command can be used to check the current state of the PDE and Teradata software for a specific node.
# /usr/ntos/bin/pdestate
PDE: Parallel Database Extension state is TPA.

PDE has three major operational states, NULL, NOTPA, and TPA, each with its own sub-states:

NULL:   NULL/START, NULL/STOPPED, NULL/RESET, NULL
NOTPA:  NOTPA/START, NOTPA/NETCONFIG, NOTPA/NETREADY, NOTPA/RECONCILE, NOTPA
TPA:    TPA/START, TPA/VPROCS, TPA/READY, TPA/DONE, TPA
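A monitoring script might check pdestate output before letting jobs run. The following Python sketch extracts the state from the one-line output shown above and validates it against the listed sub-states. The function names are hypothetical, and the rule that only the bare TPA state counts as "fully up" (with TPA/* sub-states treated as still starting) is an assumption made for this sketch, not documented pdestate behavior.

```python
import re

# Sub-states in the order listed above, grouped by major state.
PDE_STATES = {
    "NULL": ["NULL/START", "NULL/STOPPED", "NULL/RESET", "NULL"],
    "NOTPA": ["NOTPA/START", "NOTPA/NETCONFIG", "NOTPA/NETREADY",
              "NOTPA/RECONCILE", "NOTPA"],
    "TPA": ["TPA/START", "TPA/VPROCS", "TPA/READY", "TPA/DONE", "TPA"],
}


def major_state(state: str) -> str:
    """'NOTPA/NETREADY' -> 'NOTPA'; a bare major state is returned unchanged."""
    return state.split("/", 1)[0]


def parse_pdestate(output: str) -> str:
    """Extract the state from output like 'PDE: ... state is TPA.'"""
    m = re.search(r"state is ([A-Z]+(?:/[A-Z]+)?)", output)
    if m is None:
        raise ValueError(f"no PDE state found in: {output!r}")
    state = m.group(1)
    if state not in PDE_STATES.get(major_state(state), []):
        raise ValueError(f"unrecognized PDE state: {state}")
    return state


def database_is_up(state: str) -> bool:
    # Assumption for this sketch: only the bare TPA state means "fully up".
    return state == "TPA"


state = parse_pdestate("PDE: Parallel Database Extension state is TPA.")
print(state, database_is_up(state))  # TPA True
```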

Unscheduled Restarts
Disk Drive Failures

Scenario 1
  Failure:    One disk in a drive group
  Result:     No TPA reset
  Resolution: Replace the disk; the array controllers automatically rebuild it

Scenario 2
  Failure:    Two disks in a drive group
  Result:     TPA reset (1-5 minutes)
              AMP taken offline and marked as Fatal
              Fallback tables OK
              Non-fallback tables partially available
  Resolution: Replace the two disks
              Reformat the LUNs or volumes in the drive group
              Perform a table rebuild
              Restore non-fallback tables

Scenario 3
  Failure:    Two disks in each of 2 different drive groups associated with AMPs in the same cluster
  Result:     2 AMPs fail in a cluster; the machine halts
  Resolution: Restore user DBC and tables
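The three disk scenarios follow one rule chain: a mirrored drive group survives one lost disk, losing both takes its AMP down, and two AMPs down in the same cluster halt the system. The Python sketch below encodes that chain. It assumes a hypothetical one-drive-group-per-AMP layout (real AMPs can own several LUNs), and the last usage case (one AMP down in each of two different clusters) is an extrapolation of the cluster rule rather than a scenario stated above.

```python
from collections import Counter


def disk_failure_outcome(failed_disks: dict[str, int],
                         amp_of_group: dict[str, int],
                         cluster_of_amp: dict[int, int]) -> str:
    """Classify disk failures using the three scenarios above.

    failed_disks:   drive group -> number of failed disks in that group
    amp_of_group:   drive group -> AMP that owns it (hypothetical 1:1 layout)
    cluster_of_amp: AMP -> cluster number
    """
    # A mirrored drive group survives one failed disk; losing two takes
    # the owning AMP offline.
    failed_amps = {amp_of_group[g] for g, n in failed_disks.items() if n >= 2}
    if not failed_amps:
        return "no TPA reset"                        # Scenario 1
    amps_down_per_cluster = Counter(cluster_of_amp[a] for a in failed_amps)
    if max(amps_down_per_cluster.values()) >= 2:
        return "machine halts"                       # Scenario 3
    return "TPA reset; AMP marked Fatal"             # Scenario 2


AMP_OF_GROUP = {"dg0": 0, "dg1": 1, "dg2": 2, "dg3": 3}
CLUSTER_OF_AMP = {0: 0, 1: 0, 2: 1, 3: 1}

print(disk_failure_outcome({"dg0": 1}, AMP_OF_GROUP, CLUSTER_OF_AMP))             # no TPA reset
print(disk_failure_outcome({"dg0": 2}, AMP_OF_GROUP, CLUSTER_OF_AMP))             # TPA reset; AMP marked Fatal
print(disk_failure_outcome({"dg0": 2, "dg1": 2}, AMP_OF_GROUP, CLUSTER_OF_AMP))   # machine halts
print(disk_failure_outcome({"dg0": 2, "dg2": 2}, AMP_OF_GROUP, CLUSTER_OF_AMP))   # TPA reset; AMP marked Fatal
```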



BYNET Failures

Scenario 1
  Failure:    One BYNET fails
  Result:     No TPA reset
              All traffic is automatically switched to the remaining BYNET
              System performance is affected
  Resolution: Repair the BYNET

Scenario 2
  Failure:    Both BYNETs fail
  Result:     Teradata halts and is not available
  Resolution: Repair the BYNETs



Node Failure

Scenario
  Failure:    Node fails (e.g., O.S. hangs, 2 power supplies fail, memory fails, etc.)
  Result:     TPA restart (1-5 minutes) and vprocs migrate to other nodes in the clique
              Possible O.S. reboot (3-15 minutes)
  Resolution: Repair the node and reboot the operating system
              Restart Teradata to allow the node to rejoin the Teradata configuration

Vproc Software Failure

Scenario
  Failure:    AMP or PE vproc fails
  Result:     TPA restart (1-5 minutes); vprocs may be marked offline
  Resolution: If necessary, run the Scandisk, Checktable, and Rebuild utilities

AWS Failure

Scenario
  Failure:    AWS fails
  Result:     No restart of Teradata; the AWS is not available to monitor/manage the system
  Resolution: Reboot or recover the AWS
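The BYNET, node, vproc, and AWS scenarios above reduce to three possible outcomes, which a runbook or alerting tool could store as a small lookup table. This Python sketch is illustrative only: the scenario keys are informal labels chosen here, not Teradata event names, and the table covers just the cases listed above.

```python
from enum import Enum


class Outcome(Enum):
    NO_RESTART = "no TPA restart"
    TPA_RESTART = "TPA restart (1-5 minutes)"
    HALT = "Teradata halts"


# Informal scenario labels (chosen for this sketch) mapped to the results above.
IMPACT = {
    "one BYNET fails": Outcome.NO_RESTART,
    "both BYNETs fail": Outcome.HALT,
    "node fails": Outcome.TPA_RESTART,
    "AMP or PE vproc fails": Outcome.TPA_RESTART,
    "AWS fails": Outcome.NO_RESTART,
}


def causes_tpa_restart(failure: str) -> bool:
    return IMPACT[failure] is Outcome.TPA_RESTART


def teradata_stays_available(failure: str) -> bool:
    # Everything except a halt leaves Teradata usable once any restart finishes.
    return IMPACT[failure] is not Outcome.HALT


print(sorted(f for f in IMPACT if causes_tpa_restart(f)))
# ['AMP or PE vproc fails', 'node fails']
```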

TPA Reset Crashdumps


[Figure: AMP vprocs under UNIX, the dump device (/dev/pdedump), the collector task, and the Crashdump table.]

1. Selective memory and swapped pages are written to the pdedump space.
2. As part of the Teradata restart, a background collector task reads pdedump and writes the dump information to a Crashdump table in the Crashdumps database.

If the Crashdumps database is out of perm space, the collector task outputs a warning message and retries every 60 minutes to create a crashdump table.

UNIX MP-RAS commands to check for, and clear, dumps in pdedump:

# pdedumpcheck -v          (lists the /dev/pdedump dumps that are present)
# fdlcsp - mode clear      (clears all dumps from /dev/pdedump)

Allocating Crashdumps Space


[Figure: the DBC hierarchy, with Crashdumps, Sys_Calendar, SysAdmin, SYSDBA, and SystemFE beneath DBC.]

Allocate approximately 150-200 MB of permanent space per node per crashdump.

Example: a four-node system with space for three crashdumps:

  (150 x 4) x 3         = 1800 MB without fallback
  ((150 x 4) x 3) x 2   = 3600 MB with fallback

MODIFY USER Crashdumps AS PERM = 1800E6;

Example of a crashdump name: Crash_20040213_012519_02, where 20040213 is the date, 012519 the time, and 02 the segment number.
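The sizing rule above is simple multiplication, so it is easy to script when planning space for a different node count. This Python sketch reproduces the four-node, three-dump example; the helper names are hypothetical, the 150 MB default is the low end of the 150-200 MB range, and writing MB as `E6` bytes simply mirrors the MODIFY USER example above.

```python
def crashdump_perm_mb(nodes: int, dumps: int, mb_per_node: int = 150,
                      fallback: bool = False) -> int:
    """Perm space in MB for the Crashdumps database.

    Rule of thumb from above: MB per node per dump x nodes x dumps,
    doubled when the crashdump tables are stored with fallback.
    """
    total = mb_per_node * nodes * dumps
    return total * 2 if fallback else total


def modify_crashdumps_sql(perm_mb: int) -> str:
    """Build the MODIFY USER statement, writing MB as E6 bytes like the example above."""
    return f"MODIFY USER Crashdumps AS PERM = {perm_mb}E6;"


print(crashdump_perm_mb(4, 3))                  # 1800
print(crashdump_perm_mb(4, 3, fallback=True))   # 3600
print(modify_crashdumps_sql(crashdump_perm_mb(4, 3)))
```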

Help USER Crashdumps;

Table/View/Macro name      Kind  Comment
Crash_20040213_012519_02   T     PDE:05.01.00.00,TDBMS:05.01.00.00,TGTW:05.01.00.00;
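When deciding which dumps to keep, it helps to decode the table name and the version comment returned by HELP USER. This Python sketch assumes the `Crash_<yyyymmdd>_<hhmmss>_<segment>` layout from the example name and the comma-separated `NAME:version` comment shown above; both parsers are hypothetical helpers, not part of any Teradata tool.

```python
import re
from datetime import datetime

# Assumed layout: Crash_<yyyymmdd>_<hhmmss>_<segment>, per the example name.
CRASHDUMP_NAME = re.compile(r"Crash_(\d{8})_(\d{6})_(\d+)")


def parse_crashdump_name(name: str) -> tuple[datetime, int]:
    """Return (dump timestamp, segment number) for a crashdump table name."""
    m = CRASHDUMP_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a crashdump table name: {name!r}")
    when = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
    return when, int(m.group(3))


def parse_version_comment(comment: str) -> dict[str, str]:
    """Split 'PDE:05.01.00.00,TDBMS:...;' into {'PDE': '05.01.00.00', ...}."""
    pairs = comment.strip().rstrip(";").split(",")
    return dict(pair.split(":", 1) for pair in pairs)


print(parse_crashdump_name("Crash_20040213_012519_02"))
# (datetime.datetime(2004, 2, 13, 1, 25, 19), 2)
print(parse_version_comment("PDE:05.01.00.00,TDBMS:05.01.00.00,TGTW:05.01.00.00;"))
# {'PDE': '05.01.00.00', 'TDBMS': '05.01.00.00', 'TGTW': '05.01.00.00'}
```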

TPA Dump Maintenance


Is the crashdump needed? (Contact the support center if in doubt.)

No:   DELETE it from Crashdumps.
      Optionally, also delete it from the pdedump device.

Yes:  Options:
      - Allow access to the system via the network
      - Archive to a file and ftp it to the support center
      - Use DUL and archive to tape

UNIX MP-RAS Operating System Dumps


Complete dump of system memory, including:
- PDE
- Kernel

The crash utility may be used to interpret the dump.

Review Questions
1. What is the operating system command to restart Teradata? __________________

2. What is the DB Window supervisor command to restart Teradata? __________________

3. Which of the following choices will cause a Teradata restart? __________________
   A. AWS hard drive failure
   B. Single drive failure in a RAID 1 drive group
   C. Two drive failures in the same RAID 1 drive group
   D. Single SMP power supply failure
   E. SMP CPU failure
   F. One of the BYNETs fails
   G. LAN connection to the SMP is lost

Module 14: Review Question Answers


1. What is the operating system command to restart Teradata? tpareset

2. What is the DB Window supervisor command to restart Teradata? restart tpa

3. Which of the following choices will cause a Teradata restart?
   A. AWS hard drive failure
   B. Single drive failure in a RAID 1 drive group
   C. Two drive failures in the same RAID 1 drive group
   D. Single SMP power supply failure
   E. SMP CPU failure
   F. One of the BYNETs fails
   G. LAN connection to the SMP is lost

   Answer: C, E