
DB2 Real Estate Buy, Invest, Sell, Reorg?!

Bill Minor - IBM Toronto Lab
bminor@ca.ibm.com
TLU-1243A
Data Servers - DB2 for Linux, UNIX, Windows

Highlights
The cost of disk storage represents a significant portion of the overall expense associated with large database systems. Once purchased, managing that storage can significantly add to the total cost of ownership. Effective management and utilization of disk space is instrumental in keeping your database Real Estate costs in check. The goals of this presentation are to:

Provide intimate details of the reorg utility
Provide an overview of Data Management in DB2
Highlight customer usage scenarios including best practices, monitoring, tuning, autonomics and troubleshooting
Illustrate the role of reorganization in new Viper features such as Table Partitioning, Data Row Compression, and Large RIDs

Agenda
DB2 Real Estate
Overview of Reorganization
Table Compression
Page and Extent Size Selection
DMS Tablespace Architecture
Registry Variables
High Water Mark
Large Record Identifiers (RIDs)
Log Space Consumption

A Confession!
I am not a Realtor, Financial Analyst, Investment Advisor, Stock Trader, Card Counter, Poker Tour Champ, ...
One does not have to be an expert to realize that investing in real estate is a significant proposition.
By analogy, your DB2 storage is a critical and valuable investment.
Just as there are many facets, intricacies and strategies when dealing with Real Estate, so too with your management of DB2 storage.

The Costs of Information Real Estate


Hardware, Software, Licensing, Support costs
Poorly optimized and utilized database
Database Administration/Management: People costs
Infrastructure costs: floor space, power, cooling
Frustration
When it comes to storage, the total cost of ownership (TCO) is estimated at $5 for every $1 spent on physical storage.

DB2 Real Estate


What? Storage objects, DB2 tables
Table types:
  Regular
  Multidimensional Clustered (MDC)
  Range Partitioned (RPT)
  Range Clustered Tables (RCT)
  Database Partitioned
Also relevant: tablespace type and characteristics
  SMS vs. DMS
  REGULAR vs. LARGE vs. TEMPORARY
There are many facets to managing DB2 Real Estate. I am going to focus on storage, with an emphasis on table reorganization (reorg).

DB2 Reorganization
Many changes to table data (INSERTs/UPDATEs/DELETEs) can affect the physical organization of table and index data to the point where performance is adversely affected

Goals of REORG:
  Defragment or compact data onto fewer data pages
  Physically recluster data into the same logical sequence as an index
  Eliminate pointer-overflow records
  DB2 9: build a (new) compression dictionary and compress the rows in the table using that dictionary
  Conversion to Large RIDs
  Schema changes
The result: access to a reorganized object can be done with minimal I/O and bufferpool misses, as well as with maximum prefetcher effectiveness, i.e. query performance is maintained or improved

Access Modes of Table REORG


'Offline' ==> "Classic Reorg" (as pertains to tables)
  ALLOW READ ACCESS (the default)
  ALLOW NO ACCESS (truly 'offline')

'Online' ==> "Inplace Reorg" (not to be confused with 'in-tablespace reorg' as pertains to classic table reorg)
  ALLOW WRITE ACCESS (the default)
  ALLOW READ ACCESS

OFFLINE: Table available for read-only access during reorg up to the copy phase

ONLINE: Table available for full S/I/U/D access during reorg

Table Reorganization Command (CLP Syntax)


REORG {TABLE table-name Table-Clause} [On-DbPartitionNum-Clause]

Table-Clause:
  [INDEX index-name]
  [[ALLOW {READ | NO} ACCESS] [USE tablespace-name] [INDEXSCAN]
   [LONGLOBDATA [USE long-tablespace-name]] [KEEPDICTIONARY | RESETDICTIONARY]]
  | [INPLACE [ [ALLOW {WRITE | READ} ACCESS] [NOTRUNCATE TABLE] [START | RESUME] | {STOP | PAUSE} ]]

Examples:
  db2 reorg table staff index inx1_staff inplace allow write access
  db2 reorg table emp inplace pause on dbpartitionnum(10 to 100)
  db2 reorg table emp_resume longlobdata
  db2 reorg table department resetdictionary
  db2 reorg table payroll index pr1 use tempspace1

An Overview of Classic ('Offline') Table Reorg Processing


Shadow copy approach
  Tablespace used to hold shadow copy is governed by user (USE clause)
  For DMS tablespaces, implication to the High Water Mark (more to come)
  TEMP tablespace is required and it varies (next slide)

Phases: Dictionary Build, Sort, Build, Replace (or Copy), Index Rebuild
  Dictionary Build: there is an additional scan of the table data if INDEXSCAN specified
  Index build/rebuild is now parallelized in Viper II (no need to set INTRA_PARALLEL cfg)

Processing Modes:
  Reclustering via table scan sort (default) or index scan (via INDEXSCAN clause)
  Space reclamation (compaction) via table scan

LONG/LOB data is not reorged by default
  When reorged, XML data is not "reorged", only empty pages are removed

Offline Table REORG - TEMP Space Usage


Recall the phases of table reorg:
Dictionary Build, Sort, Build, Replace(or Copy), Index Rebuild

Three of these phases can consume TEMP tablespace


  Sort: table scan sort (default) processing, if the sort spills to disk
  Build: if the shadow copy is to be built in a temp tablespace (USE clause)
  Index Rebuild: if the associated sort processing spills to disk

If multiple temporary tablespaces exist
  The table reorg USE <tempspace> clause only guarantees that the specified tempspace is used for the table shadow copy
  Index recreate and scan sort processing can use another available temp space (the choice is governed internally according to temp usage)

'Offline' Table Reorg Reclustering:Table Scan Sort (default)


Table is scanned and records are sorted in order to create the new reorganized version of the table (the reclustering index is not scanned)

A reorg may be required because the clustering index isn't well clustered, so a table scan sort will give better I/O characteristics (may be slower for sparse tables where the index itself is somewhat small)

Caveats: Table scan sort is disabled 'under-the-covers' if
  LONG/LOB data is to be reorganized
  Length of sort record is too large (RID is included in sort record)

Index recreate optimization:
  If the reclustering index is SMS type or unique DMS type, recreation of this index will not require a sort. Rather, this index is rebuilt by simply scanning the newly reorganized data table.
  Any other indexes that require recreation will involve a sort
  If just the reclustering index (of the required type) exists, there are no temp space considerations in this case

'Offline' Table REORG - Scan Sort Temp Storage


db2 reorg table T1 index I1
db2 reorg table T1 index I1 use TEMPSPACE1

[Figure: temp storage consumed by each of the two commands across USERSPACE1 and TEMPSPACE1. Labels: T1, SHADOW, SORT, TDASPILL, TDAMERGE; annotations "3x TEMP" and "2x TEMP".]

Inplace or Online Table Reorganization


Inplace Table Reorganization
  Rows moved within existing table object to re-establish clustering, reclaim free space, and eliminate overflows
  Executes as asynchronous background application (process name: db2reorg)
  Table must be at least 3 pages in size
  Cannot inplace reorg LONG/LOB data (use 'offline' reorg)

Attributes:
  Minimal extra storage requirement
  Incremental: benefit of effects seen immediately
  No iterative log processing phase
  Table quiesce for object 'switch over' at end can be avoided
  Think of it as a Trickle Reorg

Online Table Reorganization

Reclustering:
db2 reorg table t1 index i1 inplace

vs.
TIME

Space Reclamation:
db2 reorg table t1 inplace

VACATE PAGE RANGE: MOVE & CLEAN to make free space
FILL PAGE RANGE: MOVE & CLEAN to fill space

Move rows from end of table, filling up holes at the start

VACATE PAGE RANGE: MOVE & CLEAN to make space

Uses clustering index during FILL phases

Backward scan starts at end, fills holes earlier in table identified by simultaneous forward scan

REORGCHK - Table Statistics


db2 reorgchk on table bminor.staff3

Table statistics:
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80

SCHEMA NAME             CARD   OV   NP   FP ACTBLK   TSIZE  F1  F2  F3 REORG
----------------------------------------------------------------------------
Table: BMINOR.STAFF3    6144    0  153  153         276480   0  45 100 -*-
----------------------------------------------------------------------------

F1: 100 * OVERFLOW / CARD < 5
  The total number of overflow records in the table should be less than 5%
F2: 100 * (Effective Space Utilization of Data Pages) > 70
  There should be less than 30% free space in the table
F3: 100 * (Required Pages / Total Pages) > 80
  The number of pages that contain no rows at all should be less than 20% of the total number of pages in the table
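The three table formulas can be checked by hand. The Python sketch below is illustrative only: F1 follows the formula as written, F3 takes NP as the required pages and FP as the total pages, and the F2 expression, 100 * TSIZE / ((FP - 1) * (page size - 76)), is an assumption, since the slide only says "effective space utilization". The 4 KB page size and the function name are also assumptions.

```python
def reorgchk_table_flags(card, overflow, np_pages, fp_pages, tsize, page_size=4096):
    """Return (F1, F2, F3, flags) in the style of the REORGCHK table report."""
    # F1: percentage of overflow records; should be < 5.
    f1 = 100 * overflow // card if card else 0
    # F2: effective space utilization of data pages; should be > 70.
    # Assumed formula: 100 * TSIZE / ((FP - 1) * (page_size - 76)).
    usable_bytes = (fp_pages - 1) * (page_size - 76)
    f2 = 100 * tsize // usable_bytes if usable_bytes > 0 else 100
    # F3: pages holding rows versus total pages; should be > 80.
    f3 = 100 * np_pages // fp_pages if fp_pages else 100
    # A '*' marks a formula whose threshold is NOT met, '-' marks one that is.
    violated = (f1 >= 5, f2 <= 70, f3 <= 80)
    flags = "".join("*" if bad else "-" for bad in violated)
    return f1, f2, f3, flags


# Values taken from the BMINOR.STAFF3 row of the report.
print(reorgchk_table_flags(card=6144, overflow=0, np_pages=153,
                           fp_pages=153, tsize=276480))
```

Under these assumptions the STAFF3 figures give F1 = 0, F2 = 45, F3 = 100 and the flags -*-, which agrees with the report: only the space utilization formula asks for a reorg.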

Classic vs Inplace Table Reorg


Classic Reorg
  Approach: Shadow copy technique: rebuilds table in different storage; indexes rebuilt and truncated to new size
  Storage: If TEMP tablespace specified, table rebuilt there, then copied back. Else, table rebuilt directly in original tablespace
  Phases: 0) Dictionary build (if necessary) 1) Sort 2) Build Table 3) Replace/Copy 4) Index Rebuild
  Key Options: Clustering index vs no clustering index; scan/sort vs index scan
  Other Objects: Indexes always reorganized; LOB/LF/XML optionally reorganized
  Availability: By default, table is available for read until phase 3). Can select no access

Inplace Reorg
  Approach: Trickle row movement technique: moves rows within existing table to reestablish clustering and/or pack rows so as to reclaim space
  Storage: Table reorganized within original table storage
  Phases: 1) Move rows 2) Truncate (optional)
  Key Options: No truncation
  Other Objects: Neither indexes nor LOB/LF/XML reorganized
  Availability: By default, table is available for R/W access. If/when truncate is done, table is available for read access

Locks Acquired for Table Reorg


Classic Table Reorg
  Catalog locks (SYSCAT.TABLES):
    - IS Table Lock
    - NS Row Lock
  Table being reorganized:
    - IX Tablespace Lock
    - U Table Lock
    - Upgrade to Z Table Lock for Copy Phase

Inplace Table Reorg
  Catalog locks (SYSCAT.TABLES):
    - IS Table Lock
    - NS Row Lock
  Table being reorganized:
    - IX Tablespace Lock
    - IS Table Lock
    - X Alter Table Lock
    - S Row Lock on rows moved/cleaned
    - Upgrade to S Table Lock to prepare for Truncation
    - Special Z Table Lock for drain/wait on Truncate

Table Reorganization Support Matrix


Classic Reorg
  DPF: Fully supported (can be invoked on all or specified DB partitions)
  Table Partitioning: Supported (invoked on all table partitions)
  MDC: Fully supported

Inplace Reorg
  DPF: Fully supported (can be invoked on all or specified DB partitions)
  Table Partitioning: Not supported
  MDC: Not supported (not needed for reclustering)

Monitoring Table REORGs


Table Snapshot
  db2 get snapshot for tables on SAMPLE

db2pd tool
  db2pd -db SAMPLE -reorgs file=reorg_pd.out
  (db2pd -db SAMPLE -tcbstats)
  (db2pd -db SAMPLE -mempools)

LIST HISTORY
  db2 list history reorg all for SAMPLE

View and Table Functions
  db2 select * from sysibmadm.snaptab_reorg
  db2 select * from table(sysproc.snap_get_tab_reorg('SAMPLE', dbpartitionnum)) as tb
  db2 select * from table(sysproc.admin_list_hist( )) as listhistory
  db2 select * from table(sysproc.admin_get_tab_info(<schema>, <tabname>)) as t
  db2 select * from table(sysproc.admin_get_tab_compress_info(<schema>, <tabname>, <exec-mode>)) as t

Administrator Notification Log
  $HOME/sqllib/db2dump/<instance_name>.nfy

REORG Table - History File (Example)


db2 list history reorg all for SAMPLE

 Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log  Backup ID
 -- --- ------------------ ---- --- ------------ ------------ --------------
  G  T  20070313103600      N       S0000022.LOG
 ----------------------------------------------------------------------------
  Table: "BMINOR "."STAFF2"
 ----------------------------------------------------------------------------
    Comment: REORG START
 Start Time: 20070313103600
   End Time: 20070313103600
 ----------------------------------------------------------------------------

 Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log  Backup ID
 -- --- ------------------ ---- --- ------------ ------------ --------------
  G  T  20070313103729      N       S0000023.LOG
 ----------------------------------------------------------------------------
  Table: "BMINOR "."STAFF2"
 ----------------------------------------------------------------------------
    Comment: REORG Done
 Start Time: 20070313103729
   End Time: 20070313103729
 ----------------------------------------------------------------------------

Annotations:
  Op "G": Operation = REORG
  Obj: "T" = Table, "I" = Index (here: Table REORG)
  Type: "N" = Online, "F" = Offline (here: 'Online')
  S0000022.LOG: log file being written to when REORG started
  S0000023.LOG: log file being written to when REORG completed
  Comment: REORG status
  NOTE: The "Comment" field reports REORG status only for the 'online' case. For 'offline' it specifies the reclustering index and temp space IDs.

REORG Monitoring - Table Snapshot


db2 reorg table staff3 index i3

Table Schema        = BMINOR
Table Name          = STAFF3
Table Type          = User
Data Object Pages   = 1184
Index Object Pages  = 190
Rows Read           = Not Collected
Rows Written        = 20736
Overflows           = 0
Page Reorgs         = 0
Table Reorg Information:
  Reorg Type        =
      Reclustering
      Table Reorg
      Allow Read Access
      Recluster Via Table Scan
      Reorg Data Only
  Reorg Index       = 1
  Reorg Tablespace  = 2
  Start Time        = 02/26/2007 13:48:48.908388
  Reorg Phase       = 1 - Sort
  Max Phase         = 5
  Phase Start Time  = 02/26/2007 13:48:48.923862
  Status            = Started
  Current Counter   = 986
  Max Counter       = 1183
  Completion        = 0
  End Time          =

(Temp table for spilling sort)
Table Schema        = <140><BMINOR >
Table Name          = TEMP (00007,00002)
Table Type          = Temporary
Data Object Pages   = 820
Rows Read           = Not Collected
Rows Written        = 72178
Overflows           = 0
Page Reorgs         = 0

Annotations:
  Reorg Type: 'Offline' reorg (read access up until Replace phase); reclustering reorg via table scan sort
  Reorg Index: ID of index being used to recluster by
  Reorg Phase: Sort phase (phases only applicable to offline reorg)
  Max Phase: total number of phases to occur: Dictionary Build, Sort, Build, Replace, Index Recreate
  Current Counter / Max Counter: progress indicator, currently 83% complete (986/1183 x 100)
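The progress figure in the snapshot is simply the ratio of the two counters within the current phase. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the counters would have to be read from the snapshot output by other means.

```python
def reorg_phase_progress(current_counter, max_counter):
    """Percentage of the current reorg phase completed, from the snapshot counters."""
    if max_counter <= 0:
        # No work units reported yet for this phase.
        return 0.0
    return 100.0 * current_counter / max_counter


# Counters from the STAFF3 snapshot (Sort phase).
print(round(reorg_phase_progress(986, 1183)))
```

Note that the counters restart with each phase, so this describes progress within the Sort phase only, not across all phases counted by Max Phase.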

ADMIN_LIST_HIST( ) Table Function - DPF Example


db2_all "db2 connect to sample; db2 select dbpartitionnum,objecttype,sqlcode,start_time from table'(('sysproc.admin_list_hist'())' as listhistory where operation=\\'G\\'"

   Database Connection Information

 Database server        = DB2/AIX64 9.1.2
 SQL authorization ID   = BMINOR
 Local database alias   = SAMPLE

DBPARTITIONNUM OBJECTTYPE SQLCODE     START_TIME
-------------- ---------- ----------- --------------
             0 T                    - 20070303152449

  1 record(s) selected.

myhost: db2 connect to sample completed ok

   Database Connection Information

 Database server        = DB2/AIX64 9.1.2
 SQL authorization ID   = BMINOR
 Local database alias   = SAMPLE

DBPARTITIONNUM OBJECTTYPE SQLCODE     START_TIME
-------------- ---------- ----------- --------------
           100 T                 -964 20070303152118

  1 record(s) selected.

myhost: db2 connect to sample completed ok

The online reorg (OLR) encountered SQL0964 ('log full') on dbpartitionnum 100

'Offline' or 'Online' Table REORG?


'Offline' Table REORG:
PROS:
  Provides the fastest table reorganization, especially if LOBs/LONGs are not required to be reorged (if they are, only classic reorg is supported for reorging LONG/LOBs)
  Indexes are rebuilt once the table has been reorganized
  Original version of table can be read up until the last phase of reorg (replace phase)
  The only way to build a new compression dictionary, and/or to compress all rows in the table using the existing or newly created compression dictionary

CONS:
  Large space requirement: shadow copy approach, so need approximately twice as much space as the original table
  Limited access: read-only until Replace/Copy phase
  All-or-nothing process
  Can only be stopped by the app or user who understands how to stop the process

Recommendation: Choose this method if you can reorganize tables during a maintenance window

'Offline' or 'Online' Table REORG?


'Online' Table REORG:
PROS:
  Allows apps to access the table while executing
  Can be paused and resumed
  Runs asynchronously
  Requires less working storage since table is incrementally processed

CONS:
  Slower than Classic method (~10-20x)
  Only allowed for tables with type-2 indexes
  Cannot reorganize LONG/LOBs
  Indexes are maintained, not rebuilt, so index reorganization may subsequently be required
  Requires more log space

Recommendation: Choose this method for 24x7 operations with minimal maintenance windows

Additional REORG Notes


Different tables can be reorged simultaneously as long as there are no resource constraints or limitations
  Restriction: for offline table reorg, DMS temp spaces cannot be shared by simultaneous reorgs

If the table contains mixed row formats because table value compression has been activated or deactivated, an offline table reorganization can convert all the existing rows into the target row format

If the table is partitioned onto several database partitions, and the table reorganization fails on any of the affected database partitions, only the failing database partitions will have the table reorganization rolled back

The granularity of table reorg is at the Database Partition level, not the Table Range Partition level
  Table ranges are reorganized sequentially, one after the other, and global indexes are rebuilt once all ranges have been reorganized

Reducing the Need to Reorganize Tables


ALTER TABLE to add PCTFREE space to each data page
  Considered only by the load and table reorg. Range is from 0 to 99% with default value of 0

Sort the data

Load the data

Creating multi-dimensional clustering (MDC) tables
  (For MDC tables, clustering is maintained on the columns that you specify as arguments to the ORGANIZE BY DIMENSIONS clause of the CREATE TABLE statement. However, REORGCHK might recommend reorganization of an MDC table if it considers that there are too many unused blocks or that blocks should be compacted)

APPEND mode tables
  If, for example, the index key values of new rows are always new high key values, then the clustering attribute of the table will try to place them at the end of the table. Having free space in other pages will do little to preserve clustering. Hence, placing the table in append mode may be a better choice than a clustering index

Automatic Dictionary Creation on Table Growth (TLU-1242A)
  Dictionary created as table is populated and reaches a certain threshold in size (Viper II)

Online Index Reorg : Overview


REORG {TABLE table-name Table-Clause | INDEXES ALL FOR TABLE table-name Index-Clause} [On-DbPartitionNum-Clause]

Index-Clause:
  [ALLOW {READ | NO | WRITE} ACCESS] [{CLEANUP ONLY [ALL | PAGES] | CONVERT}]

Goals:
  To improve physical clustering
  Remove fragmentation
  The table and original index are available for concurrent transactions

There are 4 phases involved in online index reorg (OLIR):
  Build Phase: All indexes on the table are rebuilt in a new index storage object, a shadow object (as opposed to the ghost index)
  Log Catch-up Phase: The catch-up is done for all indexes on the table
  Object Switch Phase: Super exclusive table lock acquired; shadow object becomes THE index object
  Cleanup Phase: Old index object removed

Online Index Reorg : Table Partitioning and MDC Notes and Limitations

MDC
  REORG with ALLOW WRITE not supported
  Note: ALLOW READ is supported

Table Partitioning
  Supports ability to reorg individual indexes (as opposed to ALL indexes of a table)
    Supported in all availability modes (ALLOW NONE, ALLOW READ, ALLOW WRITE)
    Natural thing to do, since with table partitioning each index for the table is in its own storage object (and OLIR operates on a storage object basis)
  Also supports REORG INDEXES ALL in ALLOW NONE

OLIR Hints & Tips


Enlarge the util_heap_sz if you see ADM9500W in the Administration Notification Log (it will also appear in the db2diag.log)
  Informational log records are buffered in the utility heap
  If the utility heap is exhausted, performance will suffer as the catch-up phase will involve reading log files and, possibly, retrieving them from archive

Ensure the tablespace is large enough for the shadow/ghost object/index
  Remember: for Reorg, the shadow object will contain all indexes, so it will require (very approximately) the same amount of space as the current index object on the table
  For Create, the ghost index will simply require the space for the newly created index
  Use LARGE tablespaces

Ensure you commit as soon as possible after index creations
  Minimizes the time the table S lock is held

Locking Associated with Online Index Reorg


Online Index Reorg
  Catalog locks (SYSCAT.TABLES):
    - IS Table Lock
    - NS Row Lock
  Table for indexes being reorganized:
    - ALLOW NO ACCESS: Z lock on table
    - ALLOW READ ACCESS: S lock on table
    - ALLOW WRITE ACCESS: IN lock on table
    - S drain lock for each index (all writers must be aware)
    - S lock at end to perform final catch-up
    - Quiesce concurrent writers: Z lock to perform index switch

Reorgchk Index Statistics


Index statistics:
F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
  The clustering ratio of an index should be greater than 80% (a low cluster ratio means the index sequence is not the same as the table sequence)

F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
  Less than 50% of the space reserved for index entries should be empty

F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
  Determine if recreating the index would result in a tree having fewer levels

F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
  The number of pseudo-deleted RIDs on non-pseudo-empty pages should be less than 20 percent

F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20
  The number of pseudo-empty leaf pages should be less than 20 percent of the total number of leaf pages
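Formulas F5, F7 and F8 are plain ratios, so they are easy to evaluate once the inputs are known. The Python sketch below is a hedged illustration: the function and parameter names are hypothetical, the inputs are assumed to be non-zero totals supplied by the caller, and how REORGCHK actually derives them from catalog statistics is not shown here.

```python
def index_formulas_ok(used_leaf_space, avail_leaf_space, pctfree,
                      pseudo_deleted_rids, total_rids,
                      pseudo_empty_leaves, total_leaves):
    """Evaluate F5, F7 and F8; True means the formula is satisfied (no reorg hint)."""
    f5 = 100 * used_leaf_space / avail_leaf_space
    f7 = 100 * pseudo_deleted_rids / total_rids
    f8 = 100 * pseudo_empty_leaves / total_leaves
    return {
        # F5: leaf space used must exceed MIN(50, 100 - PCTFREE).
        "F5": f5 > min(50, 100 - pctfree),
        # F7: pseudo-deleted RIDs must stay below 20 percent.
        "F7": f7 < 20,
        # F8: pseudo-empty leaf pages must stay below 20 percent.
        "F8": f8 < 20,
    }


# Made-up numbers for illustration: a sparse index with many pseudo-empty leaves.
print(index_formulas_ok(used_leaf_space=300, avail_leaf_space=1000, pctfree=10,
                        pseudo_deleted_rids=50, total_rids=1000,
                        pseudo_empty_leaves=30, total_leaves=100))
```

In this made-up case F5 and F8 fail while F7 passes, the kind of result that would suggest an index reorg or a CLEANUP ONLY pass is worth considering.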


Monitoring Online Index Reorg - Administrator Log


2007-03-12-22.26.22.319328 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:15 Database:SAMPLE
ADM9501W BEGIN online index reorganization on table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2").
^^
2007-03-12-22.26.22.322948 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:18 Database:SAMPLE
ADM9503W Online index reorganization proceeds on index ID "1" in table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2").
^^
2007-03-12-22.26.22.345025 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:18 Database:SAMPLE
ADM9503W Online index reorganization proceeds on index ID "2" in table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2").
^^
2007-03-12-22.26.22.382075 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:31 Database:SAMPLE
ADM9502W END online index reorganization on table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2").

<instanceName>.nfy
NOTIFYLEVEL=3 (default)

DB2 9 Deep Compression


Estimate Compression:
  INSPECT ROWCOMPESTIMATE
Enable Compression:
  CREATE TABLE ... [COMPRESS {NO | YES}]
  ALTER TABLE [COMPRESS {NO | YES}]
Compress a table:
  REORG TABLE <tabname> [KEEPDICTIONARY | RESETDICTIONARY]
(Session TLU-1242: Deep Compression)

Table REORG Deep Compression

[Figure: Table REORG and deep compression. Labels: EMPTY TABLE, COMPRESS YES, INSERT, LOAD, Uncompressed Row Data, Table REORG, Compressed Row Data, Dictionary, INDEX.]

Administrative Table Function ADMIN_GET_TAB_INFO( )


db2 describe "select * from table(sysproc.admin_get_tab_info('BILLM','STAFF')) as t"

TABSCHEMA
TABNAME
TABTYPE
DBPARTITIONNUM
AVAILABLE
DATA_OBJECT_P_SIZE
DATA_OBJECT_L_SIZE
INDEX_OBJECT_P_SIZE
INDEX_OBJECT_L_SIZE
LONG_OBJECT_P_SIZE
LONG_OBJECT_L_SIZE
LOB_OBJECT_P_SIZE
LOB_OBJECT_L_SIZE
XML_OBJECT_P_SIZE
XML_OBJECT_L_SIZE
INDEX_TYPE
REORG_PENDING
INPLACE_REORG_STATUS
LOAD_STATUS
READ_ACCESS_ONLY
NO_LOAD_RESTART
NUM_REORG_REC_ALTERS
INDEXES_REQUIRE_REBUILD
LARGE_RIDS
LARGE_SLOTS
DICTIONARY_SIZE

ADMIN_GET_TAB_COMPRESS_INFO ( )
Column                     Data Type
TABSCHEMA                  VARCHAR(128)
TABNAME                    VARCHAR(128)
DBPARTITIONNUM             SMALLINT
DATA_PARTITION_ID          INTEGER
COMPRESS_ATTR              CHAR(1)
DICT_BUILDER               VARCHAR(30)
DICT_BUILD_TIMESTAMP       TIMESTAMP
COMPRESS_DICT_SIZE         BIGINT
EXPAND_DICT_SIZE           BIGINT
ROWS_SAMPLED               INTEGER
PAGES_SAVED_PERCENT        SMALLINT
BYTES_SAVED_PERCENT        SMALLINT
AVG_COMPRESS_REC_LENGTH    SMALLINT

New to Viper II

Page Size Selection


Default DB page size is 4K; can override on CREATE DATABASE
  CREATE DATABASE SAMPLE PAGESIZE 16 K
  Must always have a system temporary table space with a page size that matches the catalog table space (SYSCATSPACE) page size
  All CREATE BUFFERPOOL and CREATE TABLESPACE statements will default to the database page size unless explicitly specified

Larger page sizes allow
  Larger capacity limits for objects (REGULAR or LARGE table spaces)
  Longer rows in tables, larger keys in indexes (25% of page size)
  Fewer logical and/or physical page reads (more things on each page)

Smaller page sizes allow
  Possibly less page contention (fewer rows/keys on each page)
  Possibly better I/O behavior for pure OLTP environments

page size * extent size == space per block for MDC tables
  Very, very important to prevent sparse blocks/cells
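To see why sparse blocks matter, it helps to put numbers on the block size. The Python sketch below multiplies page size by extent size and then estimates a lower bound on the storage an MDC table needs, assuming each non-empty cell occupies at least one whole block. The function names and the example figures (16 KB pages, 32-page extents, 1000 cells) are illustrative assumptions, not values from the presentation.

```python
def mdc_block_bytes(page_size_bytes, extent_size_pages):
    """Space per MDC block: page size multiplied by extent size."""
    return page_size_bytes * extent_size_pages


def min_mdc_table_bytes(non_empty_cells, page_size_bytes, extent_size_pages):
    """Lower bound on table size, assuming one block minimum per non-empty cell."""
    return non_empty_cells * mdc_block_bytes(page_size_bytes, extent_size_pages)


block = mdc_block_bytes(16 * 1024, 32)
print(block // 1024, "KB per block")
print(min_mdc_table_bytes(1000, 16 * 1024, 32) // (1024 * 1024), "MB minimum for 1000 cells")
```

If most cells hold only a few rows, nearly all of that minimum is empty space, which is why the page size and extent size should be chosen with the expected rows per cell in mind.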


Extent Size Selection


Unit of disk allocation to table storage objects
  An extent of pages is allocated when an object is initialized and each time it is extended
  Round robin approach across all containers
  SMS allocates page by page until size exceeds 1 extent
  DMS table spaces have an EMP (Extent Map) storage object for each table storage object
    2 extents minimum per object (data, index)

Larger extents allow for
  Less frequent allocations during growth
  Less frequent EMP mapping during table scan
  Best for large tables

Smaller extents allow for
  Most optimal storage, less waste due to partial extents being used
  Less storage for empty or very small tables

extent size == block size for MDC tables
  Very, very important to prevent sparse blocks/cells

DMS Tablespace Architecture The Extent of Extents


create tablespace dms1 managed by database using (file c.1 50000) extent size 4

[Figure: extent layout of tablespace dms1 as the following steps are applied in turn: create table t1; insert into t1; create table t2, load into t1, load into t2.
  Extent 0: Tablespace Header
  Extent 1: First Extent of SMPs
  Extent 2: Object Table Extent (entries for T1, then T1 and T2)
  Extent 3: Extent Map (EMP) for T1
  Extent 4: First Extent of Data Pages (DAT) for T1
  Extent 5: DAT T1
  Extent 6: EMP T2
  Extent 7: DAT T2
  Extent 8: DAT T1
  Extent 9: DAT T2
  Other label in the figure: 31968]

DB2_OBJECT_TABLE_ENTRIES Registry Variable


db2set DB2_OBJECT_TABLE_ENTRIES=nnnnn
Specifies the expected number of objects in a table space. If you know that a large number of objects (for example, 1000 or more) will be created in a DMS table space, you should set this registry variable to the approximate number before creating the table space. This will reserve contiguous storage for object metadata during table space creation. Reserving contiguous storage reduces the chance that an online backup will block operations which update entries in the metadata (for example, CREATE INDEX, IMPORT REPLACE). It will also make resizing the table space easier because the metadata will be stored at the start of the table space.

[Figure: two tablespace layouts side by side.
  Left: Tablespace Header, First Extent of SMPs, Object Table Extent
  Right: Tablespace Header, First Extent of SMPs, Object Table Extent, Object Table Extent, Object Table Extent]

DB2_TRUNCATE_REUSESTORAGE Registry Variable


db2set DB2_TRUNCATE_REUSESTORAGE=IMPORT

You can use this variable to resolve lock contention between the IMPORT with REPLACE command and the BACKUP ... ONLINE command. In some situations, online backup and truncate operations are unable to execute concurrently. When this occurs, you can set DB2_TRUNCATE_REUSESTORAGE to "IMPORT" or "import", and physical truncation of the object, including data, indexes, long fields, large objects and block maps (for multi-dimensional clustering tables), is skipped and only logical truncation is performed. That is, the IMPORT with REPLACE command empties the table, causing the object's logical size to decrease, but the storage on disk remains allocated.

This registry variable is dynamic; you can set it or unset it without having to stop and start the instance. You can set DB2_TRUNCATE_REUSESTORAGE before an online backup starts and then unset it after the online backup completes. For multi-partitioned environments, the registry variable will only be active on the nodes on which the variable is set. DB2_TRUNCATE_REUSESTORAGE is only effective on DMS permanent objects.

admin_get_tab_info( ): DATA_OBJECT_P_SIZE vs DATA_OBJECT_L_SIZE

The "High Water Mark" (HWM)


It is the page number of the highest allocated page in a DMS tablespace

HWM is impacted by:
  'Offline' REORG of a table within the DMS tablespace that the table resides in
  Index REORG with either ALLOW READ ACCESS or ALLOW WRITE ACCESS

HWM affects:
  Redirected Restore: redefinition of containers allowing the tablespace to shrink in size; it cannot be shrunk lower than the HWM
  Dropping or reducing the size of a container via ALTER TABLESPACE only affects extents above the HWM

[Figure: a DMS PERM TABLESPACE holding T1 before and after "db2 reorg table T1". Labels: T1, T1SHADOW, T1', HWM.]

The "High Water Mark" (HWM)


If no free extents below the HWM then the only way to reduce the HWM is to drop the object holding it up

db2dart /DHWM
  Displays detailed tablespace information including which extents are free, which are in use and what object is using them, as well as information about the object holding up the HWM

db2dart /LHWM
  Provides guidance as to how the HWM might potentially be lowered

If a DMS table data object is holding up the HWM, then an 'offline' REORG of the table within the DMS tablespace that the table resides in can be used to lower the HWM, if enough free extents exist below the HWM to contain the shadow copy

If DMS index object holding up HWM, index reorg may be able to reduce HWM
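The reasoning on this slide can be modelled with a toy extent map. The Python sketch below is an assumption-laden illustration, not how db2dart works internally: it treats the tablespace as a list of extents, each either free (None) or owned by a named object, and works at extent rather than page granularity. All names are hypothetical.

```python
def hwm_report(extents):
    """Summarize the high water mark of a toy extent map.

    extents: list where each entry is an owner name, or None for a free extent.
    """
    used = [i for i, owner in enumerate(extents) if owner is not None]
    if not used:
        return {"hwm_extent": None, "holder": None, "free_below": 0}
    hwm = used[-1]  # highest allocated extent
    free_below = sum(1 for owner in extents[:hwm] if owner is None)
    return {"hwm_extent": hwm, "holder": extents[hwm], "free_below": free_below}


# Toy layout: T2 sits above two free extents and holds up the HWM.
layout = ["header", "SMP", "T1", None, None, "T2", None]
print(hwm_report(layout))
```

In this toy layout the holder is T2 and two extents are free below the HWM. Whether an offline reorg of T2 could lower the HWM then depends on whether its shadow copy fits into those free extents; with no free extents below, dropping the holder is the only option, as stated above.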

If an empty SMP extent is holding up the HWM: db2dart /RHWM


http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=high+water+mark&uid=swg21234267&loc=en_US&cs=utf8&lang=en

Viper II: ALTER TABLESPACE REDUCE and Online Backup will remove these


Large RID the new default in DB2 9


RID: Row Identifier
  A reference to the location of a row in a table
  Contains the page number and the slot number (location on page)

Before DB2 9
  RID is 4 bytes: 3 byte page number and 1 byte slot number
  Default table space data type was REGULAR
  Tables (data part) could not be placed in LARGE table spaces

DB2 9
  New 6 byte RID: 4 byte page number and 2 byte slot number
  Infrastructure (runtime, sections, sort, log records, locks) is all large RID
  Default table space data type for DMS table spaces is now LARGE
  Tables can now be placed in LARGE table spaces
  Indexes contain regular or large RIDs only, based on the table space type where the table data is stored; it has nothing to do with the type of table space where the index resides

Large RIDs More pages, More rows, Bigger Tables


More Pages:
Maximum tablespace size by page size:

Page Size   4 Byte RID   6 Byte RID (Large RIDs)
4 KB        64 GB        2 TB
8 KB        128 GB       4 TB
16 KB       256 GB       8 TB
32 KB       512 GB       16 TB

New to DB2 9 (default)

For tables in LARGE table spaces (DMS only). Also all SYSTEM and USER temporary table spaces For tables in all tablespace types: regular, temporary, DMS, SMS
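The size limits follow from how many pages a RID can address. As a rough Python sketch: a 3 byte page number addresses 2^24 pages, and the large-RID figures correspond to 2^29 pages. The latter is an assumption inferred from 2 TB / 4 KB rather than something stated on the slide, and the names below are hypothetical. Note that 2^24 pages of 32 KB works out to 512 GB.

```python
REGULAR_PAGE_LIMIT = 2 ** 24  # 3 byte page number in a 4 byte RID
LARGE_PAGE_LIMIT = 2 ** 29    # assumption: inferred from 2 TB / 4 KB


def max_tablespace_bytes(page_size_kb, large_rids):
    """Maximum tablespace size in bytes for a given page size and RID type."""
    pages = LARGE_PAGE_LIMIT if large_rids else REGULAR_PAGE_LIMIT
    return pages * page_size_kb * 1024


def human(n_bytes):
    """Format a byte count as whole GB or TB."""
    gb = n_bytes // 2 ** 30
    return f"{gb // 1024} TB" if gb >= 1024 else f"{gb} GB"


for kb in (4, 8, 16, 32):
    print(f"{kb} KB:", human(max_tablespace_bytes(kb, False)),
          "/", human(max_tablespace_bytes(kb, True)))
```

Each doubling of the page size doubles both limits, so the choice of page size and the choice of RID type are independent levers on capacity.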

More Rows per Page:

Maximum rows per page by page size:

Page Size   REG TBSP         REG TBSP      LARGE TBSP       LARGE TBSP
            Min Rec Length   Max Records   Min Rec Length   Max Records
4 KB        14               251           12               287
8 KB        30               253           12               580
16 KB       62               254           12               1165
32 KB       127              253           12               2335

Maximum number of rows:
  Large RIDs: 1.1 x 10^12
  4 byte RIDs: 4 x 10^9

Converting Existing Tablespaces To LARGE


ALTER TABLESPACE <name> CONVERT TO LARGE
  New option must be the only option; it cannot be combined with other ALTER capabilities
  Fully logged and supports ROLLBACK and RESTORE/ROLLFORWARD

If the table space is defined with AUTORESIZE YES
  If MAXSIZE is NONE, growth of the table space is automatic
  Otherwise MAXSIZE restricts table space growth and should be increased

Otherwise, storage has to be increased to benefit from the larger capacity
  Enable AUTORESIZE, or
  Add a new stripe set, or
  Extend existing containers

New tables will fully support large RIDs, both page and slot numbers
Previously existing tables continue to be restricted to ~255 rows/page and to 3 byte page numbers until a reorganization of the table or its indexes occurs
  SQL1236N Table "<table-name>" cannot allocate a new page because the index with identifier "<index-id>" does not yet support large RIDs

BEST PRACTICE: Perform the ALTER TABLESPACE during upgrade/migration. Be proactive in rebuilding indexes on tables (or reorganizing tables) afterwards.

Large RIDs - What Actions Need To Be Taken?


The table will not support a larger 4 byte page number until all indexes on the table support large RIDs
  SELECT TABNAME, TABSCHEMA, DBPARTITIONNUM FROM TABLE (ADMIN_GET_TAB_INFO( '', '' )) AS T WHERE LARGE_RIDS = 'P'

The table will not support >255 rows (slots) per page until the table itself has been reorganized with the classic/offline REORG TABLE
  SELECT TABNAME, TABSCHEMA, DBPARTITIONNUM FROM TABLE (ADMIN_GET_TAB_INFO( '', '' )) AS T WHERE LARGE_SLOTS = 'P'

Can my table benefit from large slots (more rows per page)?
SELECT TABSCHEMA, TABNAME, AVGROWSIZE FROM SYSCAT.TABLES

If the (average row size - 2) for a table is smaller than the minimum record length for the page size used, then there could be storage benefits when converting the table space to large and reorganizing the table to enable large slots
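The rule of thumb above can be sketched as a small check. This assumes the minimum record lengths for REGULAR table spaces quoted in this presentation (14, 30, 62 and 127 bytes for 4, 8, 16 and 32 KB pages) and takes the average row size from a source such as AVGROWSIZE in SYSCAT.TABLES; the function name is hypothetical.

```python
# Minimum record length in a REGULAR table space, by page size in KB
# (values as quoted in this presentation).
MIN_REC_LENGTH_REGULAR = {4: 14, 8: 30, 16: 62, 32: 127}


def may_benefit_from_large_slots(avg_row_size: int, page_size_kb: int) -> bool:
    """Apply the slide's rule: (average row size - 2) < minimum record length."""
    return (avg_row_size - 2) < MIN_REC_LENGTH_REGULAR[page_size_kb]


# A table of 20 byte rows on 8 KB pages is a candidate; 40 byte rows on 4 KB pages are not.
print(may_benefit_from_large_slots(20, 8))
print(may_benefit_from_large_slots(40, 4))
```

A positive result only flags a candidate; the actual saving depends on the real row length distribution, not just the average.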


Log Consumption - INSERT and DELETE


Row images are logged so that DB2 can redo or undo actions
  Real log space from the active log is written and consumed
  Virtual log space from the active log is reserved for rollback
INSERT
  Row image being inserted is logged (required for redo!)
  Reserve log space for delete on undo
  Space for row image is not required in reserved space
DELETE
  Row image being deleted is logged (required for undo!)
  Reserve log space for insert on undo
  Space for row image is required in reserved space
When row compression is active, the row images are compressed, resulting in fewer bytes logged and reserved, and lower log file usage
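The written versus reserved distinction can be illustrated with a toy model. This is only a sketch: the fixed per-record overhead of 40 bytes is a made-up placeholder, not a DB2 figure, and the model ignores everything except the row image.

```python
from dataclasses import dataclass


@dataclass
class LogSpace:
    written: int   # real log space consumed by the log record
    reserved: int  # virtual log space held back for a possible rollback


def log_space(op: str, row_len: int, overhead: int = 40) -> LogSpace:
    """Toy estimate; `overhead` is a hypothetical per-record header size."""
    if op == "INSERT":
        # Row image is logged for redo; the undo (a delete) needs no row image.
        return LogSpace(written=overhead + row_len, reserved=overhead)
    if op == "DELETE":
        # Row image is logged for undo; the undo (an insert) needs the row image.
        return LogSpace(written=overhead + row_len, reserved=overhead + row_len)
    raise ValueError(f"unsupported operation: {op}")


# Compare an uncompressed 100 byte row with the same row compressed to 60 bytes.
for row_len in (100, 60):
    print(row_len, log_space("INSERT", row_len), log_space("DELETE", row_len))
```

In this model a smaller (compressed) row image reduces the bytes written for both operations and the bytes reserved for DELETE, which matches the direction of the claim above.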

Log Consumption - UPDATE


There are three different types of UPDATE log records written by DB2:

1. Full before and after row image logging.
   The entire before and after image of the row is logged. This is the only type of logging performed on tables enabled with DATA CAPTURE CHANGES.

2. Full XOR logging.
   The XOR differences between the before and after row images, from the first byte that is changing until the end of the smaller row, then any residual bytes in the longer row.

3. Partial XOR logging.
   The XOR differences between the before and after row images, from the first byte that is changing until the last byte that is changing. Byte positions may be first/last bytes of a column. Row images must be the exact same length.
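The two XOR variants can be sketched in a few lines. This is a simplified illustration of the descriptions above, not DB2's log record format: rows are treated as plain byte strings, and the function names are hypothetical.

```python
def first_diff(before: bytes, after: bytes) -> int:
    """Index of the first differing byte (or the shorter length if none differ)."""
    n = min(len(before), len(after))
    for i in range(n):
        if before[i] != after[i]:
            return i
    return n


def full_xor_payload(before: bytes, after: bytes) -> tuple[int, bytes]:
    """XOR from the first changed byte to the end of the shorter row,
    followed by the residual bytes of the longer row."""
    start = first_diff(before, after)
    n = min(len(before), len(after))
    longer = before if len(before) > len(after) else after
    xor_part = bytes(b ^ a for b, a in zip(before[start:n], after[start:n]))
    return start, xor_part + longer[n:]


def partial_xor_payload(before: bytes, after: bytes) -> tuple[int, bytes]:
    """XOR from the first changed byte to the last changed byte (equal lengths only)."""
    if len(before) != len(after):
        raise ValueError("partial XOR requires row images of the same length")
    changed = [i for i in range(len(before)) if before[i] != after[i]]
    if not changed:
        return 0, b""
    start, end = changed[0], changed[-1]
    return start, bytes(b ^ a for b, a in zip(before[start:end + 1], after[start:end + 1]))


old = b"Fred 500 10000 Plano TX 24355"
print(partial_xor_payload(old, b"John 500 10000 Plano TX 24355"))
print(full_xor_payload(old, b"Frank 500 10000 Plano TX 24355"))
```

Because XOR is its own inverse, applying the payload to either image at the recorded offset yields the other image, which is what allows one record to serve both redo and undo in this sketch.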


Log Consumption - UPDATE Examples

1. Full before and after row image logging (DATA CAPTURE CHANGES)
   Before:  Fred 500 10000 Plano TX 24355
   After:   John 500 10000 Plano TX 24355
   Logged:  Fred 500 10000 Plano TX 24355 | John 500 10000 Plano TX 24355

2. Full XOR logging (update changes the row length)
   Before:  Fred 500 10000 Plano TX 24355
   After:   Frank 500 10000 Plano TX 24355
   Logged:  11011010100101001100100101001010101011010010101010110101010101010 | 01

3. Partial XOR logging (row length does not change)
   Before:  Fred 500 10000 Plano TX 24355
   After:   John 500 10000 Plano TX 24355
   Logged:  110110101001010011

Log Consumption - Full XOR Logging Details

When the total length of the row is changing, which is common when variable length columns are updated and also when row compression is enabled, DB2 will determine which byte is first to be changing and log a Full XOR log record.

1. Full XOR logging (length change) with changed column at/near beginning of row
   Before:  Fred 500 10000 Plano TX 24355
   After:   Frank 500 10000 Plano TX 24355
   Logged:  11011010100101001100100101001010101011010010101010110101010101010 | 01

2. Full XOR logging (length change) with changed column at/near end of row
   Before:  500 10000 Plano TX 24355 Fred
   After:   500 10000 Plano TX 24355 Frank
   Logged:  110110101001010011 | 01
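The effect of column position on a Full XOR record can be checked with a short sketch. It treats each row as a plain ASCII byte string, which is a simplification of the real row format, and the helper name is hypothetical.

```python
def full_xor_length(before: bytes, after: bytes) -> int:
    """Bytes in a Full XOR payload: the XOR span from the first changed byte
    to the end of the shorter row, plus the residual bytes of the longer row."""
    shorter = min(len(before), len(after))
    start = next(
        (i for i in range(shorter) if before[i] != after[i]),
        shorter,
    )
    residual = abs(len(before) - len(after))
    return (shorter - start) + residual


# Changed column at the beginning of the row.
front = full_xor_length(b"Fred 500 10000 Plano TX 24355",
                        b"Frank 500 10000 Plano TX 24355")

# Same change with the column moved to the end of the row.
back = full_xor_length(b"500 10000 Plano TX 24355 Fred",
                       b"500 10000 Plano TX 24355 Frank")

print(front, back)
```

In this model the same one-character growth costs 28 payload bytes when the column leads the row and 3 bytes when it trails it, since everything after the first changed byte has to be included.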


Log Consumption - Partial XOR Logging Details

When the total length of the row is not changing, even when row compression is enabled, DB2 will compute and write the optimal Partial XOR log record.

1. Partial XOR logging (no length change) with a gap between columns being changed
   Before:  Fred 500 10000 Plano TX 24355
   After:   John 500 12345 Plano TX 24355
   Logged:  11011010100000000000000001001001

2. Partial XOR logging (no length change) with no gap between columns being changed
   Before:  500 Fred 10000 Plano TX 24355
   After:   500 John 12345 Plano TX 24355
   Logged:  1101101010010100110101
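The gap effect can be reproduced with a small sketch. Rows are again modelled as ASCII byte strings, which is an assumption made for illustration, and the helper name is hypothetical.

```python
def partial_xor_span(before: bytes, after: bytes) -> int:
    """Bytes covered by a Partial XOR payload: first changed byte through
    last changed byte, inclusive. Unchanged bytes inside the span XOR to zero
    but still take up space."""
    if len(before) != len(after):
        raise ValueError("partial XOR requires row images of the same length")
    changed = [i for i in range(len(before)) if before[i] != after[i]]
    return changed[-1] - changed[0] + 1 if changed else 0


# Updated columns separated by an unchanged column ("500").
with_gap = partial_xor_span(b"Fred 500 10000 Plano TX 24355",
                            b"John 500 12345 Plano TX 24355")

# Updated columns placed next to each other.
no_gap = partial_xor_span(b"500 Fred 10000 Plano TX 24355",
                          b"500 John 12345 Plano TX 24355")

print(with_gap, no_gap)
```

Here the span shrinks from 14 bytes to 10 once the two updated columns are adjacent, because the unchanged column no longer sits between them.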

Log Consumption - Best Practices


Columns which are updated frequently (changing value) should be:
  grouped together
  defined towards or at the end of the table definition
These recommendations are independent of:
  Row compression
  Row format (default or null/value compression)
The benefit would be:
  better performance
  fewer bytes logged
  fewer log pages written
  smaller active log requirement for transactions performing a large number of updates


Summary
Real Estate is a BIG investment
Knowing details about your DB2 Real Estate will allow you to better leverage that investment
With DB2 9 (Viper) and Viper II (DB2 9.5), significant new functionality has been developed to help with the management of storage
Going forward, one can expect the trend to continue



THANK YOU!!!
Your feedback is greatly appreciated.