
Logical I/O

Julian Dyke, Independent Consultant


Web Version

2005 Julian Dyke

juliandyke.com

Agenda
- Introduction
- Logical I/Os
- Buffer Cache
- Behaviour
- Statistics
- Conclusion


Logical I/Os


- Logical I/Os are read operations
- Buffers are cached in shared memory
- Most logical I/Os can be satisfied from cache
- The remainder will result in physical I/Os
- Logical I/Os include
  - current reads
  - consistent reads


Current Reads


- Current reads
  - Current version of block
  - Can be updated
  - Can be dirty
  - Includes all changes
  - Only one current version of block in buffer cache
  - Only one current version of block across all instances
  - Can be used to construct consistent versions


Consistent Reads


- Consistent reads
  - Potentially historic version of block
  - Consistent to a specific System Change Number (SCN)
  - Cannot be updated
  - Cannot be dirty
  - Can be used to construct consistent versions
  - Can have multiple versions of same block in buffer cache
  - Can be
    - single block (sequential reads)
    - multi block (scattered reads)
  - Can be traced using events 10200 / 10201


Logical I/O statistics




- session logical reads statistic
  - Total number of logical reads in session
  - Unreliable at system level
  - At session level: session logical reads = db block gets + consistent gets
- db block gets statistic
  - Number of current reads
- consistent gets statistic
  - Number of consistent reads
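The session-level identity can be checked with a few lines of arithmetic. The sketch below is illustrative only: the `stats` dictionary and its values are invented for the example and simply stand in for figures read from a session statistics view.

```python
def session_logical_reads(stats):
    """Return db block gets + consistent gets for one session."""
    return stats["db block gets"] + stats["consistent gets"]


# Hypothetical statistic values for a single session (not taken from the slides).
stats = {"db block gets": 12, "consistent gets": 230}
total = session_logical_reads(stats)
print("session logical reads =", total)
```

Comparing this sum with the reported session logical reads value is a quick sanity check when sampling statistics before and after a statement.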


Buffer Pools


- There are up to eight buffer pools
  - DEFAULT
  - KEEP, RECYCLE (Oracle 8.0 and above)
  - 2K, 4K, 8K, 16K and 32K (Oracle 9.0 and above)
- 32K not available on all platforms
- Cannot have non-standard block size same as DEFAULT block size


Buffer Pool Headers




- One for each buffer pool (usable or unusable)
- Externalized in
  - V$BUFFER_POOL
  - V$BUFFER_POOL_STATISTICS
- Based on X$KCBWBPD
- Created in shared pool permanent memory when instance is started
- Contain one or more working sets


X$KCBWBPD


- Externalises buffer pool header

  Column          Type           Description
  ADDR            RAW(4)
  INDX            NUMBER
  INST_ID         NUMBER
  BP_NAME         VARCHAR2(20)   Buffer Pool Name
  BP_ID           NUMBER         Buffer Pool ID
  BP_BLKSZ        NUMBER         Block Size
  BP_GRANSZ       NUMBER         Granule Size
  BP_BUFPERGRAN   NUMBER         Buffers per Granule
  BP_LO_SID       NUMBER         Minimum Working Set ID
  BP_HI_SID       NUMBER         Maximum Working Set ID
  BP_SET_CT       NUMBER         Number of Working Sets
  BP_SIZE         NUMBER         Number of Buffers
  BP_STATE        NUMBER
  BP_CURRGRANS    NUMBER
  BP_TGTGRANS     NUMBER
  BP_PREVGRANS    NUMBER


Hash Buckets


- Hash value of each block calculated from
  - Data Block Address (DBA)
  - Block Class
- Number of hash buckets dependent on number of buffers in cache, e.g.

  # buffers   # hash buckets
  500         64
  6000        1024

- Each hash bucket contains
  - Cache Buffers Chains latch
  - Pointer to array of double linked lists
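As a rough illustration of the lookup path, the sketch below maps a (DBA, block class) pair to a bucket and then walks that bucket's chain of buffer headers. The hash function, the `BufferHeader` and `HashBuckets` classes and the sample DBA are assumptions made for the example; Oracle's real hash function and structures are internal and are not described in the slides.

```python
from dataclasses import dataclass


@dataclass
class BufferHeader:
    dba: int           # Data Block Address
    block_class: int   # Block Class
    contents: str = ""


class HashBuckets:
    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        # One chain per bucket; a Python list stands in for the
        # double linked list of buffer headers.
        self.chains = [[] for _ in range(n_buckets)]

    def bucket_for(self, dba, block_class):
        # Placeholder hash for illustration only, NOT Oracle's algorithm.
        return (dba ^ block_class) % self.n_buckets

    def insert(self, bh):
        self.chains[self.bucket_for(bh.dba, bh.block_class)].append(bh)

    def lookup(self, dba, block_class):
        # In Oracle this scan would be done while holding the
        # cache buffers chains latch that protects the bucket.
        for bh in self.chains[self.bucket_for(dba, block_class)]:
            if bh.dba == dba and bh.block_class == block_class:
                return bh
        return None


cache = HashBuckets(64)               # 64 buckets, as in the 500-buffer example
cache.insert(BufferHeader(dba=4194432, block_class=1, contents="row data"))
found = cache.lookup(4194432, 1)
missing = cache.lookup(4194433, 1)
```

Only the chain for one bucket is scanned, which is why the number of buckets is scaled with the number of buffers.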


Hash Buckets
[Diagram: hash chains of buffer headers (BH), each chain protected by a cache buffers chains latch]


Buffer Headers


- Each buffer header describes contents of one buffer
- All buffers accessed via buffer header
- Buffer header contains pointers to
  - Buffer
  - Cache Buffers Chains latch
- Buffer header includes double linked lists for
  - Cache Buffers Chain list
  - Replacement list
  - Users list
  - Waiters list


X$BH


- Externalises buffer headers

  Column       Type      Description
  ADDR         RAW(4)
  INDX         NUMBER
  INST_ID      NUMBER
  HLADDR       RAW(4)    Hash List Address
  BLSIZ        NUMBER    Block Size
  NXT_HASH     RAW(4)    Hash List
  PRV_HASH     RAW(4)    Hash List
  NXT_REPL     RAW(4)    Replacement List
  PRV_REPL     RAW(4)    Replacement List
  TS#          NUMBER    Tablespace#
  FILE#        NUMBER    Absolute File Number
  DBARFIL      NUMBER    Relative File Number
  DBABLK       NUMBER    Block Number
  OBJ          NUMBER    Object ID
  BA           RAW(4)    Buffer Address
  CR_SCN_BAS   NUMBER


Working Sets


- Introduced in Oracle 8.1.5
- Each buffer pool contains one or more working sets
- Working set header
  - created in shared pool permanent memory
  - associated with one DBWn process
  - protected by cache buffers lru chain latch
- Each working set maintains separate set of LRU lists


LRU Lists


- In Oracle 9.2 each working set maintains 4 LRU lists
  - LRU - replacement list - normal blocks
  - LRU-W - write list - dirty blocks
  - LRU-XO - object list - buffers involved in
    - DROP
    - TRUNCATE
  - LRU-XR - range list - buffers involved in
    - ALTER TABLESPACE BEGIN BACKUP
    - ALTER TABLESPACE END BACKUP
    - ALTER TABLESPACE OFFLINE
    - ALTER TABLESPACE READ ONLY


Main and Auxiliary Lists




- Each LRU contains
  - main list
  - auxiliary list
- Auxiliary list includes
  - dirty buffers identified by DBWn processes
  - buffers being written
- Buffers are moved from main to auxiliary list by DBWn processes to avoid unnecessary scans
- Processes scan auxiliary lists first for free buffers
- Buffers also allocated to auxiliary list
  - at startup
  - after FLUSH_CACHE


Working Set Lists


[Diagram: the working set header points to four lists (Replacement List, Write List, Object List, Range List), each with a MAIN and an AUX list of buffer headers; the Replacement List runs from a hot end to a cold end]


Replacement List

- In Oracle 8.1.5 and above a mid-point insertion algorithm is used
- Buffer cache has a hot end and a cold end
- Buffers are inserted at mid-point
- Mid-point is head of cold end
- Starts at hot end - moves down cache
- Maximum mid-point determined by _db_percent_hot_default
- Default value is 50%

[Diagram: replacement list running from the hot end to the cold end, with the head of the hot end and the head of the cold end marked]
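The mid-point insertion idea can be sketched with two queues, one for the hot end and one for the cold end. This is a simplified model under stated assumptions, not Oracle's implementation: the `ReplacementList` class, its method names and the `percent_hot` parameter (standing in for _db_percent_hot_default) are invented for illustration, and latching, pinning and dirty buffers are ignored.

```python
from collections import deque


class ReplacementList:
    def __init__(self, capacity, percent_hot=50):
        self.capacity = capacity
        self.max_hot = capacity * percent_hot // 100  # maximum mid-point
        self.hot = deque()    # left = head of hot end
        self.cold = deque()   # left = head of cold end (mid-point), right = tail

    def insert(self, block):
        """Insert a newly read block at the mid-point; return the evicted block, if any."""
        evicted = None
        if len(self.hot) + len(self.cold) >= self.capacity:
            # Reuse the buffer at the tail of the cold end.
            evicted = self.cold.pop() if self.cold else self.hot.pop()
        self.cold.appendleft(block)
        return evicted

    def promote(self, block):
        """Move a block from the cold end to the head of the hot end."""
        self.cold.remove(block)   # raises ValueError if the block is not in the cold end
        self.hot.appendleft(block)
        if len(self.hot) > self.max_hot:
            # Hot end is full: its oldest buffer crosses the mid-point.
            self.cold.appendleft(self.hot.pop())

    def order(self):
        """Blocks from head of hot end to tail of cold end."""
        return list(self.hot) + list(self.cold)


rl = ReplacementList(capacity=4)
for block in (1, 2, 3, 4):
    rl.insert(block)
rl.promote(2)
victim = rl.insert(5)
```

Because the hot end starts empty, the mid-point begins at the hot end and moves down the list as buffers are promoted, up to the configured maximum.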


X$KCBWDS


- Externalises working set header

  Column       Description
  ADDR
  INDX
  INST_ID
  SET_ID       Working Set ID
  DBWR_NUM     Database Writer Number
  BLK_SIZE
  NXT_REPL     MAIN Replacement List
  PRV_REPL     MAIN Replacement List
  NXT_REPLAX   AUX Replacement List
  PRV_REPLAX   AUX Replacement List
  CNUM_REPL    Number of buffers on MAIN Replacement List
  ANUM_REPL    Number of buffers on AUX Replacement List
  COLD_HD      Insertion Point
  HBMAX        Maximum number of Hot Buffers
  HBUFS        Number of Hot Buffers
  NXT_WRITE

  Column types as listed on the slide, in order: RAW(4) NUMBER NUMBER NUMBER NUMBER NUMBER RAW(4) RAW(4) RAW(4) RAW(4) RAW(4) RAW(4) RAW(4) NUMBER NUMBER RAW(4)


Touch Count


- Each buffer header maintains
  - touch count
  - timestamp
- Touch count represents number of 3 second intervals in which buffer has been accessed since
  - buffer last read into cache
  - touch count last reset
- Each time buffer is accessed
  - if timestamp more than 3 seconds ago
    - increment touch count
    - set timestamp to current time
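The access rule above translates almost directly into code. In this minimal sketch the `BufferTouch` class and `touch` function are hypothetical names, time is passed in explicitly as seconds, and the initial touch count of zero when a buffer is read into the cache is an assumption.

```python
TOUCH_TIME = 3  # seconds, the interval quoted on the slide


class BufferTouch:
    def __init__(self, now):
        self.touch_count = 0   # assumed starting value when the buffer is read in
        self.timestamp = now


def touch(buf, now, touch_time=TOUCH_TIME):
    """Record an access: count it only if the last counted access is old enough."""
    if now - buf.timestamp > touch_time:
        buf.touch_count += 1
        buf.timestamp = now


buf = BufferTouch(now=0)
for access_time in (1, 4, 5, 8):
    touch(buf, access_time)
print(buf.touch_count)
```

Accesses at 1 and 5 seconds fall within 3 seconds of the stored timestamp, so they leave the count unchanged; a burst of accesses therefore counts as a single touch.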


Touch Count


- When buffer reaches tail of cold end
  - If touch count >= 2 then buffer is moved to hot end
  - Otherwise used as next free buffer
- Hot criteria determined by
  - _db_aging_hot_criteria
  - default value is 2 touches
- Time interval determined by
  - _db_aging_touch_time
  - default value is 3 seconds
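The decision made at the tail of the cold end is a single comparison. The function below is an illustrative sketch: `cold_tail_action` is a made-up name, and `hot_criteria` stands in for _db_aging_hot_criteria with the default of 2 quoted above.

```python
HOT_CRITERIA = 2  # default number of touches quoted on the slide


def cold_tail_action(touch_count, hot_criteria=HOT_CRITERIA):
    """Decide what happens to a buffer that has reached the tail of the cold end."""
    if touch_count >= hot_criteria:
        return "move to hot end"
    return "reuse as free buffer"


for count in (0, 1, 2, 5):
    print(count, cold_tail_action(count))
```

Raising the criteria makes promotion to the hot end harder, so more buffers are reused from the cold end.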


Single versus Multi-Block Reads




- Single block reads
  - Used with current reads
  - Can be used with consistent reads
  - Waits recorded by db file sequential read
- Multi block reads
  - Frequently used with consistent reads
  - Maximum number of physical blocks read specified by DB_FILE_MULTIBLOCK_READ_COUNT
  - Waits recorded by db file scattered read
  - Blocks moved to cold end of buffer cache


Single-Block Reads
[Animated diagram: replacement list from head of hot end to head of cold end, showing the block number and touch count held in each buffer while blocks including 42, 87, 34, 33 and 11 are read. The overlapping captions are only partly recoverable: get first available buffer from cold end; insert block contents; set touch count to zero; insert buffer at head of cold end; for a block already in the cache, update its touch count and move the buffer to the head of the hot end.]


Consistent Reads
[Animated diagram: replacement list from head of hot end to head of cold end, showing the block number and System Change Number of each buffer, with current and consistent versions of blocks, during consistent reads of block 27 (SCNs 132 and 128 appear in the captions). The overlapping captions are only partly recoverable: get first available buffer from cold end; read current version of block 27 into buffer; apply undo to roll back to the required SCN; insert consistent version at head of cold end.]


Multi-Block Reads
DB_FILE_MULTIBLOCK_READ_COUNT = 4

[Animated diagram: replacement list from head of hot end to head of cold end while blocks 1 to 8 are read four at a time. The overlapping captions are only partly recoverable: get first four available buffers from cold end; read blocks 1 to 4 into buffers; move buffers to head of cold end; get next four available buffers from cold end; read blocks 5 to 8 into buffers; move buffers to head of cold end.]


Dirty Blocks


- When blocks are updated they are marked dirty
- Changes immediately written to redo buffer
- Changes written back to disk asynchronously by DBWn process
- DBWn process
  - scans from cold end of MAIN replacement list
  - moves dirty blocks to auxiliary list
  - writes dirty blocks back to disk
- Written blocks remain on auxiliary list until re-used


Buffer Pinning


- In Oracle 8.0 and above, Oracle uses pinning to reduce number of logical I/Os
- If buffer will be accessed again by the statement, it is pinned in the buffer cache
- Frequently used with index scans
- Only appears to be used with consistent gets
  - not observed with current gets
- If pinning was not implemented, number of logical I/Os would significantly increase


Buffer Pinning Statistics




- buffer is not pinned count statistic
  - Number of pin-able buffers not pinned by this session when visited
  - Equivalent to number of logical I/Os (for that part of statement)
- buffer is pinned count statistic
  - Number of buffers already pinned by this session when visited
- Number of buffers visited = buffer is not pinned count + buffer is pinned count


Consistent Gets Statistics




- consistent gets - examination statistic
  - Number of consistent gets that could be immediately performed without pinning the buffer
  - Generally apply to indexes
  - Require one latch get
  - Included in consistent gets statistic
- no work - consistent read gets statistic
  - Number of consistent gets that could be performed without requiring rollback or cleanout
  - Generally apply to tables
  - Require two latch gets
  - Included in consistent gets statistic


Full Table Scan


SELECT SUM(c2) FROM t1;
  0     SELECT STATEMENT
  1  0    TABLE ACCESS (FULL) OF 'T1'

[Animated diagram: table T1 shown as a segment header (block 1), data blocks and empty blocks up to the high water mark, followed by unused blocks; blocks 1 to 21 are read in turn while the statistics below count up.]

  Statistic                         Final value in animation
  session logical reads             23
  consistent gets                   23
  no work - consistent read gets    18
  buffer is not pinned count        18
  table scans (short tables)        1
  table scans rows gotten           72
  table scans blocks gotten         18


Full Table Scan - Summary




- In Oracle 9.2
  - segment header initially read 3 times
  - segment header read again every 10 extents
- All blocks are read up to high water mark
- For longer tables blocks can be prefetched
- Algorithm differs for Automatic Segment Space Managed tablespaces


Unique Scan
SELECT c2 FROM t1 WHERE c1 = 42;
  0     SELECT STATEMENT
  1  0    TABLE ACCESS (BY INDEX ROWID) OF 'T1'
  2  1      INDEX (UNIQUE SCAN) OF 'I1'

[Animated diagram: index I1 (branch block and leaf blocks) and table T1; the scan reads the index branch block, one index leaf block and then one table data block.]

  Statistic                         Final value in animation
  session logical reads             3
  consistent gets                   3
  consistent gets - examination     3
  buffer is not pinned count        2
  index fetch by key                1
  table fetch by rowid              1
  rows fetched by callback          1


Index Organised Table


SELECT c2 FROM t1 WHERE c1 = 42;
  0     SELECT STATEMENT
  1  0    INDEX (UNIQUE SCAN) OF 'I1'

[Animated diagram: index I1 (branch block and leaf blocks); the scan reads the branch block and then one leaf block.]

  Statistic                         Final value in animation
  session logical reads             2
  consistent gets                   2
  consistent gets - examination     2
  index fetch by key                1


Single Table Hash Cluster


SELECT c2 FROM t1 WHERE c1 = 42;
  0     SELECT STATEMENT
  1  0    TABLE ACCESS (HASH) OF 'T1'

[Animated diagram: table T1; the query reads a single table data block (block 7).]

  Statistic                         Final value in animation
  session logical reads             1
  consistent gets                   1
  no work - consistent read gets    1
  cluster key scans                 1
  cluster key scan block gets       1
  buffer is not pinned count        1


Clustering Factor


- Measures relationship between index entries and corresponding data blocks
- Used by CBO to calculate cost of using index
- Good clustering factor approaches number of blocks in table
- Bad clustering factor approaches number of rows in table
- CBO will favour indexes with a better clustering factor

[Diagrams: Bad Clustering Factor; Good Clustering Factor]
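One commonly described way to arrive at a clustering factor is to walk the index entries in key order and count how often the referenced table block changes. The sketch below assumes that definition; the function name and the sample block lists are invented for illustration.

```python
def clustering_factor(block_ids_in_key_order):
    """Count block changes while walking index entries in key order."""
    factor = 0
    previous = None
    for block in block_ids_in_key_order:
        if block != previous:
            factor += 1
            previous = block
    return factor


# Nine rows in three blocks, rows stored in key order: approaches number of blocks.
well_clustered = [1, 1, 1, 2, 2, 2, 3, 3, 3]
# Same rows scattered across the blocks: approaches number of rows.
scattered = [1, 2, 3, 1, 2, 3, 1, 2, 3]

good = clustering_factor(well_clustered)
bad = clustering_factor(scattered)
```

With these inputs the well clustered case yields the number of blocks (3) and the scattered case yields the number of rows (9), matching the two extremes described on the slide.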


Range Scan - Bad Clustering Factor


SELECT c2 FROM t1 WHERE c3 = 42;
  0     SELECT STATEMENT
  1  0    TABLE ACCESS (BY INDEX ROWID) OF 'T1'
  2  1      INDEX (RANGE SCAN) OF 'I2'

[Animated diagram: index I2 (branch block and leaf blocks) and table T1; the scan reads the index branch block and a leaf block, which stays pinned while the table data blocks it references are read.]

  Statistic                         Final value in animation
  session logical reads             8
  consistent gets                   8
  consistent gets - examination     1
  no work - consistent read gets    6
  index scans kdiixs1               1
  buffer is not pinned count        7
  buffer is pinned count            5
  table fetch by rowid              6


Range Scan - Good Clustering Factor


SELECT c2 FROM t1 WHERE c4 = 42;
  0     SELECT STATEMENT
  1  0    TABLE ACCESS (BY INDEX ROWID) OF 'T1'
  2  1      INDEX (RANGE SCAN) OF 'I3'

[Animated diagram: index I3 (branch block and leaf blocks) and table T1; the scan reads the index branch block and a leaf block, which stays pinned while the table data blocks it references are read.]

  Statistic                         Final value in animation
  session logical reads             4
  consistent gets                   4
  consistent gets - examination     1
  no work - consistent read gets    2
  index scans kdiixs1               1
  buffer is not pinned count        3
  buffer is pinned count            9
  table fetch by rowid              6


Clustering Factor - Summary


  Statistic                         Bad Clustering Factor   Good Clustering Factor
  session logical reads             8                       4
  consistent gets                   8                       4
  consistent gets - examination     1                       1
  no work - consistent read gets    6                       2
  index scans kdiixs1               1                       1
  buffer is not pinned count        7                       3
  buffer is pinned count            5                       9
  table fetch by rowid              6                       6

- Better clustering factor
  - Reduces number of logical I/Os required
  - Increases number of buffers that can be pinned
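The summary figures can be cross-checked against the pinning identity from the Buffer Pinning Statistics slide (buffers visited = buffer is not pinned count + buffer is pinned count). The sketch below uses the values from the summary; the dictionary layout and function name are illustrative choices only.

```python
summary = {
    "bad": {
        "session logical reads": 8,
        "buffer is not pinned count": 7,
        "buffer is pinned count": 5,
    },
    "good": {
        "session logical reads": 4,
        "buffer is not pinned count": 3,
        "buffer is pinned count": 9,
    },
}


def buffers_visited(stats):
    """Buffers visited = not pinned visits + already pinned visits."""
    return stats["buffer is not pinned count"] + stats["buffer is pinned count"]


for name, stats in summary.items():
    print(name, buffers_visited(stats), stats["session logical reads"])
```

Both scans visit the same number of buffers; with the good clustering factor more of those visits find the buffer already pinned, which is where the saving in logical I/Os comes from.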


Row Prefetching


- For queries returning more than one row specify maximum number of rows per round trip
- If prefetch size too small
  - Increased number of round trips
  - Degrades performance
- If prefetch size too large
  - Increased number of packets
  - May degrade performance
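The effect of the fetch size on round trips is simple arithmetic. The sketch below is a deliberately simplified model: `round_trips` is a made-up helper, and it ignores any extra fetch needed to detect end of data as well as packet-level effects.

```python
import math


def round_trips(total_rows, rows_per_fetch):
    """Approximate number of fetch round trips needed to return total_rows."""
    if rows_per_fetch < 1:
        raise ValueError("rows_per_fetch must be at least 1")
    return math.ceil(total_rows / rows_per_fetch)


for rows_per_fetch in (1, 2, 10, 100, 1000):
    print(rows_per_fetch, round_trips(1000, rows_per_fetch))
```

The count falls quickly at first and then flattens, which is why very large prefetch sizes bring little further benefit.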


Row Prefetching


- Applies to

  Interface   Mechanism
  OCI         OCI_ATTR_PREFETCH_ROWS
  Pro*C       Host Array
  JDBC        setRowPrefetch()
  PL/SQL      BULK COLLECT
  SQL*Plus    SET ARRAYSIZE

- OCI default prefetch value is 1 (returns 2 rows per fetch)

    res = OCIAttrSet((dvoid *)stmt, (ub4)OCI_HTYPE_STMT,
                     (dvoid *)&prefetchRows, (ub4)0,
                     (ub4)OCI_ATTR_PREFETCH_ROWS, (OCIError *)err);


Row Prefetching


- Example - full table scan
  - 1000 row table
  - 31 blocks (+ segment header)

Prefetch Size: 1 2 3 4 5 10 20 50 100 250 500 1000
Consistent Gets: 1003 337 276 227 130 82 53 43 37 35 34 518

[Chart: Consistent Gets plotted against Prefetch Size]


Thank you for your interest


For more information and to provide feedback please contact me
- My e-mail address is: info@juliandyke.com
- My website address is: www.juliandyke.com
