You are on page 1of 102

Oaktable

          

Jonathan Lewis and ORACLE_TRACE Oracle_Trace crashes my Database I start the SGA attach by searching every offset Anjo Kolk says James Morle wrote a program using x$ksmmem I show James my first draft using x$ksmmem James is baffled by why I'm hard coding offsets James says the offsets are in some X$ table I search, turn up a mail by Jonathan Lewison x$kqfco Goldmine all the offsets Thanks Mogens Nogard! Thanks to TomKyte's Decimal to Hex

http://oraperf.sourceforge.net

Direct Oracle SGA Memory Access


Reading data directly from Oracles shared memory segment using C code
Wednesday, July 06, 2011

SGA on UNIX
SMON Snnn Dnnn Pnnn

PMON
Shared Pool

SGA
Database Buffer Cache

Redo Log Buffer

CKPT

DBWR ARCH LGWR oracle sqlplus

Machine Memory

SGA on NT
Snnn SMON
Shared Pool Database Buffer Cache Redo Log Buffer

Dnnn

Pnnn

CKPT

Machine Memory

PMON

DBWR

LGWR

ARCH

oracle

Process Space

sqlplus

What is the SGA


 Memory Cache  Often Used Data  Rapid Access  Shareable  Concurrently Access

SGA 4 main regions

 Fixed information

Users info Database statistics X$dual etc

 Data block cache  SQL cache ( library cache/shared pool)  Redo log buffer

How is the SGA info Used?

 Automatically

data blocks cached Log buffer Sql cache Updates of system and user statistics

 User Queries

User info v$session System info v$parameter Performance statistics v$sysstat, v$latch, v$system_event Buffer cache headers, x$bh

Why Direct Access with C?

 Reading Hidden Information


Sort info on version 7 OPS locking info version 8 Contents of data blocks (only the headers or visible in X$)

 Access while Database is Hung  High Speed Access


Sampling User Waits, catch ephemeral data Scan LRU chain in X$bh Statistically approximate statistics  SQL statistics per user

 Low overhead

Database Slow or Hung


Often happens at the largest sites when cutting edge support is expected.      Shared Pool errors ORA 4031 Archiver or Log file Switch Hangs Hang Bugs Library Cache Latch contention ORA-00379: no free buffers available in buffer pool DEFAULT

Statistical Sampling

By Rapidly Sampling SQL statistics and the users who have the statistics open, one can see how much work a particular user does with a particular SQL statement

Low Overhead
 Marketing Appeal  Clients are sensitive about their production databases  Heisenberg uncertainty affect less overhead less affect monitoring has on performance which we are monitoring

SGA made visible through x$tables


     Most of the SGA is not visible X$KSMMEM Exception, Raw Dump of SGA Information Externalized through X$ tables Useful or Necessary information is Externalized Externalized publicly through V$ Tables

Machine Memory

0x80000000

SGA

SGA

Buffer Cache

Graphic SGA
SGA
0x80000000

Fixed Area
Buffer Cache

Shared Pool Log Buffer

Fixed Area
SGA
0x80000000

X$KSUSECST- user waits

0x85251EF4

X$KSUSECST
170 Records
2328 bytes
Row 1 Row 3 Row 2

0x85251EF4

X$KSUSECST Record
One Record in X$KSUSECST

1276

2328 bytes

X$KSUSECST Fields

1276 Seq #

1278
Event #

1280 p1

1284 p2

1288 p3

Externalization of C structs: X$ tables


If Structure foo was externalized in a X$ SQL> describe x$foo Column Name -----------------------------ADDR INDX ID B Type -------RAW(8) NUMBER NUMBER NUMBER

SGA is One Large C Struct

struct foo { int id; int A; int B; int C; }; struct foo foo[N];

Struct C code
#include <stdio.h> #include <fcntl.h> #define N 20 /* structure definition: */

struct foo { int id; int a; int b; int c; };


/* end structure definition */

Struct Record
main(){ struct foo foo[20]; int fptr; /* zero out memory of struct */ memset(foo,0,sizeof(foo));

foo[0].id=1; foo[0].a=12; foo[0].b=13; foo[0].c=13;

/* row 0 */

Struct Write to File


foo[1].id=2; foo[1].a=22; foo[1].b=23; foo[1].c=24; /* row 1 */

/* write to file, simulate SGA */ if ((fptr = open("foo.out",O_WRONLY | O_CREAT,0777)) < 0 ) return -1;

write(fptr,foo,sizeof(foo));
return 0; }

Simulate SGA with a File

write(fp,foo,sizeof(foo));

Simulate SGA with a File


Row 0 Row 1

ID 0 0 0 0 1 4 6 4 4

C 4 1 8 C 2 1 4

ID 6 8 1 2 4 0 1 1 6 0 2 2 0 4 Increasing 0 4

3 8 2 8 1 Memory address 0

bits bytes hex bytes oct bytes

Struct File Contents $ ./foo $ ls -l foo.out -rw-r--r-- joe dba 320 Feb 10 19:41 foo.out int = 32 bits Int = 4 bytes 20 entries * 4 int * 4 bytes/int = 320 bytes

od octal dump
$ od -l foo.out 1 0000000 2 0000020 0000040 0 * 0000500 12 22 0 13 23 0 13 24 0

Struct File Contents

Address is in Hex Column 2 is the ID Column 3 is field A Column 4 is field B Column 5 is field C

X$ tables ?

 Ok, x$foo =~ foo[20]  How do I get a list of x$ tables?  Where is each X$ located?  V$Fixed_Tables

V$Fixed_Table list of X$ tables


SQL> desc v$fixed_table; Name Null? Type ----------------------------------------- -------- ----------------NAME VARCHAR2(30) OBJECT_ID NUMBER TYPE VARCHAR2(5) TABLE_NUM NUMBER

Graphic: X$ Addresses
SGA
0x80000000

0x8???????? X$????

V$Fixed_Table
spool addr.sql select 'select 'addr, ||''''||name||''''||' from ' || name ||' where rownum < 2;' from v$fixed_table where name like 'X%' / spool off @addr.sql

Example: finding the address


select a.addr , 'X$KSUSE' from X$KSUSE where rownum < 2 ;

X$ layout
6802B244 X$KSLEMAP 6802B7EC X$KSLEI 6820B758 X$KSURU 6820B758 X$KSUSE - v$session 6820B758 X$KSUSECST v$session_wait 6820B758 X$KSUSESTA v$session_stat 6820B758 X$KSUSIO 6826FBD0 X$KSMDD 6831EA0C X$KSRCHDL

What's in these X$ views


 V$ views are documented  V$ views are based often on X$ tables  The map from v$ to X$ is described in : V$Fixed_View_Definition

V$Fixed_View_Definition
SQL> desc V$Fixed_View_Definition Name Type ----------------------------------- -------------VIEW_NAME VARCHAR2(30) VIEW_DEFINITION VARCHAR2(4000)

Definition of V$Session_Wait
SQL> select VIEW_DEFINITION from V$FIXED_VIEW_DEFINITION where view_name='GV$SESSION_WAIT'; VIEW_DEFINITION ----------------------------------------------------------------------select s.inst_id,s.indx,s.ksussseq,e.kslednam, e.ksledp1,s.ksussp1,s.ksussp1r,e. ksledp2, s.ksussp2,s.ksussp2r,e.ksledp3,s.ksussp3,s.ksussp3r, decode(s.ksusstim, 0,0,-1,-1,-2,-2, decode(round(s.ksusstim/10000),0,-1,round(s.ksusstim/10000))) , s.ksusewtm, decode(s.ksusstim, 0, 'WAITING', -2, 'WAITED UNKNOWN TIME', -1, ' WAITED SHORT TIME', 'WAITED KNOWN TIME') from x$ksusecst s, x$ksled e where bit and(s.ksspaflg,1)!=0 and bitand(s.ksuseflg,1)!=0 and s.ksussseq!=0 and s.ksussop c=e.indx

The Fields in X$ tables


 OK, I've picked an X$  I've got the starting address  Now, how do I get the fields?

X$KQFTA
 Kernel Query Fixed_view Table  INDX use to find column information  KQFTANAM X$ table names

X$KQFCO
     Kernel Query Fixed_view Column KQFCOTAB Join with X$KQFTA.INDX KQFCONAM Column name KQFCOOFF Offset from beginning of the row KQFCOSIZ Columns size in bytes

X$KSUSECST Fields

1276 Seq #

1278
Event #

1280 p1

1284 p2

1288 p3

Address

BYTES

SGA Contents in Resume


In resume: Oracle takes the C structure defining the SGA and maps it onto a shared memory segment 0x800000 0 Fixed SGA
Memory address Increasing

Buffer Redo Library Cache Buffer Cache Oracle provides access to some of the SGA contents via X$ tables

**** Procedure *****


1. 2. 3. 4. 5. 6. 7. 8. Choose a V$ view Find base X$ Tables for v$ view Map X$ fields to V$ fields Get address of X$ table in SGA Get the size of each record in X$ table Get the number of records in X$ table Get offsets for each desired field in X$ table Get the base address of SGA

1) V$SESSION_WAIT Example

     

List of all users waiting Detailed information on the waits Data is ephemeral Useful in Bottleneck diagnostics High sampling rate candidate Event 10046 captures this info Good table for SGA sampling

V$SESSION_WAIT Description
SQL> desc v$session_wait Name ----------------------------------------SID SEQ# EVENT P1TEXT P1 P1RAW P2TEXT P2 P2RAW P3TEXT P3 P3RAW WAIT_TIME SECONDS_IN_WAIT STATE ) Type -------------------------,NUMBER ,NUMBER ,VARCHAR2(64) ,VARCHAR2(64) ,NUMBER ,RAW(4) ,VARCHAR2(64) ,NUMBER ,RAW(4) ,VARCHAR2(64) ,NUMBER ,RAW(4) ,NUMBER ,NUMBER ,VARCHAR2(19)

V$SESSION_WAIT Short
SQL> desc v$session_wait Name Type ---------------------------------------SID NUMBER SEQ# NUMBER EVENT VARCHAR2(64) P1 NUMBER P2 NUMBER P3 NUMBER)

V$FIXED_VIEW_DEFINITION Gives mappings of V$ views to X$ tables SQL> select VIEW_DEFINITION from V$FIXED_VIEW_DEFINITION where view_name='V$SESSION_WAIT;

V$SESSION_WAIT View Definition


VIEW_DEFINITION --------------------------------------------------------------------select s.inst_id, s.indx, s.ksussseq, e.kslednam, e.ksledp1, s.ksussp1, s.ksussp1r, e.ksledp2, s.ksussp2, s.ksussp2r, e.ksledp3, s.ksussp3, s.ksussp3r, round(s.ksusstim / 10000), s.ksusewtm, decode(s.ksusstim, 0, 'WAITING', -2, 'WAITED UNKNOWN TIME', -1, 'WAITED SHORT TIME', 'WAITED KNOWN TIME') from x$ksusecst s, x$ksled e where bitand(s.ksspaflg,1)!=0 and bitand(s.ksuseflg,1)!=0 and s.ksussseq!=0 and s.ksussopc=e.indx

View Definition Short


VIEW_DEFINITION --------------------------------------------------------------------select s.indx, s.ksussseq, e.kslednam, s.ksussp1, s.ksussp2, s.ksussp3 from x$ksusecst s, x$ksled e where s.ksussopc=e.indx

2) V$SESSION_WAIT Based on X$KSUSECT


VIEW_DEFINITION ---------------------------------------------------select indx, ksussseq, ksussopc, ksussp1, ksussp2, ksussp3 from x$ksusecst

Equivalent SQL Statements


select indx, ksussseq, ksussopc, ksussp1, ksussp2, ksussp3 from x$ksusecst from v$session_wait ) select sid seq# event p1 p2 p3

Note: x$ksusecst. Ksussopc is the event # x$ksled.kslednam is a list of the event names where x$ksled.indx = x$ksusecst. ksussopc

3) V$ to X$ Field Mapping

4) Get base SGA address for X$ table

Find the location of X$KSUSECST in the SGA SQL> select addr from x$ksusecst where rownum < 2 ADDR -------85251EF4

5) Find the Size of Each Record


SQL> select ((to_dec(e.addr)-to_dec(s.addr))) row_size from (select addr from x$ksusecst where rownum < 2) s, (select max(addr) addr from x$ksusecst where rownum < 3) e ;

ROW_SIZE ---------------2328

6) Find the Number of Records in the structure


SQL> select count(*) from x$ksusecst ; COUNT(*) -------------170

Get Offsets for Each Desired Field in X$ table


SQL> select c.kqfconam field_name, c.kqfcooff offset, c.kqfcosiz sz from x$kqfco c, x$kqfta t where t.indx = c.kqfcotab and t.kqftanam='X$KSUSECST' order by offset ;

X$KQFTA - X$ Tables Names


List of X$ tables   INDX use to find column information KQFTANAM X$ table names

To get Column information join with X$KQFCO X$KQFTA.INDX = X$KQFCO.KQFCOTAB

X$KQFCO X$ Table Columns


List of all the columns in X$ Tables     KQFCOTAB KQFCONAM KQFCOOFF KQFCOSIZ Join with X$KQFTA.INDX Column name Offset from beginning of the row Columns size in bytes

Field Offsets
FIELD_NAME -----------------------------ADDR INDX KSUSEWTM INST_ID KSSPAFLG KSUSSSEQ KSUSSOPC KSUSSP1 KSUSSP1R KSUSSP2 KSUSSP2R KSUSSP3 KSUSSP3R KSUSSTIM KSUSENUM KSUSEFLG OFFSET ---------0 0 0 0 1 1276 1278 1280 1280 1284 1284 1288 1288 1292 1300 1308 SZ ---------4 4 4 4 1 2 2 4 4 4 4 4 4 4 2 4

What are all the fields at OFFSET 0?


These are all calculated values and not stored explicitly in the SGA. ADDR INDX INST_ID KSUSEWTM memory address record number, like rownum database instance ID calculated field

Unexposed Fields
What happens between OFFSET 1 and 1276? Unexposed Fields Sometimes exposed elsewhere, in our case

V$SESSION V$SESSTAT

Fields at Same Address


Why do some fields start at the same address? KSUSSP1 KSUSSP1R Are at the same address Equivalent of V$SESSION_WAIT.P1 V$SESSION_WAIT.P1RAW These are the same data, just exposed as Hex Decimal

7) Offsets of Fields

8) Get Base SGA Address


SQL> select addr from x$ksmmem where rownum < 2 ADDR -------------80000000

Results X$KSUSECST

Machine Memory

0x80000000

SGA

SGA

Fixed Area
SGA
0x80000000

X$KSUSECST- user waits

0x85251EF4

X$KSUSECST
170 Records
2328 bytes
Row 1 Row 3 Row 2

0x85251EF4

X$KSUSECST Record
One Record in X$KSUSECST

1276

2328 bytes

X$KSUSECST Fields

1276 Seq #

1278
Event #

1280 p1

1284 p2

1288 p3

Attaching to the SGA


 UNIX System Call shmat To attach to shared memory Unix as a system call void *shmat( int shmid, const void *shmaddr, int shmflg

);

ID and Address arguments to shmat


The arguments are:
  

shmid shared memory identifier specified shmaddr starting address of the shared memory shmflg - flags

The argument shmflg can be set to SHM_RDONLY . To avoid any possible data corruption the SGA should only be attached read only. The arguments shmid and shmaddr need to be set to Oracles SGA id and address.

Finding Oracle SGAs ID and Address


Use ORADEBUG to find the SGA id SQL> oradebug setmypid Statement processed. SQL> oradebug ipc Information written to trace file.

Finding Trace File


SQL> show parameters user_dump
NAME VALUE ----------------------- -------------------------------user_dump_dest /u02/app/oracle/admin/V901/udump SQL> exit $ cd /u02/app/oracle/admin/V901/udump $ ls -ltr | tail -1 -rw-r----- usupport dba Aug 24 18:01 v901_ora_23179.trc

Finding SHMID in Trace File


$ vi v901_ora_23179.trc Total size 004456c Minimum Subarea size 00000000 Area Subarea Shmid Stable Addr Actual Addr 0 0 34401 0080000000 0080000000

Attaching to the SGA


Shmid Shmaddr Shmflg 34401 0x80000000 SHM_RDONLY

The SGA attach call in C would be: Shmat(34401, 0x80000000, SHM_RDONLY); This call needs to be executed as a UNIX user who has read permission to the Oracle SGA

C Code Headers
#include <stdio.h> #include <sys/ipc.h> #include <sys/shm.h> #include <errno.h> #include "event.h" event.h is for translating the event #s into event names

Events.h
Spool events.h select 'char event[][100]={' from dual; select '"'||name||'",' from v$event_name; select ' "" };' from dual; spool off

Define Base Addresses and Sizes


/* SGA BASE ADDRESS */ #define SGA_BASE 0x80000000

/* START ADDR of KSUSECST(V$SESSION_WAIT) */ #define KSUSECST_ADDR 0x85251EF4 /* NUMBER of ROWS/RECORDS in KSUSECST */ #define SESSIONS 150 /* SIZE in BYTES of a ROW in KSUSECST */ #define RECORD_SZ 2328

Define Offsets to Fields


#define KSUSSSEQ 1276 #define KSUSSOPC 1278 #define KSUSSP1R 1280 #define KSUSSP2R 1284 #define KSUSSP3R 1288 /* sequence # */ /* event # */ /* p1 */ /* p2 */ /* p3 */

Set Up Variables
main(argc, argv) int argc; char **argv; { void *addr; int shmid; int shmaddr; void *current_addr; long p1r, p2r, p3r; unsigned int i, seq, tim, flg, evn;

Attach to SGA
/* ATTACH TO SGA */ shmid=atoi(argv[1]); shmaddr=SGA_BASE; if ( (void *)shmat( shmid, (void *)shmaddr, SHM_RDONLY) == (void *)-1 ) { printf("shmat: error attatching to SGA\n"); exit(); }

Set Up Sampling Loop


/* LOOP OVER ALL SESSIONS until CANCEL */ while (1) { /* set current address to beginning of Table */ current_addr=(void *)KSUSECST_ADDR; sleep(1); printf("^[[H ^[[J"); /* clear screen */ /* print page heading */ printf("%4s %8s %-20.20s %10s %10s %10s \n", "sid", "seq", "wait","p1","p2","p3");

Loop over all Sessions


for ( i=0; i < SESSIONS ; i++ ) { seq=*(unsigned short *)((int)current_addr+KSUSSSEQ); evn=*(short *) ((int)current_addr+KSUSSOPC); p1r=*(long *) ((int)current_addr+KSUSSP1R); p2r=*(long *) ((int)current_addr+KSUSSP2R); p3r=*(long *) ((int)current_addr+KSUSSP3R); if ( evn != 0 ) { printf("%4d %8u %-20.20s %10X %10X %10X \n", i, seq, event[evn] ,p1r, p2r,p3r ); } current_addr=(void *)((int)current_addr+RECORD_SZ); } } }

Output
$ sga_read_session_wait 34401 sid seq wait p1 0 40582 pmon timer 12C 1 40452 rdbms ipc message 12C 2 43248 rdbms ipc message 12C 3 24706 rdbms ipc message 12C 4 736 smon timer 12C 5 88 rdbms ipc message 2BF20 8 178 SQL*Net message from 6265710 p2 0 0 0 0 0 0 1 p3 0 0 0 0 0 0 0

Pitfalls
      Byte Swapping 32 bit vs 64 bit Multiple Shared Memory Segments Segmented Memory Addresses are "unsigned int" Misaligned Access

Little Endian vs Big Endian


 Is low byte values first or high byte values first ?  a byte is 8 bits

00000000-11111111 bits,0 31 dec, 0x0 - 0xFF hex

 Big Endian is "normal" , highest bit first  In ascii, the word "byte" is stored as

b = 62, y = 79, t = 74, e = 65 b y t e 62 79 74 65

 echo 'byte' | od -x

 Little Endian, ie byte swapped (Linux, OSF, Sequent, ? )


y b e t 79 62 65 74

Byte Swap Example


Short = 2 bytes ie 16 bits Goal, get the flag in the "second" byte #ifdef __linux uflg=*(short *)((int)sga_address)>>8; #else uflg=*(short *)((int)sga_address); #endif

Byte Swap
Big Endian: 00 00 00 00 00 00 00 01 Little Endian 00 00 00 01 00 00 00 00 Solution, push the value over 8 places, to the right, ie >>8

64 bit vs 32 bit
 SQL> desc x$ksmmem
Name ------------------------------------ADDR INDX INST_ID KSMMMVAL -> 32 bit Raw(8) -> 64 bit Type --------RAW(4) NUMBER NUMBER RAW(4)

Segmented Memory
x$ksuse can be dis-contiguous Work around: select 'int users[]={' from dual; select '0x'||addr||',' from x$ksuse; select '0x0};' from dual;

Misaligned Access
 Some platforms seg fault when addressing misaligned bytes, need to read in even bytes or units of 4 bytes depending on platform

x$ksusecst Record: What's Missing?


One Record in X$KSUSECST ???
1276

???

2328 bytes

Select Addr from X$? where Rownum< 2;


6802B244 X$KSLEMAP 6802B7EC X$KSLEI 6820B758 X$KSURU 6820B758 X$KSUSE v$session 6820B758 X$KSUSECST v$session_wait 6820B758 X$KSUSESTA v$sesstat 6820B758 X$KSUSIO 6826FBD0 X$KSMDD 6831EA0C X$KSRCHDL

x$ksuse Record Contains x$ksusecst


One Record in X$Ksusecst
v$session v$sesstat
236

v$session_wait v$session
1276

2328 bytes

x$ksusesta

x$ksusecst

x$ksuse

Getting v$sesstat addresses


select '#define '|| upper(translate(s.name,' :-()/*''','________'))||' '|| to_char(c.kqfcooff + STATISTIC# * 4 ) from x$kqfco c, x$kqfta t, v$statname s where t.indx = c.kqfcotab and ( t.kqftanam='X$KSUSESTA' ) and c.kqfconam='KSUSESTV' and kqfcooff > 0 order by c.kqfcooff /

User Drilldown Query: 4 joins


select w.sid sid, w.seq# seq, w.event event, w.p1raw p1, w.p2raw p2, w.p3raw p3, w.SECONDS_IN_WAIT ctime, s.sql_hash_value sqlhash, s.prev_hash_value psqlhash, st.value cpu from v$session s, v$sesstat st, v$statname sn, v$session_wait w where w.sid = s.sid and st.sid = s.sid and st.statistic# = sn.statistic# and sn.name = 'CPU used when call started' and w.event != 'SQL*Net message from client' order by w.sid;

Other Fun Stuff


The next example is output from an SGA program that follows the LRU of the Buffer Cache The program demonstrates the insertion point of LRU cold end of LRU hot end of the LRU Full Table Scan Insertion Point

LRU HOT

LRU COLD

You might also like