
Profiling Oracle:

How it Works

Cary Millsap
Hotsos Enterprises, Ltd.

Revised 31 January 2005

Conventional Oracle “tuning” methods are outrageously complex, and they don’t
work. For more than three decades, good software developers have known how
to optimize computer-executed tasks using a simple method that works every
time. This presentation describes how to use this method in an Oracle
environment, and it demonstrates the method’s use with an example.

1 How Long? Why?


After you strip everything away except for what you really want, performance
analysis is the study of how long something takes. In daily life, you analyze
performance every day. Getting places on time, choosing the right queue at the
toll plaza, minimizing errand times: each is a tiny exercise in performance
analysis. In life, the study of performance is the study of how long something
takes. To make something faster, you begin by considering exactly where the
time went for the individual task that’s most interesting to you.

In your Oracle life, though, you’ve been trained to view things differently.
Professionals have taught you that looking directly at how long an individual
task takes is either unproductive or impossible. You’re taught that the only way
to analyze Oracle performance is by studying a complex network of statistics that
don’t exactly answer the real question you’re trying to answer.

Copyright © 2004 by Hotsos Enterprises, Ltd. All rights reserved. 1


You can see a great example of what I’m talking about by looking at a standard
Oracle Statspack report.1 Nowhere on a Statspack report does there appear any
account of how long your particular program took. Instead, Statspack gives you a
completely different view. It gives you information about resources consumed on
your system while your program was running—including resources your
program consumed plus resources consumed by anything else running during the
same observation period. Statspack gives you information from the system’s
perspective, not from your perspective.

Asking Statspack how long something took is rather like asking how long a flight
from Dallas to Chicago should take, and then having someone give you an
answer like this:

“The average fuel consumption per flight for all flights conducted upon the
planet in the past year is x.”

There are only a couple of things you can do with information like that:

• You could try to extract the information you really wanted from the
information you got. For some, it’s a fun technical challenge. For example,
you might use the relationship average-flight-duration = total-fuel ÷
average-fuel-consumption-rate to try to compute some approximation of a
world-wide average flight duration. But even if you got this far, you would
need to realize that this “average flight duration” you’ve found includes
thousands of different kinds of flights. Such a number would include some
18½-hour flights from Los Angeles to Singapore, some 35-minute flights
from Honolulu to Maui, and everything in-between. What can an average
like that possibly tell you about the duration of a flight from Dallas to
Chicago? If you’re lucky, the duration of a Dallas-to-Chicago flight might
resemble the worldwide average. But, realize… You’ve got to be lucky for
the answer to do you any good at all.

• Or you could insist upon a direct answer to your question: How long does a
flight from Dallas to Chicago take? Such information is, of course,
attainable through other means.

Why doesn’t a Statspack report tell you how long your program takes? One good
reason is that such information cannot be derived from the kinds of operational
data that Statspack uses as its input (at least not from the stuff you’re taught to
give it as its input).2 Perhaps the biggest problem with Statspack is that when

1. Statspack is “a set of SQL, PL/SQL, and SQL*Plus scripts that allow the collection,
automation, storage, and viewing of performance data. This feature has been replaced by
the Automatic Workload Repository.” From Oracle® Database Performance Tuning
Guide 10g Release 1 (10.1) available at otn.oracle.com.
2. Chapter 8 of Optimizing Oracle Performance, by Cary Millsap and Jeff Holt, contains
more than 40 pages of detail describing why this is so.



you’re trying to diagnose a performance problem, sometimes it tells you. The big
problem, you see, is that sometimes it doesn’t. Statspack might give you the right
answer for the same reason that a worldwide flight-time average might actually
resemble the average flight time from Dallas to Chicago. But it might not.

Traditional Oracle performance analysis methods just don’t work for several
classes of commonly occurring performance problems.3 The problem is that
there’s no direct correspondence between the metric values you’re “supposed to”
study and the goodness or badness of performance in your Oracle system.

Example: Most analysts are taught that a database buffer cache
hit ratio value near 100% is an indicator of good system
health. However, such an assumption leads to false positive
conclusions, because several types of program inefficiencies
drive nice, high database buffer cache hit ratio values. One
such problem is application code that fails to use Oracle’s
array processing features. Imagine fetching 100 rows from a
single Oracle block with 100 fetch calls. This operation would
produce a 99% buffer cache hit ratio (a cache miss on the first
read, and then 99 hits on subsequent re-reads of the same
block). Imagine fetching the same 100 rows from the same
Oracle block with just one fetch call. This single-fetch version
executes more quickly than the 100-fetch version (much more
quickly if you conduct the two tests through Oracle Net on a
WAN), but its buffer cache hit ratio would fall to 0%.4
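The arithmetic behind this example is easy to verify. Here is a minimal sketch; the `hit_ratio` helper is purely illustrative, not an Oracle API, and simply computes logical reads minus physical reads, over logical reads:

```python
def hit_ratio(logical_reads, physical_reads):
    """Buffer cache hit ratio: the fraction of logical reads that
    did not require a physical read from disk."""
    return (logical_reads - physical_reads) / logical_reads

# 100 rows fetched one at a time from a single block: 100 logical
# reads, one physical read on the first fetch, 99 cache hits after.
print(hit_ratio(100, 1))  # 0.99

# The same 100 rows fetched with one array fetch: one logical read,
# one physical read. A faster program with a "worse" hit ratio.
print(hit_ratio(1, 1))    # 0.0
```

The faster of the two programs produces the worse-looking ratio, which is exactly why the metric generates false positives.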

The traditional performance analysis methods that the Oracle culture teaches its
newcomers are medieval in comparison to the tools and methods that
professional software developers use.5 Why not just look at what you’re trying to
see (what took so long)? The answer “because it’s impossible” is simply not
acceptable, because developers do it all the time. The way we do it is by using
software profilers.

3. One might argue that they’re so “commonly occurring” specifically because traditional
methods leave them undiagnosed.
4. For details and more examples, see “Why a 99%+ database buffer cache hit ratio is not
ok” and “Why you should focus on LIOs instead of PIOs,” available at www.hotsos.com.
5. That’s not to say that tools that Oracle performance analysts use don’t look nice enough.
Most traditional Oracle performance analysis tools are tremendously attractive. They just
don’t tell you where your code is spending its time.



2 The Role of Profilers
A profiler is a tool that reports a useful decomposition of a computer program’s
response time, so that a performance analyst can accurately diagnose the root
cause of a performance problem. Profilers are nothing new. The principle is at
least three decades old.6 Countless software developers even today use the GNU
profiler called gprof, which has been around since 1988.7 Exhibit 1 shows an
example of a flat profile. A flat profile shows the total amount of time that a
program has spent executing each of its named internal subroutines.

Exhibit 1. Sample output of gprof.8


$ gprof
Flat profile:

Each sample counts as 0.01 seconds.


  %     cumul.   self               self     total
 time   seconds  seconds    calls   us/call  us/call  name
68.59      2.14     2.14  62135400     0.03     0.03  step
31.09      3.11     0.97    499999     1.94     6.22  nseq
 0.32      3.12     0.01                              main

This particular program’s response time was 3.12 seconds (you can see this in
the bottom row of the cumul. seconds column). From the output, you can see
exactly where the time went: 68.59% of the program’s response time was
consumed by 62,135,400 executions of the subroutine called step.
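The aggregation behind a flat profile is simple to sketch. The function and sample data below are illustrative (the timings are invented, not real gprof output): collect per-subroutine durations, total them, and sort descending.

```python
from collections import defaultdict

def flat_profile(samples):
    """Aggregate (subroutine, seconds) samples into a flat profile:
    (name, seconds, percent) rows sorted by time descending."""
    totals = defaultdict(float)
    for name, seconds in samples:
        totals[name] += seconds
    response_time = sum(totals.values())
    return [(name, secs, 100.0 * secs / response_time)
            for name, secs in sorted(totals.items(),
                                     key=lambda kv: -kv[1])]

# Hypothetical timing data for three subroutines.
samples = [("step", 2.14), ("nseq", 0.97), ("main", 0.01)]
for name, secs, pct in flat_profile(samples):
    print(f"{pct:6.2f}  {secs:5.2f}  {name}")
```

Whatever instrument collects the raw timings, the report itself is just this sort-by-contribution over the task's total response time.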

GNU gprof output doesn’t show exactly where a program’s
time went, but it’s a statistically valid enough indication of it
for gprof to serve plenty well as a reliable performance
diagnostic tool. The GNU gprof manual page explains the
impact of the statistical errors and how to work around them.
Oracle timing data, which I’ll get to in a moment, produces
more accurate timing information that is not susceptible to the
same statistical errors that gprof endures.

Whenever more than one thing is happening on a computer system at a given
time, a profiling tool like gprof is utterly indispensable for determining why a

6. Knuth, D. E. 1971. “Empirical study of FORTRAN programs” in Software—Practice
and Experience, Apr/Jun 1971, Vol. 1, No. 2, pp. 105–133. Mainframers in the audience
might remember the famous mainframe profiling tool called STROBE, or the Candle
Omegamon feature called INSPECT.
7. Fenlason, J.; Stallman, R. 1988. “The GNU profiler,” available at
http://www.gnu.org/software/binutils/manual/gprof-2.9.1/html_mono/gprof.html.
8. Gough, B. J. 2004. An Introduction to GCC—for the GNU compilers gcc and g++.
Bristol UK: Network Theory Ltd.



particular program is running longer than you want it to.9 Especially under
conditions of heterogeneous load, the kinds of hit rates and utilization metrics
that you’ve been trained to consider are stunningly unreliable indications of what
you might do to make a program run faster.

But you, the Oracle performance analyst, can’t use gprof to debug and optimize
the code in the Oracle kernel (I’ll show you why in a minute). You have to rely
upon the operational data that the Oracle kernel developers have given you. In
the 1980s, you were literally condemned to the state of not being able to find out
where your response time is going. But in late 1992, Oracle Corporation made a
significant step toward solving the problem. If you’re using any version of
Oracle at least as new as 7.0.12, then you automatically possess an essential key
to being able to profile your Oracle application programs.

3 Instrumenting Your Code


To use gprof, you have to instruct the compiler to build the appropriate profiling
library functions into your executable—when you use the GNU C compiler
called gcc, you have to use the -pg option. Thus, you can’t use gprof upon the
Oracle kernel code, because you’re not permitted to compile it. However, you
can create profile data for an Oracle session by activating a special Oracle
mechanism that causes the Oracle kernel to store its own operational timing data.
Oracle Corporation calls this mechanism extended SQL trace.

An Oracle session’s extended SQL trace attribute is easy to manipulate. Before
you get started, make sure that the Oracle instance parameter called
TIMED_STATISTICS is set to TRUE. This is safe and easy to do. (Beginning
with version 7.3, a database administrator can do it online with an ALTER
SYSTEM command.) Beginning with Oracle version 9, Oracle Corporation
began shipping Oracle database software with TRUE as the parameter’s default
value.

Once you have your TIMED_STATISTICS setting in proper shape, you can
activate and deactivate tracing using standard Oracle packages, as shown in
Exhibits 2 and 3.

9. Software profiling is a momentous rediscovery of the twenty-first century. Newer
profiling tools like the DTrace tool that is built into Sun Microsystems’s Solaris 10
kernel provide spectacular diagnostic capabilities. See http://www.sun.com/2004-0518/feature/
for more information.



Exhibit 2. Activating and deactivating Oracle extended SQL trace in Oracle
version 10.
/* If you can modify your Oracle application’s source code, then do this. */
dbms_monitor.session_trace_enable(null, null, true, true)
/* Your code to be profiled goes here. */
dbms_monitor.session_trace_disable()

/* Otherwise, if your application tasks identify themselves by service/module/action
name with either dbms_application_info or with the new version 10 OCI attributes
(OCI_ATTR_*), then do this. */
dbms_monitor.serv_mod_act_trace_enable(:service, :module, :action, true, true)
/* Code to be profiled runs during this time. */
dbms_monitor.serv_mod_act_trace_disable(:service, :module, :action) /* …or just
disconnect */

/* Otherwise, do this */
dbms_monitor.session_trace_enable(:sid, :serial, true, true)
/* Code to be profiled runs during this time. */
dbms_monitor.session_trace_disable(:sid, :serial) /* …or just disconnect */

Exhibit 3. Activating and deactivating Oracle extended SQL trace in Oracle
versions 8 and 9.
/* If you can modify your Oracle application’s source code, then do this. */
dbms_support.start_trace(true,true)
/* Your code to be profiled goes here. */
dbms_support.stop_trace() /* …or just disconnect */

/* Otherwise, do this */
dbms_support.start_trace_in_session(:sid, :serial, true, true)
/* Code to be profiled runs during this time. */
dbms_support.stop_trace_in_session(:sid, :serial) /* …or just disconnect */

4 Profiling Oracle
The problem with profiling Oracle prior to 1992 was that Oracle customers
couldn’t do it. The Oracle kernel simply couldn’t emit the operational data
required to produce a profile. This is why traditional methods of looking at
system-wide resource consumption statistics were invented in the first place (like
the ones implemented in Statspack).

When Oracle Corporation fixed that problem in version 7.0.12, we graduated to
a new problem: raw extended SQL trace data is messy and difficult to interpret.
For almost a decade, most Oracle customers either didn’t know that extended
SQL trace data even existed, or they knew about it but were unaware of how
much diagnostic power lay in the data, because the files Oracle emits are so
complicated.

Since the year 2000, a few colleagues and I have devoted our careers to
determining the minimal necessary sequence of steps required to solve an



arbitrary Oracle performance problem. What we’ve learned is that Oracle
extended SQL trace files contain so much diagnostic value that we almost never
need to look at any other operational performance data.

This way of looking at a system from the user’s experience
inward is an upside-down way to look at a system compared to
the traditional approach of looking from the guts of the system
outward. But with experience it becomes clear that looking at a
system from the inside out was the upside-down way all along.

After you learn how to extricate meaning from Oracle extended SQL trace data,
there remains only one problem. To manage the information contained within the
very large extended SQL trace files that the Oracle kernel emits, you need
software assistance. This used to be a difficult technical problem. Now it’s
merely a question of economics, because today you can buy textbooks and
education that teach you how to extract what you need from your trace data, and
you can buy prepackaged software or services if you prefer to have someone else
do the work for you.10

4.1 Flat Profiles


The GNU gprof tool serves as a good motivational metaphor for Oracle
profiling, because it produces two kinds of profiles. The first kind of profile, the
flat profile (which you’ve seen already), is a decomposition of total response
time by subroutine call. One type of flat profile for an Oracle session
decomposes response time by named subroutines inside the Oracle kernel that
Oracle Corporation calls timed events. Creating a flat profile by timed event for
a selected Oracle session is straightforward. Each year I teach a few hundred
students how to do it. Exhibit 4 shows a flat profile for a real performance
problem that my company helped a customer solve a few years ago.

10. Further information about one means of obtaining such courses, books, software, and
services is rendered in the About Hotsos section at the end of this paper.



Exhibit 4. A flat profile for a problem program on an Oracle system.
Oracle subroutine Duration (secs) # Calls Dur/call
--------------------------------- ----------------- --------- -----------
SQL*Net message from client 984.010 49.6% 95,161 0.010340
SQL*Net more data from client 418.820 21.1% 3,345 0.125208
db file sequential read 279.340 14.1% 45,084 0.006196
CPU service, EXEC 136.880 6.9% 67,888 0.002016
CPU service, PARSE 74.490 3.8% 10,098 0.007377
CPU service, FETCH 37.320 1.9% 57,217 0.000652
unaccounted-for 27.720 1.4% 1 27.720000
latch free 23.690 1.2% 34,695 0.000683
log file sync 1.090 0.1% 506 0.002154
SQL*Net more data to client 0.830 0.0% 15,982 0.000052
log file switch completion 0.280 0.0% 3 0.093333
enqueue 0.250 0.0% 106 0.002358
SQL*Net message to client 0.240 0.0% 95,161 0.000003
buffer busy waits 0.220 0.0% 67 0.003284
db file scattered read 0.010 0.0% 2 0.005000
SQL*Net break/reset to client 0.000 0.0% 2 0.000000
--------------------------------- ----------------- --------- -----------
Total response time 1,985.190 100.0%

We were actually able to solve the customer’s performance problem by looking
at only the data shown in Exhibit 4. The highlighted number gave us everything
we needed to determine that the company had implemented an Oracle Net
configuration error. Diagnosis required roughly ten minutes. Testing the
solution required another half hour. Production implementation took place the
following day. The repair eliminated roughly half of the total response time
consumed by this program. However, prior to profiling the program, the
company had spent several months trying to solve the problem with traditional
performance methods—with no success whatsoever.11

Another type of flat profile that is valuable in an Oracle context is the profile by
database call. For example, it is interesting to know how much time was
consumed by parsing for a given SQL statement. How much time by executing?
And how much time by fetching? For such information to constitute a true
profile the total response time rendered in the table needs to be the total response
time contributed by the statement. To do this requires the addition of a row
denoting time spent between database calls. Sometimes the time spent between
calls dominates a statement’s total response time.
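The construction of such a profile can be sketched simply. The key move is charging whatever elapsed time the database calls themselves don't explain to a synthesized between-call row; the function name and figures below are illustrative, not a real profiler's API:

```python
def db_call_profile(calls, response_time):
    """Decompose a statement's total response time by database call
    type, with a synthesized row for time spent between calls."""
    totals = {}
    for call_type, elapsed in calls:
        totals[call_type] = totals.get(call_type, 0.0) + elapsed
    # Whatever the calls don't account for happened between calls.
    totals["between-call events"] = response_time - sum(totals.values())
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Hypothetical per-call elapsed times for one statement.
calls = [("PARSE", 3.801), ("EXEC", 0.991)]
for call_type, secs in db_call_profile(calls, response_time=11.333):
    print(f"{call_type:22s} {secs:7.3f}")
```

Only with that synthesized row does the table add up to the statement's true response time contribution, which is what makes it a profile rather than a mere statistics dump.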

Exhibit 5 shows an example of a database call profile. In this example, you can
see that the dominant response time consumer for the statement was the time
spent between database calls, followed distantly by the time spent executing and,
finally, parsing the statement. If you’re familiar with the behavior that causes
most between-call event time that is attributable to SQL statements, the greatest
benefit of the database call profile in Exhibit 5 is the information about how

11. The problem and its solution are described in their entirety in Optimizing Oracle
Performance, pages 327–332.



many calls there were to each type of database call (highlighted). There’s more
information about this in the case study later in this paper.

Exhibit 5. A flat profile that decomposes Oracle response time by database call.
-----Duration (seconds)-----
Database call Elapsed CPU Other Calls Rows LIOs PIOs
-------------------- ------------- ----- ------ ------ ----- ------ ----
Between-call events 23.850 93.2% 0.000 23.850 0 0 0 0
EXEC 0.880 3.4% 0.890 -0.010 348 348 3,859 351
PARSE 0.870 3.4% 0.820 0.050 696 0 0 0
-------------------- ------------- ----- ------ ------ ----- ------ ----
Total 25.600 100.0% 1.710 23.890 1,044 348 3,859 351
-------------------- ------------- ----- ------ ------ ----- ------ ----
Total per EXEC 0.074 0.3% 0.005 0.069 3 1 11 1
Total per row 0.074 0.3% 0.005 0.069 3 1 11 1

4.2 Call Graphs


The second kind of profile in the GNU gprof output is the call graph. A call
graph shows response time as it pertains to the relationships among subroutines
being called and subroutines doing the calling. A call graph in the GNU gprof
sense is not a picture; it’s a table that represents parent-child relationships using
indentation.

A call graph is interesting in the Oracle context because of the way that Oracle
database calls can nest (Oracle Corporation uses the term recursive SQL to refer
to nested database calls). For example, the following statement produces such
nested database calls:
select object_name from dba_objects where object_id=:id

Specifically, the Oracle parse call for this statement motivates three database
call “children”: a parse call, an exec call, and a fetch call upon a new SQL
statement that is never mentioned in the top-level application code. The SQL
statement that the Oracle kernel must parse, execute, and fetch before it can
finish parsing our application’s select statement is this one:
select text from view$ where rowid=:1

A call graph for the relationship between our application SQL and its recursive
SQL might look like this:
------Duration (seconds)------
Including
Statement Self children
-------------------------------------- --------------- -------------
select object_name from dba_objects... 0.200s 40.0% 0.500 100%
│ select text from view$ where ro... 0.300s 60.0% 0.300 60%
-------------------------------------- --------------- -------------
Total 0.500s 100.0%
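Computing the self and including-children columns of such a call graph is a simple recursion over the parent-child tree. A sketch using the hypothetical timings from the table above (the function and node structure are invented for illustration):

```python
def call_graph_times(nodes):
    """Given {statement: (self_seconds, [child statements])}, compute
    each statement's inclusive time: self plus all descendants."""
    def inclusive(stmt):
        self_t, children = nodes[stmt]
        return self_t + sum(inclusive(c) for c in children)
    return {stmt: (nodes[stmt][0], inclusive(stmt)) for stmt in nodes}

# An application SELECT whose parse call motivates one recursive
# query against view$ (the recursive child has no children of its own).
nodes = {
    "select object_name from dba_objects": (0.200,
        ["select text from view$"]),
    "select text from view$": (0.300, []),
}
times = call_graph_times(nodes)
# Parent: 0.200s self, 0.500s including its recursive child.
```

The indentation in the printed call graph is just a rendering of these parent-child edges.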

A true call graph for any task that had an n-second duration would show a total
response time of n seconds, even if it had to add a line item labeled “time spent



between statements.” Oracle sessions often consume a considerable amount of
response time between statements, as I’ll reinforce shortly in the case study.

The great benefit of having an Oracle call graph is that it enables you to affix the
blame for poor performance accurately upon the specific SQL statement that is
causing your pain. For example, if a recursive SQL statement that reloads your
dictionary cache (part of your shared pool) is a dominant contributor to your
response time, a call graph shows you which SQL statements the recursive
statement is a child of. Perhaps it’s possible to parse those particular statements
less often. With the call graph, you can target those specific statements that will
produce the greatest result for the least effort. Without the call graph, you might
mistakenly believe that you had to reduce parse calls system-wide, or that your
best recourse is to increase the size of your shared pool.

5 Case Study
When you use a profiler, your work follows a repeatable plan. This section
shows a case that illustrates the process. The root cause of the performance
problem in this case is like dozens we see each year in the field.

5.1 Baseline
The case begins with a batch job that takes too long to run. The database server
is a single-CPU Windows XP machine running Oracle version 9.2.0.4. There are
two concurrent processes on the system, each running the same kind of batch
job, but manipulating completely different tables in a single database. Each
client process inserts 5,000 rows into its own table. The application client
process runs on the same host as the Oracle kernel. The client connects via the
following alias defined in the system’s tnsnames.ora file:
v92 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = CVM-LAP02)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = v92.hotsos)
)
)

Exhibit 6 shows the baseline flat profile for the slow job. The job ran for
11.6 seconds, and the flat profile shows why.



Exhibit 6. The baseline flat profile for the problem batch job.12
Oracle subroutine Duration (secs) # Calls Dur/call
--------------------------------- ----------------- --------- -----------
instrumentation gap 8.182 70.5% 10,071 0.000812
SQL*Net message from client 2.371 20.4% 20,007 0.000119
CPU service, PARSE calls 1.843 15.9% 5,001 0.000368
CPU service, EXEC calls 0.411 3.5% 5,002 0.000082
SQL*Net message to client 0.066 0.6% 20,007 0.000003
db file sequential read 0.055 0.5% 2 0.027650
log file sync 0.017 0.1% 3 0.005714
latch free 0.006 0.1% 19 0.000315
instrumentation overlap -1.350 -11.6% 140 0.009495
--------------------------------- ----------------- --------- -----------
Total 11.601 100.0%

The profile is clear: 70.5% of the slow program’s response time is due to a
response time component called instrumentation gap. Whether you know what
that means or not, the profile says undeniably where you need to invest your
effort. But what does instrumentation gap mean? There’s no “official” Oracle
timed event named this (if you review the content of Oracle’s V$EVENT_NAME
fixed view, you’ll notice that there’s no Oracle timed event named
instrumentation gap).

The instrumentation gap timed event is an event that a profiler must synthesize
to recognize otherwise unexplained jumps in the timer values rendered within an
Oracle trace file. It’s all part of the effort to account for exactly the response
time that the user has experienced. In the Oracle case, when instrumentation gap
time dominates a profile, it usually indicates that a process has spent a lot of its
time in the operating system’s preempted state.13 The most effective way to
reduce the amount of time that a program spends in this state is to reduce
competition for the system’s CPU. (An operating system scheduler will not
preempt a process if there’s no other process waiting in the run queue.)
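Because no Oracle timed event carries this name, a profiler must synthesize it from the trace file's own clock. A minimal sketch of the idea, assuming each trace record supplies an end timestamp and an elapsed duration (this record format is simplified, not Oracle's actual trace syntax):

```python
def instrumentation_gaps(records):
    """Walk (end_timestamp, elapsed) trace records in time order and
    return each unexplained jump: wall-clock time that passed between
    one record's end and the next record's start, which no record's
    own duration accounts for."""
    gaps = []
    prev_end = None
    for end_timestamp, elapsed in records:
        start = end_timestamp - elapsed
        if prev_end is not None and start > prev_end:
            gaps.append(start - prev_end)
        prev_end = end_timestamp
    return gaps

# One call ends at t=1.0 having taken 1.0s; the next ends at t=3.0
# having taken only 0.5s. The 1.5s in between is unaccounted for.
print(instrumentation_gaps([(1.0, 1.0), (3.0, 0.5)]))  # [1.5]
```

Summing these gaps yields the synthesized instrumentation gap line item in the flat profile.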

The second timed event listed in the profile is SQL*Net message from client.
Most Oracle instruction teaches you that you’re supposed to ignore this
particular event, because it represents time that the Oracle kernel spends “idle,”
awaiting some command. However, this particular “idle” event accounts for
20.4% of our program’s response time. To ignore the time attributed to this
event would be an exceptionally bad policy, as you’ll see in a moment.

12. All the profiles for this case study were computed using a pre-beta version of a
software product called Hotsos Profiler P5. The profiles for this particular case may vary
slightly in later, more thoroughly tested versions of the software. However, the gist of the
case—the flow of the diagnosis—is exactly correct.
13. For details about why it works this way, see pages 170–175 of Optimizing Oracle
Performance.



The third row in the profile, labeled CPU service, PARSE calls, is your first clue
about what kind of CPU service it is that this program has been competing to
use. A program can be preempted and tally instrumentation gap time only if it is
trying to consume CPU service. This row tells us that this program has
consumed nearly two seconds of CPU time making Oracle parse calls: 5,001 of
them, in fact.

The fourth row in the profile (CPU service, EXEC calls) shows that the program
consumed 0.411 seconds of its total response time actually inserting the rows
into its table. Contemplation of this row puts things into perspective: of the
11.601 seconds it took to run this thing, the program spent less than half a
second (only 3.5% of the job’s run time) doing the work I had asked it to do. An
efficient program would spend less time doing anything other than inserting the
rows we need inserted.

Experienced profile users will recognize a common poor-performance pattern in
this profile. Notice that its number of parse calls nearly equals the number of
exec calls: This program parses too much.
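This pattern is mechanical enough to check for automatically. The rule of thumb below is a hypothetical heuristic of my own construction, not a feature of any profiler:

```python
def parses_too_much(parse_calls, exec_calls, threshold=0.8):
    """Flag a workload whose parse call count is a large fraction of
    its exec call count -- the signature of parsing inside a loop."""
    return exec_calls > 0 and parse_calls / exec_calls >= threshold

print(parses_too_much(5001, 5002))  # True: the problem job's counts
print(parses_too_much(1, 5000))     # False: parse once, execute many
```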

5.2 Drill-down
So, which SQL is it that’s being parsed too much? The answer to this question
lies in the job’s profile by SQL statement (the job’s call graph), shown in
Exhibit 7. From this profile, it’s easy to tell that the offending SQL statement is
an insert statement, which single-handedly accounts for 97.7% of the job’s
11.601-second total response time.

Exhibit 7. The SQL statement profile for the problem case.


Duration of Duration incl.
Cursor statement self (sec) children (sec)
--------------------------------------- ------- ------ ------- ------
insert into parse2 values (1, lp… 11.333 97.7% 11.336 97.7%
│ select file# from file$ where ts… 0.002 0.0% 0.002 0.0%
│ update tsq$ set blocks=:3,maxblo… 0.001 0.0% 0.001 0.0%
│ select u.name,o.name, t.update$,… 0.000 0.0% 0.000 0.0%
[[no text for statement]] 0.000 0.0% 0.094 0.8%
│ select order#,columns,types from… 0.039 0.3% 0.039 0.3%
│ │ select i.obj#,i.ts#,i.file#,i.bl… 0.000 0.0% 0.000 0.0%
│ │ select name,intcol#,segcol#,type… 0.000 0.0% 0.000 0.0%
│ │ select ts#,file#,block#,nvl(bobj… 0.000 0.0% 0.000 0.0%
│ │ select type#,blocks,extents,mine… 0.000 0.0% 0.000 0.0%
… (35 other statements have been elided)
--------------------------------------- ------- ------ ------- ------
Total 11.601 100.0%

The database call profile for this SQL statement tells the story of where the
statement spent all this time. This profile is shown in Exhibit 8.



Exhibit 8. The database call profile for the problem case.
Oracle hash value: 2581399381
Statement re-use: 5,000 similar but distinct texts
Response time contribution: 11.333 seconds (97.7% of task total 11.601 seconds)

-----Duration (seconds)-----
Database call Elapsed CPU Other Calls Rows LIOs PIOs
-------------------- ------------- ----- ------ ------ ----- ------ ----
Between-call events 6.541 57.7% 0.000 6.541 0 0 0 0
PARSE 3.801 33.5% 1.843 1.958 5,000 0 0 0
EXEC 0.991 8.7% 0.411 0.580 5,000 5,000 30,851 0
-------------------- ------------- ----- ------ ------ ----- ------ ----
Total 11.333 100.0% 2.253 9.079 10,000 5,000 30,851 0
-------------------- ------------- ----- ------ ------ ----- ------ ----
Total per EXEC 0.002 0.0% 0.000 0.002 2 1 6 0
Total per row 0.002 0.0% 0.000 0.002 2 1 6 0

insert into parse2 values (1, lpad('1',20))

Most of the job’s time was consumed between database calls. What’s the best
way to reduce the time consumed between database calls? The best way to
reduce time spent between database calls is to reduce the number of database
calls to begin with. Exhibit 8 shows that this job’s insert “statement” executed
10,000 database calls altogether, with 5,000 parse calls and 5,000 exec calls.
Parsing consumed far more time than executing.

So, why does the job execute 5,000 Oracle parse calls? The highlighted text in
the profile heading tells the answer: there are 5,000 similar but distinct versions
of this SQL text. A quick look either at the raw trace data or the application code
itself shows the following sequence of SQL statements that were processed
during the problem job:
insert into parse2 values (1, lpad('1',20))
insert into parse2 values (2, lpad('2',20))
insert into parse2 values (3, lpad('3',20))
...
insert into parse2 values (5000, lpad('5000',20))

Each distinct SQL statement is, in turn, parsed and then executed. That’s a big
waste of time and energy, because with Oracle, an application doesn’t need to do
so much work to get the job done of inserting 5,000 rows into a table. A look at
the application source code reveals the problem. The source code says,
essentially:14
for each row, varying $n over some range of values
$result = do("insert into t values ($n, …)")

14. Each year, we see this problem manifested in many languages including Java, C,
Visual Basic, and others. In this paper, I use a pseudocode that’s similar in structure to
Unix shell or Perl programming, using the convention that scalar variables are denoted
with a $ prefix. For example, the statement $x=1 assigns the value 1 to the variable
named $x, and f($x) executes the function called f upon the value of $x.



Most database interfaces offer a function similar to the do() function shown
here. A call to do($statement) behaves effectively like this:
$handle = parse($statement)
$result = exec($handle)

So the application is making database calls like this:


for each row, varying $n over some range of values
$handle = parse("insert into t values ($n, …)")
$result = exec($handle)

One rule of thumb that every Oracle application developer ought to know is this
one:

It’s almost never appropriate to execute Oracle parse calls inside a loop.

5.3 Proposed Solution


The original application code can be rewritten to produce exactly the same
functional result more efficiently. The new code is this:
$handle = parse("insert into t values (?, …)")
for each row, varying $n over some range of values
$result = exec($handle, $n, …)

Here are the changes:

1. The new code doesn’t use do() at all. Instead, it separates the parse and
exec calls into two distinct function calls, so that the parse may be executed
once and only once, outside the loop.
2. In the parse call, it uses a placeholder (in this case, the ? symbol) instead of
some actual value to be inserted. This allows for one parse call to create a
reusable cursor handle that can accommodate many different exec calls with
different values.
3. In the exec call, the argument list includes the actual values that are to be
bound into the original SQL statement in the position denoted by the
placeholder.

Writing the code this way should eliminate 4,999 parse calls and of course some
amount of time that is attributable to those eliminated parse calls.
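The rewritten pseudocode maps directly onto real database interfaces. Here is a
minimal runnable sketch of the parse-once pattern in Python’s DB-API style, with
sqlite3 standing in for an Oracle connection; the table name and padding are
illustrative, not the paper’s actual test setup:

```python
import sqlite3

# Parse once outside the loop, execute many times inside it.
# The '?' symbol plays the role of the paper's bind-variable placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("create table parse2 (n integer, pad text)")

cur = conn.cursor()
sql = "insert into parse2 values (?, ?)"   # one SQL text for all 5,000 rows
for n in range(1, 5001):
    cur.execute(sql, (n, str(n).rjust(20)))  # bind values; no new SQL text
conn.commit()

print(conn.execute("select count(*) from parse2").fetchone()[0])  # 5000
```

Because the SQL text is identical on every iteration, the driver can reuse one
prepared statement, which is exactly the behavior the rewritten pseudocode aims
for.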

5.4 Estimated Impact


For a little test case executed on my laptop PC, it’s good enough to come up
with an idea and then just run with it. How much time will we save if we
eliminate 4,999 parse calls? Easy: change the code, re-run it, re-profile it,
and see. But in the business world, sometimes it’s not so easy. Sometimes we
can’t try something in a production system without constructing a compelling
cost-benefit case for a manager who holds the authority to approve or reject
the implementation of an idea. A profile provides excellent information for
making that case. Exhibit 9 is the best place to begin. It is the timed event
profile for just the problem insert statement.

Exhibit 9. The timed event profile for the problem SQL statement.
Oracle subroutine Duration (secs) # Calls Dur/call
--------------------------------- ----------------- --------- -----------
instrumentation gap 7.956 70.2% 9,861 0.000813
SQL*Net message from client 2.370 20.9% 20,000 0.000118
CPU service, PARSE calls 1.843 16.3% 5,000 0.000369
CPU service, EXEC calls 0.411 3.6% 5,000 0.000082
SQL*Net message to client 0.066 0.6% 20,000 0.000003
log file sync 0.004 0.0% 2 0.002096
latch free 0.003 0.0% 18 0.000192
instrumentation overlap -1.320 -11.6% 139 0.009497
--------------------------------- ----------------- --------- -----------
Total 11.333 100.0%

It is reasonable to expect that the proposed elimination of parse calls will result
in the improvements depicted in Exhibit 10. The highlighted figures show the
estimated result of the call reduction caused by rewriting the code. I expect
the code change to impact more than just the CPU service, PARSE calls
component. In fact, there are four timed events for which I expect an elimination
of parse calls to create a response time improvement.

Exhibit 10. Estimated savings of the proposed source code modification to
eliminate Oracle parse calls.

                                 # Calls   Time before   Est. # calls   Est. time after
Response time component           before     (seconds)          after         (seconds)
instrumentation gap                9,861         7.956          4,862             3.923
SQL*Net message from client       20,000         2.370         15,001             1.778
CPU service, PARSE calls           5,000         1.843              1             0.000
CPU service, EXEC calls            5,000         0.411          5,000             0.411
SQL*Net message to client         20,000         0.066         15,001             0.050
log file sync                          2         0.004              2             0.004
latch free                            18         0.003             18             0.003
instrumentation overlap              139        –1.320            139            –1.320
Total                                           11.333                            4.848

I have derived each “estimated time after” figure using the following relation:

    time_after = time_before × (calls_after / calls_before)
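Expressed as code, the relation is a one-liner. The figures below come from
Exhibits 9 and 10; the function name is mine, not the paper’s:

```python
# Linear forecast: scale a measured duration by the estimated change in
# call count. Valid only under the assumptions discussed in the text
# (uniform call durations, time proportional to call count).
def time_after(time_before, calls_before, calls_after):
    return time_before * calls_after / calls_before

# SQL*Net message from client: 20,000 calls estimated to drop to 15,001.
print(round(time_after(2.370, 20000, 15001), 3))   # 1.778

# CPU service, PARSE calls: 5,000 calls estimated to drop to 1.
print(round(time_after(1.843, 5000, 1), 3))        # 0.0
```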

Note that forecasting response time this way works only if two assumptions hold
true. First, individual call durations must be uniformly distributed. If call
durations are not uniformly distributed, then you need skew data to allow you to
distinguish between best-case and worst-case predictions. It is therefore an
important feature of profiling software to inform you about the skew inherent in
the statistic that you’re analyzing. Second, a timed event’s response time must be
directly proportional to the number of calls your program makes to the timed
event. I’ll show you shortly why sometimes this assumption fails to match
reality.
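To see why the uniformity assumption matters, consider a contrived example
(the figures are mine, not the paper’s): 100 calls consume 10,000 ms in total,
and we eliminate 50 of them. The linear forecast predicts 5,000 ms of savings,
but skew makes the true outcome range from almost nothing to everything:

```python
# Contrived illustration of skew: 100 calls, 10,000 ms total, 50 eliminated.
uniform = [100] * 100                       # ms per call; every call equal
skewed  = [9100] + [100] * 9 + [0] * 90     # ms per call; a few calls dominate

for durations in (uniform, skewed):
    total = sum(durations)
    # Best case: the 50 eliminated calls were the costliest ones.
    best = total - sum(sorted(durations, reverse=True)[:50])
    # Worst case: the 50 eliminated calls were the cheapest ones.
    worst = total - sum(sorted(durations)[:50])
    print(best, worst)   # uniform: 5000 5000; skewed: 0 10000
```

With uniform durations the linear forecast is exact; with skewed durations it
can be arbitrarily wrong in either direction, which is why skew data matters.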

It is therefore reasonable to suppose that response time for the statement will
drop from 11.333 seconds (before) to 4.848 seconds (after). Plugging this
estimated new response time value back into the SQL statement profile for the
job (Exhibit 7) shows that we should expect the job that presently takes
11.601 seconds to consume only 5.116 seconds after the proposed change.
That’s not a bad day’s work: it’s a 56% reduction in response time. As I’ll reveal
shortly, a forecast computed like I’ve shown here is almost always a
conservative estimate, which is additional good news.
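The job-level figure follows by simple arithmetic: subtract the statement’s
projected saving from the job’s total response time. A quick check of the
numbers quoted in the text:

```python
# Verify the job-level forecast arithmetic quoted in the text.
stmt_before, stmt_after = 11.333, 4.848   # statement response time (s)
job_before = 11.601                        # whole-job response time (s)

saving = stmt_before - stmt_after
job_after = job_before - saving
print(round(job_after, 3))                 # 5.116
print(round(100 * saving / job_before))    # 56 (% reduction)
```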

5.5 Impact Analysis


Running the program again after pulling the parse call out of the loop produces
spectacular results, as shown in Exhibit 11. The response time is now only
1.530 seconds—a whopping 87% reduction in response time.

Exhibit 11. The profile for the job formerly known as the “problem job.”
Oracle subroutine Duration (secs) # Calls Dur/call
--------------------------------- ----------------- --------- -----------
instrumentation gap 0.974 63.7% 4,993 0.000195
SQL*Net message from client 0.393 25.7% 5,010 0.000078
CPU service, EXEC calls 0.381 24.9% 5,011 0.000076
CPU service, PARSE calls 0.090 5.9% 11 0.008194
log file sync 0.027 1.8% 1 0.027396
SQL*Net message to client 0.008 0.5% 5,010 0.000002
latch free 0.000 0.0% 9 0.000000
instrumentation overlap -0.343 -22.4% 38 0.009035
--------------------------------- ----------------- --------- -----------
Total 1.530 100.0%

I hope that two thoughts are swirling through your head at this point:

• Wow. This program runs even faster than I thought it was going to!

• Wow. You sure stink at forecasting!

The first thought is a pleasant one because it represents an expectation nicely
exceeded. However, the second thought is not so nice. Missing a forecast by this
much is treacherous, even when the error is “in your favor.”

Remember, my forecast estimated a 56% reduction in response time (from
11.601 seconds to 5.116 seconds), but what we actually achieved was an 87%
reduction (to 1.530 seconds). What if the manager responsible for making the
go/no-go decision on implementing this solution had had a 75% cut-off point in
mind? Then he would have rejected our proposal on the grounds that an
expected 56% improvement wasn’t good enough. You would never have had the
chance to show off your good work. If the forecast had been more accurate, your
manager would have approved the project.

So, what happened!? Why were the results so much better than the forecast?

5.6 Forecast Error Analysis


To become a better forecaster, it is critical to practice using feedback. I
presented a forecast for improvement in four separate lines of the statement’s
timed event profile. Let’s analyze each individual line item of the forecast in
turn. Exhibit 12 shows the analysis.

Exhibit 12. Analysis of my performance improvement forecasting error.
                                    Before                   After   Prediction
Response time component             actual   Prediction      actual        error
                                 (seconds)    (seconds)   (seconds)    (seconds)
instrumentation gap                  7.956        3.923       0.974        2.949
SQL*Net message from client          2.370        1.778       0.393        1.385
CPU service, PARSE calls             1.843        0.000       0.090       –0.090
SQL*Net message to client            0.066        0.050       0.008        0.042
Total                               11.333        4.848       1.530        3.318

My overall estimate was wrong by 3.318 seconds. Here’s why:


CPU service, PARSE calls
I actually did a pretty good job on this one. I missed by only -0.090 seconds.
SQL*Net message to client
I didn’t do too badly here either, at least on the timing prediction. However,
you might notice that I missed the change to the number of calls pretty
badly. I forecast a reduction from 20,000 calls to 15,001 calls. My timing
estimate was close because the total time consumed was small to begin with.
SQL*Net message from client
Here, my forecasting error is more significant. I missed by nearly a second
and a half. The reason is that I predicted a reduction in calls from 20,000 to
15,001, but the actual call count dropped to about 5,000. Had I accurately
predicted this call count drop, my timing prediction would have been
0.593 seconds for this response time component, which would have been off
by only 0.2 seconds. The reason that the call count fell more than I expected
is apparently a driver-specific issue that I haven’t researched fully. It’s an
area in which I’ve satisfied myself that, for now, a pleasant surprise is good
enough.15
instrumentation gap
Here’s where I really messed up, missing by nearly three full seconds. The
reason for this part of the forecasting error is, happily, predictable using
concepts described in another of my works.16 In a nutshell, the problem with
the forecast is that I predicted a linear improvement in response time
(remember the formula I showed for computing time_after?). However, the
delays induced by queueing for a CPU don’t work in a linear way.
Exhibit 13 shows how response time (R) for a single-CPU system degrades
(goes up) non-linearly as utilization on a system (ρ) increases (moves to the
right).
Moving left-to-right on such a curve is a miserable experience: response
time degrades exponentially as load increases, especially when you get
rightward of the little dashed line—called the knee in the curve—shown
here at ρ = 0.5. However, eliminating parse calls—which we’ve done with
the application code change—represents a right-to-left movement along this
curve, which is a very happy thing: response time improves exponentially. I
predicted a linear improvement in response time, but the actual
improvement was exponential, so my forecast was pessimistic.

Exhibit 13. Response time (R) as a function of utilization (ρ).
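Exhibit 13’s figure is not reproduced here, but its shape is easy to sketch
numerically. The sketch below assumes the classic single-server M/M/1 relation
R = S/(1 − ρ), consistent with the queueing-theory chapter cited in the
footnote; the service time S is a made-up figure for illustration only.

```python
# Sketch of Exhibit 13: response time R versus utilization rho for a
# single-CPU system, assuming the M/M/1 relation R = S / (1 - rho).
# S is a hypothetical service time chosen purely for illustration.
S = 0.001  # seconds per request (assumed)

for rho in (0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
    R = S / (1 - rho)
    print(f"rho = {rho:4.2f}   R = {R * 1000:6.2f} ms")

# Note the non-linearity: cutting rho from 0.9 to 0.45 cuts R by far more
# than half, which is why a linear forecast of the improvement is pessimistic.
```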

15 In my test, I used Perl with the DBI module version 1.37 and the DBD::Oracle
module version 1.12.
16 See chapter 9, Queueing Theory for the Oracle Practitioner, in Optimizing
Oracle Performance, by Cary Millsap and Jeff Holt.

5.7 Next Steps
A good thing about software profilers is that they make it much easier to
determine when you’re finished “tuning.” In this case study, I’ve eliminated 87%
of the program’s original response time. The job now spends 24.9% of its time
actually inserting (CPU service, EXEC calls) and 25.7% of its time in the client
application and communicating between the database and client processes
(SQL*Net message from client). But it still spends 63.7% of its time doing other
things (presumably managing process preemptions). Is this good enough?

That is a question for your business to decide. The answer comes from
answering the following questions:

How much better could we make it?

How much would it cost to make it that much better?

One strategy for eliminating even more database calls plus undoubtedly some of
the SQL*Net message from client duration is to use Oracle’s array insertion
feature. So the question becomes:

Is the performance you might gain by using Oracle’s array insertion feature
within the application worth the cost of learning how to use it?

I leave that decision to you as an exercise.
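For a feel of what array insertion looks like in application code, here is a
minimal sketch using Python’s DB-API executemany, with sqlite3 standing in for
an Oracle connection (Perl DBI, which my test used, offers an analogous
execute_array call). The table and data are illustrative:

```python
import sqlite3

# Array insertion: the application submits all 5,000 rows through a single
# statement handle, instead of issuing 5,000 separate exec calls.
conn = sqlite3.connect(":memory:")
conn.execute("create table parse2 (n integer, pad text)")

rows = [(n, str(n).rjust(20)) for n in range(1, 5001)]
conn.executemany("insert into parse2 values (?, ?)", rows)
conn.commit()

print(conn.execute("select count(*) from parse2").fetchone()[0])  # 5000
```

With Oracle’s interfaces, batching the bind values this way also reduces
SQL*Net round trips, which is the duration component the text proposes to
attack next.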

6 Conclusion
A profiler should be the backbone of any performance measurement strategy. A
good Oracle profiler gives you the first information you need in any performance
problem diagnosis project: exactly where all of your program’s time has gone.

A profiler gives you two capabilities that you can’t find anywhere else:
Measuring
The core benefit of using a profiler is that you can see exactly what impact a
given problem creates for exactly the program you care about. This
capability is indispensable in problem diagnosis situations, and even in
preemptive performance management tasks.
Forecasting
A profiler aids forecasting in two distinct ways. First, it allows you to play
what if? for each component of your program’s response time. It shows you
the details of what impact a proposed change will have upon your program,
with a minimum of mathematical effort. Second, it allows you to see the
exact result of a change after you make it. When you use a profiler in a test
environment, you get a much more detailed idea of how the change you are
testing will behave when you implement your change in production.

One of the most important practical features of a profiler is that it shows you
where investment of time and energy is futile. Imagine banking your hopes that a
USD 250,000 I/O subsystem upgrade will cut your response times in half. Now,
before the upgrade, imagine your profiler output showing that every one of your
Top Twenty Slow Programs has less than 5% of its total time consumed by I/O
processing. Introduction to profiling commonly changes planning in a profound
way.

By selecting the particular case study that I’ve shown in this paper, I’ve meant to
illustrate several lessons, including the following.

• Some common performance problems, like the one described here, are much
easier to diagnose with a good profiler than with any other tool. Other
metrics (low system-wide parse-to-exec ratios, for example) don’t provide
the measurement and forecasting capabilities that a profiler provides.

• Sometimes, it’s difficult to create an accurate response time forecast; I
chose this case study specifically to illustrate this point. However, a
profiler-based forecast always provides excellent value. When a profile-based
forecast is wrong, it will almost always be wrong in the pessimistic
direction.

• Eliminating work is a highly effective optimization strategy, for two reasons.
First, eliminating work of course reduces response time for every instruction
you can eliminate from your program’s code path. But what many people
fail to consider (I think because their profilers don’t illuminate the issue) is
that, second, eliminating work reduces response time both for your program
and for every other program on the system that was queueing behind the
repaired job for some resource. The impact is particularly prominent when
you’re running multiple concurrent copies of the same program.

Acknowledgments
Thank you to the people who have made Oracle profiling possible: Anjo Kolk,
the father of response-based Oracle performance analysis; Mogens Nørgaard, the
man who introduced me to the Oracle features that make profiling possible;
Virag Saksena, who showed me the true potential of extended SQL trace data;
Jeff Holt, the father of our Hotsos Profiler product line; Jon Bentley, the author
of More Programming Pearls, which planted the profiling seed in my mind;
Jonathan Gennick, my O’Reilly editor who helped me learn; Gary Goodman and
the staff at Hotsos Enterprises, who make it possible for me to feed my family
while having fun at work. Thank you to my contributors and proofreaders: Larry
Klein, Karen Morton, and James Steel. And thank you to my beautiful and
loving family—Mindy, Alex, and Nik, and my parents Van and Shirle—who
support me every day.

About the Author
Cary Millsap is a software developer and an instructor for Hotsos Enterprises, Ltd.,
a company that he co-founded in 1999. He is the author of Optimizing Oracle
Performance, with Jeff Holt, a work for which the authors received the Oracle
Magazine Authors of the Year award for 2004.

About Hotsos
Hotsos Enterprises, Ltd. is a Dallas-based company with more than
600 customers and clients worldwide that use the company’s software products,
classroom and on-site education, and consulting services to improve the
performance of their Oracle systems. Hotsos’s revolutionary methods and tools
help database administrators and application developers solve performance
problems faster and more permanently than ever before. For more information
on Hotsos products and solutions, visit www.hotsos.com or call
+1.817.488.6200.

You can find more information about Hotsos products and services related to
this paper by visiting the following resources:
http://www.oreilly.com/catalog/optoraclep/index.html
This is the home page for the book Optimizing Oracle Performance. It
includes a free download of the first chapter.
http://www.hotsos.com/courses/PD101.php
This page describes the Hotsos educational course that covers collection and
interpretation of Oracle performance diagnostic data.
http://www.hotsos.com/products/profiler.html
This page describes the Hotsos Profiler, a software profiler for Oracle
extended SQL trace data. The Hotsos Profiler P5 (scheduled for release in
early 2005) is the only profiler for Oracle that possesses all of the profiling
features discussed in this paper.
http://www.hotsos.com/services/index.html
This page describes Hotsos performance on-site and remote services for
helping you optimize your Oracle systems.
