Dsi 306

Data Server Internals DSI306
VLDB
Student Guide
Production 1.0
September 1999
Copyright Oracle Corporation, 1992, 1996, 1997, 1998. All rights reserved.
This documentation contains proprietary information of Oracle Corporation. It is
provided under a license agreement containing restrictions on use and disclosure and
is also protected by copyright law. Reverse engineering of the software is prohibited.
If this documentation is delivered to a U.S. Government Agency of the Department of
Defense, then it is delivered with Restricted Rights and the following legend is
applicable:
Restricted Rights Legend
Use, duplication or disclosure by the Government is subject to restrictions for
commercial computer software and shall be deemed to be Restricted Rights software
under Federal law, as set forth in subparagraph (c)(1)(ii) of DFARS 252.227-7013,
Rights in Technical Data and Computer Software (October 1988).
This material or any portion of it may not be copied in any form or by any means
without the express prior written permission of the Worldwide Education Services
group of Oracle Corporation. Any other copying is a violation of copyright law and
may result in civil and/or criminal penalties.
If this documentation is delivered to a U.S. Government Agency not within the
Department of Defense, then it is delivered with Restricted Rights, as defined in
FAR 52.227-14, Rights in Data-General, including Alternate III (June 1987).
The information in this document is subject to change without notice. If you find any
problems in the documentation, please report them in writing to Worldwide Education
Services, Oracle Corporation, 500 Oracle Parkway, Box 659806, Redwood Shores,
CA 94065. Oracle Corporation does not warrant that this document is error-free.
Oracle, Oracle Alert, Oracle Financials, SQL*Net, SQL*Plus, Oracle Application
Object Library, Oracle7, Oracle8, Oracle Applications, Oracle Forms, Oracle Human
Resources, Oracle InterOffice, Oracle Installer, Oracle Manufacturing, Oracle Office,
Oracle Payables, Oracle Projects, Oracle Purchasing, Oracle Service, Oracle Web
Application Server, Oracle Web Customers, Oracle Web Employees, Oracle
WebServer, Oracle Web Suppliers, Oracle Workflow, and PL/SQL are trademarks or
registered trademarks of Oracle Corporation.
All other products or company names are used for identification purposes only, and
may be trademarks of their respective owners.
Authors
Design:
Alok Satyawadi
Sandra Cheetham
Andy Shrives
Steve Tran
Development:
Alok Satyawadi
Kishore Bahmidipati
Sandra Cheetham
Andy Shrives
Prakash Penta
Daniel Semler
Steve Tran
Implementation:
Troy Anthony
Stuart Mcleod
Technical Contributors
and Reviewers
Roderick Manalac
Robert Farrington
Troy Anthony
Publisher
Introduction
Acknowledgement
Thanks to all those who contributed directly or indirectly to this project, including
Alok Satyawadi, Kishore Bahmidipati, Sandra Cheetham, Andy Shrives, Troy
Anthony, Prakash Penta, Daniel Semler, Stuart Mcleod, Robert Farrington, Roderick
Manalac, and Steve Tran.
Chapter Introduction
Chapter 1: Oracle*XA and X/OPEN DTP Standard
Chapter 2: Multi-Threaded Server (MTS)
Chapter 3: Manual Partitioning
Chapter 4: Partitioned Tables and Indexes
Chapter 5: Parallel DML
Chapter 6: RAID (Redundant Array of Inexpensive Disks)
Chapter 7: Oracle 8 Advanced Queuing
Chapter 8: Tuning Data Load
Chapter 9: Miscellaneous Enhancements
Labs and Case Studies
Lab 2: Multi-Threaded Server
Lab 3: Manual Partitioning, Star Queries, Bitmap Indexes
Lab 4: Partitioned Tables and Indexes
Lab 5: Parallel DML
Lab 7: Advanced Queuing - Demonstration
Lab 8: Data Loading
**Oracle Confidential: For internal use only**
DSI 306 - Unit 1: Oracle*XA and X/Open DTP Standard
Version 1.2
Acknowledgements:
Design: Alok Satyawadi, WSSG, Advanced Analysis
Development: Kishore Bahmidipati, COE
Review:
Edit: 07/24/98
Schedule:
Timing      Topic
90 minutes  Lecture
90 minutes  Total
Overview
TP Monitors
Traditional Client-Server Architecture
TPM Client-Server Architecture
Components of the X/OPEN DTP Model
XA Interface and Oracle*XA
XA in OAS
Oracle Sessions with XA
Oracle8 XA enhancements

TP Monitors
A TP monitor is middleware that coordinates the flow of transactions between client applications and resource managers. It suits short-duration, high-performance transactions rather than long-running conversational interactions.
Examples of TP monitors include TUXEDO/Q from BEA Systems, TOP END from NCR, CICS from IBM, and ENCINA from Transarc.

TP Monitors (contd.)
Support thousands of clients by sharing services.
Move the details of the system's physical deployment out of the client and server applications and into the TP monitor layer.
Support simple clients such as ATMs as well as smart clients such as PCs.

Traditional Client-Server Architecture
[Figure: client hardware running the client, which holds both the presentation services and the application logic, connected to server hardware running the RDBMS engine.]
A typical client-server application. Notice that the application logic and the presentation logic are bundled together in the client application. The server is typically something like an Oracle8 Server.
Client-server applications typically bundle the presentation logic and the application logic with the client or the server, or divide them between the two. In a 3-tier design, the client handles the presentation logic alone, and the application logic (the transactional semantics of a database application) moves into the application server.

TP Monitor Client-Server Architecture
[Figure: three tiers. Tier 1 is the client (presentation services), tier 2 is the TP monitor with the application server, and tier 3 is the RDBMS engine.]
A typical TP monitor client-server (3-tier) design.
In this design, the client is concerned only with the presentation logic. The application server passes client requests to the database, which is the third tier. The application server communicates with the database through the TP monitor API, which contains the transaction management code.

The X/Open Distributed Transaction Processing (DTP) Model
An architecture that specifies rules for the interoperation of multiple application programs sharing possibly heterogeneous resource managers.
The X/Open Distributed Transaction Processing (DTP) architecture was proposed by the X/Open Company.
It defines a standard interface that allows multiple application programs to share resources provided by multiple (possibly different) resource managers, and allows their work to be coordinated into global transactions.

What is Oracle*XA?
An Oracle product that conforms to the XA interface of the X/Open Distributed Transaction Processing (DTP) software architecture.

XA Interface
[Figure: the Application Program (AP) defines transactions, the Transaction Manager (TM) manages and controls transactions, and the Resource Manager (RM) controls a shared resource. The components are linked by the TX interface, the XA interface, and SQL.]
The figure shows the interaction between the AP, TM, and RM, the three main components of a typical TP monitor application.
The application program is the application server: a program written in Pro*C or OCI with TP monitor calls embedded in it. Note that the application itself never has XA calls embedded in it; the TP monitor calls invoke the XA library routines.
The resource manager in our case is the Oracle Server.
The transaction manager is a component of the TP monitor.

Components of the X/Open DTP Model
Application Program (AP) - Defines transaction boundaries and specifies the actions that constitute a transaction.
Resource Manager (RM) - Provides access to shared resources such as databases.
Transaction Manager (TM) - Assigns identifiers (XIDs) to transactions, and monitors and coordinates their progress.
[Figure: Application Program (AP), Transaction Manager (TM), and Resource Manager (RM), linked by the TX interface, the XA interface, and SQL*Net / Net8 (pipe, TCP, etc.). The RM side shows Oracle shadow processes and instances (SGA, PMON, LGWR).]

How do the AP, TM and RM interact?
The AP tells the TM that it is starting a new transaction by executing tx_begin().
The TM receives the tx_begin() call and generates a new XID.
The TM then tells the RM to start a new transaction by passing this XID to the RM in an xa_start() call.
The XID is a transaction identifier.
The TM is a component of the TP monitor.

How do the AP, TM and RM interact? (contd.)
xa_start() executes, and the corresponding Oracle shadow process is equipped to handle work on behalf of this transaction. xa_start() then returns success.
The TM receives the confirmation from the RM and passes it to the AP.
After the AP receives confirmation from the TM, it begins communicating with the RM via SQL.
The end of a transaction is signalled by either tx_commit() or tx_rollback(). At this point, the TM ends the transaction associated with the given XID.
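The call sequence described on these two slides can be sketched as a small simulation. This is only an illustration under stated assumptions: the `sim_` functions and the `SimTM`/`SimRM` structures are hypothetical stand-ins, not the real TX or XA entry points, and the XID is reduced to a plain counter.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the DTP components. None of these are the real
 * TX or XA entry points; the sim_ prefix marks them as hypothetical. */
typedef struct {
    long active_xid;      /* XID the shadow process works for, 0 = none */
    int  statements_run;  /* SQL statements executed inside the branch */
} SimRM;

typedef struct {
    long   next_xid;      /* source of new transaction identifiers */
    long   current_xid;   /* XID of the open global transaction, 0 = none */
    SimRM *rm;            /* the single resource manager in this sketch */
} SimTM;

/* RM side of xa_start(): associate the shadow process with the XID. */
static int sim_xa_start(SimRM *rm, long xid)
{
    if (rm->active_xid != 0)
        return -1;                 /* already working for another XID */
    rm->active_xid = xid;
    return 0;                      /* success is reported back to the TM */
}

/* TM side of tx_begin(): generate an XID, then start the branch on the RM. */
static int sim_tx_begin(SimTM *tm)
{
    assert(tm->rm != NULL);
    if (tm->current_xid != 0)
        return -1;                 /* a global transaction is already open */
    long xid = tm->next_xid++;
    if (sim_xa_start(tm->rm, xid) != 0)
        return -1;
    tm->current_xid = xid;
    return 0;                      /* confirmation is passed on to the AP */
}

/* The AP talks to the RM directly through SQL once the TM has confirmed. */
static int sim_sql(SimRM *rm)
{
    if (rm->active_xid == 0)
        return -1;                 /* no transaction branch to work in */
    rm->statements_run++;
    return 0;
}

/* TM side of tx_commit(): end the transaction tied to the current XID. */
static int sim_tx_commit(SimTM *tm)
{
    if (tm->current_xid == 0)
        return -1;
    tm->rm->active_xid = 0;
    tm->current_xid = 0;
    return 0;
}
```

An AP in this sketch would call `sim_tx_begin()`, then `sim_sql()` one or more times, then `sim_tx_commit()`; calling `sim_sql()` before `sim_tx_begin()` fails because no branch has been started on the RM.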

Effect of XA on Oracle
[Figure: the application server contains the TM library, the Oracle*XA library, Pro* (SQLLIB), OCI, and UPI. It communicates over SQL*Net or Net8 with the Oracle shadow process, which contains OPI, session switching, 2PC, and the RDBMS kernel.]
There is a caveat to linking XA applications: three sets of libraries must be linked in.
First, the Oracle-specific libraries (OCI or Pro*).
Then the TP monitor-specific libraries.
Last but not least, the XA libraries.
When coding an application server, the developer uses the TP monitor API, OCI calls, and Pro*C calls, but not the XA library calls. The XA library calls are invoked by the TM library or the Pro* library.
Also note that the entire application server is a single operating system process that communicates with the Oracle Server.

Oracle Sessions (non-MTS)
[Figure: a session attached to a shadow process.]
Normal two-task operation, without MTS.

Oracle*XA Sessions (7.3 and lower)
[Figure: an application server offering service1, service2, and service3, with three sessions: the login and completion sessions (xa_open) and the transaction session (xa_start).]
The XA specification requires that the resource manager be able to move a transaction from one application server process to another, and even to commit it in a separate process. In Oracle, transactions are tied to sessions, so sessions must be movable as well. Therefore, the session/transaction cannot hold any state that is tied to a particular application server process.
The AP logs into a total of three sessions.
A transaction can invoke multiple services, and each service can be provided by a different application server. Session migration here means that only the transaction session migrates from one application server process to another. There are three sessions per application server process, but not all of them are started as soon as the application server connects to the Oracle Server. Only two sessions, the login and completion sessions, are started when the application server logs into the database. The third session (the transaction session) is started only when a transaction is started via the xa_start call. This session holds information such as the GTID, which links the session to the transaction.

Oracle-Managed Transactions
CLIENT:
    tx_call(debit_credit);
APPLICATION SERVER:
    debit_credit(request_block)
    TPSVCINFO *request_block;
    {
        /* unpack request_block->data into amt, from_acct, to_acct */
        EXEC SQL UPDATE; /* credit */
        EXEC SQL UPDATE; /* debit */
        EXEC SQL COMMIT;
        tx_return();
    }
This is an example of an Oracle-managed transaction.
Notice that there is no explicit tx_begin() or tx_commit()/tx_rollback() in the application code. The first SQL statement (EXEC SQL UPDATE here) automatically marks the beginning of the transaction, and the EXEC SQL COMMIT marks its end.

TPM-Managed Transactions
CLIENT:
    tx_begin();
    tx_call(debit);
    tx_call(credit);
    tx_commit();
APPLICATION SERVER 1:
    debit(request_block)
    {
        /* unpack request_block->data into amt, from_acct */
        EXEC SQL UPDATE; /* debit */
        tx_return();
    }
APPLICATION SERVER 2:
    credit(request_block)
    {
        /* unpack request_block->data into amt, to_acct */
        EXEC SQL UPDATE; /* credit */
        tx_return();
    }
This code demonstrates one kind of application structure, in which the client initiates the transaction by executing the tx_begin() call. Application server 1 executes the debit function (interacting with the database), application server 2 then executes the credit function, and finally the client commits the transaction by executing the tx_commit() call.

Transaction started by an application server
CLIENT:
    call_func(debit_credit, request_block, ...);
SERVER:
    debit_credit(request_block)
    TPSVCINFO *request_block;
    {
        /* unpack request_block->data into amt, from_acct, to_acct */
        tx_begin();
        EXEC SQL UPDATE SET balance = balance - amt WHERE acct = from_acct;
        EXEC SQL UPDATE SET balance = balance + amt WHERE acct = to_acct;
        tx_commit();
        tx_return();
    }
In this example, the application server begins the transaction (note the tx_begin() call), performs the SQL operations, and then commits the transaction with the tx_commit() call.

XA in OAS
The Oracle Application Server uses XA for its transaction management across multiple resource managers.
[Figure: the X/Open DTP model (application, transaction manager, and resource manager, linked by TX, XA, and SQL) shown beside the WAS 3.0 transaction model (cartridge, WRB, and RDBMS).]
In this scenario, the WRB (Web Request Broker) is the transaction manager; in other words, it interacts with the resource manager (Oracle) via the XA interface. The slide shows how the transaction services are implemented. The X/Open XA client library is embedded within the Web Request Broker and is accessed through the standard TX API from any cartridge. Unlike in standard OCI or Pro*C programs, the database logon and the commit and rollback statements are issued through TX; everything else remains the same.

Oracle8 XA Enhancements
Strong integration with OPS
Functionality changes
Global + local transactions
Loosely coupled transaction branches
Dynamic XA
OPS: Various TP monitors require session-based locking rather than the conventional process-based locking, so that transactional work can be moved across different application server processes. With Version 7, the Oracle XA library could be used together with the Oracle Parallel Server option only on platforms whose distributed lock manager supported transaction-based rather than process-based locking. This limitation no longer applies: the XA library can be used with the OPS option on all platforms, so if you can run the Oracle Parallel Server option, you can run the Oracle XA library. You can recover failed transactions from any instance of Oracle Parallel Server, and you can heuristically commit in-doubt transactions from any instance. An XA recover call returns a list of all prepared transactions for all instances.
Transactions: It is now possible to have both global and local transactions within the same XA connection. Local transactions are coordinated entirely by the Oracle Server. Global transactions, on the other hand, are coordinated by an external transaction manager such as a transaction processing monitor. In these transactions, the Oracle Server acts as a subordinate and processes the XA commands issued by the transaction manager. The first update shown below belongs to a global transaction; the second, issued without a tx_begin(), belongs to a local transaction.
    /* TM opens a connection to the Oracle server */
    xa_open("oracle_xa+acc=p/SCOTT/TIGER+sestm=10", 1, TMNOFLAGS);
    /* begin global transaction: the TM issues XA commands to Oracle to start a global txn */
    tx_begin();
    /* the update is performed in the global transaction */
    UPDATE EMP set sal = sal + 1;
    /* commit global txn: the TM issues an XA command to the Oracle server to commit the global txn */
    tx_commit();
The Oracle7 Server forbids a local transaction from being started in an XA connection. The update shown below would return an ORA-2041 error code.
    /* Transaction manager opens a connection to the Oracle server */
    xa_open("oracle_xa+acc=p/SCOTT/TIGER+sestm=10", 1, TMNOFLAGS);
    /* Oracle7 returns an error */
    UPDATE EMP set sal = sal + 1;
The Oracle8 Server, on the other hand, allows local transactions to be started in an XA connection. The only restriction is that the local transaction must be ended (committed or rolled back) before a global transaction is started in the connection.
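The Oracle7 and Oracle8 rules for local and global transactions on an XA connection can be summarized as a small state machine. The sketch below is only a model of the behaviour described above: the `XaConn` type, the `conn_` functions, and the `SIM_ERR_TXN_OPEN` code are hypothetical, and `SIM_ERR_NO_GLOBAL` simply reuses the number 2041 from the text.

```c
#include <assert.h>

typedef enum { TXN_NONE, TXN_LOCAL, TXN_GLOBAL } TxnState;

typedef struct {
    int      server_major;  /* 7 or 8: which server rule set to model */
    TxnState state;         /* transaction currently open on the connection */
} XaConn;

#define SIM_OK             0
#define SIM_ERR_NO_GLOBAL  2041  /* mirrors the ORA-2041 mentioned in the text */
#define SIM_ERR_TXN_OPEN   (-1)  /* hypothetical: a transaction is still open */

/* DML issued on the XA connection. */
static int conn_dml(XaConn *c)
{
    assert(c->server_major == 7 || c->server_major == 8);
    if (c->state != TXN_NONE)
        return SIM_OK;              /* runs inside the open transaction */
    if (c->server_major == 7)
        return SIM_ERR_NO_GLOBAL;   /* Oracle7: no local txn in an XA connection */
    c->state = TXN_LOCAL;           /* Oracle8: DML implicitly starts a local txn */
    return SIM_OK;
}

/* tx_begin() on the connection: only allowed when nothing is open. */
static int conn_begin_global(XaConn *c)
{
    if (c->state != TXN_NONE)
        return SIM_ERR_TXN_OPEN;    /* the local txn must be ended first */
    c->state = TXN_GLOBAL;
    return SIM_OK;
}

/* Commit or rollback of whatever transaction is open. */
static void conn_end(XaConn *c)
{
    c->state = TXN_NONE;
}
```

In this model an Oracle8 connection that has run DML outside a global transaction must call `conn_end()` before `conn_begin_global()` succeeds, which is the one restriction the notes describe.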
Loosely coupled transaction branches: The Oracle8 Server supports both loosely and tightly coupled transaction branches in a single Oracle instance. The Oracle7 Server supported only tightly coupled transaction branches in a single instance, and loosely coupled transaction branches in different instances.
[Figure: Multiple Tightly Coupled Branches. Sessions S1 and S2, branches B1 and B2, transaction T.]
[Figure: Session Operating on Multiple Branches. Session S1, branches B1 and B2, transactions T1 and T2.]
[Figure: Loosely Coupled Branches. Sessions S1 and S2, branches B1 and B2, transactions T1 and T2.]
Dynamic registration: can be used if, and only if, both the XA application and the Oracle Server are Version 8. It allows the Oracle Server to join a global transaction just in time. Without dynamic registration, the TM starts a transaction branch on every RM, which results in needless overhead for RMs that play no role in the global transaction. With dynamic registration, when the Oracle Server receives work from an application, it contacts the TM to check whether that work is part of a global transaction. This just-in-time approach improves efficiency and overall performance.
Support for dynamic registration requires the implementation of the ax_reg and ax_unreg calls. The RM makes these calls to the TM to ask whether the current database call is part of a global transaction. This lets the TM defer beginning a transaction until it is required, eliminating the xa_start and xa_end calls to RMs that the AP never actually calls in a given transaction. It also makes it possible to bundle the xa_start call with the first statement in the transaction. The xa_switch structure is modified to set the TMREGISTER flag. The xa_start call is bundled with the first statement in the transaction only if the client is using V8 OCI; otherwise the first statement in the transaction results in two round-trips. Similarly, if an Oracle8 OCI client is connected to an Oracle 7.3 Server, the xa_start call is not bundled with the first statement in the transaction. The following table lists the configurations supported under dynamic registration.
Client     Server    Dynamic XA      xa_start piggybacked
8.0 OCI    8.0       Supported       Yes
8.0 OCI    7.3       Supported       No
8.0 Pro*   8.0       Supported       No
8.0 Pro*   7.3       Not Supported   N/A
8.1 Pro*   8.0/8.1   Supported       Yes
8.1 Pro*   7.3       Not Supported   N/A
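The saving that dynamic registration aims for can be illustrated by counting how many RMs are drawn into a transaction under each scheme. This is a rough sketch, not Oracle's implementation: both function names are hypothetical, and the `touched` array is an assumed way of recording which RMs the AP actually sends work to.

```c
#include <assert.h>
#include <stddef.h>

/* touched[i] is nonzero when the AP actually sends work to RM i
 * during the transaction (an assumption of this sketch). */

/* Static registration: the TM issues xa_start to every RM it knows
 * about, whether or not the AP ends up using it. */
static size_t xa_start_calls_static(const int *touched, size_t n_rms)
{
    (void)touched;   /* participation is not known up front */
    return n_rms;
}

/* Dynamic registration: an RM registers itself with the TM (ax_reg)
 * only when it first receives work, so only touched RMs take part. */
static size_t registrations_dynamic(const int *touched, size_t n_rms)
{
    assert(touched != NULL || n_rms == 0);
    size_t joined = 0;
    for (size_t i = 0; i < n_rms; i++) {
        if (touched[i])
            joined++;
    }
    return joined;
}
```

With four configured RMs of which the AP uses two, the static count is four while the dynamic count is two; the difference is the xa_start/xa_end traffic that the notes describe as eliminated.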

Performance Improvements
XA now uses the new OCI
SQLLIB is no longer needed when not using Pro*
Code improvements
Error message handling
Version control
XA uses the new OCI's transactional interface, which allows for a more scalable and better-performing implementation of XA.
7.x OCI applications had to use SQLLIB whenever the @DBNAME syntax was used in Pro* programs. This meant OCI programmers had to buy SQLLIB even if they had no desire to develop Pro* applications. This is no longer the case, because applications do not have to use the sqlld2 function to obtain an LDA structure.

Miscellaneous improvements
Installation
New Open String Parameters
Loose_Coupling (TRUE/FALSE)
SesWt (in seconds)
_VERSION73 (TRUE/FALSE)
Installation
You do not have to run the xaview.sql script when running an Oracle8 XA application against an Oracle8 server.
Three new open string parameters have been added. They are:
Loose_Coupling
This parameter takes a Boolean value and should not be set to true when connected to an Oracle7 Server. If set to true, it indicates that global transaction branches will be loosely coupled, that is, locks will not be shared between branches.
SesWt
This parameter's value is the time-out limit, in seconds, when waiting for a transaction branch that is being used by another session. If Oracle cannot switch to the transaction branch within SesWt seconds, XA_RETRY is returned.
SesWt was added to differentiate between two timeouts:
1) the timeout when a transaction is inactive (SesTm)
2) the timeout when waiting to switch to a transaction branch that is being used by another session (SesWt).
In Oracle 7.x, SesTm was used for both.
_VERSION73
Does not have a value associated. Forces an Oracle8 server to provide the same
behaviour as Oracle 7.3.
Two parameters have been made obsolete and should only be used when
connected to an Oracle Server Release 7.3.
GPWD
The group password is not used by Oracle8. A session that is logged in with
the same user name as the session that created a transaction branch will be
allowed to switch to the transaction branch.
SesCacheSz
This parameter is not used by Oracle8 because session caching has been
eliminated.
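As an illustration of how the open string parameters discussed above fit together, the following sketch assembles an xa_open string in C. It assumes that fields are joined with `+` as in the `oracle_xa+acc=p/SCOTT/TIGER+sestm=10` example earlier in this unit, and that the new parameters are written as `SesWt=<seconds>` and `Loose_Coupling=TRUE|FALSE`; the exact spelling the XA library accepts should be checked against the product documentation. The function name `build_open_string` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds an xa_open string into buf.
 * Returns 0 on success, -1 if buf is too small for the full string.
 * The field spellings are assumptions taken from the examples and
 * parameter names in this unit, not a verified grammar. */
static int build_open_string(char *buf, size_t size,
                             const char *user, const char *password,
                             int ses_tm, int ses_wt, int loose_coupling)
{
    assert(buf != NULL && user != NULL && password != NULL);
    assert(strlen(user) > 0);

    int n = snprintf(buf, size,
                     "oracle_xa+acc=p/%s/%s+sestm=%d+SesWt=%d+Loose_Coupling=%s",
                     user, password, ses_tm, ses_wt,
                     loose_coupling ? "TRUE" : "FALSE");

    /* snprintf reports the length it needed; treat truncation as failure. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping SesTm and SesWt as separate arguments reflects the point made in the notes: the inactivity timeout and the branch-switch wait are now two independent settings. GPWD and SesCacheSz are deliberately left out because the notes mark them obsolete for Oracle8.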

Init.ora parameters
transactions
open_links_per_instance
Extensions to XA interface
OCISvcCtx *xaoSvcCtx(text *dbname)
OCIEnv *xaoEnv(text *dbname)
int xaosterr(OCISvcCtx *SvcCtx, sb4 error)
Set the transactions init.ora parameter to the expected number of concurrent global transactions.
The open_links_per_instance parameter specifies the number of migratable open database link connections. XA transactions use these dblink connections so that the connections are cached after a transaction is committed. Another transaction can use a cached connection if the user that created the connection is the same as the user that owns the transaction. This parameter is different from the open_links parameter, which is the number of connections from a session.
1. OCISvcCtx *xaoSvcCtx(text *dbname):
This function returns the OCI service handle for a given XA connection. The dbname parameter must be the same as the dbname parameter passed in the xa_open string. OCI applications can use this routine instead of the sqlld2 calls to obtain the connection handle; hence, OCI applications need not link with the SQLLIB library. The service handle can be converted to the Version 7 OCI logon data area (LDA) using OCISvcCtxToLda() [Version 8 OCI]. Client applications must remember to convert the Version 7 LDA back to a service handle using OCILdaToSvcCtx() after completing the OCI calls.
2. OCIEnv *xaoEnv(text *dbname):
This function returns the OCI environment handle for a given XA connection. The dbname parameter must be the same as the dbname parameter passed in the xa_open string.
3. int xaosterr(OCISvcCtx *SvcCtx, sb4 error):
This function, which applies only to dynamic registration, converts an Oracle error code to an XA error code. The first parameter is the service handle used to execute the work in the database. The second parameter is the error code that was returned from Oracle. Use this function to determine whether an error returned from an OCI command was caused by a failed xa_start. The function returns XA_OK if the error was not generated by the XA module, and a valid XA error code if it was.

Oracle8 XA Sessions
Session caching is eliminated by the use of the new transactional OCI.
This reduces server memory usage and also shortens the code path.
With dedicated connections (non-MTS), sessions are created in the PGA instead of the SGA.
Session caching is unnecessary with the new transactional OCI. Therefore, the old xa_open string parameter, SesCacheSz, has been eliminated. Consequently, you can also reduce the sessions init.ora parameter. Instead, set the transactions init.ora parameter to the expected number of concurrent global transactions.
Because sessions are not migrated when global transactions are resumed, applications must not refer to any session state beyond the scope of a service. For information on how to organize your application into services, refer to the documentation provided with the transaction processing monitor. In particular, savepoints and cursor fetch state are cancelled when a transaction is suspended or detached (xa_end). This means that a savepoint taken by the application in one service is invalid in another service, even though the two services may belong to the same global transaction.

Oracle8 XA Sessions (contd.)
A new type of object called a transaction branch has been added, and this branch, rather than the session, moves from process to process.
The session, with all its associated state including cursors, stays with the original process until an xa_close is issued.
In version 7.3 and lower:
A connection between a TM application server and a dedicated Oracle server is established by xa_open in each application server at TM boot, and is closed by xa_close at TM shutdown or for load-balancing reasons.
Transactional state created by xa_start could be "detached" from the initial connection by xa_end and attached by another application server (process) using xa_switch, so that another TM application server could continue processing the same transaction. Finally, it could be attached by a commit server and completed using xa_commit or xa_rollback.
In this implementation, the transactional state created by xa_start was encapsulated in a session that migrated between stateless application servers and was logically cleared by Oracle after commit or abort.
In version 8:
Whether using OCI or Pro*C, the process can now preserve session state across transactions. However, at xa_end (which generally corresponds to the end of a TP monitor service or transactional RPC), positional information is reset. The new Oracle8 XA architecture provides performance as good as or better than Oracle7, with a dramatic reduction in the number of sessions and cursors required.
"Process" here means an application server process.
Summary
TP Monitors
Traditional Client-Server Architecture
TPM Client-Server Architecture
X/OPEN DTP Model
Components of the DTP Model
XA Interface and Oracle*XA
Oracle Sessions with XA

References
Oracle8 Server Application Developer's Guide
Oracle8 Concepts Manual
Programmer's Guide to the Oracle Call Interface, Volume I
Oracle7 Server Distributed Systems, Volume 1: Distributed Data

DSI 306 - Unit 2: Multi-Threaded Server (MTS)
Version 1.2
Acknowledgements:
Design: Sandra Cheetham (scheetha.uk), World Wide Support
Andy Shrives (ashrives.uk) World Wide Support
Development: Sandra Cheetham
Andy Shrives
Review:
Edit: 07/24/98
2 hours Lecture
30 minutes Examples
30 minutes Lab Exercises
3 hrs Total
Outline
MTS Architecture Overview
Initialization Parameters
Networking Changes:
Client Side Changes
Server Side Changes
When to Use MTS (High Concurrency)
Monitoring/Tuning MTS
Miscellaneous topics

MTS - Architecture
[Figure: on the client workstation, a user process running application code connects to dispatcher processes on the database server. Dispatchers and shared server processes (running the Oracle server code) exchange messages through the request queue and the response queues in the System Global Area. Steps 1 to 7 are referenced in the notes.]
Notes:
The Net8 listener is set up to receive connection requests on behalf of an application. When MTS is configured, the listener process redirects these requests to an existing dispatcher process (1).
Dispatchers belong to a specific instance and are protocol-specific message handlers. Dispatchers place service requests onto a request queue (2) held within the SGA.
A shared server process picks up the requests from the SGA queue (3/4) and places the result onto the response queue within the SGA (5). The client never interacts directly with the server process that actually processes that client's requests. The dispatcher process then passes the response back to the client (6/7).
The SGA contains memory structures specific to the MTS configuration. These structures include two queues: a request queue holding requests placed by the dispatcher processes on behalf of clients, and a response queue containing the replies produced by the server processes. The requests and replies are dequeued as they are successfully processed. To ensure correct distribution of these messages on the queues, the user's session data (UGA) is also kept in the SGA.
Dispatchers are lightweight processes by design, allowing rapid transport of information between the server and clients. This allows a single dispatcher to service multiple clients with significantly less overhead than a dedicated server connection.
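The request/response flow in the notes above can be sketched as a single-threaded simulation. This is only an illustration: the `SimSGA` structure and all function names are hypothetical, the "work" done by the shared server is a placeholder, and the sketch assumes one response queue per dispatcher, in line with the plural "Response Queues" label on the slide.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP     8
#define N_DISPATCHERS 2

typedef struct {
    int dispatcher;  /* which dispatcher accepted the client request */
    int payload;     /* stand-in for the request or its result */
} Message;

typedef struct {
    Message items[QUEUE_CAP];
    size_t  head, count;
} Queue;

static int queue_put(Queue *q, Message m)
{
    if (q->count == QUEUE_CAP)
        return -1;                              /* queue full */
    q->items[(q->head + q->count) % QUEUE_CAP] = m;
    q->count++;
    return 0;
}

static int queue_get(Queue *q, Message *out)
{
    if (q->count == 0)
        return -1;                              /* queue empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Toy SGA: one common request queue, one response queue per dispatcher. */
typedef struct {
    Queue request;
    Queue response[N_DISPATCHERS];
} SimSGA;

/* Step 2: a dispatcher places a client request on the request queue. */
static int dispatcher_submit(SimSGA *sga, int dispatcher, int payload)
{
    assert(dispatcher >= 0 && dispatcher < N_DISPATCHERS);
    Message m = { dispatcher, payload };
    return queue_put(&sga->request, m);
}

/* Steps 3 to 5: a shared server takes the next request, processes it,
 * and puts the result on the originating dispatcher's response queue. */
static int shared_server_step(SimSGA *sga)
{
    Message m;
    if (queue_get(&sga->request, &m) != 0)
        return -1;                              /* nothing to do */
    m.payload *= 2;                             /* placeholder for real work */
    return queue_put(&sga->response[m.dispatcher], m);
}

/* Steps 6 and 7: the dispatcher collects the reply for its client. */
static int dispatcher_collect(SimSGA *sga, int dispatcher, int *result)
{
    assert(dispatcher >= 0 && dispatcher < N_DISPATCHERS);
    Message m;
    if (queue_get(&sga->response[dispatcher], &m) != 0)
        return -1;
    *result = m.payload;
    return 0;
}
```

Any shared server can pick up any request, because the request queue is common; the reply still finds its way back because each message records the dispatcher that submitted it.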
MTS Initialization Parameters
LARGE_POOL_SIZE
LARGE_POOL_MIN_ALLOC
SHARED_POOL_RESERVED_SIZE
MTS_SERVICE
LOCAL_LISTENER
MTS_SERVERS
MTS_MAX_SERVERS
MTS_DISPATCHERS
MTS_MAX_DISPATCHERS

LARGE_POOL_SIZE - For each user connecting via MTS, allow approximately 200K.
LARGE_POOL_MIN_ALLOC - Minimum allocation size taken from the large pool.
SHARED_POOL_RESERVED_SIZE - Shared pool space reserved for large allocations.
Notes:
LARGE_POOL_SIZE - In Oracle8, the most significant change in the MTS architecture is the move of the UGA (user session data) from the shared pool into the large pool, which is set up by the init.ora parameter LARGE_POOL_SIZE. If specified, the large pool is used for session memory when running with the multi-threaded server. It is also used for I/O buffers during backup operations.
This eliminates fragmentation of the shared pool and lets you allocate a large area of memory specifically for MTS users. Sizing is very much application dependent, but a standard Forms application typically needs approximately 200K per user.
LARGE_POOL_MIN_ALLOC - This parameter specifies the minimum allocation size from the large pool.
SHARED_POOL_RESERVED_SIZE - This controls the amount of shared pool space reserved for large allocations. In Oracle 7.x, the recommendation was to increase this parameter when MTS was in use.
Reference: Oracle Server Release 8.0 Administrator's Guide
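The LARGE_POOL_SIZE sizing guideline in the notes above amounts to a simple multiplication. The sketch below is a rough estimate only: the function name and the `FORMS_KB_PER_USER` constant are hypothetical, "K" is assumed to mean 1024 bytes, and the 200K figure is the notes' rule of thumb for a standard Forms application rather than a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Rule of thumb from the notes for a standard Forms application. */
#define FORMS_KB_PER_USER 200u

/* Estimated LARGE_POOL_SIZE in bytes for a number of MTS users,
 * given an assumed per-user session memory figure in kilobytes. */
static uint64_t estimate_large_pool_bytes(uint32_t mts_users,
                                          uint32_t kb_per_user)
{
    assert(kb_per_user > 0);
    return (uint64_t)mts_users * kb_per_user * 1024u;
}
```

For 100 MTS users at 200K each, this gives 20,480,000 bytes, or roughly 20 MB. Because the notes stress that sizing is application dependent, the per-user figure is a parameter rather than a fixed constant.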

MTS_SERVICE - Defines the service name that the dispatchers register with the listener. Defaults to DB_NAME.
LOCAL_LISTENER - Addresses and protocols must match those in the listener.ora configuration file.
Notes:
MTS_SERVICE - This is the service name that the dispatchers register with the corresponding listener. This parameter defaults to the value of the DB_NAME parameter. If DB_NAME is also not set, Oracle returns the error "ORA-00114: missing value for system parameter mts_service" when you start the database. If you choose the same MTS_SERVICE name as the DB_NAME, client connection requests get an MTS connection if one is available; if MTS is not available, they get a dedicated connection.
LOCAL_LISTENER - This specifies the full address at which the dispatchers are to register. It corresponds to the entry in the listener configuration file.
MTS_LISTENER_ADDRESS = (ADDRESS=(PROTOCOL=tcp)\
(PORT=5000)(HOST=ZEUS))
This is an optional parameter and, if specified, overrides the MTS_LISTENER_ADDRESS and MTS_MULTIPLE_LISTENERS parameters.
Version 1.2
MTS_SERVERS - The initial and
minimum number of shared servers
when the instance is started. (Dynamic)
MTS_MAX_SERVERS - The maximum
number of shared servers which can be
started for duration of an instance.
MTS_DISPATCHERS - As a minimum
includes the number of Dispatchers per
protocol to be started. (Dynamic) See Notes.
Notes:
MTS_SERVERS - A number of shared server processes are created at
instance startup. The appropriate number of INITIAL shared server processes
depends on how many users connect concurrently AND their processing
requirements. If each user makes relatively few requests over a period of time,
then each user process is idle for much of the time. In this situation, one
server process can service 10 to 20 users. If each user requires a significant
amount of processing, a higher ratio of server processes to users may be needed.
Additional shared servers start automatically when needed, up to the
MTS_MAX_SERVERS limit, and are deallocated automatically
if they remain idle for too long. However, the INITIAL servers always remain
allocated, even if idle. Setting MTS_SERVERS too high may incur
unnecessary overhead.
MTS_SERVERS is a dynamic parameter and may be altered by:
alter system set mts_servers=20
MTS_MAX_SERVERS - This parameter should be set for an appropriate
number of shared server processes allowed to be running simultaneously.
Defaults to 20 or 2 times the number of MTS_SERVERS (whichever is
greater.)
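The default rule and the users-per-server rule of thumb from the notes can be written out as arithmetic. The following Python sketch is illustrative only: the function names are made up for this example and are not Oracle APIs, and the users_per_server ratio is an assumption you would choose from the 10 to 20 range quoted above.

```python
import math


def default_mts_max_servers(mts_servers):
    """Default per the notes: 20 or 2 * MTS_SERVERS, whichever is greater."""
    return max(20, 2 * mts_servers)


def initial_servers_estimate(concurrent_users, users_per_server=10):
    """Rough starting value for MTS_SERVERS for mostly idle users.

    users_per_server is an assumed ratio (the notes suggest 10 to 20).
    """
    return math.ceil(concurrent_users / users_per_server)


print(default_mts_max_servers(5))
print(default_mts_max_servers(20))
print(initial_servers_estimate(150, users_per_server=15))
```

With MTS_SERVERS=5 the default limit stays at 20; only once MTS_SERVERS exceeds 10 does the "2 times" branch take over.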
Reference: Oracle8 Administrator's Guide
MTS_DISPATCHERS - The number of dispatcher processes started at
instance startup is controlled by the parameter MTS_DISPATCHERS.
When setting the MTS_DISPATCHERS parameter, you can include any valid
protocol.
The instance must be able to provide as many connections as there are
concurrent users on the database system; the more dispatchers you have, the
better potential database performance users will see, since they will not have to
wait as long for dispatcher service.
The MTS_DISPATCHERS parameter has changed considerably for NET8 and
now includes the ability to specify both the IP address and port. This overcomes
the multiple network card problem seen in previous versions, where it was
impossible to configure and control more than one network address. The ability to
specify which PORT the Dispatcher listens on alleviates a firewall problem
(whereby all network connections must be channelled through a predetermined
PORT). Previously a random port was selected as each Dispatcher started.
To force the IP address used for the dispatchers, enter the following:
MTS_DISPATCHERS="(ADDRESS=(PARTIAL=TRUE)\
(PROTOCOL=TCP)(HOST=144.25.16.201))(DISPATCHERS=2)"
This will start 2 dispatchers that will listen on an IP address of 144.25.16.201,
which must be a card that is accessible to the dispatchers.
To force the Dispatcher process to grab a specific PORT you can add the
(PORT=) keyword pair, as follows:
MTS_DISPATCHERS="(ADDRESS=(PARTIAL=TRUE)\
(PROTOCOL=TCP)(HOST=144.25.16.201)(PORT=5000))\
(DISPATCHERS=1)"
MTS_DISPATCHERS="(ADDRESS=(PARTIAL=TRUE)\
(PROTOCOL=TCP)(HOST=144.25.16.201)(PORT=5001))\
(DISPATCHERS=1)"
You can specify multiple MTS_DISPATCHERS in the INIT.ORA file, but they
must be adjacent to each other.
The MTS_DISPATCHERS parameter is dynamic to the extent that you can
modify the number of dispatchers per protocol, but changing multiplexing or
pooling will cause all existing connections to hang.
MTS_DISPATCHERS
Attributes
* Listener
+Multiplex
+Pool
+Ticks
+Sessions
(+ = Used for Connection
Manager control.)
* Address
* Description
* Protocol
* Connections
* Dispatchers
* Service
NOTES:
MTS_DISPATCHERS allows you to enable various attributes for each
dispatcher. In Oracle 7.3, you specified a protocol and an initial number of
dispatchers. These attributes are specified in a position-dependent, comma-
separated string assigned to MTS_DISPATCHERS. For example:
MTS_DISPATCHERS = "TCP, 3"
While remaining backwardly compatible with this format, the parsing software
in Oracle8 supports a name-value syntax (similar to the syntax used by Net8)
to enable the specification of the existing and additional attributes in a
position-independent case-insensitive manner. For example:
MTS_DISPATCHERS = "(PROTOCOL=TCP)(DISPATCHERS=3)"
The ADDRESS and DESCRIPTION attributes provide support for the
specification of additional network attributes. (This enables support of
multi-homed hosts.)
The attributes CONNECTIONS, DISPATCHERS, LISTENER, MULTIPLEX,
POOL, SERVICE, and TICKS are optional.
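To make the relationship between the two formats concrete, the Python sketch below rewrites an Oracle 7.3 style positional value into the equivalent name-value string. It is only an illustration of the mapping described in these notes; the helper name is hypothetical and it handles just the two positional fields shown above (protocol and dispatcher count).

```python
def positional_to_name_value(value):
    """Rewrite 'PROTOCOL, COUNT' as '(PROTOCOL=...)(DISPATCHERS=...)'."""
    protocol, count = [part.strip() for part in value.split(",")]
    return "(PROTOCOL=%s)(DISPATCHERS=%d)" % (protocol.upper(), int(count))


print(positional_to_name_value("TCP, 3"))
```

Because the name-value form is position-independent, the two attributes could equally be emitted in the opposite order.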
MTS_MAX_DISPATCHERS - The
maximum number of Dispatcher
processes (all protocols combined)
which can be started for duration of
an instance.
MTS_RATE_LOG_SIZE - Specifies the
sample size used to calculate
dispatcher rate statistics.
MTS_RATE_SCALE - Specifies the
scale at which dispatcher rate
statistics are reported.
Notes:
MTS_MAX_DISPATCHERS specifies the maximum number of dispatcher
processes that may be created on an MTS-configured instance.
For example: mts_max_dispatchers = 10
This sets the maximum number of Dispatcher processes to 10. The MTS instance
may initially start up 5 Dispatcher processes and later create additional
Dispatchers based on connection load, but it will never exceed the value specified
by this parameter.
Obsolete
MTS_LISTENER_ADDRESS
MTS_MULTIPLE_LISTENERS
Above are supported for backward
compatibility only
LOCAL_LISTENER overrides
MTS_LISTENER_ADDRESS if present.
Net8 Changes (MTS)
Server side changes
Client side changes
MTS Support on NT
MTS with OPS
Connection Manager
Server Side Changes
Major architecture change for UGA
session data, i.e. LARGE_POOL_SIZE.
MTS_DISPATCHERS allows support for
IP addresses (useful for hosts with
multiple IP addresses).
MTS_DISPATCHERS allows Dispatcher
port to be specified (useful for some
firewalls).
Additional V$ views for MTS
statistics/tuning.
Support for Connection Manager.
Notes:
MTS user session data was previously held in Shared_Pool. This has now
been separated into Large_Pool to prevent fragmentation of Shared_Pool.
Stateful inspection based firewalls work by restricting TCP/IP traffic to certain
ports. MTS can now be used with these firewalls since the port number can be
predefined. For more information see
http://netweb.us.oracle.com/solutions/security/snet.html
Client Side Changes
None for MTS! - The MTS service name
is still read from SID=<> section of the
tnsnames.ora. (as in Oracle7)
However to use Connection Manager
(See Appendix A) changes are needed
to the tnsnames.ora and sqlnet.ora on
the client side
A sample tnsnames.ora for use with Connection Manager:
CMAN = (description =
(address_list =
(address =
(protocol = tcp)
(host = isis)
(port = 1610) <- Default port for Connection Manager
)
(address =
(protocol = tcp)
(host = isis)
(port = 1521)
)
)
(connect_data =
(sid = NET8_MTS_SERV)
)
(source_route = YES)
)
and sqlnet.ora:
trace_level_client = off
use_cman = true
automatic_ipc = off
MTS Support On NT
Available on NT from Version 8.0.4
Winsock 2 required (NT 4.0)
MTS on OPS
Now OK with Oracle 8 as Oracle DLM is
group based.
Notes:
Many DLMs in Oracle 7 were process based, for example on OpenVMS. This
led to the potential for false deadlocks.
See <Note 44886.1> and <Bug 357836> for further information.
Connection Manager
New with Net8
Some features of Connection Manager
require MTS to be configured
See Appendix A for more details
When To Use MTS
High User Concurrency - > 500 users
Applications with high think time
e.g. Order Entry System.
XA Applications and Dblinks.
Notes:
Migratable DB links
When using XA and database links, MTS has to be used. A DBLINK end
point is just a file descriptor, which can only be opened in a process's
private memory area (this is an operating system restriction). Under MTS,
database links are virtual circuits held in the SGA and can therefore be
referenced by any shared server, which overcomes this restriction.
When NOT To Use MTS
No real benefit seen with User
Concurrency < 500.
Batch jobs should always connect using
a Dedicated Server process.
e.g. Job which is going to tie up a
Server process.
To startup/shutdown/perform media
recovery; for example Enterprise
Manager.
Performance Tuning
Dynamic Performance Tables
Identify Contention for Dispatcher
Processes
Examine Busy Rates for Dispatcher
Processes
Identify Large_Pool_Size allocation
(MTS session data).
Identifying Contention for Shared
Server Processes
Performance Tuning
Monitoring Dynamic Performance Tables
V$SESSTAT         Contains max(memory) used by a session.
V$QUEUE           Contains information about multi-threaded message queues.
V$CIRCUIT         Contains information about virtual circuits, which are
                  user connections through dispatchers and servers.
V$DISPATCHER      Contains information about dispatcher processes.
V$SHARED_SERVER   Contains information about shared servers.
V$MTS             Contains information about MTS servers started or
                  terminated dynamically since instance startup.
Performance Tuning
Identifying Contention for Dispatcher
Processes:-
SELECT network "Protocol",
DECODE( SUM(totalq), 0, 'No Responses',
SUM(wait)/SUM(totalq) ||
' hundredths of seconds')
"Average Wait Time per Response"
FROM v$queue q, v$dispatcher d
WHERE q.type = 'DISPATCHER'
AND q.paddr = d.paddr
GROUP BY network;
Notes:-
V$QUEUE contains statistics reflecting the response queue activity for
dispatcher processes. These columns show wait times for responses in the
queue:
WAIT - the total waiting time, in hundredths of a second, for all responses
that have ever been in the queue
TOTALQ - the total number of responses that have ever been in the queue
Use the above query to monitor these statistics occasionally while your
application is running.
This query returns the average time, in hundredths of a second, that a response
waits in each response queue for a dispatcher process to route it to a user
process. This query uses the V$DISPATCHER table to group the rows of the
V$QUEUE table by network protocol. The result of this query might look like
this:
Protocol Average Wait Time per Response
tcp .1739130 hundredths of seconds
If the average wait time for a specific network protocol continues to increase
steadily as your application runs, then by adding dispatcher processes you may
be able to improve performance. A wait time greater than 1 should be seen as a
problem.
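The calculation behind the query, and the "steadily increasing" test described above, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the (WAIT, TOTALQ) samples are invented, the function names are hypothetical, and in practice the figures would come from repeated runs of the query against V$QUEUE.

```python
def average_wait(wait, totalq):
    """Average wait per response in hundredths of a second.

    Returns None when TOTALQ is 0, the case the DECODE reports
    as 'No Responses'.
    """
    if totalq == 0:
        return None
    return wait / totalq


def steadily_increasing(averages):
    """True when each sampled average is higher than the one before it."""
    return len(averages) > 1 and all(
        later > earlier for earlier, later in zip(averages, averages[1:])
    )


# Hypothetical cumulative (WAIT, TOTALQ) samples for one protocol.
samples = [(4, 23), (40, 100), (300, 200)]
averages = [average_wait(w, q) for w, q in samples]

print(["%.4f" % a for a in averages])
# Flag the protocol if the trend is upward or the latest average exceeds 1.
print("consider more dispatchers:",
      steadily_increasing(averages) or averages[-1] > 1)
```

The first sample (4 waits over 23 responses) corresponds to the .1739130 figure shown in the sample output.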
Performance Tuning
Examine Busy Rates for Dispatcher
Processes:-
select name, (busy/(busy+idle)) * 100
"% of time busy" from v$dispatcher;
Notes:-
The above statement shows the percentage busy time for each Dispatcher
process.
SQL> select name, (busy/(busy+idle)) * 100 "% of time busy" from
v$dispatcher;
NAME % of time busy
----- --------------
D000 11.022
If this figure is consistently high for ALL dispatcher processes,
you should investigate further. It may be that batch jobs, or other jobs better suited
to a Dedicated Server connection, are being routed via MTS dispatchers, rather
than that a high number of requests is being processed. If the latter is the case,
you should look at increasing the number of dispatcher processes.
Performance Tuning
Calculating Large_Pool_Size Allocation for
MTS users from V$SESSTAT:
SELECT SUM(value) || ' bytes'
"Total memory for all sessions"
FROM v$sesstat, v$statname
WHERE name = 'session uga memory'
AND v$sesstat.statistic# =
v$statname.statistic#;
SELECT SUM(value) || ' bytes'
"Total max mem for all sessions"
FROM v$sesstat, v$statname
WHERE name = 'session uga memory max'
AND v$sesstat.statistic# =
v$statname.statistic#;
Notes:
Monitor V$SESSTAT to decide how much larger to make
Large_pool_Size. If Large_pool_Size is not specified as an init.ora parameter,
then the UGA for all Multi-Threaded Server users will reside in the Shared_Pool.
Issue these queries while your application is running, making sure that they are
run under peak load.
Performance Tuning
Identifying Contention for Shared Server
Processes:-
SELECT DECODE( totalq, 0, 'No Requests',
wait/totalq || ' hundredths of seconds')
"Average Wait Time Per Requests"
FROM v$queue
WHERE type = 'COMMON';
Notes:
MTS Connections always appear as NONE or SHARED in the Server
column of V$Session view depending on whether or not a task is currently
being serviced by a shared server.
Contention for shared server processes can be reflected by a steady increase in
waiting time for requests in the request queue. The dynamic performance table
V$QUEUE contains statistics reflecting the request queue activity for shared
server processes. These columns show wait times for requests in the queue:
WAIT - the total waiting time, in hundredths of a second, for all requests that
have ever been in the queue
TOTALQ - the total number of requests that have ever been in the queue
SQL> select name, requests,(busy/(busy+idle)) * 100 "%time busy" from
v$shared_server;
NAME REQUESTS %time busy
S000 108 11.0047454
S001 62 10.6330216
A high busy time with a small number of requests may suggest that unsuitable
batch jobs are being processed via Shared Servers rather than the
Shared_Servers being swamped with large numbers of requests. If the latter is
the case increasing the number of Shared Servers may improve performance.
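The reasoning in the last paragraph can be expressed as a small decision rule. The Python sketch below is a hedged illustration: the thresholds busy_limit and request_limit are assumptions chosen for the example (the notes give no numeric cut-offs), and the function names are hypothetical. The inputs correspond to the REQUESTS, BUSY and IDLE columns of V$SHARED_SERVER.

```python
def busy_percent(busy, idle):
    """Percentage of time busy, as in the query: busy / (busy + idle) * 100."""
    total = busy + idle
    return 100 * busy / total if total else 0.0


def diagnose(requests, busy, idle, busy_limit=50.0, request_limit=100):
    """Classify one shared server; both limits are assumed thresholds."""
    if busy_percent(busy, idle) < busy_limit:
        return "ok"
    if requests < request_limit:
        # Busy but few requests: each request is long-running.
        return "long-running work: consider dedicated connections"
    return "high request volume: consider more shared servers"


print(diagnose(requests=108, busy=11, idle=89))
print(diagnose(requests=20, busy=90, idle=10))
print(diagnose(requests=5000, busy=90, idle=10))
```

The same busy-rate calculation applies to dispatcher processes in V$DISPATCHER.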
MTS - Known Problems
MTS and DBLinks can tie up Shared Servers if
ADDRESS_LIST is specified in TNSNAMES for
connection description. BUG 300008
TIMEOUTS specified for Connection Manager, i.e. TICKS,
are documented as being in seconds, but each tick is actually 10 seconds.
BUG 572327
CMCTL (part of Connection Manager) will not shut down
cleanly if any connections remain. You have to
physically terminate the two processes. Fixed in 8.0.5.
MTS_DISPATCHERS is NOT dynamic if Connection
Pooling/Multiplexing are being used. Attempting to do
an alter system set mts_dispatchers will hang all
existing connections.
MTS Reference Information
Note 47684.1 KKP FAQ MTS
Note 47564.1 KKP Top20 MTS
Note 44886.1 MTS and OPS
Oracle Server Release 8.0 Administrator's Guide
Oracle Net8 Release 8.0 Administrator's Guide
Oracle 8.0 Server Concepts Guide
Richard Powell!
APPENDIX A
CONNECTION MANAGER
Scalability
Multiplexing (Concentration)
Connection Pooling
Multi-Protocol Support
Access Control (Security)
Configuration
Processes
Notes:
Oracle Connection Manager acts like a router through which client connection
requests may be sent either on to their next hop or directly to a server. Clients who
route their connection requests through a Connection Manager may then take
advantage of the connection concentration, network access control, or
multiprotocol support features configured on that Connection Manager.
Scalability: Multiplexing
Connections
minimized from
backend to server
Clients with multiple
applications require
single transport
Multiple (logical)
network sessions
use single (physical)
transport connection
Open link available
for network sessions
[Figure: on Node A (Client), Process 1 carries Sessions 1 to n; on Node B (Server), Process 2 carries Sessions 1 to n; the two nodes are joined by a Single Physical Link.]
Multi-Threaded Server required for
Multiplexing.
Configuring the MTS-Dispatcher for
Multiplexing (Concentrating):
init.ora
Key parameters:
MULTIPLEX (MULT or MUL)
MULT = 1 | ON | YES | TRUE | BOTH
Multiplexing is enabled for both incoming and outgoing
network connections.
MULT = IN
Multiplexing is enabled for incoming network
connections.
MULT = OUT
Multiplexing is enabled for outgoing network
connections.
MULT = 0 | OFF | NO | FALSE
Multiplexing is disabled for incoming and outgoing
network connections.
Notes:
By default, Multiplexing is disabled on both incoming and outgoing network
connections.
An example of an init.ora containing MTS parameters for multiplexing:
mts_max_dispatchers=22
mts_servers=2
mts_dispatchers="(DISPATCHERS=1)(PROTOCOL=TCP)(MULTIPLEXING=ON)"
mts_max_servers=10
mts_service=NET8_MTS_SERV
Scalability: Connection Pooling
Clients Connect to Connection Manager
Connection Manager redirects to MTS
Dispatchers on Server.
Notes:
Connection Pooling allows you to have more CLIENT sessions active than
there are active NETWORK sessions.
For example you could have 1000 PCs connected to a Unix Server each
running Oracle Office and only utilize 256 network sockets.
Idle network sessions time out
releasing corresponding transport
connection for use by other network
sessions.
Ideal for interactive, high think/search time
applications (for example e-mail,
data warehousing)
Multi-threaded server required for Pooling.
Dispatcher ages least-recently-used
connections, physically disconnecting them
while maintaining their context.
If a disconnected client needs to access the
server and there are no network sessions
available it will wait for a network resource to
become available. The resource will either
become available due to a connection ending
or going idle.
Notes:
Sessions are only timed out when the configuration is running short of
end points, i.e. when demand means we NEED to time them out.
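The least-recently-used ageing described above can be modelled with a toy Python class. This is a sketch of the general LRU idea, not of the dispatcher's actual implementation: the class and method names are hypothetical, and it assumes the least recently used connection is always idle and may be disconnected immediately (the real dispatcher makes a waiting client wait until a connection ends or goes idle).

```python
from collections import OrderedDict


class PooledDispatcher:
    """Toy model: many sessions share a limited set of physical connections."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.connected = OrderedDict()  # least recently used session first
        self.known_sessions = set()     # context is kept for every session

    def request(self, session):
        """Session needs the server; return the session disconnected, if any."""
        self.known_sessions.add(session)
        if session in self.connected:
            self.connected.move_to_end(session)  # mark as most recently used
            return None
        evicted = None
        if len(self.connected) >= self.max_connections:
            # Only age a connection out when we are short of end points.
            evicted, _ = self.connected.popitem(last=False)
        self.connected[session] = True
        return evicted


d = PooledDispatcher(max_connections=2)
for name in ["A", "B", "A", "C"]:
    print(name, "-> disconnected:", d.request(name))
print(sorted(d.known_sessions), list(d.connected))
```

Session B is disconnected when C arrives because A was used more recently, yet B remains a known session whose context is retained.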
(DEMO)!
Configuring the MTS_Dispatchers for
Connection Pooling:
init.ora
Key parameters:
CONNECTION (CON or CONN)
TICKS (TIC or TICK)
POOL (POO)
CONNECTION (CON or CONN) = n
n is the maximum number of network connections to
allow for each dispatcher.
Default is set by Net8.
Platform dependent and relates to open file
descriptor limit
Notes:
The platform specific limits for Connection Pooling are due to differing open
file descriptor limits.
TICKS (TIC or TICK) = n
n is the size of a clock tick in seconds.
Default is set by Net8 and might be platform specific.
15 seconds is the default for the SOLARIS platform.
Notes:
TICKS specifies the time a connection must remain idle before it is placed on
the LRU list of connections to be reused. There is currently a problem in that
each unit of n actually appears to be 10 SECONDS as opposed to 1.
POOL (POO) = n
n is the timeout in ticks for both incoming and
outgoing network connections.
POOL (POO) = ON | YES | TRUE | BOTH
"Connection Pooling" is enabled for both incoming
and outgoing network connections with the default
timeout.
POOL (POO) = NO | OFF | FALSE
"Connection Pooling" is disabled for both incoming
and outgoing network connections.
POOL (POO) = IN
"Connection Pooling" is enabled for incoming
network connections with the default timeout.
POOL (POO) = OUT
"Connection Pooling" is enabled for outgoing
network connections with the default timeout.
POOL (POO) = (IN=n) | (OUT=n) |
((IN=n)(OUT=m))
n is the timeout in ticks for incoming network
connections.
m is the timeout in ticks for outgoing network
connections.
Notes:
An example of an init.ora containing mts parameters for connection pooling:
mts_servers=2
mts_dispatchers="(DISPATCHERS=1)(PROTOCOL=TCP)\
(POOL=ON)(CONNECTIONS=5)(TICKS=1)"
mts_max_servers=10
mts_service=NET8_MTS_SERV
mts_listener_address="(ADDRESS=(PROTOCOL=tcp)\
(host=isis)(port=1528))"
This will start one MTS dispatcher for the TCP/IP protocol with connection
pooling enabled allowing 5 connections maximum and a timeout value of 10
seconds.
Transparent protocol conversion for all
protocols supported by NET8.
Bi-directional protocol conversion
Scalable from PC to Mainframe
Replaces the MultiProtocol Interchange
(MPI) available with SQL*Net V2.
Can be configured to use MTS or
Dedicated Connections.
Notes:
Protocols supported include: TCP/IP, Named Pipes, SPX/IPX & LU 6.2.
There is no DECNET support for NET8 under OpenVMS, Digital UNIX or
Windows NT.
An example CMAN.ORA, including support for SPX is included in the demo.
Multi Protocol Support
Connection Manager
[Figure: three Clients, Oracle Connection Manager and Database Server; links labelled TCP/IP (one) and SPX/IPX (three).]
Access Control
Provides flexible rules to control access
from/to specific hosts/domains and
databases.
Can be configured for MTS or Dedicated
Connections.
Controlled by cman_rules parameter in
CMAN.ORA.
Complements invited/excluded
validnode checking in Protocol.ora.
Notes:
Sets the rules for the network access control portion of Oracle Connection
Manager.
Values: SRC - Source host name or IP address (in dot notation) of session
request
DST - Destination server host name or IP address (in dot notation)
SRV - Database server SID or MTS Service Name
ACT - Accept or Reject incoming requests with the previous characteristics.
The wildcard for host name is the single character `x'. In the case of an IP
address (d.d.d.d), you may wild card the individual d's with an `x'.
cman_rules= (rules_list=
(rule=
(src=isis)(dst=isis)(srv=NET8_MTS_SERV)(act=accept)
)
(rule=
(src=138.3.64.91)(dst=isis)(srv=NET8_MTS_SERV)(act=accept)
)
(rule=
(src=isis)(dst=ukp2094.uk)(srv=NET8_MTS_SERV)(act=accept)
)
)
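The wildcard rules described above can be illustrated with a small Python matcher. This is a sketch under stated assumptions, not Connection Manager's real algorithm: the function names are hypothetical, SRV is compared exactly, the first matching rule wins, and the result when no rule matches is left as None because these notes do not say what Connection Manager does in that case. The 138.3.x.x rule is invented for the example.

```python
def host_matches(pattern, host):
    """'x' alone matches any host; in a dotted IP pattern each 'x'
    matches one component of the address."""
    if pattern == "x":
        return True
    p_parts, h_parts = pattern.split("."), host.split(".")
    is_ip_pattern = len(p_parts) == 4 and all(
        p == "x" or p.isdigit() for p in p_parts
    )
    if is_ip_pattern:
        return len(h_parts) == 4 and all(
            p == "x" or p == h for p, h in zip(p_parts, h_parts)
        )
    return pattern == host


def evaluate(rules, src, dst, srv):
    """Return the ACT of the first matching rule, or None if none match."""
    for rule in rules:
        if (host_matches(rule["src"], src)
                and host_matches(rule["dst"], dst)
                and rule["srv"] == srv):
            return rule["act"]
    return None


rules = [
    {"src": "isis", "dst": "isis", "srv": "NET8_MTS_SERV", "act": "accept"},
    {"src": "138.3.x.x", "dst": "isis", "srv": "NET8_MTS_SERV", "act": "accept"},
]
print(evaluate(rules, "138.3.64.91", "isis", "NET8_MTS_SERV"))
print(evaluate(rules, "10.0.0.1", "isis", "NET8_MTS_SERV"))
```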
CMAN Configuration
CMAN.ORA:
cman =
(address_list=
(address = (protocol = tcp) (host = isis)
(port=1610))
)
cman_profile =
(parameter_list =
(tracing = yes) (log_level = 4)
(maximum_relays=512) (show_tns_info = no)
(relay_statistics = no)
(trace_directory=/oracle/804/network/admin)
(log_directory=/oracle/804/network/admin)
)
Notes:
CMAN.ORA (an editable file which resides in TNS_ADMIN directory).
Cman.ora default parameter values and available values:
Parameter              Default      Available values
MAXIMUM_RELAYS         8            0-10240
LOG_LEVEL              0            0-4
TRACING                NO           YES, NO
TRACE_DIRECTORY        <pathname>   <pathname>
RELAY_STATISTICS       NO           YES, NO
SHOW_TNS_INFO          NO           YES, NO
USE_ASYNC_CALL         YES          YES, NO
AUTHENTICATION_LEVEL   0            0, 1
MAXIMUM_CONNECT_DATA   1024         257-4096
ANSWER_TIMEOUT         0            0 to n
Reference: Oracle Net8 Administrator's Guide Release 8.0 A58230-01
CMAN Processes
CMGW - Gateway Process acting as
hub for Connection Manager.
CMADM - Admin Process
responsible for maintaining
address information.
CMCTL - Control program to
stop/start/list status of both
CMADM and CMGW.
Notes:
CMGW is a gateway process acting as a hub for the Connection Manager.
This process is responsible for the following:
- registering with the Connection Manager Administration process
- listening for incoming SQL*Net 2.x or NET 8.x connection requests.
- initiating connection requests to listeners for clients
- relaying data between the client and server
- answering requests initiated by CMCTL.
CMADM is a multi-threaded process that is responsible for all administrative
issues of the Connection Manager. Its primary function is to maintain address
information in the Oracle Names Server for the SQL*Net 2.x and NET 8.x
clients. Other responsibilities include:
- processing the CMGW registration
- registering the Source Route address with local Oracle Names Servers
- identifying all listeners serving at least one database instance
- registering address information about the CMGW and listeners
- monitoring changes in the network and updating the Names Server
- answering requests initiated by CMCTL.
Notes:
CMADM
Communication between CMGW and CMADM is done via interprocess
communications. The Connection Manager periodically goes to the Names
Server (if configured) to update its cache of available services.
CMCTL
The Connection Manager Control Utility (CMCTL) is a tool that you run from
the operating system prompt to start and control Oracle Connection Manager.
The general form of the Connection Manager Control Utility is:
CMCTL command [process_type]
where the process_type is the type of process that the command is being
executed on. The choices are:
cman (both main and administration processes)
adm (only the administration process)
cm (main process only).
For example, to start both the administration and main processes, you would
execute the following:
CMCTL> start cman
Other commands include:-
> STOP
> STATUS
> STATS
When To Use Connection
Manager
IF EVER?
Connection Pooling - User population >
10000 with interactive, high think/search
time applications e.g. e-mail.
Multiplexing - Would be used when network
resources are restricted - on a standard Unix
platform this will rarely occur, i.e. only with > 10000
concurrent users!!
Notes:
In reality the huge number of connections needed to make either Connection
Pooling or Multiplexing necessary, means that these features will NOT be
commonly used.
When To Use Connection
Manager
Multiplexing may also be used to reduce
network traffic/costs across a WAN.
Some LAN/WANs are charged on a per
connection basis - with multiplexing
enabled (say, 5 network connections
multiplexed through one physical
transport), costs are significantly lower.
Notes:
WAN costs are sometimes charged on a per connection basis. Multiplexing
allows you to reduce the number of physical transports opened and thus reduce
costs.
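The saving is simple arithmetic, sketched below in Python. The function name and the figure of 1000 network sessions are assumptions made for the example; the ratio of 5 sessions per transport is the one used on the slide.

```python
import math


def physical_transports(network_sessions, sessions_per_transport):
    """Number of physical transports needed when several network
    sessions are multiplexed through each transport."""
    if sessions_per_transport < 1:
        raise ValueError("sessions_per_transport must be at least 1")
    return math.ceil(network_sessions / sessions_per_transport)


# 1000 sessions without multiplexing versus 5 sessions per transport.
print(physical_transports(1000, 1))
print(physical_transports(1000, 5))
```

If the WAN is charged per connection, the billable connection count falls by roughly the multiplexing ratio.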
When to Use Connection
Manager
Security - Cman_rules provide
high degree of access control.
NOTE: Access Control and MP Support
do NOT need MTS configured.
Notes:
When using Connection Manager for Access Control or Multi-Protocol
Support, if the SID is used instead of an MTS_SERVICE name then a
dedicated connection will be made.
The End
DSI 306 - Unit 3, Manual Partitioning 3--1
Design: Alok Satyawadi (Advanced Analysis)
Implementation: Troy Anthony (Core Competency development)
Review:
80 minutes Lecture
30 minutes Exercises
110 minutes Total
Lesson 3
Manual Partitioning
The aim of this lesson is to have participants understand the need for partitioning, and
use the features provided in Oracle V7.3 and above to implement manual partitioning
effectively.
This concept may be referred to as manual partitioning or partition views and is
essentially the breaking up of large tables into smaller pieces and then defining a view
that reconstructs the large table. This methodology acted as a precursor to table
partitioning introduced with Oracle V8.0.3.
Partition views did exist before Oracle V7.3; however, limitations within the optimiser
meant that they did not always function as envisaged.
Objectives
At the end of this lesson, you will be able
to
Understand performance issues related
to large tables
Understand operational issues with
large tables
Be aware of some of the limitations
inherent with manual partitioning
Topics
Topics to be discussed include:
Manual Partitioning
Star Joins and Star Schemas
Bitmapped Indexes
Four main topics will be discussed in this lesson. These are:
1) Partitioning. Its advantages and how partitioning can be achieved from Oracle version
7.3 onwards, using a technique called manual partitioning.
2) Star Joins - A relationship between disjoint tables that can be exploited by the optimizer,
often resulting in better performance. The restrictions associated with star and snowflake joins will
be investigated.
3) Star schemas - the schema structure (layout) that is referred to as a star design.
4) Bitmapped Indexes - a new index type, available from Oracle version 7.3 onwards. The
suitability and structure of this index will be discussed.
Manual Partitioning
Reasons for Partitioning
Data sets / tables are getting bigger
Breaking tables into smaller pieces
(partitions) gives improvement in:
Availability
Administration
Performance
Data sets and individual tables are growing. It is not uncommon now to have applications with
tables containing hundreds of thousands of rows, encompassing gigabytes of information.
Customer examples:
1. US based retail chain. 750 GB information stored as 13 sets of monthly
data. 100 users issuing complex queries. Multi-dimensional database design.
2. Australasian telecommunications company.
Before Oracle 8 introduced the concept of table partitioning, DBAs and database designers had
to manually break large tables into smaller pieces. As with table partitioning, manual
partitioning means that individual partitions can be:
- added or dropped independently
- reorganised, backed up and restored independently
- used in a SELECT clause (query or sub-query)
- split, merged and loaded incrementally whilst maintaining local indexes
- replaced via partitioned view to create a new configuration (effectively splitting,
dropping or adding online).
- eliminated from a search if falling outside of a key range
Technical Note: Table Partitioning is the term being used to differentiate
between Partition views and the Oracle 8 method of partitioning.
Advantages of Partitioning
Very large databases (VLDBs)
Reducing downtime for maintenance
Reducing downtime due to failure
DSS performance
I/O performance
Disk striping: performance vs
availability
Partition transparency
These represent different classes of databases or situations where partitioning has an
impact on the feature listed.
VLDBs - Databases that owe their size to a few very large tables and indexes.
May be on-line transaction processing (OLTP) or decision support systems (DSS).
OLTP databases are designed for large numbers of concurrent transactions. DSS are
designed for complex query access to large amounts of data.
Maintenance - Can perform reorganisation, rebuilding, splitting, truncating and
so forth on individual partitions without disabling access to the whole table. Some
portions may still be accessible (of course it depends on how the queries are
structured).
Failure - The failure of an individual drive (or drives) will only restrict access to the
partitions resident on that drive. Some of the tables may still be accessed (very
dependent on striping, and on how the query/view is constructed).
DSS - Can take advantage of partition elimination to improve the performance
of queries.
I/O performance - Can balance the I/O load by positioning individual tables on
separate devices (dependent on striping).
Disk Striping - Can stripe partitions across disks to maximize availability or
performance. A trade-off exists between the two.
Partition Transparency - As the partitions are individual tables, access to
these tables can be restricted to favour certain jobs, or some partitions can be avoided when
scheduled jobs may affect performance.
How to achieve partitioning
In Oracle 7 partitioning is achieved with
Partition Views (PV)
In Oracle 8 partitioning is achieved with
Partitioned Tables (PT)
In order to provide some of the advantages of partitioning (discussed in earlier slides)
the concept of Partition Views (PV) was introduced in the 7.3 timeframe. This provided
some of the advantages that Oracle 8 offered with Partitioned Tables (PT), such as
manageability and improved performance.
PVs simulate Partitioned Tables (PT) through using a UNION-ALL view over a number
of non-partitioned tables. A discussion on the similarities and the deficiencies of PV
versus PT follows.
Similarities of Partition Views
with Table Partitioning
Used in a SELECT clause
Underlying indexes behave like local
indexes
Independent partitions
Split and merge support
Incremental and parallel loading possible
PVs simulate PTs through a UNION-ALL view over a series of non-partitioned tables.
Constraints (typically check constraints) provide a simulation of partitioning criteria, as
well as enabling partition elimination.
PVs are similar to PTs in the following respects:
1. A PV can be used similarly to a table in a SELECT clause (in both
queries and sub-queries).
2. Underlying indices on each table of the PV behave like local indices on a
PT.
3. The optimizer can skip irrelevant partitions where predicates relate the
value of the partitioning column to a constant or bind variable.
4. Partitions can be added or dropped independently.
5. Partitions can be re-organised, backed up and restored independently.
6. Partitions can be split and merged independently.
7. A PV can be loaded incrementally, in parallel whilst maintaining local
indices.
Deficiencies of Partition Views
over Partitioned Tables
DDL must be issued separately
Administrative functions complicated
A PV cannot be the target of DML
SQL*Loader does NOT support PVs
Plan size dependent on no. of partitions
No global indexes
No support for concatenated partition
keys
When compared to a PT a PV is deficient in the following respects:
1. DDL commands must be issued against each underlying table separately.
For example: To analyze a PV a user must analyze all underlying tables,
to add an index a user must add an index to all underlying tables. These
operations may be submitted to each partition in parallel.
2. Administrative operations (such as split) must be performed as separate
operations on each underlying table of the PV.
For example a split operation would consist of one or two create table
as select operations followed by a redefining of the PVs view text.
A considerable amount of planning and programming each PV operation
must be weighed against the flexibility that PVs provide.
3. A PV may not be the target of a DML statement.
4. Sql*Loader does not support PVs. The data must be externally
partitioned. PTs also require user partitioning for parallel loads.
5. The plan size for a PV will grow with the number of partitions, regardless
of how many are accessed. The shared pool plan will have control blocks
allocated for all partitions.
6. No equivalent for a global index exists. This may make PVs unsuitable for
OLTP applications, although not necessarily a problem for Data
Warehousing.
7. Concatenated partition keys are not supported.
Advantages of Partition Views
over Partitioned Tables
Partitions can be remote tables
Partitioning can be check constraints
Partitions can overlap or have range
gaps
Simultaneous partitioning over more
than one column
Splitting, adding, or dropping on-line
Different access paths for each table
PVs have some advantages over PTs, in that:
1. A partition can be a remote table.
2. Range partitioning is just one option; the partitioning can be as expressive
as a check constraint.
3. Partitions can overlap or have gaps in the range. For example: P1 = 1..10,
P2 = 7..17, P3 = 20..30. A query for value 9 would search P1 and P2; a
query for value 19 would search no partitions.
4. Simultaneous partitioning by more than one column is possible. For
example an orders table could be partitioned by ordernumber and
orderdate.
5. Replacing the PV view enables ease of change of the configuration.
Partitions can be split, added or dropped on-line.
6. Access methods to the underlying tables may be different. The optimizer
will attempt to choose the best means of data retrieval.
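The overlap and gap behaviour described in point 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not the optimizer's actual logic: the dictionary PARTITIONS and the function partitions_to_search are hypothetical names, and each partition is modelled purely by the inclusive range of its check constraint.

```python
# Hypothetical model: each partition is described only by the inclusive range
# taken from its check constraint (the P1..P3 ranges of the example).
PARTITIONS = {"P1": (1, 10), "P2": (7, 17), "P3": (20, 30)}


def partitions_to_search(value, partitions=PARTITIONS):
    """Return the names of the partitions whose range could contain value."""
    return [name for name, (low, high) in partitions.items()
            if low <= value <= high]


if __name__ == "__main__":
    print(partitions_to_search(9))    # overlap: both P1 and P2 qualify
    print(partitions_to_search(19))   # gap: no partition qualifies
```

A value in an overlap returns more than one partition, and a value in a gap returns none, which is the behaviour a PT with strict ranges cannot express.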
Creating Partition Views
The steps needed to create partition views
are as follows:
Create multiple identical tables
Load the tables
Enable check constraints (optional but
recommended)
Add indexes (optional)
Analyze the partitions
Create the partition view
Ensure session parameters enabled
These steps will be explained with an example over the next few slides.
Class Management Note: These steps are expanded on in the following slides.
Creating Partition Views
SQL> create table q1_sales (sales_date date,
2 order_no number,
3 salesman varchar2(20),
4 constraint c0 check (sales_date between
5 to_date('01-JUL-1997','DD-MON-YYYY') and
6 to_date('30-SEP-1997','DD-MON-YYYY')) disable )
1. Create the base tables:
Creating base tables with check constraints (more about their use later):
SQL> l
1 create table q1_sales (sales_date date,
2 order_no number,
3 salesman varchar2(20),
4 constraint c0 check (sales_date between to_date('01-JUL-1997','DD-MON-YYYY') and
5* to_date('30-SEP-1997','DD-MON-YYYY')) disable )
SQL> /
Table created.
SQL> l
1 create table q2_sales (sales_date date,
2 order_no number,
3 salesman varchar2(20),
4 constraint c1 check (sales_date between to_date('01-AUG-1997','DD-MON-YYYY') and
5* to_date('31-DEC-1997','DD-MON-YYYY')) disable )
SQL> /
Table created.
Class Management Note: This is only half the SQL; another two tables with
adjusted date ranges would be created.
Creating Partition Views (cont)
Load the base tables:
- SQL*Loader
% sqlldr scott/tiger direct=true control=q1.ctl
- Insert statements
SQL> insert into q1_sales select * from sales_data
2 where sales_date between
3 to_date('01-JUL-1997','DD-MON-YYYY') and
4 to_date('30-SEP-1997','DD-MON-YYYY') ;
Use some method to load the base tables, either from an external source via
SQL*Loader, or from an existing table source.
For improved performance, disable any constraints that may exist. Ensure that the data
files for SQL*Loader match the partition range so that the constraints can be enabled
successfully.
Enable check constraints (recommended)
SQL> alter table q2_sales enable constraint c1;
Table altered
SQL> alter table q3_sales enable constraint c2;
Table altered
Add indexes (optional)
SQL> create index q1_ind on q1_sales
(sales_date,order_no);
Index created
SQL> create index q2_ind on q2_sales
(sales_date,order_no);
Index created
The use of check constraints is recommended by Oracle. Enabling the check
constraints allows the optimizer to skip irrelevant partitions. The same functionality can
be achieved through the use of WHERE clauses in the partition view statement. For
example:
SQL> alter table q1_sales add constraint c0 check (sales_date between
to_date('01-JUN-1997','DD-MON-YYYY') and to_date('30-SEP-1997','DD-MON-YYYY'));
SQL> alter table q2_sales add constraint c1 check (sales_date between
to_date('01-AUG-1997','DD-MON-YYYY') and to_date('30-DEC-1997','DD-MON-YYYY'));
SQL> create view sales_view as
select * from q1_sales UNION ALL
select * from q2_sales;
is the equivalent of
SQL> create view sales_view as
select * from q1_sales where sales_date between
to_date('01-JUN-1997','DD-MON-YYYY') and to_date('30-SEP-1997','DD-MON-YYYY')
UNION ALL
select * from q2_sales where sales_date between
to_date('01-AUG-1997','DD-MON-YYYY') and to_date('30-DEC-1997','DD-MON-YYYY') ;
Although both definitions expose the same partition ranges, the functionality of
these two views would be different (explanation follows).
Analyze the partitions
SQL> analyze table q1_sales compute statistics;
Table analyzed
SQL> analyze table q2_sales compute statistics;
Table analyzed
The cost based optimizer is always used with partition views. An ANALYZE statement
against each partition involved in the view is necessary.
These can be run in parallel to reduce the time to calculate the statistics (provided that
the partitions are located on different drives, and that sufficient machine resources are
available).
Creating Partition views (cont)
SQL> create or replace view sales_data as
2 select * from q1_sales UNION ALL
3 select * from q2_sales UNION ALL
4 select * from q3_sales UNION ALL
5 select * from q4_sales ;
Create the view
The view ties together the smaller tables, once they have been created or identified.
ALL underlying tables must have identical definitions, and either all columns or a
SELECT * must be specified in the SELECT statement. If indexes are to be used then all
underlying tables must have identical indexes.
As mentioned previously WHERE clauses can be specified in the view definition to
provide the partition boundary limitations. This is not recommended for a number of
reasons:
1. Check constraint predicates are not evaluated per row for a query.
2. Check constraint predicates prevent inserting rows into the incorrect
partition (where they would be lost/hidden from the view).
3. The data dictionary provides a clearer picture of partitioning information.
The only benefit in creating a view with a WHERE clause providing the partition range
information is that a mixture of local and remote tables can be used (check constraints
are not retrieved from remote databases). An example of this view could be:
SELECT * FROM eastern_sales@east.com.au WHERE loc = 'EAST'
UNION ALL
SELECT * FROM western_sales@west.com.au WHERE loc = 'WEST' ;
Creating Partition views (cont)
Ensure that the initialisation / session
parameter is set to TRUE
PARTITION_VIEW_ENABLED = TRUE
(in init<XXX>.ora)
Set using an Alter Session command
SQL> ALTER SESSION SET PARTITION_VIEW_ENABLED =
2 TRUE ;
Session Altered
When the parameter PARTITION_VIEW_ENABLED is set to TRUE the option of
partition elimination is available to the optimizer.
If a query contains a predicate that limits the results to a subset of the view's partitions
then the optimizer can eliminate those partitions that are not required from the query.
This operation is performed at run-time. Even though partitions are skipped there is
still some overhead per partition which will be most noticeable for those partitions
scanned via an index and only returning a small number of records.
Note: COMPATIBLE has to be set greater than 7.3 if this is being executed in version 7.
Partition View Performance
The optimisation of queries through a
partition view has some stringent
requirements:
Only simple queries are allowed
Practical limits exist on the number of
base tables
A UNION-ALL view will be treated as a manual partition if all queries composing it are
simple queries. That is, in every branch of the union-all view, the queries are of the
following form:
1. Exactly one table is specified in the FROM clause, and the table specified
is a base table and not a view (nesting of manual partitions is NOT supported).
2. The select list is a * or a literal expansion of this (ie. ALL columns).
3. The query does not use any of the following constructs: where clause, group
by, aggregate functions or distinct, rownum, start with or connect by clause.
The schema definition must be identical for each base table referred to in the union-all
branch, and the number and definition of indexes should be identical.
Practical limits generally restrict the size of the base objects. Whilst we have raised
some structural limits that governed the database size, practical limits still apply. For
example, in version 7 the maximum number of datafiles was 1022 on most ports. This
limit has now risen to 1022 datafiles per tablespace, however operational
considerations such as maintenance and recoverability will still restrict table size to
something in the order of 1 to 10 GB (but this may be rising due to the increased
maximum size of a segment within the one tablespace).
Parallelism
[Figure: a query served by two slave sets. Slaves 1 to 3 are the readers; slaves 4 to 6 are the sorters.]
Consider a view defined as follows:
Create view global_emp_view
as select ename,job,location
from asiapac_emp
where department = 'AP'
UNION ALL
select ename,job,location
from emea_emp
where department = 'EMEA'
UNION ALL
select ename,job,location
from us_emp
where department = 'US'
The illustration above would result from a query like the following:
Select location, count(job)
from global_emp_view
where job = 'Consultant'
group by location;
The slave sets would have been passed a DFO similar to this:
Select /*+rowid(asiapac_emp)*/ ename, job,location
from asiapac_emp
where department = 'AP' and rowid between :1 and :2
UNION ALL
Select /*+rowid(emea_emp)*/ ename, job,location
from emea_emp
where department = 'EMEA' and rowid between :3 and :4
UNION ALL
Select /*+rowid(us_emp)*/ ename, job,location
from us_emp
where department = 'US' and rowid between :5 and :6
The Query Coordinator (QC) hands out rowid ranges, via the rowid manager for each
table, to the UNION ALL until all the rowids for each table have been exhausted.
The scanners (slaves 1 .. 3) perform the UNION ALL select statement, giving the net
effect of all tables being scanned in parallel.
The scanners partition their rows by hash key and pass them on to the consumers, who
perform the aggregation and group by operations.
The concurrent scanning of all branches of the UNION ALL view, which allows for a
second slave set, was introduced in 7.3.
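The flow of rows from scanners to consumers can be imitated with a small Python simulation. This is a sketch under stated assumptions, not the server's implementation: it runs serially, the sample rows in TABLES are invented, list positions stand in for rowids, and the names rowid_ranges, scan and run_query are hypothetical. It assumes the hash key is the GROUP BY column.

```python
from collections import Counter
from zlib import crc32

# Invented sample data: (ename, job, location) rows for each base table.
TABLES = {
    "asiapac_emp": [("Lee", "Consultant", "Sydney"), ("Wu", "Manager", "Sydney"),
                    ("Ito", "Consultant", "Tokyo")],
    "emea_emp": [("Roux", "Consultant", "Paris"), ("Berg", "Consultant", "Oslo")],
    "us_emp": [("Cole", "Consultant", "Boston"), ("Diaz", "Analyst", "Boston"),
               ("Hart", "Consultant", "Boston")],
}


def rowid_ranges(row_count, chunk):
    """Split positions 0..row_count-1 into inclusive (low, high) ranges."""
    return [(low, min(low + chunk, row_count) - 1)
            for low in range(0, row_count, chunk)]


def scan(table, low, high, job):
    """Scanner: read one range of a table and apply the query predicate."""
    return [row for row in TABLES[table][low:high + 1] if row[1] == job]


def run_query(job="Consultant", consumers=3, chunk=2):
    # One Counter per consumer; each consumer owns a disjoint set of locations.
    partial = [Counter() for _ in range(consumers)]
    for table, rows in TABLES.items():
        for low, high in rowid_ranges(len(rows), chunk):  # ranges handed out by the QC
            for _ename, _job, location in scan(table, low, high, job):
                # Hashing the GROUP BY key decides which consumer gets the row.
                target = crc32(location.encode()) % consumers
                partial[target][location] += 1
    # The QC combines the consumers' results; their keys never overlap.
    result = Counter()
    for counts in partial:
        result.update(counts)
    return dict(result)
```

Because every row for a given location reaches the same consumer, each consumer can finish its groups without consulting the others.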
Instructor Note: Stress that the view must select ALL the columns of the base
tables. However, it is possible to select a sub-set from the view.
Parallelism and
Alternative execution plans
Create view Q1 as
Select * from Jan_orders
union all
Select * from Feb_orders
union all
Select * from Mar_orders
Jan_orders
Feb_orders
Mar_orders
SELECT sum(revenue) FROM Q1
WHERE ORDER_DATE BETWEEN
'29-JAN-98' AND '26-FEB-98';
An index on ORDER_DATE in each
table allows:
- Quick index scan of Jan_orders due to
high index selectivity.
- Full table scan of Feb_orders due to
low index selectivity.
- An index probe of Mar_orders
eliminates this partition.
An EXPLAIN PLAN output could be something similar to that below.
step (card,bytes,cost)
--------------------------------------------------------------------
Q13 (34,2448,353) Serial
VIEW Q1 (22,682,345) CombUp Q2235000
UNION-ALL PARTITION (,,) CombUp Q2235000
TABLE ACCESS BY ROWID Jan_orders (4,432,2) CombUp Q2235000
INDEX RANGE SCAN L_Jan_orders (,,) CombUp Q2235000
TABLE ACCESS BY ROWID Feb_orders (2,216,2) CombUp Q2235000
TABLE ACCESS FULL Feb_orders (,,) CombUp Q2235000
TABLE ACCESS BY ROWID Mar_orders (1,216,2) CombUp Q2235000
INDEX RANGE SCAN L_Mar_orders (,,) CombUp Q2235000
Note: The table Mar_orders has been eliminated; however, the last line of the
explain plan shows an index lookup on this table. This is a requirement for the
elimination (in version 7) and would probably not occur if Table Partitioning
(version 8 option) was being used.
Manual Partitioning Bugs
Bugs that have an effect on manual
partitioning include:
504930 (v7.3.3)
489631 (v7.3.2) not a bug
470360 (v7.2.3)
415446 (v7.3.2) not a bug
416511 (v7.3.3)
688520 (v8.0.4)
This is a list of some common bugs and issues that customers may face. It
is not an exhaustive list, but covers some of the more relevant problems. Bug
descriptions and solutions are readily available in WebIV.
Hdr: 504930 7.3.3 RDBMS 7.3.3 PERFORMANCE PRODID-5
Abstract: PARTITION ELIMINATION NOT WORKING WHEN USING NL JOINS AND
PARTITION COL IS DATE
Hdr: 489631 7.3.2.3.0 RDBMS 7.3.2.3.0 PAR EXECUTION PRODID-5
Abstract: QUERIES ON PV TAKE TOO LONG
This is not a bug but shows a common problem and the reason why the feature is
functioning as defined.
Hdr: 470360 7.2.3 RDBMS 7.2.3 QRY OPTIMIZER PRODID-5
Abstract: INCORRECT NUMBER OF ROWS RETURNED VIA SELECT FROM A UNION VIEW
Hdr: 415446 7.3.2 RDBMS 7.3.2 QRY OPTIMIZER PRODID-5 PORTID-2
Abstract: FULL TABLE SCAN IS DONE ON ALL PARTITIONS OF PARTITIONED-VIEW
Once again not a bug, but a common concern with customers and an explanation why
this feature is not working.
Hdr: 416511 7.3.3 RDBMS 7.3.3 SQL EXECUTION PRODID-5
Abstract: QUERY WITH PARTITION VIEW AND IN PREDICATE GIVES WRONG ANSWER
Hdr: 688520 8.0.4.1 RDBMS 8.0.4.1 QRY OPTIMIZER PRODID-5 PORTID-453
Abstract: INCORRECT RESULTS WITH PARTITION VIEW AND BITMAP INDEXES
Retrieval Mechanisms suitable
for Data Warehouses
In the following section we will discuss
Star Joins and Star Schemas
Bitmapped Indexes
Star Joins and Star schemas
Star Join Topology
PNO COUNTCODE DISC SALE
PNO PRODEDESC NUM
WEEKNO WEEKNAME START
COUNTCODE LOCATION
Fact Table
Lookup Table
Dimension Table
Lookup Table
When referring to a STAR query, typically we are talking about the joining of a large
table (termed a fact table) with 2 or more smaller tables (termed dimension or
lookup tables).
No relationship exists between the dimension tables. Drawn as a diagram, the join
could look like this:
[Figure: the SALES table in the centre, joined to the CUSTOMERS, SKUS, SUPPLIERS and STORES tables]
As you can see, the join forms a star pattern, which is where the query obtained its
name. In this example the SALES table is the FACT table. The other tables add additional
information to the sales table, and are the DIMENSION tables. These tables allow
such questions as "Who made this purchase?" and "Where was this bought?" to be
answered, illustrating typical uses of data warehouses.
The example shown on the preceding page is a very simple example of a star
schema. Star queries are typical queries applied to star schemas. Many business
models do not cleanly correspond to these simple schemas. There may be many more
dimension tables, or additional foreign key/primary key relationships. Models such as
these lead to very complex structures, often called snowflake schemas (at least in
some Oracle bulletins).
A snowflake schema (ie. a complex star schema):
[Figure: a FACT table surrounded by ten Dimension tables]
The suitability of the star query model still applies in this more complex model.
When do we consider Star
Joins?
A STAR hint is specified
If more than 6 non-single row tables are
specified in the query
For a STAR query to be considered the following restrictions apply (see kko.c) :
1. A STAR hint is specified in the query OR
2. More than 6 non-single row tables are included in the join.
Based on this information we then evaluate the suitability of a star join according to :
1. At least 3 tables are included in the join.
2. The fact table must have at least KKOFCMIN rows. This defaults to
15000.
3. The larger table must have a concatenated index on at least two columns,
the leading part of which must be referenced by the predicate clause.
(Three bitmap indexes satisfy this condition as well).
4. There can be no access method or join method hints on the large table.
5. It must be legal for the large table to be last in the join order.
6. The query predicate must include a where clause of the type <column> =
constant for at least one of the dimension tables.
Note: There is implied use of the Cost Based Optimizer.
Note: Function kko1ro() recognises tables that are independently single row,
where a rowid predicate and a unique index return only one row. It sets cardinality to
1, which should improve performance for tables that haven't been analysed.
Note: Prior to version 7.3, if the query contained greater than 5 tables then star
joins were not considered.
Join ordering
[Figure: the dimension tables are combined into a Cartesian product (X), which is joined to the FACT TABLE through its concatenated index]
When a star join is considered we evaluate performing a cartesian product of the
smaller dimension tables, and then joining this via the concatenated index to the
larger fact table. Cartesian products are considered extremely expensive and are
generally avoided in an Online Transaction Processing (OLTP) environment. The Rule
Based Optimizer will never choose to perform a cartesian product and therefore can
never allow star queries. However, the cost based optimizer (from version 7.3
onwards) recognises that in certain situations the pre-joining of smaller tables into a
cartesian product and then joining that result with a concatenated index, is more
efficient than joining each dimension table with the larger fact table sequentially.
Star queries are evaluated along with other permutations provided that the earlier
criteria are met (ie. No conflicting hints, greater than 6 tables in the join, fact table has
over 15000 rows, and so forth).
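The shape of a star join can be sketched in Python as follows. This is an illustrative model only: the dimension keys, the fact rows and the names concat_index and star_join_sum are invented, and a dictionary keyed by (store_id, prod_id) stands in for the concatenated index on the fact table.

```python
from itertools import product

# Invented dimension keys that survive the filters on each dimension table.
stores = [10, 20]
products = [1, 2, 3]

# Invented fact rows: (store_id, prod_id, promo_amt).
sales = [(10, 1, 5.0), (10, 3, 2.5), (20, 2, 4.0), (30, 1, 9.0), (20, 2, 1.0)]

# Stand-in for the concatenated index on (store_id, prod_id): the key is built
# in index column order and maps to the positions of the matching fact rows.
concat_index = {}
for position, (store_id, prod_id, _amt) in enumerate(sales):
    concat_index.setdefault((store_id, prod_id), []).append(position)


def star_join_sum():
    """Cartesian product of the dimensions, then one index probe per combination."""
    total, probes = 0.0, 0
    for key in product(stores, products):      # 2 x 3 = 6 combinations
        probes += 1
        for position in concat_index.get(key, []):
            total += sales[position][2]
    return total, probes
```

The fact table is touched only through index probes, one per dimension combination, so the approach pays off when the Cartesian product is small relative to the fact table.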
The function kkooqb() (optimize query block), the top level function of the optimizer,
follows join ordering along these lines (pseudo code):
IF an ORDERED hint is specified, then use the order given in the FROM clause
ELSE IF (non-single-row tables >= 6 OR STAR hint)
Is a star join possible? GO TO STAR transformation, otherwise
General case: Generate permutations until the calculated cost of
evaluating more permutations exceeds the calculated cost of the
best plan seen so far (ie. while resource_cost > (#permutations * 0.3 * #tables)).
Star case: We fix the largest table last and permute the other tables, which for a star
join will be disconnected. This should find the best path through the small tables. The
order of the "dimensions" will not necessarily match the concatenated index order
which will give less than optimal performance.
The setting of event 10053 to trace optimizer progress will generate more information
on the permutations generated in query evaluation.
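As a rough illustration of the star case, the Python sketch below fixes the largest table last and permutes the remaining tables, stopping early with the cutoff from the pseudo code. Several things here are assumptions rather than optimizer behaviour: the row counts in ROWS are invented, toy_cost is a made-up cost function, and resource_cost is interpreted as the cost of the best plan found so far.

```python
from itertools import permutations

# Invented row counts; SALES plays the role of the large fact table.
ROWS = {"SALES": 100_000, "STORES": 20, "PRODUCTS": 50, "WEEKS": 4}


def toy_cost(order):
    """Made-up cost: sum of the running row-count products over the small tables."""
    cost, running = 0, 1
    for table in order[:-1]:
        running *= ROWS[table]
        cost += running
    return cost


def star_case_order(rows=ROWS, factor=0.3):
    largest = max(rows, key=rows.get)            # the fact table is fixed last
    small = [t for t in rows if t != largest]
    best_order, best_cost, tried = None, float("inf"), 0
    for perm in permutations(small):
        # Cutoff: stop once the search effort outweighs the best plan seen so far.
        if best_order is not None and best_cost <= tried * factor * len(rows):
            break
        tried += 1
        order = perm + (largest,)
        cost = toy_cost(order)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost, tried
```

With only three small tables the cutoff never fires and all six permutations are examined; it matters once the number of tables makes exhaustive permutation impractical.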
Star Joins - Explain Plans
SQL> select sum(promo_amt) from
prod_id_desc a, store_id_desc b,sales_desc c
where a.prod_id = c.prod_id and b.store_id = c.store_id
and c.sales_date = '16-MAY-98';
SQL> select /*+ STAR */ sum(promo_amt) from
prod_id_desc a, store_id_desc b,sales_desc c
where a.prod_id = c.prod_id and b.store_id = c.store_id
and c.sales_date = '16-MAY-98';
The following table descriptions were used:
SQL> desc prod_id_desc; SQL> desc store_id_desc
Name Type Name Type
------------------------------- ----- ------------------------------- ----
PROD_ID NUMBER STORE_ID NUMBER
PRDO_ID_NAME VARCHAR2(20) STORE_ID_NAME VARCHAR2(20)
SQL> desc sales_desc
Name Type
------------------------------- ----
SALES_DATE DATE
STORE_ID NUMBER
PROD_ID NUMBER
PROMO_AMT NUMBER
PROMO_QTY NUMBER
The queries shown generated the following explain plans :
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=50)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=6 Card=1 Bytes=50)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'SALES_DESC' (Cost=4 Card=1 Bytes=24)
5 4 INDEX (RANGE SCAN) OF 'DATE_STORE_PROD' (NON-UNIQUE) (Cost=3 Card=1)
6 3 TABLE ACCESS (FULL) OF 'STORE_ID_DESC' (Cost=1 Card=21 Bytes=273)
7 2 TABLE ACCESS (FULL) OF 'PROD_ID_DESC' (Cost=1 Card=21 Bytes=273)
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1345 Card=1 Bytes=50 )
3 2 MERGE JOIN (CARTESIAN) (Cost=22 Card=441 Bytes=11466)
5 3 SORT (JOIN) (Cost=21 Card=21 Bytes=273)
6 5 TABLE ACCESS (FULL) OF 'PROD_ID_DESC' (Cost=1 Card =21 Bytes=273)
7 2 TABLE ACCESS (BY INDEX ROWID) OF 'SALES_DESC' (Cost=3 Card=1 Bytes=24)
8 7 INDEX (RANGE SCAN) OF 'DATE_STORE_PROD' (NON-UNIQUE) (Cost=2 Card=1)
As seen in the second explain plan, a CARTESIAN product of the two smaller tables is
joined, via a NESTED-LOOP-JOIN, with the index lookup of the fact table. This is as
expected with a STAR join. The hint was necessary to force the star join to occur as there
were too few tables involved in this join example.
Note:
What is surprising here is the SORT (JOIN) operation in step 5. This should not be
occurring, as we are generating a cartesian product. Bug 331893 addressed this issue in
version 7.3 (which is where this example originated); however, this bug is closed with status
92 (Not a BUG), and an explanation stating in part "... There is no sort done for a
cartesian join." If no sorting is performed, then why a sort appears in the explain plan is not
addressed. Further investigation of this scenario is warranted, and may make a useful
exercise for those interested.
Common Bugs
Examples of some of the more common
bugs include:
399815 (v7.3.3) not a bug
563101 (v8.0.3)
681257 (v8.0.4) - Not resolved
635105 (v8.0.4)
521637 (v8.0.4)
Hdr: 399815 7.3.2.1 RDBMS 7.3.2.1 QRY OPTIMIZER PRODID-5 PORTID-358
Abstract: STAR QUERY - TAKING TOO LONG TO EXECUTE. RUNS FOR HOURS.
Not a bug but a useful discussion.
Hdr: 563101 8.0.3.0.0 RDBMS 8.0.3.0.0 QRY OPTIMIZER PRODID-5 PORTID-453
Abstract: INCORRECT RESULTS ON STAR QUERY
Hdr: 681257 8.0.4.1 RDBMS 8.0.4.1 QRY OPTIMIZER PRODID-5
Abstract: STAR_TRANSFORMATION_ENABLED=TRUE ON SELECT ON PARALLEL JOIN
Hdr: 635105 8.0.4 RDBMS 8.0.4 PRODID-5 PORTID-451
Abstract: STAR QUERY TRANSFORMATION NEEDS TO ACT ON INLINE VIEWS
Hdr: 521637 8.0.4 RDBMS 8.0.4 PRODID-5 PORTID-453 ORA-7445
Abstract: SIGSEGV ON PARALLEL JOIN WITH STAR_TRANSFORMATION_ENABLED=TRUE
Bitmapped Indexes
Bitmapped Indexes
Introduced in Oracle 7.3
Useful in queries where :
the WHERE clause contains multiple
predicates on low cardinality columns
individual predicates on these
columns select a large # of rows
the tables accessed have many rows
Bitmapped indexes were introduced in Oracle version 7.3. Prior to version 7.3.2 it was
necessary to set event(s) 10111, 10112, 10114 (and possibly 10113 and 10115) to
force bitmaps to operate. The functionality under these events may be suspect.
Reinforcing the definitions mentioned: the cardinality of a column is determined by the
number of distinct values it contains. The ratio of the distinct values to total values
determines whether a column is of low or high cardinality.
A Bitmap Index is a non-unique B*Tree index in which the leaf nodes contain key
values associated with a compressed list of rowids (as opposed to key,rowid pairs).
More on this follows.
Bitmap indexes (hereafter referred to as bitmaps) have become popular as a query
resolution tool, with Data Warehouse environments being seen to be able to benefit
from their access methods. Bitmaps can provide pre-filtering of data via Boolean
operations, which occur prior to table access, minimizing the table access required
and allowing rapid query execution.
How do Bitmaps work?
Item Table
itemno  Size    Colour  Stock
001     Medium  Blue    80
002     Small   Red     29
003     Large   Green   11
004     Large   Blue    07
005     Medium  Red     15
006     Small   Red     24
007     Small   Green   17
008     Medium  Blue    12
...
Index on Size
SMALL   0 1 0 0 0 1 1 0
MEDIUM  1 0 0 0 1 0 0 1
LARGE   0 0 1 1 0 0 0 0
Index on Colour
Blue    1 0 0 1 0 0 0 1
Green   0 0 1 0 0 0 1 0
Red     0 1 0 0 1 1 0 0
The example is meant to illustrate the structure of the bitmap index. A bitmap can be
created on one or more columns of a table. The bitmap structure has the format:
key value, start rowid, end rowid, bitmap segment
These structures are then stored in a conventional b*tree arrangement.
A proprietary / patented compression algorithm is used to compress the rowids, using
byte-aligned compression where the bitmap is divided into bytes and stored as GAPs
(bytes that contain only zeros) or MAPs (bytes that contain 1s). The GAPs are then
compressed, but the MAPs are not.
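The GAP/MAP idea can be illustrated with a toy encoder in Python. This is emphatically not Oracle's patented on-disk format, which is not described here; it is a minimal run-length sketch that only shows the principle of collapsing runs of all-zero bytes while storing the other bytes verbatim. The names encode and decode and the token layout are assumptions made for the example.

```python
def encode(bits):
    """Encode a bit string (length a multiple of 8) as GAP/MAP tokens."""
    tokens = []
    for i in range(0, len(bits), 8):
        byte = bits[i:i + 8]
        if byte == "00000000":
            if tokens and tokens[-1][0] == "GAP":
                tokens[-1] = ("GAP", tokens[-1][1] + 1)   # extend the current run
            else:
                tokens.append(("GAP", 1))
        else:
            tokens.append(("MAP", byte))                  # stored verbatim
    return tokens


def decode(tokens):
    """Rebuild the original bit string from the tokens."""
    return "".join("00000000" * t[1] if t[0] == "GAP" else t[1] for t in tokens)
```

A sparse bitmap (a key value held by few rows) is mostly GAP bytes and so shrinks well, whereas a dense bitmap is mostly MAP bytes and gains little.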
Each bitmap segment may represent only part of the rowid range for a given key
value. During the bitmap creation process (covered later) the minimum row length is
calculated by examining the column definitions, and the maximum number of rows per
block is determined. The bitmap segments are then constructed so that each segment
is no larger than half of an Oracle block. So expanding on the example illustrated -
assume that we can store 4 rows per block, then the index on colour would look like:
Bitmap Index on Column Colour
key-value start-rowid end-rowid bitmap segment
Blue 1.0 2.3 10010001
where the start and end-rowid values are block#.slot#.
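The index entries for the colour column can be derived from the item table with a short Python sketch. It is a simplification under stated assumptions: one uncompressed segment covers the whole rowid range for each key, the first block is numbered 1, and the names COLOURS, rowid and build_bitmap_index are hypothetical.

```python
# Colour of items 001..008, in table order (from the item table example).
COLOURS = ["Blue", "Red", "Green", "Blue", "Red", "Red", "Green", "Blue"]
ROWS_PER_BLOCK = 4   # assumption taken from the worked example


def rowid(position, first_block=1):
    """Return block#.slot# for the row at a 0-based table position."""
    block = first_block + position // ROWS_PER_BLOCK
    return f"{block}.{position % ROWS_PER_BLOCK}"


def build_bitmap_index(values):
    """Map each key value to (start rowid, end rowid, bitmap segment)."""
    entries = {}
    for key in sorted(set(values)):
        bitmap = "".join("1" if value == key else "0" for value in values)
        entries[key] = (rowid(0), rowid(len(values) - 1), bitmap)
    return entries
```

Calling build_bitmap_index(COLOURS) yields the Blue entry shown above, along with the corresponding entries for Green and Red.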
Here is a leaf block dump showing the first key value (a 0), the rowid range
pair and the start of the bitmap segment:
Leaf block dump
===============
header address 17090332=0x104c71c
kdxcolev 0
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 1
kdxcofbo 38=0x26
kdxcofeo 1010=0x3f2
kdxcoavs 972
kdxlespl 0
kdxlende 0
kdxlenxt 16778685=0x10005bd
kdxleprv 0=0x0
kdxledsz 0
kdxlecol 0
kdxlebksz 1888
row#0[1010] flag: ----, lock: 0
col 0; len 1; (1): 80 <-------------------- KEY VALUE = 0
col 1; len 6; (6): 01 00 00 92 00 00 <------------------ Start ROWID
col 2; len 6; (6): 01 00 01 84 00 2f <------------------- END ROWID
col 3; len 858; (858): <-------------- Beginning of BITMAP SEGMENT
cd 42 08 21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b
42 08 21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b 42
08 21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b 42 08
21 84 10 02 fd 0b 42 08 21 84 10 02 fd 0b 21 84 10 42 08 21 fd 0b 10 42 08
21 84 10 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10
42 08 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42
08 fd 0b 08 21 84 10 42 08 fd 93 0a 10 42 08 21 84 10 fd 0b 42 08 21 84 10
42 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08 fd e4 02 08 21 84 10 42
08 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08
fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08 fd 0b 08 21 84 10 42 08 fd
Bitmap Querying
A number of Bitmap operations exist.
These enable bitwise operations between
bitmaps and reduce table access:
Bitmap AND
Bitmap OR
Bitmap MINUS
Bitmap INDEX
Bitmap MERGE
Bitmap CONVERSION
The Bitmap operations that exist are:
AND - performs a bitwise "and" operation between two bitmaps.
OR - performs a bitwise "or" operation between two bitmaps.
MINUS - performs a bitwise subtraction of one bitmap from another. This
function is also used to eliminate rowids which have a NULL in the indexed column.
INDEX - performs an access against the bitmap ONLY. A number of different
querying operations will not need to perform a table access as the information is
available within the bitmap. For example:
Equality checks such as - Select count(*) from Sales where
sales_rep_id = 1001;
will just count bits set in the index.
Distinct queries such as - Select Distinct STATE from CUSTOMER will
return the key value for each bitmap index entry.
MERGE - performs a bitmap OR operation for a RANGE of values. A query
such as Select count(*) from sales where order_typ = 2 and
promo_typ > 3;
will need to perform a bitmap range scan, merge the resulting bitmaps, and then
combine the result with a bitmap AND operation. See the following explain plan:
SQL> select count(*) from sales_desc where
2 ord_typ = 1 and sales_rep_id > 2;
COUNT(*)
----------
3570
Execution Plan
----------------------------------------------------------
2 1 BITMAP CONVERSION (COUNT)
3 2 BITMAP AND
4 3 BITMAP INDEX (SINGLE VALUE) OF 'BIT_SALES_DESC_ORD_TYP'
5 3 BITMAP MERGE
6 5 BITMAP INDEX (RANGE SCAN) OF 'BIT_SALES_DESC_REP_ID'
An explain plan showing a bitmap AND operation and a bitmap
CONVERSION. No table access required.
SQL> select count(*) from sales_desc where sales_rep_id = 1 and promo_typ = 0;
COUNT(*)
----------
315
Execution Plan
----------------------------------------------------------
2 1 BITMAP CONVERSION (COUNT)
3 2 BITMAP AND
4 3 BITMAP INDEX (SINGLE VALUE) OF 'BIT_SALES_DESC_REP_ID'
5 3 BITMAP INDEX (SINGLE VALUE) OF 'BIT_SALES_DESC_PROMO_TYP'
CONVERSION - This is the process of converting the bitmap segment
retrieved into valid rowids for table access.
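The bitmap operations can be tried out on the Size and Colour bitmaps of the item table example with a few lines of Python. This is a sketch of the bitwise principle only, using plain integers rather than compressed segments; the helper names (to_int, bitmap_and, bitmap_or, bitmap_minus, count, to_items) are hypothetical, and the leftmost bit is taken to be item 001.

```python
N = 8  # number of rows covered by each bitmap in the example


def to_int(bits):
    """Turn a bitmap written as '0 1 0 ...' or '010...' into an integer."""
    return int(bits.replace(" ", ""), 2)


SIZE = {"SMALL": to_int("01000110"), "MEDIUM": to_int("10001001"),
        "LARGE": to_int("00110000")}
COLOUR = {"Blue": to_int("10010001"), "Green": to_int("00100010"),
          "Red": to_int("01001100")}


def bitmap_and(a, b):
    return a & b


def bitmap_or(a, b):
    return a | b


def bitmap_minus(a, b):
    """Bits set in a but not in b."""
    return a & ~b


def count(bitmap):
    """BITMAP CONVERSION (COUNT): count the set bits, with no table access."""
    return bin(bitmap).count("1")


def to_items(bitmap):
    """BITMAP CONVERSION (TO ROWIDS): list the 1-based item numbers that are set."""
    return [i + 1 for i in range(N) if (bitmap >> (N - 1 - i)) & 1]
```

For instance, to_items(bitmap_and(SIZE["MEDIUM"], COLOUR["Blue"])) identifies the medium blue items without reading the table, and only those rows would then need to be fetched.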
Advantages of Bitmaps
STAR Transformations
Index Maintenance
Creation time
Index size
Performance ?
Bitmap indexes have some other advantages over B*tree indexes, but the environments
in which a bitmap index is suitable (for performance reasons) are limited.
Bitmaps are limited to systems where high concurrency of modifications to the base data
is NOT required. They are generally not suitable for an Online Transaction Processing
(OLTP) environment. This is because each bitmap segment represents a range of rowid
values, so the level of granularity of locking is at the block level (of the bitmap
index). This could mean that many thousands of table rows are locked for that update
period. Bitmap indexes are usually recommended for Data Warehouse environments
where DML activity is limited, or active only during certain periods.
Having said that, bitmap indexes do have some advantages over b*tree indexes in many
respects. For suitable queries bitmap indexes may cause performance improvements by
many orders of magnitude. Some internal tests (see paper by Cetin Ozbutun) show an
improvement by 1000 times over table scans. Suitable queries are, for example, those
that have many predicates (as the efficiency of joining bitmaps is often better than that of
joining b*trees) or where the queries are on low cardinality columns (as it is often not
efficient to create a b*tree index on these columns, hence a full table scan is the only
possible resolution path).
The size of bitmap indexes is much smaller than corresponding b*tree indexes. As the
cardinality increases the size of the bitmap grows but it will still be smaller than the
b*tree index. In a table of 100,000 rows a b*tree index on a column with a cardinality of 6
was approximately 7 times larger than a corresponding bitmap index (260 K versus 1.9
MB).
Tests performed by Cetin Ozbutun show that, on a table with one million rows and
varying cardinalities, bitmap indexes are smaller than B*tree indexes up to a
cardinality of 500,000 (ie. an average of 2 rows per distinct value). It may even be
better to use a bitmap index on a column with high cardinality if the number
of rows returned per value is high - for example, with a cardinality of 1 million in a
1 billion row table a bitmap index may be better than a b*tree index.
Creation time for a bitmap index can be up to twice as fast as that of a b*tree index (for
a cardinality of 2). As cardinality increases so does the time to create the bitmap; there
will be a threshold after which the time to create a bitmap exceeds that of creating a
b*tree index. The creation time for a b*tree index remains fairly constant for varying
cardinalities. The processes involved in the creation of a bitmap index will be discussed
in the next few slides.
Star transformations are also possible. We have been discussing the usage of star
schemas and star queries in DSS environments. If 3 or more bitmap indexes are
available on the join columns of the FACT table, the optimizer can use them to perform
a star transformation: it attempts to identify the rows of the fact table that are of
interest, based on the criteria supplied to the dimension tables, before doing the
actual join. This method is said to be suitable for star and snowflake schemas, and is
deemed more flexible because it does not require the use of concatenated indexes, and
not all dimension tables need to be present in the join.
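The idea behind the star transformation can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the optimizer's implementation: each
bitmap is modelled as a Python integer (bit i set means fact row i holds the key), and the
fact table contents and column names (REP_ID, ORD_TYP, PROMO_TYP) are made up for the
example.

```python
# Toy model: each bitmap index maps a key value to an integer whose
# bit i is set when fact-table row i holds that key value.
# All table contents below are hypothetical.

def build_bitmap_index(column_values):
    """Return {key: bitmap} for one fact-table column."""
    index = {}
    for row_number, key in enumerate(column_values):
        index[key] = index.get(key, 0) | (1 << row_number)
    return index

def bitmap_or(index, keys):
    """OR together the bitmaps of every key selected by a dimension filter."""
    result = 0
    for key in keys:
        result |= index.get(key, 0)
    return result

def star_transform(indexes, selected_keys):
    """AND the per-dimension results and return the qualifying fact row numbers."""
    combined = None
    for column, index in indexes.items():
        dimension_bitmap = bitmap_or(index, selected_keys[column])
        combined = dimension_bitmap if combined is None else combined & dimension_bitmap
    combined = combined or 0
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]

fact = {
    "REP_ID":    [1, 2, 1, 3, 2, 1],
    "ORD_TYP":   ["A", "A", "B", "B", "A", "A"],
    "PROMO_TYP": ["X", "Y", "X", "X", "Y", "X"],
}
indexes = {column: build_bitmap_index(values) for column, values in fact.items()}

# Keys selected by the (hypothetical) dimension-table predicates.
rows = star_transform(
    indexes,
    {"REP_ID": [1, 2], "ORD_TYP": ["A"], "PROMO_TYP": ["X"]},
)
print(rows)  # fact rows to fetch before the join back to the dimensions
```

Only the fact rows that survive the AND need to be visited, which is why the method does
not depend on a concatenated index covering every dimension.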
Once the bit AND or bit OR operations are performed some method of joining these
bitmap merged entries is chosen by the optimizer. Often this is a hash join. However,
on tests I performed I could not get this to work. On a table of 100,000 rows with a
bitmap index on all queried columns I obtained the following explain plan:
Execution Plan
----------------------------------------------------------
4 3 HASH JOIN (Cost=52 Card=1 Bytes=90)
6 5 TABLE ACCESS (FULL) OF 'ORDER_TYP_DESC' (Cost=1 Card=1 Bytes=26)
7 5 TABLE ACCESS (BY INDEX ROWID) OF 'SALES_DESC2' (Cost=362 Card=324 Bytes=12312)
8 7 BITMAP CONVERSION (TO ROWIDS)
9 8 BITMAP AND
10 9 BITMAP INDEX (SINGLE VALUE) OF 'BIT_DESC2_REP_ID'
11 9 BITMAP INDEX (SINGLE VALUE) OF 'BOT_SALES_DESC2_ORD_TYP'
12 4 TABLE ACCESS (FULL) OF 'PROMO_TYP_DESC' (Cost=1 Card=1 Bytes=26)
13 3 TABLE ACCESS (FULL) OF 'SALES_REP_ID_DESC' (Cost=1 Card=1 Bytes=39)
Forcing the same query to use a STAR join via a hint produced the explain plan
shown below. This was notably slower in execution, so the
optimizer could be making the correct choice. However, the optimizer is not
behaving as others have found in testing, which may indicate one of two things:
1) My tests do not satisfy a STAR schema. The requirements on the test
environment may be stricter than many are led to believe. Or
2) STAR queries are not functioning as expected.
Execution Plan
----------------------------------------------------------
6 5 TABLE ACCESS (FULL) OF 'ORDER_TYP_DESC' (Cost=1 Card=1 Bytes=26)
8 7 TABLE ACCESS (FULL) OF 'PROMO_TYP_DESC' (Cost=1 Card=1 Bytes=26)
10 9 TABLE ACCESS (FULL) OF 'SALES_REP_ID_DESC' (Cost=1 Card=1 Bytes=39)
15 1 TABLE ACCESS (BY INDEX ROWID) OF 'SALES_DESC2' (Cost=362 Card=324 Bytes=12312)
17 16 BITMAP AND
18 17 BITMAP INDEX (SINGLE VALUE) OF 'BIT_DESC2_REP_ID'
19 17 BITMAP INDEX (SINGLE VALUE) OF 'BIT_DESC2_STOR_ID'
Bitmap maintenance during DML operations on the base table is achieved as
optimally as possible. Changes to the bitmap index are maintained in a sort
buffer and are not applied to the bitmap until the completion of the DML
statement. The changes are sorted and applied in sort order, therefore
reducing the amount of redo information generated. An insert or delete
operation causes a bit to be enabled or disabled, whereas an update is
equivalent to a delete followed by an insert. DML may cause a new bitmap
segment to be created when the rowid inserted into the existing segment
causes it to exceed its maximum length, or the insertion of the new rowid is
outside the existing range for this value.
The locking granularity during any DML is one bitmap segment (up to half an
Oracle block in length).
Query Resolution using Bitmaps
A bitmap resolves a query using
standard indexing methods to locate the
key value(s), then:
Generates a rowid range
Retrieves rowids from the row source
Uses bit mask to identify required
rows
When a query cannot be resolved from the bitmap segment alone (queries such as
Select count(*) .. where key_value = xxx, by contrast, can be resolved by accessing
only the index itself), it is necessary to perform the following
functions:
Using standard index resolution methods, identify the key value(s) and their
associated bitmap segments, remembering that there is a start rowid and an end rowid
associated with each bitmap segment. A segment is up to 1/2 Oracle block in size
(an upper limit), and contiguous segments may relate to the same key
value.
From the index structure a beginning and ending rowid is generated. We then
have to read this rowid range from the row source (in most cases the table the index is
built on). The rowids are read as a continuous stream of values, byte aligned to match
the bitmap segment. The bitmap segment is then used as a mask to filter desired rows
from the row stream.
This is why the number of rows returned plays a part in determining the
suitability of a bitmap. With many rows being returned the cost of the I/O associated in
retrieving them is a lesser proportion of the overall query cost.
To illustrate this here is a simplified example. A table has 3002 rows, with a bitmap
index on one column (colour). Only two rows in the table have the colour BLUE. Here
is the bitmap segment generated:
The bitmap segment for the value "BLUE"
row#1[1347] flag: ----, lock: 0
col 0; len 4; (4): 42 4c 55 45 <--- The value BLUE
col 1; len 6; (6): 01 c0 14 9f 00 00 <--- Start ROWID (DBA, slot 00 00)
col 2; len 6; (6): 01 c0 16 28 00 07 <--- End ROWID (DBA, slot 00 07)
col 3; len 4; (4): 06 c0 b6 46 <--- Bitmap segment represents two values
Using events 10710, and 10715 we trace bitmap access and rowid conversion
Tue Oct 20 15:43:16 1998
*** SESSION ID:(8.3) 1998.10.20.15.43.16.000
kkrbtsta(107e670): started
kkrbxsta
(109d608)kkrbxgky(109d608): startkey count=1
kkrbxgky(109d608): startkey=(4): 42 4c 55 45 <--- Key value (BLUE)
kkrbxgky(109d608): stopkey count=1
kkrbxgky(109d608): stopkey=(4): 42 4c 55 45
kkrbtfch bitmap to rowids(107e670):
kkrbxfch(109d608): record: srid=01c0149f.0000, erid=01c01628.0007, data(4)=[06...]
<--- Start rowid (srid), End rowid (erid), bit mask to use
kdibr1r2r(107e688): bml 4 srid=01c0149f.0000, erid=01c01628.0007
kdibci3init(107e69c): src_stream=efffd308
kdibc3sids(107e69c): want=57, got=2
01c0149f.0000
kkrbtfch(107e670): rowid=01c0149f.0006 tobj 3508 <-- Rowid satisfying search
kkrbtfch(107e670): rowid=01c01628.0000 tobj 3508 <-- Rowid satisfying search
kkrbtfch(107e670): total rowcount=2 <--- Number of rows returned
The functions kkrbtfch and kkrbxfch deal with fetching and converting rowids from the source
(table). We need to perform approximately 25 I/Os to satisfy this selection, but it may still be
faster than performing a full table scan comparing the key value in every row.
The reason we need to do this is due to the fact that blocks may contain varying numbers of
rows, and the bitmap segment which is just a string of 1s and 0s cannot identify a change in
block (it can only identify a change to the next row, regardless of where that row physically
resides). DML therefore locks all the rows affected by a bitmap segment (may be 1000s of rows
hence 1000s of table blocks) as we have to ensure that we have a start/end rowid from which to
determine our bitmap segment.
Note: An interesting fact about the start/end rowids is that the Start rowid is the last block of
the last extent, with the row to start at being row 0. The row that actually contains the value is
row #6 (but I suppose we have to start at a block boundary). The End rowid is the last row
(reading from the head of the table) in the block that contains the requested value (the actual
value in that block is row #0).
Bitmap querying
Rowid   Colour       Bitmap (BLUE)
10.0    BLUE         1
10.1    VERMILLION   0
10.3    PUCE         0
11.0    ORANGE       0
11.1    ORANGE       0
11.2    YELLOW       0
11.3    RED          0
12.0    RED          0
12.1    RED          0
12.2    AZURE        0
12.3    BLUE         1
1. Generate mask
2. Apply mask to rowids
This slide graphically represents what is happening with the bitmap query. The bitmap
shown is the bitmap segment for the colour BLUE, built on a table containing a COLOUR
column. The values in the rowid diagram show block.row # and the colour value.
The bitmap segment is expanded (converted) to generate a mask, with ones being a hole
in the template and zeroes being solid (for an equality search at least; inequality searches
would reverse this). This is Step 1.
In Step 2 the mask is applied to the row stream read from the table (or row source), and
those rows that line up with the ones in the bitmap segment are returned to the user.
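The two steps on the slide can be mimicked with a short Python sketch. It is a simplified
model only: the bitmap is held as an already expanded list of 0/1 values (the real segment
is compressed), and the function name apply_mask is invented for the illustration.

```python
# Row stream as shown on the slide: (block.row, colour), in rowid order.
row_stream = [
    ("10.0", "BLUE"), ("10.1", "VERMILLION"), ("10.3", "PUCE"),
    ("11.0", "ORANGE"), ("11.1", "ORANGE"), ("11.2", "YELLOW"),
    ("11.3", "RED"), ("12.0", "RED"), ("12.1", "RED"),
    ("12.2", "AZURE"), ("12.3", "BLUE"),
]

# Step 1: the expanded bitmap segment for the key value BLUE.
blue_bitmap = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

def apply_mask(rows, bitmap, equality=True):
    """Step 2: keep the rowids whose bit matches the search type.

    For an equality search the ones are the "holes" in the template;
    for an inequality search the mask is reversed.
    """
    wanted = 1 if equality else 0
    return [rowid for (rowid, _colour), bit in zip(rows, bitmap) if bit == wanted]

matches = apply_mask(row_stream, blue_bitmap)
others = apply_mask(row_stream, blue_bitmap, equality=False)
print(matches)
print(len(others))
```

Note that the mask never looks at the colour value itself; it relies purely on the position
of each row in the stream, which is why the rowid range has to be read in order.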
Creating a Bitmap Index
The creation process of a bitmap index is:
Slave Set 1 (if parallel creation)
~Perform table scan
~Perform inversion
Slave Set 2 (if parallel creation)
~Sort Bitmap segments
~Coalesce adjacent bitmaps
~Create index
A bitmap index is created in the following stages:
1. A table scan is performed to determine the number of distinct column
values.
2. Inversion then takes place, where the column values, and the rows in which
they reside are passed to a bitmap generator. The inversion process takes these
key/rowid pairs and generates key/bitmap segment values. The parameter
CREATE_BITMAP_AREA_SIZE (default 8 MB) can speed up index creation by
allowing the minimum number of index entries to be created. For this to happen the
amount of memory should be at least the number of distinct values in the indexed
columns times half the Oracle block size.
If the index were being created in parallel steps 1 and 2 would be performed by each
query slave based on the rowid ranges they receive. They would then pass the
bitmaps on to slave set 2 which would perform the next three steps.
3. The bitmaps are then sorted on key value, start rowid so that they can easily
be placed into a b*tree structure. The parameter SORT_AREA_SIZE influences the
sort performance.
4. Bitmap compaction may occur if the bitmaps generated are not at their
maximum length (1/2 Oracle block). Adjacent bitmaps for the same key value may be
coalesced.
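The stages above can be sketched as follows. This is a toy model rather than the server's
algorithm: rowids are plain integers, bitmaps are uncompressed tuples of 0/1, max_bits
stands in for the half-block segment limit, and every function name is invented for the
example. The 8192-byte block size used in the memory estimate is also an assumption.

```python
def make_segments(rows, max_bits):
    """Scan + inversion: turn (rowid, key) pairs into bitmap segments.

    Returns (key, start_rowid, end_rowid, bits) tuples. A new segment is
    started when a rowid does not fit within max_bits of the segment start.
    """
    pieces_by_key = {}
    for rowid, key in rows:
        pieces = pieces_by_key.setdefault(key, [])
        if pieces and pieces[-1][0] <= rowid < pieces[-1][0] + max_bits:
            pieces[-1][1] = max(pieces[-1][1], rowid)
            pieces[-1][2].add(rowid)
        else:
            pieces.append([rowid, rowid, {rowid}])
    segments = []
    for key, pieces in pieces_by_key.items():
        for start, end, members in pieces:
            bits = tuple(1 if r in members else 0 for r in range(start, end + 1))
            segments.append((key, start, end, bits))
    return segments

def sort_segments(segments):
    """Sort on (key value, start rowid), ready for loading into a b*tree."""
    return sorted(segments, key=lambda s: (s[0], s[1]))

def coalesce(segments, max_bits):
    """Merge adjacent segments for the same key while the result still fits."""
    merged = []
    for key, start, end, bits in segments:
        if merged:
            pkey, pstart, pend, pbits = merged[-1]
            if pkey == key and start > pend and end - pstart + 1 <= max_bits:
                gap = (0,) * (start - pend - 1)
                merged[-1] = (key, pstart, end, pbits + gap + bits)
                continue
        merged.append((key, start, end, bits))
    return merged

def min_create_area(distinct_values, block_size):
    """Rule of thumb from the notes: distinct values times half a block."""
    return distinct_values * (block_size // 2)

# Two "slaves" in set 1 each invert their own rowid range.
slave_a = make_segments([(0, "RED"), (1, "BLUE"), (2, "RED")], max_bits=8)
slave_b = make_segments([(3, "RED"), (4, "BLUE"), (5, "RED")], max_bits=8)

# Slave set 2 sorts and coalesces what it receives.
index_entries = coalesce(sort_segments(slave_a + slave_b), max_bits=8)
for entry in index_entries:
    print(entry)
print(min_create_area(6, 8192))
```

Lowering max_bits in the call to coalesce leaves the segments from the two slaves
unmerged, which mirrors the case where the bitmaps are already at their maximum length.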
Bitmaps and other O8 features
Partitioning
Sql*loader
Database events ?
Bitmap indexes can be created on partitioned tables. Local bitmap indexes may be
created on these objects. This gives a one-to-one correspondence between bitmap
index and table partition.
Sql*loader now has an option to create a new bitmap index instead of attempting to
maintain an existing one during the load of a large number of rows. This is the default
behaviour of sql*loader, but can be turned off by providing the SINGLEROW option to
sql*loader (not recommended to use singlerow for large numbers of rows). Sql*loader
sorts the rows being loaded and when the load is complete the sorted rows are
merged with the existing index.
A number of database events are available to monitor some of the activities involved
with bitmap index use. The event range is 10710 - 10729 (only up to 10719 has been
used in v8.0.4). The events are listed below.
// 10710 - 10729 are Reserved for BITMAP row sources
10710, 00000, "trace bitmap index access"
// *Comment: display the start-rowid, end-rowid of each bitmap segment - kkrbx row source
10711, 00000, "trace bitmap index merge"
// *Comment: enables analysis of bitmap index merge - kkrbu row source
10712, 00000, "trace bitmap index or"
// *Comment: enables analysis of bitmap index or - kkrbo row source
10713, 00000, "trace bitmap index and"
// *Comment: enables analysis of bitmap index and - kkrba row source
10714, 00000, "trace bitmap index minus"
// *Comment: enables analysis of bitmap index and - kkrba row source
// *Comment: enables analysis of bitmap index minus - kkrbm row source
10715, 00000, "trace bitmap index conversion to rowids"
// *Comment: enables analysis of bitmap index merge - kkrbt row source
10716, 00000, "trace bitmap index compress/decompress"
// *Comment: enables analysis of bitmap index compression/decompression - modules kdibc,
kdibci, kdibco, kdibc3
10717, 00000, "trace bitmap index compaction trace for index creation"
// *Comment: enables analysis of bitmap index creation - modules kkrbc
10718, 00000, "event to disable automatic compaction after index creation"
// *Comment: enables isolation of index creation problems. ie compaction
10719, 00000, "trace bitmap index dml"
The usage of these events may be limited. The information associated with the events is
definitely limited.
Bitmap Index Bugs
Common bugs associated with bitmap
indexes include:
548273 (v8.0.4)
662907 (v7.3.3) Not a bug.
565010 (v7.3.2)
452456 (v7.3.2)
Hdr: 548273 8.0.3 RDBMS 8.0.3 OPTIMIZER PRODID-5 PORTID-2
Abstract: INCORRECT RESULTS AFTER ADDING 3RD BITMAP INDEX
STAR_TRANSFORMATION_ENABLED=TRUE
Hdr: 662907 7.3.3.4.0 RDBMS 7.3.3.4.0 QRY OPTIMIZER PRODID-5 PORTID-610
Abstract: SIMPLE JOIN IS NOT USING BITMAP INDEX
Not a bug but some interesting discussion.
Hdr: 565010 7.3.2.3 RDBMS 7.3.2.3 QRY OPTIMIZER PRODID-5 PORTID-87
Abstract: COUNT(*) RETURNS WRONG RESULT WHEN USING BOTH BITMAPPED AND BTREE INDEXES
Hdr: 452456 7.3.2.3 RDBMS 7.3.2.3 ROW ACC MTHD PRODID-5 PORTID-87 ORA-600
Abstract: PROBLEM INSERTING UNION SELECT INTO BITMAPPED INDEX TABLE
Lesson Summary
Manual Partitioning was introduced in
version 7 to provide some Oracle 8
functionality
Star queries are suitable in DSS
environments
Bitmap indexes have some advantages
over B*Tree indexes (particularly in DSS
environments).
Key Terms
Manual Partitioning
UNION ALL views
Object Partitioning
Range partitioning
Star queries / Star schemas
Star, snowflake,cartesian product
Bitmap Indexes
cardinality, inversion, merge, AND/OR
References:
Bitmap Indexes (Oracle 7.3 and 8.0) - Cetin Ozbutun, Server Technology
Oracle Corporation, June 1997 (ACTA Journal)
Bitmapped Indexes in Oracle 7 - Andy Mendelsohn, Ray Roccaforte, et al,
October 1995, Oracle Warehouse, Part # C10405
Star Query processing in the Oracle7 Server - Linda Willis, May 1996, Note
33648.1.
Star Queries - Roselle Beard, February 1998, Note 35065.1
Optimizer Doesnt like partition view query with complex predicate -
Author unknown. PR Entry 1024056.6
Architecting a single large table as manually partitioned multiple tables -
Prabhaker Gongloor and Sameer Patkar, Oracle Corporation.
Module 3, Oracle Version 7 Internals - Roderic Manalac, 1995, Oracle
Corporation
Functional Specifications for Partition Views, Oracle 7,Release 7.3 - Gary
Hallmark, November 1995, Oracle Corporation.
Using union all views over partitioned tables in Oracle 8 - Steve Dixon,
January 1998, Note 50615.1
Partition Views in 7.3 - Steve Dixon, December 1997, Note 43194.1
Designing a data warehouse application: tips and techniques - Deepak
Gupta, Scalable systems, Advanced technologies, Oracle Corp.
DSI 306 - Unit 4, Partitioned Tables
Partitioned Tables and
Indexes
Acknowledgements:
Design: Alok Satyawadi (asatyawa.us), WSSG, Advanced Analysis
Development: Prakash Penta (ppenta.uk), WWS, UK
Review:
120 minutes Lecture
60 minutes Lab
180 minutes Total
Outline
Partitions Overview
Dictionary Objects
Locking
Maintenance
Optimizer
Recovery
Partitions Overview
Reasons for using partitions
Availability, Performance
Objects for partitioning
Tables with large structured data
Large indexes
Partitioning Methods
Partition by range
Class notes : A quick review of Lessons 1 and 2 from the Oracle 8: New
Features for Administrators training course.
Partitions Overview
Partitioned Indexes
Local and Global Indexes
Prefixed and Non-prefixed indexes
Local prefixed indexes
Local non-prefixed indexes
Global prefixed indexes
Index unusable attribute
Dictionary Objects
Non Partitioned Tables
obj$ : obj# dataobj# type# name subname
tab$ : obj# dataobj# ts# file# block# bobj#
seg$ : ts# file# block#
uet$ : ts# segfile# segblock#
dataobj# : Only objects with disk representation will have data object numbers.
It is set to obj# for non-partitioned tables and indexes. This is the object id
which appears in a block dump.
type# is 1 for non-partitioned indexes and 2 for non-partitioned tables.
subname is set to NULL for non-partitioned objects.
bobj# (base object id) is set to Null for non-partitioned tables and set to the
object id of the table for a non-partitioned index.
Dictionary Objects
Non-Partitioned Indexes
obj$ : obj# dataobj# type# name subname
ind$ : obj# dataobj# ts# file# block# bobj#
seg$ : ts# block# file#
uet$ : ts# segblock# segfile#
Dictionary Objects
Partitioned Tables
obj$
tab$
obj#
partobj$
obj#
partcol$
tabpart$
ts# block# file#
seg$
uet$
Partitioned Tables : Base Object
1. The base object will have a row in obj$. obj$.dataobj# will be set to null.
2. There will be one row in tab$ with ts#, file#, block# set to zero. obj$.type is
set to 2 (table). bobj# is set to null for tables.
3. There will be one row in partobj$. Contains object wide partitioning info
such as number of partitions, default physical attributes for the partitions etc.
4. There will be one row each in partcol$ for each partitioned column.
Partitioned Tables : Partitions
1. There will be one row in obj$ for each partition. obj$.dataobj# is usually set
to obj$.obj#. obj$.name is set to base object name. obj$.subname is set to
partition name. obj$.type is set to 19 for table partitions.
2. There will be one row for each partition in tabpart$. These are linked to the
base object by bobj#. Contains information about the partition bound value
(long column) and the physical characteristics of the partition.
Instructor Note - If time permits create a partitioned object and show the
relationships that exist between the corresponding dictionary objects.
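As a rough aid to the relationships described above, the following Python sketch builds the
obj$ and tabpart$ rows that a partitioned table would be expected to have. It models only
the columns mentioned in these notes; the table name SALES, the partition names and the
object numbers starting at 3500 are hypothetical, and real dictionary rows carry many more
columns.

```python
TABLE, TABLE_PARTITION = 2, 19  # obj$ type codes quoted in the notes

def obj_rows(table_name, partition_names, first_obj_id):
    """Return the obj$ rows for one partitioned table and its partitions."""
    # Base object: no segment of its own, so dataobj# is null.
    rows = [{
        "obj#": first_obj_id, "dataobj#": None, "type": TABLE,
        "name": table_name, "subname": None,
    }]
    # One row per partition: name is the base name, subname the partition name.
    for offset, partition in enumerate(partition_names, start=1):
        obj_id = first_obj_id + offset  # hypothetical numbering
        rows.append({
            "obj#": obj_id, "dataobj#": obj_id, "type": TABLE_PARTITION,
            "name": table_name, "subname": partition,
        })
    return rows

def tabpart_rows(object_rows):
    """Return tabpart$ rows, each linked to the base object through bobj#."""
    base_obj_id = object_rows[0]["obj#"]
    return [{"obj#": row["obj#"], "bobj#": base_obj_id} for row in object_rows[1:]]

objects = obj_rows("SALES", ["SALES_M1", "SALES_M2"], first_obj_id=3500)
partitions = tabpart_rows(objects)
for row in objects:
    print(row)
for row in partitions:
    print(row)
```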
Dictionary Objects
Partitioned Indexes
obj$
obj#
partobj$
obj#
partcol$
indpart$
ts# block# file#
seg$
uet$
ind$
Partitioned Indexes : Base Object
1. The base object will have a row in obj$. obj$.dataobj# will be set to null.
2. There will be one row in ind$ with ts#, file#, block# set to zero. obj$.type is
1 (index). bobj# is set to table object id.
3. There will be one row in partobj$. Contains object wide partitioning info
such as number of partitions, default physical attributes for the partitions etc.
4. There will be one row each in partcol$ for each partitioned column.
Partitioned Indexes: Partitions
1. There will be one row in obj$ for each partition. obj$.dataobj# is usually set
to obj$.obj#. obj$.name is set to base object name. obj$.subname is set to
partition name. obj$.type is set to 20 for index partitions.
2. There will be one row for each partition in indpart$. These are linked to the
base object by bobj#. Contains information about the partition bound value
(long column) and the physical characteristics of the partition.
Dictionary Objects
Sequence partition_name$
Catalog views
{user|all|dba}_part_tables
{user|all|dba}_part_indexes
{user|all|dba}_part_key_columns
{user|all|dba}_tab_partitions
{user|all|dba}_ind_partitions
Sequence partition_name$ is used to generate partition names of the type
SYS_Pnnn when explicit partition names are not specified.
_part_tables is a view based on partobj$, tab$, obj$, ts$
Describes the default storage parameters, partition count, partition key
count for partitioned tables.
_part_indexes is a view based on partobj$, ind$, obj$, ts$
Describes the default storage parameters, partition count, partition key
for partitioned indexes
_part_key_columns is a view based on partcol$, tab$, ind$, col$, obj$.
Describes partitioning key columns for all partitioned objects.
_tab_partitions is a view based on tabpart$, obj$, ts$, seg$
Describes the partition level information, storage parameters, statistics,
partition ranges for each table partition.
_ind_partitions is a view based on indpart$, obj$, ts$, seg$
Describes the partition level information, storage parameters, statistics,
partition ranges for each index partition.
Dictionary Objects
Catalog Views (contd.)
{user|all|dba}_part_col_statistics
{user|all|dba}_part_histograms
_part_col_statistics is similar to _tab_col_statistics except it stores
statistics at table partition level. It is a view based on tabpart$, hist_head$,
obj$, col$.
_part_histograms is similar to _tab_histograms except it stores
statistics at table partition level. It is a view based on tabpart$, histgrm$,
obj$, col$.
Dictionary Objects
Library Cache
Partition heap loaded into data block 6
of the library cache object of the base
object.
Two structures :
kkpacocd - Object wide cache
kkpacpcd - Partition specific cache
kkpacocd - Holds the information from partobj$ and partcol$.
kkpacpcd - Holds the information from tabpart$/indpart$
Partitioning related dictionary manipulation is done by the following
routines in kkpod.c.
kkpolpd_load_part_descr : load the partition info from disk into heap 6 of the
base table/index by populating structures kkpacocd and kkpacpcd.
kkpopocd_populate_kkpacocd : load the object wide info supplied by the user
during create table/index into kkpacocd structure.
kkpoppcd_populate_kkpacpcd : load the partition specific info supplied by the
user during create table/index into kkpacpcd structure.
kkpomocd_modify_kkpacocd : modify a kkpacocd structure during partition
maintenance operation.
kkpompcd_modify_kkpacpcd : modify kkpacpcd structure during partition
maintenance operation.
kkpofpd_flush_part_descr : flush changes to kkpacocd and kkpacpcd
structures to the disk.
Note - The partition heap size is approximately 150 KB for 1000 partitions of
the example table.
Dictionary Objects
Row Cache
Nothing new for partitions
Two main caches
Dictionary object cache (KQROBC)
Object id cache (KQROIC)
KQROBC : data from obj$
KQROIC : data from oid$ (object ids)
Although the code refers to the KQROIC cache, currently object tables cannot
be partitioned. This functionality may be available in later releases of Oracle 8
(Oracle 8.1 will support partitioned tables containing LOBs, as well as
partitioned index organised tables (IOTs)).
Instructor Note - Operations within the SGA may need to be reviewed
here. These operations are discussed in more detail in DSI301, DSI302
and DSI304 (relevant components are discussed in each module).
Locking
DML Locks
Additional level of locks on Partitions
Lock mode same as table DML locks
KGL Locks and Pins
No additional locks
Locks on base table
Partition heaps loaded and pinned when
required
DML Locks (All are TM locks on either the table or the partition) :
If T1 is a transaction which has acquired the lock and T2 is requesting one :
Row Shared (SS) - T1 may do a read and will allow concurrent read and
write operations. T1 may do a write if no other transaction has acquired S or
SSX locks. Will always disallow X locks by T2. Will disallow S or SSX
locks by T2 if there are uncommitted changes of T1.
Row Exclusive (SX) - T1 may do a read/write and will allow concurrent read
and write operations. Will always disallow S, SSX and X locks by T2.
Share(S) - T1 may do a read and will allow concurrent reads only. Select for
update is allowed by other transactions. T1 may do a write if there are no other
transactions holding S locks. Will always disallow write locks - SX, SSX, X
by T2. Will disallow S locks by T2 if there are pending updates for T1.
Shared Row Exclusive (SSX) - T1 may do a read/write and will allow
concurrent reads only. Select for update is allowed by T2. Will always
disallow SX or S or SSX or X locks by T2.
Exclusive (X) - T1 may do a read/write and will allow concurrent reads
only. Select for update is not allowed by T2. Will disallow all locks by T2.
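The "will always disallow" statements above amount to a compatibility matrix, which can be
written down as a small Python table. This is a sketch built only from those statements;
it deliberately ignores the conditional cases (for example S or SSX being refused while T1
has uncommitted changes under an SS lock), and the names used are invented for the example.

```python
MODES = ("SS", "SX", "S", "SSX", "X")

# For each mode held by T1, the modes that T2 is always refused.
ALWAYS_BLOCKED = {
    "SS":  {"X"},
    "SX":  {"S", "SSX", "X"},
    "S":   {"SX", "SSX", "X"},
    "SSX": {"SX", "S", "SSX", "X"},
    "X":   {"SS", "SX", "S", "SSX", "X"},
}

def compatible(held, requested):
    """True if T2 may be granted `requested` while T1 holds `held`."""
    return requested not in ALWAYS_BLOCKED[held]

# Print the matrix: rows are the mode held by T1, columns the mode T2 requests.
print("held", " ".join(mode.ljust(3) for mode in MODES))
for held in MODES:
    cells = " ".join(("Y" if compatible(held, r) else "N").ljust(3) for r in MODES)
    print(held.ljust(4), cells)
```

The same matrix applies whether the TM lock is taken on the table or on a partition, since
the lock modes are the same at both levels.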
Instructor Note - A description of the TM locks is provided in Chapter 5.
Locking
One Step Operations
Three Step Operations
KGL Locks : Shared mode for parsing and Exclusive mode for actual updates
to the dictionary.
One Step Operations :
(DDL operations. Eg. Alter table add partition, Alter table add column)
Acquire the KGL locks in X mode and DML locks.
Do the partition maintenance operation.
Release all the locks.
Three Step Operations :
(Alter table DROP partition, Alter table modify partition, Alter table move
partition, Alter table split partition, Alter table exchange partition, Rebuild
unusable local indexes, Create index, Alter index rebuild partition (for a global
index), Alter index rebuild partition (for a local index) ).
Acquire the KGL locks in S mode. Do the dictionary lookup.
Acquire the DML locks. Release the DDL locks.
Do the partition maintenance operation.
Acquire the KGL locks in X mode. Do the dictionary maintenance.
Release all locks.
DML on Partitions
Conventional Insert (pseudo code)
Get the partition number from the row
If different from previous row
    If table insert
        flush buffered rows
    else (a partition insert)
        report ora-14401 and exit.
    Take a TM SX lock on the partition
Table insert : insert into sales values
insert into sales select
Partition insert : insert into sales partition(sales_m1) values
insert into sales partition(sales_m1) select
Flush Buffered Rows : Flush the rows from the pga buffer into buffer cache
Generate undo, redo for the data blocks
When doing table inserts, it is efficient to sort the input rows on the
partition key.
ora-14401 : Inserted partition key is outside specified partition.
Instructor Note: The first time in to this section of code there is no previous
row, so a lock would be acquired.
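The pseudo code for the conventional insert can be turned into a small Python simulation.
It is a sketch under several assumptions, not the kernel logic: partitions are range
partitions described by a list of upper bounds, the partition insert case is read as "raise
the error when the row maps to a partition other than the named one", a lock already held
is not taken again, and a final flush at end of statement is assumed. All names are
invented, and PartitionKeyError merely stands in for ora-14401.

```python
import bisect

class PartitionKeyError(Exception):
    """Stands in for ora-14401 in this sketch."""

def partition_number(upper_bounds, key):
    """Range partitioning: partition i holds keys less than upper_bounds[i]."""
    position = bisect.bisect_right(upper_bounds, key)
    if position == len(upper_bounds):
        raise ValueError(f"key {key!r} is above the highest partition bound")
    return position

def conventional_insert(rows, upper_bounds, target_partition=None):
    """Return a log of actions for inserting (key, payload) rows.

    target_partition=None models a table insert; a partition number models
    an insert that names a specific partition.
    """
    log, buffered, locked, previous = [], [], set(), None
    for key, payload in rows:
        current = partition_number(upper_bounds, key)
        if current != previous:
            if target_partition is None:
                if buffered:  # table insert: flush rows of the previous partition
                    log.append(("flush", previous, len(buffered)))
                    buffered = []
            elif current != target_partition:
                raise PartitionKeyError(f"key {key!r} is outside partition {target_partition}")
            if current not in locked:
                log.append(("lock TM SX", current))
                locked.add(current)
            previous = current
        buffered.append((key, payload))
    if buffered:  # assumed: remaining rows are flushed at end of statement
        log.append(("flush", previous, len(buffered)))
    return log

bounds = [100, 200, 300]  # hypothetical VALUES LESS THAN bounds
unsorted_rows = [(10, "a"), (150, "b"), (20, "c"), (30, "d")]
unsorted_log = conventional_insert(unsorted_rows, bounds)
sorted_log = conventional_insert(sorted(unsorted_rows), bounds)
print(unsorted_log)
print(sorted_log)
```

Comparing the two logs shows why sorting the input on the partition key helps: the sorted
run flushes the buffer once per partition instead of once per change of partition.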
DML on Partitions
Conventional Delete/Update (pseudo code)
Get the partition number
If different from previous row :
Take TM SS or SX lock on the
partition
Partition number is determined either at parse time or execute time or from
the supplied rowid. If the partition number is to be determined from rowids, an
array of structures holding the data object number and partition number is
generated at the end of the parse step. This array is binary searched if the
partition count is greater than 16; otherwise it is searched sequentially.
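The rowid-driven lookup can be illustrated with the Python sketch below. It only mirrors
the description in the notes: the array is modelled as a sorted list of
(data object number, partition number) tuples, the threshold of 16 is the one quoted above,
and the function names and object numbers are hypothetical.

```python
import bisect

LINEAR_SEARCH_LIMIT = 16  # partition count threshold quoted in the notes

def build_lookup(partitions):
    """Build the array of (data_object_number, partition_number), sorted for searching."""
    return sorted(partitions)

def find_partition(lookup, data_object_number):
    """Return the partition number for a data object number, or None if absent."""
    if len(lookup) > LINEAR_SEARCH_LIMIT:
        # Binary search: (n,) sorts immediately before any (n, partition) entry.
        i = bisect.bisect_left(lookup, (data_object_number,))
        if i < len(lookup) and lookup[i][0] == data_object_number:
            return lookup[i][1]
        return None
    # Sequential search for small partition counts.
    for candidate, partition in lookup:
        if candidate == data_object_number:
            return partition
    return None

# Hypothetical data object numbers: partition i has data object number 3500 + i.
large = build_lookup((3500 + i, i) for i in range(20))
small = build_lookup((4000 + i, i) for i in range(4))
print(find_partition(large, 3507))
print(find_partition(small, 4002))
print(find_partition(large, 9999))
```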
SX locks are taken out when doing delete or update
SS locks are taken out when doing select for update.
When partition locks are taken at execute time, there is a possibility that after
running the statement for a while, we may be unable to acquire the requested
lock. In this case, the statement is automatically rolled back and reexecuted.
Cannot update a partition key which results in movement of the row to
another partition. Error : ora-14402.
source : kdugetpart (kdu.c)
Partition Maintenance
Three major issues
Speed
Concurrency
Availability
Speed : Maintenance operations are considered fast if their expected duration
is not dependent on the size of the objects they operate on. Fast maintenance
operations result only in dictionary and segment header changes and do not
cause data scans or updates. They are expected to complete in a short time
(order of seconds). Slow operations usually involve data scans.
Concurrency : Will determine if other maintenance operations can be done in
parallel on the same table. This is usually determined by whether we take a DML
X lock on the base table while doing a maintenance operation.
Availability : Will determine if the application will be partially or totally
unavailable. This is usually determined by whether we take TM X locks on the
base table (only queries will be allowed in this case) or take TM SX locks on
the base table and a TM X lock on the partition (only the partition will be
unavailable for dml). Also determined by the effect of the maintenance
operation on the status of the indexes.
Single Step Operations include -
Add Table Partition
Rename Table Partition
All Single Step Operations follow a common locking protocol, in that they
acquire a DML Exclusive (TM X) lock on the table.
The actions undertaken by each operation are as follows:
Add Table Partition -
Internally executed as a special case of create table.
Obtain TM X on the table.
Invalidate all dependent objects.
Increment the partition count.
Delete the aggregate statistics and set the partition level statistics to NULL.
Add local index partition if required.
The status of existing indexes (local, global, and non-partitioned) will be
unaffected by this operation. The new local index partitions will have a status
of usable.
The aggregate statistics on the table are deleted. The aggregate statistics on the
local index are not affected. This inconsistency is reported as a P3 bug, number
619164, which was not fixed at time of writing. The new partitions have
NULL statistics.
Rename Table Partition -
Acquire TM X on table.
Creates a new entry in obj$ row cache.
Delete the obj$ row cache for the old name and invalidate the object id row
cache. Invalidate all dependent objects.
Three Step Operations include -
Drop Table Partition
Split Table Partition
Move Table Partition
Truncate Table Partition
All three step operations acquire a TM SX lock on the base table and a TM X
lock on the partition(s) involved.
The actions undertaken by each operation are as follows:
Drop Table Partition -
Acquire TM SX lock on table
Acquire TM X lock on partition
A check is made for constraints referencing the table. A subsequent check
for indexes will be made if no constraints exist.
Oracle checks if the table is empty.
The partition segment is then dropped.
Partition count is decremented. The partition number of all affected
partitions is updated.
Statistics are re-aggregated.
If any indexes exist:
Pin heap 0 and the partition heaps.
KGL lock each index in X mode.
If the partition is not empty mark global indexes unusable.
Drop the local index partition.
Any associated snapshot logs are purged.
Note: The status of the existing indexes (local, global and non-partitioned) will
be unaffected by this operation. The new local index partitions will have a
status of usable.
The aggregate statistics on the table are deleted. The aggregate statistics on the
local index are not affected. This inconsistency is reported as a P3 bug
(619164) which is not fixed at time of writing. The new partitions will have
NULL statistics.
Split Table Partition -
Executed internally as a special case of Create Table As Select (CTAS).
Acquire TM SX lock on the table.
Acquire TM X lock on the partition.
Increment the partition count. Modify the partition numbers of all affected
partitions.
Delete aggregate statistics on the table, and set partition level statistics to
NULL.
If any indexes exist:
KGL Lock each index in X mode.
Pin heap 0 and the partition heaps.
If the partition is not empty mark global indexes and
non-partition indexes as unusable.
Split any local indexes.
Note: The aggregate table statistics are deleted, but the aggregate index
statistics are not affected when a table partition is split.
Move Table Partition -
Internally executed as a special case of CTAS.
Follow three step locking protocol, and perform the index maintenance as
specified in the Split Partition option described above.
Note: Moving a partition has no effect on the aggregate statistics of table and
indexes.
Truncate Table Partition -
Acquire TM SX lock on the table.
Acquire TM X lock on the partition.
If a truncate table (as opposed to a truncate partition) obtain an X lock on the
table.
Check if partition is empty.
Check if any constraints refer to this table.
If snapshot log is defined, purge it or invalidate the rowids.
Invalidate all dependent objects.
Perform the index maintenance (pseudo code) -
If truncating an empty partition and index is non-partitioned, ignore.
X lock the obj$ cache entry (in row cache) for the index, and
retrieve the name and owner of the index.
X lock and X pin the KGL object for the index
If partitioned index, pin partition heap (as well)
    If partition truncate
        If local index
            X lock obj$ entry and update modification timestamp.
            Truncate the segment.
        else /* Global Index */
            If partition not empty mark unusable.
    else /* table truncate */
        X lock obj$ entry and update modification timestamp.
        Truncate all segments of all partitions.
else /* Non-Partitioned index */
    If truncate partition mark the index as unusable
    else truncate index segments.
Truncate partition(s) segment(s)
    Obtain the new data object number.
    Reset the High Water Mark (HWM).
    Update the data object number in obj$ row cache and partition heap.
    Drop any storage (if required).
Note: During a truncate, if any constraints refer to the table, ORA-2266 is
returned. If the partition to be truncated is empty, the constraint check is
skipped.
Statistics on the truncated partitions are not affected, so aggregate statistics on
both the table and indexes are unaffected.
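The index maintenance decisions for a truncate can be sketched in Python. This is only an illustration of the pseudo code above, not the server code; the function name, argument names and returned strings are hypothetical.

```python
def truncate_index_action(index_kind, truncate_scope, partition_empty):
    """Return the maintenance action for one index during a truncate.

    index_kind:      "local" or "global" (partitioned), or "nonpartitioned"
    truncate_scope:  "partition" or "table"
    partition_empty: only meaningful for a partition truncate
    All names and return strings are illustrative assumptions.
    """
    if truncate_scope == "table":
        # Table truncate: the index segments themselves are truncated.
        if index_kind == "nonpartitioned":
            return "truncate index segment"
        return "truncate all index partition segments"

    # Partition truncate from here on.
    if index_kind == "local":
        # The local index partition maps one-to-one to the table partition.
        return "truncate matching index partition segment"

    # Global and non-partitioned indexes span table partitions, so they
    # cannot be truncated; they become unusable unless nothing was removed.
    return "no action" if partition_empty else "mark unusable"
```

For example, `truncate_index_action("global", "partition", False)` yields `"mark unusable"`, which is why truncating a populated partition forces a rebuild of global indexes.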
An interesting operation that executes
under a 3 step locking protocol is -
Exchange Partition
as it involves a partitioned object and a
non-partitioned table.
As with all 3 step operations acquire a TM SX on the table, and a TM X on the
partition. In addition acquire a TM X lock on the non-partitioned table.
Acquire KGL Lock and pin in Shared (S) mode on the partition table.
On the non-partitioned table -
If validation is required
acquire KGL S lock and pin
else
acquire KGL X lock and pin.
Pin the heaps containing constraint and index definitions of the table.
Perform various checks:
If constraints are enabled on either the partitioned table or the
non-partitioned table signal ORA-2266.
Check that column types, sizes, and order are identical for both segments.
If local indexes exist ensure that a corresponding index exists on the
non-partitioned table. Acquire a KGL S lock on the indexes during
this check.
Perform validation if necessary
Execute the query - Select 1 from non-partitioned table
where tbl$or$idx$part$num () != 0
tbl$or$idx$part$num() is a function which accepts the partitioning
columns and partition number as arguments and checks whether a
row maps to the partition.
Release any KGL S locks and re-acquire them in X mode.
Perform the table and partition exchange:
Lock the partition entry in obj$ row cache in X mode.
Update the partition heap for the partitioned table and the object
heap for the non-partitioned table.
Swap the physical attributes and segment information by exchanging
tab$ (non-partitioned table) with tabpart$. Also swaps statistics and
logging information.
Swap the data object ids in the obj$ row cache and invalidate the
cache.
Swap the column statistics by updating obj# in hist_head$ and
histgrms$ .
Recalculate the partition table aggregate statistics.
If necessary we now exchange indexes:
For each index of the non-partitioned table -
KGL lock and pin the index in X mode.
If index is partitioned (has to be global) mark as unusable.
Lock the corresponding index on the partition table and swap it.
The algorithm for swapping indexes is similar to that for
swapping tables described in the paragraph above.
For each index on the partitioned table -
If no matching index on the non-partitioned table, KGL X lock
the index and mark as unusable.
Note: Even though this operation uses a three step locking protocol, the KGL
locks are not released after acquiring the DML locks. If validation is required,
these locks may be held for some time, preventing other maintenance
operations.
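The validation step of Exchange Partition checks that every row of the non-partitioned table maps to the target partition. A minimal Python sketch of that idea follows. It assumes range partitioning with exclusive upper bounds, and `partition_number` is a hypothetical stand-in for the mapping check, not the real tbl$or$idx$part$num() function.

```python
from bisect import bisect_right


def partition_number(key, upper_bounds):
    """Map a key to a range partition (0-based).

    upper_bounds holds the exclusive upper limit of each partition in
    ascending order. A result equal to len(upper_bounds) means the key
    is above the last bound and fits no partition.
    """
    return bisect_right(upper_bounds, key)


def rows_outside_partition(rows, key_column, upper_bounds, target_partition):
    """Return the rows that would fail validation for the target partition."""
    return [row for row in rows
            if partition_number(row[key_column], upper_bounds) != target_partition]


# Hypothetical data: partitions cover <100, <200 and <300.
bounds = [100, 200, 300]
staging_rows = [{"id": 1, "amount": 150}, {"id": 2, "amount": 250}]
bad_rows = rows_outside_partition(staging_rows, "amount", bounds, target_partition=1)
```

Here the row with amount 250 is reported because it belongs to partition 2, not partition 1. Every row has to be examined, which is why the KGL locks can be held for some time when validation is requested.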
Instructor Note: If time permits it is recommended that one (or more)
operation from each category (single step and three step) is discussed in
detail, bearing in mind that the operations are similar.
Index Partition Maintenance
Index operations include:
Single Step operations
~Create index
~Drop Index Partition
Three Step operations
~Rebuild Index Partitions
~Mark Local Indexes Unusable
~Rebuild Unusable Local Indexes
The operations mentioned above are performed in the following manner:
Single Step Locking Protocol Operations
Create Index
The single step lock protocol involves taking a TM lock in Shared mode
(S) on the table. A DL lock in Shared Exclusive (SX) mode is taken on
each partition.
Local Indexes : The optimizer may choose to scan another local index to build
the local index or scan each of the table partitions. It might choose another
local index provided it is a superset of the new index and all the partitions are
usable. Only the restricted rowids are stored in a local index.
Global indexes: The optimizer can use another global index provided it is a
superset, also contains the partitioning column, and is usable. The extended
rowid is stored in a global index.
The DL lock blocks a simultaneous parallel direct load. A serial direct load
takes a TM lock in X mode on the table. This is incompatible with the TM S
lock taken by create index, so the two cannot run together. A parallel
direct load takes a TM lock in S mode on the base table, which is
compatible with a create index, and DL locks in X mode. Hence create index
needs the DL lock in SX mode to prevent a parallel direct load from
happening at the same time.
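The reasoning above can be checked against the standard lock mode compatibility matrix. The sketch below is a simplified model: it treats each lock type as a single resource (the real DL locks are per partition), and the dictionary and function names are hypothetical.

```python
# For each held mode, the set of modes another session may still acquire.
COMPATIBLE = {
    "NULL": {"NULL", "SS", "SX", "S", "SSX", "X"},
    "SS":   {"NULL", "SS", "SX", "S", "SSX"},
    "SX":   {"NULL", "SS", "SX"},
    "S":    {"NULL", "SS", "S"},
    "SSX":  {"NULL", "SS"},
    "X":    {"NULL"},
}


def compatible(held, requested):
    """True if `requested` can be granted while `held` is held."""
    return requested in COMPATIBLE[held]


def can_run_concurrently(locks_a, locks_b):
    """Compare two operations; each is a dict of lock type -> mode."""
    shared_types = locks_a.keys() & locks_b.keys()
    return all(compatible(locks_a[t], locks_b[t]) for t in shared_types)


# Lock sets taken from the text above.
CREATE_INDEX = {"TM": "S", "DL": "SX"}
SERIAL_DIRECT_LOAD = {"TM": "X"}
PARALLEL_DIRECT_LOAD = {"TM": "S", "DL": "X"}
```

With these definitions the serial load is stopped by the TM lock alone, while the parallel load passes the TM check (S with S) and is stopped only by the DL lock (SX against X).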
Drop Index Partition
All cursors and views based on the table are invalidated.
The partition segment is dropped by -
Updating the index partition heap by decrementing the partition
count, and changing the partition number of the affected
partitions.
The obj$ entry for the partition is dropped.
If the partition was not empty or was unusable then we mark the
next highest partition as unusable.
Note: A global index partition may be dropped explicitly through
issuing an alter index drop partition statement. Local index
partitions are dropped when the corresponding table partition is
dropped, but the same code path is executed.
Three Step Locking Protocol
Rebuild Index Partitions
Local Index
~ Acquires TM SS lock on table
~ Acquires TM S lock on partition
~ Acquires DL SX lock on partition
Global Index
~ Acquires S lock on table
~ Acquires DL SX lock on all partitions
Rebuild Index Partition
Apart from the above locks the following locks are also acquired.
Acquires KGL S lock on the table
Acquires KGL S lock on the index
Pins the partition heaps of the index and heaps of table holding index data in share
mode
Internally executes it as a special case of create non-partitioned index
Note: If the alter table command Rebuild unusable local indexes is issued then we
acquire TM SS lock on the table, TM S lock on the partitions and then build a list of
local indexes whose partitions are marked unusable. The RDBMS then executes an
alter index rebuild partition command.
Mark Local Indexes Unusable
Acquire TM SX lock on the table, and a TM X lock on the partition.
Pin the heaps containing the index definitions in X mode.
Acquire KGL X lock on the indexes and pin the partition heaps. The local partition is
then marked unusable.
The index operation Split Global Index
Partition may execute under a one step
or a three step locking protocol.
It depends if the partition is marked
UNUSABLE.
The Split Global Index Partition operation executes as follows -
Acquire KGL S lock on the table and pin the heaps containing the index
definitions.
Acquire KGL X lock on the index and pin the partition heaps.
Note: If the partition was marked UNUSABLE then execute as for an
Alter index codepath, otherwise execute as for a Create index
codepath.
If partition is UNUSABLE then lock table in X mode (single step
locking)
otherwise lock table in S mode (3 step locking protocol).
The partition is then split which involves:
Executing drop partition code to drop the partition.
Increments the partition count by 2 (+1 effectively as we drop a
part.)
If the partition name is being reused by one of the new partitions
then flush the obj$ cache for the new partition.
If not splitting an empty partition mark the index unusable.
Create new index partitions.
Update the partition number of all affected partitions by 2
Delete aggregate statistics.
Miscellaneous
Modify Partition
Cursor Invalidation
Queries and Partition Maintenance
SKIP_UNUSABLE_INDEXES option
Modify Partition : To modify the physical attributes of a table partition or local
index or global index partitions. Also the Logging/Nologging flags can be set
on the partitions.
Cursor Invalidation : It is still table based, i.e. any DDL statement that modifies
a partition of a table will invalidate all cursors dependent on that table even if
the cursor does not access the modified partition.
Queries whose execution starts before invocation of a partition maintenance
operation, or before the dictionary updates are done, correctly access the affected
partitions using the consistent read mechanism. The behaviour of such queries is
unpredictable after the dictionary updates are done, especially if the segments
are reused by other objects. Similarly, index partitions whose state changes
from usable to unusable or vice versa during the execution of a query will
cause the query to fail when the corresponding index partition is accessed.
SKIP_UNUSABLE_INDEXES : can be specified by alter session or as an
option to sqlldr or import. If set to TRUE, DML ignores unusable indexes and
does not report errors. Queries will still fail if they try to use
an index partition marked unusable. Indexes that are unique and
unusable are not skipped and will still report errors.
Analyze
Locking
Acquires KGL lock and Pin in S mode
on the tables and indexes
Acquires TM S lock on the table for
validate option
Escalates KGL lock to X mode for
statistics option after data gathering
Analyze does not take partition level locks. Even a partition level analyze
command such as analyze table sales partition(sales_m1) validate structure
will lock the whole table in S mode.
Analyzing a partitioned table with validate option will also check each row
for correct partition mapping. Rowids of invalid rows will be listed into
INVALID_ROWS table by default.
Parallel DDL
Parallel DDL Operations
Create table as select (CTAS)
Parallelism on create (if specified) will
override parallelism on the select
Parallelism on create is limited by
number of partitions being created
Parallelism on select is not limited by
number of partitions of scan table
Create Table As Select
e.g. npt1, npt2 are non partitioned tables
pt1, pt2 are partitioned tables (4 partitions each)
1) create table npt1 parallel(degree 16) as
select /*+ parallel(npt2, 8, 1) */ * from npt2;
Degree of parallelism used is 16 for both the scan and create.
2) create table npt1 parallel(degree 8) as
select * from pt1;
Degree of parallelism used is 8 for the scan (even though the number
of partitions are 4) and create operations.
3) create table pt1 parallel(degree 8 ) partition by as
select * from npt1;
Degree of parallelism is 4 (no of partitions) for both scan and create.
4) create table pt1 parallel(degree 8) partition by as
select * from pt2;
Degree of parallelism is 4 (no of partitions) for both scan and create.
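The four cases follow two rules: the degree on the create overrides the degree on the select, and the result is capped by the number of partitions being created. A short Python sketch of those rules is given below. The function and parameter names are hypothetical, and falling back to a degree of 1 when nothing is specified is an assumption made only to keep the example complete.

```python
def ctas_degree(create_degree=None, select_degree=None, partitions_created=None):
    """Degree of parallelism for a parallel create table as select.

    create_degree:      degree from the parallel clause of the create, if any
    select_degree:      degree from the hint on the select, if any
    partitions_created: partition count of the new table, None if it is
                        not partitioned
    """
    # Parallelism on the create overrides parallelism on the select.
    degree = create_degree if create_degree is not None else select_degree
    if degree is None:
        degree = 1  # assumption for this sketch only
    # Parallelism is limited by the number of partitions being created,
    # but not by the number of partitions of the scanned table.
    if partitions_created is not None:
        degree = min(degree, partitions_created)
    return degree
```

Case 1 corresponds to `ctas_degree(16, 8)`, case 2 to `ctas_degree(8)`, and cases 3 and 4 to `ctas_degree(8, partitions_created=4)`.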
Parallel DDL
Create Index
Local Indexes
Parallelism limited by number of
partitions
Uses only 1 slave set
Global Indexes
Parallelism limited by number of
partitions on the index
Uses 2 slave sets
If the data is skewed or the system can sustain much more parallelism than the
number of partitions, use the following method to increase parallelism.
1. Create the partitioned table
2. Create non-unique index.
3. If uniqueness is required, add a disabled table constraint
4. Mark all partitions of the index unusable.
5. Load the data with skip_unusable_indexes=true
(loader, imp or alter session). This will not skip unique indexes, hence the
requirement to create non-unique indexes.
6. Rebuild each index partition with whatever parallelism is desired.
Both inter- and intra-partition parallelism can be used.
7. Enable the constraint if necessary.
The time taken in step 7 can be significant as the index needs to be
scanned to check for uniqueness.
Parallel DDL
Alter table move partition
Alter table split partition
Alter index rebuild partition
Analyze
No intra partition parallelism
Inter partition parallelism by
analyzing partitions in multiple
sessions
For alter table move partition and alter table split partition, the semantics
of the parallel clause for reading and writing the data at partition level
are same as create table as select of Oracle 7.3
Similarly, alter index rebuild partition follows the same semantics of
create index parallel of Oracle 7.3.
If parallel is not specified, default parallelism is computed from the object.
New V8 command to specify parallelism on a partitioned index
alter index parallel(degree m instances n)
Parallel clause ignored for non-partitioned indexes.
Optimizer
Partition Elimination
Parse time
Execute time
Parallel Scans
rowid parallel table scans
partition parallel Index scans
Partition Elimination
If the query contains a predicate involving the partition columns,
partition elimination will occur at parse time for constant predicates
and at execute time for bind values or join from other row sources.
The type of elimination is indicated by values in the three new columns
of the plan table partition_start, partition_end, partition_id.
The partition boundaries are described by a new step PARTITION
in the plan table.
Parallel Scans
Rowid parallel table scans : similar to V7. No rowid range will span a
partition. Partition numbers will be sent with rowid ranges to the slaves.
Partition parallel index scans: new in V8. Index scans are done in
parallel and the parallelism is limited by number of index partitions.
The lab exercises demonstrate the various concepts.
Optimizer
Hints
parallel_index
Use partition parallel index scans
Ignored for non-partitioned indexes
Ignored if index is not used
Total number of slaves limited by the
number of partitions accessed
parallel : No change from V7
Example
parallel_index(sales sales_idx2 13 1)
If P is the total number of partitions to be accessed after elimination and
I is the total number of instances, the number of instances used is
min(I,P). The total number of slaves used is min(P, degree * I).
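The two formulas can be evaluated directly. The Python sketch below simply restates them; the function and parameter names are hypothetical.

```python
def parallel_index_resources(partitions_accessed, degree, instances):
    """Return (instances used, total slaves used) for a parallel_index hint.

    partitions_accessed: P, partitions left after partition elimination
    degree:              degree given in the hint
    instances:           I, the total number of instances
    """
    instances_used = min(instances, partitions_accessed)
    slaves_used = min(partitions_accessed, degree * instances)
    return instances_used, slaves_used
```

For the hint parallel_index(sales sales_idx2 13 1) with only 4 partitions surviving elimination, `parallel_index_resources(4, 13, 1)` gives one instance and 4 slaves, so the requested degree of 13 is not reached.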
Backup/Recovery
Transparent to partitions
Tablespace/file level
Effect of Logging/Nologging
Case study for PIT recovery of partitions
Partition maintenance operations which can make use of nologging :
alter table .. move partition
alter table .. split partition
alter index split partition
alter index rebuild
alter index rebuild partition
The redo will have a record (opcode 19.2) which will invalidate the range of
blocks on media recovery. Any access to these blocks after a media recovery
will result in ora-1578 error.
8.1 Features
The following features may be available in
the Oracle 8.1 timeframe -
Hash partitioning
Composite partitioning (sub-partition
concept introduced)
Partitioning of tables with LOB col.
Partitioned Index-Organised Tables
Oracle 8.1 is currently undergoing beta testing. The listed features are being
evaluated for their inclusion in the Oracle 8.1 production release.
Hash partitioning - Many existing data sets are not readily partitioned by
range. A new method of partitioning on offer is hash function based. The
partitioning key is passed through a hash generator which determines the
partition in which the data will reside.The hash function controls the placement
of data across a fixed number of partitions (data is striped across available
partitions).
Composite partitioning - Data is first partitioned by range and then hash
partitioned within that range. Consider a table of sales data range partitioned
by sales_date and then hash partitioned on product_id. This has the effect of
creating sub-partitions grouped by product_id within a given date range.
Diagrammatically, it can be represented as:
What this diagram represents (in this example) is where each horizontal range
partition contains all orders for a relevant time period (quarter for example),
while each vertical subpartition contains all products with the same hash value
(a product group). Each sub-partition is therefore a grouping of products for a
particular quarter of orders, and can be managed individually, participate in
parallel DML or be a candidate for partition elimination.
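The two-level placement can be illustrated with a small Python sketch: the range partition is chosen from sales_date, then the subpartition from a hash of product_id. The quarter boundaries, the subpartition count and the use of CRC32 as the hash function are all assumptions for illustration; the hash function used by the server is not described here.

```python
import zlib
from bisect import bisect_right
from datetime import date

# Hypothetical exclusive upper bounds, one range partition per quarter.
QUARTER_BOUNDS = [date(1999, 4, 1), date(1999, 7, 1),
                  date(1999, 10, 1), date(2000, 1, 1)]
HASH_SUBPARTITIONS = 4  # fixed number of hash subpartitions (assumed)


def locate(sales_date, product_id):
    """Return (range partition, hash subpartition) for one row."""
    # Step 1: range partitioning on sales_date.
    range_partition = bisect_right(QUARTER_BOUNDS, sales_date)
    # Step 2: hash partitioning on product_id within that range.
    # CRC32 is only a stand-in for the real hash generator.
    digest = zlib.crc32(str(product_id).encode("utf-8"))
    return range_partition, digest % HASH_SUBPARTITIONS
```

A given product_id always lands in the same subpartition number whatever the quarter, which is what allows a predicate on product_id to eliminate subpartitions in every range partition.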
SQL has been extended to allow for subpartitions to be specified in certain
statements (select, delete, update for example).
New operations have been added - merging of partitions, dissolving of
subpartitions for example.
We can now partition tables that contain Large Objects (LOBs) as well as
partition Index Organised Tables (IOTs)
Instructor Note - Oracle 8.1 is not a production release at time of writing.
Any relevant 8.1 information will be incorporated into the DSI documents at a
later date.
Summary
Global indexes increase downtime
during maintenance but minimise index
probes. Good for OLTP.
Local indexes reduce downtime during
maintenance
Non prefixed indexes are expensive to
scan but can be done in parallel. Good
for DSS.
Local prefixed indexes are ideal
Summary
Partition level locks to provide partition
independence
Application availability or maintenance
operation concurrency determined by
single step or three step locking
protocol
Enhancements to optimizer for partition
elimination and parallel index probes
DSI 306 - Unit 5, Parallel DML 5--1
1
5
Parallel DML
Development: Daniel Semler (dsemler.uk Oracle UK WWS)
60 minutes Lecture
15 minutes Examples
145 minutes Total
2
Outline
Introduction
General Features
Transaction model
Recovery
Locking model
Space Usage
Monitoring - dictionary views
Explain plan
3
Introduction
Permits INSERT, UPDATE and DELETE
operations to be performed using parallel
query slaves (QS)
4 basic types of PDML possible
Partitioned Tables
INSERT SELECT, DELETE, UPDATE
Non-partitioned Tables
INSERT SELECT
4
Using PDML - Configuration
Link in the PQO option
Init.ora parameters
parallel_min_servers
parallel_max_servers
Set the parallelism on the table
ALTER SESSION ENABLE PARALLEL
DML;
Init.ora parameters
parallel_min_servers should be set to some nominal value
which will permit a number of queries to run without incurring the overhead of
having to spawn new slaves.
parallel_max_servers should be set to accommodate the
maximum number of slaves that could be required. Bear in mind that setting it
too high will be a drain on system resources, so setting it below what you may
want to use will prevent degradation of performance on a box which really
cannot handle the load.
When setting parallelism on the table itself you must bear in
mind that there will only be one slave used per partition in PDML on a
partitioned object. In the case of PDML INSERT into a non-partitioned table
however, multiple slaves may be used.
Before executing PDML you must issue the ALTER SESSION
command to enable parallel DML in the session.
5
Using PDML - Restrictions
PDML must be the first and only
statement in a transaction
If a second statement is attempted, ORA-
12830 is returned
If a PDML statement is executed in a
transaction after another serial statement,
the PDML statement will revert to serial
execution
ORA-12830 is Must COMMIT or ROLLBACK after executing parallel
INSERT/UPDATE/DELETE
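The two rules on this slide can be modelled as a small state machine. The Python sketch below is only an illustration of the behaviour described; the class and method names are hypothetical and the exception is a stand-in for ORA-12830.

```python
class Ora12830(Exception):
    """Stand-in for ORA-12830 raised by the server."""


class Transaction:
    """Tracks whether a statement may run as PDML in this transaction."""

    def __init__(self):
        self.statement_count = 0
        self.parallel_dml_done = False

    def execute(self, wants_parallel_dml):
        """Run one DML statement and return how it executes."""
        if self.parallel_dml_done:
            # Nothing may follow a PDML statement in the same transaction.
            raise Ora12830("Must COMMIT or ROLLBACK after executing "
                           "parallel INSERT/UPDATE/DELETE")
        # PDML is only possible as the first statement; otherwise the
        # statement silently reverts to serial execution.
        runs_parallel = wants_parallel_dml and self.statement_count == 0
        self.statement_count += 1
        self.parallel_dml_done = runs_parallel
        return "parallel" if runs_parallel else "serial"

    def commit(self):
        """COMMIT (or ROLLBACK) starts a fresh transaction."""
        self.statement_count = 0
        self.parallel_dml_done = False
```

A first `execute(True)` returns `"parallel"` and any further `execute` call raises until `commit()` is called, while `execute(False)` followed by `execute(True)` returns `"serial"` twice with no error.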
6
PDML Restrictions
No trigger support for affected tables
Integrity constraints
No delete cascade
No deferred integrity
No self-referential integrity
Global unique indexes are not supported
for parallel update
Triggers may be disabled on a table and then PDML can be executed, but there
is an obvious side-effect - the trigger actions will not be taken.
7
PDML Restrictions ...
Global indexes are not supported for
PDML INSERT or serial INSERT with the
APPEND hint
DML is not parallelised on tables with
bitmap indexes
DML should not have embedded functions
that either read or write database state or
package state
The PDML statement cannot reference
tables in remote databases
Note that like PDML statements a serial INSERT using the APPEND hint
must be followed directly by a COMMIT. This is due to the update of the
HWM done at commit time.
8
PDML Restrictions ...
A Tx which explicitly acquires table or
partition locks cannot execute PDML
Violations will result in statements being
executed serially with no error being
reported
If a TX gets explicit locks on a table/partition and then uses PDML the locks
are not inherited by the slaves which can lead to deadlocks in some cases.
Following from this is the fact that setting SERIALIZABLE=TRUE or
ROW_LOCKING=intent will prohibit the use of PDML.
9
Deadlock Scenarios
ITL Self-Deadlock
Deadlock in datablock
Two slaves may change the same block due to row piece
chaining
Two slaves may try to insert into the same block if they
hash into the same process free list
Two slaves may change same block in different table due
to trigger on base table.
Deadlock in index block
Global indexes can cause deadlock , since two slaves
may change the same index block for different data block
in base table.
All the above scenarios are possible ways in which we could get into
deadlock if we had multiple slaves working on one segment.
The following operations can lead to the datablock deadlocks above:
- Update and Delete due to row piece chaining
- Conventional path Insert and Update due to hashing into the same process
free list.
Triggers can also cause datablock deadlock because they could change the
same row of another object.
Global indexes can cause an index block ITL deadlock since two different
datablocks in the base table can map to the same index leaf block. Actually it
is more a case of having both slaves requiring ITLs in a pair of index blocks,
and requesting the ITLs for the first block in one order and for the second
block in the opposite order. If MAXTRANS is hit in each case then you have
an ITL slot deadlock. As the number of slaves grows the complexity of such
deadlock possibilities increases.
10
Deadlock Solution
Parallelism within partition is not
implemented
Session based locking mechanism is
used
parent xid and deadlock id is passed to
the slaves.
All slaves use the same coordinator id
to acquire the lock.
These are a few of the reasons for the limitations of PDML discussed in the
Restrictions section.
Parallelism within a partition may be implemented in 8.1.
All the possible deadlock scenarios have to be taken into consideration.
The features which can cause deadlock are not implemented in rel 8.0.
11
Slave Allocation
INSERT
2 slave sets
UPDATE and DELETE
1 slave set if no join is involved
2 slave sets if there is a join for which
a parallel plan can be found
Slaves (QS) for all parallel operations are allocated in slave sets. Where one
suffices it is the only one which is allocated. For more complex queries and
PDML operations two sets will be allocated - each slave set will contain the
same number of QS.
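The allocation rules on this slide can be written down as a small Python sketch. The function names are hypothetical, and treating the total as slave sets multiplied by the degree follows from the statement that each set contains the same number of QS.

```python
def slave_sets(operation, has_parallel_join=False):
    """Number of slave sets allocated for a PDML statement."""
    op = operation.upper()
    if op == "INSERT":
        return 2
    if op in ("UPDATE", "DELETE"):
        # A second set is only needed when a join with a parallel plan exists.
        return 2 if has_parallel_join else 1
    raise ValueError("not a PDML operation: " + operation)


def total_slaves(operation, degree, has_parallel_join=False):
    """Total QS allocated; every slave set holds `degree` slaves."""
    return slave_sets(operation, has_parallel_join) * degree
```

For example, a simple parallel delete at degree 4 uses 4 QS, while a parallel insert at the same degree uses 8.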
12
Transaction Model
QC has a parent tx (QC) and each DML
slave has a child tx (QS)
Each DML QS will use its own RBS
In the event of rollback each slave will do
its own rollback
The TX is completed with a 2PC
Each DML QS will attempt to use a different RBS where possible but where
this is not possible there may arise RBS header contention. RBS selection is
based on the RBS with the least number of active transactions. Where more
than 1 have the same number, the one after the last one used will be selected.
For example if two rollback segments R01 and R06 have 5 active transactions
each, PDML chooses R01 since the last transaction used R06.
SET TRANSACTION USE ROLLBACK SEGMENT is ignored if used.
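The selection rule (fewest active transactions, ties broken by taking the next segment after the one last used) can be sketched in Python as follows. The function name and the dictionary layout are hypothetical, and the wrap-around from the last segment back to the first is an assumption inferred from the R01/R06 example.

```python
def choose_rollback_segment(active_tx, last_used):
    """Pick a rollback segment for a DML QS.

    active_tx: dict of segment name -> number of active transactions,
               listed in segment order
    last_used: name of the segment chosen by the previous transaction
    """
    names = list(active_tx)
    fewest = min(active_tx.values())
    # Start the search just after the last segment used, wrapping round.
    start = names.index(last_used) + 1 if last_used in names else 0
    for offset in range(len(names)):
        name = names[(start + offset) % len(names)]
        if active_tx[name] == fewest:
            return name
    return None  # only reached when active_tx is empty


# The example from the text: R01 and R06 both have 5 active transactions.
segments = {"R01": 5, "R02": 7, "R03": 7, "R04": 7, "R05": 7, "R06": 5}
chosen = choose_rollback_segment(segments, last_used="R06")
```

With R06 used last, the search wraps round and `chosen` is `"R01"`. Had R01 been used last, the same call would return R06.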
13
Transaction Control Messaging
PDML control messaging is an extension
of the O7 PQO messaging
New message types
KXFXObegintrans
KXFXORtrans
Familiarity with the basic PQO messaging is assumed.
The example which follows therefore only outlines the protocol for a PDML
transaction.
This is defined in kxfx.h - the kxfx module defines Fast (Parallel) execution.
14
PDML DML QS messaging ...
QC -> QS : KXFXObegintrans
QS -> QC : KXFXORtrans
QC -> QS : KXFXOparse
QS -> QC : KXFXORokcurs
This is the opening messaging.
The DML QS are told to begin a transaction with a KXFXObegintrans
message. This message includes the parent TxID. This TxID will later be used
in the 2PC protocol.
Each DML QS responds with a KXFXORtrans message. This simply tells the
QC that the KXFXObegintrans message has been received and processed.
The QC then sends the KXFXOparse messages to the DML QS which contains
first any trace events set in the QC session and then the SQL that the DML QS
is to parse and later execute.
Each DML QS responds with KXFXORokcurs when they have parsed the
SQL successfully.
Note that in the case of INSERT SELECT the QS devoted to scanning the
source table or tables will not be sent these messages - they will be sent
standard PQO messages for controlling queries.
15
PDML query QS messaging ...
QC -> QS : KXFXOparse
QS -> QC : KXFXORokcurs
QC -> QS : KXFXOexecute
QS -> QC : KXFXORreply (KXFXRready)
Once the DML QS have started their transactions the query QS (if used) are
sent KXFXOparse messages containing the trace events and their SQL. They
respond with KXFXORokcurs.
Given that the QS is doing a ROWID range scan, the KXFXOexecute message
sent next will contain the ROWID range to scan first. The QS will bind these
values in the parsed SQL and respond with a KXFXORreply message of type
KXFXRready. This indicates that it is ready to begin its scan of the table.
16
PDML messaging ...
QC -> DML QS : KXFXOexecute
DML QS -> QC : KXFXORreply (KXFXRstarted)
QC -> Query QS : KXFXOresume
Query QS -> QC : KXFXORreply (KXFXRpartial)
QC -> Query QS : KXFXOexecute
At this point the DML QS are told to start (KXFXOexecute) and they reply
with a reply message KXFXRstarted. They are now waiting for output from
the TQs which will be populated with rows from the query QS - if they exist. If
this is a simple(no join) update or delete the DML QS will now begin to
perform the update or delete directly.
In the case where there are query QS they will now be sent a KXFXOresume
message which tells them to begin scanning the source table. They will reply
to the QC with KXFXRpartial when they have completed the ROWID range
and will be sent another one (another KXFXOexecute message with ROWID
bind values). They then begin scanning this new ROWID range - no
KXFXOresume message is required from here on. When they complete
scanning the table the QC will send a null ROWID range.
17
Two Phase Commit
A 2PC message protocol is used to
ensure that all or none of the QS commit
3 new KXFXO messages are used
KXFXOprepare
KXFXORtrans
KXFXOforget
Once the transaction has completed the COMMIT statement must be
processed. At present either a COMMIT or ROLLBACK must be the next
SQL entered. The COMMIT is handled using a 2 phase commit protocol
implemented within the context of the parallel query messaging protocol. This
ensures that if any of the QS fail to commit the others will rollback.
18
2PC messaging
QC -> QS : KXFXOprepare
QS -> QC : KXFXORtrans
QC -> QS : KXFXOforget
QS -> QC : KXFXORtrans
The QC sends a KXFXOprepare message containing the parent
transaction ID (kxfxmpt->pxid_kxfxmpt) to each QS. This is the same TxID
which the QC sends in the KXFXObegintrans message at the start of the
PDML transaction. The QS each respond with a KXFXORtrans message. The QC
then sends a KXFXOforget message containing the parent txid and the commit
SCN. The QS each then send another KXFXORtrans message confirming commit.
Interestingly the TxID used in the ITL slots by the QS is the parent TxID
passed in the KXFXObegintrans for PDML INSERT, but DELETE and
UPDATE use the QS TxID.
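The message exchange can be traced with a short Python simulation of the QC side. This is a sketch only: the function name, the tuple layout and the way a failed prepare is handled (stop and report a rollback) are assumptions, since the text above describes only the successful path in detail.

```python
def two_phase_commit(parent_txid, slaves, commit_scn):
    """Simulate the 2PC exchange between the QC and its DML QS.

    slaves: dict of QS name -> bool, True if that QS can prepare
    Returns (outcome, messages) where messages is the ordered trace.
    """
    messages = []
    # Phase 1: each QS is asked to prepare under the parent TxID.
    for name, can_prepare in slaves.items():
        messages.append(("QC->" + name, "KXFXOprepare", parent_txid))
        if not can_prepare:
            # Assumed handling: no KXFXOforget is sent, so nothing commits.
            return "ROLLED BACK", messages
        messages.append((name + "->QC", "KXFXORtrans"))
    # Phase 2: the commit SCN is distributed and each QS confirms.
    for name in slaves:
        messages.append(("QC->" + name, "KXFXOforget", parent_txid, commit_scn))
        messages.append((name + "->QC", "KXFXORtrans"))
    return "COMMITTED", messages
```

With two healthy slaves the trace holds eight messages, four per phase, and the outcome is `"COMMITTED"`. If any slave cannot prepare, no KXFXOforget message appears in the trace.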
19
Recovery
Statement Failure
When ROLLBACK statement is issued
Process Failure
Failure during 2PC
Instance Failure
20
Recovery - Statement Failure
Statement Failure
Assuming that a statement has begun
parallel execution
The QC gets an error or one is reported from a slave.
The QC posts 10388 to the remaining slaves.
The slaves roll back.
While the slaves are rolling back the QC tries to
convert the PS lock for each slave to mode 6 (X). When
this succeeds it indicates that the slave has finished
recovery
When acquiring slaves for a query the QC acquires PS locks on each slave in
mode 6 (X) and then downgrades them to mode 4 (S). When running normally
a PS lock is held for each slave in mode 4 (S) by the QC. Each QS also holds
its own PS lock in mode 4 (S). During recovery this is still the case but the QC
then tries to escalate the lock to mode 6 (X). When the slave finishes the
recovery it releases the lock and the QC can succeed in acquiring it in mode 6
(X).
21
Recovery - Transaction Failure
Transaction Failure
When ROLLBACK is issued, Ctrl-C is
hit during COMMIT or a slave gets an
error during COMMIT
If the QC TX is ACTIVE
QC rolls itself back
It will then send a CLEANUP message to the
slaves
If the QC TX is COMMITTED
it will send a FORGET message to all the slaves
In either case the QC will then wait for DONE
RECOVERY messages from all the slaves
The recovery performed by the slaves will be as described in the Recovery
during 2PC section later in this presentation.
22
Recovery - Transaction Failure...
If there is a failure during this phase the QC will mark
its transaction DEAD and SMON will perform clean
up of the dead transaction
After DONE RECOVERY is received from all slaves
the QC
finishes cleaning up its own state
ends the transaction
the first error the QC received is reported to the
user
Slaves receiving CLEANUP message
if slave transaction is ACTIVE it rolls back
if its PREPARED it checks the QC transaction state to
determine whether it should rollback, commit or set
itself as in-doubt
The rules used are the same as PMON uses - see slide 24
23
Recovery - Process Failure
Process Failure
PMON detects dead process
If dead process is a QS
PMON posts QC (12805) that the QS is dead
QC posts OER(10388) to all other QS
Each QS on receipt of 10388 rolls back its Tx
PMON wakes periodically and rolls back QS Tx
If dead process is the QC
PMON posts OER(10389) to each QS
PMON recovery of QC
PMON detects process failure - as part of its processing PMON sends KILL(0)
to all known processes. When the return indicates the process is dead PMON
marks it as dead and begins cleanup. Cleanup is done
CLEANUP_ROLLBACK_ENTRIES rollback entries at a time so as not to
interfere with the other duties it must perform. Naturally if more than 1 QS
dies PMON can only roll them back separately and therefore recovery will
take longer.
PMON posts the QC with 12805 - parallel query server died unexpectedly -
and the QC then posts the QS with 10388 - parallel query server interrupt
(failure). On receipt of this error the QS begin rolling back. If the process that
died is the QC then PMON directly posts the QS with 10389 - parallel query
server interrupt (cleanup).
Rollback is driven by the rollback segment information. This means that the
process performing rollback does not know what blocks will be needed so each
block is read using db file sequential read one block at a time. The
implication of this is that if the QS had updated a large number of rows
rollback may well take longer than the original update.
It should be noted that the QC will return to the user without waiting for
rollback completion of all QS. Hence the QS may still hold resources for some
time as the QC session then begins to execute a new statement.
24
Recovery during 2PC
PMON will recover a dead DML QS
depending on the state of QC
COMMITTED
the DML QS transaction will be committed
INACTIVE
PMON will mark the transaction as DEAD and
convert it to an ACTIVE state. It will then roll it
back.
ACTIVE
the DML QS transaction is really in doubt. PMON
will mark the transaction as DEAD and leave it.
SMON or another foreground process will recover it
later.
After doing the above the QC will be notified via a posting from PMON. This
notification is independent of the transaction state the slave was in. The QC is
simply notified that a slave died.
If PMON finds a slave transaction in the PREPARED or COMMITTED state and the
recovery cannot be done immediately (e.g. where the slave died in the PREPARED
state and the QC is still ACTIVE) PMON will mark the transaction DEAD for
SMON to recover later.
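The state-dependent handling described above can be summarised as a small lookup. This is only an illustrative sketch: the names `QCState` and `pmon_action` are hypothetical and do not correspond to any Oracle interface, and the returned strings simply restate the slide.

```python
from enum import Enum


class QCState(Enum):
    """Transaction state of the query coordinator (names taken from the slide)."""
    COMMITTED = "COMMITTED"
    INACTIVE = "INACTIVE"
    ACTIVE = "ACTIVE"


def pmon_action(qc_state):
    """Return what PMON does with a dead DML QS for a given QC state.

    Hypothetical helper: it only restates the rules from the slide.
    """
    if qc_state is QCState.COMMITTED:
        return "commit the slave transaction"
    if qc_state is QCState.INACTIVE:
        return "mark DEAD, convert to ACTIVE, roll back"
    if qc_state is QCState.ACTIVE:
        return "mark DEAD, leave for SMON or a foreground process"
    raise ValueError(f"unknown QC state: {qc_state!r}")


for state in QCState:
    print(state.value, "->", pmon_action(state))
```

Whatever the outcome, the QC is still posted that a slave died, as the notes above point out.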
Instance Failure
Where DML QS dies
SMON uses cross-instance calls (CICs) to notify the
QC of the dead slave/s - posting 12805 to the QC
If QC is on the dead instance SMON will use CICs to
notify the slaves of the QC death
SMON notifies (CICs if required) the surviving slaves
to cleanup (10388)
transaction recovery of the slave/s is performed after
instance crash recovery by either a surviving SMON
or another foreground process
Recovery - Instance Failure
Where an entire system fails the foreground process restarting the system will
mark as DEAD all the incomplete transactions on the system (which will
include failed slaves). Recovery will then proceed as above.
Locking Model
Table Lock
TM lock
ID1=OBJ# for the table
Partition Lock
TM lock
ID1=OBJ# for the partition
ID2=0
Partition Wait Lock
TM lock
ID1=OBJ# for the partition
ID2=1
Three types of lock are used in PDML to ensure that access to the partitioned
object is properly coordinated. They are all TM locks on either the table or
the partitions.
Locking Model ...
When locking a table
QC takes out
TM lock on table mode 3 (SX).
Partition locks on each partition involved mode 6 (X)
TX lock in mode 6 for the parent transaction
QS take out
TM lock on table mode 3 (SX)
Partition lock on the partition it is working on in mode
1 (NULL)
Partition Wait lock on the same partition in mode 6 (X)
TX lock in mode 6 for its own child transaction
In addition to the above locks the standard PS locks held in any parallel query
operation will also be held. That is :
QC holds a PS lock in mode 4 (S) for each slave
QS holds its own PS lock in mode 4 (S)
This is done to permit recovery completion to be detected - see the recovery
section for details.
A PS lock is defined as:
ID1=instance ID on which slave exists
ID2=slave ID on the instance
Locking model contention
When another process wishes to do DML
in a partition currently locked for PDML it
will
acquire TM lock on table in mode 3 (SX)
wait on partition lock in mode 3 (SX)
When a PDML session wishes to update a
table which is locked for DML the QC will
acquire a TX lock in mode 6 (X)
acquire a TM lock on the table in mode 3 (SX)
wait on the partition lock in mode 6 (X)
It should be noted that while ordinary (non-parallel) DML on a partitioned
object will still acquire both a table lock and partition locks it will not acquire
partition wait locks. It will also acquire the locks in mode 3 (SX) thereby
permitting multiple sessions to perform DML against the same partition
concurrently.
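The contention rules above follow from ordinary lock-mode compatibility. The sketch below encodes the commonly documented compatibility matrix for modes 1 to 6 (NULL, SS, SX, S, SSX, X); the names `COMPATIBLE` and `can_grant` are hypothetical and this is a model of the rules, not of the lock manager itself.

```python
# Lock modes: 1 NULL, 2 SS, 3 SX, 4 S, 5 SSX, 6 X.
# For each requested mode, the set of held modes it can coexist with.
COMPATIBLE = {
    1: {1, 2, 3, 4, 5, 6},
    2: {1, 2, 3, 4, 5},
    3: {1, 2, 3},
    4: {1, 2, 4},
    5: {1, 2},
    6: {1},
}


def can_grant(requested, held_modes):
    """True if 'requested' is compatible with every mode already held."""
    return all(held in COMPATIBLE[requested] for held in held_modes)


# Two non-parallel DML sessions on one partition: SX alongside SX is granted.
print(can_grant(3, [3]))
# Non-parallel DML against a partition locked by a PDML QC in X: must wait.
print(can_grant(3, [6]))
# PDML QC asking for X on a partition already under non-parallel DML: must wait.
print(can_grant(6, [3]))
# A QS holding the partition lock in NULL mode never conflicts with the QC's X.
print(can_grant(1, [6]))
```

This is why ordinary DML sessions can share a partition with each other but not with PDML, in either direction.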
Locking Model - Implications
When PDML is being performed on a
partition no other DML may be performed
against the same partition.
Likewise a partition currently locked for
non-parallel DML will not permit PDML to
execute.
Oracle Parallel Server Locking Implications
This is worth noting as there are many data warehouses using
large OPS setups. Basically all the locking involved in partition access
becomes subject to the usual DLM overheads. However there are a couple of
specific 8.0 problems which exacerbate this.
- When performing the object flush before commencing a
parallel query against a partitioned object under OPS, a global
database checkpoint is performed instead of checkpointing only the partitions
that will be accessed. This can have very severe performance implications
where there are a large number of partitions and files. This is being reviewed
for 8.1.
- When exchanging partitions the entire object definition is
flushed from the rowcache. Again in OPS with a large number of partitions
and files this is a real performance hit if many partition exchanges are being
done during query operations. Incidentally, it is not possible to exchange a
partition with DML running on the partition in question. Queries can be run but
the results may be unpredictable.
Space Management
Partitioned objects
INSERT
Each QS will use blocks above the highwater mark
When the COMMIT is done the HWM is bumped up
If required new extents will be allocated
UPDATE
Again this is the same as a normal session
Where an update increases space usage a new extent
will be allocated as would be the case normally
DELETE
Each QS will delete rows as any normal session
Space management ...
Non-Partitioned Objects
INSERT
sqlldr style extent allocation
Monitoring - V$ Views
V$PQ_SESSTAT
Statistic DML Parallelised will show the number of
PDML statements parallelised by the session
V$PQ_SYSSTAT
Statistic DML Initiated indicates the number of
PDML operations that were initiated instance wide
V$PQ_TQSTAT and V$PQ_SLAVE do not provide any PDML specific
information. They provide information on any parallel operation.
V$PQ_SESSTAT and V$PQ_SYSSTAT both provide only one statistic on
PDML.
V$PQ_TQSTAT provides table queue based data for the last parallel operation
in the session. It is only valid for the duration of the session.
V$PQ_SLAVE provides statistics for each parallel query slave.
Monitoring of the progress of any parallel operation can be difficult -
V$PQ_SLAVE will tell if a given slave is in use. Using V$LOCK and finding
out which QC is holding the slave's PS lock in mode 6 (X) will show you
which session they are connected with. Then you can find out from v$sqlarea
what SQL each is performing. Using v$session_wait will help you to
determine if any slaves are hanging or waiting for something. Setting event
10046 level 12 will enable you to determine what the slave or QC is actually
doing and whether it is making any progress. These are all standard parallel
query debugging techniques and they apply equally to PDML.
Explain Plan
Explain plans are similar to other parallel
plans
PLAN_TABLE.OTHER column
contains SQL executed by QS
PLAN_TABLE.OTHER_TAG column
how the operation is performed
useful for determining points of serialization
Examples are included in the handouts.
References
Oracle 8 Server Concepts Guide Ch8
Note 50585.1
PDML Brown Bags on spicy.us
PDML Design Specification
DSI 306 - Unit 6, RAID 6-1
Design: Alok Satyawadi (Advanced Analysis)
Implementation: Stuart Mcleod (Oracle Support Services, Australia)
Review: Robert Farrington
Roderick Manalac
80 minutes Lecture
30 minutes Exercises / Test
110 minutes Total
Lesson 6
RAID
Redundant Array
of Inexpensive Disks
Objectives
At the end of this lesson, you will be able to
Describe the different levels of RAID
Understand the advantages and
disadvantages of RAID
Understand the suitability of RAID to
various application types
Understand the use of RAID with Oracle
Introduction
The term RAID was coined at the
University of California in 1987
An alternative interpretation of the acronym
is Redundant Array of Independent Disks
RAID is available on all major UNIX
platforms and on Windows NT, with many
different proprietary configurations in place
The level of RAID describes configuration
rather than the technology in use
When is RAID Useful?
Hard disks are machinery with moving
parts:
Moving parts will fail at some time
RAID can prevent downtime when there is
a disk failure
Any organisation which cannot afford
unplanned downtime may find RAID
beneficial
Hard disks have moving parts. Any piece of machinery with moving parts is susceptible to
breaking down, especially if it does not receive maintenance. RAID configurations aim to
reduce the downtime when a component of the disk subsystem (i.e. a disk or a controller) fails.
Discussion point: When will a hard disk fail?
What Can RAID Do?
RAID can offer protection against:
Corruption and loss of data
Loss of access to critical information
RAID can:
Help achieve and maintain a high level of
system performance and responsiveness
These are the three main reasons why you would implement a sophisticated storage
solution such as RAID.
Traditional Methods - Mirroring
Disk mirroring is still being used effectively
Duplicate copies of data are kept on
separate disks. If one disk fails, the other is
still available for access to the data
Mirroring offers high availability, but:
It can be expensive to implement
Only allows 50% disk utilisation
May not scale well
Disk mirroring is one of the simplest strategies for protecting against single-disk
failures. The expense involved is that of acquiring an additional disk. The disadvantage
is that it does not protect against disk controller failures.
Disk duplexing provides all the advantages of disk mirroring. In addition, it protects
against disk controller failure by having two disk controllers - a primary and a
secondary. The expense involved is in getting a separate disk and a separate
controller.
Mirroring can also lack scalability because of the high CPU overhead. It is difficult to
continually add controllers and disks to a machine without degradation in performance.
[Figure: duplexed disks, one attached to the primary disk controller and one to the secondary disk controller]
How Does RAID Work?
A RAID system consists of a number of
disks, usually in a single cabinet
Part of each disk is used to store
information about the data on the other
disks
This allows the data for a particular disk to
be reconstituted in the event of it failing
The exact way this is implemented
depends on the RAID level
Parity as the Basis of RAID
RAID operation is accomplished using
check bytes
A check byte is constructed in such a way
that, if one of the drives fails, the data on
that drive can be reconstructed using the
redundant information stored as parity
Parity - Example
Disk 1 has data 11100011
Disk 2 has data 11101101
The check (parity) byte is calculated using
an exclusive OR (XOR) operation as
00001110
If one of the drives fails, its data can be
recovered by an XOR operation on the
bytes in the remaining disks
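A minimal sketch of the parity calculation and of recovery after a single drive failure, using the two data bytes from the example. The function names `parity` and `recover` are hypothetical; real arrays apply the same XOR over whole blocks rather than single bytes.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR all data bytes together to obtain the check byte."""
    return reduce(xor, blocks, 0)


def recover(surviving_blocks, check_byte):
    """Rebuild the single missing byte from the survivors and the check byte."""
    return reduce(xor, surviving_blocks, check_byte)


disk1 = 0b11100011
disk2 = 0b11101101
check = parity([disk1, disk2])
print("check byte:", format(check, "08b"))

# Simulate losing disk 1: XOR of the remaining disk and the check byte.
rebuilt = recover([disk2], check)
print("rebuilt disk 1:", format(rebuilt, "08b"))
```

The same two functions work unchanged for any number of data disks, because XOR is associative and each value cancels itself out.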
Levels of RAID
There are five commonly agreed RAID
levels
More do exist, but are vendor-specific
Of the five levels, only two are widely used
RAID-1 (Mirroring)
RAID-5 (Parity stored on several disks)
Levels of RAID - Level 0
No allowance for redundancy
Data is stored across all drives in the array
Low overhead
Risk of loss of data increases with the
number of drives in the array
Discussion Point: Is RAID 0 really RAID?
Answer: Not technically, because it makes no allowance for redundancy.
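The way risk grows with the number of drives can be put into numbers. The sketch below assumes that drives fail independently and with the same probability over some period; both the independence assumption and the 3% figure are illustrative, not taken from the course material.

```python
def array_loss_probability(p_disk, n_disks):
    """Probability that at least one of n independent disks fails.

    With RAID 0 any single disk failure loses the whole array, so this is
    also the probability of losing the array.
    """
    return 1.0 - (1.0 - p_disk) ** n_disks


# Illustrative per-disk failure probability (an assumption, not a measurement).
P_DISK = 0.03
for n in (1, 2, 4, 8):
    print(n, "drives:", round(array_loss_probability(P_DISK, n), 4))
```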
[Figure: RAID 0 controller attached to the I/O bus]
Levels of RAID - Level 1
Utilises mirroring to provide protection
Each disk in a mirrored pair contains an
exact copy of the data on its companion
drive
Synonymous with disk mirroring
See Traditional Methods...
Levels of RAID - Level 2
Redundancy is provided through Error
Correcting Codes (ECCs) or parity
One disk is reserved for parity
Striping is done at the bit level
ECCs are used to recreate the data
Requires less disk space than RAID 1, but
I/O is slower and parity disk is a single
point of failure and possible I/O bottleneck
The first bit is written on the first disk in the array, the second bit on the second disk,
and so on. RAID 2 is not commonly implemented.
Levels of RAID - Level 3
Bit level striping
A single drive is dedicated to parity data
Usually implemented with parallel
controllers
Usable capacity is array size - 1 (where
the -1 is the parity drive)
Again, parity disk is a single point of failure
and I/O bottleneck
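As a rough comparison of usable capacity across levels, the sketch below counts how many drives' worth of space remain for data. The function name `usable_disks` is hypothetical; the figures for levels 0, 1, 3 and 4 follow the slides, and the figure for level 5 rests on the general fact that it stores one drive's worth of parity spread over the array.

```python
def usable_disks(level, n_disks):
    """Number of drives' worth of capacity left for data at a given RAID level."""
    if level == 0:
        return n_disks          # striping only, nothing set aside
    if level == 1:
        return n_disks / 2      # mirroring: 50% utilisation
    if level in (3, 4, 5):
        return n_disks - 1      # one drive's worth of parity
    raise ValueError(f"level {level} not covered by this sketch")


for level in (0, 1, 3, 4, 5):
    print("RAID", level, "with 6 drives:", usable_disks(level, 6))
```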
Levels of RAID - Level 4
Same as Level 3, but with block-level
rather than bit-level striping
A block is the amount of data transferred in a single read/write
operation. Like RAID 3 a disk is reserved for parity.
RAID Levels - Level 5
The most popular RAID level
Block level striping
Distributes parity evenly across all drives in
the array
No single drive bottleneck or point of failure
Usually implemented with parallel
controllers
Redundancy is 1:4
In RAID 5 a stripe segment can contain either data or parity. The following are the
steps in writing to a RAID 5 stripe:
1) Read the blocks to be overwritten
2) Read the corresponding parity blocks
3) Remove the contribution of the data to be overwritten from
the parity data
4) Add the contribution to parity of the new data
5) Write the new parity data
6) Write the new data
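The six steps can be traced in a small model of one stripe. This is a sketch only: the class `Stripe` and its I/O counter are hypothetical, blocks are modelled as integers, and parity is simple XOR as in the earlier parity example.

```python
from functools import reduce
from operator import xor


class Stripe:
    """One RAID 5 stripe: several data blocks plus one parity block."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = reduce(xor, self.data, 0)
        self.ios = 0  # physical reads and writes performed so far

    def write(self, index, new_block):
        old_block = self.data[index]       # 1) read the block to be overwritten
        self.ios += 1
        old_parity = self.parity           # 2) read the corresponding parity block
        self.ios += 1
        parity = old_parity ^ old_block    # 3) remove the old data's contribution
        parity ^= new_block                # 4) add the new data's contribution
        self.parity = parity               # 5) write the new parity
        self.ios += 1
        self.data[index] = new_block       # 6) write the new data
        self.ios += 1


stripe = Stripe([0b1010, 0b0110, 0b1111])
stripe.write(1, 0b0001)
print("parity:", format(stripe.parity, "04b"), "physical I/Os:", stripe.ios)
```

One logical write costs two reads and two writes in this model, which is the reason write-heavy workloads tend to suffer on RAID 5.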
[Figure: RAID 5 controller attached to the I/O bus]
Performance and RAID-5
Depending on the application, performance
may or may not improve if RAID is used
RAID can help if mainly doing reads
RAID will not help if writes predominate
Applications which are read-intensive usually show performance benefits
when RAID is used, because the disk with the head nearest the desired data
should be used for access.
Writes, however, will be slower; the data has to be striped over the available
disks and the parity calculated and stored.
Several factors can affect performance
Access speed of constituent disks
Capacity and number of internal and
external buses
Size and number of caches
Algorithms used for determining
mechanisms for reading and writing
If higher performance is sought, many
UNIX systems offer alternatives to RAID-5
The most common is Asynchronous I/O
A process can proceed to the next
operation without having to wait for the
write to complete
In Oracle terms, allows higher DBWR
throughput
An init.ora parameter may need to be set
Common Practice
RAID levels 1 and 5 are the most common
Many vendors provide a large battery
backed-up cache for their controllers
Can improve performance of an
application even if it is not designed for
use with RAID
An internal controller optimises I/O for
cached data
RAID-1 is best where
complete redundancy of data is essential
disk space is not an issue
RAID-5 is best where
avoiding downtime (due to disk failure) is critical
better read performance is needed
Variables to consider when choosing a RAID device are: external bus, internal bus,
number of buses, external cache, internal cache, drive types, drive caches, internal
read/write algorithms and the Oracle distribution of write requests and write sizes.
Fault Tolerance Requirements
True fault tolerance involves:
Ability to swap disks without powering
down the entire array, known as hot
swapping
Requires separate power supply for
each disk
Some vendors extend this by offering a
hot standby disk in the RAID array
Hot swapping is the ability to replace a failed drive without powering down the
whole disk array (part of RAID-5)
Logical Volume Manager (LVM) (with a suitable stripe size for the type of
application) gives performance as good as RAID-5. This is particularly true for
systems where there are no I/O bottlenecks...
Logical Volumes and RAID
Performance of Logical Volume Managers
versus RAID
RAID is usually more effective on larger
systems
RAID uses dedicated hardware with high
speed interconnect
Logical Volume Managers, if available, are
supplied as part of the operating system
Oracle and RAID
Usage of RAID must be transparent to
Oracle
Features specific to the RAID configuration
are handled by the operating system
Data files can be put on RAID devices
Usually accessed randomly
Redo logs and rollback segment files
should not be put on RAID devices
Accessed sequentially
Oracle and RAID
A feature of Oracle is its platform and device independence. RAID is
implemented at the operating system level, but can now be integrated with
EBU and RMAN via products such as HP's OmniBack.
Redo logs should not be put on RAID devices. These files are accessed
sequentially and need high throughput, which limits the suitability of RAID
for them. Multiple log members should be used to provide redundancy.
Another reason redo logs should not be placed on RAID-5 devices is related to
the type of caching (if any) being done by the RAID system. Catastrophic loss
of data could ensue if the contents of the cache were lost, e.g. because of a
power failure, after Oracle had been notified that they had been written to disk.
This is particularly true of so-called write-back caching, where the data is
regarded as having been written to the disk when it has only been written to the
cache. Write-through caching, where the write is only regarded as having
completed when it has reached the disk, is much safer, but is still not
recommended for redo logs for the reasons mentioned earlier.
Oracle and RAID
Multiple DBWR processes
Not a direct alternative to using RAID-5
Improves write efficiency rather than
fault tolerance
Optional on many UNIX platforms
Used by default on Windows NT
Conclusions
Different organisations, projects and
applications have very different degrees of
storage requirements
If configured appropriately, RAID can
provide:
Protection against data corruption
Protection against loss of data access
High level of system performance and
responsiveness
RAID is implemented mainly for reliability, not performance. There
may be some performance benefits for an application that primarily
reads, however there will probably be a performance downturn for
an application that primarily writes.
DSI 306 Unit7, Advanced Queueing 7--1
Oracle 8 Advanced Queuing
Design: Alok Satyawadi (asatyawa@us.oracle.com)
Development:Alok Satyawadi (asatyawa@us.oracle.com)
45 minutes Lecture
15 minutes Examples
120 minutes Total
Contents
Introduction
Concepts
Internals
Algorithms
Sample run
Block dumps
This lesson assumes that the audience is familiar with the AQ chapter of the
Oracle8 Server Application Developer's Guide.
Introduction
In some environments, deferred execution is desirable.
Examples - Business Process Management automation,
Workflow automation.
Queuing implements deferred execution of work.
Queuing provides persistent queuing.
Advanced Queuing (Oracle/AQ) - Oracle's answer.
Prior to this, TP monitors would provide this functionality.
Oracle/AQ - queuing functionality through a PL/SQL
interface.
In many enterprises today, large scale business applications are often implemented as a group of
cooperating programs. Each program implements a well-defined task. The programs are usually
owned and managed by individual departments in an enterprise. The key to successful
implementation of such a distributed application architecture is getting the programs to
communicate with each other. There are two models of communication that can be employed -
synchronous and messaging.
Client Server Model is an example of a synchronous messaging system. Clients send a service
request to the server and WAIT till they get a delivery from the server. For such systems to work,
it is required that the network that connects all the nodes is available, the machines that run the
programs are up and running and all programs are running. However, in reality, all these
components are known to fail. Hence, distributed systems using synchronous model are very
vulnerable.
With the messaging model, programs communicate by placing messages in a queue. It is the
responsibility of the messaging system to make the message available to the appropriate recipient.
Programs do not communicate with each other directly; they are disconnected from each other.
Messaging also removes the time-dependency relationship between the programs, i.e. all the
programs need not be running at the same time. As a result, applications are less vulnerable to
network, machine or program failures.
Another important feature of a messaging system is Persistent Queuing. In the event of a failure of
any of the system components (network, machine or program), the messaging system provides
guaranteed delivery of messages, each exactly once.
Introduction contd...
Features of Advanced Queuing
reliable, persistent queuing.
can enforce time windows for execution.
integrated with Oracle Server.
SQL access.
structured payload.
provides an alternative to Queuing mechanisms
of TP monitors.
retention and message history.
tracking and event journals.
integrated transactions.
Reliable, persistent queuing: The ability to store a message in a persistent queue in the event of a failure
(of network, machine or program) and deliver it later.
Windows for execution: the system provides a feature wherein one can establish a window of time within
which a message is delivered. Also, for some reason, if the message is not delivered in time then there is
a way by which one can shift priorities too.
AQ integrated with Oracle Server: all standard database features such as recovery, restart and enterprise
manager are supported. Oracle AQ queues are implemented in database tables, hence all the operational
benefits of high availability, scalability and reliability are applicable to queue data. In addition, database
development and management tools can be used with queues. For instance, queue tables can be imported
and exported.
SQL access: Messages are stored in normal rows in a database table. They can be queried using standard
SQL.
Structured Payload: Users can use object types to structure and manage the payload.
Replaces TP Monitor Queuing support: In the absence of such a queuing system developers would rely on
some other MOM (Message Oriented Middleware) such as TP Monitors.
Retention and Message History: all messages are retained even after they have been retrieved.
Administrators can specify the time for which they will be retained.
Tracking and Event Journals: Since the messages are retained, it allows users to track sequences of
related messages. These sequences represent event journals.
Integrated Transactions: integration of control information with payload simplifies application
development and management.
Concepts.
message - smallest unit of work processed by a single transaction.
It consists of user data and control information.
Queue - collection of messages.
Queue Table - data repository of a set of queues.
[Figure: a queue table contains queues; a queue contains messages; a message consists of control data (queue name, message properties, enqueue properties) and user data]
The main objective of the AQ system is to provide a mechanism by which
MESSAGEs (with certain attributes) can go back and forth between two or
more sessions.
Each message consists of some control information and user data.
A collection of messages is called a QUEUE.
Queues are stored in a database table called a QUEUE TABLE.
Concepts contd...
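[Figure: processes 1 and 2 place messages in Queue1 and Queue2, which reside in a queue table in an Oracle8 database; processes 3 and 4 retrieve the messages; the time manager monitors the queues]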
So, conceptually, processes 1 & 2 will put some messages in their respective
queues and processes 3 & 4 will pull those messages from the queues. These
queues will reside in a queue table and this table itself will be in an Oracle8
database.
Time manager: is a background process that monitors the messages in the
queue. It provides the mechanism for message expiration, retry and delay.
Architecture
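dbms_aqadm: create_queue_table, create_queue, start_queue, alter_queue,
stop_queue, drop_queue, drop_queue_table
dbms_aq: enqueue, dequeue
Roles: aq_adm_role, aq_user_role
[Figure: process 1 and process 2 call these packages against queues in an Oracle8 database, with the time manager running in the background]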
As mentioned before, the user interface to the AQ system is provided by
PL/SQL. There are two packages that handle the queuing system -
dbms_aqadm and dbms_aq packages. The first package is responsible for all
the administrative tasks and the second package has the end-user calls for
queuing and dequeuing.
Time Management
implement functionality for the delay and
expiration parameters (specified at enqueue time).
move messages to expiration queue when they
expire and are not processed.
AQ uses a background process called time
manager to coordinate.
one time manager per instance
Dictionary table AQ$_TIMETABLE contains all
the time related data.
In order to start the time manager, the following init.ora parameters must be
set:
1. AQ_TM_PROCESSES: This parameter defaults to 0. It has a valid range of
values between 0 and 10. If this parameter is set to 1, one queue monitor
process will be created as a background process to monitor the messages. Also,
this parameter is a dynamic parameter. The name of the process is
ora_qmon<oracle_sid>.
2. JOB_QUEUE_PROCESSES: Message propagation is handled by job queue
(SNP) processes. This parameter defines the number of job queue processes
started. The default is 0.
3. COMPATIBLE: Should be set to 8.0.4 in order to use AQ propagation
feature.
Enqueue Overview
DBMS_AQ.ENQUEUE(
queue_name IN varchar2,
enqueue_options IN enqueue_options_t,
message_properties IN message_properties_t,
payload IN <type_name>,
msgid OUT RAW)
TYPE enqueue_options_t IS RECORD (
visibility BINARY_INTEGER default ON_COMMIT,
relative_msgid RAW(16) default NULL,
sequence_deviation BINARY_INTEGER default NULL)
TYPE message_properties_t IS RECORD (
priority BINARY_INTEGER default 1,
delay BINARY_INTEGER default NO_DELAY,
expiration BINARY_INTEGER default NEVER,
correlation VARCHAR2(128) default NULL,
attempts BINARY_INTEGER,
recipient_list aq$_recipient_list_t,
exception_queue VARCHAR2(51) default NULL,
enqueue_time DATE,
state BINARY_INTEGER )
Dequeue Overview
DBMS_AQ.DEQUEUE(
queue_name IN varchar2,
dequeue_options IN dequeue_options_t,
message_properties OUT message_properties_t,
payload OUT <type_name>,
msgid OUT RAW)
TYPE dequeue_options_t IS RECORD (
consumer_name varchar2(30) default NULL,
dequeue_mode BINARY_INTEGER default REMOVE,
navigation BINARY_INTEGER default NEXT_MESSAGE,
visibility BINARY_INTEGER default ON_COMMIT,
wait BINARY_INTEGER default FOREVER,
msgid RAW(16) default NULL,
correlation varchar2(128) default NULL)
Queue Dictionary Tables
system.aq$_queue_tables
system.aq$_queue_table_sort
system.aq$_queues
system.aq$_timetable
system.aq$_queue_tables: contains queue table definitions. An entry
is created for each queue table at the queue table creation time.
system.aq$_queue_table_sort: stores the default sorting order (user
specified) for each queue table. This is done at the time of creating the
queue table. This information is used at the time of a DEQUEUE operation.
system.aq$_queues: stores the operational characteristics (user
specified) for individual queues like ordering of messages etc.
system.aq$_timetable: contains the time related information for each
queue in the system.
Queue Dictionary Views
DBA_QUEUE_TABLES
USER_QUEUE_TABLES
ALL_QUEUE_TABLES
DBA_QUEUES
USER_QUEUES
ALL_USER_QUEUES
DBA_QUEUE_PRIV
AQ Internals
AQ uses SQL to store/retrieve messages.
Queue operations are coordinated through a metadata
cache - implemented as Library cache objects.
Each queue is represented by a KGL object -
referenced by its name and owner.
New object type KGLTQUEU, new namespace
KGLNQUEU added to kgl.h.
All meta-data stored in heap 0, the object heap.
Immediate/Standard enqueue and dequeue are implemented
using recursive transactions provided by the kernel.
Queue objects are coordinated through a meta data cache which is
implemented as Library Cache Objects. The Kernel Generic Library Cache
Manager (KGL) subsystem provides a convenient interface to create objects,
modify objects, drop objects, locking, pinning, manage dependencies between
objects and coordinate among multiple shared memory library caches (OPS).
Each Queue is represented by a kgl object, and is referenced by an object name
(queue name) and its owner name.
Internals contd...
If a call is of Immediate type then
a recursive transaction is started,
appropriate action is performed,
a commit is done,
logs are flushed to make the change persistent.
Dequeue can occur with WAIT or NOWAIT
option.
In case of MTS, dequeue cannot run with WAIT option.
Export/Import of AQ system is done on queue
table level.
An option exists for both enqueue and dequeue by which a user can specify whether the current
enqueue or dequeue call is either a part of an outer transaction along with other calls or the call is a
transaction by itself. If the call is part of a transaction (standard option), the changes by enqueue or
dequeue will be reflected when the transaction commits. If the call is a transaction by itself (the
immediate option) then Oracle will perform actions listed above.
In case of MTS, dequeue does not support the WAIT option, to prevent holding up a shared server
process.
Queues are implemented on tables. The export/import of queues consists of exporting and
importing the underlying queue tables and related dictionary tables.
IMPORT: To maintain the consistency of queue dictionary tables, two import post-table actions
are used.
1. DBMS_AQ_IMPORT_INTERNAL.QT_DIC_UPDATE: executed for each queue table
imported. It inserts a new row in system.aq$_queue_tables and system.aq$_queue_tables_sort.
2. DBMS_AQ_IMPORT_INTERNAL.Q_DIC_UPDATE: executed for each queue in the imported
queue table. It inserts a new row in system.aq$_queues to describe this queue.
EXPORT: At the time of creating a queue table or a queue within a queue table, Oracle will create
an entry in expact$. This entry is for the
DBMS_AQ_IMPORT_INTERNAL.QT_EXPORT_CHECK action, which generates an execute
statement to be written in the export file. This is a cue for the import actions. Similarly, upon creating a
new queue, another entry is put in expact$, namely
DBMS_AQ_IMPORT_INTERNAL.Q_EXPORT_CHECK, which is again a cue for import actions.
Algorithms: Sequence Deviation
algorithm designed to create buckets to
drop messages.
it addresses three cases:
Case I: inserting an independent message.
Case II: inserting a message in front of another message.
Case III: inserting a message before any other message.
Before we delve into the mechanics of the algorithm, we need to know a few
definitions.
Chain: a chain is defined as a set of messages related together. The messages
in a chain are related together by the relative message identifier (rel_msgid).
Each chain has a chain number assigned to it.
It is possible that messages do not belong to any chain.
Within a chain, messages are processed in an order called the local order.
So, each message will have -
- a chain number,
- a local order number.
The higher the order number, the earlier the message will be processed.
The first message in the chain, which will be processed last, is called the head of
the chain.
Case I: inserting an independent
message
bucket is big enough to insert x messages where
2^x = y and y is a 32 bit number.
when first message is inserted, null value is assigned to
chain number and 0 assigned to local order number.
Case II: inserting a message in front of another
message
if inserting message X in front of message Y, where Y is the first message,
then assign a chain number to Y (which would have had NULL). This is done
by select nextval from dual.
select the messages in the chain which have an order number higher than the
order number of the message before which you want to insert.
if the above fetches no messages, the order number for the new message is the order
number of the message before which you want to insert + the maximum order number,
with the sum divided by 2.
if it fetches a set of messages, pick the message with the lowest order number.
Add to this the order number of the message before which you want to insert
and divide the sum by 2.
the chain is locked when inserting messages to enforce concurrency control.
Example of algorithm in action for Case II:
enqueue 4 messages in a queue.
local processing order - message3 > message2 > message4 > message1.
for this example, the 32 bit number (maximum order number) is taken to be 1024.
Step  Enqueue   Chain#  LocalOrder#        RelMesgIdentifier
1     message1  NULL    0                  NULL
2     message2  assg:1  (0+1024)/2=512     1
      message1  assg:1  0                  NULL
3     message3  assg:1  (512+1024)/2=768   2
4     message4  assg:1  (0+512)/2=256      1
Final State:
Message#  Chain#  Order#
1         1       0
2         1       512
3         1       768
4         1       256
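The Case II arithmetic can be sketched in a few lines of Python. This is only an illustration of the midpoint rule described above, not the server's implementation: the function name, the in-memory collection of order numbers and the MAX_ORDER constant (1024, as in the example) are assumptions made for the sketch.

```python
# Illustrative sketch of the Case II midpoint rule; not the AQ implementation.
MAX_ORDER = 1024  # assumed maximum order number, as in the worked example


def order_before(chain_orders, target_order, max_order=MAX_ORDER):
    """Return a local order number that sorts just ahead of target_order.

    chain_orders: order numbers already used in the chain (hypothetical input).
    """
    # Messages that are already processed earlier than the target.
    higher = [o for o in chain_orders if o > target_order]
    # No such message: take the midpoint with the maximum order number.
    # Otherwise: take the midpoint with the lowest of the higher numbers.
    upper = min(higher) if higher else max_order
    return (target_order + upper) // 2


# Replay the four enqueues from the example.
orders = {"message1": 0}
orders["message2"] = order_before(orders.values(), orders["message1"])
orders["message3"] = order_before(orders.values(), orders["message2"])
orders["message4"] = order_before(orders.values(), orders["message1"])

# Higher order number means earlier processing.
processing_order = sorted(orders, key=orders.get, reverse=True)
```

Replaying the example this way gives the same order numbers as the table, and sorting by descending order number reproduces the stated local processing order.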
Case III - Inserting a message before any
other message
no relative message identifier required.
message before which you insert is selected by using the
ordering parameters specified at the time of creating the
queue. In the absence of ordering parameters, enqueue
time is looked at for finding a candidate message.
once message is found, use Case II, if not use Case I.
Algorithms: Time Management
time manager is event driven.
there is a lot of communication between the enqueue process
and the time manager, of type wait/post.
at enqueue time, the enqueue process will:
check whether a delay/expiration is specified; if so, it inserts an
entry in AQ$TIME.
note the minimum time it has seen in an SGA variable.
post the time manager if the new message has a delay/expiration less
than the minimum seen before.
when posted, the time manager scans AQ$TIME and picks up the entry
that has the least absolute time.
the time manager goes back to sleep, to be woken up when this time
expires.
Time manager process is an event driven process. This means that it sleeps
most of the time and wakes up only when there is some work to do.
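The wait/post handshake can be modelled with a small Python class. This is a toy sketch under stated assumptions: a heap stands in for the AQ$TIME table, an attribute stands in for the SGA variable, and a counter stands in for the post. All names are hypothetical.

```python
import heapq


class TimeManagerSketch:
    """Toy model of the enqueue/time-manager handshake described above."""

    def __init__(self):
        self.entries = []      # stands in for AQ$TIME (kept as a min-heap)
        self.min_seen = None   # stands in for the SGA variable
        self.posts = 0         # number of times the time manager was posted

    def enqueue(self, msg_id, due_time=None):
        """Register a delay/expiration; post only if it is the earliest so far."""
        if due_time is None:
            return False       # no delay/expiration: nothing to register
        heapq.heappush(self.entries, (due_time, msg_id))
        if self.min_seen is None or due_time < self.min_seen:
            self.min_seen = due_time
            self.posts += 1    # "post" the sleeping time manager
            return True
        return False

    def next_wakeup(self):
        """Entry a posted time manager would pick: the least absolute time."""
        return self.entries[0] if self.entries else None


tm = TimeManagerSketch()
posted = [tm.enqueue("m1", 300), tm.enqueue("m2", 500),
          tm.enqueue("m3", 120), tm.enqueue("m4")]
```

Only the first and third enqueues post the time manager, because only they lower the minimum time seen so far. This is why the process can sleep most of the time.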
Sample Run
Set up
compile dbms_aqadm package.
set event 10931, level 2.
directive to plsql compiler to understand UDT-1 syntax.
compile dbms_aq package.
set event 10931, level 1.
Demo script handout Page 2
For dbms_aq, package specification and body should be compiled after setting
an event. Failure to set this event will result in the package compilation
generating errors. Setting the event gives a directive to the compiler to
compile this package with a special flag so that it can recognize the
"<UDT_1>" syntax. The event should then be turned off after compilation.
/* Set the event */
alter session set events '10931 trace name context forever, level 2';
/* Package specification */
@<package_spec_file>.plb
/* Package body */
@<package_body_file>.plb
/* Turn the event off */
alter session set events '10931 trace name context forever, level 1';
Block Dumps
Queue Table Header page 4
Describe output page 4
Queue Data Block page 4
8.1 Enhancements
publish/subscribe engine to support
loosely coupled application arch.
database event publication system.
faster message propagation
security
listening on multiple queues.
interoperability with 3rd party products.
Publishers are entities that publish information (as messages) to a "broker" or a "publish/subscribe
engine." They do not know or care about the interest of other applications in the messages. The
publishers decide when, how and what to publish. They are not driven by any external entity.
Subscribers are entities that receive information by expressing interest in certain types of messages.
They do not care about the origins of the messages. The broker delivers published messages to the
appropriate subscribers. Publishers and subscribers do not directly connect to each other (in the
network sense). So a publisher or a subscriber can come and go without impacting any other
application.
Depending on how subscribers specify their interest in messages, publish/subscribe models are
classified as either subject based or content based.
AQ 8.1 will offer both subject and content based addressing. Subscribers can express the filtering
rules using the full power of SQL. Since AQ messages are normal database objects, any SQL
operation that applies to data objects, can also be applied to messages. This functionality is made
possible through a rules engine that will evaluate incoming messages against a set of rules, specified
by subscribers. A message is delivered to a subscriber only if it satisfies a rule.
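Content-based delivery can be illustrated with a toy broker in Python. In AQ the rules are SQL expressions evaluated by the rules engine; in this sketch plain Python predicates stand in for them, and the class, method and subscriber names are all hypothetical.

```python
# Toy content-based publish/subscribe broker; all names are hypothetical.
class Broker:
    def __init__(self):
        self.rules = {}  # subscriber name -> list of predicate functions

    def subscribe(self, subscriber, rule):
        """Register a rule (a predicate over the message) for a subscriber."""
        self.rules.setdefault(subscriber, []).append(rule)

    def publish(self, message):
        """Return the subscribers with at least one rule the message satisfies."""
        return sorted(
            name for name, rules in self.rules.items()
            if any(rule(message) for rule in rules)
        )


broker = Broker()
broker.subscribe("billing", lambda m: m["type"] == "order" and m["amount"] > 1000)
broker.subscribe("audit", lambda m: m["type"] in ("order", "refund"))

big_order = broker.publish({"type": "order", "amount": 5000})
refund = broker.publish({"type": "refund", "amount": 20})
```

The publisher never names a recipient. The broker decides delivery purely from the message content and the registered rules.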
A significant number of the important business activities occurring in an
enterprise eventually are reflected in changes to a database. Hence it has been a
common integration strategy to tap the database as the single source of events.
The database event publication subsystem formalizes this concept by providing
an elegant and powerful interface. Applications can subscribe to database
events just as they subscribe to messages from other applications. Database
events could be:
- DML events like insert, delete and update.
- System events like start up, shutdown, log on and log off.
The database event publication subsystem tightly integrates with the AQ
publish/subscribe engine. Database events are published by triggers to certain
topics, and can be evaluated by the rules engine. Applications have a single
consistent interface that allows them to treat the database as a publisher just
like any other application.
AQ propagation will use message streaming (and the elimination of the
current two phase commit mechanism) to significantly improve performance
of message propagation. AQ's integration with TIBCO's TIB/Rendezvous
product will distribute messages in near real time to a large number of
applications.
From a security point of view queues will be treated as first class objects, like
tables. Enqueue and Dequeue privileges will be granted on a per queue basis.
Administrative privileges like creating and deleting queues will also be granted
on a per queue basis.
Listening on Multiple Queues: Applications will be able to monitor a list of
queues for incoming messages. This is a blocking call which will return with
the first queue in which a message has been enqueued. This makes application
development easier, since it eliminates polling.
Third Party Interoperability: During the course of the year Oracle will
announce partnerships with other leading vendors of messaging technology.
AQ will integrate with messaging products that have complementary
technology.
References
Application Developers Guide (Chapter
11). http://st-doc/8.0
Source Code (on zen).
/src4/803/rdbms/src/server/oltp/qs
AQ design specifications.
AQ intranet site (http://aq.us.oracle.com)
DSI 306 - Unit 8, Tuning Data Load 8--1
Version 1.2
Tuning Data Load
Acknowledgments:
Design: Steve Tran, WSS, Bug Diagnosis and Escalations
Development: Steve Tran, WSS, Bug Diagnosis and Escalations
Review:
45 minutes Lecture
15 minutes Examples
60 minutes Total
Outline
SQL*Loader Enhancements
Direct Insert Enhancements
Create Table As Select Enhancements
Import Enhancements
Notes:
This lesson describes various techniques for loading data into VLDBs.
SQL*Loader Enhancements
Direct Path Load
Parallel Load
Restrictions
I/O consideration
Notes:
- SQL*Loader is the main tool for loading data into a VLDB.
From Oracle version 7.1, the SQL*Loader utility provides a direct path option, which
can load large amounts of data in a relatively short period of time.
Combining Direct Path load with Parallel load maximizes the speed of loading data
into the database.
- With the introduction of table partitioning in Oracle8, load performance can be
improved by running multiple concurrent loads into separate partitions.
- The NOLOGGING option eliminates much of the overhead of undo and redo
logging.
Direct Path Load
set with DIRECT=true (on command line)
Eliminates much of the Oracle database
overhead
A Direct Path Load can load table and index
data into empty and non-empty tables
Also can combine with PARALLEL option
for maximum performance
Notes:
Direct Path Load parses the input data according to the description given in the
control file, and builds a column array structure. SQL*Loader then uses the column
array structure to format Oracle data blocks and build index keys. The newly
formatted database blocks are then written directly to the database.
When loading a partitioned table, SQL*Loader partitions the rows and maintains
indexes. Note that a Direct Path Load of a partitioned table can be quite resource
intensive for tables with many partitions. For best results, pre-partition the input data to
correspond with the table partitions before the load. The control file contains a
statement such as:
LOAD DATA
INTO TABLE ORDER PARTITION (jan95)
Local index partitions are maintained by SQL*Loader during a direct path load to a
single partition. Global indexes are not maintained on single partition direct path loads.
Tuning Direct Path Load
To minimize time
Pre-allocate storage
Pre-sort data
Infrequent data saves
NOLOGGING
To minimize time:
a. Pre-allocate storage: allocate the required extents when the table is created,
for faster loads into a new table.
b. Pre-sort data: if pre-sorting is specified and the existing index is empty, then
maximum efficiency is achieved. The sort routines are completely bypassed,
along with the merge phase of index creation. The new keys are simply inserted
into the index, so temporary storage is not required and time is saved.
e.g. LOAD DATA
INTO TABLE EMP
SORTED INDEXES (empno)
Note: the index is left UNUSABLE if the data turns out not to be sorted.
c. Infrequent data saves: frequent data saves resulting from a small value of the
SQL*Loader parameter ROWS adversely affect the performance of a direct path
load. Because a direct path load can be many times faster than a conventional
load, set the value of ROWS considerably higher for a direct path load.
d. Three methods can be used to minimize redo log activity:
Disable archiving
Unrecoverable
NOLOGGING
Tuning Direct Path Load
To minimize space:
Pre-sort data
No index maintenance during the load
To minimize space:
a. Sorting the data before the load eliminates the sort segment for the indexes,
which usually requires the most temporary storage space.
b. Avoid index maintenance during the load: for a direct path load, SQL*Loader
maintains all existing indexes for a table.
Index maintenance can be avoided by the following methods:
1) Drop the indexes before the load begins.
2) Set the SQL*Loader option SKIP_UNUSABLE_INDEXES=TRUE.
Indexes that are UNIQUE and marked UNUSABLE are not allowed to skip
index maintenance, and the load terminates upon encountering a record
that would require index maintenance.
3) Set SKIP_INDEX_MAINTENANCE=TRUE when the number
of rows to be loaded is large compared to the size of the table and there are
many indexes that require huge sort segment space when rebuilding.
Direct Path Load- An Example
Multiple SQL*Loader sessions to partitions
Separate Control File for each load
Local indexes are maintained
Global indexes cannot be defined
Tables to be loaded do not have any active
transactions pending
Temporary segment space for index
Indexes left in UNUSABLE state
Consider an example using the table ORDER, partitioned by month into twelve
separate partitions (one for each month of the year).
It is possible to use 12 SQL*Loader sessions with 12 input files to load data into the
ORDER table. Each load has its own control file to specify the partition into which to
load data. Parallel loading is still achieved, but at the partition level, without the
restrictions of the PARALLEL keyword. The only requirement is that you must
partition the input manually; otherwise many records will be rejected, which can
adversely affect performance.
Restrictions: no global indexes can be defined on the table to be loaded, the table
cannot be clustered, and enabled triggers, referential constraints and check
constraints are not allowed.
Temporary segment space needed for storing new index keys (in bytes) can be
estimated by 1.3 * key_storage, where:
key_storage = (number_of_rows) * (10 + size_ind_columns + num_ind_cols)
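As a worked instance of the estimate, the formula can be written as a small Python function. This is a sketch that assumes size_ind_columns means the combined size in bytes of the indexed columns and num_ind_cols the number of indexed columns. The function name and the sample figures are illustrative only.

```python
def temp_segment_bytes(number_of_rows, size_ind_columns, num_ind_cols):
    """Estimate temporary segment space (bytes) for new index keys.

    Assumes size_ind_columns is the combined byte size of the indexed
    columns and num_ind_cols is how many columns the index covers.
    """
    key_storage = number_of_rows * (10 + size_ind_columns + num_ind_cols)
    return 1.3 * key_storage


# Hypothetical load: one million rows, a two-column index of 12 bytes in total.
estimate = temp_segment_bytes(1_000_000, 12, 2)
```

For this hypothetical load, key_storage is 24,000,000 bytes and the estimate is about 31.2 million bytes of temporary segment space.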
Indexes can be left in the Index Unusable state. Common reasons are:
- the load runs out of space
- the data is not in the order specified by SORTED INDEXES
- there are duplicate keys in a unique index
- the SQL*Loader session is interrupted
Parallel Data Loads
DIRECT=true , PARALLEL=true
Indexes must be rebuilt after the load
Inter-segment concurrency with Direct Path
load (concurrent loads to different tables or
partitions).
Intra-segment concurrency with Direct Path
load (concurrent loads to one segment)
Restrictions from Direct Path Load are
applied, also local indexes are not maintained
Notes:
When you perform a Parallel load, SQL*Loader creates temporary segments for each
concurrent session and then merges the segments upon completion. The segment created
from the merge is then added to the existing segment in the database above the High
Water Mark. The last extent of each segment for each loader session is trimmed of any
free space before being combined with the other extents of the SQL*Loader sessions.
(This can lead to irregular sized extents belonging to a table, which are not of the NEXT
extent size).
Indexes are not maintained during a parallel load. Any indexes must be recreated or
rebuilt manually after the load completes. You can use the parallel index creation and
parallel index rebuild to speed up the building of large indexes after a parallel load.
Inter-segment concurrency.
Concurrent Direct Path Loads to different
tables or partitions.
Input data should be pre-partitioned to
prevent a high number of records from being rejected
The following illustrates concurrent direct path loads to the twelve partitions of the
table ORDER, where each partition is in a different tablespace:
e.g. SQLLDR DATA=file1.dat DIRECT=true PARALLEL=true
SQLLDR DATA=file2.dat DIRECT=true PARALLEL=true

If there are 7 input data files, we can start 7 SQL*Loader sessions concurrently. Even if
data is not pre-partitioned, it will go into the correct partition. The keyword
PARALLEL=true must be used because each of the 7 loader sessions can write to every
partition. SQL*Loader will attempt to spread data evenly across all the files in each of
the 12 tablespaces - however an even spread of data is not guaranteed. Also, there could
be I/O contention when the loader processes are attempting to simultaneously write to
the same device.
To overcome this, use the keyword FILE to specify exactly which files we want to
load data into. The partition name can also be specified in the control file:
SQLLDR DATA=jan95.file1 DIRECT=true PARALLEL=true FILE=f1
SQLLDR DATA=jan95.file2 DIRECT=true PARALLEL=true FILE=f2
...
where f1, f2, ... are data files belonging to one of the twelve tablespaces.
Intra-Segment Concurrency.
All the partitions are in same tablespace -
or table is not partitioned.
Control over the exact placement of
datafiles
No requirements to partition the input data
Use in Oracle 7
For best performance, tables are spread
over different data files and disks, as are the
input data load files
Intra-segment concurrency is the concurrency within a single tablespace. Using the
example from the previous slide, if all thirty files are in one tablespace, we may
need to have the same number of input files as data files in the tablespace. We do
not need to partition the input data, as all data will go into one tablespace:
SQLLDR DATA=file1.dat DIRECT=true PARALLEL=true FILE=f1
SQLLDR DATA=file2.dat DIRECT=true PARALLEL=true FILE=f2
...
SQLLDR DATA=file30.dat DIRECT=true PARALLEL=true FILE=f30
where f1, f2, ..., f30 are thirty data files belonging to a single tablespace.
The advantage of this approach is that we have control over the exact
placement of data, since we use the FILE keyword, and we are not
required to partition the input data by value.
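The thirty command lines need not be typed by hand. The following Python sketch generates them, using only the parameters that appear in the text (DATA, DIRECT, PARALLEL, FILE). The function name is hypothetical, and f1 to f30 remain placeholders for the real data file names. As in the text, login and control file arguments are omitted.

```python
def sqlldr_commands(count):
    """Build one SQLLDR command line per input file / data file pair."""
    return [
        f"SQLLDR DATA=file{i}.dat DIRECT=true PARALLEL=true FILE=f{i}"
        for i in range(1, count + 1)
    ]


# One concurrent loader session per data file in the tablespace.
commands = sqlldr_commands(30)
```

Each generated line pairs input file N with data file N, which is what gives control over where each session writes.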
Tuning Parallel Path Loads.
Parallel Load into single partition
Reducing Disk Contention
Avoiding Dynamic Space Management
Tuning Sorts for creating indexes
Notes:
1. If a Parallel Direct Path load is being applied to a single partition, it is best that the
data is pre-partitioned; otherwise the overhead of record rejection due to partition
mismatch slows down the load.
2. It is recommended that each concurrent load use files located on different disks
(using the FILE keyword) to allow for maximum I/O throughput.
3. We can use a command such as sar -d (on many UNIX systems) or iostat to monitor
disk I/O. For example, 40 or more I/Os per second is excessive for most disks. This
query shows the number of reads and writes for each file:
select name,phyrds,phywrts from v$datafile df, v$filestat fs
where df.file# = fs.file#;
4. Striping is the spreading of data across separate datafiles (potentially on separate disks).
This can be achieved by OS or manual striping. With OS striping, the biggest concern
is choosing the right stripe size. For a VLDB, a stripe size of 5MB tends to be reasonable
in many situations. If the stripe size is too big, you may have a hot spot on one disk;
setting the stripe size too small will detract from performance, particularly for backup and
restore operations. Manual striping allows for an even spread of tables over many disks.
Avoiding Dynamic Space
Management
Detecting Dynamic Extension
Allocating extents
ST enqueue
Sort and Temporary Data
Dynamic extension can reduce performance, as SQL statements (recursive calls) must
be executed to allocate new extents. Monitor the statistic during the load with the
query:
select name, value from v$sysstat where name = 'recursive calls';
If the recursive calls are caused by dynamic extension, try to reduce
extension by allocating larger extents.
Allocate the extent size after determining the maximum size of objects (see Oracle8 Server
Administration Guide on how to calculate object size). An object with large extents is less
likely to need extension than one with multiple smaller extents (even if the overall size is
the same).
Since version 7.3, an unlimited number of extents for an object is allowed; however, for
best performance, you should be able to read the whole extent map with a single I/O.
Any space management transaction is controlled by a single ST enqueue. To minimize
the possibility of timing out while waiting for the ST enqueue (ORA-1575):
a. Use dedicated temporary tablespaces and do not use the default characteristics.
b. Set INITIAL=NEXT to a value of 1MB to 10MB, depending on the value of
SORT_AREA_SIZE, usually n * SORT_AREA_SIZE + DB_BLOCK_SIZE.
c. Increase SORT_AREA_SIZE to avoid sorting to disk.
d. Use the NOSORT option to create indexes, by sorting data prior to the load.
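The sizing rule for temporary tablespace extents can be checked with a short calculation. The Python sketch below assumes that n is a small whole-number multiplier chosen by the administrator. The function name and the sample values (a 1MB SORT_AREA_SIZE and an 8KB block) are illustrative, not recommendations.

```python
def temp_extent_bytes(n, sort_area_size, db_block_size):
    """INITIAL = NEXT size following n * SORT_AREA_SIZE + DB_BLOCK_SIZE."""
    return n * sort_area_size + db_block_size


MB = 1024 * 1024

# Hypothetical settings: SORT_AREA_SIZE of 1MB, 8KB blocks, n = 2.
extent = temp_extent_bytes(n=2, sort_area_size=1 * MB, db_block_size=8192)

# The text suggests keeping the result between 1MB and 10MB.
in_suggested_range = 1 * MB <= extent <= 10 * MB
```

With these sample values the extent size comes to 2,105,344 bytes, which falls inside the suggested 1MB to 10MB range.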
Direct Load Insert
INSERT INTO TABLE SELECT
Data can be inserted into partitioned or non-
partitioned tables, either in parallel or serially.
Direct Load appends the inserted data after
existing data (above the HWM).
Direct load insert enhances performance during insert operations by formatting and
writing data directly into data files without using the buffer cache. The functionality is
similar to that of the Direct Path Load of SQL*Loader.
- Free space within the existing data is not reused in direct load insert
- Direct load insert supports only the INSERT INTO ... SELECT syntax
Advantages:
a) Data can be loaded without logging undo and redo entries
b) Direct load insert updates the indexes of the table
c) With a parallel insert, atomicity of the transaction is ensured. Atomicity
cannot be guaranteed if multiple parallel loads are used. Parallel load
could leave some indexes in an unusable state.
d) Parallel direct load insert works on both non-partitioned and partitioned tables
Note: In direct load insert, exclusive locks are obtained on the table (or on all partitions
of a partitioned table)
Direct Load Insert.
APPEND is the default mode for direct insert
NOLOGGING with APPEND to make the
process even faster.
Enable parallel DML (ALTER SESSION command)
The parallel degrees of the insert and the select
are independent of each other.
Determine the degree of parallelism (number
of CPUs , number of disks )
The APPEND hint is optional. When it is set, data is always inserted into a new block. The
APPEND mode should be used to increase the speed of insert operations; it may not be
the best choice when space optimization is required.
Setting NOLOGGING with APPEND makes the process even faster. If recovery is
needed, be sure to take a backup immediately after the load.
The degree of parallelism of the insert is independent of that of the select. If one
operation cannot be performed in parallel, it has no effect on whether the other operation
will be performed in parallel.
The degree of parallelism is determined by the following precedence: insert hint directive >
PARALLEL clause (of the create table statement) > maximum query directive.
The default degree of parallelism is appropriate for reducing response time while
guaranteeing use of I/O and CPU resources for any parallel operations. If the operation is
I/O bound, first spread the data over more disks, then increase parallelism; stop when the
operation becomes CPU bound.
If a system becomes memory bound and there are several concurrent parallel
operations, reduce the degree of parallelism to correct this problem.
Consideration for Direct Load
Insert
Index maintenance
Space consideration - each parallel server
process inserts its data into its own
temporary segment
Locking consideration
For a direct load insert on partitioned tables that have only local indexes, the indexes are
rebuilt internally at the end of the insert operation.
For parallel direct load insert, this is done by the parallel server processes.
Remember that global indexes for partitioned tables, and indexes for non-partitioned tables,
cannot use the direct load insert path.
Tip: If the direct load insert modifies most of the data in a table, you can avoid the
performance impact of index maintenance by dropping the index before the insert and
rebuilding it at the end.
Direct load insert requires more space than conventional path insert, as direct load
insert ignores existing space in the free lists of the segment. Each parallel process first
inserts data into a temporary segment before appending to the table. Be careful when
choosing NEXT and MINEXTENTS, as they can affect table fragmentation
(either internal or external fragmentation, discussed under Create Table As Select -
CTAS).
In direct load insert, exclusive locks are obtained on the table (or on all partitions of a
partitioned table). Concurrent queries are supported and only see the data in the table
before the insert began.
Restrictions on Direct Load
INSERT
A Direct Load Insert must execute first
ROW_LOCKING=INTENT must not be set
Referential integrity and triggers are not
allowed
GLOBAL indexes are not maintained
There is no index maintenance for direct
load insert into a non-partitioned table
Object types and LOBs are not supported
Local indexes are fully maintained by direct load insert operations.
A direct load insert must execute as the first statement in a transaction, before any other
DML statements. The transaction cannot have an explain plan, and explicit locks (LOCK
TABLE ) are not allowed. After the direct load insert statement, only an explicit
commit or rollback statement or call is allowed.
Clustered tables are not supported.
Any violation will cause the statement to execute serially, using the conventional
insert path, without warnings or error messages.
Example:
create table summary(c1,avgc2,sumc3)
PARALLEL (degree 5)
as select c1,avg(c2),sum(c3)
from daily_sales
group by c1;
In the above example, 5 parallel server processes scan the daily_sales table and 5
parallel server processes build the summary table, all controlled by one query
coordinator.
To avoid I/O bottlenecks, specify a tablespace with at least as many devices as CPUs.
To avoid fragmentation in allocating space, the number of files in a tablespace should
be a multiple of the number of CPUs.
Note: Parallelism of the SELECT does not influence the CREATE statement. If the
CREATE is parallel, however, the optimizer tries to make the SELECT run in parallel
also.
Create Table As Select (CTAS).
Create table as select can be parallelized
only by a PARALLEL clause
When the CTAS is parallelized, Oracle also
parallelizes the scan operation if possible.
A hint in the select clause does not affect the
create operation.
Advantages of CTAS.
Common subqueries can be computed once
and referenced many times
Complex queries can be decomposed into
simpler steps
Manual parallel deletes can be implemented
efficiently
Summary tables for multi-dimensional drill-
down analysis can be created efficiently
Tables can be reorganized much faster
Advantages:
a) Common subqueries can be computed once and referenced many times. This
may be much more efficient than referencing a complex view many times.
b) Complex queries can be decomposed into simpler steps in order to provide
application-level checkpoint/restart. For example, a complex multi-table join
on VLDB could run for dozens of hours. A crash during this query would
mean starting over from the beginning. Using CTAS, the query can be
rewritten as a sequence of simpler queries that run for a few hours each. After
a system failure, the query can be restarted from the last complete step.
c) Manual parallel deletes can be implemented efficiently by creating a new
table that omits the unwanted rows from the original table. The original table
can then be dropped.
d) Summary tables for multi-dimensional drill-down analysis can be created
efficiently. For example, a summary table might store the sum of revenue
grouped by month, brand, region, and salesperson.
e) Tables can be reorganized (chained rows eliminated, free space compressed,
and so on) by copying the old table to a new table, which is much faster than
export/import and easier than reloading.
Tuning CTAS.
Storage space for CTAS
Setting INITRANS and MAXTRANS
Transaction Free Lists
NOLOGGING
When creating a table or index in parallel, each parallel server process uses
the values in the STORAGE clause of the create statement to create temporary
segments to store the rows. Therefore a table created with an INITIAL of 5MB
and a PARALLEL DEGREE of 12 consumes at least 60MB of storage during
table creation. When the parallel coordinator combines the segments, some of
the segments may be trimmed, and the resulting table may be smaller than the
requested 60MB.
For tables and indexes, especially global indexes, blocks are shared by the
server processes of the same PDML statement. Even if the operations are not
performed against the same row, the server processes may share the same
index blocks. Each server process needs one transaction entry before it can
make changes to a block. Therefore you should set INITRANS to a large
value. E.g. for a PDML statement with a degree of parallelism of 10 against a table with an
index, all 10 server processes might attempt to change the same index block, so you must
set INITRANS to at least 10 to avoid self deadlock.
For PDML, each server process may require its own transaction free list. Thus,
if there are 2 global indexes, one with 30 and the other with 50 transaction
free lists, the degree of parallelism is limited to 30. The default number of
transaction free lists depends on the block size; you can also decrease the number of
process free lists to leave more room for transaction free lists in the segment header.
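The three rules of thumb above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and the figures are the ones used in the notes (INITIAL of 5MB at degree 12, degree 10 against one index block, indexes with 30 and 50 transaction free lists).

```python
def ctas_min_storage_mb(initial_mb, degree):
    """Each parallel server process starts with its own INITIAL extent."""
    return initial_mb * degree


def min_initrans(degree):
    """Every server process may need a transaction entry in the same block."""
    return degree


def max_pdml_degree(transaction_free_lists):
    """Parallelism is capped by the index with the fewest transaction free lists."""
    return min(transaction_free_lists)


storage_mb = ctas_min_storage_mb(initial_mb=5, degree=12)
initrans = min_initrans(degree=10)
degree_limit = max_pdml_degree([30, 50])
```

The storage figure is a peak during creation. As the notes say, trimming when the segments are combined can leave the final table smaller.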
Free Space when CTAS in
PARALLEL
When you create indexes/tables in
PARALLEL, each parallel server process
allocates a new extent and fills the extent
with the table or index's data.
A CTAS with a degree of parallelism of 3 creates
at least 3 extents for that table initially.
It is possible to create pockets of free
space. This occurs when temp segments are
larger than the row data inserted.
External Fragmentation:
If the unused space in each temporary segment is larger than the value of
MINEXTENTS, the unused space is trimmed when merging rows from all
the temporary segments into the table or index. The unused space is returned to
the system free space and can be allocated for new extents, but it cannot be
coalesced into a larger segment because it is not contiguous space.
Internal Fragmentation:
If the unused space in each temporary segment is smaller than the value of
MINEXTENTS, the unused space cannot be trimmed when the rows are
merged into the table or index. This unused space is not returned to the system
free space. It becomes part of the table or index and is available only for
subsequent inserts or for updates that require additional space.
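The distinction between the two kinds of fragmentation comes down to one comparison per temporary segment, sketched below in Python. The function name, the threshold argument (standing for the MINEXTENTS value named in the notes) and the sample sizes are assumptions for illustration. The notes do not say what happens when the two values are equal, so that case is reported separately.

```python
def classify_unused_space(unused, threshold):
    """Say which kind of fragmentation a temporary segment's slack produces.

    unused:    unused space left in the temporary segment
    threshold: the value the notes compare against, in the same units
    """
    if unused > threshold:
        return "external"    # trimmed and returned to system free space
    if unused < threshold:
        return "internal"    # kept inside the table or index
    return "unspecified"     # equal case is not covered by the notes


# Hypothetical slack left by three parallel server processes, threshold of 100.
results = [classify_unused_space(u, 100) for u in (250, 40, 100)]
```

The first segment's slack would be trimmed (external fragmentation), while the second stays inside the object as space usable only by later inserts or updates (internal fragmentation).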
Creating Indexes in Parallel.
Parallel index creation works in much the
same way as a parallel table scan with an
ORDER BY clause
No redo or undo logging can be specified
using the NOLOGGING option
Space consideration
Can not automatically create the required
index in parallel when adding a UNIQUE or
PRIMARY KEY constraint
1) The parallel clause in the CREATE INDEX statement is the only way in which to
specify the degree of parallelism for creating the index. If the degree of parallelism is not
specified in the parallel clause of CREATE INDEX, then the number of CPUs is used as
the degree of parallelism.
2) Space consideration: an index created with an INITIAL of 5MB and a
PARALLEL DEGREE of 12 will consume at least 60MB of storage during index creation,
because each process starts with an extent of 5MB. When the query coordinator process
combines the sorted sub-indexes, some of the extents may be trimmed, and the resulting index
may be smaller than 60MB.
3) By using the NOLOGGING option, no redo or undo logging should occur during index
creation. This can significantly improve performance, but the index is temporarily
unrecoverable. Recoverability is restored after the new index is backed up.
4) When adding or enabling a UNIQUE or PRIMARY KEY constraint on a table, the
index cannot be automatically created in parallel. Instead, manually create an index on the
desired columns using the CREATE INDEX and an appropriate PARALLEL clause and
then add or enable the constraint using the existing index.
Import Enhancements.
Effective method to load data into tables
Can run multiple import sessions to
partitions concurrently.
Partition-level import lets you selectively
retrieve data from partitions in export file.
Provides a way to merge partitions
Import reports rejected rows
E.g. the following command line statements import the row data of partitions P1 and P2 of
table scott.order concurrently:
imp system/manager file=exp.dmp fromuser=scott tables=order:P1
imp system/manager file=exp.dmp fromuser=scott tables=order:P2
If table order does not exist on the target database, it is created and data is inserted into
the same partitions. If table order exists on the target database before the import, the
row data is inserted into the partitions whose range allows insertion. The row data can
end up in partitions of names other than p1 and p2.
Partition merging allows data from multiple partitions to be merged into one partition,
by setting up a new partition on the target system that can take data from multiple
partitions in the source export file. IGNORE=Y must be used.
Reconfiguring partitions, by having more partitions or changing the partition key, is possible
via exp/imp. IGNORE=Y must be set. Rejected rows are reported.
Import Enhancements - Index
Creation and Maintenance.
SKIP_UNUSABLE_INDEXES=y
Other indexes not previously set Index
Unusable are updated as rows are inserted
Help to speed up the load for partitions
containing a large amount of existing data
When specifying SKIP_UNUSABLE_INDEXES=y , index maintenance is postponed
during the import process. The indexes must be rebuilt once the import completes.
Example:
1) Within SVRMGRL
alter table t modify partition P2 unusable local indexes;
2) At the o/s command line
imp scott/tiger file=exp.dmp tables=(t:P1,t:P2) ignore=y
skip_unusable_indexes=y
3) After the import completes, from within SVRMGRL
alter table t modify partition P2 rebuild unusable local indexes;
In this example, partition P1 contains a much larger amount of data in the existing
table, compared to the amount of data to be imported. Assuming the reverse is true for
partition P2, performing index updates for the P1 indexes during import is more
efficient than a partitioned index rebuild, while the opposite is true for the P2 indexes.
Detecting Data Load Performance
Problems
Diagnose Problems
Dynamic Performance Tables
Operating System Statistics
When workload distribution is unbalanced, check for the following conditions:
a) Is there device contention? Are there enough disk controllers to provide
adequate I/O bandwidth?
b) Is the system I/O bound, with too little parallelism? If so, consider increasing
parallelism up to the number of devices.
c) Is the system CPU bound, with too much parallelism? Check the OS CPU
monitor to see whether a lot of time is being spent in system calls. Reducing the
degree of parallelism should help.
d) Are there more concurrent users than the system can support?
The overall aim is to look for good utilization of I/O and CPU resources.
Dynamic performance tables:
a) V$FILESTAT: helps to diagnose I/O and workload distribution problems.
b) V$PQ_SESSTAT: shows the number of PDML statements and messages sent.
c) V$PQ_SLAVE: shows total CPU time and messages per parallel slave.
d) V$PQ_SYSSTAT: helps to determine the appropriate number of parallel server
processes. To determine whether the parallel server processes are actually busy:
select * from v$pq_sysstat where statistic = 'Servers Busy';
Operating system statistics can be found using vmstat, sar and iostat. Some OS information
can be obtained through V$SESSTAT.
Tuning System Parameters.
Parameters Affecting Resource Consumption
Parameters Affecting PDML
Parameters Related to I/O
Many initialization parameters affect parallel execution performance. For best results, start
with an initialization file that is appropriate for the intended application.
The recommended settings are guidelines for a large data warehouse (more than 100
gigabytes) on a typical high end shared memory multiprocessor with more than one or two
gigabytes of memory. The parameters are grouped as follows:
Parameters affecting resource consumption
Parameters affecting PDML
Parameters related to I/O
Parameters Affecting Resource Consumption.
PARALLEL_MAX_SERVERS
PARALLEL_MIN_SERVERS
SHARED_POOL_SIZE
SORT_AREA_SIZE
PARALLEL_SERVER_IDLE_TIME specifies
the idle time (in minutes) after which the
query server process will be terminated.
1) PARALLEL_MAX_SERVERS sets the maximum number of parallel query servers for
an instance. The query coordinator (QC) starts additional parallel query server processes,
up to the number specified in this parameter, when concurrent demand exceeds the number
of parallel query server processes currently running. By default this is at most twice the
number of CPUs.
Recommended value: 2 * CPUs * number_of_concurrent_users
Example: on a machine with 4 CPUs where 3 concurrent users are running parallel
operations, the parameter PARALLEL_MAX_SERVERS can be set to:
PARALLEL_MAX_SERVERS = 2 * 4 * 3 = 24
2) PARALLEL_MIN_SERVERS specifies the number of processes to be started at
instance startup time. Consider increasing PARALLEL_MIN_SERVERS if more parallel
server processes than this are regularly active, and the Servers Started statistic of
V$PQ_SYSSTAT is continuously growing.
3) SHARED_POOL_SIZE
Recommended value:
default + (3 * message_buffer_size) * (CPUs * 2) * PARALLEL_MAX_SERVERS
Note: the message buffers are used by parallel server processes to communicate with each
other. They can be 2K or 4K, depending on the platform.
4) SORT_AREA_SIZE specifies the amount of memory per server process for sort
operations. A large value can dramatically increase sort performance when the entire
operation can be done in memory. However, if the sort area is too large, the operating system
paging rate will be excessive.
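The PARALLEL_MAX_SERVERS guideline above is simple arithmetic. A minimal Python sketch of it follows; the function name is hypothetical, and the result is only a starting point for tuning, not a value Oracle computes for you.

```python
def recommended_parallel_max_servers(cpus: int, concurrent_users: int) -> int:
    """Guideline from the notes: 2 * CPUs * number_of_concurrent_users.

    The function name is illustrative only; it is not an Oracle API.
    """
    if cpus < 1 or concurrent_users < 1:
        raise ValueError("cpus and concurrent_users must be positive")
    return 2 * cpus * concurrent_users


# The worked example from the notes: 4 CPUs, 3 concurrent users.
print(recommended_parallel_max_servers(4, 3))  # 24
```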
Parameters Affecting Resource Consumption for PDML.
TRANSACTIONS
ROLLBACK_SEGMENTS
LOG_BUFFER
DML_LOCKS
ENQUEUE_RESOURCES
1) TRANSACTIONS: for PDML, each parallel server process starts a new transaction,
so the value of TRANSACTIONS, which specifies the maximum number of concurrent
transactions, may need to be increased. For example, with degree 20 parallelism there
are 20 additional server transactions plus 1 for the coordinator process, so increase
TRANSACTIONS by 21.
2) ROLLBACK_SEGMENTS: PDML requires many rollback segments. For
example, one command with degree 5 parallelism uses 5 server transactions, which should
be distributed among different rollback segments. Consider increasing the number
of rollback segments when there is a high number of concurrent PDML processes.
3) DML_LOCKS: a parallel insert into a partitioned table holds many locks, so the
value of the DML_LOCKS parameter should be increased accordingly. For example, when
running an insert with parallel degree 100 into a table with 600 partitions, the locks required are:
The coordinator acquires: 1 table lock SX and 600 partition locks X.
In total, the server processes acquire: 100 table locks SX, 600 partition locks
NULL and 600 partition-wait locks X.
4) ENQUEUE_RESOURCES should be increased by the same amount as the
DML_LOCKS parameter.
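The two worked examples above (TRANSACTIONS and DML_LOCKS) can be reproduced with a short Python sketch. The function names are hypothetical, and the lock total assumes that every lock listed in the example counts once towards DML_LOCKS and that all partitions are involved in the statement.

```python
def extra_transactions(degree: int) -> int:
    """One transaction per parallel server process, plus one for the coordinator."""
    return degree + 1


def pdml_lock_counts(degree: int, partitions: int) -> dict:
    """Count the locks listed in the notes for a PDML statement touching all partitions.

    Assumption: each listed lock counts once towards DML_LOCKS.
    """
    # Coordinator: 1 table lock plus one lock per partition.
    coordinator = 1 + partitions
    # Servers in total: one table lock per server, plus one partition lock
    # and one partition-wait lock per partition.
    servers = degree + partitions + partitions
    return {"coordinator": coordinator, "servers": servers,
            "total": coordinator + servers}


print(extra_transactions(20))       # 21
print(pdml_lock_counts(100, 600))   # coordinator 601, servers 1300, total 1901
```

Under these assumptions the example needs 1901 locks, so both DML_LOCKS and ENQUEUE_RESOURCES would be raised by roughly that amount.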
Parameters Related to I/O.
DB_BLOCK_SIZE
DB_FILE_MULTIBLOCK_READ_COUNT
SORT_DIRECT_WRITES
DISK_ASYNC_IO and TAPE_ASYNC_IO
1) DB_BLOCK_SIZE
Recommended value: 8K or 16K, depending on the database application. Use a large
block size (16K or 32K) for DSS/data warehouse systems, 4K to 8K for OLTP, and 8K for a
mixture of OLTP and large batch processing.
2) DB_FILE_MULTIBLOCK_READ_COUNT determines how many database blocks are
read with a single operating system READ call during a full table scan. Many platforms
limit the number of bytes read to 64K.
Recommended value: 8 for 8K block size, 4 for 16K block size.
3) SORT_DIRECT_WRITES can improve sort performance if memory is abundant on
your system. When this parameter is set to AUTO and SORT_AREA_SIZE is greater than
10 times the buffer size, the buffer cache is bypassed when writing
intermediate sort results to disk.
4) DISK_ASYNC_IO and TAPE_ASYNC_IO: these parameters turn on or off the use of
the system's asynchronous I/O facility. They allow parallel server processes to overlap I/O
requests with processing when performing table scans.
Note: If the operating system supports asynchronous I/O, these parameters should be left
at the default value of TRUE. Also, asynchronous operations are currently supported for
parallel table scans and hash joins only. They are not supported for sorts or for serial table
scans.
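The recommendation of 8 for an 8K block size is consistent with dividing the 64K per-read limit by the block size. A minimal Python sketch of that arithmetic follows; the function name and the 64K default are assumptions taken from the note above, and the real limit is platform dependent.

```python
def multiblock_read_count(block_size_bytes: int,
                          max_read_bytes: int = 64 * 1024) -> int:
    """Largest number of blocks that fits in one OS read call.

    max_read_bytes defaults to the 64K limit mentioned in the notes;
    check the actual limit for your platform.
    """
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Never return less than 1: a read always fetches at least one block.
    return max(1, max_read_bytes // block_size_bytes)


print(multiblock_read_count(8 * 1024))  # 8
```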
References:
1) Oracle8 Server Tuning
2) Oracle8 Utility Users Guide
3) Oracle8 Server Concept
4) Oracle8 Data Warehousing
(Tim Gorman and Gary Dodge)
DSI 306 - Unit 9, Miscellaneous 9-1
Version 1.2
Copyright Oracle Corporation,1998. All rights reserved.
9
Miscellaneous Enhancements
Design: Alok Satyawadi (Oracle Consulting Services)
Developed by: Alok Satyawadi
Review: Troy Anthony (Core Competency
Development, Asia-Pacific).
Modified: Troy Anthony (Aug 98) v1.2
30 minutes Lecture
30 minutes Total
Objectives
At the end of this session, the participants
should be able to understand
Sort Enhancements
Sort Direct Writes
Contents
Sort problems
Sort Enhancements
Sort Segment
Sort Segment Storage
Performance
Sort Direct Writes
Tuning Considerations
References
Sort Problems
small sorts - done in memory.
large sorts - done in temporary data.
medium sized sorts - can be a problem
cannot be done in memory.
overheads in temporary data can be
significant.
In the previous releases of Oracle, very small sorts (that could fit into
sort_area_size) would be done in memory. This would not require any space
management operations. Large sorts with multiple passes would require
temporary data. These operations would be dominated by I/O operations and
CPU cycles. Space management operations (allocating an extent, merging
extents) would be present but were not a significant part of the cost of sorting.
The medium sized sorts, though, could become expensive. This is because they
were not small enough to be done in the memory and not big enough to make
the space management operations insignificant.
Space Management Operations:
Sorts generate streams of sorted data called runs. These runs are written to
temporary storage. A brand new segment is allocated for each sort
operation. As runs are generated, new extents are added to the segment, which is
a space management operation.
After the sort is finished, the temporary segment is dropped, which is also a space
management operation.
Overheads involved in Space Management Operations:
Since there is a single ST enqueue to perform space management,
concurrency of these operations becomes poor, especially in OPS
environments because of cross instance calls.
Updating used/free extent tables in data dictionary could be a significant
overhead in OPS environments.
Sort Enhancements
Some sort enhancements:
Sort Segment
Sort Extent Pool
Sort Extent Pool Latch
Temporary Tablespace
Space management operations overheads have been addressed by adding a
sort segment in version 7.3 of Oracle.
Sort Segment: A separate segment for sorting per instance per tablespace
cached in memory.
Sort Extent Pool: SGA structure that describes sort segment. It keeps track of
used and unused extents of a given sort segment.
Sort Extent Pool Latch: a latch which synchronizes the Sort Extent Pool
Operations.
Temporary tablespace: tablespace which can only be used for Sort Segments.
No permanent objects can reside in this tablespace.
Sort Segment
created at the time of first sort operation
size grows till all sort requirements
satisfied
allocs/deallocs of space are in-memory
operations
no cross instance calls required
stable state - when no new extents are
allocated, and no existing extents released
A sort segment in a given tablespace is created at the time of first sort
operation which uses this tablespace as its temporary storage. After creation, it
grows until it reaches a state in which all the sort operations can satisfy their
temporary storage requests by allocating existing extents out of this segment.
At this stage all sort storage requests can be satisfied by a lookup in the Sort
Extent Pool. Multiple processes which perform this lookup are synchronized
by a local latch, the Sort Extent Pool Latch. These are very short
synchronizations. Also, they happen only within an instance;
multiple instances do not have to synchronize with each other. This reduces the
overhead of cross-instance calls.
Another important feature of these sort segments is that they reside in
temporary tablespaces. This feature of Temporary Tablespaces has been
introduced to alleviate the problem of a permanent object residing in the same
tablespace as a sort segment running out of space. For this reason, permanent
objects and temporary objects cannot share the same tablespace and hence the
concept of temporary tablespace.
Sort Segment Storage
sort segments take their parameters
from default storage of the tablespace
extent size is NEXT from storage params
number of extents is UNLIMITED
A new syntax is introduced to make a tablespace temporary (or non-
temporary). The Sort segment Optimization happens only in temporary
tablespaces.
The extent size is the value of the NEXT specification, rounded up to the
closest multiple of multiblock IO size defined by the
DB_FILE_MULTIBLOCK_READ_COUNT init.ora parameter.
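The rounding rule above can be illustrated with a short Python sketch. It assumes that the multiblock I/O size is DB_BLOCK_SIZE multiplied by DB_FILE_MULTIBLOCK_READ_COUNT; the function name and the sample values are hypothetical.

```python
def sort_extent_size(next_bytes: int, block_size: int,
                     multiblock_read_count: int) -> int:
    """Round NEXT up to the closest multiple of the multiblock I/O size.

    Assumption: multiblock I/O size = block_size * multiblock_read_count.
    """
    io_size = block_size * multiblock_read_count
    if next_bytes <= 0 or io_size <= 0:
        raise ValueError("all sizes must be positive")
    # Integer ceiling division, then scale back up to bytes.
    return -(-next_bytes // io_size) * io_size


# NEXT of 100K with 8K blocks and a read count of 8 (64K I/O size):
print(sort_extent_size(100 * 1024, 8 * 1024, 8))  # 131072, i.e. 128K
```

Choosing NEXT as an exact multiple of the multiblock I/O size avoids the rounding altogether, so the extent size seen in the sort segment matches the storage clause.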
Performance
excessive Sort Latch wait - bump up NEXT,
sort_area_size
excessive allocs/deallocs - add space to
tablespace, assign different tablespace for
big sorts
SGA thrashing
If there is a lot of contention for the Sort Extent Pool latch, even in stable state,
it may be worthwhile to increase the extent size by changing the NEXT value
of the DEFAULT STORAGE clause for the temporary tablespace. Another
possible reason for this contention could be many concurrent sorts happening
in the system. In this case it may be a good idea to increase sort_area_size.
If the ADDED_EXTENTS and FREED_EXTENTS columns of the
V$SORT_SEGMENT view show excessive allocation/de-allocation activity,
consider adding more space to the corresponding tablespace. Another
possibility is that some very big sorts are happening in the system.
In this case, it may be worthwhile to assign these big sorts to another
temporary tablespace.
The Sort Extent Pool is allocated from the SGA, and may therefore have an adverse effect
on the behaviour of other SGA caches and pools. The size of this pool is
roughly proportional to the total number of extents of all the sort segments in a
given instance.
Sort Direct Writes
introduced in release 7.2.2
set of write buffers for asynchronous
writes bypassing the buffer cache
size, number of buffers was user-tunable
memory for buffers separate from sort area
no cost model for optimizer
Sort Direct Writes provides a set of write buffers for asynchronous writes that
bypass the buffer cache. In the absence of such a feature, the DBWR process
would become a bottleneck as multiple sorts wrote into the buffer cache.
The size and number of buffers were user-tunable. The total memory for the
buffers was not part of sort_area_size. The optimizer did not have a cost model
for the new write method.
Conventional Sorting and
Memory Paths
This diagram shows activity during a serial index creation that performs a disk
sort. Data blocks are read from the table to be indexed [1] and sort runs begin.
Sort blocks are created in the sort area and written out to sort segments [2] to
free memory for further sort runs. Table block reads [1] finish before the
merge phase begins. Sort segment blocks are read [3] back into the sort area
and the index blocks are written [4] to the index. Sort segment reads [3] and
index block writes [4] will first peak and then taper off until the index is
completely created.
The process of sort direct writes bypasses the writes through the Buffer Cache
in steps 2 and 4, above. The sort server process writes directly to sort segments
on disk. The setting of the init.ora parameter (see slide 12)
SORT_DIRECT_WRITES enables this feature. If the process of direct reads is
performed (which is the default action on full table scans from Oracle v7.1.5),
then step 1, above, would also bypass the Buffer Cache.
Note: refer to the paper Tuning Large Sorts, by Prabhakar Gongloor for a
more detailed discussion of this feature.
Sort Direct Writes (7.3)
memory comes from sort area size
optimizer has cost model
suited for DSS type operations
In release 7.3, memory for these write buffers comes from sort_area_size. This
way, the user has to tune only one parameter to obtain the desired performance. Also,
a cost model is provided for the optimizer to calculate the costs of this
write method.
Init Ora Parameters
sort_direct_writes
sort_write_buffers
sort_write_buffer_size
The initialization parameter sort_direct_writes can be set to enhance sort
performance. When this parameter is set, each sort allocates several large
buffers in memory for direct disk I/O. The default value of this parameter for
release 7.3 is AUTO. For Oracle8, the default is again AUTO.
When set to AUTO, Oracle automatically allocates direct write buffers if
sort_area_size is at least ten times the configured size of the direct write buffers.
Note that this memory is taken out of the sort_area_size allocation. This
behavior was different in release 7.2.
sort_write_buffers and sort_write_buffer_size can be set to control the number and size of
these buffers. The sort writes an entire buffer for each I/O operation. The Oracle
process performing the sort writes the sort data directly to disk, bypassing
the buffer cache.
Tuning Considerations
bigger sort_area_size - doesn't always
work
disk sorts improve with sort_direct_writes
parallelism always improves sorting time
once enabled, little gain on bumping up
sort write buffers or sort write buffer size.
Whenever we see a problem with sorting, we more often than not increase
sort_area_size and expect a performance improvement. This may not always yield
the expected results. Larger sort areas provide a benefit if the sort
can be completed entirely in memory. If not, the resulting disk sort is not
greatly affected by the sort area size.
Optimum performance for disk sorts is achieved by using sort direct writes and
asynchronous I/O together with the sort area size setting.
Parallelism has the most noticeable effect on reducing the sorting time.
References
Understanding Oracle7 Release 7.3
Options for VLDB - Pranav Mohindroo -
http://spg.us.oracle.com
Tuning Large Sorts - Prabhakar
Gongloor - http://zen.us.oracle.com/COE
Labs and Exercises
DSI306 - Lesson 2 - Multi-Threaded Server
Case Studies and Lab Exercises
Contents:
Introduction to Labs and Case Studies
Lab 1: Establishing Multi-Threaded Server
Lab 2: Establish Connection Manager (CMAN) for Multiplexing
Lab 3: Connection Manager (CMAN) for Connection Pooling
Appendix - Sample Configuration Files (init.ora, cman.ora,
listener.ora, tnsnames.ora)
Introduction to Labs and Case Studies
This exercise involves configuring a database to operate using a Multi-threaded server (MTS)
configuration. This configuration provides the capability for many user processes to share a small number
of server processes (as opposed to having a dedicated connection for each user process). This minimises
the number of server processes and maximises the use of available system resources.
The MTS configuration has been further enhanced by the addition of other processes to further maximise
the number of available connections. A process called Connection Manager (CMAN) has been created
which can perform such tasks as connection pooling, multiplexing, and transport sharing. The CMAN
process communicates with the database using an MTS connection.
This exercise involves setting correct parameters in the init.ora file to enable MTS to be recognised by the
database, and establishing a listener process to wait on connection requests (for both dedicated, MTS and
CMAN connections).
Exercise 1 Setting Up MTS
1) Set environment variables.
% setenv TNS_ADMIN <home directory>
% setenv ORACLE_SID <SID>
Modify your Listener_MTS<x>.ora to specify the correct port number you have been allocated.
2) Modify your init.ora to configure MTS, including the following:-
a) 2 TCP Dispatchers
b) MTS_MAX_DISPATCHERS = 10
c) MTS_MAX_SERVERS=10
d) MTS_SERVERS=2
e) MTS_SERVICE=NET8_MTS_SERV (NOTE this is different from the DBNAME)
f) Include the correct port number you have been allocated in the
MTS_LISTENER_ADDRESS
3) Start your instance specific listener i.e.
% lsnrctl start listener_mts<x>
verify what services are available.
4) Now start your database and check the services again.
5) Connect to your database again and issue an alter system command to startup a further 2 TCP
dispatchers.
6) Check lsnrctl services again.
7) Change sqlnet.ora to include AUTOMATIC_IPC=OFF
8) Modify your tnsnames.ora to include your designated port number and SID/SERVICE name
for both the Dedicated Alias and MTS alias.
9) Startup two additional unix sessions and connect to your database using the net8_ded alias and
net8_MTS alias.
10) SELECT SERVER, OSUSER FROM V$SESSION S, V$MYSTAT M
WHERE S.SID = M.SID AND
ROWNUM = 1;
11) Run the script mts1.sql which will report which users are running through which MTS
Dispatchers.
Exercise/Demo 2 Configure Connection Manager for Multiplexing
1) Shutdown the database
2) Modify init.ora MTS_DISPATCHERS parameter i.e.
mts_dispatchers="(PROTOCOL=TCP)(MULTIPLEX=ON)(DISPATCHERS=1)"
3) Modify sqlnet.ora by adding the line:-
USE_CMAN=TRUE
4) Add an alias in the local TNSNAMES.ORA for a CMAN connection i.e.
NET8_CMAN =
  (description =
    (address_list =
      (address =
        (protocol = tcp)
        (host = isis)
        (port = 1610)
      )
      (address =
        (protocol = tcp)
        (host = isis)
        (port = 1528)
      )
    )
    (connect_data =
      (SID = NET8MTS)
    )
    (SOURCE_ROUTE = YES)
  )
5) There is currently a CMAN.ORA configured on the default PORT of 1610. E.g.
cman =
  (address_list =
    (address =
      (protocol = tcp)
      (host = isis)
      (port = 1610)
    )
  )
6) Start the listener, Database and Connection Manager IN THAT ORDER!!!!
7) Issue a lsnrctl Services command to check which Port has been allocated to the Dispatcher
D000.
% lsnrctl services
Output similar to the following should be produced:
LSNRCTL for Solaris: Version 8.0.4.0.0 - Production on 18-MAR-98 13:35:56
(c) Copyright 1997 Oracle Corporation. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=ipc)(KEY=PNPKEY))
Services Summary...
NET8MTS has 1 service handler(s)
DEDICATED SERVER established:1 refused:0
LOCAL SERVER
NET8_MTS_SERV has 1 service handler(s)
DISPATCHER established:0 refused:0 current:0 max:16382 state:ready
D000 <machine: isis, pid: 18146>
(ADDRESS=(PROTOCOL=tcp)(DEV=21)(HOST=138.3.40.145)(PORT=44837))
extproc has 1 service handler(s)
We can see that the port is 44837. (Make a note of this so we can prove that Multiplexing is
enabled).
8) Make three connections using the NET8_CMAN alias.
9) Select osuser, server from V$SESSION just to make sure we are making a Shared Server
connection.
10) Issue the netstat command and grep for the port 44837 i.e.
% netstat -a | grep 44837
*.44837 *.* 0 0 0 0 LISTEN
isis.45450 isis.44837 32768 0 32768 0 ESTABLISHED
HERE WE HAVE THREE SESSIONS BEING MULTIPLEXED OVER ONE
INCOMING/OUTGOING TRANSPORT.
11) Now make a further three connections using the NET8_MTS i.e. bypassing Connection
Manager and multiplexing.
Issue the netstat command again..
% netstat -a | grep 44837
*.44837 *.* 0 0 0 0 LISTEN
isis.45637 32768 0 32768 0 ESTABLISHED
12) Continue making more connections via the NET8_CMAN alias until you receive the error:-
SQL*Plus: Release 8.0.4.0.0 - Production on Wed Mar 18 14:9:5 1998
ERROR:
ORA-12204: TNS:received data refused from an application
13) Issue the command:-
% cmctl stats
Output similar to the following should be produced:
CMCTL for Solaris: Version 8.0.4.0.0 - Production on 18-MAR-98 14:11:42
CMAN
(STATISTICS=(TOTAL_RELAYS=9)(ACTIVE_RELAYS=8)(MOST_RELAYS=8)
(OUT_OF_RELAY=1)(TOTAL_REFUSED=1))
This is because the DEFAULT Maximum_Relays parameter within CMAN.ORA is set to 8. Even after
making 8 successful connections we still only see one relay via netstat:-
netstat -a | grep 44837
*.44837 *.* 0 0 0 0 LISTEN
Exercise/Demo 3 Configure Connection Manager for Pooling
1) Shutdown the database
2) Modify init.ora MTS_DISPATCHERS parameter i.e.
mts_dispatchers="(DISPATCHERS=1)(PROTOCOL=TCP)(POOL=ON)\
(CONNECTIONS=1) (TICKS=1)"
3) Start the listener, Database and Connection Manager IN THAT ORDER!!!!
4) Open four separate windows and make a connection from each via the NET8_CMAN alias.
5) Select * from user_tables from each session and note the behaviour of Connection Manager
Pooling i.e. only one transport session can be active at any one time.
Sample Configuration Files
SAMPLE INIT.ORA
#mts_dispatchers = "(DISPATCHERS=2)(PROTOCOL=TCP)"
#mts_dispatchers="(PROTOCOL=TCP)(MULTIPLEX=ON)\ Multiplexing
(DISPATCHERS=1)"
mts_dispatchers="(DISPATCHERS=1)(PROTOCOL=TCP)\ Pooling
(POOL=ON)(CONNECTIONS=1)(TICKS=1)"
mts_servers=2
mts_max_servers=10
mts_service=NET8_MTS_SERV Service Name
mts_listener_address="(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(host=isis)\
(port=1528)))"
SAMPLE LISTENER.ORA
Listener File for using MTS with Connection Manager
SQLNET.AUTHENTICATION_SERVICES = (NONE)
USE_PLUG_AND_PLAY_LISTENER = OFF
USE_CKPFILE_LISTENER = OFF
#The following setup is to test MTS
#Change the host, use the same port as in tnsnames.ora
MTS_LISTENER=(ADDRESS_LIST=
(ADDRESS= (PROTOCOL=tcp) (PORT=1600) (HOST=apcprsol1))
(ADDRESS= (PROTOCOL=ipc) (KEY=NET8MTS))
(ADDRESS= (PROTOCOL=ipc) (KEY=PNPKEY))
)
SID_LIST_MTS_LISTENER=
(SID_LIST=
(SID_DESC= (SID_NAME=O8PT) (ORACLE_HOME=/u03/app/oracle/product/8.0.4)
(GLOBAL_DBNAME = O8PT))
)
STARTUP_WAIT_TIME_MTS_LISTENER=0
CONNECT_TIMEOUT_MTS_LISTENER=10
TRACE_LEVEL_MTS_LISTENER=OFF
SAMPLE SQLNET.ORA
# The significant difference is USE_CMAN=TRUE
# Confirm host name and port
AUTOMATIC_IPC = OFF
TRACE_LEVEL_CLIENT = OFF
BEQUEATH_DETACH=NO
SQLNET.EXPIRE_TIME = 0
NAMES.DEFAULT_DOMAIN = world
NAME.DEFAULT_ZONE = world
USE_CMAN=TRUE
SQLNET.CRYPTO_SEED = "-1233843592-1233465155"
NAMES.DIRECTORY_PATH=(TNSNAMES,ONAMES)
NAMES.PREFERRED_SERVERS =
(ADDRESS_LIST =
(ADDRESS =
(COMMUNITY = TCP)
(PROTOCOL = TCP)
(Host = apcprsol1)
(Port = 1599)
)
)
NAME.PREFERRED_SERVERS =
(ADDRESS_LIST =
(ADDRESS =
(COMMUNITY = TCP)
(PROTOCOL = TCP)
(Host = apcprsol1)
(Port = 1599)
)
)
SAMPLE TNSNAMES.ORA
# Change host, match port with listener.ora
# Make sure SID is the real SID of the DB you are trying to connect to
NET8_DED.WORLD =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)
(HOST = apcprsol1)
(PORT = 1600)
)
(CONNECT_DATA = (SID = O8PT)
)
)
# Change host, match port with listener.ora
# In this case, the SID is the one specified in the init.ora mts_service=NET8_MTS_SERV
NET8_MTS.WORLD =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP) (HOST = apcprsol1) (PORT = 1600))
(CONNECT_DATA = (SID = NET8_MTS_SERV))
)
# Change host, match port with listener.ora and cman.ora
# In this case, the SID is the one specified in the init.ora mts_service=NET8_MTS_SERV
NET8MTS_CMAN.WORLD =
(DESCRIPTION =
(ADDRESS_LIST=
)
(CONNECT_DATA = (SID = NET8_MTS_SERV))
(SOURCE_ROUTE=YES)
)
DSI306 - Lesson 3 - Manual Partitioning
DSI306 Lesson 3 Manual Partitioning , Star queries, bitmap indexes
Lab exercise 1 - Manual Partitioning
In this exercise we will examine the usage of manual partitioning. The user must have a default tablespace
large enough to hold approximately 5 tables of 1 MB each.
From sqlplus or Server Manager run the script dsi306_3_1a.sql. This will create 4 tables of common
attributes. Each of these tables has a check constraint on the column send_date which will limit the data
that may be inserted into them.
Run the script dsi306_3_1b.sql . This will add an index to each table, analyze the tables and the indexes,
and then create a UNION ALL view across the 4 manual partitions.
Run the script dsi306_1_populate.sql. This will insert 10,000 rows in to each of the tables.
In either the init.ora or using an alter session command, enable the event 10046 at level 8. The syntax to
use is:
Init.ora - event = "10046 trace name context forever, level 8" (ensure that either only one event
line is set or event lines are consecutive.) Remember to shutdown and restart the instance.
Session variable - alter session set events '10046 trace name context forever, level 8';
This event will create a trace file showing the execution of all statements (including wait events and
statistic information). If the AUTOTRACE facility is available within SQLPLUS, set this as well.
Examine the view and the tables.
Select information from two of the partitions, using values for send_date (the constraining column) to
restrict the selection. For example, SELECT * FROM LINE_ITEM WHERE SEND_DATE = '01-JUN-94'
OR SEND_DATE = '01-JUL-95';
Examine the trace file information and/or the autotracing. Have partitions been eliminated? How many
rows were retrieved from each partition? How many rows were scanned in each partition?
In the init.ora file set the parameter partition_view_enabled = TRUE (you may also set
v733_plans_enabled = TRUE; this may allow some of the plans to function as expected).
Run the same select statement again. Examine the trace file and the autotracing. Have partitions been
eliminated? How many rows were retrieved from each partition? How many rows were scanned in each
partition?
If time permits try different queries, attempt to force use index accesses as opposed to full table scans (the
amount of data may not be sufficient for the optimizer to choose this on its own).
Lab Exercise 2 Star Queries
Run the script dsi306_3_2a.sql . This will create the schema outlined below -
Fact table:
sales_table (
SALES_DATE DATE,
STORE_ID NUMBER,
PROD_ID NUMBER,
PROMO_AMT NUMBER,
PROMO_QTY NUMBER,
SALES_REP_ID NUMBER,
ORDER_ID NUMBER,
ORD_TYP NUMBER(38),
PROMO_TYP NUMBER )
Lookup tables:
prod_id_desc (
PROD_ID NUMBER,
PROD_ID_NAME VARCHAR2(20));
promo_typ_desc (
PROMO_TYP NUMBER,
PROMO_TYP_DESC VARCHAR2(20));
store_id_desc (
STORE_ID NUMBER,
store_id_name VARCHAR2(20));
sales_rep_id_desc (
sales_rep_id NUMBER,
sales_rep_id_desc VARCHAR2(20));
order_typ_desc (
order_typ NUMBER,
order_typ_desc VARCHAR2(20));
Run the script dsi306_3_2b.sql. This will populate the tables, inserting approximately 100,000 rows in to
the sales_data table and appropriate information in to the lookup tables.
Enable the autotracing facility, and leave the event 10046 set (the trace file generated may be very large).
Two select statements are listed in file dsi306_3_2c.sql. Each statement is identical except for the
inclusion of the star hint /*+ STAR */ . Examine the different outputs produced, and estimate the time
taken for each join.
Lab Exercise 3 - Bitmap indexes
This exercise is designed to demonstrate the way in which bitmap indexes are created, stored and
accessed.
Run the sql script dsi306_3_3a.sql - this will create a table with the following attributes:
colour_table ( recid number, colour varchar2(20), rec2id number ) and should contain 3002
rows. The first and last of these rows will have the value BLUE set for the column COLOUR.
Set the events 10608 and 10717 in the init.ora file. This will trace bitmap index creation and trace any
compaction that may occur during creation, respectively.
Create a Bitmap Index on the column COLOUR in the COLOUR_TABLE. Examine any trace file
generated.
Shutdown and then set the events 10710 and 10715 in the init.ora (in place of 10608 and 10717). This
will trace bitmap access. Select the colour BLUE from the table COLOUR_TABLE. Examine the trace
and determine what functions are called and the actions they are performing.
DSI306 Lesson 3 Solutions
LAB Exercise 1
From sqlplus or Server Manager run the script dsi306_1a.sql. This will create 4 tables of common
attributes. Each of these tables has a check constraint on the column send_date which will limit the data
that may be inserted into them.
Run the script dsi306_1b.sql . This will add an index to each table, analyze the tables and the indexes,
and then create a UNION ALL view across the 4 manual partitions.
Run the script dsi306_1_populate.sql. This will insert 10,000 rows in to each of the tables.
In either the init.ora or using an alter session command, enable the event 10046 at level 8. The syntax to
use is:
Init.ora - event = "10046 trace name context forever, level 8" (ensure that either only one event
line is set or event lines are consecutive.) Remember to shutdown and restart the instance.
Session variable - alter session set events '10046 trace name context forever, level 8';
This event will create a trace file showing the execution of all statements (including wait events and
statistic information). If the AUTOTRACE facility is available within SQLPLUS, set this as well.
Examine the view and the tables.
Select information from two of the partitions, using values for send_date (the constraining column) to
restrict the selection. For example, SELECT * FROM LINE_ITEM WHERE SEND_DATE = '01-JUN-94'
OR SEND_DATE = '01-JUL-95';
=====================
PARSING IN CURSOR #1 len=80 dep=0 uid=31 oct=3 lid=31 tim=0
hv=1060012657 ad='802b0f04'
select * from line_item where send_date ='01-JUN-94' or send_date = '01-
JUL-95'
END OF STMT
PARSE #1:c=0,e=0,p=39,cr=418,cu=1,mis=1,r=0,dep=0,og=4,tim=0
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1650815232 p2=1 p3=0
WAIT #1: nam='file open' ela= 0 p1=0 p2=0 p3=0
WAIT #1: nam='db file sequential read' ela= 0 p1=7 p2=4700 p3=1 P1,P2 =
File #,Block #
WAIT #1: nam='db file scattered read' ela= 0 p1=7 p2=4701 p3=4
FETCH #1:c=0,e=0,p=18,cr=16,cu=3,mis=0,r=1,dep=0,og=4,tim=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1650815232 p2=1 p3=
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1650815232 p2=1
p3=0
WAIT #1: nam='db file sequential read' ela= 0 p1=8 p2=3523 p3=1
p3=0
=====================
hv=1898737984 ad='802b1ca4'
SELECT PT.VALUE FROM SYS.V_$SESSTAT PT WHERE PT.SID=:1 AND PT.STATISTIC#
IN (7,38,39,40,97,155,156,157,161,162) ORDER BY PT.STATISTIC#
END OF STMT
p3=0
p3=0
STAT #0 id=1 cnt=60 pid=0 pos=0 obj=0 op='VIEW LINE_ITEM ' (Rows ret.)
STAT #0 id=2 cnt=60 pid=1 pos=1 obj=0 op='UNION-ALL PARTITION '
STAT #0 id=3 cnt=10000 pid=2 pos=1 obj=2214 op='TABLE ACCESS FULL
LINE_ITEM_1994 ' (Rows scanned)
LINE_ITEM_1995 '
LINE_ITEM_1996 '
LINE_ITEM_1997 '
STAT #0 id=1 cnt=10 pid=0 pos=0 obj=0 op='SORT ORDER BY '
STAT #0 id=2 cnt=10 pid=1 pos=1 obj=0 op='FILTER '
STAT #0 id=3 cnt=12000 pid=2 pos=1 obj=0 op='FIXED TABLE FULL
X$KSUSESTA_*___< '
STAT #0 id=4 cnt=0 pid=2 pos=2 obj=0 op='SORT AGGREGATE '
STAT #0 id=5 cnt=200 pid=4 pos=1 obj=0 op='FIXED TABLE FULL X$KSUSD__ '
=====================
Use the script dba_r.sql, which accepts the parameters P1 (file #) and P2 (block #), to view the
segment being read (the sequential reads that appear in the trace file are reads of the segment header
blocks).
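When the trace file is long, it can help to pull the file # and block # pairs out of the WAIT lines before feeding them to a script such as dba_r.sql. The following is a minimal sketch, assuming the WAIT line layout shown in the trace above; the function name datafile_reads and the sample list are hypothetical and used only for illustration. For the 'db file' wait events, p3 is the number of blocks read.

```python
import re

# Matches WAIT lines in the layout produced by event 10046 at level 8,
# as they appear in the trace excerpt above.
WAIT_RE = re.compile(
    r"WAIT #(?P<cursor>\d+): nam='(?P<name>[^']+)' ela=\s*(?P<ela>\d+) "
    r"p1=(?P<p1>\d+) p2=(?P<p2>\d+) p3=(?P<p3>\d+)"
)


def datafile_reads(trace_lines):
    """Return (event name, file#, block#, blocks) for each 'db file' wait."""
    reads = []
    for line in trace_lines:
        match = WAIT_RE.search(line)
        if match and match.group("name").startswith("db file"):
            reads.append((
                match.group("name"),
                int(match.group("p1")),   # P1 = file #
                int(match.group("p2")),   # P2 = block #
                int(match.group("p3")),   # P3 = number of blocks
            ))
    return reads


# Sample lines copied from the trace above (hypothetical variable name).
sample = [
    "WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1650815232 p2=1 p3=0",
    "WAIT #1: nam='db file sequential read' ela= 0 p1=7 p2=4700 p3=1",
    "WAIT #1: nam='db file scattered read' ela= 0 p1=7 p2=4701 p3=4",
]

for name, file_no, block_no, blocks in datafile_reads(sample):
    print(f"{name}: file {file_no}, block {block_no}, {blocks} block(s)")
```

The single-block sequential read of file 7, block 4700 followed by a multiblock read starting at block 4701 is consistent with a segment header read followed by a full scan.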
Examine the trace file information and/or the autotracing. Have partitions been eliminated? How many
rows were retrieved from each partition? How many rows were scanned in each partition?
As this output shows, all the tables were fully scanned (cnt = 10000 for each partition), with 60
rows eventually being returned. No partitions were eliminated, as the partition_view_enabled parameter was not set.
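The row counts asked for here come from the cnt= field of the STAT lines. As a rough sketch, assuming the STAT line layout shown in the trace above (with each line unwrapped onto a single line), the counts can be listed per operation as follows. The function name row_counts and the sample list are hypothetical.

```python
import re

# Matches STAT lines in the layout shown in the trace excerpt above.
STAT_RE = re.compile(
    r"STAT #\d+ id=(?P<id>\d+) cnt=(?P<cnt>\d+) pid=(?P<pid>\d+) "
    r"pos=(?P<pos>\d+) obj=(?P<obj>\d+) op='(?P<op>[^']*)'"
)


def row_counts(trace_lines):
    """Return (operation, cnt) pairs in the order they appear in the trace."""
    counts = []
    for line in trace_lines:
        match = STAT_RE.search(line)
        if match:
            counts.append((match.group("op").strip(), int(match.group("cnt"))))
    return counts


# Sample lines taken from the first trace, unwrapped (hypothetical variable name).
sample = [
    "STAT #0 id=1 cnt=60 pid=0 pos=0 obj=0 op='VIEW LINE_ITEM '",
    "STAT #0 id=2 cnt=60 pid=1 pos=1 obj=0 op='UNION-ALL PARTITION '",
    "STAT #0 id=3 cnt=10000 pid=2 pos=1 obj=2214 op='TABLE ACCESS FULL LINE_ITEM_1994 '",
]

for operation, cnt in row_counts(sample):
    print(f"{cnt:>6}  {operation}")
```

A TABLE ACCESS FULL line with cnt=0 for a partition is the sign that the partition was eliminated; a cnt equal to the full row count of the table means it was scanned.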
In the init.ora file set the parameter partition_view_enabled = TRUE (you may also set
v733_plans_enabled = TRUE; this may allow some of the plans to function as expected).
Run the same select statement again. Examine the trace file and the autotracing. Have partitions been
eliminated? How many rows were retrieved from each partition? How many rows were scanned in each
partition?
=====================
hv=1060012657 ad='802ac964'
select * from line_item where send_date ='01-JUN-94' or send_date = '01-
JUL-95'
END OF STMT
=====================
hv=1898737984 ad='802ad704'
SELECT PT.VALUE FROM SYS.V_$SESSTAT PT WHERE PT.SID=:1 AND PT.STATISTIC#
IN (7,38,39,40,97,155,156,157,161,162) ORDER BY PT.STATISTIC#
END OF STMT
STAT #0 id=1 cnt=60 pid=0 pos=0 obj=0 op='VIEW LINE_ITEM '
STAT #0 id=2 cnt=60 pid=1 pos=1 obj=0 op='UNION-ALL PARTITION '
LINE_ITEM_1994 '
STAT #0 id=5 cnt=30 pid=2 pos=2 obj=0 op='FILTER ' (The FILTER operation implies that elimination is possible.)
LINE_ITEM_1995 '
LINE_ITEM_1996 ' (No rows scanned (cnt=0), therefore the partition was eliminated.)
LINE_ITEM_1997 '
STAT #0 id=1 cnt=10 pid=0 pos=0 obj=0 op='SORT ORDER BY '
STAT #0 id=3 cnt=12000 pid=2 pos=1 obj=0 op='FIXED TABLE FULL
X$KSUSESTA_*wd__< '
STAT #0 id=4 cnt=0 pid=2 pos=2 obj=0 op='SORT AGGREGATE '
STAT #0 id=5 cnt=200 pid=4 pos=1 obj=0 op='FIXED TABLE FULL X$KSUSD'
=====================
Lab Exercise 2 Solutions
Run the script dsi306_3_2a.sql. This will create the schema outlined below -
Fact table: sales_table
Lookup (dimension) tables: prod_id_desc, promo_typ_desc, store_id_desc, sales_rep_id_desc
Run the script dsi306_3_2b.sql. This will populate the tables, inserting approximately 100,000 rows into
the sales_table table and appropriate information into the lookup tables.
SQL> select count(*) from sales_table;
COUNT(*)
----------
105000
Enable the autotracing facility, and leave the event 10046 set (the trace file generated may be very large).
Two select statements are listed in the file dsi306_3_2c.sql. The statements are identical except for the
inclusion of the star hint /*+ STAR */ in the second. Examine the different outputs produced, and estimate the time
taken for each join.
Set AUTOTRACE on (in SQL*Plus), or set sql_trace = true or timed_statistics = true, in order to generate
some diagnostic output.
The two SQL statements that are run are similar to the following -
select order_id, c.sales_rep_id, f.ord_typ, f.promo_typ
from order_typ_desc a,promo_typ_desc e,prod_id_desc b,
sales_rep_id_desc c,store_id_desc d, sales_table f
where
a.ord_typ = f.ord_typ and b.prod_id = f.prod_id
and c.sales_rep_id = f.sales_rep_id
and d.store_id = f.store_id and e.promo_typ = f.promo_typ
and a.ord_typ = 1 and e.promo_typ = 0 and c.sales_rep_id = 2 ;
select /*+ STAR */ order_id,c.sales_rep_id,f.ord_typ,f.promo_typ
from order_typ_desc a,promo_typ_desc e,prod_id_desc b,
sales_rep_id_desc c,store_id_desc d, sales_table f
where
a.ord_typ = f.ord_typ and b.prod_id = f.prod_id
and c.sales_rep_id = f.sales_rep_id
and d.store_id = f.store_id and e.promo_typ = f.promo_typ
and a.ord_typ = 1 and e.promo_typ = 0 and c.sales_rep_id = 2 ;
The only difference in these statements is the addition of the STAR hint in the second select operation.
Setting AUTOTRACE on produces the following output -
SQL> set autotrace on
SQL> @query2
ORDER_ID SALES_REP_ID ORD_TYP PROMO_TYP
---------- ------------ ---------- ----------
3095 2 1 0
4525 2 1 0
3810 2 1 0
1665 2 1 0
4590 2 1 0
3875 2 1 0
1015 2 1 0
3680 2 1 0
170 2 1 0
4460 2 1 0
1275 2 1 0
.
4330 2 1 0
3615 2 1 0
495 2 1 0
1925 2 1 0
1210 2 1 0
4785 2 1 0
105 2 1 0
2250 2 1 0
2445 2 1 0
1617 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 MERGE JOIN
2 1 SORT (JOIN)
3 2 MERGE JOIN
4 3 SORT (JOIN)
5 4 MERGE JOIN
6 5 SORT (JOIN)
7 6 MERGE JOIN
8 7 SORT (JOIN)
9 8 MERGE JOIN
10 9 SORT (JOIN)
11 10 TABLE ACCESS (FULL) OF 'SALES_TABLE'
12 9 SORT (JOIN)
13 12 TABLE ACCESS (FULL) OF 'STORE_ID_DESC'
14 7 SORT (JOIN)
15 14 TABLE ACCESS (FULL) OF 'SALES_REP_ID_DESC'
16 5 SORT (JOIN)
17 16 TABLE ACCESS (FULL) OF 'PROD_ID_DESC'
18 3 SORT (JOIN)
19 18 TABLE ACCESS (FULL) OF 'PROMO_TYP_DESC'
20 1 SORT (JOIN)
21 20 TABLE ACCESS (FULL) OF 'ORDER_TYP_DESC'
Statistics
----------------------------------------------------------
27 recursive calls
2006 db block gets
2401 consistent gets
4324 physical reads
0 redo size
57889 bytes sent via SQL*Net to client
12874 bytes received via SQL*Net from client
111 SQL*Net roundtrips to/from client
9 sorts (memory)
2 sorts (disk)
1617 rows processed
Statement 2 (with STAR hint) -
---------- ------------ ---------- ----------
40 2 1 0
105 2 1 0
170 2 1 0
235 2 1 0
300 2 1 0
365 2 1 0
430 2 1 0
495 2 1 0
560 2 1 0
625 2 1 0
690 2 1 0

---------- ------------ ---------- ----------
104460 2 1 0
104525 2 1 0
104590 2 1 0
104655 2 1 0
104720 2 1 0
104785 2 1 0
104850 2 1 0
104915 2 1 0
104980 2 1 0
1617 rows selected.
Execution Plan
----------------------------------------------------------
6 5 TABLE ACCESS (FULL) OF 'ORDER_TYP_DESC' (Cost=1 Card=1 Bytes=26)
7 5 TABLE ACCESS (FULL) OF 'SALES_TABLE' (Cost=1 Card=1 Bytes=100)
8 4 TABLE ACCESS (FULL) OF 'PROMO_TYP_DESC' (Cost=1 Card=1 Bytes=26)
9 3 TABLE ACCESS (FULL) OF 'PROD_ID_DESC' (Cost=1 Card=21 Bytes=273)
10 2 TABLE ACCESS (FULL) OF 'SALES_REP_ID_DESC' (Cost=1 Card=1 Bytes=39)
11 1 TABLE ACCESS (FULL) OF 'STORE_ID_DESC' (Cost=1 Card=21 Bytes=273)
Statistics
----------------------------------------------------------
0 recursive calls
19411 db block gets
2370 physical reads
0 redo size
1 sorts (memory)
0 sorts (disk)
1617 rows processed
No hint of a CARTESIAN - why? There are other requirements for a STAR join to be satisfied, one of
these being the presence of a concatenated index (or 3 bitmap indexes) on the FACT table. If BITMAP
indexes are created on the columns promo_typ, sales_rep_id, store_id and prod_id and the queries are run
again, the following results may be obtained -
SQL> create bitmap index bit_promo_typ on sales_table(promo_typ);
Index created.
SQL> create bitmap index bit_sales_rep_id on sales_table(sales_rep_id);
Index created.
SQL> create bitmap index bit_store_id on sales_table(store_id);
Index created.
SQL> create bitmap index bit_prod_id on sales_table(prod_id);
Index created.
SQL> set autotrace on
SQL> @query2
---------- ------------ ---------- ----------
3095 2 1 0
4525 2 1 0
3810 2 1 0
1665 2 1 0
4590 2 1 0
3875 2 1 0
1730 2 1 0
2250 2 1 0
1535 2 1 0
4460 2 1 0
2185 2 1 0
---------- ------------ ---------- ----------
4070 2 1 0
885 2 1 0
1600 2 1 0
2315 2 1 0
3030 2 1 0
3745 2 1 0
3485 2 1 0
3160 2 1 0
2445 2 1 0
1617 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 MERGE JOIN
2 1 SORT (JOIN)
3 2 MERGE JOIN
4 3 SORT (JOIN)
5 4 MERGE JOIN
6 5 SORT (JOIN)
7 6 MERGE JOIN
8 7 SORT (JOIN)
9 8 MERGE JOIN
10 9 SORT (JOIN)
11 10 TABLE ACCESS (FULL) OF 'SALES_TABLE'
12 9 SORT (JOIN)
13 12 TABLE ACCESS (FULL) OF 'STORE_ID_DESC'
14 7 SORT (JOIN)
15 14 TABLE ACCESS (FULL) OF 'SALES_REP_ID_DESC'
16 5 SORT (JOIN)
17 16 TABLE ACCESS (FULL) OF 'PROD_ID_DESC'
18 3 SORT (JOIN)
19 18 TABLE ACCESS (FULL) OF 'PROMO_TYP_DESC'
20 1 SORT (JOIN)
21 20 TABLE ACCESS (FULL) OF 'ORDER_TYP_DESC'
Statistics
----------------------------------------------------------
27 recursive calls
2003 db block gets
4366 physical reads
0 redo size
9 sorts (memory)
2 sorts (disk)
1617 rows processed
Statement 2 - with the STAR hint -
---------- ------------ ---------- ----------
1860 2 1 0
430 2 1 0
2575 2 1 0
3290 2 1 0
1145 2 1 0
4005 2 1 0
4720 2 1 0
1795 2 1 0
365 2 1 0
2510 2 1 0
3225 2 1 0
.
---------- ------------ ---------- ----------
104135 2 1 0
104850 2 1 0
100495 2 1 0
101210 2 1 0
101925 2 1 0
102640 2 1 0
103355 2 1 0
104070 2 1 0
104785 2 1 0
1617 rows selected.
Execution Plan
----------------------------------------------------------
8 7 TABLE ACCESS (FULL) OF 'PROMO_TYP_DESC' (Cost=1 Card=1 Bytes=26)
10 9 TABLE ACCESS (FULL) OF 'SALES_REP_ID_DESC' (Cost=1 Card=1 Bytes=39)
12 11 TABLE ACCESS (FULL) OF 'PROD_ID_DESC' (Cost=1 Card=21 Bytes=273)
14 13 TABLE ACCESS (FULL) OF 'STORE_ID_DESC' (Cost=1 Card=21 Bytes=273)
15 1 TABLE ACCESS (BY INDEX ROWID) OF 'SALES_TABLE' (Cost=1 Card=1 Bytes=100)
17 16 BITMAP AND
18 17 BITMAP INDEX (SINGLE VALUE) OF 'BIT_PROD_ID'
19 17 BITMAP INDEX (SINGLE VALUE) OF 'BIT_PROMO_TYP'
Statistics
----------------------------------------------------------
0 recursive calls
15 db block gets
27585 physical reads
0 redo size
5 sorts (memory)
0 sorts (disk)
1617 rows processed
As the execution plan above shows, cartesian products are performed when the STAR hint is used. The
exacting conditions of the STAR transformation may be hard to meet, and the optimizer may be making a
correct choice (for performance reasons) in choosing the NLJ or Sort-Merge joins.
Lesson 3 Bitmap Indexes Solutions
This exercise is designed to demonstrate the way in which bitmap indexes are created, stored and
accessed.
Run the sql script dsi306_3_3a.sql - this will create a table with the following attributes:
colour_table ( recid number, colour varchar2(20), rec2id number ) and should contain 3002
rows. The first and last of these rows will have the value BLUE set for the column COLOUR.
SQL> @dsi306_3_3a.sql
Table created.
31 /
PL/SQL procedure successfully completed.
SQL> desc colour_table
Name Null? Type
------------------------------- -------- ----
RECID NUMBER
COLOUR VARCHAR2(20)
REC2ID NUMBER
SQL> select count(*) from colour_table;
COUNT(*)
----------
3002
SQL> select rowid,recid,colour,rec2id from colour_table
2 where colour = 'BLUE';
ROWID RECID COLOUR REC2ID
------------------ ---------- --------- ----------- ----------
AAAA7lAAHAAABYoAAA 1 BLUE 3
AAAA7lAAHAAABSfAAG 3000 BLUE 6000
Set the events 10608 and 10717 in the init.ora file. This will trace bitmap index creation and trace any
compaction that may occur during creation, respectively.
Create a Bitmap Index on the column COLOUR in the COLOUR_TABLE. Examine any trace file
generated.
From the init.ora file:
#event = "10710 trace name context forever,level 10:10715 trace name context forever,level 10"
event = "10608 trace name context forever,level 10:10717 trace name context forever,level 10"
SQL> create bitmap index bit_colour on colour_table(colour);

Index created.
There are 9 distinct values for colour in the colour_table:
SQL> select distinct colour,dump(colour) from colour_table;
COLOUR DUMP(COLOUR)
--------------------------------------------------------------------------------
AZURE GREY Typ=1 Len=10: 65,90,85,82,69,32,71,82,69,89
BLUE Typ=1 Len=4: 66,76,85,69
MAGENTA Typ=1 Len=7: 77,65,71,69,78,84,65
ORANGE Typ=1 Len=6: 79,82,65,78,71,69
STRAWBERRY Typ=1 Len=10: 83,84,82,65,87,66,69,82,82,89
TURQUOISE Typ=1 Len=9: 84,85,82,81,85,79,73,83,69
VERMILLION Typ=1 Len=10: 86,69,82,77,73,76,76,73,79,78
VIOLET Typ=1 Len=6: 86,73,79,76,69,84
YELLOW Typ=1 Len=6: 89,69,76,76,79,87
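The DUMP output lists each key byte in decimal, whereas the bitmap trace prints the same key bytes in hexadecimal. The following sketch converts one notation to the other so that a key seen in the trace can be matched to a colour. The function name decode_dump is hypothetical, and the sketch assumes single-byte ASCII character data (Typ=1) as shown above.

```python
def decode_dump(dump_text):
    """Turn DUMP() output such as 'Typ=1 Len=4: 66,76,85,69' into
    (text, hex key), where the hex key matches the trace file notation."""
    header, _, byte_list = dump_text.partition(":")
    length = int(header.split("Len=")[1])
    raw = bytes(int(value) for value in byte_list.split(","))
    if len(raw) != length:
        raise ValueError("Len= does not match the number of bytes listed")
    return raw.decode("ascii"), raw.hex(" ")


print(decode_dump("Typ=1 Len=4: 66,76,85,69"))
print(decode_dump("Typ=1 Len=7: 77,65,71,69,78,84,65"))
```

The hexadecimal form of BLUE (42 4c 55 45) and of MAGENTA (4d 41 47 45 4e 54 41) is what appears after "key:" in the kkrbirop and kkrbicon trace lines.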
Examining the trace file will show the routines scanning the table looking for key values, and then
constructing the bitmap segments from those keys. Some edited highlights follow:
*** SESSION ID:(6.3) 1998.12.17.20.34.38.000
kkrbiarw: bitmap size is 870
kkrbirop: rid=01c01491.0000, new=Y , key: (7): 4d 41 47 45 4e 54 41 (Magenta)
kdibcoinit(1073304): srid=01c01491.0000 (Rowid range, from line above - only one row with this value.)
kkrbirop: rid=01c01491.0001, new=Y , key: (10): 41 5a 55 52 45 20 47 52 45 59 (Azure Grey)
kdibcoinit(1073280): srid=01c01491.0001
kkrbirop: rid=01c01491.0002, new=Y , key: (9): 54 55 52 51 55 4f 49 53 45
kdibcoinit(10731fc): srid=01c01491.0002
kkrbirop: rid=01c01491.0003, new=Y , key: (10): 56 45 52 4d 49 4c 4c 49 4f 4e
kkrbirop: rid=01c01491.0004, new=Y , key: (6): 59 45 4c 4c 4f 57
kdibcoinit(10730f8): srid=01c01491.0004
kkrbirop: rid=01c01491.0005, new=Y , key: (6): 4f 52 41 4e 47 45
kkrbirop: rid=01c01491.0006, new=Y , key: (10): 53 54 52 41 57 42 45 52 52 59
kdibcoinit(1072ff4): srid=01c01491.0006
kkrbirop: rid=01c01491.0007, new=Y , key: (6): 56 49 4f 4c 45 54
kdibcoinit(1072f74): srid=01c01491.0007
kkrbirop: rid=01c01491.0008, new=N, key: (7): 4d 41 47 45 4e 54 41
kkrbirop: rid=01c01491.0009, new=N, key: (10): 41 5a 55 52 45 20 47 52 45 59
kkrbirop: rid=01c01491.000a, new=N, key: (9): 54 55 52 51 55 4f 49 53 45
Information removed
kkrbicon: key: (7): 4d 41 47 45 4e 54 41 (Magenta)
srid=01c01491.0 erid=02001340.47 bitmap: (486): (Start and end rowids and the bitmap segment size)
cf 01 01 01 01 01 01 01 01 c9 01 01 ff 07 01 01 01 01 01 01 01 01 c9 01 01
ff 07 01 01 01 01 01 01 01 01 c9 01 01 ff 07 01 01 01 01 01 01 01 01 c9 01
Information removed
kdibcoend(1064b80): erid=01c01628.0007 status=0
kkrbicon: key: (4): 42 4c 55 45 (Blue)
srid=01c0149f.0 erid=01c01628.7 bitmap: (4): 06 c0 b6 46 (Bitmap segment)
Shut down and then set the events 10710 and 10715 in the init.ora (in place of 10608 and 10717). This
will trace bitmap access. Select the colour BLUE from the table COLOUR_TABLE. Examine the trace
and determine what functions are called and the actions they are performing.
In init.ora
event = "10710 trace name context forever,level 10:10715 trace name context forever,level 10"
#event = "10608 trace name context forever,level 10:10717 trace name context forever,level 10"
SVRMGR> show parameter event
NAME TYPE VALUE
----------------------------------- ------- ------------------------------
event string 10710 trace name context forev
SQL> select * from colour_table where colour = 'BLUE';
RECID COLOUR REC2ID
---------- -------------------- ----------
1 BLUE 3
3000 BLUE 6000
Examining the trace file shows:
*** SESSION ID:(8.3) 1998.10.20.15.43.16.000
kkrbtsta(107e670): started
kkrbxsta
(109d608)kkrbxgky(109d608): startkey count=1
kkrbxgky(109d608): startkey=(4): 42 4c 55 45 (Key value is BLUE)
kkrbxgky(109d608): stopkey count=1
kkrbxgky(109d608): stopkey=(4): 42 4c 55 45
kkrbxfch(109d608): record: srid=01c0149f.0000, erid=01c01628.0007, data(4)=[06...] (The fetch process treats the data as an input stream)
kdibr1r2r(107e688): bml 4 srid=01c0149f.0000, erid=01c01628.0007
kdibci3init(107e69c): src_stream=efffd308
01c0149f.0000
kkrbtfch(107e670): rowid=01c0149f.0006 tobj 3508 (Rowid found)
kkrbtfch(107e670): rowid=01c01628.0000 tobj 3508 (Rowid found)
kkrbtfch(107e670): total rowcount=2 (Rows returned)
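The rowids reported by kkrbtfch can be cross-checked against the ROWID column selected from COLOUR_TABLE earlier. The following is a sketch only: it assumes the 18-character extended rowid layout (6 characters of data object number, 3 of relative file number, 6 of block number, 3 of row number, in a base-64 alphabet of A-Z, a-z, 0-9, + and /) and a relative data block address made of a 10-bit file number and a 22-bit block number. The function names are hypothetical.

```python
import string

# Base-64 alphabet assumed for extended rowids: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def _base64_value(text):
    """Convert a run of rowid characters to an integer."""
    value = 0
    for char in text:
        value = value * 64 + ALPHABET.index(char)
    return value


def decode_extended_rowid(rowid):
    """Split an 18-character extended rowid into its four components."""
    if len(rowid) != 18:
        raise ValueError("expected an 18-character extended rowid")
    return {
        "data_object": _base64_value(rowid[0:6]),
        "file": _base64_value(rowid[6:9]),
        "block": _base64_value(rowid[9:15]),
        "row": _base64_value(rowid[15:18]),
    }


def trace_rid(rowid):
    """Format an extended rowid as rdba.row, the notation used in the trace."""
    parts = decode_extended_rowid(rowid)
    rdba = (parts["file"] << 22) | parts["block"]   # assumed 10/22-bit split
    return f"{rdba:08x}.{parts['row']:04x}"


# The two BLUE rows returned by the earlier query on COLOUR_TABLE.
for rowid in ("AAAA7lAAHAAABSfAAG", "AAAA7lAAHAAABYoAAA"):
    print(rowid, "->", trace_rid(rowid))
```

Under these assumptions the two rowids map to 01c0149f.0006 and 01c01628.0000, the two rowids that the trace marks as found.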
Examining a block dump of the bitmap segment yields:
row#1[1347] flag: ----, lock: 0

col 0; len 4; (4): 42 4c 55 45 (The key value BLUE)
col 1; len 6; (6): 01 c0 14 9f 00 00 (Start rowid)
col 2; len 6; (6): 01 c0 16 28 00 07 (End rowid)
col 3; len 4; (4): 06 c0 b6 46 (The bitmap segment itself; remember the GAP/MAP structure.)
DSI306 - Lesson 4 - Partitioned Tables and Indexes
Case Studies and Lab. Exercises
Contents:
Introduction to Labs and Case Studies
Lab 1: Dictionary Objects
Lab 2: Partition Maintenance
Lab 3: Optimizer
Case Study: Point in time recovery of a partition
The following files should be present in the LABS directory:
dsi306_4_1a.sql - creates 4 tables. Two partitioned: sales, customer.
Two non-partitioned: region, product.
dsi306_4_1b.sql - creates a variety of indices on the above tables
dsi306_4_1c.sql - inserts sample data into these tables.
dsi306_4_1d.sql - inserts 100,000 rows into SALES table.
Introduction to Labs and Case Studies:
A customer would like to maintain sales history for the last 52 weeks. The sales offices are grouped into regions.
Products are grouped into product categories. Customers are grouped into states based on postal codes.
The customer table is very large, and so is the sales history table. For performance as well as manageability, it
was decided to partition the customer and sales tables and the indexes on those tables.
Table Descriptions
PRODUCT
  prodid number(4)
  prodcat number(2)
  prodname varchar2(60)
CUSTOMER
  custid number(4)
  custname varchar2(60)
  state number(2)
REGION
  regid number(2)
  regname varchar2(60)
SALES
  regid number(2)
  custid number(4)
  prodid number(4)
  weekno number(2)
  quantity number(4)
  value number(6)
Partitioning Information
Customer table partitioned on custid
Sales table partitioned on weekno
Indexes
prodidx on product(prodid)
Local (prefixed) unique index custidx on customer(custid)
Local (non-prefixed) unique index sales_idx1 on sales(regid, custid,prodid, weekno)
Local (non-prefixed) non unique index sales_idx2 on sales(prodid)
Global (prefixed) non unique index sales_idx3 on sales(custid)
Note : The indexes are just to demonstrate various features of different types of indexes and not necessarily
sensible from an application point of view.
Lab 1 : Dictionary Objects
Objective
The objective of this lab is to make the students familiar with the various catalog views available to extract
all the information about partitions.
Exercises
a) Find the partitioned tables, indexes owned by the user and the partition columns for those objects.
Note the object names, number of partitions, type of indexes
b) Find the partition mapping information for a partitioned table SALES
c) Find the partition mapping information for a partitioned index SALES_IDX1
d) Find the subobject name, object id, data object id, type for table SALES and index SALES_IDX1
e) Find any segments associated with table SALES and index SALES_IDX1
f) Find the object_name, object id, data object id, type for table partition SALES_M1.
g) Find the object_name, object id, data object id, type for index partition SALES_IDX1_M1
h) Dump the segment header and a data block from partition SALES_M1
i) Dump the segment header and a leaf block from partition SALES_IDX1_M1
j) Dump the segment header and a leaf block from partition SALES_IDX3_250
Lab 1 : Dictionary Objects
Notes
a) select * from user_part_tables;
select * from user_part_indexes;
select * from user_part_key_columns;
b) select partition_name, high_value, partition_position from user_tab_partitions where table_name = 'SALES'
order by partition_position;
c) select partition_name, high_value, partition_position from user_ind_partitions where index_name =
'SALES_IDX1' order by partition_position;
d) select substr(subobject_name,1,15), object_id, data_object_id from user_objects where object_type =
'TABLE' and object_name = 'SALES';
Repeat with object_type = 'INDEX' and object_name = 'SALES_IDX1'.
The subobject name and data object id will be NULL.
e) Since the data object id is NULL, there will not be any segments.
select ts#, file#, block# from tab$ [ind$] where obj# = <object_id from step d>;
will return 0 for all three columns.
f) select object_name, object_id, data_object_id from user_objects where subobject_name = 'SALES_M1';
g) select object_name, object_id, data_object_id from user_objects where subobject_name =
'SALES_IDX1_M1';
h) select * from dba_extents where partition_name = 'SALES_M1';
e.g. file_id = 5 block_id = 7 (segment header)
alter system dump datafile 5 block 7;
i) select * from dba_extents where partition_name = 'SALES_IDX1_M1';
e.g. file_id = 6 and block_id = 1102 (segment header)
file_id = 6 and block_id = 1166 (last block of first extent, hopefully a leaf block)
j) select * from dba_extents where partition_name = 'SALES_IDX3_250';
e.g. file_id = 31 block_id = 2 (segment header)
file_id = 31 block_id = 907 (first block of extent 3, hopefully a leaf block)
Table Partition Segment Header Block
Start dump data blocks tsn: 4 file#: 5 minblk 7 maxblk 7
buffer tsn: 4 rdba: 0x01400007 (5/7)
scn:0x0000.00008881 seq:0x01 flg:0x00 tail:0x88811001
frmt:0x02 chkval:0x0000 type:0x10=DATA SEGMENT HEADER - UNLIMITED
Extent Control Header
-----------------------------------------------------------------
Extent Header:: spare1: 0 tsn: 4 #extents: 3 #blocks: 904
last map rdba: 0x00000000 #maps: 0 offset: 2080
Highwater:: rdba: 0x01400234 ext#: 2 blk#: 37 ext size: 385
#blocks in seg. hdr's freelists: 0
#blocks below: 556
mapblk rdba: 0x00000000 offset: 2
Unlocked
Map Header:: next rdba: 0x00000000 #extents: 3 obj#: 1957 flag: 0x40000000 (obj# is the data object id of the table partition)
Extent Map
-----------------------------------------------------------------
rdba: 0x01400008 length: 259
rdba: 0x0140010b length: 260
rdba: 0x0140020f length: 385
nfl = 1, nfb = 1 typ = 1 nxf = 0
SEG LST:: flg: UNUSED lhd: 0x00000000 ltl: 0x00000000
End dump data blocks tsn: 4 file#: 5 minblk 7 maxblk 7
==============================================================================
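Each rdba in the dump is followed by its (file/block) decoding, for example 0x01400007 (5/7). The following sketch reproduces that decoding and lays out the extent map of the segment header above as block ranges. It assumes the relative data block address is a 32-bit value with the file number in the top 10 bits and the block number in the low 22 bits; the function names are hypothetical.

```python
FILE_SHIFT = 22                      # assumed: low 22 bits hold the block number
BLOCK_MASK = (1 << FILE_SHIFT) - 1


def decode_rdba(rdba):
    """Split a relative data block address into (file#, block#)."""
    return rdba >> FILE_SHIFT, rdba & BLOCK_MASK


def extent_ranges(extent_map):
    """Turn (rdba, length) extent map entries into (file#, first, last) ranges."""
    ranges = []
    for rdba, length in extent_map:
        file_no, first_block = decode_rdba(rdba)
        ranges.append((file_no, first_block, first_block + length - 1))
    return ranges


# Extent map copied from the table partition segment header dump above.
extent_map = [(0x01400008, 259), (0x0140010B, 260), (0x0140020F, 385)]

print("segment header:", decode_rdba(0x01400007))
print("highwater mark:", decode_rdba(0x01400234))
for file_no, first, last in extent_ranges(extent_map):
    print(f"file {file_no}: blocks {first} to {last}")
print("total blocks:", sum(length for _, length in extent_map))
```

The three extent lengths add up to 904, which agrees with the #blocks value in the extent header, and the highwater mark block (564) is 37 blocks into the third extent, which agrees with ext#: 2 blk#: 37.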
Table Partition Data Block
Wed Apr 1 15:34:38 1998
buffer tsn: 4 rdba: 0x01400008 (5/8)
scn:0x0000.0000884b seq:0x01 flg:0x00 tail:0x884b0601
frmt:0x02 chkval:0x0000 type:0x06=trans data
Block header dump: rdba: 0x01400008
Object id on Block? Y
seg/obj: 0x7a5 csc: 0x00.883d itc: 1 flg: - typ: 1 - DATA (seg/obj is the data object id of the table partition, 1957 in decimal)
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0004.017.00000022 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
data_block_dump
===============
tsiz: 0xfb8
hsiz: 0x134
pbl: 0x200b1984
bdba: 0x01400008
flag=---------
ntab=1
nrow=145
frre=-1
fsbo=0x134
fseo=0x2cb
avsp=0x197
tosp=0x197
0xe:pti[0] nrow=145 offs=0
0x12:pri[0] offs=0xc25
0x14:pri[1] offs=0xc3c
0x16:pri[2] offs=0xc53
0x18:pri[3] offs=0xc6a
....
....
tl: 23 fb: --H-FL-- lb: 0x0 cc: 6
col 0: [ 2] c1 02
col 1: [ 2] c1 02
col 2: [ 2] c1 26
col 3: [ 2] c1 02
col 4: [ 3] c2 11 5b
col 5: [ 3] c2 11 46
end_of_block_dump
================================================================================
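The column values in the row above are stored in Oracle's internal NUMBER format. The following is a rough sketch of decoding them, limited to positive integers: it assumes the first byte minus 0xC0 gives the number of base-100 digits before the decimal point, and that each following byte is a base-100 digit plus one. The function name decode_number is hypothetical, and negative or fractional values are deliberately rejected.

```python
def decode_number(hex_bytes):
    """Decode a positive integer from block-dump bytes such as 'c2 11 5b'."""
    raw = bytes.fromhex(hex_bytes)
    if raw == b"\x80":
        return 0                                  # zero is a single 0x80 byte
    exponent = raw[0] - 0xC0                      # base-100 digits before the point
    digits = [byte - 1 for byte in raw[1:]]       # each digit is stored plus one
    if exponent < 1 or not digits or len(digits) > exponent:
        raise ValueError("only positive integers are handled in this sketch")
    value = 0
    for position, digit in enumerate(digits):
        value += digit * 100 ** (exponent - 1 - position)
    return value


# Column order taken from the SALES table description.
columns = ["regid", "custid", "prodid", "weekno", "quantity", "value"]
row = ["c1 02", "c1 02", "c1 26", "c1 02", "c2 11 5b", "c2 11 46"]

for name, stored in zip(columns, row):
    print(f"{name:<8} {stored:<9} {decode_number(stored)}")

# The seg/obj value in the block header is plain hexadecimal.
print(int("7a5", 16))
```

Under these assumptions the row decodes to regid 1, custid 1, prodid 37, weekno 1, quantity 1690 and value 1669, and seg/obj 0x7a5 is 1957, the obj# shown in the segment header.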
Local Index Partition Segment Header Dump
Wed Apr 1 15:40:18 1998
buffer tsn: 5 rdba: 0x0180044e (6/1102)
scn:0x0000.00008c1b seq:0x01 flg:0x00 tail:0x8c1b1001
-----------------------------------------------------------------
Highwater:: rdba: 0x018001bc ext#: 2 blk#: 182 ext size: 385
#blocks below: 506
Unlocked
Data object id of the local index Partition
Extent Map
-----------------------------------------------------------------
rdba: 0x0180044f length: 64
rdba: 0x01800002 length: 260
rdba: 0x01800106 length: 385
nfl = 1, nfb = 1 typ = 2 nxf = 0
Local Index Partition Leaf Block Dump
Wed Apr 1 15:40:47 1998
buffer tsn: 5 rdba: 0x0180048e (6/1166)
scn:0x0000.00008c18 seq:0x01 flg:0x00 tail:0x8c180601
Block header dump: rdba: 0x0180048e
seg/obj: 0x7db csc: 0x00.8c14 itc: 2 flg: - typ: 2 - INDEX (seg/obj is the data object id of the local index partition, 2011 in decimal)
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0x0004.02b.00000027 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
Leaf block dump
===============
header address 537598364=0x200b199c
kdxcolev 0
kdxcolok 0
kdxconco 4
kdxcosdc 0
kdxconro 158
kdxcofbo 352=0x160
kdxcofeo 776=0x308
kdxcoavs 424
kdxlespl 0
kdxlende 0
kdxlenxt 25165826=0x1800002
kdxleprv 25166989=0x180048d
kdxledsz 6
kdxlecol 0
kdxlebksz 3936
row#0[3916] flag: ----, lock: 0, data:(6): 01 40 00 4c 00 39 (Local index partitions store the normal rowid)
col 0; len 2; (2): c1 03
col 1; len 2; (2): c1 06
col 2; len 2; (2): c1 4b
col 3; len 2; (2): c1 02
row#1[3896] flag: ----, lock: 0, data:(6): 01 40 00 4c 00 3a
col 0; len 2; (2): c1 03
col 1; len 2; (2): c1 06
col 2; len 2; (2): c1 4b
col 3; len 2; (2): c1 03
....
....
===============================================================================
Global Index Partition Segment Header Dump
buffer tsn: 30 rdba: 0x07c00002 (31/2)
scn:0x0000.00008b40 seq:0x01 flg:0x00 tail:0x8b401001
-----------------------------------------------------------------
Highwater:: rdba: 0x07c014db ext#: 6 blk#: 1687 ext size: 1945
#blocks below: 5336
Unlocked
Data object id of the global index partition
Extent Map
-----------------------------------------------------------------
rdba: 0x07c00003 length: 259
rdba: 0x07c0020a length: 385
rdba: 0x07c0038b length: 580
rdba: 0x07c005cf length: 865
rdba: 0x07c00e44 length: 1945
nfl = 1, nfb = 1 typ = 2 nxf = 0
Global Index Partition Leaf Block Dump
buffer tsn: 30 rdba: 0x07c0038b (31/907)
scn:0x0000.00008b27 seq:0x02 flg:0x00 tail:0x8b270602
Block header dump: rdba: 0x07c0038b
seg/obj: 0x7d4 csc: 0x00.8b1e itc: 2 flg: - typ: 2 - INDEX (seg/obj is the data object id of the global index partition, 2004 in decimal)
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0x0003.029.00000024 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
Leaf block dump
===============
header address 537598364=0x200b199c
kdxcolev 0
kdxcolok 0
kdxconco 2
kdxcosdc 0
kdxconro 194
kdxcofbo 424=0x1a8
kdxcofeo 832=0x340
kdxcoavs 408
kdxlespl 0
kdxlende 0
kdxlenxt 130024332=0x7c0038c
kdxleprv 130024330=0x7c0038a
kdxledsz 0
kdxlecol 0
kdxlebksz 3936
row#0[3920] flag: ----, lock: 0
col 0; len 2; (2): c1 05
col 1; len 10; (10): 00 00 07 aa 03 c0 00 0c 00 35
(Extended rowids are stored in global indexes: 00 00 07 aa is the data object id of the segment that the
rowid 03 c0 00 0c 00 35 points to.)
row#1[3904] flag: ----, lock: 0
col 0; len 2; (2): c1 05
col 1; len 10; (10): 00 00 07 aa 03 c0 00 0c 00 36
row#2[3888] flag: ----, lock: 0
col 0; len 2; (2): c1 05
col 1; len 10; (10): 00 00 07 aa 03 c0 00 0c 00 37
row#3[3872] flag: ----, lock: 0
col 0; len 2; (2): c1 05
col 1; len 10; (10): 00 00 07 aa 03 c0 00 0c 00 38
row#4[3856] flag: ----, lock: 0
col 0; len 2; (2): c1 05
col 1; len 10; (10): 00 00 07 aa 03 c0 00 0c 00 39
row#5[3840] flag: ----, lock: 0
col 0; len 2; (2): c1 05
col 1; len 10; (10): 00 00 07 aa 03 c0 00 0c 00 3a
----- end of leaf block dump -----
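The difference between the 6-byte rowid stored in a local index partition and the 10-byte rowid stored in a global index partition can be made explicit by decoding both. This is a sketch under stated assumptions: the 6-byte form is taken to be a 4-byte relative data block address followed by a 2-byte row number, the 10-byte form is the same preceded by a 4-byte data object id (as the dump annotation states), and the address is split into a 10-bit file number and a 22-bit block number. The function name decode_index_rowid is hypothetical.

```python
def decode_index_rowid(hex_bytes):
    """Decode a rowid from an index leaf block dump.

    6 bytes  -> rowid as stored in a local index partition.
    10 bytes -> rowid prefixed with a 4-byte data object id, as stored
                in a global index partition.
    """
    raw = bytes.fromhex(hex_bytes)
    if len(raw) == 10:
        data_object = int.from_bytes(raw[:4], "big")
        raw = raw[4:]
    elif len(raw) == 6:
        data_object = None        # implied by the local index partition itself
    else:
        raise ValueError("expected a 6-byte or 10-byte rowid")
    rdba = int.from_bytes(raw[:4], "big")
    return {
        "data_object": data_object,
        "file": rdba >> 22,              # assumed 10-bit relative file number
        "block": rdba & 0x3FFFFF,        # assumed 22-bit block number
        "row": int.from_bytes(raw[4:], "big"),
    }


print(decode_index_rowid("01 40 00 4c 00 39"))              # local index entry
print(decode_index_rowid("00 00 07 aa 03 c0 00 0c 00 35"))  # global index entry
```

The local entry points into file 5, the file holding table partition SALES_M1 in the earlier dump. The global entry carries its own data object id (0x7aa, 1962 in decimal) because a global index partition can point to rows in any table partition.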
Lab 2 : Partition Maintenance
Objective
The labs will attempt to make the student familiar with various partition maintenance operations and related
issues.
a) Add a partition to the customer table to cater for new customers. Name the partition CUST_1250 to
store new customers up to a custid value of 1250. Note the name and location of the local index partition.
b) Is it possible to add a partition to a global index?
c) Try dropping the partitions CUST_250, CUST_1250 and SALES_M4.
Before dropping SALES_M4, please back it up by using CTAS to create a temporary table.
d) The user decides that dropping SALES_M4 was a mistake and would like to revert to the original partition mapping
scheme. Split partition SALES_M5 at weekno (27).
Note the names of the new table partitions and local index partitions. Examine the status of the local and
global index partitions. Also note the tablespaces in which the new local index partitions reside.
e) Move and rename the table partitions generated in step d.
Table partition SYS_P1 should be moved to TRAIN01_TAB_PART4 and renamed to SALES_M4
Table partition SYS_P2 should be renamed to SALES_M5.
f) Move the local index partitions resulting from the split in step d into their proper tablespaces and rename
them. Check the status of the partitions before and after the move.

First split partition of SALES_IDX1 to be renamed as SALES_IDX1_M4 and moved to train01_indl_part4
Second split partition of SALES_IDX1 to be renamed as SALES_IDX1_M5 and moved to
train01_indl_part5

First split partition of SALES_IDX2 to be renamed as SALES_IDX2_M4 and moved to train01_indl_part4
Second split partition of SALES_IDX2 to be renamed as SALES_IDX2_M5 and moved to
train01_indl_part5

g) Repopulate the partition sales_m4 with the data backed up in step c
Avoid index maintenance.
After the load, make all local index partitions on sales_m4 usable.
Lab 2 : Partition Maintenance
Notes
a) alter table customer add partition cust_1250 values less than (1250) tablespace train01_tab_part5;
The new local index partition created will have the same name as the table partition i.e. CUST_1250.
If there is another index partition with the same name, then a name of the form SYS_Pnnn will be generated
based on an internal sequence number.
The physical attributes for the table partition are derived from the base table and the physical attributes of the
local index partition are derived from the base index.
The local index partition CUST_1250 will be created in the same tablespace as the table partition i.e.
train01_tab_part5 unless a default tablespace is specified at the time of create local index.
b) No. Global indexes have an upper bound of MAXVALUE. You cannot add partitions to a table or index which has
an upper bound of MAXVALUE for the last partition. Use SPLIT PARTITION to achieve the same result.
c) Alter table customer drop partition CUST_250;
This should return ORA-02266: unique/primary keys in table referenced by enabled foreign keys

Alter table customer drop partition CUST_1250;
This will succeed, as this partition (created in step a) is empty, even though there are enabled constraints
referencing this table. It will also drop the corresponding local index partition.
Create table backup_sales_M4 as select * from sales partition (sales_m4);
Alter table sales drop partition SALES_M4;
This will succeed but, since the partition is not empty, will mark the global index SALES_IDX3 unusable.
So there are 2 options for dropping a non-empty partition:
a) Delete all the rows and then drop the partition, or
b) Disable the constraints referencing this table, drop the partition, re-enable the constraints and
rebuild the global indexes.
d) alter table sales split partition sales_m5 at (17);
Table partitions have names SYS_P1 and SYS_P2 and reside in tablespace train01_tab_part5.
Local index SALES_IDX1_M5 is split into SYS_P1 and SYS_P2 as well and reside in train01_tab_part5.
Similarly, local index SALES_IDX2_M5 is split into SYS_P1 and SYS_P2 and reside in train01_tab_part5.
Local index partition SYS_P1 of SALES_IDX1 and SYS_P1 of SALES_IDX2 are marked USABLE because
the corresponding table partition sys_P1 is empty. The other two partitions are marked UNUSABLE.
Remember the partition SALES_M4 was dropped in step c.
Global index partitions will be marked UNUSABLE because the table is not empty. In this case, they were
already marked unusable in step c. The alter table split partition syntax does allow the specification of
partition names and target locations for the table.
e) alter table sales move partition SYS_P1 tablespace TRAIN01_TAB_PART4;
alter table sales rename partition SYS_P1 to sales_m4;
alter table sales rename partition SYS_P2 to sales_m5;
Since partition SYS_P1 is empty, the move does not affect the status of local or global indexes.
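f) alter index sales_idx1 rebuild partition sys_p1 tablespace train01_indl_part4;
alter index sales_idx1 rename partition sys_p1 to sales_idx1_m4;
alter index sales_idx1 rebuild partition sys_p2 tablespace train01_indl_part5;
alter index sales_idx1 rename partition sys_p2 to sales_idx1_m5;
alter index sales_idx2 rebuild partition sys_p1 tablespace train01_indl_part4;
alter index sales_idx2 rename partition sys_p1 to sales_idx2_m4;
alter index sales_idx2 rebuild partition sys_p2 tablespace train01_indl_part5;
alter index sales_idx2 rename partition sys_p2 to sales_idx2_m5;
A rebuild of an index partition will make the partition usable.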

g) To avoid index maintenance during the load into partition sales_m4:
alter table sales modify partition sales_m4 unusable local indexes;
alter session set skip_unusable_indexes=true;
insert into sales partition(sales_m4) select * from backup_sales_m4;
The insert fails with ORA-01502: index train01.sales_idx1 or partition of such index is in unusable state. This is
because sales_idx1 is a unique index, and the skip_unusable_indexes option does not skip unique indexes.
Drop index sales_idx1;
insert into sales partition(sales_m4) select * from backup_sales_m4;
commit;
Now rebuild the unusable local index partitions of sales_idx2 and re-create the index sales_idx1:
alter table sales modify partition sales_m4 rebuild unusable local indexes;
create index sales_idx1
(extract the command from the script crelocidx.sql)
Lab 3 : Optimizer
Construct the following queries and examine the execution plans.
1. Find the sales of product category 20. Group by region name, customer name.
2. Find the sales of product category 20 in week 1. Group by region name, customer name.
3. Find the sales of product 1 to customer 1 in region 1 and in the week number 1.
4. Find the sales of product 1 to customer 1 in region 1 during the period weekno 1 and weekno 12.
5. Find the total sales to customer 1.
6. Find the sales of product 1 to customer 1 in region 1 in the current week
7. Find the sales of product 1 to customer 1 in region 1 from weekno 1 to the current week.
8. Specify parallelism for table customer and index SALES_IDX2 and reexecute query from 1.
Lab 3 : Optimizer
Notes
1. Find the sales of a product category, group by region, customer.
SQL> REM
SQL> REM Sales of a product category for the whole year
SQL> REM Group by region, customer
SQL> REM
SQL> delete from plan_table;
12 rows deleted.
SQL> explain plan for
2 select r.regname, c.custname, sum(s.quantity), sum(s.value) from sales s,
3 customer c, region r, product p
4 where s.regid = r.regid
5 and s.custid = c.custid
6 and s.prodid = p.prodid
7 and p.prodcat = 20
8 group by r.regname, c.custname
9 /
Explained.
SQL> select lpad(' ', 2*(LEVEL - 1)) || operation || ' ' || options || ' ' ||
2 object_name||' '||substr(partition_start,1,10)||' '||
3 substr(partition_stop,1,10)||decode(id,0, 'Cost = ' ||position) "Query Plan"
4 from plan_table
5 start with id = 0
6 connect by prior id = parent_id;
Query Plan
--------------------------------------------------------------------------------
SELECT STATEMENT Cost = 3328
SORT GROUP BY
HASH JOIN
PARTITION CONCATENATED NUMBER(1) NUMBER(4)
TABLE ACCESS FULL CUSTOMER NUMBER(1) NUMBER(4)
HASH JOIN
TABLE ACCESS FULL REGION
NESTED LOOPS
TABLE ACCESS FULL PRODUCT
TABLE ACCESS BY LOCAL INDEX ROWID SALES NUMBER(1) NUMBER(13)
Query Plan
--------------------------------------------------------------------------------
INDEX RANGE SCAN SALES_IDX2 NUMBER(1) NUMBER(13)
12 rows selected.
SQL> spool off
SALES_IDX2 is a non-prefixed local index on prodid.
Because the predicate does not restrict the partitioning column, all index partitions must be probed, as indicated
by NUMBER(1) and NUMBER(13). The results of the INDEX RANGE SCAN on each partition are concatenated, as
indicated by PARTITION CONCATENATED NUMBER(1) NUMBER(13). There is also a full scan of all the partitions of
CUSTOMER. The presence of NUMBER indicates that the decision about which partitions to scan was made at PARSE time.
2. Sales of a product category for a specific week, group by region name, customer name
SQL> REM
SQL> REM Sales of a product category for a specific week
SQL> REM Group by region, customer
SQL> REM
12 rows deleted.
2 select r.regname, c.custname, sum(s.quantity), sum(s.value) from sales s,
3 customer c, region r, product p
4 where s.regid = r.regid
5 and s.custid = c.custid
6 and s.prodid = p.prodid
7 and p.prodcat = 20
8 and s.weekno = 1
9 group by r.regname, c.custname
10 /
Explained.
Query Plan
--------------------------------------------------------------------------------
SORT GROUP BY
HASH JOIN
HASH JOIN
TABLE ACCESS FULL REGION
NESTED LOOPS
TABLE ACCESS FULL PRODUCT
TABLE ACCESS FULL SALES NUMBER(1) NUMBER(1)
TABLE ACCESS FULL CUSTOMER NUMBER(1) NUMBER(4)
10 rows selected.
SQL> spool off
SALES_IDX1 is a non-prefixed index on (regid, custid, prodid, weekno); the partitioning column is part of the
index, but in this case not all the columns of the index were specified in the predicate.
The optimizer therefore chose a TABLE ACCESS FULL on partition 1 alone, as indicated by TABLE ACCESS FULL SALES
NUMBER(1) NUMBER(1). Note that in a tkprof execution plan we will simply see TABLE ACCESS FULL
SALES, which here means a full scan of a single partition. If multiple partitions are scanned, we will see
PARTITION CONCATENATED above the TABLE ACCESS FULL, as illustrated by the access to the
CUSTOMER table.
3. Sales of a product for a given customer, region, week
SQL> REM
SQL> REM Sales of a product for a given customer, region, week
SQL> REM
5 rows deleted.
2 select sum(s.quantity), sum(s.value) from sales s
3 where s.regid = 1
4 and s.custid = 1
5 and s.prodid = 1
6 and s.weekno = 1
7 /
Explained.
Query Plan
--------------------------------------------------------------------------------
SORT AGGREGATE
INDEX UNIQUE SCAN SALES_IDX1 NUMBER(1) NUMBER(1)
Note the difference in plan steps between this and the next one.
4. Sales of a product for a given customer, region and range of weeks
SQL> REM
SQL> REM Sales of a product for a given customer, region and week range
SQL> REM
4 rows deleted.
3 where s.regid = 1
4 and s.custid = 1
5 and s.prodid = 1
6 and s.weekno between 1 and 12
7 /
Explained.
Query Plan
--------------------------------------------------------------------------------
SORT AGGREGATE
If partition elimination has taken place at parse time and only a single partition is accessed,
the execution plan will not contain a PARTITION step (see the execution plan for query 3).
In this case, a range of partitions needs to be accessed, as indicated by PARTITION CONCATENATED.
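The parse-time elimination described for queries 3 and 4 can be sketched as a lookup against the partition bounds. This is an illustrative Python model only. It assumes SALES is range-partitioned on weekno with the high values 5, 9, ..., 49 and MAXVALUE listed in the point-in-time recovery case study, and the names partition_for and prune are hypothetical.

```python
from bisect import bisect_right

# Upper bounds (VALUES LESS THAN) of SALES_M1..SALES_M12.
# SALES_M13 is bounded by MAXVALUE and so catches everything above 48.
HIGH_VALUES = [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49]


def partition_for(weekno):
    """Return the 1-based number of the partition that holds weekno."""
    return bisect_right(HIGH_VALUES, weekno) + 1


def prune(lo_week, hi_week):
    """Return (start, stop) partition numbers for weekno BETWEEN lo AND hi."""
    return partition_for(lo_week), partition_for(hi_week)


print(prune(1, 1))    # query 3: start and stop are the same partition
print(prune(1, 12))   # query 4: a range of partitions
```

With literal predicates both bounds are known when the statement is parsed, which corresponds to the NUMBER(n) values shown in the PARTITION_START and PARTITION_STOP columns.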
5. Sales to a customer
SQL> spool off
SQL> REM
SQL> REM Sales for a given customer for the whole year
SQL> REM
4 rows deleted.
3 where s.custid = 1
4 /
Explained.
Query Plan
--------------------------------------------------------------------------------
SORT AGGREGATE
TABLE ACCESS BY GLOBAL INDEX ROWID SALES ROW LOCATION ROW LOCATION
SQL> spool off
Note the presence of ROW LOCATION. This indicates that the decision about which partitions of the
table SALES to scan is made at execution time from the ROWIDs returned by the global index.
6. Sales of a product in a region for a customer in the current week
SQL> REM
SQL> REM Sales in a product in a region for a customer in the current week
SQL> REM
SQL> REM
14 rows deleted.
2 select s.quantity, s.value from sales s
3 where s.prodid = 1
4 and s.weekno = (select to_char(sysdate, 'IW') from dual)
5 and s.regid = 1
6 and s.custid = 1
7 /
Query Plan
--------------------------------------------------------------------------------
PARTITION SINGLE KEY KEY
TABLE ACCESS BY LOCAL INDEX ROWID SALES KEY KEY
INDEX UNIQUE SCAN SALES_IDX1 KEY KEY
TABLE ACCESS FULL DUAL
Note the partition step, PARTITION SINGLE:
a single partition (to be determined at run time) will be accessed.
7. Sale of a product in a region for a customer from week 1 to the current week
SQL> spool off
SQL> REM
SQL> REM Sales in a product in a region for a customer from week 1 to
SQL> REM till the previous week
SQL> REM
SQL> REM
14 rows deleted.
2 select s.quantity, s.value from sales s
3 where s.prodid = 1
4 and s.weekno > 0
5 and s.weekno < (select to_char(sysdate, 'IW') from dual)
6 and s.regid = 1
7 and s.custid = 1
8 /
Explained.
Query Plan
--------------------------------------------------------------------------------
PARTITION CONCATENATED NUMBER(1) KEY
TABLE ACCESS BY LOCAL INDEX ROWID SALES NUMBER(1) KEY
INDEX RANGE SCAN SALES_IDX1 NUMBER(1) KEY
Partition Start has the value NUMBER(1). This indicates that the lower boundary is determined
at parse time. Partition Stop has the value KEY. This indicates that the upper boundary
is determined at run time from the partition key value.
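The difference between a NUMBER bound and a KEY bound can be sketched as follows. This is a hedged Python model rather than Oracle behaviour: the partition high values are assumed to be those listed in the point-in-time recovery case study, the names partition_for and stop_partition are hypothetical, and Python's ISO week number stands in for to_char(sysdate, 'IW').

```python
from bisect import bisect_right
from datetime import date

# Assumed upper bounds of SALES_M1..SALES_M12 (SALES_M13 is MAXVALUE).
HIGH_VALUES = [5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49]


def partition_for(weekno):
    """Return the 1-based number of the partition that holds weekno."""
    return bisect_right(HIGH_VALUES, weekno) + 1


# Known at parse time: weekno > 0, so the scan starts at partition 1.
PARSE_TIME_START = partition_for(1)


def stop_partition(today):
    """Resolve the KEY bound at run time for weekno < ISO week of today."""
    current_week = today.isocalendar()[1]
    # Simplification: in week 1 the range is empty; partition 1 is reported.
    last_week_needed = max(current_week - 1, 1)
    return partition_for(last_week_needed)


print(PARSE_TIME_START, stop_partition(date(1998, 3, 13)))
```

The start partition is a constant of the statement, while the stop partition changes with the date on which the statement is executed, so it cannot appear as a fixed number in the plan.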
8. Enable parallelism on table customer and index sales_idx2
alter table customer parallel (degree 4 instances 1);
alter index sales_idx2 parallel (degree 13 instances 1);
REM
REM Sales of a product category group by region name, customer name
REM
select r.regname, c.custname, sum(s.quantity), sum(s.value) from sales s,
customer c, region r, product p
where s.regid = r.regid
and s.custid = c.custid
and s.prodid = p.prodid
and p.prodcat = 20
group by r.regname, c.custname
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
0 SORT (GROUP BY) [:Q39006]
SELECT /*+ CIV_GB */ A1.C0,A1.C1,SUM(A1.C2),SUM(A1.C3) FROM
:Q39005 A1 GROUP BY A1.C0,A1.C1
0 HASH JOIN [:Q39005]
SELECT /*+ PIV_GB */ A1.C0 C0,A1.C1 C1,SUM(A1.C2) C2,SUM(A1.C3)
C3 FROM (SELECT /*+ ORDERED NO_EXPAND USE_HASH(A3)
SWAP_JOIN_INPUTS(A3) */ A2.C1 C0,A3.C1 C1,A2.C4 C2,A2.C5 C3
FROM :Q39004 A2,:Q39003 A3 WHERE A2.C0=A3.C0) A1 GROUP BY
A1.C0,A1.C1
0 PARTITION (CONCATENATED) [:Q39005]
0 TABLE ACCESS (FULL) OF 'CUSTOMER' [:Q39003]
SELECT /*+ ROWID(A1) */ A1."CUSTID" C0,A1."CUSTNAME" C1 FROM
"CUSTOMER" PARTITION(:B1) A1 WHERE ROWID BETWEEN :B2 AND
:B3
0 HASH JOIN [:Q39004]
SELECT /*+ ORDERED NO_EXPAND USE_HASH(A2) SWAP_JOIN_INPUTS(A2)
*/ A1.C4 C0,A2.C1 C1,A1.C1 C2,A1.C2 C3,A1.C3 C4,A1.C6 C5,
A1.C5 C6 FROM :Q39002 A1,:Q39001 A2 WHERE A1.C0=A2.C0
0 TABLE ACCESS (FULL) OF 'REGION' [:Q39001]
0 NESTED LOOPS [:Q39002]
SELECT /*+ ORDERED NO_EXPAND USE_NL(A2) */ A2.C1 C0,A1.C0 C1,
A2.C0 C2,A2.C5 C3,A2.C2 C4,A2.C3 C5,A2.C4 C6 FROM :Q39000
A1,(SELECT /*+ INDEX(A3 "SALES_IDX2") */ A3.ROWID C0,
A3."REGID" C1,A3."CUSTID" C2,A3."PRODID" C3,A3."QUANTITY"
C4,A3."VALUE" C5 FROM "SALES" PARTITION(:B1) A3) A2 WHERE
A2.C3=A1.C0
0 TABLE ACCESS (FULL) OF 'PRODUCT' [:Q39000]
0 PARTITION (CONCATENATED) [:Q39002]
0 TABLE ACCESS GOAL: ANALYZED (BY LOCAL INDEX ROWID)
OF 'SALES' [:Q39002]
0 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'SALES_IDX2'
(NON-UNIQUE) [:Q39002]
The index SALES_IDX2 is scanned in parallel by partition. There will be 13 slaves, each probing one
partition. The query passed to each slave uses the partition-extended naming syntax to indicate which
partition to probe. The CUSTOMER table is also scanned in parallel, and the query passed to
the slaves includes the partition in addition to the rowid ranges.
Case study for point in time recovery of a partition (point in time recovery is covered in more detail in
DSI303).
Problem
A partition, sales_m10, was dropped by mistake. The DBA would like to do a point-in-time (PIT) recovery of its
tablespace to just before the drop.
1. Check database is in archive log mode
SVRMGR> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /supp/app/oracle/product/8.0.2/dbs/arch
Oldest online log sequence 494
Next log sequence to archive 496
Current log sequence 496
2. List the partitions
SVRMGR> select partition_name, high_value from user_tab_partitions;
-------------------------------
SALES_M1 5
SALES_M2 9
SALES_M3 13
SALES_M4 17
SALES_M5 21
SALES_M6 25
SALES_M7 29
SALES_M8 33
SALES_M9 37
SALES_M10 41
SALES_M11 45
SALES_M12 49
SALES_M13 maxvalue
3. Do hot backups (Backup the control file as well)
SVRMGR> alter database backup controlfile to
2> '/supp/oradata/ppenta/prep2/ctrl.bak';
Statement processed.
SVRMGR> select to_char(sysdate, 'DD-MON-YY HH24:MI:SS') from dual;
TO_CHAR(SYSDATE,'D
------------------
13-MAR-98 14:18:54
4. Drop the partition
alter table sales drop partition sales_m10;
This will drop the corresponding local index partition and mark all
the global index partitions as unusable.
5. Find the time of the drop
select to_char(sysdate, 'DD-MON-YY HH24:MI:SS') from dual;
TO_CHAR(SYSDATE,'D
------------------
13-MAR-98 14:19:46
6. List the partitions
SVRMGR> select partition_name, high_value from user_tab_partitions;
-------------------------------
SALES_M1 5
SALES_M2 9
SALES_M3 13
SALES_M4 17
SALES_M5 21
SALES_M6 25
SALES_M7 29
SALES_M8 33
SALES_M9 37
SALES_M11 45
SALES_M12 49
SALES_M13 maxvalue
7. Drop and rebuild the global index
8. The DBA realises the mistake at this point
The DBA would like to do a PIT recovery of tablespace train01_tab_part10.
Start preparing for the PIT recovery.
Recovery will be done up to 13-MAR-98 14:18:54.
8.1 Query the primary database for any objects to be dropped
Any objects created in the tablespace TRAIN01_TAB_PART10 after the recovery time need to be dropped.
SVRMGR> select * from ts_pitr_objects_to_be_dropped
2> where tablespace_name = 'TRAIN01_TAB_PART10'
3> and creation_time > to_date('13-MAR-98 14:18:54', 'DD-MON-YY HH24:MI:SS');
OWNER NAME CREATION_ TABLESPACE_NAME
------------------------------ ------------------------------ --------- ------------------------------
0 rows selected.
SVRMGR>
8.2 Query the primary for any dependencies
SVRMGR> select * from ts_pitr_check
2> where
3> (ts1_name in ('TRAIN01_TAB_PART10') and
4> ts2_name not in ('TRAIN01_TAB_PART10'))
5> or
6> (ts1_name not in ('TRAIN01_TAB_PART10') and
7> ts2_name in ('TRAIN01_TAB_PART10'))
8> /
OBJ1_OWNER OBJ1_NAME OBJ1_SUBNAME
OBJ1_TYPE TS1_NAME OBJ2_NAME
OBJ2_SUBNAME OBJ2_TYPE OBJ2_OWNER
TS2_NAME CONSTRAINT_NAME REASON
------------------------------ ------------------------------ ------------------
------------ --------------- ------------------------------ --------------------
---------- ------------------------------ --------------- ----------------------
-------- ------------------------------ ------------------------------ ---------
---------------------------------------------------------------------
0 rows selected.
SVRMGR>
If this query returns any rows, the dependencies must be resolved by dropping the objects or
disabling the constraints; these are then rebuilt or re-enabled later.
8.3 Archive the current log and offline the tablespace
SVRMGR> alter system archive log current;
SVRMGR> alter tablespace TRAIN01_TAB_PART10 offline;
SVRMGR>
9. Prepare the clone
Generate the init.ora file for the clone.
Specify lock_name_space=CLONE.
Make sure control_files points to the control files in the new location.
All files are offlined by default.
After mounting the database, rename and online the required files.
These are the files of the system, rollback segment and train01_tab_part10 tablespaces.
startup nomount pfile= ..
alter database mount clone database
alter database rename file
alter database datafile .. online
recover database using backup controlfile until time '1998-03-13:14:18:54'
alter database open resetlogs
10. Check the clone database
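SVRMGR> select * from ts_pitr_check
2> where
3> (ts1_name in ('TRAIN01_TAB_PART10') and
4> ts2_name not in ('TRAIN01_TAB_PART10'))
5> or
6> (ts1_name not in ('TRAIN01_TAB_PART10') and
7> ts2_name in ('TRAIN01_TAB_PART10'))
8> /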
------------------------------ ------------------------------ ------------------
------------ --------------- ------------------------------ --------------------
---------- ------------------------------ --------------- ----------------------
-------- ------------------------------ ------------------------------ ---------
---------------------------------------------------------------------
TRAIN01 SALES SALES_M1
TABLE PARTITION TRAIN01_TAB_PART1 SALES
SALES_M10 TABLE PARTITION TRAIN01
This needs to be resolved first before we can do any exports.
SVRMGR> connect train01/train01
Connected.
SVRMGR> create table temp as select * from sales where 1 = 0;
create table temp as select * from sales where 1 = 0
*
ORA-01552: cannot use system rollback segment for non-system tablespace 'USERS'
SVRMGR> connect system/manager
Connected.
SVRMGR> alter user train01 default tablespace system;
SVRMGR> connect train01/train01
Connected.
SVRMGR> create table temp as select * from sales where 1 = 0;
SVRMGR> alter table sales exchange partition sales_m10 with table temp;
SVRMGR> connect internal;
connected.
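SVRMGR> select * from ts_pitr_check
2> where
3> (ts1_name in ('TRAIN01_TAB_PART10') and
4> ts2_name not in ('TRAIN01_TAB_PART10'))
5> or
6> (ts1_name not in ('TRAIN01_TAB_PART10') and
7> ts2_name in ('TRAIN01_TAB_PART10'))
8> /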
11. Do the export
exp sys/manager point_in_time_recover=y \
recovery_tablespaces=TRAIN01_TAB_PART10
12. Recover the primary database
Copy the datafiles belonging to TRAIN01_TAB_PART10 from the clone database
onto the primary database. Then do the dictionary import.
imp sys/manager point_in_time_recover=true
.
SVRMGR> connect internal;
Connected.
SVRMGR> alter tablespace train01_tab_part10 online;
SVRMGR>
Split the partition sales_m11 to restore the original structure, then exchange sales_m10 with temp.
alter table sales split partition sales_m11 at (41) into
2> (partition sales_m10 tablespace train01_tab_part10,
3> partition sales_m11 tablespace train01_tab_part11);
alter table sales exchange partition sales_m10 with table temp;
The global index partition is marked unusable because of the split.
The local index partitions need to be rebuilt for partitions sales_m10 and sales_m11.
The split command split sales_m11 into sales_m10 and sales_m11; the name sales_m11 was reused.
Hence the local index partitions are split as follows:
sales_idx1_m11 is split into sales_m10 and sales_idx1_m11
sales_idx2_m11 is split into SYS_Pnnn and sales_idx2_m11
SVRMGR> alter index sales_idx1 rebuild partition SALES_IDX1_m11;
SVRMGR> alter index sales_idx2 rebuild partition SALES_IDX2_m11;
Statement processed
SVRMGR> alter index sales_idx1 rebuild partition SALES_M10 tablespace
2> train01_indl_part10;
SVRMGR> alter index sales_idx1 rename partition sales_m10 to sales_idx1_m10;
SVRMGR> alter index sales_idx2 rebuild partition SYS_P29 tablespace
2> train01_indl_part10;
SVRMGR> alter index sales_idx2 rename partition SYS_P29 to sales_idx2_m10;
DSI306 - Lesson 5 - Parallel DML
Case Studies and Lab. Exercises
Contents:
Introduction To Case Studies
  Test Rig
Case 1 - Locking Behaviour
  Description
  Diagnostics
  Explanation of Locking Output
  Summary and Impact
Case 2 - Explain Plan
  Description
  Running the Explain Plan
  Dumping the Plan Table Output
  The Plan
  Interpretation
Case 3 - Explain Plan
  Description
  Running the Explain Plan
  The Plan
  Interpretation
Note - The following scripts should be present in the LABS directory:
dsi306_5_1.sql - To create the partitioned table SALES
dsi306_5_2.sql - To populate the table SALES with sample data
qpartsegs.sql - Queries dba_segments for partitioned object specified
qpartid.sql - Queries dba_objects for partitioned object specified
qpartbounds.sql - Queries dba_tab_partitions for object specified
qlocks.sql - Queries v$lock, and v$session for locking information
Introduction To Case Studies
In the following case studies we explore parallel DML (PDML) in three scenarios:
1. Locking Behaviour
2. Explain Plan - FTS (full table scan)
3. Explain Plan - IRS (index range scan)
The three case studies were performed on Oracle Release 8.0.4.
Test Rig
The cases were executed against a single partitioned table, the characteristics of which are displayed below. The
SALES table is defined with a degree of parallelism of 4. This is because of the number of partitions, although the
testing relies only on the first 2 or 3 partitions.
Run the script dsi306_5_1.sql to create the table outlined below (You will be prompted for partition
names and tablespace names to place them in). Then execute the script dsi306_5_2.sql to populate this
table with sample data (approximately 100,000 rows). Create an index on the column WEEK_NO called
SALES_IDX. Calculate statistics on the table.
SQL> describe sales
Name Null? Type
------------------------------- -------- ----
WEEK_NO NUMBER(3)
SALES_DATE DATE
STORE_ID NUMBER
PROD_ID NUMBER
PROMO_AMT NUMBER
PROMO_QTY NUMBER
SALES_REP_ID NUMBER
ORDER_ID NUMBER
ORD_TYP NUMBER(38)
PROMO_TYP NUMBER
NB. Don't forget to enclose responses in quotes.
SQL> @qpartsegs
Enter the owner : TRAIN01
Enter the table/index name : SALES
SEGMENT_NA PARTITION_ HEADER_FILE HEADER_BLOCK
---------- ---------- ----------- ------------
SALES SALES_P1 5 2
SALES SALES_P2 7 2
SALES SALES_P3 9 2
SQL> @qpartid
Enter the owner : train01
Enter the table : sales
OWNER# UNAME NAME SUBNAME OBJ# DATAOBJ#
------- ---------- ------- -------------------- ------- --------
26 TRAIN01 SALES SALES_P1 2220 2220
SALES SALES_P2 2221 2221
SALES SALES_P3 2222 2222
SALES 2219
SQL> @qpartbounds
Enter the table owner : train01
Enter the table name : sales
TABLE_OWNE TABLE_NAME PARTITION_ HIGH_VALUE POS
---------- ---------- ---------- ------------------------------ ----
TRAIN01 SALES SALES_P1 5 1
SALES_P2 9 2
SALES_P3 13 3
There is also an index, SALES_IDX on WEEKNO, used for the 3rd case study.
Case 1 - Locking Behaviour
Description
A user is attempting to perform a DML statement against a partition of the SALES table, and the session is
currently hanging. Another user is already performing PDML against the SALES table.
The first session runs the following PDML :
UPDATE SALES SET ord_typ = ord_typ WHERE weekno < 11;
The second session then attempts this DML:
update sales partition (sales_m1) set ord_typ = ord_typ where promo_typ=1;
Diagnostics
The following is seen in V$LOCK:
SQL> @qlocks
SID TY ID1 ID2 LMODE REQUEST USERNAME SPID
---------- -- ---------- ---------- ---------- ---------- ---------- ------
8 PS 1 1 4 0 TRAIN01 21639
PS 1 2 4 0 TRAIN01 21639
PS 1 0 4 0 TRAIN01 21639
TM 2219 0 3 0 TRAIN01 21639
TM 2220 0 6 0 TRAIN01 21639
TM 2221 0 6 0 TRAIN01 21639
TM 2222 0 6 0 TRAIN01 21639
TX 327684 97 6 0 TRAIN01 21639
9 TM 2219 0 3 0 TRAIN01 21435
TM 2220 0 0 3 TRAIN01 21435
12 PS 1 0 4 0 TRAIN01 21669
TM 2220 0 1 0 TRAIN01 21669
TM 2220 1 6 0 TRAIN01 21669
TM 2219 0 3 0 TRAIN01 21669
TX 196618 128 6 0 TRAIN01 21669
13 PS 1 2 4 0 TRAIN01 21673
TM 2222 0 1 0 TRAIN01 21673
TM 2222 1 6 0 TRAIN01 21673
TM 2219 0 3 0 TRAIN01 21673
TX 131085 123 6 0 TRAIN01 21673
14 PS 1 1 4 0 TRAIN01 21671
TM 2219 0 3 0 TRAIN01 21671
TM 2221 0 1 0 TRAIN01 21671
TM 2221 1 6 0 TRAIN01 21671
TX 327689 96 6 0 TRAIN01 21671
Explanation of Locking Output
SID 8 is the query coordinator (QC), or rather the PDML coordinator. It holds the following locks:
A PS lock for each DML slave in mode 4 (S). This is new in Oracle 8 - PS locks in O7 are held in mode
6 (X). This is explained in the Locking Model section.
A TM lock in mode 3 (SX) on ID1=2219. Object 2219 is the SALES table itself.
TM locks in mode 6 (X) on ID1=2220, 2221 and 2222. These are the SALES_M1, SALES_M2 and
SALES_M3 partitions, which are either being updated or had to be checked for possible updates.
These are partition locks.
A TX lock in mode 6 (X) for its own transaction.
SIDs 12, 13 and 14 are slaves (QS). Each holds the following locks:
Its own PS lock in mode 4 (S). Again, in O8 this lock mode is different from O7, where the lock is held in
mode 1 (NULL).
A TM lock in mode 3 (SX) on ID1=2219. Object 2219 is the SALES table itself.
A TM lock in mode 1 (NULL) on ID1=2220 ID2=0. This is the partition lock, in this case the one held by
SID 12. A slave holds the partition and partition wait locks only for its own partition.
A TM lock in mode 6 (X) on ID1=2220 ID2=1. This is the partition wait lock for SID 12.
The waiting session (SID 9) is requesting the TM lock on object 2220 in mode 3. This is a standard DML lock wait
in which the segment is already locked in a higher mode: the partition (object id = 2220) is held in mode 6 by
the PDML coordinator.
Summary and Impact
The implication is clear: you cannot run PDML and ordinary DML against the same partition simultaneously.
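The wait seen for SID 9 can be reproduced with a small model of TM lock compatibility. This is an illustrative Python sketch, not a description of the lock manager's implementation: the compatibility table reflects the standard table-lock mode rules, the held locks are a hand-copied subset of the V$LOCK output above, and the function name blockers is hypothetical.

```python
# TM lock modes as shown in the LMODE / REQUEST columns of V$LOCK.
MODE_NAMES = {1: "NULL", 2: "SS", 3: "SX", 4: "S", 5: "SSX", 6: "X"}

# For each held mode, the set of requested modes that can still be granted.
COMPATIBLE = {
    1: {1, 2, 3, 4, 5, 6},
    2: {1, 2, 3, 4, 5},
    3: {1, 2, 3},
    4: {1, 2, 4},
    5: {1, 2},
    6: {1},
}


def blockers(held, resource, requested_mode):
    """Return the SIDs whose held lock on resource blocks the request."""
    return [sid for sid, res, mode in held
            if res == resource and requested_mode not in COMPATIBLE[mode]]


# Subset of the V$LOCK output: (SID, (ID1, ID2), LMODE).
held = [
    (8, (2219, 0), 3), (8, (2220, 0), 6),
    (12, (2219, 0), 3), (12, (2220, 0), 1), (12, (2220, 1), 6),
]

print(blockers(held, (2219, 0), 3))  # table-level request by SID 9
print(blockers(held, (2220, 0), 3))  # partition-level request by SID 9
```

The table-level request in mode 3 is compatible with the mode 3 locks already held, so it is granted; the partition-level request conflicts only with the coordinator's mode 6 lock, which matches the REQUEST=3 row shown for SID 9.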
Case 2 - Explain Plan
Description
We will examine an execution plan to highlight the operations performed by the slaves and to see that the plan is
indeed parallelised. Note that when you attempt to explain a parallel DML statement you must first enable
parallel DML. If you do not you will only be given a serial plan. A script for dumping a parallel plan will be
presented which may be of use in diagnosing other customer issues.
Running the Explain Plan
alter session enable parallel dml
/
explain plan set statement_id = 'Test' for
UPDATE /*+PARALLEL (SALES,4)*/ SALES SET ord_typ = ord_typ
WHERE weekno < 11 and sales_rep_id between 6 and 12
/
NB. The hint is to force parallelism as the optimizer may not decide it is the best method. Try running the
statement with and without the hint.
Dumping the Plan Table Output (see script dsi306_5_exp.sql)
rem dsemler 24-MAY-96
rem PQO query of plan_table
rem Dump the plan itself with slave SQL
set echo off
set long 200
column id Heading "ID" format 999
column query heading "Query Plan" format a42
column other_tag heading "Parallel Op" format a30
column other heading "Other" format a32
select id,
lpad(' ',2*(level-1))||operation||' '||options||' '
||object_name||' '
||decode(object_node,'','','['||object_node||'] ') query,
other
from plan_table
start with id = 0
connect by prior id = parent_id;
rem
rem Dump the parallelisation of the operations
rem
select id,parent_id,other_tag
from plan_table
where id <> 0;
The Plan (Note - run dsi306_5_exp.sql to get the formatting shown here).
ID Query Plan Other
---- ------------------------------------------ --------------------------------
0 UPDATE STATEMENT
1 UPDATE SALES [:Q8000] UPDATE (SELECT /*+ FULL(A1) */ A
1.ROWID C0,A1."CUSTID" C1,A1."WE
EKNO" C2,A1."VALUE" C3 FROM "SAL
ES" PARTITION(:B1) A1 WHERE A1.
"CUSTID"<=100 AND A1."CUSTID">=8
8 AND A1."WEEKNO"<11) SET C3 = C
3
2 PARTITION CONCATENATED [:Q8000]
3 TABLE ACCESS FULL SALES [:Q8000]
ID PARENT_ID Parallel Op
---- ---------- ------------------------------
1 0 PARALLEL_TO_SERIAL
2 1 PARALLEL_COMBINED_WITH_PARENT
Interpretation
The output above indicates that each DML slave is sent the same SQL, with the partition passed as the bind variable
:B1. This value is then bound during the bind phase of the messaging from the QC. The operation is
performed as a full table scan; more accurately it is a full partition scan, but the access method is the same.
Case 3 - Explain Plan
Description
This second explain plan demonstrates the use of index scans on partitions during PDML. Ensure that an
index has been created on SALES on the column week_no.
Running the Explain Plan
SQL> alter session enable parallel dml
2 /
Session altered.
SQL> explain plan set statement_id = 'Test' for
2 update /*+parallel(sales,4)*/ sales set promo_qty = promo_qty*1.1
3 where week_no < 11
4 /
NB. Attempt this with and without the hint.
The Plan
ID Query Plan Other
---- ------------------------------------------ --------------------------------
0 UPDATE STATEMENT
1 UPDATE SALES [:Q199000] UPDATE (SELECT /*+ INDEX(A1 "SAL
ES_IX") */ A1.ROWID C0,A1."VALUE
"C1 FROM "SALES" PARTITION(:B1)
A1 WHERE A1."WEEKNO"<11 AND A1."
WEEKNO"<11) SET C1 = C1*1.1
2 PARTITION CONCATENATED [:Q199000]
3 INDEX RANGE SCAN SALES_IX [:Q199000]
ID PARENT_ID Parallel Op
---- ---------- ------------------------------
1 0 PARALLEL_TO_SERIAL
Interpretation
The above explain plan will cause each DML slave to use an index range scan on the partitioned index
SALES_IX. The key thing to note here is that to get this to work in 8.0.3 you must define the columns in
SALES_IX (in this case WEEKNO) to be NOT NULL. If this is not done, bug 483556 will prevent this from
working; the bug also prevents parallel queries from using Fast Full Scans. This is fixed in 8.0.4. It is also
interesting to note that the predicate A1."WEEKNO"<11 is duplicated in the slave SQL despite appearing in the
original SQL only once.
DSI306 - Lesson 7 - Advanced Queueing
Oracle 8 Advanced Queuing - Demonstration
Contents:
Demonstration
  Script
  Script Run Output
  Block Dump - Queue Table Header
  Describe Output
  Block Dump - Queue Table
Note - The scripts illustrated on the next few pages should be present in your LAB directories. They are
called:
dsi306_7_1.sql - to create the AQ user and establish the queue.
dsi306_7_enq.sql - to enqueue a message
dsi306_7_deq.sql - to dequeue a message
Demonstration
Script
/* Create user and grant privileges: */

CONNECT sys/change_on_install;
CREATE user aq identified by AQ;
GRANT AQ_ADMINISTRATOR_ROLE TO aq;
GRANT CONNECT TO aq;
GRANT RESOURCE TO aq;

EXECUTE dbms_aqadm.grant_type_access('aq');

CONNECT aq/AQ;

SET ECHO ON;
SET SERVEROUTPUT ON;

/* Create a message type: */

CREATE type aq.message_type as object (
subject VARCHAR2(30),
text VARCHAR2(80));
/

/* Create an object type queue table and queue: */
/* Each EXECUTE is kept on one line, as SQL*Plus requires. */
EXECUTE dbms_aqadm.create_queue_table (queue_table => 'aq.msg', queue_payload_type => 'aq.message_type');

EXECUTE dbms_aqadm.create_queue (queue_name => 'msg_queue', queue_table => 'aq.msg');

EXECUTE dbms_aqadm.start_queue (queue_name => 'msg_queue');

/* Enqueue to msg_queue: */

DECLARE
enqueue_options dbms_aq.enqueue_options_t;
message_properties dbms_aq.message_properties_t;
message_handle RAW(16);
message aq.message_type;
BEGIN
message := message_type('NORMAL MESSAGE',
'enqued to msg_queue first.');
dbms_aq.enqueue(queue_name => 'msg_queue',
enqueue_options => enqueue_options,
message_properties => message_properties,
payload => message,
msgid => message_handle);
COMMIT;
dbms_output.put_line('*****message enqueued');
END;
/

/* Dequeue from msg_queue: */

DECLARE
dequeue_options dbms_aq.dequeue_options_t;
message_properties dbms_aq.message_properties_t;
message_handle RAW(16);
message aq.message_type;
BEGIN
dbms_aq.dequeue(queue_name => 'msg_queue',
dequeue_options => dequeue_options,
message_properties => message_properties,
payload => message,
msgid => message_handle);
dbms_output.put_line ('Message: ' || message.subject ||
' ... ' || message.text );
COMMIT;
END;
/
Script Run Output
[tcsun2]/home/usupport/asatyawa> sqlplus aq/AQ

SQL*Plus: Release 8.0.3.0.0 - Production on Thu Jul 24 14:19:54 1997

(c) Copyright 1997 Oracle Corporation. All rights reserved.


Connected to:
Oracle8 Enterprise Edition Release 8.0.3.0.0 - Production
With the Partitioning and Objects options
PL/SQL Release 8.0.3.0.0 - Production

SQL> @demo2
*****message enqueued

PL/SQL procedure successfully completed.

Message: NORMAL MESSAGE ... enqued to msg_queue first

PL/SQL procedure successfully completed.
Block Dump - Queue Table Header
1. Start dump data blocks tsn: 0 file#: 1 minblk 27792 maxblk 27793
2. buffer tsn: 0 rdba: 0x00406c90 (1/27792)
3. scn:0x0000.0003ce49 seq:0x01 flg:0x00 tail:0xce491001
4. frmt:0x02 chkval:0x0000 type:0x10=DATA SEGMENT HEADER - UNLIMITED
5.
6. Extent Control Header
7. -----------------------------------------------------------------
8. Extent Header:: spare1: 0 tsn: 0 #extents: 1 #blocks: 4
9. last map rdba: 0x00000000 #maps: 0 offset: 1056
10. Highwater:: rdba: 0x00406c92 ext#: 0 blk#: 1 ext size: 4
11. #blocks in seg. hdr's freelists: 1
12. #blocks below: 1
13. mapblk rdba: 0x00000000 offset: 0
14. Unlocked
15. Map Header:: next rdba: 0x00000000 #extents: 1 obj#: 3351 flag: 0x400
16. 00000
17. Extent Map
18. -----------------------------------------------------------------
19. rdba: 0x00406c91 length: 4
20.
21. nfl = 1, nfb = 1 typ = 1 nxf = 0
22. SEG LST:: flg: USED lhd: 0x00406c91 ltl: 0x00406c91
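The rdba values in the dump can be unpacked by hand. The following is a minimal sketch, assuming the Oracle8 layout in which the upper 10 bits of the 32-bit relative data block address hold the relative file number and the lower 22 bits hold the block number. The function name decode_rdba is illustrative only and is not part of any Oracle tool.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative data block address into (file#, block#).

    Assumes the Oracle8 layout: upper 10 bits = relative file number,
    lower 22 bits = block number within that file.
    """
    file_no = rdba >> 22
    block_no = rdba & 0x3FFFFF
    return file_no, block_no


# Segment header and first data block, as printed in the dumps of this section.
print(decode_rdba(0x00406C90))  # (1, 27792)
print(decode_rdba(0x00406C91))  # (1, 27793)
```

The results agree with the (file/block) pairs that the dump prints next to each rdba.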
Describe Output
1.  SQL> describe msg
2.  Name                            Null?    Type
3.  ------------------------------- -------- ----
4.  Q_NAME                                   VARCHAR2(30)
5.  MSGID                                    RAW(16)
6.  CORRID                                   VARCHAR2(30)
7.  PRIORITY                                 NUMBER
8.  STATE                                    NUMBER
9.  DELAY                                    DATE
10. EXPIRATION                               NUMBER
11. TIME_MANAGER_INFO                        DATE
12. LOCAL_ORDER_NO                           NUMBER
13. CHAIN_NO                                 NUMBER
14. CSCN                                     NUMBER
15. DSCN                                     NUMBER
16. ENQ_TIME                                 DATE
17. ENQ_UID                                  NUMBER
18. ENQ_TID                                  VARCHAR2(30)
19. DEQ_TIME                                 DATE
20. DEQ_UID                                  NUMBER
21. DEQ_TID                                  VARCHAR2(30)
22. RETRY_COUNT                              NUMBER
23. EXCEPTION_QSCHEMA                        VARCHAR2(30)
24. EXCEPTION_QUEUE                          VARCHAR2(30)
25. STEP_NO                                  NUMBER
26. RECIPIENT_KEY                            NUMBER
27. DEQUEUE_MSGID                            RAW(16)
28. USER_DATA                                MESSAGE_TYPE
Block Dump - Queue Table
1. buffer tsn: 0 rdba: 0x00406c91 (1/27793)
2. scn:0x0000.0003ce71 seq:0x01 flg:0x02 tail:0xce710601
3. frmt:0x02 chkval:0x0000 type:0x06=trans data
4.
5. Block header dump: rdba: 0x00406c91
6. Object id on Block? Y
7. seg/obj: 0xd17 csc: 0x00.3ce70 itc: 1 flg: O typ: 1 - DATA
8. fsl: 0 fnx: 0x0 ver: 0x01
9.
10. Itl Xid Uba Flag Lck Scn/Fsc
11. 0x01 0x0004.014.00000037 0x008009a6.0015.08 --U- 1 fsc 0x0075.0003ce71
12.
13. data_block_dump
14. ===============
15. tsiz: 0x7b8
16. hsiz: 0x24
17. pbl: 0x00ff8f44
18. bdba: 0x00406c91
19. flag=---------
20. ntab=1
21. nrow=9
22. frre=-1
23. fsbo=0x24
24. fseo=0x38a
25. avsp=0x366
26. tosp=0x3dd
27. 0xe:pti[0] nrow=9 offs=0
28. 0x12:pri[0] offs=0x6ca
29. 0x14:pri[1] offs=0x741
30. 0x16:pri[2] offs=0x653
31. 0x18:pri[3] offs=0x5dc
32. 0x1a:pri[4] offs=0x565
33. 0x1c:pri[5] offs=0x4ef
34. 0x1e:pri[6] offs=0x479
35. 0x20:pri[7] offs=0x401
36. 0x22:pri[8] offs=0x38a
37. block_row_dump:
38. tab 0, row 0, @0x6ca
39. tl: 2 fb: --HDFL-- lb: 0x1
40. tab 0, row 1, @0x741
41. tl: 119 fb: --H-FL-- lb: 0x0 cc: 27
42. col 0: [ 9] 4d 53 47 5f 51 55 45 55 45
43. col 1: [16] 16 e8 26 dd 44 9c 5a 19 e0 34 08 00 20 85 66 42
44. col 2: *NULL*
45. col 3: [ 2] c1 02
46. col 4: [ 1] 80
47. col 5: *NULL*
48. col 6: *NULL*
49. col 7: *NULL*
50. col 8: [ 1] 80
51. col 9: [ 1] 80
52. col 10: *NULL*
53. col 11: *NULL*
54. col 12: [ 7] 77 c5 07 16 0f 1e 2a
55. col 13: [ 2] c1 32
56. col 14: [ 7] 33 2e 31 32 2e 37 34
57. col 15: *NULL*
58. col 16: *NULL*
59. col 17: *NULL*
60. col 18: [ 1] 80
61. col 19: *NULL*
62. col 20: *NULL*
63. col 21: [ 1] 80
64. col 22: [ 1] 80
65. col 23: *NULL*
66. col 24: [ 1] 00
67. col 25: [14] 4e 4f 52 4d 41 4c 20 4d 45 53 53 41 47 45
68. col 26: [25]
69. 65 6e 71 75 65 64 20 74 6f 20 6d 73 67 5f 71 75 65 75 65 20 66 69 72 73 74
70. tab 0, row 2, @0x653
71. tl: 119 fb: --H-FL-- lb: 0x0 cc: 27
72. col 0: [ 9] 4d 53 47 5f 51 55 45 55 45
73. col 1: [16] 16 e8 26 dd 44 9e 5a 19 e0 34 08 00 20 85 66 42
74. col 2: *NULL*
75. col 3: [ 2] c1 02
76. col 4: [ 1] 80
77. col 5: *NULL*
78. col 6: *NULL*
79. col 7: *NULL*
80. col 8: [ 1] 80
81. col 9: [ 1] 80
82. col 10: *NULL*
83. col 11: *NULL*
84. col 12: [ 7] 77 c5 07 16 0f 1e 2a
85. col 13: [ 2] c1 32
86. col 14: [ 7] 35 2e 31 32 2e 36 30
87. col 15: *NULL*
88. col 16: *NULL*
89. col 17: *NULL*
90. col 18: [ 1] 80
91. col 19: *NULL*
92. col 20: *NULL*
93. col 21: [ 1] 80
94. col 22: [ 1] 80
95. col 23: *NULL*
96. col 24: [ 1] 00
97. col 25: [14] 4e 4f 52 4d 41 4c 20 4d 45 53 53 41 47 45
98. col 26: [25]
99. 65 6e 71 75 65 64 20 74 6f 20 6d 73 67 5f 71 75 65 75 65 20 66 69 72 73 74
100.
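The column bytes in the row dump can be decoded to check them against the describe output. The sketch below assumes that the dumped columns follow DESCRIBE order (so col 0 is Q_NAME, col 3 is PRIORITY, col 12 is ENQ_TIME, col 13 is ENQ_UID, and col 25 and col 26 are the attributes of USER_DATA) and that the values use the usual internal NUMBER and DATE layouts. The helper names are hypothetical, and only non-negative integers are handled.

```python
from datetime import datetime


def decode_varchar(hex_bytes):
    """Interpret a dumped VARCHAR2 column as ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


def decode_number(hex_bytes):
    """Decode a non-negative integer stored in Oracle NUMBER format.

    Assumed layout: a lone 0x80 byte is zero; otherwise the first byte is
    193 plus the base-100 exponent, and each following byte is a base-100
    digit plus 1. Negative and fractional values are not handled here.
    """
    raw = bytes.fromhex(hex_bytes)
    if raw == b"\x80":
        return 0
    if raw[0] < 0xC1:
        raise ValueError("only non-negative integers are handled here")
    exponent = raw[0] - 0xC1
    digits = [b - 1 for b in raw[1:]]
    if len(digits) > exponent + 1:
        raise ValueError("fractional part present")
    return sum(d * 100 ** (exponent - i) for i, d in enumerate(digits))


def decode_date(hex_bytes):
    """Decode a 7-byte DATE: century+100, year+100, month, day,
    hour+1, minute+1, second+1."""
    c, y, month, day, h, mi, s = bytes.fromhex(hex_bytes)
    return datetime((c - 100) * 100 + (y - 100), month, day, h - 1, mi - 1, s - 1)


print(decode_varchar("4d 53 47 5f 51 55 45 55 45"))                 # MSG_QUEUE
print(decode_number("c1 02"), decode_number("c1 32"))               # 1 49
print(decode_date("77 c5 07 16 0f 1e 2a"))                          # 1997-07-22 14:29:41
print(decode_varchar("4e 4f 52 4d 41 4c 20 4d 45 53 53 41 47 45"))  # NORMAL MESSAGE
```

Under these assumptions, row 1 holds a message in MSG_QUEUE with priority 1, and the two USER_DATA attributes match the subject and text printed by the dequeue script.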
DSI306 - Lesson 8 - Data Loading
A) Example Import Session Using Partition-Level Import
This section describes how to use partition-level Import to partition an
unpartitioned table, merge partitions of a table, and repartition a table on a
different column.
The examples in this section assume that the following tablespaces exist (if
you are performing these exercises, choose tablespaces and datafiles
appropriate to your database):
* tbs_e1, tbs_e2, tbs_e3
* tbs_d1, tbs_d2, tbs_d3
Example 1: Partitioning an Unpartitioned Table
Perform the following steps to partition an unpartitioned table:
1. Export the table to save the data.
2. Drop the table from the database.
3. Create the table again with partitions.
4. Import the table data.
The following example shows how to partition an unpartitioned table:
% exp scott/tiger tables=emp file=empexp.dmp
.
.
.
About to export specified tables via Conventional Path ...
. . exporting table EMP 14 rows exported
Export terminated successfully without warnings.
SQL> drop table emp cascade constraints;
Table dropped.
SQL> create table emp
2 (
3 empno number(4) not null,
4 ename varchar2(10),
5 job varchar2(9),
6 mgr number(4),
7 hiredate date,
8 sal number(7,2),
9 comm number(7,2),
10 deptno number(2)
11 )
12 partition by range (empno)
13 (
14 partition emp_low values less than (7600)
15 tablespace tbs_e1,
16 partition emp_mid values less than (7900)
17 tablespace tbs_e2,
18 partition emp_high values less than (8100)
19 tablespace tbs_e3
20 );
Table created.
SQL> exit
% imp scott/tiger tables=emp file=empexp.dmp ignore=y
.
. .
Export file created by EXPORT:V08.00.03 via conventional path
. importing SCOTT's objects into SCOTT
. . importing table "EMP" 14 rows imported
Import terminated successfully without warnings
The following SELECT statements show that the data is partitioned on the
empno column:
SQL> select empno from emp partition (emp_low);
EMPNO
----------
7369
7499
7521
7566
4 rows selected.
SQL> select empno from emp partition (emp_mid);
EMPNO
----------
7654
7698
7782
7788
7839
7844
7876
7 rows selected.
SQL> select empno from emp partition (emp_high);
EMPNO
----------
7900
7902
7934
3 rows selected.
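The row counts above follow directly from the VALUES LESS THAN bounds. As a rough sketch of the routing rule (not Oracle's implementation), the partition for a key is the first one whose upper bound is strictly greater than the key. The function name partition_for is illustrative.

```python
from bisect import bisect_right

# Exclusive upper bounds and partition names from the CREATE TABLE in Example 1.
BOUNDS = [7600, 7900, 8100]
NAMES = ["emp_low", "emp_mid", "emp_high"]


def partition_for(empno):
    """Return the partition whose VALUES LESS THAN bound first exceeds empno."""
    index = bisect_right(BOUNDS, empno)
    if index == len(BOUNDS):
        raise ValueError(f"{empno} is beyond the highest partition bound")
    return NAMES[index]


# The 14 EMPNO values returned by the SELECT statements above.
empnos = [7369, 7499, 7521, 7566, 7654, 7698, 7782, 7788,
          7839, 7844, 7876, 7900, 7902, 7934]
counts = {}
for empno in empnos:
    name = partition_for(empno)
    counts[name] = counts.get(name, 0) + 1
print(counts)  # {'emp_low': 4, 'emp_mid': 7, 'emp_high': 3}
```

Note that a key equal to a bound (for example 7900) belongs to the next higher partition, because the bound itself is excluded.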
Example 2: Merging Partitions of a Table
This example assumes the EMP table has three partitions, based on the
EMPNO column, as shown in Example 1.
Perform the following steps to merge partitions of a table:
1. Export the partition you want to merge. This saves the data.
2. Alter the table to drop the partition you want to merge.
3. Import the partition to be merged.
The following example shows how to merge partitions of a table:
% exp scott/tiger tables=emp:emp_mid file=empprt.dmp
. . exporting table EMP
. . exporting partition EMP_MID 7 rows exported
. .
.
SQL> alter table emp drop partition emp_mid;
Table altered.
% imp scott/tiger fromuser=scott tables=emp:emp_mid file=empprt.dmp ignore=y
. . .
. . importing partition "EMP":"EMP_MID" 7 rows imported
Import terminated successfully without warnings.
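Because EMP_MID no longer exists, the imported rows (EMPNO 7654 through
7876) fall into the next higher partition, EMP_HIGH (values less than 8100).
The following SELECT statements show that the data from the dropped
EMP_MID partition is now merged into the EMP_HIGH partition: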
SQL> select empno from emp partition (emp_low);
EMPNO
----------
7369
7499
7521
7566
4 rows selected.
SQL> select empno from emp partition (emp_high);
EMPNO
----------
7900
7902
7934
7654
7698
7782
7788
7839
7844
7876
10 rows selected.
Example 3: Repartitioning a Table on a Different Column
This example assumes the EMP table has two partitions, based on the
EMPNO column, as shown in Example 2. This example repartitions the EMP
table on the DEPTNO column.
Perform the following steps to repartition a table on a different column:
1. Export the table to save the data.
2. Drop the table from the database.
3. Create the table again with the new partitions.
4. Import the table data.
The following example shows how to repartition a table on a different
column:
% exp scott/tiger tables=emp file=empexp.dat
.
. . exporting table EMP
. . exporting partition EMP_LOW 4 rows exported
. . exporting partition EMP_HIGH 10 rows exported
SQL> drop table emp cascade constraints;
Table dropped.
SQL>
SQL> create table emp
2 (empno number(4) not null,
3 ename varchar2(10),
4 job varchar2(9),
5 mgr number(4),
6 hiredate date,
7 sal number(7,2),
8 comm number(7,2),
9 deptno number(2) )
10 partition by range (deptno)
11 (
12 partition dept_low values less than (15)
13 tablespace tbs_d1,
14 partition dept_mid values less than (25)
15 tablespace tbs_d2,
16 partition dept_high values less than (35)
17 tablespace tbs_d3
18 );
Table created.
SQL> exit
% imp scott/tiger tables=emp file=empexp.dat ignore=y
.
. . importing partition "EMP":"EMP_LOW" 4 rows imported
. . importing partition "EMP":"EMP_HIGH" 10 rows imported
Import terminated successfully without warnings.
The following SELECT statements show that the data is partitioned on the
DEPTNO column:
SQL> select empno, deptno from emp partition (dept_low);
EMPNO DEPTNO
---------- ----------
7934 10
7782 10
7839 10
3 rows selected.
SQL> select empno, deptno from emp partition (dept_mid);
EMPNO DEPTNO
---------- ----------
7369 20
7566 20
7902 20
7788 20
7876 20
5 rows selected.
SQL> select empno, deptno from emp partition (dept_high);
EMPNO DEPTNO
---------- ----------
7499 30
7521 30
7900 30
7654 30
7698 30
7844 30
6 rows selected.
B) Populating the Database Using Parallel Load
This section presents a case study (of an actual system) that illustrates how
to create, load, index, and analyze a large data warehouse fact table with
partitions, in a typical star schema. This example uses SQL*Loader to
explicitly stripe data over 30 disks.
* The example 120 GB table is named FACTS.
* The system is a 10-CPU shared memory computer with more than 100 disk
drives.
* Thirty disks (4 GB each) will be used for base table data, 10 disks for index, and
30 disks for temporary space. Additional disks are needed for rollback segments,
control files, log files, possible staging area for loader flat files, and so on.
* The FACTS table is partitioned by month into 12 logical partitions.
To facilitate backup and recovery, each partition is stored in its own
tablespace.
* Each partition is spread evenly over 10 disks, so that a scan which accesses few
partitions, or a single partition, can still proceed with full parallelism. Thus there
can be intra-partition parallelism when queries restrict data access by partition
pruning.
* Each disk has been further subdivided using an OS utility into 4 OS files with
names like /dev/D1.1, /dev/D1.2, ... , /dev/D30.4.
* Four tablespaces are allocated on each group of 10 disks. To better balance I/O
and parallelize tablespace creation (because Oracle writes each block in a datafile
when it is added to a tablespace), it is best if each of the four tablespaces on each
group of 10 disks has its first datafile on a different disk. Thus the first tablespace
has /dev/D1.1 as its first datafile, the second tablespace has /dev/D4.2 as its first
datafile, and so on.
Step 1: Create the Tablespaces and Add Datafiles in Parallel
Below is the command to create a tablespace named "Tsfacts1". Other
tablespaces are created with analogous commands. On a 10-CPU machine, it
should be possible to run all 12 CREATE TABLESPACE commands
together.
Alternatively, it might be better to run them in two batches of 6 (two from
each of the three groups of disks).
CREATE TABLESPACE Tsfacts1
DATAFILE '/dev/D1.1' SIZE 1024M REUSE,
...
'/dev/D10.1' SIZE 1024M REUSE
DEFAULT STORAGE (INITIAL 100M NEXT 100M PCTINCREASE 0);
...

Extent sizes in the STORAGE clause should be multiples of the multiblock
read size, where
DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT = multiblock read size
Note that INITIAL and NEXT should normally be set to the same value. In the case of
parallel load, make the extent size large enough to keep the number of extents reasonable,
and to avoid excessive overhead and serialization due to bottlenecks in the data dictionary.
When PARALLEL=TRUE is used for parallel loader, the INITIAL extent is not used. In
this case you can override the INITIAL extent size specified in the tablespace default
storage clause with the value that you specify in the loader control file (for example,
64K).
Tables or indexes can have an unlimited number of extents provided you have set the
COMPATIBLE system parameter and use the MAXEXTENTS keyword on the CREATE
or ALTER command for the tablespace or object. In practice, however, a limit of 10,000
extents per object is reasonable. Because a table or index can have an unlimited number
of extents, set the PCTINCREASE parameter to zero so that all extents are of equal
size.
Note: It is not desirable to allocate extents faster than about 2 or 3 per minute, because
each allocation must acquire the ST enqueue, which is also used for temporary segments
and sorting. Thus, each process should get an extent that will last for 3 to 5 minutes.
Normally such an extent is at least 50MB for a large object. Too small an extent size
incurs a lot of overhead, which affects the performance and scalability of parallel
operations. The largest possible extent size for a 4GB disk evenly divided into 4
partitions is 1GB. 100MB extents should work nicely. Each partition will then have
about 100 extents. The default storage parameters can be customized for each object
created in the tablespace, if needed.
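The extent arithmetic above can be checked with a few lines of code. This is a minimal sketch: the 8 KB block size and the read count of 16 are assumed values chosen for illustration (the text does not give them), and the function names are hypothetical.

```python
MB = 1024 * 1024


def multiblock_read_size(block_size, read_count):
    """Bytes read by one multiblock I/O: block size times read count."""
    return block_size * read_count


def round_up_to_multiple(size, unit):
    """Smallest multiple of unit that is greater than or equal to size."""
    return -(-size // unit) * unit


def extents_per_partition(datafiles, datafile_mb, extent_mb):
    """Whole extents that fit in one partition's tablespace."""
    return (datafiles * datafile_mb) // extent_mb


# Assumed settings, not taken from the text: 8 KB blocks, read count of 16.
read_size = multiblock_read_size(8192, 16)               # 131072 bytes (128 KB)
print(round_up_to_multiple(100 * MB, read_size) // MB)   # 100
print(extents_per_partition(10, 1024, 100))              # 102
```

With these assumed settings, 100MB is already a multiple of the multiblock read size, and ten 1024MB datafiles hold 102 whole extents, in line with the figure of about 100 extents per partition.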
Step 2: Create the Partitioned Table
We create a partitioned table with 12 partitions, each in its own
tablespace. The table contains multiple dimensions and multiple measures.
The partitioning column is named "dim_2" and is a date. There are other
columns as well.
CREATE TABLE fact (dim_1 NUMBER, dim_2 DATE, ...
meas_1 NUMBER, meas_2 NUMBER, ... )
PARALLEL
PARTITION BY RANGE (dim_2)
(PARTITION jan95 VALUES LESS THAN (TO_DATE('02-01-1995', 'MM-DD-YYYY')) TABLESPACE TSfacts1,
PARTITION feb95 VALUES LESS THAN (TO_DATE('03-01-1995', 'MM-DD-YYYY')) TABLESPACE TSfacts2,
...
PARTITION dec95 VALUES LESS THAN (TO_DATE('01-01-1996', 'MM-DD-YYYY')) TABLESPACE TSfacts12)
;
Step 3: Load the Partitions in Parallel
This section describes four alternative approaches to loading partitions in
parallel.
The different approaches to loading help you manage the ramifications of the
PARALLEL=TRUE keyword of SQL*Loader, which controls whether or not
individual partitions are loaded in parallel. The PARALLEL keyword entails
restrictions such as the following:
* Indexes cannot be defined.
* You need to set a small initial extent, because each loader session gets a
new extent when it begins, and it doesn't use any existing space associated
with the object.
* Space fragmentation issues arise.
However, regardless of the setting of this keyword, if you have one loader
process per partition, you are still effectively loading into the table in parallel.
Case 1
In this approach, assume 12 input files that are partitioned in the same way as
your table. The DBA has 1 input file per partition of the table to be loaded.
The DBA starts 12 SQL*Loader sessions in parallel, entering statements like
these:
SQLLDR DATA=jan95.dat DIRECT=TRUE CONTROL=jan95.ctl
SQLLDR DATA=feb95.dat DIRECT=TRUE CONTROL=feb95.ctl
. . .
SQLLDR DATA=dec95.dat DIRECT=TRUE CONTROL=dec95.ctl
Note that the keyword PARALLEL=TRUE is not set. A separate control file
per partition is necessary because the control file must specify the partition
into which the loading should be done. It contains a statement such as:
INTO TABLE fact PARTITION (jan95)
An advantage of this approach is that local indexes are maintained by
SQL*Loader. You still get parallel loading, but at the partition level, without
the restrictions of the PARALLEL keyword.
A disadvantage is that you must partition the input manually.
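The manual partitioning of the input that Case 1 requires could be scripted. The sketch below is one possible approach under stated assumptions: the input records are comma-separated, the second field is dim_2 in MM-DD-YYYY form, and the output names follow the jan95 style used above. The real flat-file layout is not given in this guide, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def partition_name(day):
    """Name of the monthly partition, in the jan95/feb95 style."""
    return f"{MONTHS[day.month - 1]}{day.year % 100:02d}"


def split_by_partition(lines):
    """Group input records by the partition their dim_2 date falls in.

    Assumes comma-separated records whose second field is dim_2 in
    MM-DD-YYYY form (an assumption; the guide does not show the layout).
    """
    buckets = defaultdict(list)
    for line in lines:
        dim_2 = datetime.strptime(line.split(",")[1], "%m-%d-%Y")
        buckets[partition_name(dim_2)].append(line)
    return dict(buckets)


sample = ["1,01-15-1995,10,20", "2,02-01-1995,11,21", "3,01-31-1995,12,22"]
for name, records in split_by_partition(sample).items():
    print(f"{name}.dat: {len(records)} record(s)")
```

Each bucket would then be written to its own file (jan95.dat, feb95.dat, and so on) and loaded by the matching SQL*Loader session.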
Case 2
In another common approach, assume an arbitrary number of input files that
are not partitioned in the same way as the table. The DBA can adopt a
strategy of performing parallel load for each input file individually. Thus if
there are 7 input files, the DBA can start 7 SQL*Loader sessions, using
statements like the following:
SQLLDR DATA=file1.dat DIRECT=TRUE PARALLEL=TRUE
Oracle will partition the input data so that it goes into the correct
partitions. In this case all the loader sessions can share the same control file,
so there is no need to mention it in the statement.
The keyword PARALLEL=TRUE must be used because each of the 7 loader
sessions can write into every partition. (In case 1, every loader session would
write into only 1 partition, because the data was already partitioned outside
Oracle.) Hence all the PARALLEL keyword restrictions are in effect.
In this case Oracle attempts to spread the data evenly across all the files in
each of the 12 tablespaces; however, an even spread of data is not guaranteed.
Moreover, there could be I/O contention during the load when the loader
processes attempt to write to the same device simultaneously.
Case 3
In Case 3 (illustrated in the example), the DBA wants precise control of the
load. To achieve this the DBA must partition the input data in the same way
as the datafiles are partitioned in Oracle.
This example uses 10 processes loading into 30 disks. To accomplish this, the
DBA must split the input into 120 files beforehand. The 10 processes will
load the first partition in parallel on the first 10 disks, then the second
partition in parallel on the second 10 disks, and so on through the 12th
partition. The DBA runs the following commands concurrently as background
processes:
SQLLDR DATA=jan95.file1.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D1.1
...
SQLLDR DATA=jan95.file10.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D10.1
WAIT;
...
SQLLDR DATA=dec95.file1.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D30.4
...
SQLLDR DATA=dec95.file10.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D29.4
For Oracle Parallel Server, divide the loader sessions evenly among the nodes.
The datafile being read should always reside on the same node as the loader
session. NFS mount of the data file on a remote node is not an optimal
approach.
The keyword PARALLEL=TRUE must be used, because multiple loader
sessions can write into the same partition. Hence all the restrictions entailed
by the PARALLEL keyword are in effect. An advantage of this approach,
however, is that it guarantees that all of the data will be precisely balanced,
exactly reflecting your partitioning.
Note: Although this example shows parallel load used with partitioned tables,
the two features can be used independent of one another.
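The command lines for one batch in Case 3 follow a regular pattern and could be generated rather than typed. This is a sketch only: it assumes the January datafiles are /dev/D1.1 through /dev/D10.1 in order (the text shows only the first and last), and the name build_batch is hypothetical. It prints the commands instead of running them.

```python
def build_batch(partition, datafiles):
    """Return one SQLLDR command line per (input file, datafile) pair."""
    return [
        f"SQLLDR DATA={partition}.file{i}.dat DIRECT=TRUE PARALLEL=TRUE FILE={datafile}"
        for i, datafile in enumerate(datafiles, start=1)
    ]


# Assumption: the January datafiles are /dev/D1.1 .. /dev/D10.1 in order;
# the guide shows only the first and the last of them.
jan_files = [f"/dev/D{disk}.1" for disk in range(1, 11)]
jan_batch = build_batch("jan95", jan_files)

print(jan_batch[0])
print(jan_batch[-1])
print(12 * len(jan_batch))  # 120 input files across the 12 partitions
```

Each batch of 10 commands would be started concurrently and allowed to finish before the next partition's batch begins, which is the role of the WAIT step in the listing above.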
Case 4
For this approach, all of your partitions must be in the same tablespace. You
need to have the same number of input files as datafiles in the tablespace, but
you do not need to partition the input the same way in which the table is
partitioned.
For example, if all 30 devices were in the same tablespace, then you would
arbitrarily partition your input data into 30 files, then start 30 SQL*Loader
sessions in parallel. The statement starting up the first session would be like
the following:
SQLLDR DATA=file1.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D1
. . .
SQLLDR DATA=file30.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D30
The advantage of this approach is that, as in Case 3, you have control over the
exact placement of datafiles, because you use the FILE keyword. However,
you are not required to partition the input data by value: Oracle does that.
A disadvantage is that this approach requires all the partitions to be in the
same tablespace; this reduces availability.
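Because Case 4 lets Oracle route rows to partitions, the input can be split without looking at the data values. One simple way to do this, shown here as an assumption rather than a prescribed method, is to deal records round-robin into as many buckets as there are datafiles. The function name is hypothetical.

```python
def split_round_robin(records, n_files):
    """Deal records into n_files buckets in turn, ignoring their values."""
    buckets = [[] for _ in range(n_files)]
    for index, record in enumerate(records):
        buckets[index % n_files].append(record)
    return buckets


# 100 dummy records dealt into 30 buckets, one per SQL*Loader session.
buckets = split_round_robin(range(100), 30)
sizes = [len(b) for b in buckets]
print(min(sizes), max(sizes))  # 3 4
```

The bucket sizes differ by at most one record, so each of the 30 loader sessions receives nearly the same amount of work.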
