
##################################################################################
#  Topic Name   : UNIXORACLE                                                     #
#  Author Name  : Aalok Dixit (Oracle DBA)                                       #
#  Creation date: 16-11-2001                                                     #
##################################################################################

Purpose
=======
The following table documents Unix kernel parameters that should be monitored,
and possibly increased, after changes are made to the related init.ora
parameter. Check your operating system documentation for specific details on
changing these parameters.

Init.ora Parameter          Unix Kernel Parameter
------------------          ---------------------
db_block_buffers            shmmax
db_files (maxdatafiles)     nfile, maxfiles
large_pool_size             shmmax
log_buffer                  shmmax
processes                   nproc, semmsl, semmns
shared_pool_size            shmmax

Common Unix Kernel Parameter Definitions
========================================
The following kernel parameters tend to be generic across most Unix platforms;
however, their names may differ on your platform. Consult your Installation and
Configuration Guide (ICG) for the exact names.

maxfiles - Soft file limit per process.
maxuprc  - Maximum number of simultaneous user processes per userid.
nfile    - Maximum number of simultaneously open files systemwide at any
           given time.
nproc    - Maximum number of processes that can exist simultaneously in the
           system.
shmmax   - The maximum size (in bytes) of a single shared memory segment.
shmmin   - The minimum size (in bytes) of a single shared memory segment.
shmmni   - The number of shared memory identifiers.
shmseg   - The maximum number of shared memory segments that can be attached
           by a process.
semmns   - The number of semaphores in the system.
semmni   - The number of semaphore set identifiers in the system; determines
           the number of semaphore sets that can be created at any one time.
semmsl   - The maximum number of semaphores that can be in one semaphore set.
           It should be the same as the maximum number of Oracle processes.

References: Note:1010913.6 - Unix Configuration Parameters: Where to Set
Semaphores and Shared Memory

Problem Description:
====================
This entry covers the Unix configuration parameters, and where to set
semaphores and shared memory, for various Unix platforms.

Search Words: kernel, tuning, SHMMAX, SHMMIN, SHMMNI, SHMSEG, SEMMNS, SEMMNI,
SEMMSL, file

Unix Version                    Kernel Configuration File
=========================================================
ATT 3000 SVR4/386               /etc/conf/cf.d/stune
Data General 88K (Motorola)     /usr/include/sys/param.h
DEC Alpha OSF/1                 /usr/sys/include/sys/param.h
DEC RISC Ultrix                 /usr/sys/conf/mips/[KNLNAME]
DG Aviion                       /var/Build/system.aviion
HP 9000/3xx                     /etc/conf/dfile
HP 9000/8xx V9.0.x              /etc/conf/gen/S800
HP 9000/8xx V10.x               /stand/system
IBM RS/6000 AIX                 automatically configured
ISC Unix                        /etc/conf/cf.d/stune
Olivetti SVR4 v2.x              /etc/conf/cf.d/stune
Pyramid OSX                     /usr/sys/kernel/[KNLNAME]
SCO Unix                        /etc/conf/cf.d/stune
Sequent Dynix                   /usr/sys/conf/[KNLNAME]
Sequent Dynix/ptx               /usr/conf/uts/symmetry/site.[KNLNAME]
Silicon Graphics V4.x           /usr/var/sysgen/mtune/shm
Silicon Graphics V5.x           /var/sysgen/mtune/shm
Silicon Graphics IRIX v5.x      /usr/var/sysgen/stune
Solbourne                       /usr/sys/kbus/conf/[KNLNAME]
Sun Solaris                     /etc/system
Sun SunOS 4c/Sparc              /usr/kvm/sys/sun4c/conf/[KNLNAME]
Unisys SVR4/386                 /etc/conf/cf.d/stune

Note:1008866.6 - How to Determine SGA Size (7.x, 8.0, 8i)

PURPOSE
The following explains how to approximate the size of the SGA (System Global
Area).

SCOPE & APPLICATION
It is very difficult and time consuming to calculate the exact SGA size from
the values of init.ora parameters. It is difficult because of the
port-specific sizes of the data structures allocated in the SGA, and time
consuming because so many parameters influence the SGA size. For example, any
parameter that configures a number of resources, such as PROCESSES and
SESSIONS, has an impact on the SGA size.

This article concentrates on:
 - Showing the size of the SGA once connected to a running database.
 - Presenting a brief overview of the different sub-divisions of the SGA.
 - How to ESTIMATE the size of the SGA based on values of init.ora parameters.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How to Approximate the Size of the SGA in 8.0.X, 8i, and 9i:
============================================================
This section discusses Oracle8, Oracle8i, and Oracle9i. Oracle7 is discussed
at the end of this note.

Showing the size of the SGA
---------------------------
SGA size information is displayed upon startup of the database. It can also
be displayed using svrmgrl or sqlplus. See the examples below.

8.0.X -
  svrmgrl
  connect internal
  show sga

8.1.X -
  svrmgrl or sqlplus /nolog
  connect internal
  show sga

9.X -
  sqlplus

  SQL*Plus: Release 9.0.1.0.0 - Production on Thu Aug 23 15:40:29 2001
  (c) Copyright 2001 Oracle Corporation. All rights reserved.

  Enter user-name: sys as sysdba
  Enter password:

  Connected to:
  Oracle9i Enterprise Edition Release 9.0.1.0.0 - Production
  With the Partitioning option
  JServer Release 9.0.1.0.0 - Production

  SQL> show sga

  Total System Global Area   72123504 bytes
  Fixed Size                   279664 bytes
  Variable Size              67108864 bytes
  Database Buffers            4194304 bytes
  Redo Buffers                 540672 bytes

Different sub-divisions of the SGA
----------------------------------
Sample from svrmgrl SHOW SGA:

  Total System Global Area   23460696 bytes
  Fixed Size                    72536 bytes
  Variable Size              22900736 bytes
  Database Buffers             409600 bytes
  Redo Buffers                  77824 bytes

Total System Global Area
 - Total in bytes of all the sub-divisions that make up the SGA.

Fixed Size
 - Contains general information about the state of the database and the
   instance, which the background processes need to access.
 - No user data is stored here.
 - This area is usually less than 100K in size.

Variable Size
 - This section is influenced by the following init.ora parameters:
     shared_pool_size
     large_pool_size
     java_pool_size
 - See the 'Approximating size of the SGA' section of this article for
   version-specific information.

Database Buffers
 - Holds copies of data blocks read from datafiles.
 - size = db_block_buffers * block size

Redo Buffers
 - A circular buffer in the SGA that holds information about changes made to
   the database.
 - The enforced minimum is 4 times the maximum database block size for the
   host operating system.

Approximating size of the SGA
-----------------------------
8.0.X

To approximate the size of the SGA (Shared Global Area), use the following

formula:

  (db_block_buffers * block size) + shared_pool_size + large_pool_size
    + log_buffers + 1MB

8.1.X

To approximate the size of the SGA (Shared Global Area), use the following
formula:

  (db_block_buffers * block size) + shared_pool_size + large_pool_size
    + java_pool_size + log_buffers + 1MB

9.X

In Oracle9i, the SGA can be configured, as in prior releases, to be static,
or it can now be dynamically configured. The size of the dynamic SGA is
determined by the values of the following database initialization parameters:
DB_BLOCK_SIZE, DB_CACHE_SIZE, SHARED_POOL_SIZE, and LOG_BUFFER.

Beginning with Oracle9i, the SGA infrastructure is dynamic. This means that
the following primary parameters used to size the SGA can be changed while
the instance is running:

  Buffer cache (DB_CACHE_SIZE) -- the size in bytes of the cache of standard
  blocks.

  Shared pool (SHARED_POOL_SIZE) -- the size in bytes of the area devoted to
  shared SQL and PL/SQL statements.

  Large pool (LARGE_POOL_SIZE, default 0 bytes) -- the size in bytes of the
  large pool, used in shared server systems for session memory, by parallel
  execution for message buffers, and by backup and restore processes for
  disk I/O buffers.

The LOG_BUFFER parameter is used when buffering redo entries to a redo log.
It is a static parameter, represents a very small portion of the SGA, and
can be changed only by stopping and restarting the database to re-read the
value from the initialization parameter file (init.ora).

Note that even though you cannot change the MAX_SGA_SIZE parameter value
dynamically, you do have the option of changing any of its three dependent
primary parameters -- DB_CACHE_SIZE, SHARED_POOL_SIZE, and LARGE_POOL_SIZE
-- to make memory tuning adjustments on the fly. (NOTE: LARGE_POOL_SIZE
cannot be dynamically changed in Oracle 9.0.1; it is anticipated to be made
dynamic in the next release.)
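As a sketch, the 8.1.X formula above reduces to a simple sum. All parameter values in this example are hypothetical, not recommendations; for 8.0.X simply pass 0 for java_pool_size.

```python
# Rough 8.1.X SGA estimate per the formula above:
#   (db_block_buffers * block size) + shared_pool_size + large_pool_size
#     + java_pool_size + log_buffers + ~1MB
# The final 1MB term is the fixed-size/overhead allowance from the formula.
def approx_sga_81x(db_block_buffers, block_size, shared_pool_size,
                   large_pool_size, java_pool_size, log_buffers):
    overhead = 1024 * 1024
    return (db_block_buffers * block_size + shared_pool_size
            + large_pool_size + java_pool_size + log_buffers + overhead)

# Hypothetical example: 2000 8K buffers, 64MB shared pool, the 8.1.5
# default java pool (20000K), and a 512K log buffer.
print(approx_sga_81x(2000, 8192, 64 * 1024 * 1024, 0, 20000 * 1024,
                     512 * 1024))
```

This is only an estimate, for the reasons given above: the true variable-portion size is port specific.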

To help you specify an optimal cache value, you can use the dynamic
DB_CACHE_ADVICE parameter with statistics gathering enabled to predict
behavior with different cache sizes through the V$DB_CACHE_ADVICE
performance view. Use the ALTER SYSTEM ... SET statement to enable this
parameter. See the Oracle9i Database Performance Guide and Reference for
more information about using this parameter.

Beginning with Oracle9i, there is a concept of creating tablespaces with
multiple block sizes and specifying cache sizes corresponding to each block
size. The SYSTEM tablespace uses a standard block size, and additional
tablespaces can use up to four non-standard block sizes.

The standard block size is specified by the DB_BLOCK_SIZE parameter. Its
cache size is specified by the DB_CACHE_SIZE parameter. Non-standard block
sizes are specified by the BLOCKSIZE clause of the CREATE TABLESPACE
statement. The cache size for each corresponding non-standard block size is
specified using the DB_nK_CACHE_SIZE parameter, where n is 2, 4, 8, 16, or
32 Kbytes.

The standard block size, known as the default block size, is usually set to
the same size in bytes as the operating system block size, or a multiple of
this size. The DB_CACHE_SIZE parameter, known as the DEFAULT cache size,
specifies the size of the cache of standard block size (default is 48M
bytes). The SYSTEM tablespace uses the standard block size and the DEFAULT
cache size. Either the standard block size or any of the non-standard block
sizes, with their associated cache sizes, can be used for any of the other
tablespaces.

If you intend to use multiple block sizes in your database storage design,
you must specify at least the DB_CACHE_SIZE and one DB_nK_CACHE_SIZE
parameter value, and you must specify sub-caches for all the other
non-standard block sizes that you intend to use. This block size/cache
sizing scheme lets you use up to four different non-standard block sizes for
your tablespaces and lets you specify respective cache sizes for each
corresponding block size.

Because the DB_BLOCK_SIZE parameter value can be changed only by re-creating
the database, the value for this parameter must be chosen carefully and
remain unchanged for the life of the database.
To approximate the size of the SGA (Shared Global Area), use the following
formula:

  DB_CACHE_SIZE + DB_KEEP_CACHE_SIZE + DB_RECYCLE_CACHE_SIZE
    + DB_nK_CACHE_SIZE (each one configured)
    + SHARED_POOL_SIZE + LARGE_POOL_SIZE + JAVA_POOL_SIZE
    + LOG_BUFFER + 1MB

NOTE: Add in each DB_nK_CACHE_SIZE. There can be up to 4 DB_nK_CACHE_SIZE
values (2, 4, 8, 16, 32K) defined; one of the block sizes is the default
block size, and its cache size is defined by DB_CACHE_SIZE.
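A sketch of the 9i formula, where the nk_caches list holds whichever DB_nK_CACHE_SIZE values are configured. The example values are hypothetical.

```python
# 9i SGA estimate per the formula above: sum all configured caches and
# pools, plus the log buffer and ~1MB of overhead.
def approx_sga_9i(db_cache, keep_cache, recycle_cache, nk_caches,
                  shared_pool, large_pool, java_pool, log_buffer):
    overhead = 1024 * 1024
    return (db_cache + keep_cache + recycle_cache + sum(nk_caches)
            + shared_pool + large_pool + java_pool + log_buffer + overhead)

# Hypothetical example: 48M default cache, one 16M DB_16K_CACHE_SIZE,
# 64M shared pool, 512K log buffer.
print(approx_sga_9i(48 * 2**20, 0, 0, [16 * 2**20],
                    64 * 2**20, 0, 0, 512 * 1024))
```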

Additional Information:
-----------------------
- Redo Buffers in SHOW SGA does not match the init.ora log_buffer parameter
  setting. The enforced minimum is 4 times the maximum database block size
  for the host operating system. For more details, see:
  <Note 30753.1> Init.ora Parameter "LOG_BUFFER" Reference Note
- Java_pool_size is not accounted for in SHOW SGA or v$sga. This is a bug
  that is fixed in 8.1.6.
- Java_pool_size restrictions in 8.1.5: the default is 20000K. If specified
  in the init.ora, it must be greater than 1000K, or you will receive an
  ORA-01078 "failure in processing initialization parameters" error on
  startup.
- Java_pool_size restrictions in 8.1.6: the default is 20000K. This
  parameter can be set in the init.ora, but the enforced minimum is 32768.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Approximating SGA Size and Showing Existing SGA in Oracle7:
===========================================================
To approximate the size of the SGA (Shared Global Area), use the following
formula:

  ( (db_block_buffers * block size) + shared_pool_size + log_buffers ) / .9

Example (from 7.0.16 on PORT 2 HP-UX 9000), from the init<SID>.ora:

  DB_BLOCK_BUFFERS = 200
  LOG_BUFFERS      = 8192
  SHARED_POOL_SIZE = 3500000
  Default Block Size = 2048 bytes (the block size is an operating
  system-specific default)

  ( (200 * 2048) + 3500000 + 8192 ) / .9
  = ( 409600 + 3500000 + 8192 ) / .9
  = 3917792 / .9
  = 4,353,102 bytes, or about 4M

The division by .9 is used to take into account the variable portion of the
SGA -- this is only an approximation of the actual value. Our calculation
comes to 4353102 bytes, but the actual value is 4504072 (see below).

To check the actual size of the SGA, issue these commands using either
sqldba or svrmgrl:

7.0.X - 7.2.X
  % sqldba lmode=y
  SQLDBA> connect internal
  SQLDBA> show sga

7.1.X - 7.3.X
  % svrmgrl
  SVRMGR> connect internal
  SVRMGR> show sga

Example of Output:
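The Oracle7 worked example above can be checked mechanically; this just reproduces the arithmetic, with the /.9 padding the estimate roughly 10% for the variable portion.

```python
# Oracle7 SGA estimate:
#   (db_block_buffers * block size + shared_pool_size + log_buffers) / .9
def approx_sga_v7(db_block_buffers, block_size, shared_pool_size,
                  log_buffers):
    return (db_block_buffers * block_size
            + shared_pool_size + log_buffers) / 0.9

# The 7.0.16 example: 200 2K buffers, 3500000-byte shared pool, 8192-byte
# log buffer.
est = approx_sga_v7(200, 2048, 3500000, 8192)
print(int(est))  # matches the 4,353,102-byte estimate in the example
```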

  Total System Global Area    4504072 bytes  <-- total size loaded into memory
  Fixed Size                    37704 bytes
  Variable Size               4048576 bytes
  Database Buffers             409600 bytes
  Redo Buffers                   8192 bytes  ('log buffers')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
References:
===========
[NOTE:30753.1] PARAMETER: INIT.ORA: LOG_BUFFER
[NOTE:1058897.6] WHAT DO V$SGASTAT AND V$SGA INDICATE AND HOW DO THEY RELATE?

Note:1012819.6 - Operating System Tuning Issues on Unix

Problem Description:
====================
This document discusses operating system performance issues on Unix.
Subjects particularly addressed include memory management and tuning your
SGA and OS kernel parameters.

Solution Description:
=====================
Operating System Performance Issues on Unix

The shared memory feature of the Unix operating system is required by
Oracle. The System Global Area (SGA) resides in shared memory; therefore,
enough shared memory must be available to each Oracle process to address the
entire SGA. Refer to PRE 1008866.6 SGA SIZE AND CONFIGURATION for details on
how to calculate the size of your SGA.

If you create a large SGA and a large portion of your machine's physical
memory is devoted to shared memory, it can result in poor performance.
However, there are also performance benefits to having your entire SGA
located in shared memory. Therefore, when sizing your SGA, you need to
balance Oracle's performance needs against the demands your machine and
operating system can meet without suffering performance degradation. For
information on recommended Unix shared memory kernel parameters, see
PRE 1011658.6 SHARED MEMORY REQUIREMENTS ON UNIX.

As a general rule, the total size of the SGA (or SGAs, if you have more than
one Oracle instance) on a machine should not exceed RAM minus the memory the
operating system is using. If you are running other programs and
applications, then you must also take into account the additional memory
they require.

Note:1011658.6 - Shared Memory Requirements on Unix

PURPOSE
Oracle uses shared memory and semaphores to communicate between processes
and the SGA (System Global Area).
There are certain requirements for shared memory and the semaphores. When
the Oracle instance comes up, it allocates a certain portion of main memory
to create the SGA. If the shared memory or semaphore parameters are not set
properly, startup fails with an error related to shared memory or
semaphores. The following are the recommended values for shared memory and
semaphores for running a SMALL size Oracle database. These values are set at
the Unix kernel level.

SCOPE & APPLICATION
This entry lists shared memory requirements for Unix systems.

Shared Memory Requirements on Unix:
===================================
The shared memory feature of the UNIX operating system is required by
Oracle. The System Global Area (SGA) resides in shared memory; therefore,
shared memory must be available to each Oracle process to address the entire
SGA.

Definitions of Shared Memory and Semaphore Parameters

SHMMAX = The maximum size (in bytes) of a single shared memory segment.
SHMMIN = The minimum size (in bytes) of a single shared memory segment.
SHMMNI = The number of shared memory identifiers.
SHMSEG = The maximum number of shared memory segments that can be attached
         by a process.
SEMMNS = The number of semaphores in the system.
SEMMNI = The number of semaphore set identifiers in the system; determines
         the number of semaphore sets that can be created at any one time.
SEMMSL = The maximum number of semaphores that can be in one semaphore set.
         It should be the same as the maximum number of Oracle processes
         (the PROCESSES parameter in the init.ora file).

Recommended Semaphore and Shared Memory Parameters

Operating System   Shared Memory Parameters   Semaphore Parameters
----------------   ------------------------   --------------------
Sun OS             SHMSIZE = 32768            SEMMNS = 200
                   SHMMNI  = 50               SEMMNI = 50

TECH: Unix Semaphores and Shared Memory Explained
Type: BULLETIN   Status: PUBLISHED   Content Type: TEXT/PLAIN
Creation Date: 06-OCT-1994   Last Revision Date: 26-APR-2001

PURPOSE
Shared memory and semaphores are two important resources for an Oracle
instance on Unix. An instance cannot start if it is unable to allocate what
it needs. This paper primarily discusses the process Oracle goes through to

allocate shared memory and semaphores at instance startup. Other important
points unrelated to startup, as well as some troubleshooting information,
will also be touched upon.

SCOPE & APPLICATION
Understanding Oracle and shared memory/semaphores.

Unix Semaphores and Shared Memory Explained
===========================================

General
=======
Shared memory is exactly that - a memory region that can be shared between
different processes. Oracle uses shared memory for implementing the SGA,
which needs to be visible to all database sessions. Shared memory is also
used in the implementation of the SQL*Net V1 Fast driver, as a means of
communicating between the application and the shadow process. On the
RS/6000, each shadow process stores its PGA in a shared memory segment
(however, only the shadow attaches this segment). In the latter two cases,
Oracle allocates the shared memory dynamically, as opposed to the allocation
of the SGA, which occurs at instance startup. This dynamic allocation will
not be discussed in this paper.

Semaphores can be thought of as flags (hence their name, semaphores). They
are either on or off. A process can turn a flag on or turn it off. If the
flag is already on, processes that try to turn on the flag will sleep until
the flag is off. Upon awakening, the process will reattempt to turn the flag
on, possibly succeeding or possibly sleeping again. Such behaviour allows
semaphores to be used to implement a post-wait driver - a system where
processes can wait for events (i.e. wait on turning on a semaphore) and post
events (i.e. turning off a semaphore). This mechanism is used by Oracle to
maintain concurrency control over the SGA, since it is writeable by all
attached processes. Also, for the same reasons, use of the Fast driver
requires additional semaphores; however, these are allocated dynamically
instead of at instance startup, and will not be discussed in this paper.

Instance startup
================
On instance startup, the first things that the instance does are:

 - Read the "init<SID>.ora"
 - Start the background processes
 - Allocate the shared memory and semaphores required

The size of the SGA will be calculated from various "init.ora" parameters.
This will be the amount of shared memory required. The SGA is broken into 4
sections: the fixed portion, which is constant in size; the variable
portion, which varies in size depending on "init.ora" parameters; the redo
block buffer, whose size is controlled by log_buffers; and the db block
buffer, whose size is controlled by db_block_buffers. The size of the SGA is
the sum of the sizes of the 4 portions.

There is unfortunately no simple formula for determining the size of the
variable portion. Generally, the shared pool dominates all other parts of
the variable portion, so as a rule of thumb one can estimate its size as the
value of shared_pool_size (in v6, one can ignore the size of the variable
portion).

The number of semaphores required is much simpler to determine: Oracle needs
exactly as many semaphores as the value of the processes "init.ora"
parameter.

Note that the recommended kernel parameter values in the ICG are enough to
support the default database (4M SGA, 50 processes), but may be insufficient
to run a larger instance. With the above estimations and the information
which follows, a DBA should be able to build a kernel with appropriate
settings to support the instance.

Shared memory allocation
========================
Oracle has 3 possible models for the SGA: one-segment, contiguous
multi-segment, and non-contiguous multi-segment. When attempting to allocate
and attach shared memory for the SGA, Oracle tries each one, in the above
order, until one succeeds or raises an ORA error. On other, non-fatal,
errors, Oracle simply cleans up and tries again using the next memory model.
The entire SGA must fit into shared memory, so the total amount of shared
memory allocated under any model will be equal to the size of the SGA. This
calculated value will be referred to below as SGASIZE.

The one-segment model is the simplest and the first tried. In this model,
the SGA resides in a single shared memory segment. Oracle attempts to
allocate and attach one shared memory segment of size equal to the total
size of the SGA. However, if SGASIZE is larger than the configured SHMMAX,
this will obviously fail (with EINVAL).
In this case, the SGA will need to be placed in multiple shared memory
segments, and Oracle proceeds to the next memory model for the SGA. If an
error other than EINVAL occurs when allocating the shared memory with
shmget(), Oracle will raise an ORA-7306. If the segment was received (i.e.
if SHMMAX > SGASIZE), Oracle attempts to attach it at the start address
defined in ksms.o. An error on the attach will raise an ORA-7307.

With multiple segments there are two possibilities: the segments can be
attached contiguously, so that they appear to be one large shared memory
segment, or non-contiguously, with gaps between the segments. The former
wastes less space that could be used for the stack or heap, but depending on
the alignment requirements for shared memory (defined by SHMLBA in the
kernel), it may not be possible.

At this point, Oracle needs to determine SHMMAX so it can determine how many
segments will be required. This is done via a binary search over the range
[1..SGASIZE] (since Oracle is trying this model and not the one-segment
model, it must be that SHMMAX < SGASIZE). The value of SHMMAX found will
then be rounded to an even page size (on some machines, possibly to an even
2 or 4 page block).
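The binary search described above can be sketched as follows. Here shmget_ok is a hypothetical stand-in for probing the kernel with a real shmget() call, and rounding down to a page multiple is an assumption.

```python
# Binary search over [1..SGASIZE] for the largest segment size the kernel
# will grant (i.e. the effective SHMMAX), then round to a page multiple.
def find_shmmax(sgasize, shmget_ok, page_size=4096):
    lo, hi, best = 1, sgasize, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if shmget_ok(mid):      # a segment of this size can be allocated
            best = mid
            lo = mid + 1
        else:                   # too big: the kernel would return EINVAL
            hi = mid - 1
    return (best // page_size) * page_size  # assumed: round down to a page

# Hypothetical kernel limit of 6,000,000 bytes probed against a 16M SGA.
print(find_shmmax(16 * 2**20, lambda size: size <= 6_000_000))
```

The probe works because "size is allocatable" is monotone: if a size fails, every larger size fails too, so binary search converges on the boundary.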

In the contiguous segment model, Oracle simply divides the SGA into
SGASIZE/SHMMAX (rounded down) segments of size SHMMAX, plus another segment
of size SGASIZE modulo SHMMAX. If more than SS_SEG_MAX segments are required
in total, an ORA-7329 is raised. Oracle then allocates and attaches one
segment at a time, attaching the first segment at the start address defined
in "ksms.o". Each subsequent segment is attached at an address equal to the
previous segment's attach address plus the size of the previous segment, so
that they are contiguous in memory. For example, if SHMMAX is 2M, SGASIZE is
5M, and the start address is 0xd0000000, there would be 3 segments, 2 of 2M
and 1 of 1M. They would be attached at 0xd0000000, 0xd0200000
(0xd0000000+2M), and 0xd0400000 (0xd0200000+2M).

If Oracle receives an error allocating a shared memory segment, an ORA-7336
is raised. If an error is raised on attaching a shared memory segment,
Oracle checks the system error returned. If it is EINVAL, the attach address
used is most likely badly aligned (not a multiple of SHMLBA); in this case,
Oracle tries the next model for SGA allocation, non-contiguous segments.
Otherwise, an ORA-7337 is raised.

The last model Oracle will try is the non-contiguous model. Here, things
become a bit more complicated. After calculating SHMMAX, Oracle first checks
whether it can put the fixed and variable portions into one shared memory
segment just large enough to hold the two. If it can, it allocates a segment
just big enough to hold both portions. If it cannot, it will put each into
its own separate segment just large enough to hold that portion. If the
fixed portion is larger than SHMMAX, an ORA-7330 will be raised; if the
variable portion is larger than SHMMAX, an ORA-7331 will be raised. Then
Oracle computes the number of redo block buffers it can fit in a segment
(rounded down to an integral number of buffers - buffers cannot overlap
segments).
An ORA-7332 is raised if SHMMAX is smaller than the size of a redo block.
Similarly, the number of db block buffers per segment is calculated, with an
ORA-7333 raised if SHMMAX is too small to hold one db block. Then Oracle can
compute the total number of segments required for both the redo and database
block buffers. This will be buffers/buffers-per-segment (rounded down)
segments, plus one (if necessary) holding buffers modulo buffers-per-segment
buffers, calculated separately for both the redo and db block buffers. These
segments will be of a size just large enough to hold the buffers (so no
space is wasted). The total number of segments allocated will then be the
number needed for the fixed and variable portions (1 or 2), plus the number
needed for the redo block buffers, plus the number needed for the database
block buffers. If this requires more than SS_SEG_MAX segments, an ORA-7334
is raised.

Once the number of segments and their sizes is determined, Oracle allocates
and attaches the segments one at a time: first the fixed and variable
portion segment(s), then the redo block buffer segment(s), then the db block
buffer segment(s). They will be attached non-contiguously, with the first
segment attached at the start address in "ksms.o" and each following segment
attached at the address equal to the attach address of the previous segment
plus the size of the previous segment, rounded up to a multiple of SHMLBA.
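Both multi-segment layouts reduce to simple arithmetic. A sketch with hypothetical helper names: contiguous_layout computes back-to-back attach addresses for the contiguous model, and buffer_segments computes the per-segment buffer packing just described for the non-contiguous model.

```python
# Contiguous model: SGASIZE//SHMMAX full segments plus a remainder segment,
# attached back-to-back starting at the ksms.o start address.
def contiguous_layout(sgasize, shmmax, start_addr):
    sizes = [shmmax] * (sgasize // shmmax)
    if sgasize % shmmax:
        sizes.append(sgasize % shmmax)
    out, addr = [], start_addr
    for size in sizes:
        out.append((hex(addr), size))
        addr += size            # next segment starts where this one ends
    return out

# Non-contiguous model: buffers cannot span segments, so compute whole
# buffers per SHMMAX-sized segment and size each segment to fit exactly.
def buffer_segments(n_buffers, buffer_size, shmmax):
    per_seg = shmmax // buffer_size
    if per_seg == 0:            # cf. ORA-7332 / ORA-7333
        raise ValueError("SHMMAX smaller than one buffer")
    sizes = [per_seg * buffer_size] * (n_buffers // per_seg)
    if n_buffers % per_seg:
        sizes.append((n_buffers % per_seg) * buffer_size)
    return sizes

# Hypothetical examples: a 5M SGA with 2M SHMMAX, and 1000 2K buffers
# packed into 1M segments.
print(contiguous_layout(5 * 2**20, 2 * 2**20, 0xd0000000))
print(buffer_segments(1000, 2048, 1 * 2**20))
```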

If Oracle receives an error allocating a shared memory segment, an ORA-7336
is raised. If an error is raised on attaching a shared memory segment,
Oracle checks the system error returned. If it is EINVAL, normally another
model would be tried, but as there are no more models to try, an ORA-7310 is
raised. Other attach errors raise an ORA-7337.

At this point, we have either attached the entire SGA or returned an ORA
error. The total size of the segments attached is exactly SGASIZE; no space
is wasted. Once Oracle has the shared memory attached, it proceeds to
allocating the semaphores it requires.

Semaphore allocation
====================
Semaphore allocation is much simpler than shared memory. Oracle just needs
to allocate a number of semaphores equal to the processes parameter in
"init.ora"; PROCESSES will be used to refer to this value. Note that on
machines with a post-wait kernel extension, Oracle does not need to allocate
semaphores (because it doesn't need to implement its own post-wait
mechanism).

Oracle uses semaphores to control concurrency between all the background
processes (pmon, smon, dbwr, lgwr, and the oracle shadows). Semaphores are
also used to control two-task communication between the user process and the
shadow process if the fast (shared memory) driver is used. And in the Unix
ports based on MIPS RISC processors, Oracle uses a special semaphore to
perform basic test & set functions that are not provided by the processor.

Typing "ipcs -sb" will show you what semaphores are allocated on your system
at the moment. This will display all the semaphore sets allocated, their
identifying numbers, the owners, the number of semaphores in each set, and
more. Occasionally, unexpected termination of Oracle processes will leave
semaphore resources locked. If your database is not running, but "ipcs -sb"
shows that semaphore sets owned by oracle are still in use, then you need to
deallocate (free) them.
If you don't do this, you may not be able to allocate enough semaphores
later to restart your database. Freeing semaphore sets is done with the
"ipcrm" command: for each set that oracle has allocated, type "ipcrm -s ID",
where ID is the set number you see in the "ipcs" output. Semaphores can also
be freed by rebooting the system.

ORA-7250, ORA-7279, ORA-27146
If the environment variable ORANSEMS is set, Oracle will use that value as
the number of semaphores it will allocate per set. Oracle will attempt to
allocate one set of size ORANSEMS; if this fails, an ORA-7250 is raised. If
ORANSEMS is not set, Oracle tries to determine the maximum number of
semaphores allowed per set (SEMMSL). It does this by first trying to
allocate a set of PROCESSES semaphores. If this fails with EINVAL, it tries
again, this time trying to get one fewer semaphore; if it fails with any
other error, an ORA-7279 (or, on 8.1.X and higher, ORA-27146) is raised.
This process continues until either the semget() succeeds or the number of
semaphores Oracle is attempting to allocate drops to zero. Increase the
kernel parameter SEMMNS if an ORA-7279 or ORA-27146 is generated.
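The SEMMSL probe described above can be sketched as follows; semget_ok is a hypothetical stand-in for a real semget() call, and the kernel limit used in the example is made up.

```python
import math

# Probe for the largest per-set semaphore count the kernel allows: try
# PROCESSES semaphores, then one fewer on each failure, down to zero
# (the zero case corresponds to ORA-7251).
def probe_semmsl(processes, semget_ok):
    n = processes
    while n > 0:
        if semget_ok(n):
            return n          # largest set size the kernel granted
        n -= 1
    return 0

# Hypothetical kernel SEMMSL of 25 with init.ora PROCESSES = 60.
semmsl = probe_semmsl(60, lambda n: n <= 25)
print(semmsl)                  # effective SEMMSL found by the probe
print(math.ceil(60 / semmsl))  # sets needed: PROCESSES/SEMMSL, rounded up
```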

ORA-7251
If the latter case occurs (the count drops to zero), an ORA-7251 will be
raised. Otherwise, Oracle begins allocating sets of size SEMMSL (or
ORANSEMS, as the case may be) until it has at least PROCESSES semaphores.

ORA-7252, ORA-7339
All semaphore sets will be the same size, so if PROCESSES is not a multiple
of SEMMSL (or ORANSEMS), there will be additional semaphores allocated that
will not be used (in other words, PROCESSES/SEMMSL, rounded up, sets of
SEMMSL semaphores will be allocated). Should an error occur trying to
allocate a semaphore set, an ORA-7252 will be raised. If more than
SS_SEM_MAX semaphore sets would be required, an ORA-7339 occurs.

At this point, Oracle has either allocated at least PROCESSES semaphores or
returned an ORA error. All IPC resources required by Oracle on Unix have now
been allocated, and the related information can be written into the sgadef
file for this instance, for later use by other processes which connect to
the instance.

Connecting to an instance
=========================
All shadow processes, when starting, attempt to attach the SGA. Shadows will
be started whenever there is a logon attempt (the connect command includes
an implicit logoff, so it produces a new shadow). The only exception is
SQL*Dba in version 7, which immediately spawns a shadow process, and whose
connect commands do not spawn a new shadow. Also, since SQL*Dba is used to
start up the database, errors encountered in attempting to attach the SGA
will be discarded, because the SGA may not have been allocated yet. When a
startup command is issued later, the SGA and semaphores will be allocated.
Note that this applies only to version 7 and sqldba.

What Oracle does when attempting to connect to the SGA depends on the
version of Oracle. In version 6, the "sgadef<SID>.dbf" file is used to get
the necessary information. In version 7, the SGA itself contains the
information about the shared memory and semaphores (how the bootstrap works
is explained later).
In either case, the information stored is the same: the key, id, size, and
attach address of each shared memory segment, and the key, id, and size of
each semaphore set. Note that nothing special needs to be done to initialize
the semaphores; they can be used with the data structure read in on
connecting.

The version 6 approach is rather simple. Oracle first tries to open the
"sgadef<SID>.dbf" file; if it cannot, an ORA-7318 is raised. Once the file
is opened, the data written earlier on startup is read. If an error occurs
for some reason on the read, an ORA-7319 occurs. Once all the data is read
in, Oracle attaches each segment in turn. First, it generates what it
believes the key for the segment should be. It then gets that segment,
returning ORA-7429 if it fails. The key used and the key stored are then
compared; they should be equal, but if not, an ORA-7430 occurs. Once the key
is verified, the segment is attached. A failure to attach the segment raises
an ORA-7320; if the segment is attached, but not at the address requested,
an ORA-7321 occurs. This process is repeated for all segments until the
entire SGA is attached.

Version 7 differs only in the first part, when the shared memory and semaphore data is read. Once that data is read in, Oracle proceeds in the same manner. To fetch this data, Oracle generates what it thinks should be the key for the first segment of the SGA and attaches it as if it were the only segment. Once it is attached, the data is copied from the SGA. With this data, Oracle attaches any remaining segments for the SGA. There is one possible problem. If somehow two instances have a key collision (i.e. they both generate the same key for their first segment), it is possible to only have one of the two instances up at a time! Connection attempts to either one will connect a user to whichever instance is up. This is rare, but can happen. Development is currently working on a better key generation algorithm.

Attaching shared memory
=======================
As seen in previous sections, shared memory must be received (this may mean allocating the shared memory, but not necessarily) and then attached, to be used. Attaching shared memory brings the shared memory into the process' memory space. There are some important things about attach addresses. For one thing, they may need to be aligned on some boundary (generally defined by SHMLBA). More importantly, shared memory must be mapped to pages in the process' memory space which are unaccounted for.
Every process already has a text, a data, and a stack segment laid out as follows (in general):

      +---------+  high addresses
      |  stack  |
      |---------| -+
      |    |    |  |
      |    v    |  |
      |---------|  |
      | shm seg |  |- unused portion
      |---------|  |  These are valid pages for shared memory.
      |    ^    |  |  Pages are allocated from this area
      |    |    |  |  as both the stack and heap (data) grow.
      |---------| -+
      |  data   |
      |---------|
      |  text   |
      +---------+  low addresses

So, valid attach addresses lie in the unused region between the stack and the data segments (a shared memory segment is drawn in the diagram to aid in visualization - not every process has shared memory attached!). Of course, the validity also depends on the size of the segment, since it cannot overlap another segment. Note that both the stack and data segments can grow during the life of a process. Because segments must be contiguous and overlapping is not allowed, this is of some importance. Attaching shared memory creates a limit on how much the stack or data segment can grow. Limiting the stack is typically not a problem, except when running deeply recursive code. Neither is limiting the data segment, but this does

restrict the amount of memory that can be dynamically allocated by a program. It is possible (but seldom) that some applications running against the database may hit this limit in the shadow (since the shadow has the SGA attached). This is the cause of ORA-7324 and ORA-7325 errors. How to deal with these is discussed in the troubleshooting section.

The SGA is attached, depending on the allocation model used, more or less contiguously (there may be gaps, but those can be treated as if they were part of the shared memory). So where the beginning of the SGA can be attached depends on the SGA's size. The default address which is chosen by Oracle is generally sufficient for most SGAs. However, it may be necessary to relocate the SGA for very large sizes. It may also need to be changed if ORA-7324 or ORA-7325 errors are occurring. The beginning attach address is defined in the file "ksms.s". Changing the attach address requires recompilation of the Oracle kernel and should not be done without first consulting Oracle personnel. Unfortunately, there is no good way to determine what a good attach address will be. When changing the address to allow a larger SGA, a good rule of thumb is taking the default attach address in "ksms.s" and subtracting the size of the SGA. The validity of an attach address can be tested with the Oracle provided tstshm executable. Using:

    tstshm -t <size of SGA> -b <new attach address>

will determine if the address is usable or not.

Troubleshooting
===============
Errors which might have multiple causes are discussed in this section. Errors not mentioned here generally have only one cause which has a typically obvious solution.

ORA-7306, ORA-7336, ORA-7329
Oracle received a system error on a shmget() call. The system error should be reported. There are a few possibilities:

1) There is insufficient shared memory available. This is indicated by the operating system error ENOSPC. Most likely, SHMMNI is too small.
Alternatively, there may be shared memory already allocated; if it is not attached, perhaps it can be freed. Maybe shared memory isn't configured in the kernel.

2) There is insufficient memory available. Remember, shared memory needs pages of virtual memory. The system error ENOMEM indicates there is insufficient virtual memory. Swap needs to be increased, either by adding more or by freeing currently used swap (i.e. free other shared memory, kill other processes).

3) The size of the shared memory segment requested is invalid. In this case, EINVAL is returned by the system. This should be very rare - however, it is possible. This can occur if SHMMAX is not a multiple of page size and Oracle is trying a multi-segment model. Remember that Oracle rounds its calculation of SHMMAX to a page boundary, so it may have

rounded it up past the real SHMMAX! (Whether this is a bug is debatable.)

4) The shared memory segment does not exist. This would be indicated by the system error ENOENT. This would never happen on startup; it only would happen on connects. The shared memory most likely has been removed unexpectedly by someone or the instance is down.

ORA-7307, ORA-7337, ORA-7320
Oracle received a system error on a shmat() call. The system error should be reported. There are a few possibilities:

1) The attach address is bad. If this is the cause, EINVAL is returned by the system. Refer to the section on the attach address to see why the attach address might be bad. This may happen after enlarging the SGA.

2) The permissions on the segment do not allow the process to attach it. The operating system error will be EACCES. Generally the cause of this is either the setuid bit is not turned on for the oracle executable, or root started the database (and happens to own the shared memory). Normally, this would be seen only on connects.

3) The process cannot attach any more shared memory segments. This would be accompanied by the system error EMFILE. SHMSEG is too small. Note that as long as SHMSEG is greater than SS_SEG_MAX, you should never see this happen.

ORA-7329, ORA-7334
Oracle has determined the SGA needs too many shared memory segments. Since you can't change the limit on the number of segments, you should instead increase SHMMAX so that fewer segments are required.

ORA-7339
Oracle has determined it needs too many semaphore sets. Since you can't change the limit on the number of semaphore sets, you should increase SEMMSL so fewer sets are required.

ORA-7250, ORA-7279, ORA-7252, ORA-27146
Oracle received a system error on a semget() call. The system error should be reported. There should be only one system error ever returned with this, ENOSPC. This can mean one of two things. Either the system limit on semaphore sets has been reached or the system limit on the total number of semaphores has been reached. Raise SEMMNI or SEMMNS, as is appropriate, or perhaps there are some semaphore sets which can be released. In the case of ORA-7250, ORANSEMS may be set too high (>SEMMSL). If it is, raise SEMMSL or decrease ORANSEMS.

ORA-7251
Oracle failed to allocate even a semaphore set of only one semaphore. It is likely that semaphores are not configured in the kernel.

ORA-7318
Oracle could not open the sgadef file. The system error number will be returned. There are a few possible causes:

1) The file doesn't exist. In this case, the system error ENOENT is returned. Maybe ORACLE_SID or ORACLE_HOME is set wrong so that Oracle is looking in the wrong place. Possibly the file does not exist (in this case, a restart is necessary to allow connections again).

2) The file can't be accessed for reading. The operating system error returned with this is EACCES. The permissions on the file (or maybe directories) don't allow an open for reading of the sgadef file. It might not be owned by the oracle owner. The setuid bit might not be turned on for the oracle executable.

ORA-7319
Oracle did not find all the data it expected when reading the sgadef<SID>.dbf file. Most likely the file has been truncated. The only recovery is to restart the instance.

ORA-7430
Oracle expected a key to be used for the segment which does not match the key stored in the shared memory and semaphore data structure. This probably indicates a corruption of the sgadef file (in version 6) or the data in the first segment of the SGA (in version 7). A restart of the instance is probably necessary to recover in that case. It may also be a key collision problem and Oracle is attached to the wrong instance.

ORA-7321
Oracle was able to attach the segment, but not at the address it requested. In most cases, this would be caused by corrupted data in the sgadef file (in version 6) or the first segment of the SGA (in version 7). A restart of the database may be necessary to recover.

ORA-7324, ORA-7325
Oracle was unable to allocate memory. Most likely, the heap (data segment) has grown into the bottom of the SGA. Relocating the SGA to a higher attach address may help, but there may be other causes. Memory leaks can cause this error. The init.ora parameter sort_area_size may be too large; decreasing it may resolve the error. The init.ora parameter context_incr may also be too large; decreasing it may resolve this error.

ORA-7264, ORA-7265
Oracle was unable to decrement/increment a semaphore. This generally is accompanied by the system error EINVAL and a number which is the identifier of the semaphore set. This is almost always because the semaphore set was removed, but the shadow process was not aware of it (generally due to a shutdown abort or instance crash). This error is usually ignorable.

System Parameters
=================
SHMMAX - kernel parameter controlling the maximum size of one shared memory segment.
SHMMNI - kernel parameter controlling the maximum number of shared memory segments in the system.
SHMSEG - kernel parameter controlling the maximum number of shared memory segments a process can attach.
SEMMNS - kernel parameter controlling the maximum number of semaphores in the system. Semaphores in Unix are allocated in sets of 1 to SEMMSL.
SEMMNI - kernel parameter controlling the maximum number of semaphore sets.
SEMMSL - kernel parameter controlling the maximum number of semaphores in a semaphore set.
SHMLBA - kernel parameter controlling the alignment of shared memory segments; all segments must be attached at multiples of this value. Typically non-tunable.

System errors
=============
ENOENT - No such file or directory, system error 2
ENOMEM - Not enough core, system error 12
EACCES - Permission denied, system error 13
EINVAL - Invalid argument, system error 22
EMFILE - Too many open files, system error 24
ENOSPC - No space left on device, system error 28

Oracle parameters
=================
SS_SEG_MAX - Oracle parameter specified at compile time (therefore, unmodifiable without an Oracle patch) which defines the maximum number of segments the SGA can reside in. Normally set to 20.
SS_SEM_MAX - Oracle parameter specified at compile time (therefore, unmodifiable without an Oracle patch) which defines the maximum number of semaphore sets Oracle will allocate. Normally set to 10.

Calculating Oracle's SEMAPHORE Requirements:
============================================
Semaphores should be allocated for a system as follows:

1. For each database 'instance' you wish to run, list out the 'processes' parameter from the "init<SID>.ora" file.

2. For MIPS based machines only: Add 1 to each of these figures. Keep this list of figures for use as the 'ORAPROC' parameter in step 4.

3. Sum these figures. The figure you have is the number of semaphores required by Oracle to start ALL databases. Add to this any other system requirements and ensure SEMMNS is AT LEAST this value. Ie:

   SEMMNS >= SUM of 'processes' for all Databases
             + 1 per database (MIPS only)
             + other system requirements.

4. Semaphores are allocated by Unix in 'sets' of up to SEMMSL semaphores per set. You can have a MAXIMUM of SEMMNI sets on the system at any one time. SEMMSL is an arbitrary figure which is best set to a round figure no smaller than the smallest 'processes' figure for any database on the system. This is not a requirement though. Note that SEMMSL is not used on all Unix platforms. Eg: HPUX does not have a SEMMSL limit on the number of semaphores in any one set.

   To determine Oracle requirements for SEMMNI, take each figure from step 2 & substitute it for ORAPROC below:

   Sets required for Instance = (ORAPROC / SEMMSL) rounded UP.

   Sum these figures for all instances. This gives you Oracle's SEMMNI requirement. Add to this any other system requirements. System requirements are generally 10% above what Oracle requires; however, you need to take into account any other programs that require semaphores.

5. On MIPS systems SEMMNU should be set at least equal to SEMMNS.

6. Oracle 8.0.x and 8.1.x try to allocate twice as many semaphores as are in the "init<SID>.ora" file on startup. For example, if processes = 200, Oracle will need 400 to start up the SGA. This needs to be part of your calculations. Example: If you have 3 databases and the "init.ora" files have 100, 150 and 200 processes allocated for each database, then you would add up the three numbers (100 + 150 + 200 = 450) and an extra 10 processes per database (450 + 30 = 480). You would need to set SEMMNS to at least twice this number (480 * 2 = 960, semmns = 960).

Recap:
~~~~~~
SEMMNS - total semaphores available on the system as a whole
SEMMNI - maximum number of SETs of semaphores (number of identifiers)
SEMMSL - Some platforms only. Limits the maximum number of semaphores available in any one set.
SEMMNU - Number of Undo structures.

TECH: Unix Virtual Memory, Paging & Swapping explained
Type: BULLETIN
Status: PUBLISHED
Content Type: TEXT/PLAIN
Creation Date: 23-DEC-1994
Last Revision Date: 25-OCT-2000

====================================================================
Understanding and measuring memory usage on UNIX operating systems.
====================================================================

When planning an Oracle installation, it is often necessary to plan for memory requirements. To do this, it is necessary to understand how the

UNIX operating system allocates and manages physical and virtual memory among the processes on the system.

------------------------------
I. Virtual memory and paging
------------------------------

Modern UNIX operating systems all support virtual memory. Virtual memory is a technique developed around 1961 which allows the size of a process to exceed the amount of physical memory available for it. (A process is an instance of a running program.) Virtual memory also allows the sum of the sizes of all processes on the system to exceed the amount of physical memory available on the machine. (Contrast this with a system running MS-DOS or Apple Macintosh, in which the amount of physical memory limits both the size of a single process and the total number of simultaneous processes.) A full discussion of virtual memory is beyond the scope of this article.

The basic idea behind virtual memory is that only part of a particular process is in main memory (RAM), and the rest of the process is stored on disk. In a virtual memory system, the memory addresses used by programs do not refer directly to physical memory. Instead, programs use virtual addresses, which are translated by the operating system and the memory management unit (MMU) into the physical memory (RAM) addresses. This scheme works because most programs only use a portion of their address space at any one time.

Modern UNIX systems use a paging-based virtual memory system. In a paging-based system, the virtual address space is divided up into equal-sized chunks called pages. The actual size of a single page is dependent on the particular hardware platform and operating system being used: page sizes of 4k and 8k are common. The translation of virtual addresses to physical addresses is done by mapping virtual pages to physical pages. When a process references a virtual address, the MMU figures out which virtual page contains that address, and then looks up the physical page which corresponds to that virtual page.
One of two things is possible at this point: either the physical page is loaded into RAM, or it is on disk. If the physical page is in RAM, the process uses it. If the physical page is on disk, the MMU generates a page fault. At this point the operating system locates the page on disk, finds a free physical page in RAM, copies the page from disk into RAM, tells the MMU about the new mapping, and restarts the instruction that generated the page fault. Note that the virtual-to-physical page translation is invisible to the process. The process "sees" the entire virtual address space as its own: whenever it refers to an address, it finds memory at that address. All translation of virtual to physical addresses and all handling of page faults is performed on behalf of the process by the MMU and the operating system. This does not mean that taking a page fault has no effect. Since handling a page fault requires reading the page in from disk, a process that takes a lot of page faults will run much slower than one that does not. In a virtual memory system, only a portion of a process's virtual address space is mapped into RAM at any particular time. In a paging-based system, this notion is formalized as the working set of a

process. The working set of a process is simply the set of pages that the process is using at a particular point in time. The working set of a process will change over time. This means that some page faulting will occur, and is normal. Also, since the working set changes over time, the size of the working set changes over time as well. The operating system's paging subsystem tries to keep all the pages in the process's working set in RAM, thus minimizing the number of page faults and keeping performance high. By the same token, the operating system tries to keep the pages not in the working set on disk, so as to leave the maximum amount of RAM available for other processes. Recall from above that when a process generates a page fault, the operating system must read the absent page into RAM from disk. This means that the operating system must choose which page of RAM to use for this purpose. In the general case, there may not be a free page of physical RAM, and the operating system will have to read the data for the new page into a physical page that is already in use. The choice of which in-use page to replace with the new data is called the page replacement policy. Entire books have been written on various page replacement policies and algorithms, so a full discussion of them is beyond the scope of this article. It is important to note, however, that there are two general classes of page replacement policy: local and global. In a local page replacement policy, a process is assigned a certain number of physical pages, and when a page fault occurs the operating system finds a free page within the set of pages assigned to that process. In a global page replacement policy, when a page fault occurs the operating system looks at all processes in the system to find a free page for the process. There are a number of key points to understand about paging. 
(1) Typically, only a relatively small number of pages (typically 10%-50%) of a single process are in its working set (and therefore in physical memory) at any one time.

(2) The location of physical pages in RAM bears no relation whatever to the location of pages in any process's virtual address space.

(3) Most implementations of paging allow for a single physical page to be shared among multiple processes. In other words, if the operating system can determine that the contents of two (or more) virtual pages are identical, only a single physical page of RAM is needed for those virtual pages.

(4) Since working set sizes change over time, the amount of physical memory that a process needs changes over time as well. An idle process requires no RAM; if the same process starts manipulating a large data structure (possibly in response to some user input) its RAM requirement will soar.

(5) There exists a formal proof that it is impossible to determine working set sizes from a static analysis of a program. You must run a program to determine its working set. If the working set of the program varies according to its input (which is almost always the case) the working sets of two processes will be different if the processes have different inputs.

---------------------------
II. Virtual memory on Unix
---------------------------

The discussion above of virtual memory and paging is a very general one, and all of the statements in it apply to any system that implements virtual memory and paging. A full discussion of paging and virtual memory implementation on UNIX is beyond the scope of this article. In addition, different UNIX vendors have implemented different paging subsystems, so you need to contact your UNIX vendor for precise information about the paging algorithms on your UNIX machine. However, there are certain key features of the UNIX paging system which are consistent among UNIX ports. Processes run in a virtual address space, and the UNIX kernel transparently manages the paging of physical memory for all processes on the system. Because UNIX uses virtual memory and paging, typically only a portion of the process is in RAM, while the remainder of the process is on disk.

1) The System Memory Map

The physical memory on a UNIX system is divided among three uses. Some portion of the memory is dedicated for use by the operating system kernel. Of the remaining memory, some is dedicated for use by the I/O subsystem (this is called the buffer cache) and the remainder goes into the page pool. Some versions of UNIX statically assign the sizes of system memory, the buffer cache, and the page pool at system boot time, while other versions will dynamically move RAM between these three at run time, depending on system load. (Consult your UNIX system vendor for details on your particular version of UNIX.)

The physical memory used by processes comes out of the page pool. In addition, the UNIX kernel allocates a certain amount of system memory for each process for data structures that allow it to keep track of that process. This memory is typically not more than a few pages. If your system memory size is fixed at boot time you can completely ignore this usage, as it does not come out of the page pool.
If your system memory size is adjusted dynamically at run-time, you can also typically ignore this usage, as it is dwarfed by the page pool requirements of Oracle software.

2) Global Paging Strategy

UNIX systems implement a global paging strategy. This means that the operating system will look at all processes on the system when it is searching for a page of physical memory on behalf of a process. This strategy has a number of advantages, and one key disadvantage. The advantages of a global paging strategy are:

(1) An idle process can be completely paged out so it does not hold memory pages that can be better used by another process.

(2) A global strategy allows for a better utilization of system memory; each process's page allocations will be closer to their actual working set size.

(3) The administrative overhead of managing process or user page quotas is completely absent.

(4) The implementation is smaller and faster.

The disadvantage of a global strategy is that it is possible for a single ill-behaved process to affect the performance of all processes on the system, simply by allocating and using a large number of pages.

3) Text and Data Pages

A UNIX process can be conceptually divided into two portions: text and data. The text portion contains the machine instructions that the process executes; the data portion contains everything else. These two portions occupy different areas of the process's virtual address space. Both text and data pages are managed by the paging subsystem. This means that at any point in time, only some of the text pages and only some of the data pages of any given process are in RAM.

UNIX treats text pages and data pages differently. Since text pages are typically not modified by a process while it executes, text pages are marked read-only. This means that the operating system will generate an error if a process attempts to write to a text page. (Some UNIX systems provide the ability to compile a program which does not have read-only text: consult the man pages on 'ld' and 'a.out' for details.) The fact that text pages are read-only allows the UNIX kernel to perform two important optimizations: text pages are shared between all processes running the same program, and text pages are paged from the filesystem instead of from the paging area.

Sharing text pages between processes reduces the amount of RAM required to run multiple instances of the same program. For example, if five processes are running Oracle Forms, only one set of text pages is required for all five processes. The same is true if there are fifty or five hundred processes running Oracle Forms. Paging from the filesystem means that no paging space needs to be allocated for any text pages. When a text page is paged out it is simply over-written in RAM; if it is paged in at a later time the original text page is available in the program image in the file system.

On the other hand, data pages must be read/write, and therefore cannot (in general) be shared between processes. This means that each process must have its own copy of every data page.
Also, since a process can modify its data pages, when a data page is paged out it must be written to disk before it is over-written in RAM. Data pages are written to specially reserved sections of the disk. For historical reasons, this paging space is called "swap space" on UNIX. Don't let this name confuse you: the swap space is used for paging.

4) Swap Space Usage

The UNIX kernel is in charge of managing which data pages are in RAM and which are in the swap space. The swap space is divided into swap pages, which are the same size as the RAM pages. For example, if a particular system has a page size of 4K, and 40M devoted to swap space, this swap space will be divided up into 10240 swap pages.

A page of swap can be in one of three states: it can be free, allocated, or used. A "free" page of swap is available to be allocated as a disk page. An "allocated" page of swap has been allocated to be the disk

page for a particular virtual page in a particular process, but no data has been written to the disk page yet -- that is, the corresponding memory page has not yet been paged out. A "used" page of swap is one where the swap page contains the data which has been paged out from RAM. A swap page is not freed until the process which "owns" it frees the corresponding virtual page. On most UNIX systems, swap pages are allocated when virtual memory is allocated. If a process requests an additional 1M of (virtual) memory, the UNIX kernel finds 1M of pages in the swap space, and marks those pages as allocated to a particular process. If at some future time a particular page of RAM must be paged out, swap space is already allocated for it. In other words, every virtual data page is "backed with" a page of swap space. An important consequence of this strategy is if all the swap space is allocated, no more virtual memory can be allocated. In other words, the amount of swap space on a system limits the maximum amount of virtual memory on the system. If there is no swap space available, and a process makes a request for more virtual memory, then the request will fail. The request will also fail if there is some swap space available, but the amount available is less than the amount requested. There are four system calls which allocate virtual memory: these are fork(), exec(), sbrk(), and shmget(). When one of these system calls fails, the system error code is set to EAGAIN. The text message associated with EAGAIN is often "No more processes". (This is because EAGAIN is also used to indicate that the per-user or system-wide process limit has been reached.) If you ever run into a situation where processes are failing because of EAGAIN errors, be sure to check the amount of available swap as well as the number of processes. 
If a system has run out of swap space, there are only two ways to fix the problem: you can either terminate some processes (preferably ones that are using a lot of virtual memory) or you can add swap space to your system. The method for adding swap space to a system varies between UNIX variants: consult your operating system documentation or vendor for details.

5) Shared Memory

UNIX systems implement, and the Oracle server uses, shared memory. In the UNIX shared memory implementation, processes can create and attach shared memory segments. Shared memory segments are attached to a process at a particular virtual address. Once a shared memory segment is attached to a process, memory at that address can be read from and written to, just like any other memory in the process's address space. Unlike "normal" virtual memory, changes written to an address in the shared memory segment are visible to every process that has attached to that segment.

Shared memory is made up of data pages, just like "conventional" memory. Other than the fact that multiple processes are using the same data pages, the paging subsystem does not treat shared memory pages any differently than conventional memory. Swap space is reserved for a shared memory segment at the time it is allocated, and the pages of memory in RAM are subject to being paged out if they are not in use, just like regular data pages. The only difference between the

treatment of regular data pages and shared data pages is that shared pages are allocated only once, no matter how many processes are using the shared memory segment.

6) Memory Usage of a Process

When discussing the memory usage of a process, there are really two types of memory usage to consider: the virtual memory usage and the physical memory usage. The virtual memory usage of a process is the sum of the virtual text pages allocated to the process, plus the sum of the virtual data pages allocated to the process. Each non-shared virtual data page has a corresponding page allocated for it in the swap space. There is no system-wide limit on the number of virtual text pages, and the number of virtual data pages on the system is limited by the size of the swap space. Shared memory segments are allocated on a system-wide basis rather than on a per-process basis, but are allocated swap pages and are paged from the swap device in exactly the same way as non-shared data.

The physical memory usage of a process is the sum of the physical text pages of that process, plus the sum of the physical data pages of that process. Physical text pages are shared among all processes running the same executable image, and physical data pages used for shared memory are shared among all processes attached to the same shared memory segment.

Because UNIX implements virtual memory, the physical memory usage of a process will be lower than the virtual memory usage. The actual amount of physical memory used by a process depends on the behavior of the operating system paging subsystem. Unlike the virtual memory usage of a process, which will be the same every time a particular program runs with a particular input, the physical memory usage of a process depends on a number of other factors.

First: since the working set of a process changes over time, the amount of physical memory needed by the process will change over time.
Second: if the process is waiting for user input, the amount of physical memory it needs will drop dramatically. (This is a special case of the working set size changing.) Third: the amount of physical memory actually allocated to a process depends on the overall system load. If a process is being run on a heavily loaded system, the global page allocation policy will tend to keep the number of physical memory pages allocated to that process very close to the size of the working set. If the same program is run with the same input on a lightly loaded system, the number of physical memory pages allocated to that process will tend to be much larger than the size of the working set: the operating system has no need to reclaim physical pages from that process, and will not do so. The net effect of this is that any measure of physical memory usage will be inaccurate unless you are simulating both the input and the system load of the final system you will be testing. For example, the physical memory usage of an Oracle Forms process will be very different if a user is rapidly moving between 3 large windows, infrequently moving between the same three windows, rapidly typing into a single window, slowly typing into the same window, or reading data off the screen while the process sits idle -- even though the
virtual memory usage of the process will remain the same. By the same token, the physical memory usage of an Oracle Forms process will be different if it is the only active process on a system, or if it is one of fifty active Oracle Forms processes on the same system.

7) Key Points

There are a number of key points to understand about the UNIX virtual memory implementation. (1) Every data page in every process is "backed" by a page in the swap space. The size of the swap space limits the amount of virtual data space on the system; processes are not able to allocate memory if there is not enough swap space available to back it up, regardless of how much physical memory is available on the system. (2) UNIX implements a global paging strategy. This means that the amount of physical memory allocated to a process varies greatly over time, depending on the size of the process's working set and the overall system load. Idle processes may be paged out completely on a busy system. On a lightly loaded system processes may be allocated much more physical memory than they require for their working sets. (3) The amount of virtual memory available on a system is determined by the amount of swap space configured for that system. The amount of swap space needed is equal to the sum of the virtual data allocated by all processes on the system at the time of maximum load. (4) Physical memory is allocated for processes out of the page pool, which is the memory not allocated to the operating system kernel and the buffer cache. The amount of physical memory needed for the page pool is equal to the sum of the physical pages in the working sets of all processes on the system at the time of maximum load.

----------------------------------
III. Process Memory Layout on UNIX
----------------------------------

1) The Segments of a Process

The discussion above speaks of a UNIX process as being divided up into two regions: text and data.
This division is accurate for discussions of the paging subsystem, since the paging subsystem treats every non-text page as a data page. In fact, a UNIX process is divided into six segments: text, stack, heap, BSS, initialized data, and shared memory. Each of these segments contains a different type of information and is used for a different purpose. The text segment is used to store the machine instructions that the process executes. The pages that make up the text segment are marked read-only and are shared between processes that are running the same executable image. Pages from the text segment are paged from the executable image in the filesystem. The size of the text segment is fixed at the time that the program is invoked: it does not grow or shrink during program execution. The stack segment is used to store the run-time execution stack. The run-time program stack contains function and procedure activation
records, function and procedure parameters, and the data for local variables. The pages that make up the stack segment are marked read/write and are private to the process. Pages from the stack segment are paged into the swap device. The initial size of the stack segment is typically one page; if the process references an address beyond the end of the stack the operating system will transparently allocate another page to the stack segment. The BSS segment is used to store statically allocated uninitialized data. The pages that make up the BSS segment are marked read/write, are private to the process, and are initialized to all-bits-zero at the time the program is invoked. Pages from the BSS segment are paged into the swap device. The size of the BSS segment is fixed at the time the program is invoked: it does not grow or shrink during program execution. The initialized data segment is used to store statically allocated initialized data. The pages that make up the initialized data segment are marked read/write, and are private to the process. Pages from the initialized data segment are initially read in from the initialized data in the filesystem; if they have been modified they are paged into the swap device from then on. The size of the initialized data segment is fixed at the time the program is invoked: it does not grow or shrink during program execution. The dynamically allocated data segment (or "heap") contains data pages which have been allocated by the process as it runs, using the brk() or sbrk() system call. The pages that make up the heap are marked read/write, are private to the process, and are initialized to all-bits-zero at the time the page is allocated to the process. Pages from the heap are paged into the swap device. At program startup the heap has zero size: it can grow arbitrarily large during program execution. Most processes do not have a shared data segment. 
In those that do, the shared data segment contains data pages which have been attached to this process using the shmat() system call. Shared memory segments are created using the shmget() system call. The pages that make up the shared data segment are marked read/write, are shared between all processes attached to the shared memory segment, and are initialized to all-bits-zero at the time the segment is allocated using shmget(). Pages from the shared data segment are paged into the swap device. Shared memory segments are dynamically allocated by processes on the system: the size of a shared memory segment is fixed at the time it is allocated, but processes can allocate arbitrarily large shared memory segments. 2) Per-Process Memory Map

The six segments that comprise a process can be laid out in memory in any arbitrary way. The exact details of the memory layout depend on the architecture of the CPU and the design of the particular UNIX implementation. Typically, a UNIX process uses the entire virtual address space of the processor. Within this address space, certain addresses are legal, and are used for particular segments. Addresses outside of any segment are illegal, and any attempt to read or write to them will generate a 'Segmentation Violation' signal.

The diagram below shows a typical UNIX per-process virtual memory map for a 32-bit processor. Note that this memory map covers the entire virtual address space of the machine. In this diagram, regions marked with a 't' are the text segment, 's' indicates the stack segment, 'S' the shared memory segment, 'h' the heap, 'd' the initialized data, and 'b' the BSS. Blank spaces indicate illegal addresses.

+--------+-----+--------+----+---------------------+-------+----+----+
|tttttttt|sssss|        |SSSS|                     |hhhhhhh|dddd|bbbb|
|tttttttt|sssss|  ->>   |SSSS|         <<-         |hhhhhhh|dddd|bbbb|
|tttttttt|sssss|        |SSSS|                     |hhhhhhh|dddd|bbbb|
+--------+-----+--------+----+---------------------+-------+----+----+
0                                                                   2G

In this particular implementation, the text segment occupies the lowest virtual addresses, and the BSS occupies the highest. Note that memory is laid out in such a way as to allow the stack segment and the heap to grow. The stack grows "up", toward higher virtual addresses, while the heap grows "down", toward lower virtual addresses. Also note that the placement of the shared memory segment is critical: if it is attached at too low an address it will prevent the stack from growing, and if it is attached at too high an address it will prevent the heap from growing.

3) Process size limits

All UNIX systems provide some method for limiting the virtual size of a process. Note that these limits apply only to virtual memory usage: there is no way to limit the amount of physical memory used by a process or group of processes. On systems that are based on SVR3, there is a system-wide limit on the virtual size of the data segment. Changing this limit typically requires you to change a UNIX kernel configuration parameter and relink the kernel: check your operating system documentation for details. On systems that are based on BSD or SVR4, there is a default limit on the size of the stack segment and the data segment.
It is possible to change these limits on a per-process basis; consult the man pages for getrlimit() and setrlimit() for details. If you are using the C-shell as your login shell, the 'limit' command provides a command-line interface to these system calls. Changing the system-wide default typically requires that you change a UNIX kernel configuration parameter and relink the kernel: check your operating system documentation for details. Most systems also provide a way to control the maximum size and number of shared memory segments: this typically involves changing the UNIX kernel parameters SHMMAX, SHMSEG and SHMMNI. Again, consult your operating system documentation for details.

4) The High-Water-Mark Effect

Recall from above that the size of the data segment can only be changed by using the brk() and sbrk() system calls. These system calls allow you to either increase or decrease the size of the data segment. However, most programs, including Oracle programs, do not use brk() or sbrk() directly. Instead, they use a pair of library functions
provided by the operating system vendor, called malloc() and free(). These two functions are used together to manage dynamic memory allocation. The two functions maintain a pool of free memory (called the arena) for use by the process. They do this by maintaining a data structure that describes which portions of the heap are in use and which are available. When the process calls malloc(), a chunk of memory of the requested size is obtained from the arena and returned to the calling function. When the process calls free(), the previously allocated chunk is returned to the arena, making it available for use by a later call to malloc(). If a process calls malloc() with a request that is larger than the largest free chunk currently in the arena, malloc() will call sbrk() to enlarge the size of the arena by enlarging the heap. However, most vendors' implementations of free() will not shrink the size of the arena by returning memory to the operating system via sbrk(). Instead, they simply place the free()d memory in the arena for later use. The result of this implementation is that processes which use the malloc() library exhibit a high-water-mark effect: the virtual sizes of the processes grow, but do not shrink. Once a process has allocated virtual memory from the operating system using malloc(), that memory will remain part of the process until it terminates. Fortunately, this effect only applies to virtual memory; memory returned to the arena is quickly paged out and is not paged in until it is re-allocated via malloc().

-------------------------
IV. Monitoring Memory Use
-------------------------

In the final analysis, there are only two things to be concerned with when sizing memory for a UNIX system: do you have enough RAM, and do you have enough swap space? In order to answer these questions, it is necessary to know how much virtual memory and how much physical memory each process on the system is using.
Unfortunately, the standard UNIX process monitoring tools do not provide a way to reliably determine these figures. The standard tools for examining memory usage on a UNIX system are 'size', 'ipcs', 'ps', 'vmstat' and 'pstat'. Most SYSV-derived systems will also have the 'crash' utility; most BSD-derived systems will allow you to run 'dbx' against the UNIX kernel. The 'size' utility works by performing a static analysis of the program image. It prints out the virtual memory size of the text, BSS and initialized data segments. It does not attempt to determine the size of the stack and the heap, since both of these sizes can vary greatly depending on the input to the program. Since the combined size of the stack and the heap is typically several hundred times larger than the combined size of the BSS and the initialized data, this method is the single most unreliable method of determining the runtime virtual memory requirement of a program. It is also the method used in the ICG to determine memory requirements for Oracle programs. The one useful piece of information you can obtain from 'size' is the virtual size of the text segment. However, since the text segment is paged from the filesystem, knowing its virtual size will not help you size either swap space or RAM.

The 'ipcs' utility will print out the virtual memory size of all the shared memory segments on the system. Use the '-mb' flags to have it print the size of the segments under the SEGSZ column. The 'ps' utility will print out information about any process currently active on the system. On SYSV-based systems, using 'ps' with the '-l' flag will cause 'ps' to print out the SZ field, which contains the virtual size of the process's non-text segments, measured in pages. On BSD-based systems, using 'ps' with the '-u' flag will also cause the SZ field to be printed. While this figure is an accurate measure of the virtual memory being used by the process, it is not accurate if the process has attached a shared memory segment. This means that when sizing memory, you must subtract the size of the SGA (obtained via 'ipcs', above) from the virtual memory used by each of the Oracle background and shadow processes. On SVR4-based and BSD-based systems, using the BSD-style 'ps' command with the '-u' flag will also cause the RSS field to be printed. This field contains the physical memory usage of the process. Unfortunately, this value is the combined physical memory usage for all the segments of the process, and does not distinguish between pages private to the process and pages shared between processes. Since text and shared data pages are shared between processes, adding up the RSS sizes of all processes on the system will over-estimate the amount of physical memory being used by the system -- you may very well come up with a number larger than the amount of RAM on your system! While the RSS field is a good indicator of how much RAM is required when there is only one process running a program image, it does not tell you how much additional RAM is required when a second process runs that same image. The 'pstat' utility is also used to print per-process information.
If it has a SZ or RSS field, the same limitations that apply to 'ps' output also apply to 'pstat' output. On some versions of UNIX, 'pstat' invoked with a flag (typically '-s' or '-T') will give you information about swap space usage. Be careful! Some UNIX versions will only print out information about how much swap space is used, and not about how much has been allocated. On those machines you can run out of swap, and 'pstat' will still tell you that you have plenty of swap available. The 'vmstat' utility is used to print out system-wide information on the performance of the paging subsystem. Its major limitation is that it does not print out per-process information. The format of 'vmstat' output varies between UNIX ports: the key fields to look at are the ones that measure the number of page-in and page-out events per second. Remember that some paging activity is normal, so you will have to decide for yourself what number of pages-in or pages-out per second means that your page pool is too small. On SYSV-based systems, the 'sar' utility is used to print out system-wide information on the performance of a wide variety of kernel subsystems. Like 'vmstat', its major limitation is that it does not print out per-process information. The '-r', '-g', and '-p' options are the most useful for examining the behavior of the paging subsystem.

On SYSV-based systems, the 'crash' utility lets you directly examine the contents of the operating system kernel data structures. On BSD-based systems, it is usually possible to use a kernel debugger to examine these same data structures. These data structures are always hardware- and operating system-specific, so you will need not only a general knowledge of UNIX internals, but also knowledge of the internals of that particular system. However, if you have this information (and a lot of patience) it is possible to get 'crash' to give you precise information about virtual and physical memory usage on a per-process basis. Finally, there are a variety of public domain and vendor-specific tools for monitoring memory usage. Remember: you are looking for a utility that lets you measure the physical memory usage of a process, and which gives you separate values for the number of pages used by the text segment, the shared memory segment, and the remainder of the process. Consult your operating system vendor for details.

----------------------------
V. Sizing Swap Space and RAM
----------------------------

The bottom line is that, while it is possible to estimate virtual and physical memory usage on a UNIX machine, doing so is more of an art than a science. First: you must measure your actual application. An Oracle Forms application running in bitmapped mode, using 256 colors, 16 full-screen windows, and retrieving thousands of records with a single query may well use two orders of magnitude more stack and heap than an Oracle Forms application running in character mode, using one window and retrieving only a few dozen rows in any single query. Similarly, a server-only system with five hundred users logged into the database but only fifty of them performing queries at any one time will have a far lower RAM requirement than a server-only system which has only two hundred users logged into the database, all of whom are continually performing queries and updates.
Second: when measuring physical memory usage, make sure that your system is as heavily loaded as it will be in a production situation. It does no good to measure physical memory usage with 255 processes running Oracle Forms if all 255 processes are sitting idle waiting for input -- all of the processes are paged out waiting for input. Sizing swap space is relatively easy. Recall that every page of virtual data must be backed with a page of swap. This means that if you can estimate the maximum virtual memory usage on your machine, you have determined how much swap space you need. Use the SZ column from the 'ps' command to determine the virtual memory usage for the processes running on the system. The high-water mark can be your ally in this measurement: take one process, run it as hard as you can, and see how high you can drive the value of the SZ column. Add together the virtual memory used by the system processes to form a baseline, then calculate the maximum amount of virtual memory used by each incremental process (don't forget to count all processes that get created when a user logs on, such as the shell and any dedicated shadow processes). The swap space requirement is simply the sum of the SZ columns of all processes at the time of maximum load. The careful
system administrator will add 10% to the swap space size for overhead and emergencies.

Sizing RAM is somewhat more difficult. First, start by determining the amount of RAM dedicated to system space (this is usually printed in a message during startup). Note that tuning the operating system kernel may increase the amount of RAM needed for system space. Next, determine the amount of RAM needed for the buffer cache. Finally, determine the amount of RAM needed for the page pool. You will want enough RAM on the system that the working set of every active process can remain paged in at all times.

--------------
VI. References
--------------

`Operating Systems: Design and Implementation', Andrew S. Tanenbaum, Prentice-Hall, ISBN 0-13-637406-9

`The Design and Implementation of the 4.3BSD Unix Operating System', Samuel Leffler, Kirk McKusick, Michael Karels, John Quarterman, 1989, Addison-Wesley, ISBN 0-201-06196-1

`The Design of the Unix Operating System', Maurice Bach, 1986, Prentice Hall, ISBN 0-13-201757-1

`The Magic Garden Explained: The Internals of Unix System V Release 4', Berny Goodheart, James Cox, 1994, Prentice Hall, ISBN 0-13-098138-9

DETERMINING WHICH INSTANCE OWNS WHICH SHARED MEMORY & SEMAPHORE SEGMENTS
Type: BULLETIN  Status: PUBLISHED  Content Type: TEXT/PLAIN
Creation Date: 03-FEB-1999  Last Revision Date: 10-MAY-2001

Purpose
=======
This article describes how to identify which shared memory and semaphore segments are owned by a particular instance, in Oracle v7.x, v8.0 and v8.1.

Scope & Application
===================
This is helpful in recovery situations where the instance may not have released the shared memory or semaphores on database shutdown.

How To Determine Which Instance Owns Which Shared Memory and Semaphore Segments
===============================================================================

For 7.0.X - 8.0.X
=================

You have several instances running; one instance crashes and leaves behind its "sgadef<sid>.dbf" file, shared memory segments and semaphores. Because there are many instances running, you are unsure which segments to remove. When you run ipcs you may see several shared memory and semaphore segments. This is an example of what you may see:

% ipcs -b

IPC status from /dev/kmem as of Wed Apr 8 16:12:18 1998
T      ID       KEY         MODE         OWNER     GROUP   SEGSZ
Shared Memory:
m       2       0x4e0c0002  --rw-rw-rw-  root      root      31008
m       3       0x41200207  --rw-rw-rw-  root      root       8192
m   45060       0x5fa4f34e  --rw-r-----  osupport  dba     4526080
m    8709       0x5fa5b36c  --rw-r-----  osupport  dba     4640768
m   12806       0x00000000  D-rw-r-----  osupport  dba     4640768
m    4615       0x6aac51e2  --rw-r-----  osupport  dba     5140480
m    6664       0x5aac503f  --rw-r-----  osupport  dba     4392968
m    6665       0x5fa37342  --rw-r-----  osupport  dba     6422528
m   17418       0x5fa2b2b1  --rw-r-----  osupport  dba     4640768
m     523       0x5fa23296  --rw-r-----  osupport  dba     4591616
m    1036       0x52aea224  --rw-r-----  usupport  dba     4521984
T      ID       KEY         MODE         OWNER     GROUP   NSEMS
Semaphores:
s       0       0x2f180002  --ra-ra-ra-  root      sys           6
s       1       0x411c02f9  --ra-ra-ra-  root      root          1
s       2       0x4e0c0002  --ra-ra-ra-  root      root          2
s       3       0x41200207  --ra-ra-ra-  root      root          2
s       4       0x00446f6e  --ra-r--r--  root      root          1
s       5       0x00446f6d  --ra-r--r--  root      root          1
s       6       0x01090522  --ra-r--r--  root      root          1
s   11271       0x00000000  --ra-r-----  osupport  dba          50
s    4360       0x00000000  --ra-r-----  osupport  dba          50
s    2828       0x00000000  --ra-r-----  osupport  dba          50

You must determine which shared memory and semaphore segments NOT to remove. *NOTE: It is very hard to guess, and removing the wrong segments is very dangerous to those instances still running.

First, set your "ORACLE_SID" and "ORACLE_HOME", then log into each individual instance you have up and running. The following is an example of how to proceed:

SVRMGR> connect internal
Connected.
SVRMGR> oradebug ipc
-------------- Shared memory --------------
Seg Id      Address     Size
6665        c4c94000    6422528
Total: # of segments = 1, size = 6422528
-------------- Semaphores -----------------
Total number of semaphores = 50
Number of semaphores per set = 50
Number of semaphore sets = 1
Semaphore identifiers:
2828

The output above shows the shared memory segment and semaphore identifier owned by this instance:

Seg Id      Address     Size
6665        c4c94000    6422528
Semaphore identifiers:
2828

Then verify that these are in use with the following command:

% ipcs -b
m    6665   0x5fa37342  --rw-r-----  osupport  dba  6422528
s    2828   0x00000000  --ra-r-----  osupport  dba       50

You now know these are valid segments on a running database. Using this process of elimination you can identify the idle segments left by a crashed instance. You can then remove them using "ipcrm -m" and "ipcrm -s" respectively. The command syntax to remove the shared memory segments or semaphores is as follows:

% ipcrm -m <shared memory id>
% ipcrm -s <semaphore id>

For 8.1.X:
==========
To obtain the shared memory id and semaphore id for 8.1.X you can do either of the following:

$ORACLE_HOME/bin/sysresv

IPC Resources for ORACLE_SID "V817" :
Shared Memory:
ID          KEY
14851       0x8a85a74c
Semaphores:
ID          KEY
11206656    0x4bd4814c
Oracle Instance alive for sid "V817"

OR

% sqlplus internal
SQL> oradebug ipc
Information written to trace file.

The trace file is written to USER_DUMP_DEST. The shared memory segment id can be found by searching the trace file for "Shmid". In the following excerpt the shared memory segment id is 2007:

Area  Subarea  Shmid  Stable Addr       Actual Addr
   0        0   2007  0000000080000000  0000000080000000

To find the semaphore id, look for "Semaphore List=" in the trace file. In the following example the semaphore id is 1245189:

Semaphore List=
1245189

Example of trace file: /u02/app/oracle/product/8.1.6/admin/R816/udump/r816_ora_975.trc

Oracle8i Enterprise Edition Release 8.1.6.2.0 - Production
With the Partitioning option
JServer Release 8.1.6.2.0 - Production
ORACLE_HOME = /u02/app/oracle/product/8.1.6
System name:    SunOS
Node name:      sandbox1
Release:        5.6
Version:        Generic_105181-16
Machine:        sun4u
Instance name: R816
Redo thread mounted by this instance: 1
Oracle process number: 12
Unix process pid: 975, image: oracle@sandbox1 (TNS V1-V3)

*** SESSION ID:(14.4287) 2000-08-31 10:47:44.542
Dump of unix-generic skgm context
areaflags       00000037
realmflags      0000000f
mapsize         00002000
protectsize     00002000
lcmsize         00002000
seglen          00002000
largestsize     00000000f8000000
smallestsize    0000000000400000
stacklimit      ef87eebf
stackdir        -1
mode            640
magic           acc01ade
Handle:         177b8c8 `/u02/app/oracle/product/8.1.6R816'
Dump of unix-generic realm handle `/u02/app/oracle/product/8.1.6R816', flags = 00000000

Area #0 `Fixed Size' containing Subareas 0-0
  Total size 0000000000010ff0 Minimum Subarea size 00000000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     0        0   2007  0000000080000000  0000000080000000
  Subarea size      Segment size
  0000000000012000  00000000039d4000

Area #1 `Variable Size' containing Subareas 1-1
  Total size 00000000025a2000 Minimum Subarea size 00100000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     1        1   2007  0000000080012000  0000000080012000
  Subarea size      Segment size
  0000000002600000  00000000039d4000

Area #2 `Database Buffers' containing Subareas 2-2
  Total size 0000000001388000 Minimum Subarea size 00002000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     2        2   2007  0000000082612000  0000000082612000
  Subarea size      Segment size
  0000000001388000  00000000039d4000

Area #3 `Redo Buffers' containing Subareas 3-3
  Total size 000000000002c000 Minimum Subarea size 00000000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     3        3   2007  000000008399a000  000000008399a000
  Subarea size      Segment size
  000000000002c000  00000000039d4000

Area #4 `Lock Manager' containing Subareas 5-5
  Total size 0000000000004000 Minimum Subarea size 00000000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     4        5   2007  00000000839ce000  00000000839ce000
  Subarea size      Segment size
  0000000000004000  00000000039d4000

Area #5 `Java' containing Subareas 4-4
  Total size 0000000000008000 Minimum Subarea size 00000000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     5        4   2007  00000000839c6000  00000000839c6000
  Subarea size      Segment size
  0000000000008000  00000000039d4000

Area #6 `skgm overhead' containing Subareas 6-6
  Total size 0000000000002000 Minimum Subarea size 00000000
  Area  Subarea  Shmid  Stable Addr       Actual Addr
     6        6   2007  00000000839d2000  00000000839d2000
  Subarea size      Segment size
  0000000000002000  00000000039d4000

Dump of Solaris-specific skgm context
sharedmmu 00000001 shareddec 0
used region 0: start 0000000080000000 length 0000000004000000
Maximum processes:               = 50
Number of semaphores per set:    = 54
Semaphores key overhead per set: = 4
User Semaphores per set:         = 50
Number of semaphore sets:        = 1
Semaphore identifiers:           = 1
Semaphore List=
1245189
-------------- system semaphore information -------------
IPC status from <running system> as of Thu Aug 31 10:47:44 2000
T        ID      KEY         MODE         OWNER     GROUP
Semaphores:
s         1      0x55535253  --ra-ra-ra-  root      root
s    458755      00000000    --ra-r-----  rsupport  rdba
s    196612      0x0a248eb5  --ra-r-----  rsupport  rdba
s   1245189      0x09d48eb6  --ra-r-----  rsupport  rdba
s    131078      00000000    --ra-r-----  rsupport  rdba
s     65543      00000000    --ra-r-----  rsupport  rdba
s    196616      00000000    --ra-r-----  rsupport  rdba
s     65545      00000000    --ra-------  rsupport  rdba
s    262154      00000000    --ra-r-----  rsupport  rdba
s    327691      0x09d48b46  --ra-r-----  oracle    rdba
s    196620      0x06148c55  --ra-r-----  oracle    rdba
s    131085      00000000    --ra-r-----  rsupport  rdba

Once again, the command syntax to remove the shared memory segments or semaphores is as follows:

% ipcrm -m <shared memory id>
% ipcrm -s <semaphore id>

Search Words:
=============
ORA-07307, segment, shmmax, semmns, semaphores, oradebug, ipc

2Gb or Not 2Gb - File limits in Oracle
Type: BULLETIN  Status: PUBLISHED  Content Type: TEXT/PLAIN
Creation Date: 02-SEP-1998  Last Revision Date: 09-MAR-2001

Introduction
~~~~~~~~~~~~
This article describes "2Gb" issues. It explains why 2Gb is a magical number and outlines the issues you need to know about if you are considering using Oracle with files larger than 2Gb in size. It also looks at some other file-related limits and issues. The article has a Unix bias, as this is where most of the 2Gb issues arise, but there is information relevant to other (non-Unix) platforms. Articles giving port-specific limits are listed in the last section.

Topics covered include:
  Why is 2Gb a Special Number ?
  Why use 2Gb+ Datafiles ?
  Export and 2Gb
  SQL*Loader and 2Gb
  Oracle and other 2Gb issues
  Port Specific Information on "Large Files"

Why is 2Gb a Special Number ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many CPUs and system call interfaces (APIs) in use today use a word size of 32 bits. This word size imposes limits on many operations. In many cases the standard APIs for file operations use a 32-bit signed word to represent both file size and current position within a file (byte displacement). A 'signed' 32-bit word uses the topmost bit as a sign indicator, leaving only 31 bits to represent the actual value (positive or negative). The largest positive number that can be represented in 31 bits is 0x7FFFFFFF hexadecimal, which is +2147483647 decimal. This is ONE less than 2Gb. Files of 2Gb or more are generally known as 'large files'. As one might expect, problems can start to surface once you try to use the number 2147483648 or higher in a 32-bit environment. To overcome this problem, recent versions of operating systems have defined new system calls which typically use 64-bit addressing for file sizes and offsets. Recent Oracle releases make use of these new interfaces, but there are a number of issues one should be aware of before deciding to use 'large files'.

Another "special" number is 4Gb. 0xFFFFFFFF hexadecimal can be interpreted as an UNSIGNED value (4294967295 decimal), which is one less than 4Gb. Adding one to this value yields 0x00000000 in the low-order 4 bytes with a '1' carried over; the carried-over bit is lost when using 32-bit arithmetic. Hence 4Gb is another "special" number where problems may occur. Such issues are also mentioned in this article.

What does this mean when using Oracle ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The 32-bit issue affects Oracle in a number of ways. In order to use large files you need to have:

  1. An operating system that supports 2Gb+ files or raw devices
  2.
An operating system which has an API to support I/O on 2Gb+ files 3. A version of Oracle which uses this API Today most platforms support large files and have 64bit APIs for such files. Releases of Oracle from 7.3 onwards usually make use of these 64bit APIs but the situation is very dependent on platform, operating system version and the Oracle version. In some cases 'large file' support is present by default, while in other cases a special patch may be required. At the time of writing there are some tools within Oracle which have not been updated to use the new API's, most notably tools like EXPORT and SQL*LOADER, but again the exact situation is platform and version specific.
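The 31-bit and 32-bit boundaries described above are plain arithmetic and
can be verified in a few lines; this is a minimal illustration of the
limits, not Oracle code:

```python
# Illustration of the 31-bit signed and 32-bit unsigned file-size limits.

SIGNED_MAX = 0x7FFFFFFF          # largest positive 32-bit signed value
print(SIGNED_MAX)                # 2147483647, i.e. one byte short of 2Gb

two_gb = 2 * 1024 * 1024 * 1024
print(two_gb - SIGNED_MAX)       # 1 -- the value 2Gb itself does not fit

UNSIGNED_MAX = 0xFFFFFFFF        # largest 32-bit unsigned value
print(UNSIGNED_MAX)              # 4294967295, i.e. one byte short of 4Gb

# Adding one wraps to zero when the result is truncated to 32 bits:
wrapped = (UNSIGNED_MAX + 1) & 0xFFFFFFFF
print(wrapped)                   # 0 -- the carried-over bit is lost
```

This is exactly why an offset of 2147483648 bytes fails through a 32-bit
signed file API, and why 4Gb is a second trouble spot for unsigned code.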

Why use 2Gb+ Datafiles ?
~~~~~~~~~~~~~~~~~~~~~~~~
In this section we will try to summarise the advantages and disadvantages
of using "large" files / devices for Oracle datafiles.

Advantages of files larger than 2Gb:

  - On most platforms Oracle7 supports up to 1022 datafiles. With files
    < 2Gb this limits the database size to less than 2044Gb. (This is not
    an issue with Oracle8, which supports many more files: Oracle8
    supported 1022 files PER TABLESPACE.) In reality the maximum database
    size in Oracle7 would be less than 2044Gb due to maintaining separate
    data in separate tablespaces, some of which may be much less than 2Gb
    in size. Larger files allow this 2044Gb limit to be exceeded.

  - Larger files can mean fewer files to manage for smaller databases.

  - Fewer file handle resources are required.

Disadvantages of files larger than 2Gb:

  - The unit of recovery is larger. A 2Gb file may take between 15 minutes
    and 1 hour to backup / restore depending on the backup media and disk
    speeds. An 8Gb file may take 4 times as long.

  - Parallelism of backup / recovery operations may be impacted.

  - There may be platform-specific limitations. Eg: on certain AIX
    releases asynchronous IO operations are serialised above the 2Gb mark.

  - As handling of files above 2Gb may need patches, special configuration
    etc., there is an increased risk involved compared to smaller files.

Important points if using files >= 2Gb:

  - Check with the OS vendor to determine if large files are supported and
    how to configure for them.

  - Check with the OS vendor what the maximum file size actually is.

  - Check with Oracle Support if any patches or limitations apply on your
    platform, OS version and Oracle version.

  - Remember to check again if you are considering upgrading either Oracle
    or the OS, in case any patches are required in the release you are
    moving to.

  - Make sure any operating system limits are set correctly to allow
    access to large files for all users.

  - Make sure any backup scripts can also cope with large files.

Note that there is still a limit to the maximum file size you can use for
datafiles above 2Gb in size. The exact limit depends on the DB_BLOCK_SIZE
of the database and the platform. On most platforms (Unix, NT, VMS) the
limit on file size is around 4194302*DB_BLOCK_SIZE. See the details in the
Alert in [NOTE:112011.1], which describes problems with resizing files,
especially to above 2Gb in size.

Important notes generally
~~~~~~~~~~~~~~~~~~~~~~~~~
Be careful when allowing files to automatically resize. It is sensible to
always limit the MAXSIZE for AUTOEXTEND files to less than 2Gb if not
using 'large files', and to a sensible limit otherwise. Note that due to
[BUG:568232] it is possible to specify a value of MAXSIZE larger than
Oracle can cope with, which may result in internal errors after the resize
occurs. (Errors typically include ORA-600 [3292].)

On many platforms Oracle datafiles have an additional header block at the
start of the file, so creating a file of 2Gb actually requires slightly
more than 2Gb of disk space. On Unix platforms the additional header for
datafiles is usually DB_BLOCK_SIZE bytes, but may be larger when creating
datafiles on raw devices.

2Gb related Oracle Errors:
These are a few of the errors which may occur when a 2Gb limit is present.
They are not in any particular order.

  ORA-01119 Error in creating datafile xxxx
  ORA-27044 unable to write header block of file
  SVR4 Error: 22: Invalid argument
  ORA-19502 write error on file 'filename', blockno x (blocksize=nn)
  ORA-27070 skgfdisp: async read/write failed
  ORA-02237 invalid file size
  KCF:write/open error dba=xxxxxx block=xxxx online=xxxx file=xxxxxxxx
  file limit exceed. Unix error 27, EFBIG

Export and 2Gb
~~~~~~~~~~~~~~
2Gb Export File Size
~~~~~~~~~~~~~~~~~~~~
At the time of writing most versions of export use the default file open
API when creating an export file. This means that on many platforms it is
impossible to export a file of 2Gb or larger to a file system file.
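The 4194302*DB_BLOCK_SIZE figure quoted above is easy to tabulate for
common block sizes. A back-of-the-envelope check (plain arithmetic, no
Oracle required; the interpretation of 4194302 as roughly 2**22 blocks
reflects the block-number addressing limit, not an official formula):

```python
# Approximate maximum datafile size on most platforms, per the note above:
# 4194302 * DB_BLOCK_SIZE. 4194302 is 2**22 - 2, i.e. just under the
# 22-bit limit on block numbers within a single datafile.

MAX_BLOCKS = 4194302

for block_size in (2048, 4096, 8192, 16384):
    max_bytes = MAX_BLOCKS * block_size
    print(f"DB_BLOCK_SIZE={block_size:6d} -> "
          f"max datafile ~ {max_bytes / 2**30:.1f} Gb")
```

So even with 'large file' support, an 8K-block database is still limited to
roughly 32Gb per datafile on these platforms.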
There are several options available to overcome 2Gb file limits with
export:

  - It is generally possible to write an export > 2Gb to a raw device.
    Obviously the raw device has to be large enough to hold the entire
    export.

  - By exporting to a named pipe (on Unix) one can compress, zip or split
    up the output. See "Quick Reference to Exporting >2Gb on Unix"
    [NOTE:30528.1].

  - One can export to tape (on most platforms). See "Exporting to tape on
    Unix systems" [NOTE:30428.1]. (This article also describes in detail
    how to export to a Unix pipe, remote shell etc.)

  - Oracle8i allows you to write an export to multiple export files rather
    than to one large export file.

Other 2Gb Export Issues
~~~~~~~~~~~~~~~~~~~~~~~
Oracle has a maximum extent size of 2Gb. Unfortunately there is a problem
with EXPORT on many releases of Oracle such that if you export a large
table and specify COMPRESS=Y, it is possible for the NEXT storage clause
of the statement in the EXPORT file to contain a size above 2Gb. This will
cause import to fail even if IGNORE=Y is specified at import time. This
issue is reported in [BUG:708790] and is alerted in [NOTE:62436.1].

An export will typically report errors like this when it hits a 2Gb limit:

  . . exporting table                  BIGEXPORT
  EXP-00015: error on row 10660 of table BIGEXPORT, column MYCOL,
             datatype 96
  EXP-00002: error in writing to export file
  EXP-00002: error in writing to export file
  EXP-00000: Export terminated unsuccessfully

There is a secondary issue reported in [BUG:185855] which indicates that a
full database export generates a CREATE TABLESPACE command with the file
size specified in BYTES. If the file size is above 2Gb this may cause an
ORA-2237 error when attempting to create the file on IMPORT. This issue
can be worked around by creating the tablespace prior to importing,
specifying the file size in 'M' instead of in bytes. [BUG:490837]
indicates a similar problem.

Export to Tape
~~~~~~~~~~~~~~
The VOLSIZE parameter for export is limited to values less than 4Gb; on
some platforms it may be only 2Gb. This is corrected in Oracle 8i.
[BUG:490190] describes this problem.
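The named-pipe technique above is normally done from the shell with mkfifo
and the exp utility (see the referenced note for the real recipe). The
sketch below demonstrates only the plumbing in Python: a background reader
compresses whatever a writer sends into a FIFO, so the uncompressed stream
never lands on disk. The file names and payload are invented for
illustration; Unix only.

```python
# Named-pipe demonstration: a background thread gzips whatever a writer
# (in real life: the 'exp' utility) sends into a FIFO. Unix only.
import gzip
import os
import shutil
import tempfile
import threading

workdir = tempfile.mkdtemp()
fifo = os.path.join(workdir, "exp_pipe")       # hypothetical pipe name
out = os.path.join(workdir, "expdat.dmp.gz")   # compressed "export"
os.mkfifo(fifo)

def compress_from_pipe():
    # open() on a FIFO blocks until a writer attaches -- exactly how
    # the shell recipe's background compressor behaves.
    with open(fifo, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)

reader = threading.Thread(target=compress_from_pipe)
reader.start()

# Stand-in for 'exp ... file=exp_pipe': write some dump data into the pipe.
payload = b"fake export dump data\n" * 1000
with open(fifo, "wb") as pipe:
    pipe.write(payload)

reader.join()
restored = gzip.open(out, "rb").read()
print(restored == payload)    # round trip through the pipe worked
shutil.rmtree(workdir)
```

Because the compressor sees a stream rather than a file, the 2Gb file
limit applies only to the (much smaller) compressed output.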
SQL*Loader and 2Gb
~~~~~~~~~~~~~~~~~~
Typically SQL*Loader will error when it attempts to open an input file
larger than 2Gb, with an error of the form:

  SQL*Loader-500: Unable to open file (bigfile.dat)
  SVR4 Error: 79: Value too large for defined data type

The examples in [NOTE:30528.1] can be modified for use with SQL*Loader for
large input data files. Oracle 8.0.6 provides large file support for
discard and log files in SQL*Loader, but the maximum input data file size
still varies between platforms. See [BUG:948460] for details of the input
file limit. [BUG:749600] covers the maximum discard file size.

Oracle and other 2Gb issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists miscellaneous 2Gb issues:

  - From Oracle 8.0.5 onwards 64-bit releases are available on most
    platforms. An extract from the 8.0.5 README file introduces these;
    see [NOTE:62252.1].

  - DBV (the database verification file program) may not be able to scan
    datafiles larger than 2Gb, reporting "DBV-100". This is reported in
    [BUG:710888].

  - "DATAFILE ... SIZE xxxxxx" clauses of SQL commands in Oracle must be
    specified in 'M' or 'K' to create files larger than 2Gb, otherwise
    the error "ORA-02237: invalid file size" is reported. This is
    documented in [BUG:185855].

  - Tablespace quotas cannot exceed 2Gb on releases before Oracle 7.3.4.
    Eg: ALTER USER <username> QUOTA 2500M ON <tablespacename> reports
    ORA-2187: invalid quota specification. This is documented in
    [BUG:425831]. The workaround is to grant users the UNLIMITED
    TABLESPACE privilege if they need a quota above 2Gb.

  - Tools which spool output may error if the spool file reaches 2Gb in
    size. Eg: SQL*Plus spool output.

  - Certain 'core' functions in Oracle tools do not support large files.
    See [BUG:749600], which is fixed in Oracle 8.0.6 and 8.1.6. Note that
    this fix is NOT in Oracle 8.1.5 nor in any patch set. Even with this
    fix there may still be large file restrictions, as not all code uses
    these 'core' functions. Eg: CORE is not used for SQL*Loader input
    file I/O.

  - The UTL_FILE package uses the 'core' functions mentioned above and so
    is limited by 2Gb restrictions in Oracle releases which do not
    contain this fix. UTL_FILE is a PL/SQL package which allows file IO
    from within PL/SQL.

Port Specific Information on "Large Files"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Below are references to information on large file support for specific
platforms.
Although every effort is made to keep the information in these articles
up-to-date, it is still advisable to carefully test any operation which
reads or writes from / to large files:

  Platform            See
  ~~~~~~~~            ~~~
  AIX (RS6000 / SP)   [NOTE:60888.1]
  HP                  [NOTE:62407.1]
  Digital Unix        [NOTE:62426.1]
  Sequent PTX         [NOTE:62415.1]
  Sun Solaris         [NOTE:62409.1]
  Windows NT          Maximum 4Gb files on FAT
                      Theoretical 16Tb on NTFS
                      ** See [NOTE:67421.1] before using large files on
                         NT with Oracle8

  *2 There is a problem with DBVERIFY on 8.1.6. See [BUG:1372172]
  *3 There is a problem with 8.1.6 / 8.1.7 where an autoextend to 4Gb can
     cause a crash - see [BUG:1668488]

Oracle and the Operating System File Size Limit
Type: FAQ  Status: PUBLISHED  Content Type: TEXT/PLAIN
Creation Date: 13-JAN-1993  Last Revision Date: 01-MAY-2001

PURPOSE
This document describes two ways of increasing the file size limitation
set for a UNIX process. It also describes the errors an Oracle user might
encounter as a result of this limitation.

Oracle and the Operating System File Size Limit:
================================================
On most of the Unix PC platforms there is a limitation on the size of a
file that can be written to by a process. The limitation is enforced by
the Unix operating system. The system parameter is called the 'ulimit';
it has a default which is configured into the Unix kernel, but it can
also be altered by a system call. A process inherits the value of the
ulimit from its parent process. Only the root user has the ability to
raise the ulimit for a process by using the system call.

Since Oracle is an information storage system, it tends to have files
whose sizes are larger than the default ulimit in the Unix kernels as
shipped to customers. Therefore, users can run into the problem that the
Oracle database writer process cannot write to parts of the database
files beyond the ulimit. When this happens, the database writer dies, and
the Oracle system has to be restarted. The error messages that indicate
this problem are found in the trace files, and usually have the text
"File too large" somewhere in the error stack. The error reported by
Oracle is usually ORA-1114 "Unable to write to datablock".

There are 2 possible solutions to this problem, one of which is provided
by Oracle. The first solution is to reconfigure the Unix kernel with a
larger default ulimit (one that is at least as big as the largest Oracle
data file). The second solution is to make sure that the program osh is
run before any DBA starts up the database.

OSH (Oracle shell)

The osh program is included in the Oracle distribution and is run when the
oraenv (or coraenv) script is run in the Bourne shell (or C shell).
Therefore, if all DBA users (that is, users that have the ability to start
the Oracle database) have a call to this script in their .profile (or
.login), you will not experience the error described above. The osh
program raises the ulimit to the maximum (about 1/2 gigs), and execs over
the current shell. It has the suid bit set and is owned by root, so it has
the permission to raise the ulimit. Any process started from this shell
will inherit the large ulimit and will be able to write to the largest
files that Unix can handle.

To recap, there are 2 ways to make sure that Oracle doesn't run into
errors due to the ulimit constraint:

  1. Reconfigure the Unix kernel to have a high default ulimit.
  2. Make sure that any user session that starts up Oracle first executes
     osh by running the oraenv (or coraenv) script before starting Oracle.
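The per-process file size limit that osh manipulates can be inspected, and
raised up to the hard limit, from any Unix process. A minimal sketch using
Python's standard resource module (this is an illustration of the ulimit
mechanism, not part of the Oracle distribution):

```python
# Inspect and (within the hard limit) raise the per-process file-size
# limit -- the same 'ulimit' that osh raises before starting Oracle.
# Unix only.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
print("soft:", soft, "hard:", hard)   # RLIM_INFINITY (-1) means unlimited

# An unprivileged process may raise its soft limit up to the hard limit;
# only root may raise the hard limit itself.
resource.setrlimit(resource.RLIMIT_FSIZE, (hard, hard))
soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
print("soft:", soft, "hard:", hard)
```

As with osh, any child process started after this call inherits the
raised limit, which is why running it before starting the database is
what matters.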

MAKING THE DECISION TO USE UNIX RAW DEVICES
===========================================
Cary V Millsap
July 2, 1992

ABSTRACT
The raw I/O capability of ORACLE for UNIX has a confusing reputation. The
benefits of using raw devices have long been assumed; however, it has been
difficult to reconcile the discrepancy between marketing claims and
measurements taken at numerous Oracle client sites. The costs of using raw
devices begin to manifest themselves as soon as the decision to implement
them is made; however, detailed documentation of these costs is scarce.
This paper is intended to eliminate confusion about raw I/O, and give the
ORACLE RDBMS administrator the information necessary to make a wise
decision about whether or not to use UNIX raw devices.

1 DEFINITIONS

In the context of this paper, a UNIX filesystem is a UNIX system's
hierarchical file directory tree, whose access is coordinated through the
UNIX file buffering mechanism. A raw device is an unmounted UNIX disk
section that can be used by ORACLE for data files or redo log files. When
ORACLE reads or writes a raw device, it bypasses the processing overhead
associated with UNIX file buffering.

2 HISTORICAL CONTEXT

Marketing recommendations have long stated that using raw devices has a dramatically positive impact on performance. Yet, in most cases, careful measurement at Oracle customer sites has shown the performance impact of using raw devices to be imperceptible. As contradictory as these reports may seem, actually both are correct. A clearer statement of the available evidence is:

  o Use of raw devices instead of UNIX file buffering can dramatically
    help the performance of ORACLE disk I/O.

  o Use of raw devices has an imperceptible impact on overall ORACLE RDBMS
    performance at most customer sites.

ORACLE for UNIX enthusiasts commonly argue that bypassing UNIX file
buffering by using raw devices can result in a five- to twenty-percent
improvement in ORACLE disk I/O. These figures do NOT accurately reflect
the gain in general performance or throughput anywhere but the most
transaction-intensive benchmark tests. Throughput is not materially
affected by going raw unless disk I/O is the system's performance
bottleneck. The ORACLE RDBMS uses cache management to process large
real-world data volumes with exceptional efficiency. When a reasonably
well-configured [1] ORACLE RDBMS is I/O bound, it is almost invariably the
result of a poorly optimized application query driving multiple large
full-table scans. This kind of problem can sometimes be improved
marginally by moving to raw devices, but permanent satisfaction comes only
from examination of ORACLE's access path to the data, beginning with a SQL
trace analysis.

Perhaps most of the confusion about raw I/O has been generated by those
situations in which a client's ORACLE data files are moved from UNIX
filesystems onto raw devices. Invariably, such a procedure results in a
five- to twenty-percent improvement in overall throughput. However, the
operation of moving a data file from a UNIX filesystem to a raw device
requires deletion and re-insertion of the things that were stored in the
data file. This round trip reduces row chaining all the way to its
theoretical minimum and also nicely repacks and balances every index.[2]
The same five- to twenty-percent performance improvement could be gained
by moving a badly chained table from raw devices to a UNIX filesystem.

3 THE BENEFITS AND COSTS OF RAW DEVICES

Unfortunately, many clients enthusiastically "go raw" with
misinterpretations about the benefits and no idea whatever about the
costs. Using raw I/O can help performance on the margin at some
large-volume, high-transaction sites, but using raw I/O can also assist in
crippling both the performance and the functional effectiveness of a
site's applications. To use raw devices, the RDBMS architect/administrator
sacrifices a great deal of the database file sizing flexibility offered by
ORACLE.

These are the performance advantages of using raw devices:

  o Circumvention of UNIX File Buffering. Bypassing UNIX file buffering
    results in a savings on every disk read or write. This savings shows
    up as a throughput improvement only if disk I/O is the system
    performance bottleneck.

  o Better Memory Utilization. The memory used by UNIX to buffer file I/O
    can be better used by the RDBMS, which does its own I/O and caching.
    The more memory a machine has, the less effective this memory savings
    becomes as an argument for going raw. Wise allocation of memory to the
    SGA instead of to mammoth UNIX file I/O buffers further neutralizes
    the benefit of going raw.

The performance advantages of going raw are outweighed at most sites by
the following disadvantages:

  o Harder Configuration Planning. Clients with small databases usually do
    not have the luxury of choosing from a sufficient number of well-sized
    raw device sections. Disk sections usually come in odd sizes that do
    not lend themselves to the implementation of a good database
    architecture. Even with the flexible section sizing of recent releases
    of System V, the DBA should make all data files the same size in order
    to use load balancing techniques as experience with the system
    accumulates.

  o Harder Configuration Tuning. Upon finding that a particular disk drive
    is "hot" and that performance would benefit from movement of an ORACLE
    data file from that drive to some other, it is likely that no
    acceptably sized section exists on the "cool" drive. Moving data files
    around, a simple and attractive option in a UNIX filesystem
    environment, is potentially impossible with raw devices.

  o Harder Daily Administration. The administrator must use more
    complicated UNIX tools to monitor and administer raw devices than
    those available for maintaining UNIX filesystems. Notably, the DBA
    loses most of the power and simplicity of the ORACLE data storage
    portion of the OFA standard [OFA]. The complexity can be minimized,
    but only with extra effort.

4 NECESSARY CONDITIONS FOR USE OF RAW DEVICES

Using raw devices can marginally improve the performance of certain ORACLE
systems. However, the costs of going raw outweigh the benefits in most
cases. An ORACLE architect/administrator should choose to use raw devices
only if each of the following criteria holds.

4.1 Direct I/O Is Not Available

Use raw devices for ORACLE files only if the UNIX operating system does
not offer the capability for direct I/O through the UNIX filesystem. Some
UNIX computer systems include a UNIX kernel capability for direct reading
and writing of UNIX mounted filesystems.[3] This kernel feature allows
application software to bypass the UNIX I/O buffering mechanism for disk
performance that essentially matches that of using raw devices, without
incurring the administrative costs of using unmounted disk sections. The
ORACLE RDBMS began taking advantage of this capability in v 6.0.32. If
this capability is available, then there is no reason to use raw devices.
Note that any factor that causes disk I/O to be less of a bottleneck
weakens the argument for using raw devices.

4.2 Transaction Volume Is High

Use raw devices for ORACLE files only if the site has sufficiently brutal
transaction and query volume that disk I/O is the performance bottleneck.
If disk I/O is not a site's performance bottleneck, then using raw devices
is all cost and no benefit. If disk I/O is the performance bottleneck,
then it is likely that the highest throughput gain lies in SQL trace
performance analysis of a few individual application SQL statements.
Normal use of queries that return a large number of rows (tens or hundreds
of thousands) has motivated many DBAs to use raw devices. However, before
jumping to raw devices, the DBA should help determine whether an
application that processes tens or hundreds of thousands of rows is
designed as well as it should be.

Another fact to consider is that, because each UNIX file buffer is a
write-through cache, some ORACLE requests for physical I/O will actually
be fulfilled with logical reads in a UNIX filesystem environment. Because
a UNIX file buffer continues to hold data from the last ORACLE database
write, the request for a block that no longer resides in the SGA may not
require a physical file I/O. It is critical that before making the
decision to use raw devices, the DBA and UNIX administration teams avoid
the temptation to exaggerate the net benefit of throwing away filesystem
buffering.

4.3 Raw Disk Sections Are Plentiful

Use raw devices for ORACLE files only if the site has at least as many raw
disk sections as it will have ORACLE tablespaces. Without at least as many
raw sections as tablespaces, the DBA is forced into integrating segments
with incompatible fragmentation characteristics. Doing this will hurt
performance more than the use of raw I/O can help it. Any ORACLE database
should contain at least six tablespaces:

  1. SYSTEM -- SYS-owned dictionary segments only
  2. RBS    -- rollback segments only
  3. TEMP   -- temporary segments only
  4. TOOLS  -- SYSTEM-owned segments only (plus, crt, forms, srw, etc.)
  5. USERS  -- users' personal tables, etc.
  6. DATA   -- each application should have its own tablespace

Naturally, many databases will have multiple applications housed in
multiple tablespaces, and some DBAs will separate application data and
indexes into different tablespaces [OFA]. The greater the number of
tablespaces in the database, the greater will be the number of disk
sections required to use a raw device architecture.

4.4 Disk Volume Is Large

Use raw devices for ORACLE files only if the site has enough disk space that it can afford over-allocation of small ORACLE tablespaces.

For those sites that do go raw, fully flexible disk load balancing is
possible only if the disk sections used for raw ORACLE storage are all the
same size. Living by this homogeneous sizing requirement costs disk space
because it forces over-allocation of small tablespaces. For example, if a
site's data dictionary requires only 30 MB, yet the site uses 150-MB raw
sections, then there will be 120 MB of wasted space in the SYSTEM
tablespace.[4] Using the extra space in SYSTEM for non-dictionary segments
can seem preferable to explaining why 120 MB of disk space sits idle,
especially if the DBA has to convince a finance committee that the site
needs to buy more disk drives. However, indiscriminately mixing segments
in a common tablespace -- especially by putting non-dictionary segments in
SYSTEM -- can prove much more expensive than the waste of 120 MB of disk
[OFA].

Administrators using raw devices will also experience more difficulty in
adding space to existing ORACLE tablespaces. To continue our example, it
would be understandably difficult to resist the temptation to pre-allocate
as many 150-MB chunks of disk space as possible to ORACLE tablespaces. But
if after a month or two of operation the administrator finds the need to
add a data file to a tablespace, the options become: (1) buy a new disk
drive, (2) re-create the database with a new, more appropriate tablespace
architecture, or (3) add a data file from space available in the UNIX
filesystem. Options 1 and 2 are immediately and obviously expensive.
Option 3 initiates the DBA into the business of conducting two
synchronized operations for each cold database backup. Any decision that
complicates a critical DBA task inevitably invokes forces of disaster in a
manner which not one expert in a thousand is able to predict.

4.5 Redundant Administrative Support Is Available
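The over-allocation cost of a homogeneous section size is simple
arithmetic. The sketch below reproduces the paper's 120-MB SYSTEM example;
the other tablespace sizes are invented for illustration:

```python
# Idle space forced by a homogeneous raw-section size: every tablespace
# must be built from whole sections, so small tablespaces waste the
# remainder of their last section.
import math

SECTION_MB = 150   # homogeneous raw section size, as in the example

# Hypothetical space actually needed per tablespace, in MB
# (only SYSTEM = 30 MB comes from the paper's example):
needed = {"SYSTEM": 30, "RBS": 100, "TEMP": 80, "TOOLS": 40,
          "USERS": 60, "DATA": 600}

total_waste = 0
for ts, mb in needed.items():
    sections = max(1, math.ceil(mb / SECTION_MB))  # >= 1 section each
    waste = sections * SECTION_MB - mb
    total_waste += waste
    print(f"{ts:8s} needs {mb:4d} MB -> {sections} section(s),"
          f" {waste:4d} MB idle")

print("total idle:", total_waste, "MB")
```

The SYSTEM row shows the paper's 120 MB of idle space; across even this
small hypothetical database the rounding adds up to several sections'
worth of disk.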

Use raw devices for ORACLE files only if the site has multiple experienced
ORACLE and UNIX administrators. Clearly, the inflexibility of raw devices
motivates the need for ORACLE and UNIX administrators with enough
experience to see into the future with reasonable clarity. The realities
of illness and vacation accrual motivate the recommendation that every raw
I/O site have at least two people who are competent to administer both
UNIX and ORACLE.

5 USING RAW DEVICES

Once the decision has been made to use raw devices for ORACLE data,
adherence to the following standard maximizes the administrative
effectiveness of the resulting ORACLE RDBMS.

5.1 Make Raw Disk Sections the Same Size

Make all raw disk sections the same size. If possible, choose a disk
partitioning scheme that cuts each disk into equally sized sections. This
standard ensures maximal flexibility in system I/O load balancing as
experience with the RDBMS accumulates. A variety of section sizes
compounds the difficulty of moving a data file from one disk drive to
another. Consider the following factors as the standard section size is
chosen:

  o Sufficiently Large. The standard section size must be large enough
    that each large tablespace uses a minimal number of data files. Too
    small a standard section size restricts database size because ORACLE
    for UNIX imposes a limit on the number of files held open by an
    instance.

  o Sufficiently Small. The standard section size must be small enough
    that small tablespaces are not appreciably wasteful. Too large a
    standard section size leads to disk space under-utilization.

SUMMARY

In the excitement of discovering that using raw devices with ORACLE for
UNIX might yield appreciably better performance, many clients
enthusiastically "go raw" with a clear understanding neither of the
benefits nor of the costs. Using raw devices can help performance at the
margin in some installations, but raw I/O will not benefit most ORACLE
sites. It is especially important that arguments for raw I/O not be used
to justify other decisions that degrade ORACLE performance, such as a
decision to integrate dictionary and other segments into a common
tablespace.

Even in spite of the costs incurred by the decision to use UNIX raw
devices for ORACLE database files, there are situations in which raw I/O
is the appropriate choice. Using raw I/O for ORACLE is appropriate only if
a site without a UNIX direct read/write capability has tremendous disk,
data and transaction volume in a tuned environment where I/O is truly the
performance bottleneck. Only clients that have multiple ORACLE and UNIX
administrators who are competent to deal with the added complexities of
using unmounted disk sections should consider using the raw I/O capability
of ORACLE for UNIX.

[Footnotes]

[1] It is naturally difficult to place a metric on the
    "well-configuredness" of an ORACLE RDBMS, but at least the following
    issues must be addressed: approximate balance of I/O load across
    multiple disk heads; well-chosen database buffer, log buffer, and sort
    area sizes; well-tuned dictionary cache; sufficiently many rollback
    segments that are sufficiently well-sized; and well-tuned redo
    logging.

[2] If the DBA finds this operation necessary at periodic intervals to
    relieve row chaining, then the solution is not to schedule a periodic
    export/import; the answer is to find better values for pctfree and
    pctused.

[3] To date, the list of machines includes Sequent DYNIX, DYNIX/ptx, and
    Unisys 6000.

[4] The same problem surfaces if the administrator needs to create a
    180-MB tablespace with 150-MB files.

I/O Tuning with Different RAID Configurations
Type: BULLETIN  Status: PUBLISHED  Content Type: TEXT/PLAIN
Creation Date: 25-JUL-1995  Last Revision Date: 08-JUN-2001

PURPOSE
-------
This document gives a general overview of RAID (Redundant Arrays of
Inexpensive Disks), the different levels of RAID and their uses, and the
use of RAID with Oracle databases.

SCOPE & APPLICATION
-------------------
This note is intended to provide a discussion on RAID configurations.

1. Overview of RAID configurations and Oracle
---------------------------------------------
RAID-0:
-------
RAID-0 offers pure disk striping. Striping allows a large file to be
spread across multiple disks/controllers, providing concurrent access to
data because all the controllers are working in parallel. It does not
provide either data redundancy or parity protection. In fact, RAID-0 is
the only RAID level focusing solely on performance. Some vendors, such as
EMC, do not consider level 0 as true RAID and do not offer solutions based
on it. Pure RAID-0 significantly lowers MTBF, since it is highly prone to
downtime: if any disk in the array (across which Oracle files are striped)
fails, the database goes down.

RAID-1:
-------
With RAID-1, all data is written onto two independent disks (a "disk
pair") for complete data protection and redundancy. RAID-1 is also
referred to as disk mirroring or disk shadowing. Data is written
simultaneously to both disks to ensure that writes are almost as fast as
to a single disk. During reads, the disk that is the least busy is
utilized. RAID-1 is the most secure and reliable of all levels due to full
100-percent redundancy. The main disadvantage from a performance
perspective is that every write has to be duplicated; nevertheless, read
performance is enhanced, as the read can come from either disk. RAID-1
demands a significant monetary investment to duplicate each disk, but it
provides a very high mean time between failures (MTBF).
RAID-0 & RAID-1:
----------------
Combining RAID levels 0 and 1 (RAID-0+1) allows data to be striped across
an array, in addition to mirroring each disk in the array.

If RAID-0 is combined with RAID-1 (mirroring), this provides resilience,
but at the cost of having to double the number of disk drives in the
configuration. There is another benefit in some RAID-1 software
implementations in that the requested data is always returned from the
least busy device; this can account for a further increase in read
performance of over 85% compared to the striped, non-mirrored
configuration. Write performance, on the other hand, has to go to both
pieces of the software mirror. If the second mirror piece is on a second
controller (as would normally be recommended for controller resilience),
this degradation can be as low as 4 percent.

RAID-3:
-------
In a RAID-3 configuration, a single drive is dedicated to storing error
correction or parity data. Information is striped across the remaining
drives. RAID-3 dramatically reduces the level of concurrency that the disk
subsystem can support (I/Os per second) compared to a software mirrored
solution. The worst case for a system using RAID-3 would be an OLTP
environment, where the number of rapid transactions is numerous and
response time is critical. To put it simply, if the environment is mainly
read-only (eg decision support), RAID-3 provides disk redundancy with read
performance slightly improved, but at the cost of write performance.
Unfortunately, even decision support databases still do a significant
amount of disk writing, since complex joins, unique searches etc. still
do temporary work, thus involving disk writing.

RAID-5:
-------
Instead of total disk mirroring, RAID-5 computes and writes parity for
every write operation. The parity disks avoid the cost of full duplication
of the disk drives of RAID-1. If a disk fails, parity is used to
reconstruct data without system loss. Both data and parity are spread
across all the disks in the array, thus reducing disk bottleneck problems.
Read performance is improved, but every write has to incur the additional overhead of reading the old parity, computing the new parity, writing the new parity, and then writing the actual data, with the last two operations happening while two disk drives are simultaneously locked. This overhead is known as the RAID-5 write penalty, and it can make writes significantly slower. Also, if a disk fails in a RAID-5 configuration, the I/O penalty incurred during the disk rebuild is extremely high. Read-intensive applications (DSS, data warehousing) can use RAID-5 without major real-time performance degradation (the write penalty would still be incurred during batch load operations in DSS applications). In terms of storage, however, parity constitutes a mere 20-percent overhead, compared to the 100-percent overhead in RAID-1 and 0+1.

Initially, when RAID-5 technology was introduced, it was labeled as the cost-effective panacea for combining high availability and performance. Gradually, users realized the truth, and until about a couple of years ago, RAID-5 was regarded as the villain in most OLTP shops. Many sites contemplated getting rid of RAID-5 and started looking at alternative solutions. RAID 0+1 gained prominence as the best OLTP solution for people who could afford it. Over the last two years, RAID-5 has been making a comeback, either as hardware-based

RAID-5 or as enhanced RAID-7 or RAID-S implementations. However, RAID-5 still evokes bad memories for too many OLTP database architects.

RAID-S:
-------
RAID-S is EMC's implementation of RAID-5. However, it differs from pure RAID-5 in two main aspects: (1) it stripes the parity, but it does not stripe the data; and (2) it incorporates an asynchronous hardware environment with a write cache. This cache is primarily a mechanism to defer writes, so that the overhead of calculating and writing parity information can be handled by the system while it is relatively less busy (and less likely to exasperate the user!). Many users of RAID-S imagine that, since RAID-S is supposedly an enhanced version of RAID-5, data striping is automatic. They then wonder why they are experiencing I/O bottlenecks, in spite of all that striping. It is vital to remember that in RAID-S, striping of data is not automatic and has to be done manually via third-party disk-management software.

RAID-7:
-------
RAID-7 also implements a cache, controlled by a sophisticated built-in real-time operating system. Here, however, data is striped and parity is not. Instead, parity is held on one or more dedicated drives. RAID-7 is a patented architecture of Storage Computer Corporation.

2. Pros and Cons of Implementing RAID Technology
------------------------------------------------
There are benefits and disadvantages to using RAID, and these depend on the RAID level under consideration and the specific system in question. In general, RAID level 1 is most useful for systems where complete redundancy of data is a must and disk space is not an issue. For large datafiles, or systems with less disk space, this RAID level may not be feasible. Writes under this level of RAID are no faster and no slower than 'usual'. For all other levels of RAID, writes will tend to be slower and reads will be faster than under 'normal' file systems.
Writes will be slower the more frequently ECCs are calculated and the more complex those ECCs are. Depending on the ratio of reads to writes in your system, I/O speed may see a net increase or a net decrease. RAID can also improve performance by distributing I/O, since the RAID controller spreads data over several physical drives and therefore no single drive is overburdened.

The striping of data across physical drives has several consequences besides balancing I/O. One additional advantage is that logical files may be created which are larger than the maximum size usually supported by an operating system. There are disadvantages as well, however. Striping means that it is no longer possible to locate a single datafile on a specific physical drive. This may cause the loss of some application tuning capabilities. Also, in Oracle's case, it can cause database recovery to be more time-consuming: if a single physical disk in a RAID array needs recovery, all the disks which are part of that logical RAID device must be involved in the recovery. One additional note is that the storage of ECCs may require up to 20% more disk space than would storage of data alone, so there is some disk

overhead involved with usage of RAID.

3. RAID and Oracle
------------------
The usage of RAID is transparent to Oracle. All the features specific to the RAID configuration are handled by the operating system and go on behind the scenes as far as Oracle is concerned. Different Oracle file types are suited differently for RAID devices. Datafiles and archive logs can be placed on RAID devices, since they are accessed randomly. Redo logs should not be put on RAID devices, since they are accessed sequentially and performance is enhanced in their case by having the disk drive head near the last write location. However, mirroring of redo log files is strongly recommended by Oracle. In terms of administration, RAID is far simpler than using Oracle techniques for data placement and striping.

Recommendations: In general, RAID usually impacts write operations more than read operations. This is especially true where parity needs to be calculated (RAID-3, RAID-5, etc.). Online or archived redo log files can be put on RAID-1 devices; you should not use RAID-5 for them. 'TEMP' tablespace datafiles should also go on RAID-1 instead of RAID-5. The reason for this is that the streamed write performance of distributed parity (RAID-5) is not as good as that of simple mirroring (RAID-1). Swap space can be used on RAID devices without affecting Oracle.

====================================================================================
RAID   Type of RAID             Control    Database     Redo Log     Archive Log
                                File       File         File         File
====================================================================================
0      Striping                 Avoid*     OK*          Avoid*       Avoid*
------------------------------------------------------------------------------------
1      Shadowing                OK         OK           Recommended  Recommended
------------------------------------------------------------------------------------
0+1    Striping +               OK         Recommended  Avoid        Avoid
       Shadowing (1)
------------------------------------------------------------------------------------
3      Striping with            OK         Avoid        Avoid        Avoid
       Static Parity (2)
------------------------------------------------------------------------------------
5      Striping with            OK         Avoid        Avoid        Avoid
       Rotating Parity (2)
------------------------------------------------------------------------------------
*   RAID 0 does not provide any protection against failures. It requires a strong
    backup strategy.
(1) RAID 0+1 is recommended for database files because it avoids hot spots and
    gives the best possible performance during a disk failure. The disadvantage of
    RAID 0+1 is that it is a costly configuration.
(2) Avoid when heavy write operations involve this datafile.
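The storage overheads cited above are straightforward to check. A quick sketch with example numbers (the disk count and size are illustrative): a five-disk parity array gives up one disk's worth of space, about 20% of raw capacity, while mirroring always costs 100% extra.

```shell
# Parity vs mirror overhead, using example 9GB disks.
disks=5
disk_gb=9
raw_gb=$(( disks * disk_gb ))               # 45 GB of raw capacity

raid5_usable=$(( (disks - 1) * disk_gb ))   # 36 GB - one disk's worth is parity
raid5_overhead=$(( (raw_gb - raid5_usable) * 100 / raw_gb ))
echo "RAID-5 parity overhead: ${raid5_overhead}%"

# Mirroring stores every block twice, regardless of the disk count.
echo "RAID-1 mirror overhead: 100%"
```

The parity fraction shrinks as more disks join the array (1/N of raw capacity), which is why wide RAID-5 arrays look so attractive on a cost-per-gigabyte basis.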

RAID and Oracle - 20 Common Questions and Answers
=================================================
Type: FAQ   Status: PUBLISHED   Content Type: TEXT/PLAIN
Creation Date: 26-FEB-1998   Last Revision Date: 26-MAR-2000

1. What is RAID?

RAID is an acronym for Redundant Array of Independent Disks. A RAID system consists of an enclosure containing a number of disk volumes, connected to each other and to one or more computers by a fast interconnect. Six levels of RAID are defined: RAID-0 simply consists of several disks, and RAID-1 is a mirrored set of two or more disks. The only other widely-used level is RAID-5, which is the subject of this article. Other RAID levels exist, but tend to be vendor-specific, and there is no generally accepted standard for the features included.

2. What platforms is RAID available for?

Third-party vendors supply RAID systems for most of the popular UNIX platforms and for Windows NT. Hardware vendors often provide their own RAID option.

3. What does RAID do?

The main feature of RAID-5 is prevention of data loss. If a disk is lost because of a head crash, for example, the contents of that disk can be reconstituted using the information stored on the other disks in the array. In RAID-5, redundancy is provided by error-correcting codes (ECCs), with parity information (to check data integrity) stored with the data and thus striped across several physical disks. (The intervening RAID levels between 1 and 5 work in a similar way, but with differences in the way the ECCs are stored.)

4. What are the performance implications of using RAID-5?

Depending on the application, performance may be better or worse. The basic principle of RAID-5 is that files are not stored on a single disk, but are divided into sections, which are stored on a number of different disk drives. This means that the effective disk spindle speed is increased, which makes reads faster. However, the involvement of more disks and the more complex nature of a write operation mean that writes will be slower. So applications where the majority of transactions are reads are likely to give better response times, whereas write-intensive applications may show worse performance. Only hardware-based striping should be used on Windows NT; software striping, from Disk Administrator, gives very poor performance.

5. How does RAID-5 differ from RAID-1?

RAID-1 (mirroring) is a strategy that aims to prevent downtime due to the loss of a disk, whereas RAID-5 in effect divides a file into chunks and places each on a separate disk. RAID-1 maintains a copy of the contents of a disk on another disk, referred to as a mirrored disk. Writes to a mirrored disk may be a little slower, as more than one physical disk is involved, but reads should be faster, as there is a choice of disks (and hence head positions) from which to seek the required location.

5. How do I decide between RAID-5 and RAID-1?

RAID-1 is indicated for systems where complete redundancy of data is considered essential and disk space is not an issue. RAID-1 may not be practical if disk space is not plentiful. On a system where uptime must be maximised, Oracle recommends mirroring at least the control files, and preferably the redo log files. RAID-5 is indicated in situations where avoiding downtime due to disk problems is important, or when better read performance is needed and mirroring is not in use.

6. Do all drives used for RAID-5 have to be identical?

Most UNIX systems allow a failed disk to be replaced with one of the same size or larger. This is highly implementation-specific, so the vendor should be consulted.

7. Is RAID-5 enough to provide full fault-tolerance?

No. A truly fault-tolerant system will need to have a separate power supply for each disk, to allow one disk to be swapped without having to power down the others in the array. A fully fault-tolerant system has to be purpose-designed.

8. What is hot swapping?

This refers to the ability to replace a failed drive without having to power down the whole disk array, and is now considered an essential feature of RAID-5. An extension of this is to have a hot standby disk that eliminates the time taken to swap a replacement disk in - it is already present in the disk array, but not used unless there is a problem.

9. What is a logical drive, and how does it relate to a physical drive?

A logical drive is a virtual disk constructed from one or (usually) more than one physical disk. It is the RAID-5 equivalent of a UNIX logical volume; the latter is a software device, whereas RAID-5 uses additional hardware.

10. What are the disadvantages of RAID-5?

The need to tune an application via placement of 'hot' (i.e. heavily accessed) files on different disks is reduced by using RAID-5. However, if this is still desired, it is less easy to accomplish, as the file has already been divided up and distributed across disk drives. Some vendors, for example EMC, allow striping in their RAID systems, but this generally has to be set up by the vendor. There is an additional consideration for Oracle: if a database file needs recovery, several physical disks may be involved in the case of a striped file, whereas only one would be involved in the case of a normal file. This is a side-effect of the capability of RAID-5 to withstand the loss of a single disk.

11. What variables can affect the performance of a RAID-5 device?

The major ones are:
- Access speed of the constituent disks
- Capacity of internal and external buses
- Number of buses
- Size of caches
- Number of caches
- The nature of the algorithms used for determining how reads and writes are done

12. What types of files are suitable for placement on RAID-5 devices?

Placement of datafiles on RAID-5 devices is likely to give the best performance benefits, as these are usually accessed randomly. More benefit will be seen in situations where reads predominate over writes. Rollback segments and redo logs are accessed sequentially (usually for writes) and are therefore not suitable candidates for placement on a RAID-5 device. Datafiles belonging to temporary tablespaces are also not suitable for placement on a RAID-5 device.

Another reason redo logs should not be placed on RAID-5 devices is related to the type of caching (if any) being done by the RAID system. Given the critical nature of the contents of the redo logs, catastrophic loss of data could ensue if the contents of the cache were never actually written to disk (e.g. because of a power failure) after Oracle had been notified that they had been written. This is particularly true of write-back caching, where a write is regarded as complete when it has only reached the cache. Write-through caching, where a write is only regarded as complete when it has reached the disk, is much safer, but still not recommended for redo logs for the reason mentioned earlier.

13. What about using multiple DBWRs as an alternative to RAID-5?

Using at least as many Database Writer processes (DBWR) as you have database disks will maximise synchronous write capability, by avoiding one disk having to wait for a DBWR process that is busy writing to another disk. However, this is not an alternative to RAID-5: multiple DBWRs improve write efficiency, whereas RAID-5 usually results in writes being slower.

14. What about other strategies?

Two strategies that can be used as alternatives to RAID-5, or in addition to it, are Asynchronous I/O (aio) and List I/O (listio).

15. What is Asynchronous I/O?

Asynchronous I/O (aio) is a means by which a process can proceed with the next operation without having to wait for a write to complete. For example, after starting a write operation, the DBWR process normally blocks (waits) until the write has been completed. If aio is used, DBWR can continue almost straight away. aio is activated by the relevant "init.ora" parameter, which will be either ASYNC_WRITE or USE_ASYNC_IO, depending on the platform. If aio is used, there is no need to have multiple DBWRs. Asynchronous I/O is optional on many UNIX platforms. It is used by default on Windows NT.

16. What are the advantages and disadvantages of aio?

In the above DBWR example, the idle time is eliminated, resulting in more efficient DBWR operation. However, aio availability and configuration is very platform-dependent; while many UNIX versions support it, some do not. Raw devices must be used to store the files, so the use of aio adds some complexity to the system administrator's job. Also, the applications must be able to utilise aio.

17. What is List I/O?

List I/O is a feature found on many SVR4 UNIX variants. As the name implies, it allows a number of I/O requests to be batched into a "list", which is then read or written in a single operation. It does not exist on Windows NT.

18. What are its advantages and disadvantages?
I/O should be much more efficient when done in this manner. You also get the benefits of aio, so aio is not needed if listio is available. However, listio is only available on some UNIX systems, and, as in the case of aio, the system administrator needs to set it up and make sure key applications are configured to use it.

19. How do Logical Volume Managers (LVMs) affect use of RAID-5?

Many UNIX vendors now include support for an LVM in their standard product. Under AIX, all filesystems must reside on logical volumes. Performance of a UNIX system using logical volumes can be very good compared with standard UNIX filesystems, particularly if the stripe size (the size of the chunks files are divided into) is small. Performance will not be as good as RAID-5, given that the latter uses dedicated hardware with fast interconnects. In practice, many small and medium-sized systems will find that the use of logical volumes (with a suitable stripe size for the type of application) performs just as well as RAID-5. This particularly applies to systems where there is no I/O problem. Larger systems, though, are more likely to need the extra performance benefits of RAID-5.

20. How can I tell if my strategy to improve I/O performance is working?

At the UNIX level, there are several commands that can tell you if a disk device is contributing to I/O problems. On SVR4, use the 'sar' command with the appropriate flag, usually '-d'. On BSD, use the 'iostat' command. You are looking for disks whose request queue average length is short, ideally zero; disks with more than a few entries in the queue may need attention. Also check the percent-busy value, as a disk might have a short average queue length yet be very active. On Windows NT, the Performance Monitor allows I/O statistics to be monitored easily and in a graphical manner.

It is essential to obtain baseline figures for normal system operation, so you will know when a performance problem develops and when your corrective action has restored (or improved upon) the performance normally expected.

References:
===========
- Installation and Configuration Guide for Oracle7/8/8i (platform-specific)
- Oracle for UNIX Performance Tuning Tips
- Oracle7/8/8i Server Getting Started for Windows NT

TECH: Using Unix Raw Partitions as Oracle Data Files
Type: FAQ   Status: PUBLISHED   Content Type: TEXT/PLAIN
Creation Date: 04-APR-1995   Last Revision Date: 01-MAY-2001

PURPOSE
Implementation of Unix raw partitions as Oracle data files.

SCOPE & APPLICATION
For DBAs wanting further information about creating Oracle datafiles on raw devices.

Contents
1.0 What is a raw partition
2.0 When to use Raw Partitions
3.0 Setting Up
4.0 Backup Strategies
5.0 Raw Devices and Export/Import
6.0 How to convert from file system to raw partitions the only supported way
7.0 Questions and Answers

1.0 What is a raw partition

Raw devices are disk partitions that are not mounted and written to as a Unix file system, but are accessed via a character device driver. It is the responsibility of the application to organize how the data is written to the disk partition. As with a mounted disk partition, there are devices in the /dev directory that are used to access the disk partition, and these character devices usually have a prefix of "r". For example, on a Sun workstation running SunOS they are defined in the following format:

crw-r-----   1 root      17,   6 Sep 28 10:05 rsd0g
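A shell script can make the same distinction that the leading 'c' in the listing shows: the `-c` test is true only for character-special files, and `-b` only for block-special ones. A minimal sketch, using /dev/null (a character device present on every Unix system) as the stand-in:

```shell
# -c: character-special file (the 'c' in ls -l); -b: block-special.
dev=/dev/null
if [ -c "$dev" ]; then
    echo "$dev is a character-special device"
fi

# An ordinary file fails both tests, even if its name starts with 'r'.
tmpfile=$(mktemp)
[ -c "$tmpfile" ] || echo "a plain file is not a raw device"
rm -f "$tmpfile"
```

This is a handy sanity check in setup scripts, to catch the case where a path that should be a raw device is actually a regular file.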

The main difference between accessing a disk partition via its raw device driver, as opposed to as a mounted file system, is that the database writer bypasses the Unix buffer cache and eliminates file system overheads such as inodes and free lists. The performance benefit of using raw devices can be between 5 and 40% for the same number of disks.

2.0 When to use Raw Partitions

2.1 I/O Bound Applications

Raw devices are used in circumstances where an application is seen to be I/O bound. To see if this is the case, there are a number of tools available:

1. SQLDBA "monitor fileio"
2. SVRMGR "monitor fileio"
3. The UTLBstat/UTLEstat utilities (provided by Oracle in $ORACLE_HOME/rdbms/admin)
4. Operating system monitors such as sar or vmstat
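As a sketch of how the operating-system monitors in the list above might be used, the fragment below filters sar/iostat-style output for suspect devices. The column layout (device, %busy, average queue length) and the thresholds are assumptions for the example - real sar and iostat output varies by platform, so adjust the field numbers and limits for your system.

```shell
# Flag disks that look I/O bound in sample 'sar -d'-style output.
# Columns assumed here: device  %busy  avque  (real layouts differ).
sample='sd0 12 0.2
sd1 96 4.8
sd2 35 0.1'

suspects=$(echo "$sample" | awk '$2 > 80 || $3 > 2 { print $1 }')
echo "possible hot spots: $suspects"
```

In practice you would pipe the live command (e.g. `sar -d 5 12` or `iostat 5`) into the awk filter instead of a sample string, and compare against your baseline figures.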

Using these tools, along with your knowledge of the application being run, you should be able to identify I/O hot spots. Having done this and identified an I/O problem, there are several options that should be considered BEFORE deciding to implement raw disk partitions. To summarize these options:

1. Use more database buffers to reduce the need for disk I/O.
2. Organize objects that are heavily accessed so that they are on separate disks.
3. Separate indexes from tables: place them in different tablespaces and split these tablespaces onto different disks.
4. Stripe heavily accessed objects over multiple disks.
5. Separate redo logs onto a lightly loaded disk drive. Note: RAID-5 based disks are NOT a good location for redo logs.
6. Place rollback segments into two separate tablespaces and then, by the listing order in the init.ora, interleave the access between the two tablespaces.
7. Use multiple database writers, up to the number of disk spindles

that are being accessed.
8. Use the Explain Plan utility to check the most common SQL statements that are used. From this it may be possible to utilize indexes that will prevent sorting, and hence reduce I/O to the temporary tablespace.

If, having done this, you are still identifying an I/O problem, then it is time to implement raw devices.

2.2 Oracle Parallel Server

Some implementations of Oracle Parallel Server require that all data files and control files are placed onto raw devices, so that the different nodes of the parallel environment are all able to see and access the files.

2.3 List I/O and Async I/O

Both of these facilities allow a program to issue multiple write operations without having to wait for the return of the previous write. This can give up to a 15% improvement in performance. However, on some operating systems, to take advantage of this, data files will need to be on raw devices.

3.0 Setting Up

3.1 Creating the partitions

Due to the complex nature of setting up a database to use raw devices, it is important that the Oracle DBA works very closely with the System Administrator for the machine. This will ensure that, when partitioning up disks, things like swap space won't get used! (Swap space doesn't show in a df command.) Each raw partition can only be used for ONE database datafile, so any space that is not allocated to the data file is wasted and cannot be used for anything else. It is convenient to partition the disk into a number of evenly-sized partitions, with a number of small, medium and large partitions. If the operating system allows you to name these partitions, then choose a logical name. For a Parallel Server environment, this could be:

<nodename>_<logical_disk>_<slice_number>

3.2 Calculating the size of the partition

When creating the Oracle tablespace on the raw partition, a slightly smaller size than the actual partition size needs to be specified. This size can be calculated as follows:

Size of Redo Log  = Raw Partition Size - 1 * 512-byte block
Size of Data File = Raw Partition Size - 2 * Oracle Block Size

3.3 First partition of a disk

On some operating systems, if the first partition of a disk is used as a raw device, it will overwrite the disk partition table. This will, at the next machine reboot, cause the disk to be unreadable. Check with your hardware supplier to see if this applies.
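The formulas in section 3.2 can be checked with shell arithmetic. A sketch using example numbers (the 100MB partition and 4K Oracle block size are illustrative values, not recommendations):

```shell
# Example: size a datafile and a redo log for a 100MB raw partition.
raw_bytes=$(( 100 * 1024 * 1024 ))   # raw partition: 104857600 bytes
oracle_block=4096                    # 4K Oracle block size

# Data file: leave two Oracle blocks unused at the end of the partition.
datafile_kb=$(( (raw_bytes - 2 * oracle_block) / 1024 ))
echo "create the datafile with size ${datafile_kb}K"

# Redo log: leave one 512-byte block.
redolog_bytes=$(( raw_bytes - 512 ))
echo "redo log can be up to ${redolog_bytes} bytes"
```

Sizing slightly under the formula is harmless (a little more wasted space); sizing over it risks the tail of the file not fitting on the partition.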

3.4 File Protections

After creating a raw partition, the devices are usually still owned by root. To allow Oracle to use the partition, the owner and group must be changed so that the oracle account owns the device and its group is the DBA group.

3.5 Specifying in a create tablespace command

Once the raw device has been created, its group and owner set correctly, and the required size of the tablespace calculated, it may be referenced in a create tablespace command. Given three raw partitions, each 50M in size, called /dev/rpart1, /dev/rpart2 and /dev/rpart3, and a database with a 4K block size:

create tablespace tab_on_raw
    datafile '/dev/rpart1' size 51196K,
             '/dev/rpart2' size 51196K,
             '/dev/rpart3' size 51196K

3.6 Oracle Block Size

The Oracle block size can be changed on raw devices, but make sure that the logical block size is a multiple of the physical block size on the raw disk. On raw disks, you can seek only to physical block boundaries, and read or write only in multiples of the physical block size.

4.0 Backup Strategies

4.1 dd

To back up raw partitions you will need to use the Unix dd command. Utilities like tar, cpio and dump CANNOT be used for backing up raw partitions. The typical dd command line to do this is as follows:

dd if=/dev/rpart1 of=/dev/tape_device bs=16k

(Keep the block size a multiple of the Oracle block size.) It is important that all raw partitions are included in the backup procedure. It will require close cooperation between the Oracle DBA and Systems Administrator for this to be achieved. Any errors or missed partitions will make the backup invalid!

4.2 Oracle Parallel Backup/Restore

This utility provides an effective mechanism to back up and restore Oracle data files and control files. Oracle Parallel Backup/Restore works in conjunction with a tape management product provided by a third-party software vendor. As long as this third-party product is capable of backing up raw partitions via dd or its own proprietary method, then Oracle Parallel Backup/Restore can be used.
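Before running dd against real partitions, the procedure can be rehearsed on throwaway files (all paths below are made up for the exercise). This sketch also verifies the image with cmp - a sensible habit, given that any missed or corrupt partition invalidates the whole backup:

```shell
# Rehearse the dd backup against a dummy 'partition' file.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/fake_rpart1" bs=16k count=4 2>/dev/null

# Same invocation shape as a real raw-partition backup
# (keep bs a multiple of the Oracle block size).
dd if="$workdir/fake_rpart1" of="$workdir/rpart1.bak" bs=16k 2>/dev/null

# Verify the copy byte-for-byte before trusting it.
cmp -s "$workdir/fake_rpart1" "$workdir/rpart1.bak" && echo "backup verified"

rm -rf "$workdir"
```

For a real backup, the output file would be a tape device or a filesystem path with enough space, and the verification step would read the tape back.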

5.0 Raw Devices and Export/Import

If you are performing a full database import to a database on the same machine as the exported database, and the original datafiles were on a raw device, the files will be reused even if you specify DESTROY=N. This will crash the original database from which the export was taken.

6.0 How to convert file systems to raw partitions the only supported way

The following is the only supported way to convert an instance from a file system to raw partitions:

i. Export the objects from the tablespace

ii. Recreate them on the raw device
iii. Import the tablespace

7.0 Questions and Answers

1. Can a database use both raw partitions and file system files as data files in the same database?

Yes, a Unix database can simultaneously use data files stored on both raw devices and file systems. Exceptions to this are when using Oracle Parallel Server or List I/O, which may require all data files to be on raw devices.

2. Can redo log files be stored on raw partitions?

Yes.

3. Can I have multiple data files on a single raw partition?

No, you may configure only one data file per raw partition. You can, of course, have multiple raw devices per disk.

4. Will splitting my datafiles onto different disk partitions guarantee faster file I/O?

No. Simply splitting datafiles between different disk partitions is not sufficient. You need to ensure that the datafiles are split across partitions that are on different disks or spindles.

Raw Devices and Oracle - 20 Common Questions and Answers
--------------------------------------------------------
Type: FAQ   Status: PUBLISHED   Content Type: TEXT/PLAIN
Creation Date: 24-APR-1996   Last Revision Date: 26-MAR-2000

1. What is a raw device?

A raw device, also known as a raw partition, is a disk partition that is not mounted and written to via the UNIX filesystem, but is accessed via a character-special device driver. It is up to the application how the data is written, since there is no filesystem to do this on the application's behalf.

2. How can a raw device be recognised?

In the '/dev' directory, there are essentially two types of files: block special and character special. Block special files are used when data is transferred to or from a device in fixed-size amounts (blocks), whereas character special files are used when data is transferred in varying-size amounts. Raw devices use character special files; a long listing of the '/dev' directory shows them with a 'c' at the leftmost position of the permissions field, e.g.

crw-rw-rw-   1 root     system   15,   0 Mar 12 09:45 rfd0

In addition, character special files usually have names beginning with an 'r', as shown in the above example. Some devices, principally disks, have both a block special device and a character special device associated with them; for the floppy diskette shown above, there is also a device

brw-rw-rw-   1 root     system   15,   0 Apr 16 15:42 /dev/fd0

So the presence of a 'c' in a device listing does NOT necessarily mean it is a raw device suitable for use by Oracle (or another application). Generally, a raw device needs to be created and set aside for Oracle (or whatever application is going to use it) when the UNIX system is set up - therefore, this needs to be done with close cooperation between the DBA and the UNIX system administrator. Once a raw device is in use by Oracle, it must be owned by the oracle account, and may be identified in this way.

3. What are the benefits of raw devices?

There can be a performance benefit from using raw devices: since a write to a raw device bypasses the UNIX buffer cache, the data is transferred directly from the Oracle buffer cache to the disk. This is not guaranteed, though. If there is no I/O bottleneck, raw devices will not help. Where there is a bottleneck, the performance benefit can vary from a few percent to something like 40%. Note that the overall amount of I/O is not reduced; it is just done more efficiently. Another, lesser benefit of raw devices is that no filesystem overhead is incurred in terms of inode allocation and maintenance, or free block allocation and maintenance.

4. How can I tell if I will benefit from using raw devices?

There are two distinct parts to this: first, the Oracle database and application should be examined and tuned as necessary, using one or both

of the following:

- Server Manager or SQLDBA "monitor fileio"
- The UTLBstat and UTLEstat utilities (in $ORACLE_HOME/rdbms/admin)

There are several strategies for improving performance with an existing disk arrangement, i.e. purely within Oracle. See [NOTE:16347.1] for details. After checking your Oracle database and application, the next stage is to identify UNIX-level I/O bottlenecks. This can be done using a UNIX utility such as 'sar' or 'vmstat'; see the relevant manual pages for details. If you identify that there is a UNIX-level problem with I/O, now is the time to start using raw devices. This may well require reorganisation of the entire UNIX system (assuming there are no spare partitions available).

5. Are there circumstances when raw devices have to be used?

Yes. If you are using the Oracle Parallel Server, all data files, control files, and redo log files must be placed on raw partitions so they can be shared between nodes. This is a limitation of the UNIX operating system. Also, if you wish to use List I/O or Asynchronous I/O, some versions of UNIX require the data files and control files to be on raw devices for this to work. Consult your platform-specific documentation for details.

6. Can I use the entire raw partition for Oracle?

No. You should specify a tablespace slightly smaller in size than the raw partition, specifically at least two Oracle block sizes smaller.

7. Can I use the first partition of a disk for a raw device?

This is not recommended. On older versions of UNIX, the first partition contained information such as the disk partition table or logical volume control information, which if overwritten could render the disk useless. More recent UNIX versions do not have this problem, as disk management is done in a more sophisticated manner. Consult your operating system vendor for more details, but if in any doubt, do not use the first partition.

8. Who should own the raw device?

You will need to create the raw devices as root, but the ownership should be changed to the 'oracle' account afterwards. The group must also be changed to the DBA group (usually called dba).

9. How do I specify a raw device in Oracle commands?

When using a raw device you need to specify the full pathname in single quotes, and use the REUSE parameter. For example, if there are two raw devices, each 30Mb in size, and the database has a 4K block size, the relevant command would look like this:

create tablespace raw_tabspace
datafile '/dev/raw1' size 30712K reuse,
         '/dev/raw2' size 30712K reuse;

10. Does the Oracle block size have any relevance on a raw device?

It is of less importance than for a UNIX file; the size of the Oracle block can be changed, but it must be a multiple of the physical block size, as it is only possible to seek to physical block boundaries and hence write only in multiples of the physical block size.

11. How can I back up my database files if they are on raw devices?

You cannot use utilities such as 'tar' or 'cpio', which expect a filesystem to be present. You must use the 'dd' command, as follows:

dd if=/dev/raw1 of=/dev/rmt0 bs=16k

See the UNIX man page on dd for further details. It is also possible to copy the raw device file (using dd) to a normal UNIX file, and then use a utility such as 'tar' or 'cpio', but this requires more disk space and has a greater administrative overhead.

12. Providing I am not using Parallel Server, can I use a mixture of raw partitions and filesystem files for my tablespace locations?

Yes. The drawback is that this makes your backup strategy more complicated.

13. Should I store my redo log files on raw partitions?

Redo logs are particularly suitable candidates for being located on raw partitions, as they are write-intensive and in addition are written to sequentially. If Parallel Server is being used, redo logs must be stored on raw partitions.

14. Can I use raw partitions for archive logs?

No. Archive logs must be stored on a partition with a UNIX filesystem.

15. Can I have more than one data file on a raw partition?

No. This means you should be careful when setting up the raw partition. Too small a size will necessitate reorganisation when you run out of space, whereas too large a size will waste any space the file does not use.

16. Should my raw partitions be on the same disk device?

This is inadvisable, as there is likely to be contention.
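The sizing rule from question 6 can be checked with shell arithmetic; a quick sketch using the figures from the example in question 9 (a 30 MB partition and a 4K Oracle block size):

```shell
# Datafile must be at least two Oracle blocks smaller than the raw partition.
partition_kb=30720   # 30 MB raw partition
block_kb=4           # Oracle block size of 4K
datafile_kb=$((partition_kb - 2 * block_kb))
echo "size ${datafile_kb}K"
```

Run under any POSIX shell; the result, 30712K, matches the size used in the CREATE TABLESPACE example.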
You should place raw devices on different disks, which should also be on different controllers.

17. Do I need to make my raw partitions all the same size?

This is not essential, but it provides flexibility in the event of having to change the database configuration.

18. Do I need to change any UNIX kernel parameters if I decide to use raw

devices?

No, but you may wish to reduce the size of the UNIX buffer cache if no other applications are using the machine.

19. What other UNIX-level changes could help to improve I/O performance?

RAID and disk mirroring can be beneficial, depending on the application characteristics, especially whether it is read-intensive, write-intensive, or a mixture.

20. How can I gain further performance benefits, after considering all of the above?

You will need to buy more disk drives and controllers for your system, to spread the I/O load between devices.

What to and How to Relink in Oracle after an OS Upgrade
Type: UPGRADE NOTE  Status: PUBLISHED
Content Type: TEXT/PLAIN  Creation Date: 18-OCT-1999  Last Revision Date: 01-MAY-2001

PURPOSE
This article points out which makefiles need to be run to relink Oracle products after an operating system (OS) upgrade.

SCOPE & APPLICATION
DBAs, system administrators, or anyone responsible for upgrading the OS.

For version 7.3.X of the database, run the following relink commands after an OS upgrade as the 'oracle' user:

% make -f ins_network.mk install           (Generally found in $ORACLE_HOME/network/lib)
% make -f ins_agent.mk install             (Generally found in $ORACLE_HOME/network/lib)
% make -f ins_names.mk install             (Generally found in $ORACLE_HOME/network/lib)
% make -f ins_sqlplus.mk install           (Generally found in $ORACLE_HOME/sqlplus/lib)
% make -f ins_svrmgr.mk linstall minstall  (Generally found in $ORACLE_HOME/svrmgr/lib)
% make -f ins_rdbms.mk install             (Generally found in $ORACLE_HOME/rdbms/lib)
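The 7.3.x make commands above can be driven from a small wrapper; a sketch only, with a placeholder ORACLE_HOME and a hypothetical helper name, that prints each command rather than running it:

```shell
# Hypothetical dry-run wrapper for the 7.3.x relink commands: print the
# command for each product library directory that exists under $ORACLE_HOME.
ORACLE_HOME=${ORACLE_HOME:-/u01/app/oracle/product/7.3.4}   # placeholder path

relink_cmd() {   # usage: relink_cmd <subdir> <makefile> [targets]
    dir="$ORACLE_HOME/$1"
    if [ -d "$dir" ]; then
        echo "cd $dir && make -f $2 ${3:-install}"
    else
        echo "skipping $2 ($dir not found)" >&2
    fi
}

relink_cmd network/lib ins_network.mk
relink_cmd network/lib ins_agent.mk
relink_cmd network/lib ins_names.mk
relink_cmd sqlplus/lib ins_sqlplus.mk
relink_cmd svrmgr/lib  ins_svrmgr.mk "linstall minstall"
relink_cmd rdbms/lib   ins_rdbms.mk
```

Once the printed commands look right for your installation, they can be run by hand (or the echo replaced by the real invocation).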

For the following Oracle products, the makefiles are generally found in $ORACLE_HOME/<product>/lib

If you are using Oracle Forms 4.5, relink the following:
% make -f ins_forms45.mk
% make -f ins_forms45d.mk
% make -f ins_forms45w.mk

If you are using Oracle Forms 6.0, relink the following:
% make -f ins_forms60.mk
% make -f ins_forms60d.mk
% make -f ins_forms60w.mk

If you are using Oracle Reports 3.0, relink the following:
% make -f ins_reports30d.mk

References:
===========
[NOTE:1074673.6] HOW TO RELINK EXECUTABLES ON UNIX FOR 8.1.5

HOW TO RELINK EXECUTABLES ON UNIX FOR 8.1.5 Type: BULLETIN Status: PUBLISHED Content Type: TEXT/PLAIN Creation Date: 30-JUL-1999 Last Revision Date: 25-APR-2001 PURPOSE This note explains how to relink your 8.1.5 Oracle executables on Unix. SCOPE & APPLICATION Instructional. RELATED DOCUMENTS [NOTE:74991.1] What to Relink in Oracle after an OS Upgrade How To Relink Executables for 8.1.5 on Unix: ============================================ (Please make sure you are logged on as the Oracle User and the products are shutdown first). You can relink ALL executables with the following command: % cd $ORACLE_HOME/bin % relink all OR

To relink individual products for 8.1.5, do the following:

% cd $ORACLE_HOME/rdbms/lib
% make -f ins_rdbms.mk install

% cd $ORACLE_HOME/sqlplus/lib
% make -f ins_sqlplus.mk install

% cd $ORACLE_HOME/network/lib
% make -f ins_net_server.mk install   <=== (New command for 8.1.5)

See [NOTE:74991.1], which discusses how to relink in Oracle 7. Note: ===== It is important to be in the correct directory to relink the specific executables. Then, follow the above commands to relink the executables for 8.1.5 on Unix. .
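Before relinking, it is worth a quick sanity check of the environment; a sketch, assuming only standard shell tools (the function name and path are made up for illustration):

```shell
# Verify that ORACLE_HOME is set, exists, and contains the 8.1.x relink script.
ORACLE_HOME=${ORACLE_HOME:-/u01/app/oracle/product/8.1.5}   # placeholder path

check_relink_env() {
    [ -n "$ORACLE_HOME" ]            || { echo "ORACLE_HOME is not set" >&2; return 1; }
    [ -d "$ORACLE_HOME" ]            || { echo "no such directory: $ORACLE_HOME" >&2; return 1; }
    [ -f "$ORACLE_HOME/bin/relink" ] || { echo "no relink script in $ORACLE_HOME/bin" >&2; return 1; }
    echo "environment looks ok"
}

check_relink_env || true   # report, but do not abort an interactive session
```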

How to Relink Oracle Database Software on Unix
Type: BULLETIN  Status: PUBLISHED
Content Type: TEXT/PLAIN  Creation Date: 02-JAN-2001  Last Revision Date: 04-NOV-2001

PURPOSE
-------
Provide relinking instructions for Oracle Database software on Unix platforms.

SCOPE & APPLICATION
-------------------
Anyone who maintains Oracle RDBMS software on a Unix platform.

Relinking Oracle
================
Background: Applications for Unix are generally not distributed as complete executables. Oracle, like many application vendors who create products for Unix, distributes individual object files, library archives of object files, and some source files, which are then relinked at the operating system level during installation to create usable executables. This guarantees reliable integration with functions provided by the OS system libraries.

Relinking occurs automatically under these circumstances: - An Oracle product has been installed with an Oracle provided installer. - An Oracle patch set has been applied via an Oracle provided installer. Relinking Oracle manually is suggested under these circumstances: - An OS upgrade has occurred. - A change has been made to the OS system libraries. This can occur during the application of an OS patch. - A new install failed during the relinking phase. - Individual Oracle executables core dump during initial startup. - An individual Oracle patch has been applied (However, explicit relink instructions are usually either included in the README or integrated into the patch install script)

[Step 1] Log into the Unix system as the Oracle software owner ============================================================================== Typically this is the user 'oracle'. [STEP 2] Verify that your $ORACLE_HOME is set correctly: =============================================================================== For all Oracle Versions and Platforms, perform this basic environment check first: % cd $ORACLE_HOME % pwd ...Doing this will ensure that $ORACLE_HOME is set correctly in your current environment. [Step 3] Verify and/or Configure the Unix Environment for Proper Relinking: =============================================================================== For all Oracle Versions and Unix Platforms: The Platform specific environment variables LIBPATH, LD_LIBRARY_PATH, & SHLIB_PATH typically are already set to include system library locations like '/usr/lib'. In most cases, you need only check what they are set to first, then add the $ORACLE_HOME/lib directory to them where appropriate. i.e.: % setenv LD_LIBRARY_PATH ${ORACLE_HOME}/lib:${LD_LIBRARY_PATH} (see [NOTE:131207.1] How to Set Unix Environment Variables for help with setting Unix environment variables) If on AIX with: -------------Oracle 7.3.x: - Set LIBPATH to include $ORACLE_HOME/lib Oracle 8.0.x: - Set LIBPATH to include $ORACLE_HOME/lib - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib and $ORACLE_HOME/network/lib (Required when using Oracle products that

use Java) - Set LINK_CNTRL to L_PTHREADS_D7 if using AIX 4.3. ('oslevel' verifies OS version) Oracle 8.1.x or 9.0.x: - For 8.1.5, set LINK_CNTRL to L_PTHREADS_D7 - If not 8.1.5, ensure that LINK_CNTRL is not set - Set LIBPATH to include $ORACLE_HOME/lib - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib and $ORACLE_HOME/network/lib(Required when using Oracle products that use Java) If on DATA GENERAL AVIION (DG) with: ----------------------------------Oracle 7.3.* or 8.0.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib - ensure TARGET_BINARY_INTERFACE is unset Oracle 8.1.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib:$ORACLE_HOME/JRE/lib/PentiumPro/native_threads If on HP-UX with: ---------------Oracle 7.3.x, 8.0.x, 8.1.x or 9.0.x: - Set SHLIB_PATH to include $ORACLE_HOME/lib - If using 64bit Oracle, SHLIB_PATH should also include $ORACLE_HOME/lib64. (See [NOTE:109621.1] HP/UX LD_LIBRARY_PATH and SHLIB_PATH) - ensure LPATH is unset If on NCR with: -------------Oracle 7.3.x, 8.0.x or 8.1.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib:/usr/ccs/lib If on SCO Unixware with: ----------------------Oracle 7.3.x or 8.0.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib Oracle 8.1.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib:$ORACLE_HOME/JRE/lib/x86at/native_threads If on SGI with: -------------32bit Oracle 7.3.x or 8.0.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib - Set SGI_ABI to -32 64bit Oracle 8.0.x or 8.1.x (8i is only available in 64bit): - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib

- Set SGI_ABI to -64 - If one does not already exist, create the file compiler.defaults and set the COMPILER_DEFAULTS_PATH variable: In the Oracle software owner's $HOME directory, create a file called 'compiler.defaults': % cd $HOME % echo "-DEFAULT:abi=64:isa=mips3:proc=r10k" > compiler.defaults Then set the environment variable COMPILER_DEFAULTS_PATH to point to the $HOME directory. % setenv COMPILER_DEFAULTS_PATH $HOME If this is not set, relinking will fail because the compiler defaults to MIPS4 objects although Oracle requires MIPS3. - Set LD_LIBRARY64_PATH to include the $ORACLE_HOME/lib and the $ORACLE_HOME/javavm/admin directories. - Set LD_LIBRARYN32_PATH to include the $ORACLE_HOME/lib32 directory. NOTE: LD_LIBRARY64_PATH & LD_LIBRARYN32_PATH must be undefined when installing software with Oracle Universal Installer. If on SOLARIS (Sparc or Intel) with: -----------------------------------Oracle 7.3.x, 8.0.x, 8.1.x or 9.0.x: - Ensure that /usr/ccs/bin is before /usr/ucb in $PATH % which ld ....should return '/usr/ccs/bin/ld' - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib - If using 64bit Oracle, LD_LIBRARY_PATH should also include $ORACLE_HOME/lib64. If on Digital/Tru64, IBM/Sequent PTX, Linux or any other Unix Platform not mentioned above with: -----------------------------------------------------------------------------Oracle 7.3.x, 8.0.x, 8.1.x or 9.0.x: - Set LD_LIBRARY_PATH to include $ORACLE_HOME/lib [Step 4] For all Oracle Versions and Unix Platforms: =============================================================================== Verify that you performed Step 2 correctly: % env|pg ....make sure that you see the correct absolute path for $ORACLE_HOME in the variable definitions. [Step 5] Run the OS Commands to Relink Oracle: =============================================================================== Important Note: Before relinking Oracle, shut down both the database and the listener. 
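The library-path configuration from Step 3 can be made idempotent so $ORACLE_HOME/lib is not appended twice; a sketch in Bourne syntax with a placeholder path (the variable name differs per platform, e.g. LIBPATH on AIX, SHLIB_PATH on HP-UX):

```shell
# Append $ORACLE_HOME/lib to LD_LIBRARY_PATH only if it is not already there.
ORACLE_HOME=${ORACLE_HOME:-/u01/app/oracle/product/8.1.7}   # placeholder

case ":${LD_LIBRARY_PATH:-}:" in
    *":$ORACLE_HOME/lib:"*)
        ;;  # already present; nothing to do
    *)
        LD_LIBRARY_PATH=$ORACLE_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
        export LD_LIBRARY_PATH
        ;;
esac
echo "$LD_LIBRARY_PATH"
```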
Important Note: The following commands will output a lot of text to your session window. For all Unix platforms:

Oracle 7.3.x -----------For executables:

oracle, exp, imp, sqlldr, tkprof

% cd $ORACLE_HOME/rdbms/lib % make -f ins_rdbms.mk install For executables: svrmgrl, svrmgrm <- linstall is for svrmgrl, minstall is for svrmgrm

% cd $ORACLE_HOME/svrmgr/lib % make -f ins_svrmgr.mk linstall minstall For executables: sqlplus

% cd $ORACLE_HOME/sqlplus/lib % make -f ins_sqlplus.mk install For executables: dbsnmp, oemevent, oratclsh

% cd $ORACLE_HOME/network/lib % make -f ins_agent.mk install For executables: names, namesctl

% cd $ORACLE_HOME/network/lib % make -f ins_names.mk install For executables: tnslsnr, lsnrctl, tnsping, csmnl, trceval, trcroute

% cd $ORACLE_HOME/network/lib % make -f ins_network.mk install Oracle 8.0.x -----------For executables:

oracle, exp, imp, sqlldr, tkprof, mig, dbv, orapwd, rman, svrmgrl, ogms, ogmsctl

% cd $ORACLE_HOME/rdbms/lib % make -f ins_rdbms.mk install For executables: sqlplus

% cd $ORACLE_HOME/sqlplus/lib % make -f ins_sqlplus.mk install For executables: dbsnmp, oemevent, oratclsh, libosm.so

% cd $ORACLE_HOME/network/lib % make -f ins_oemagent.mk install For executables: tnslsnr, lsnrctl, namesctl, names, osslogin, trcasst, trcroute

% cd $ORACLE_HOME/network/lib % make -f ins_network.mk install

Oracle 8.1.x or 9.0.x
---------------------
*** NEW IN 8i! *** A 'relink' script is provided in the $ORACLE_HOME/bin directory.

% cd $ORACLE_HOME/bin
% relink

...this will display all of the command's options:

usage: relink <parameter>
accepted values for parameter: all, oracle, network, client, client_sharedlib, interMedia, precomp, utilities, oemagent

You can relink ALL executables with the following command:

% relink all

([BUG:1337908]: If on Solaris w/ Oracle 8.1.6, also do: 'relink utilities')

-or-

Since the 'relink' command merely calls the traditional 'make' commands, you still have the option of running the 'make' commands independently:

For executables: oracle, exp, imp, sqlldr, tkprof, mig, dbv, orapwd, rman, svrmgrl, ogms, ogmsctl

% cd $ORACLE_HOME/rdbms/lib % make -f ins_rdbms.mk install For executables: sqlplus

% cd $ORACLE_HOME/sqlplus/lib % make -f ins_sqlplus.mk install For executables: dbsnmp, oemevent, oratclsh

% cd $ORACLE_HOME/network/lib % make -f ins_oemagent.mk install For executables: names, namesctl

% cd $ORACLE_HOME/network/lib % make -f ins_names.mk install For executables: osslogin, trcasst, trcroute, onrsd, tnsping

% cd $ORACLE_HOME/network/lib % make -f ins_net_client.mk install For executables: tnslsnr, lsnrctl

% cd $ORACLE_HOME/network/lib % make -f ins_net_server.mk install How to Tell if Relinking Was Successful: =============================================================================== If relinking was successful, the make command will eventually return to the OS prompt without an error. There will NOT be a 'Relinking Successful' type message.
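Since the make run prints no success banner, the exit status is the only reliable signal; a sketch, with 'true' standing in for the real make invocation:

```shell
# Capture the output and exit status of a relink step; success is exit 0.
log=$(mktemp)
( true ) > "$log" 2>&1        # 'true' stands in for: make -f ins_rdbms.mk install
status=$?
if [ "$status" -eq 0 ]; then
    echo "relink step succeeded (exit status 0); no news is good news"
else
    echo "relink step failed with status $status; inspect $log" >&2
fi
rm -f "$log"
```

Keeping the captured log around on failure gives you the raw material for the error triage described in the next section of the note.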

If You Receive an Error Message During Relinking:
===============================================================================
Confirm that the message you received is an actual fatal error and not a warning. Relinking errors usually terminate the relinking process and contain verbiage similar to the following:

'Fatal error', 'Ld: fatal', 'Exit Code 1'

Warnings, on the other hand, look similar to:

'ld: warning: option -YP appears more than once, first setting taken'

and can most often be ignored.

If you receive an error that terminates the relinking process, your first step should be to extract the relevant information about the error from the make output. This can be broken down into three basic steps:

1. Identify the OS utility that is returning the error: 'ld', 'make', 'cc', 'mv', 'cp', and 'ar' are common sources.
2. Identify the type of error: 'Permission Denied', 'Undefined Symbol', and 'File Not Found' are common types.
3. Identify the files or symbols involved.

Using the information from above as keywords, search Oracle's MetaLink repository (metalink.oracle.com) for previous occurrences of the same error. If no previous occurrences are found or a solution is not provided, generate an iTAR that includes the complete error text.

Help setting environment variables:
==============================================================================
See [NOTE:131207.1] How to Set Unix Environment Variables for help with setting Unix environment variables.

Relinking with Orainst:
===============================================================================
For Oracle7 & Oracle8 only, the following document illustrates how to relink with the 'orainst' utility:

[NOTE:1032747.6] HOW TO RELINK ORACLE USING THE 7.3.X INSTALLER

While 'orainst' will run the same commands as [Step 5], performing [Step 5] manually from a Unix shell is the preferred approach.

RELATED DOCUMENTS
-----------------
[NOTE:131207.1]
[NOTE:109621.1]
[NOTE:1032747.6]
[BUG:1337908]
.
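The error-triage steps above can be partly automated by scanning the saved make output for the fatal patterns quoted in the note; a sketch against a made-up log:

```shell
# Pull likely-fatal lines out of a captured make log, ignoring warnings.
log=$(mktemp)
cat > "$log" <<'EOF'
ld: warning: option -YP appears more than once, first setting taken
ld: fatal: symbol referencing errors
make: Fatal error: Command failed for target 'oracle'
EOF

grep -iE 'fatal|undefined symbol|permission denied' "$log" | grep -iv 'warning'
rm -f "$log"
```

Against this sample log, only the two 'fatal' lines survive the filter; those lines supply the keywords for a MetaLink search.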
How to Set Unix Environment Variables
HP/UX: LD_LIBRARY_PATH and SHLIB_PATH
HOW TO RELINK ORACLE USING THE 7.3.X INSTALLER
THE $ORACLE_HOME/BIN/RELINK SCRIPT DOES NOT RELINK EXP, IMP, SQLLOADER

PURPOSE
This entry is an introduction to the Unix file system and covers the following topics:

* File System Structure
* Ownership of Files
* Permissions of Files
* Showing Ownership and Permissions
* Changing Ownership and Permissions

SCOPE & APPLICATION Instructional.

Overview of the Unix File System:
=================================

File System Structure
---------------------
The UNIX file system is a hierarchical structure made up of files and special files called directories.

Files
-----
UNIX files contain information: text, data, executable programs, etc.

Directories
-----------
Directories provide a structure for organizing files. Directories located under other directories are called subdirectories. Files are grouped under directories beginning with the "root" directory in a branching structure. For example:

root/
 |
 |-- bin/                   cd, chmod, chown, chgrp, ls, pwd
 |
 |-- home/ ($ORACLE_HOME)
      |
      |-- bin/
      |
      |-- rdbms/
           |
           |-- admin/
           |-- doc/
           |-- lib/

Ownership
---------
Each file and directory has three associated ownership statuses, and each ownership type has a permissions status assigned to it. These three ownership types are associated with every file:

o user  - the owner of the file or directory
o group - members of the group associated with the file or directory
o other - everyone else (also called "world", or "public")

The default "user" is the creator of the file or directory. The default "group" is the group the file creator belongs to. "Other" consists of everyone else on the system.

Permissions
-----------
Permissions determine the kind of access users are granted to a file. The three kinds of permissions are:

o r  read     allows reading of a file
o w  write    allows writing to a file
o x  execute  allows executing a file or searching a directory

These permissions are set on or off for each of the three ownership types: "user", "group", and "world".

Setuid Permissions
- - - - - - - - -
In addition there are "setuid" permissions:

o s  set "user" ID on execution
o s  set "group" ID on execution

Setting the "uid" bit of an executable file causes it to be run as if its owner were running it. Setting the "gid" bit of an executable file causes it to be run as if a member of its group were running it.

Setting the "sticky" bit of an executable file causes its process to be pinned in memory, preventing the process from being swapped out, which improves the performance of the executable. Setting the "sticky" bit of a directory, however, performs a different function: if the "sticky" bit of a directory is set, then the files placed in that directory can only be removed by each file's true owner, no matter what the read, write, or execute permissions of the file may be set to. Typically, this is done for the /tmp directory, where many users temporarily store their files.
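The sticky-bit behaviour described above for shared temporary directories can be observed directly on a scratch directory:

```shell
# Set the sticky bit on a directory, as on /tmp; 'ls -ld' shows a trailing 't'.
d=$(mktemp -d)
chmod 1777 "$d"               # rwxrwxrwt: world-writable, sticky
ls -ld "$d" | cut -c1-10      # → drwxrwxrwt
rmdir "$d"
```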

Showing Ownership and Permissions --------------------------------Use the UNIX command "ls" to display ownership and permission information. For example, to see the ownership and permissions of the "oracle" executable type: % ls -l $ORACLE_HOME/bin/oracle -rwsr-s--x 1 usupport dba 7100499 Mar 15 09:30 oracle Note that when the "setuid" or "setgid" bits of an executable are set, an "s" replaces the "x" for user or group permissions. Changing Ownership and Permissions ---------------------------------The user, group, and permissions for any file can be changed

using the following UNIX commands. Change the:

o user        with "chown"
o group       with "chgrp"
o permissions with "chmod"

The command "man <command name>" on any UNIX system will display a manual page for that command.

NOTE: Permissions are constructed from the OR of any of the following modes:

0444  Allow read by owner, group, other
0222  Allow write by owner, group, other
0111  Allow execute (search in directory) by owner, group, other
4000  Set user ID on execution
20#0  Set group ID on execution if # is 7, 5, 3, or 1;
      enable mandatory locking if # is 6, 4, 2, or 0
      (this bit is ignored if the file is a directory)

For example:

rwx------      0400   read by owner
               0200   write by owner
             + 0100   execute (search in directory) by owner
               ----
               0700   read, write, and execute (search) by owner

rwsr-x--x      0440   read by owner, group
               0200   write by owner
               0111   execute (search in directory) by owner, group, and other
             + 4000   set user ID on execution
               ----
               4751   read, write, and execute (search) by owner;
                      read and execute (search) by group;
                      execute (search) by other;
                      set user ID on execution
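Both worked examples can be verified with a scratch file: the shell can do the octal addition, and 'ls -l' reads the resulting mode string back (some systems append a '.' or '+' after the ten mode characters, hence the cut):

```shell
# Verify the 0700 addition: shell arithmetic treats a leading 0 as octal.
printf '%o\n' $((0400 + 0200 + 0100))   # → 700

# Verify the 4751 example on a scratch file.
f=$(mktemp)
chmod 4751 "$f"
ls -l "$f" | cut -c1-10                 # → -rwsr-x--x
rm -f "$f"
```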

References: =========== [NOTE:1011995.6] COMMON ORACLE PERMISSION PROBLEMS ON UNIX

How to Set Unix Environment Variables Type: BULLETIN Status: PUBLISHED Content Type: TEXT/PLAIN Creation Date: 29-DEC-2000 Last Revision Date: 01-MAY-2001

PURPOSE Demonstrates how to set environment variables on Unix. How to Set Unix Environment Variables: ====================================== Setting Unix Environment Variables: ----------------------------------The command syntax for setting environment variables varies depending on which Unix shell you are using. As a result, you first need to determine which type of Unix Shell you have logged into: Bourne (sh), Korn (ksh) or C shell (csh). There are other shell derivatives available, but they generally employ the command syntax of one of the three shells mentioned above. For instance, the Bash Shell utilizes Bourne syntax, while Tcsh utilizes C Shell syntax. Please Note: While the Bourne shell (sh) and Korn shell (ksh) typically use either a '#' or a '$' as a command line prompt, and the C Shell (csh) typically uses a '%' as a command line prompt, a '>' is used in the following examples to represent the command line prompt. How to determine which Unix shell you are using: > env | grep SHELL -or> echo $SHELL -or> ps -f ....Will provide a full listing of processes associated with the current terminal, one of which will be the shell process.

-or> setenv ....On a C shell this will return the current environment, while other shells will return an error.
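A quick way to start with the checks above; note that $SHELL records the login shell, not necessarily the shell currently running, so treat the answer as a hint:

```shell
# Report the login shell recorded in $SHELL, defaulting to /bin/sh.
shell_name=$(basename "${SHELL:-/bin/sh}")
echo "login shell: $shell_name"
```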

Please Note: The following examples use 'ORACLE_HOME' as the variable name. BOURNE SHELL(sh): ----------------To set environment variables within a Bourne Shell (sh), the variable must be initialized locally, then exported globally: > ORACLE_HOME=/u01/app/oracle/product/8.1.7 ...defines ORACLE_HOME locally to the shell > export ORACLE_HOME ...makes it globally available to other processes started from this shell
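The local-versus-exported distinction is easy to see with a child shell; a small demonstration (assuming ORACLE_HOME was not already exported by your login scripts):

```shell
# Before 'export', a child process does not see the variable.
ORACLE_HOME=/u01/app/oracle/product/8.1.7
sh -c 'echo "child sees: [$ORACLE_HOME]"'    # empty brackets if not yet exported

# After 'export', the child inherits it.
export ORACLE_HOME
sh -c 'echo "child sees: [$ORACLE_HOME]"'    # → child sees: [/u01/app/oracle/product/8.1.7]
```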

To have a variable set automatically when you log into the Bourne Shell of your Unix server, add the above lines (minus the '>' prompt) to the hidden '.profile' file in your $HOME directory. If you make changes to your '.profile' and want those changes propagated to your current running environment (without having to log out, then back in):

> cd $HOME
> . .profile

To unset environment variables within a Bourne Shell (sh):

> unset ORACLE_HOME

To check what an environment variable is set to:

> env | grep ORACLE_HOME

KORN SHELL(ksh):
----------------
To set environment variables within a Korn Shell (ksh), you can use the Bourne syntax shown above, or use the streamlined Korn Shell syntax:

> export ORACLE_HOME=/u01/app/oracle/product/8.1.7

To have a variable set automatically when you log into the Korn Shell of your Unix server, add the above lines (minus the '>' prompt) to the hidden '.profile' file in your $HOME directory. If you make changes to your '.profile' and want those changes propagated to your current running environment (without having to log out, then back in):

> cd $HOME
> . .profile

To unset environment variables within a Korn Shell (ksh), use the same syntax as you would in a Bourne Shell (sh):

> unset ORACLE_HOME

To check what an environment variable is set to:

> env | grep ORACLE_HOME

C SHELL(csh):
-------------
To set environment variables within a C Shell (csh):

> setenv ORACLE_HOME /u01/app/oracle/product/8.1.7

To have a variable set automatically when you log into the C Shell of your Unix server: Add the above lines (minus the '>' prompt) to the hidden '.login' file in your $HOME directory. If you make changes to your '.login' and want those changes propagated to your current running environment (without having to log out, then back in): > cd $HOME > source .login To unset environment variables within a C Shell (csh): > unsetenv ORACLE_HOME To check what an environment variable is set to: > env | grep ORACLE_HOME If You Encounter Errors Using the Above Commands: ------------------------------------------------Check the man page for the Unix shell you are using: > man sh Since Unix shell implementations vary from platform to platform, and Unix shells are highly configurable, it's possible that the information supplied above is not correct for the Unix platform you are on. Please check with your system administrator if you have any further questions or problems setting environment variables. . HOW DO YOU CREATE TWO SEPARATE ORACLE_HOMES ON A SINGLE MACHINE? Type: BULLETIN Status: PUBLISHED Content Type: TEXT/PLAIN Creation Date: 07-NOV-1996 Last Revision Date: 26-APR-2001 PURPOSE This document describes how to create two separate ORACLE_HOMEs on a single machine and some of the concerns that need to be addressed if this is done. SCOPE & APPLICATION Instructional. How to Create Two Separate ORACLE_HOMEs on a Single Machine: ============================================================ One thing to keep in mind when creating two ORACLE_HOMEs on one machine is that you can only deal with one ORACLE_HOME at a time.

Whatever ORACLE_HOME your environment variable is set to will be the ORACLE_HOME that you are working with. To check this use: % echo $ORACLE_HOME For this example, it is assumed that you already have one oracle instance installed and running. We will also assume that the first ORACLE_HOME and its instance are set up with the following environment variables: ORACLE_HOME = /u02/app/oracle/product/7.2.3 ORACLE_SID = db1 Do the following steps to create a second ORACLE_HOME: 1. Log in as the oracle user (the same oracle user used for the first ORACLE_HOME). 2. Set the ORACLE_HOME environment variable to point to the new directory structure for the new installation For C Shell: -----------% setenv ORACLE_HOME /u02/app/oracle/product/7.3.2 For Bourne or Korn Shell: ------------------------$ ORACLE_HOME=/u02/app/oracle/product/7.3.2 ; export ORACLE_HOME 3. Set a new ORACLE_SID for the new instance For C Shell: -----------% setenv ORACLE_SID db2 For Bourne or Korn Shell: ------------------------$ ORACLE_SID=db2 ; export ORACLE_SID 4. Follow the installation instructions found in the Oracle7 Installation and Configuration Guide for your platform and release version. See the Chapter entitled "Installation Tasks". NOTES: 1. Each instance that you create is specific to that ORACLE_HOME. For example, you cannot start an instance from a 7.2.3 ORACLE_HOME and shut it down from a 7.3.2 ORACLE_HOME. 2. You can add the second ORACLE_HOME while the

instance(s) in the first ORACLE_HOME are running.

Additional Considerations:
==========================
Listed below are two additional items which may or may not apply when you install a second ORACLE_HOME. These questions are specific to your operating system and can only be answered by looking at how your system is set up.

1. Do you have enough space on the device that you are installing the new ORACLE_HOME on?
2. Do you need to tune the system parameters ("SHMMAX", "SEMMNS", etc.) again? See [NOTE:15566.1].

References:
===========
[NOTE:15566.1] Unix Semaphores and Shared Memory Explained
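Switching between the two homes can be wrapped in a small helper so that ORACLE_HOME, ORACLE_SID, and PATH always change together; a sketch in Bourne/Korn syntax using the example paths from above (the function name is made up):

```shell
# Hypothetical helper: point the session at one ORACLE_HOME/ORACLE_SID pair.
set_oracle_env() {   # usage: set_oracle_env <oracle_home> <sid>
    ORACLE_HOME=$1
    ORACLE_SID=$2
    PATH=$ORACLE_HOME/bin:$PATH
    export ORACLE_HOME ORACLE_SID PATH
    echo "using $ORACLE_SID in $ORACLE_HOME"
}

set_oracle_env /u02/app/oracle/product/7.2.3 db1
set_oracle_env /u02/app/oracle/product/7.3.2 db2
```

Note that repeated calls keep prepending to PATH; a production version would strip the old $ORACLE_HOME/bin entry first.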

ORACLE 64-bit ADVANTAGES
Type: BULLETIN  Status: PUBLISHED
Content Type: TEXT/PLAIN  Creation Date: 01-MAY-2000  Last Revision Date: 31-MAY-2001

PURPOSE
This note discusses the advantages of the 64-bit Oracle architecture.

SCOPE & APPLICATION
Informational.

Oracle 64-bit Advantages:
=========================
The key market for 64-bit databases is high-performance systems for applications that have a very large working data set, and thus can make good use of the improved memory addressing capabilities of a 64-bit architecture. Using the 64-bit architecture also improves scalability and the potential for faster performance offered by the 64-bit machine. The current 64-bit Oracle release takes full advantage of the latest HP 64-bit PA-RISC processor technology.

A true 64-bit computing environment has the capability to process 64-bit data, instructions, and addressing. The HP system is a true 64-bit environment, with a 64-bit processor, 64-bit memory addressing capabilities, 64-bit Direct Memory Access (DMA), and a 64-bit kernel. In a 32-bit system, addressing is limited to 2^32 bytes, or 4 GB, of memory. With 64 bits we can address 2^64 bytes, roughly 18 billion GB (exabytes), of memory, representing a huge increase in the amount of memory that can be addressed. 64-bit processors also achieve better performance by carrying out 64-bit integer and floating-point arithmetic operations.

One important advantage of 64-bit memory addressing is the improved scalability of the machine. Applications can store more data in the larger amount of memory available and considerably reduce calls to the I/O subsystem. A large SGA is especially useful for OLTP and applications with a large working data set. More data can be held in memory, reducing I/O to disks and thereby increasing throughput. In the case of the 32-bit Oracle database, the System Global Area (SGA) was limited to 1.75 GB on 32-bit HP machines. The SGA for 64-bit Oracle can be grown to occupy all available physical memory on a 64-bit system. On HP's largest 64-bit system the physical memory could be as large as 32 GB.

A very large memory also allows a greater number of in-memory processes. Memory is accessed about 10,000 times faster than disk drives, so for large applications that swap to disk frequently, simply moving to a 64-bit operating environment with a generous amount of physical memory can increase performance drastically. Such performance improvement is critical in an e-commerce environment, where there are large numbers of connections to huge databases. With 32-bit HP-UX, file size was limited to 4 GB; this restriction is removed in the 64-bit environment.

The Decision to use 64-bit Architecture:
========================================
Before moving to a 64-bit architecture, the Oracle customer should perform a thorough needs analysis. Here are some issues to consider:

64-bit computing may not be required everywhere in an environment. For example, in a three-tier architecture, the back-end database server may be 64-bit, but application servers and clients can remain 32-bit. Applications that do not require 64-bit features should remain 32-bit applications.

Scalability on 64-bit machines does not plateau as quickly as it does on 32-bit systems.
64-bit machines are therefore an ideal choice for applications that require a large amount of computing power or expect significant future growth and need the scalability of 64-bit addressability.

32-bit databases running on systems with a small number of 32-bit CPUs (4-6) may see some degradation in performance if moved to 64-bit systems that also have a small number of 64-bit CPUs. Applications will achieve the benefits of improved scalability on a 64-bit machine only if they are memory intensive.

64-bit applications have bigger data structures because memory has to be addressed with a larger number of bits. Larger data structures translate into additional memory requirements per process. 64-bit systems work more effectively when running with a large number of CPUs.

Oracle produces both 32-bit and 64-bit versions of the Oracle database for HP-UX 11.x. The 32-bit and 64-bit versions are built from identical Oracle code; the only difference is the compile and link time flags. Therefore all features found in a particular version of Oracle are present in both the 32-bit and 64-bit versions. The 64-bit version of the Oracle binary supports network connections from

both 64-bit and 32-bit clients.

Running 32-bit Binaries on a 64-bit System:
===========================================
When running 32-bit Oracle binaries on a 64-bit machine, you will have to set SHMMAX to exactly 1GB. This is an important requirement when you want to extend the SGA beyond 1GB.

Search Words:
=============
64 bit 32 bit performance features improvement
.
UNIX: How to Create An 8i Database Manually in the UNIX Environment Type: BULLETIN Status: PUBLISHED Content Type: TEXT/PLAIN Creation Date: 14-AUG-2000 Last Revision Date: 27-APR-2001

Purpose
=======
How to create an 8i database manually on Unix. Oracle 8i provides a GUI tool called 'dbassist' which will create a database. However, this note describes the steps involved in manually creating the database on Unix platforms using Server Manager commands.

Assumptions
===========
The installation of Oracle8i has completed successfully. The install was OFA compliant in that the environment variable ORACLE_BASE was used. The following directories exist:

$ORACLE_BASE/admin/TEST/
$ORACLE_BASE/admin/TEST/pfile
$ORACLE_HOME/install
$ORACLE_HOME/oradata/TEST

The new instance to be created is called TEST.

Steps involved
==============
1. Set up the Environment
-------------------------
Set up the required environment variables:

ORACLE_SID  - defines the name of the database you wish to create
ORACLE_HOME - set to the full pathname of the Oracle system home directory

ORACLE_BASE - if your install used OFA (Oracle Flexible Architecture)
PATH        - needs to include $ORACLE_HOME/bin

To set your Unix environment, use the following commands depending on the Unix shell you are using:

sh  - ORACLE_SID=TEST ; export ORACLE_SID
csh - setenv ORACLE_SID TEST
ksh - export ORACLE_SID=TEST

Make sure the values are set:

env | grep ORACLE

2. Create the init.ora
----------------------
Create a file called initTEST.ora in the $ORACLE_BASE/admin/TEST/pfile directory.

# This is an example of initTEST.ora
# ----------------------------------
db_name = TEST
instance_name = TEST
service_names = TEST
control_files = ("$ORACLE_HOME/oradata/TEST/control01.ctl",
                 "$ORACLE_HOME/oradata/TEST/control02.ctl")
db_block_buffers = 1000         # SMALL
shared_pool_size = 4194304      # INITIAL
log_checkpoint_interval = 10000
log_checkpoint_timeout = 1800
processes = 50
log_buffer = 163840

# audit_trail = false           # if you want auditing
# timed_statistics = false      # if you want timed statistics
# max_dump_file_size = 10000    # limit trace file size to 5M each

# Uncommenting the line below will cause automatic archiving if archiving
# has been enabled using ALTER DATABASE ARCHIVELOG
# log_archive_start = true
# log_archive_dest_1 = "location=$ORACLE_HOME/admin/TEST/arch"
# log_archive_format = %t_%s.dbf

# If using private rollback segments, place lines of the following form in
# each of your instance-specific init.ora files. The rollback parameter
# must be uncommented after CREATE DATABASE:
# rollback_segments = (r01, r02, r03, r04)

# Global Naming -- enforce that a dblink has same name as the db it connects to
# global_names = false

# Uncomment the following line if you wish to enable the Oracle Trace product
# to trace server activity. This enables scheduling of server collections
# from the Oracle Enterprise Manager Console. Also, if the
# oracle_trace_collection_name parameter is non-null, every session will
# write to the named collection, as well as enabling you to schedule future
# collections from the console.
# oracle_trace_enable = true

# define directories to store trace and alert files
background_dump_dest = $ORACLE_BASE/admin/TEST/bdump
core_dump_dest = $ORACLE_BASE/admin/TEST/cdump
user_dump_dest = $ORACLE_BASE/admin/TEST/udump

db_block_size = 2048            # SMALL

# remote_login_passwordfile = exclusive
os_authent_prefix = ""
compatible = "8.1.0"

3. Edit the following Scripts to create the database
----------------------------------------------------
# This is an example of TEST_1.sh which is used to Create the Database.
# Replace <ORACLE_HOME> with the full path of your ORACLE_HOME
#
# TEST_1.sh file
# --------------

#!/bin/sh
ORACLE_SID=TEST
export ORACLE_SID
$ORACLE_HOME/bin/svrmgrl << EOF
spool $ORACLE_HOME/install/TEST_1.log
connect internal
startup nomount pfile = $ORACLE_BASE/admin/TEST/pfile/initTEST.ora
CREATE DATABASE "TEST"
    maxdatafiles 254
    maxinstances 8
    maxlogfiles 32
    character set US7ASCII
    national character set US7ASCII
    DATAFILE '<ORACLE_HOME>/oradata/TEST/system01.dbf' SIZE 55M
    logfile '<ORACLE_HOME>/oradata/TEST/redo01.log' SIZE 2M,
            '<ORACLE_HOME>/oradata/TEST/redo02.log' SIZE 2M,
            '<ORACLE_HOME>/oradata/TEST/redo03.log' SIZE 2M
disconnect
spool off
exit
EOF

# --------------
# This is an example of TEST_2.sh which is used to Create Tablespaces
# and Rollback Segments
#
# TEST_2.sh file:
# ---------------

#!/bin/sh
ORACLE_SID=TEST
export ORACLE_SID
$ORACLE_HOME/bin/svrmgrl << EOF
spool $ORACLE_HOME/install/TEST_2.log
connect internal

REM ***** Creating Catalog's Views and Synonyms *****
@$ORACLE_HOME/rdbms/admin/catalog.sql

REM **** Creating System rollback segments ****************
CREATE ROLLBACK SEGMENT r0 TABLESPACE SYSTEM
STORAGE (INITIAL 32K NEXT 64K MINEXTENTS 10 MAXEXTENTS 512);
ALTER ROLLBACK SEGMENT r0 ONLINE;

REM ************ TABLESPACE FOR OEM_REPOSITORY ***************
CREATE TABLESPACE OEM_REPOSITORY
DATAFILE '$ORACLE_HOME/oradata/TEST/oemrep01.dbf' SIZE 5M REUSE
AUTOEXTEND ON NEXT 5M MAXSIZE 20M
MINIMUM EXTENT 128K
DEFAULT STORAGE (INITIAL 128K NEXT 128K MINEXTENTS 1 MAXEXTENTS 4096 PCTINCREASE 0);

REM ************** TABLESPACE FOR ROLLBACK *****************
CREATE TABLESPACE RBS
DATAFILE '$ORACLE_HOME/oradata/TEST/rbs01.dbf' SIZE 12M REUSE
MINIMUM EXTENT 128K
DEFAULT STORAGE (INITIAL 128K NEXT 128K MINEXTENTS 2 MAXEXTENTS 4096 PCTINCREASE 0);

REM ************** TABLESPACE FOR TEMPORARY *****************
CREATE TABLESPACE TEMP
DATAFILE '$ORACLE_HOME/oradata/TEST/temp01.dbf' SIZE 5M REUSE
MINIMUM EXTENT 256K
DEFAULT STORAGE (INITIAL 256K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096 PCTINCREASE 0)
TEMPORARY;

REM ************** TABLESPACE FOR USER *********************
CREATE TABLESPACE USERS
DATAFILE '$ORACLE_HOME/oradata/TEST/users01.dbf' SIZE 10M REUSE
MINIMUM EXTENT 50K
DEFAULT STORAGE (INITIAL 50K NEXT 50K MINEXTENTS 1 MAXEXTENTS 4096 PCTINCREASE 0);

REM ************** TABLESPACE FOR INDEX *********************
CREATE TABLESPACE INDX
DATAFILE '$ORACLE_HOME/oradata/TEST/indx01.dbf' SIZE 10M REUSE
MINIMUM EXTENT 50K
DEFAULT STORAGE (INITIAL 50K NEXT 50K MINEXTENTS 1 MAXEXTENTS 4096 PCTINCREASE 0);

REM **** Creating four rollback segments ****************
CREATE ROLLBACK SEGMENT r01 TABLESPACE RBS;
CREATE ROLLBACK SEGMENT r02 TABLESPACE RBS;
CREATE ROLLBACK SEGMENT r03 TABLESPACE RBS;
CREATE ROLLBACK SEGMENT r04 TABLESPACE RBS;
ALTER ROLLBACK SEGMENT r01 ONLINE;
ALTER ROLLBACK SEGMENT r02 ONLINE;
ALTER ROLLBACK SEGMENT r03 ONLINE;
ALTER ROLLBACK SEGMENT r04 ONLINE;
ALTER ROLLBACK SEGMENT r0 OFFLINE;
DROP ROLLBACK SEGMENT r0;

REM **** SYS and SYSTEM users ****************
alter user sys temporary tablespace TEMP;
alter user system temporary tablespace TEMP;

disconnect
spool off
exit
EOF

# ---------------
# This is an example of TEST_3.sh which is used to Create Catalog Views
# and Synonyms
#
# TEST_3.sh file:
# ---------------

#!/bin/sh
ORACLE_SID=TEST
export ORACLE_SID
$ORACLE_HOME/bin/svrmgrl << EOF
spool $ORACLE_HOME/install/TEST_3.log
connect internal
@$ORACLE_HOME/rdbms/admin/catproc.sql
@$ORACLE_HOME/rdbms/admin/caths.sql
@$ORACLE_HOME/rdbms/admin/otrcsvr.sql

REM ***** Creating Scott user and Demo Tables *****
@$ORACLE_HOME/rdbms/admin/utlsampl.sql

connect system/manager
REM ***** Creating Product Profiles Tables *****
@$ORACLE_HOME/sqlplus/admin/pupbld.sql

disconnect
spool off
exit
EOF

4. Execute the sample scripts
-----------------------------

sh TEST_1.sh
sh TEST_2.sh
sh TEST_3.sh

On completion of each script, check the logs created in $ORACLE_HOME/install before proceeding to the next script. The database is now created and ready to use.

5. Edit the initTEST.ora
------------------------
Uncomment the rollback_segments line so that when the database is shut down and restarted, all the rollback segments will automatically be brought online.

6. Edit the ORATAB file
-----------------------
Add an entry to the oratab file for the new instance.

NOTES
-----
In this example all the redo logs, control files and datafiles are created in the same file system; it is strongly recommended that these files be spread across different file systems. Make the necessary changes to the init<sid>.ora to change the instance_name, db_name, etc. Change the sizes of the datafiles to fit your system setup and requirements.

References
----------
Oracle8i Server Administrator's Guide Release 8.1.5

ORACLE ENVIRONMENT VARIABLES IN UNIX Type: BULLETIN Status: PUBLISHED Content Type: TEXT/PLAIN Creation Date: 31-MAY-1994 Last Revision Date: 26-MAR-2000

Document ID:        103795.387
Title:              Oracle Environment Variables on Unix
Creation Date:      1 August 1993
Last Revision Date: 24 November 1998
Revision Number:    3
Product:            RDBMS
Product Version:    6.x, 7.0.x, 7.1.x
Platform:           UNIX
Information Type:   ADVISORY
Impact:             LOW
Abstract:           This document describes all known Oracle environment
                    variables on any UNIX machine, as well as Unix
                    environment variables that affect Oracle.

Keywords: ENVIRONMENT;VARIABLES;UNIX;PRINTENV
_______________________________________________________________________

Oracle Environment Variables on Unix

This document describes each Oracle environment variable with its name, its use, general sample values, and specific examples in the following manner:

ENVIRONMENT VARIABLE NAME
   Use:     What is this environment variable?
   General: What are some sample values?
   Example: Specific example

In addition, UNIX environment variables that affect Oracle are briefly described. PLEASE CONSULT YOUR "INSTALLATION AND CONFIGURATION GUIDE" FOR EACH PRODUCT'S FULL DESCRIPTION AND LIST OF VARIABLES.

General Notes
=============
1. $O_H == $ORACLE_HOME
2. $O_S == $ORACLE_SID
3. An environment variable followed by a "*" applies to Oracle7.
4. An environment variable followed by a "#" is used during database creation for Oracle Version 6 only.
5. For more information, see your IUG, ICG, or the "ORACLE7 Server for UNIX Administrator's Reference Guide".
6. Note that not all UNIX platforms use all of these environment variables.

Oracle Environment Variables
============================
APIPATH*
   Use:     version 7.0.12 only: directory containing Tool Kit II *.res files
   Example: $O_H/orainst:$O_H/tk2/admin

BOOK_LOCALPREFERENCE*
   Use:     Oracle*Book env var
   General: $ORACLE_HOME/book/admin
   Example: $ORACLE_HOME/book/admin

BOOK_RESOURCE*
   Use:     directory for Oracle*Book resource files
   General: $O_H/book/admin/resource/US, $O_H/book/admin/resource/JA
   Example: $ORACLE_HOME/book/admin/resource/US

BOOK_GLOBALPREFERENCE*
   Use:     Oracle*Book env var
   General: $ORACLE_HOME/book/admin
   Example: $ORACLE_HOME/book/admin

BOOK_HELP*
   Use:     directory for Oracle*Book help files
   General: $ORACLE_HOME/book/admin/help/US
   Example: $ORACLE_HOME/book/admin/help/US

CASE_HP_CMD
   Use:     Command to plot a file from CASE*Designer
   General: Actual print command or name of a script
   Example: lpr -Pplot

CASE_PS_CMD
   Use:     Command to print postscript file from CASE*Designer
   General: Actual print command or name of a script
   Example: lpr -P

CASE_RESOURCE
   Use:     Tool Kit I terminal type for CASE
   General: file:device (file.r from $O_H/dict50/admin/etc) and (device from Oraterm)
   Example: case_hpx:xterm

CASE_SDPRINT
   Use:     to print CASE*Designer screen prints to line or PostScript printers
   General: printer name
   Example: lw

CASE_XTERM
   Use:     Tool Kit I GUI window type for CASE
   General: xterm, hpterm, aixterm, dxterm, etc.
   Example: xterm

CGEN_HOME
   Use:     home directory of CASE Generator
   General: $O_H/cgen20

DEBUG_SLFIND*
   Use:     Debug for TK2 tools
   General: 0,1,?
   Example: 1

DBS_FILE#
   Use:     initial database file name
   General: $O_H/dbs/dbs$O_S.dbf will hard code db file (you will not be able
            to move $O_H); setting to dbs$O_S.dbf will allow you to move $O_H
   Example: dbsoracle.dbf

DBS_SIZE#
   Use:     initial database file size
   General: numberK, numberM
   Example: 5000K or 5M minimum, up to partition size

FORMS_DEVICE*
   Use:     devicename for TK2 tools (CDE tools)
   General: vt220, vt100, hp, sun (see $ORACLE_HOME/tk2/admin/terminal)
   Example: hp

FORMS30PATH
   Use:     directory containing Forms 3.0 (TK I) resource files
   General: $ORACLE_HOME/forms30/admin/resource

LOG
   Use:     file to record the install procedure
   General: $O_H/install/install.log
   Example: $O_H/install/install.log

LOG_FILE1#
   Use:     first redo log file name
   General: $O_H/dbs/log1$O_S.dbf will hard code log file (you will not be able
            to move $O_H); setting to log1$O_S.dbf will allow you to move $O_H
   Example: log1oracle.dbf

LOG_FILE2#
   Use:     second redo log file name
   General: $O_H/dbs/log2$O_S.dbf will hard code log file (you will not be able
            to move $O_H); setting to log2$O_S.dbf will allow you to move $O_H
   Example: log2oracle.dbf

LOG_SIZE#
   Use:     initial log file size
   General: numberK, numberM
   Example: no minimum, 500K default, no maximum (tune to your database needs)

MENU5PATH
   Use:     directory containing Menu 5.0 (TK I) resource files
   General: $ORACLE_HOME/menu5/admin/resource

MM_RESOURCE*
   Use:     ?
   General: $ORACLE_HOME/mm/admin/resource/US
   Example: $ORACLE_HOME/mm/admin/resource/US

NLS_LANG
   Use:     National Language Support language, territory, and character set
   General: language_territory.characterset
   Example: american_america.us7ascii

NO_MAKE
   Use:     set to true if your system doesn't have C dev tools (cc, make, ld, etc.)
   Example: false

ORACLE_BASE*
   Use:     new directory structure for Oracle7 to follow Oracle Flexible Architecture
   General: see ICG "Preparing to install Oracle Products"
   Example: can be any directory

ORACLE_HELP
   Use:     directory containing help file?
   General: $ORACLE_HOME/help/admin/resource
   Example: $ORACLE_HOME/help/admin/resource

ORACLE_HOME
   Use:     dir containing top level Oracle directories
   General: any directory with enough space
   Example: /usr/oracle

ORACLE_ICON*
   Use:     dir containing icons for CDE tools
   General: $ORACLE_HOME/guicommon/tk2/admin/icon
   Example: $ORACLE_HOME/guicommon/tk2/admin/icon

ORACLE_LPARGS
   Use:     which lp arguments to use for Easy*SQL, SQL*Calc, SQL*Forms,
            SQL*Menu, SQL*Report, or SQL*ReportWriter
   Example: -c -s

ORACLE_LPPROG
   Use:     which lp command to use for Easy*SQL, SQL*Calc, SQL*Forms,
            SQL*Menu, SQL*Report, or SQL*ReportWriter
   General: lp, lpr, print
   Example: lp

ORACLE_LPSTAT
   Use:     which lp status command to use
   General: lpstat, lpq
   Example: lpstat

ORACLE_OWNER
   Use:     UNIX Operating System usercode who owns the Oracle files
   General: can be any user, default is oracle
   Example: oracle

ORACLE_PAGER
   Use:     which UNIX pager to use (more, less, pg) in Oracle Products like SQL*Menu
   Example: more

ORACLE_PATH
   Use:     directory for reading and writing to and from SQL*Forms,
            SQL*Menu 5.0, SQL*Plus
   General: can be any directory (if not set, read and write from current directory)
   Example: $HOME/oracle

ORACLE_SERVER#
   Use:     For client only installs
   General: T if client only, F if there will be a local database
   Example: T

ORACLE_SID
   Use:     Oracle System Identifier
   General: must begin with a letter, followed by a number or character;
            limited by some OS's to 4 chars
   Example: v712

ORACLE_TERM*
   Use:     Tool Kit II env var pointing to the tk2c${ORACLE_TERM}.res file
            under $ORACLE_HOME/rdbms/admin/terminal (tk2c stands for
            character toolkit2)
   General: any file like tk2c${ORACLE_TERM}.res
   Example: vt100

ORACLE_TERMINAL
   Use:     directory where the Tool Kit II .res files reside
   General: $O_H/tk2/admin/terminal
   Example: $O_H/tk2/admin/terminal

ORACLE_TRACE
   Use:     allow verification/trace to be turned on and off during install
            (echoes everything done by sh)
   General: T does set -x; anything else is off
   Example: T

ORACLE_VERIFY
   Use:     if true, runs the <product>.verify scripts during install
   General: T; anything else is off
   Example: T

ORACLE_DOC*
   Use:     points to directory containing on-line doc starting v7.0.16
   General: to any directory containing online doc (file.obd)
   Example: $ORACLE_HOME/doc

ORAENV_ASK
   Use:     if set to anything, prompts for $O_S or $O_H when (c)oraenv is invoked
   General: NO or anything
   Example: NO

ORAKITPATH
   Use:     set to directory containing the v1 orakit resource file
   General: $ORACLE_HOME/<product>/admin/resource
   Example: $ORACLE_HOME/forms30/admin/resource

ORAMAIL_EDITOR
   Use:     which editor to use for Oracle*Mail
   General: vi, emacs, ed
   Example: vi

ORANSEMS#
   Use:     number of semaphores that make up a set
   General: 1-UNIX maximum
   Example: 60

ORAPIPES*
   Use:     sets the default pipe driver to v1 or v2
   General: v1, v2
   Example: v2

ORATERMPATH
   Use:     directory containing Oraterm (TK I) resource files
   General: $ORACLE_HOME/oraterm/admin/resource

PATCH_HOME
   Use:     directory containing patches and patch.list
   General: any directory, $O_H/patch, $O_H/install/patch
   Example: $ORACLE_HOME/install/patch

RT_STATUS
   Use:     directory for SQL*TextRetrieval V2.0 to find forms
   General: can be any directory (if not set, read and write from current directory)
   Example: $HOME/oracle

SDD_HOME
   Use:     directory containing top level CASE Dictionary directories
   General: can be any directory containing top level CASE directories
   Example: $O_H/dict50/admin

SDD_PRINT
   Use:     for CASE: set to UNIX print command to use
   General: lp, lpr
   Example: lp

SDD_WPRINT
   Use:     for CASE: set to UNIX print command to use for wide output
   General: lp, lpr
   Example: lp

SQLPATH
   Use:     now superseded by $ORACLE_PATH
   General: can be any directory (if not set, read and write from current directory)
   Example: $HOME/oracle

SQLTR_STATUS
   Use:     directory for SQL*TextRetrieval V1.1 to find forms
   General: can be any directory (if not set, read and write from current directory)
   Example: $HOME/oracle

SRW_TMP
   Use:     directory to use for SQL*ReportWriter temporary files
   General: any directory with >10M of disk space
   Example: /usr/tmp

TK2DEV*
   Use:     version 7.0.12 only: Tool Kit II terminal resource file
   General: vt100 portion from tk2_vt100.res
   Example: vt100

TNS_ADMIN*
   Use:     directory containing SQL*Net v2 files
   General: any dir (see manual for order followed - looks at /etc, $HOME,
            and $TNS_ADMIN)
   Example: $O_H/network/admin

TWO_TASK
   Use:     hoststring to use for SQL*Net v1
   General: p:, f:, a:host:$O_S, d:host:$O_S, star:host:$O_S, t:host:$O_S,
            tt:host:$O_S, x:host:$O_S
   Example: t:host:oracle

UNIX Environment Variables Affecting Oracle
===========================================
ADA_PATH        - set to directory containing Ada compiler
ALSYCOMP_DIR    - set to directory containing Ada compiler

DISPLAY
   Use:     Tells X-based tools which Display to use
   General: `hostname`:0.0
   Example: fubar:0.0

HOME            - $HOME
LANG            - en_US, De_DE
LANGUAGE        - american_america
LDOPTS          - ld options for compiling
LDPATH          - directory containing shared object libraries
LD_LIBRARY_PATH - directory containing shared object libraries
LOG             - when set to a filename, logs information from install session
LOGNAME         - `logname` or `whoami`
MALLOCTYPE      - 3.1, 3.2 for telling AIX which malloc type to use
NLSPATH         - OS directory containing language message files

NONAMESERVER
   Use:     For tcp/ip networks without a named server
   General: 1 for no named server, 0 for named server
   Example: 1

PATH            - $PATH
PRINTER         - <printer name>
SHELL           - /bin/csh or /bin/sh
TERM            - any valid term: ansi, hft, hp, mac2, sun, xterm, vtxxx
TMPDIR          - some UNIX boxes allow /tmp to be renamed to $TMPDIR (any directory)
TZ              - sets local time zone
XENVIRONMENT    - set to file Orakit, or other X11 resource files
XAPPLRESDIR     - set to dir containing resource specifications file Orakit,
                  or other X11 resource files
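Putting a few of the variables above together, a minimal Bourne/Korn-shell profile fragment might look like the following sketch. The ORACLE_HOME path and SID value are illustrative assumptions, not defaults:

```shell
# minimal Oracle environment for a Bourne/Korn shell session (values assumed)
ORACLE_HOME=/usr/oracle            # assumed install location
ORACLE_SID=v712                    # assumed instance name
ORACLE_TERM=vt100
PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_HOME ORACLE_SID ORACLE_TERM PATH

env | grep '^ORACLE'               # confirm the settings
```

A csh user would use `setenv ORACLE_SID v712` and so on instead.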
---------------------------------------------------------------------
Oracle Worldwide Customer Support
.
Server Architecture on UNIX and NT Type: BULLETIN Status: PUBLISHED Content Type: TEXT/PLAIN Creation Date: 06-FEB-1998 Last Revision Date: 26-JUL-2000

Oracle Server Architecture on UNIX and NT
=========================================
This article concentrates on the way Oracle works on UNIX and Windows NT, but other platforms will be mentioned.

1.  What are the Oracle background processes/threads?
2.  What are the differences between background processes on UNIX and NT?
3.  How are the background processes/threads implemented on UNIX and NT?
4.  What are the limits on file and database size on UNIX and NT?
5.  How does Oracle use memory on UNIX and NT?
6.  How does Oracle bypass the filesystem cache on UNIX and NT?
7.  How does Oracle utilise the features and characteristics of SMP systems?
8.  How does Oracle utilise the features and characteristics of MPP systems?
9.  How does Oracle work on clustered systems?
10. The different SQL*Net protocol adapters and their environments.
11. How Oracle environment variables are set on different systems.
12. How is the ORACLE_HOME location chosen on different systems?
13. What are the limitations on choosing ORACLE_SID?
14. How are operating system and Oracle libraries used on UNIX and NT?
15. Accounts and groups used for Oracle installation and administration.
16. Finding out and setting the size of the System Global Area (SGA).
17. How are different releases of Oracle distinguished on UNIX and NT?
18. What is meant by the terms "upgrade" and "migrate"?
19. What are the routes for transporting data between different platforms?
20. What is the difference between a server option and a server cartridge?

1. What are the Oracle background processes/threads?

These are the processes (on UNIX) or the threads within a process (on NT) which manage the Oracle RDBMS. Some must always be running for the server to be available; others are optional on all platforms; and some are optional and specific to certain platforms.

A = Must always be running
O = Optional on all platforms
P = Optional and specific to a platform

DBWR   (A) - the database writer
LGWR   (A) - the log writer
PMON   (A) - the process monitor
SMON   (A) - the system monitor
CKPT   (O) - the checkpoint process (or thread)
ARCH   (O) - the archive process (or thread)
RECO   (O) - the recoverer process (or thread)
SNPnnn (P) - snapshot process
LCKnnn (P) - inter-instance locking processes
Snnn   (P) - shared server process in multi-threaded server
Dnnn   (P) - dispatcher process in multi-threaded server
WMON   (P) - wakeup monitor process
QMNn   (P) - AQ Time Manager
TRWR   (P) - Trace Writer
LMON   (P) - Lock Manager Monitor
LMD0   (P) - Lock Manager Daemon
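On UNIX, these background processes can be seen with ps; each takes a name of the form ora_<name>_<SID>. A sketch, assuming a hypothetical instance named TEST (it prints nothing on a machine with no running instance):

```shell
# list Oracle background processes for instance TEST (assumed SID);
# the [o] in the pattern keeps grep from matching its own command line
ps -ef | grep '[o]ra_.*_TEST' || true
```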

2. What are the differences between background processes/threads on UNIX and NT?

There is no difference between the functions the background processes or threads fulfil. For example, the log writer does exactly the same, in Oracle terms, on UNIX and NT. The way they are implemented, however, is different.

3. How are the background processes/threads implemented on UNIX and NT?

On UNIX, a separate operating system process is created to run each of the background functions listed above. On NT, they are run as different threads within the same process.

4. What are the limits on file and database size on different platforms?

On UNIX, block sizes usually vary between 2-8K, although larger than 8K is possible. Maximum database file size is 2Gb on most 32-bit UNIX platforms, though some (AIX, Solaris and HP/UX) now support a larger maximum, usually 32Gb.

On NT, there are only 4 million blocks per datafile, because there are 32 bits available for the block# and file# together. To support 256 (2**8) files/database requires 8 bits for the file number, leaving 24 bits for the block number; this gives 2**24 or 16 million blocks per file. However, if 1024 (2**10) files are to be supported, only 2**22 or 4 million blocks/file is possible. So, 1024 files/database allows a maximum of 4 million blocks/file. Note that the total maximum possible capacity of the database remains the same regardless of the way the bits are split up.

5. How does Oracle use memory on UNIX and NT?

On UNIX, the background processes attach to shared memory, one of the standard interprocess communication methods on UNIX. On NT, this is not necessary, as the Oracle threads all share the same virtual address space anyway.

6. How does Oracle bypass the filesystem cache on UNIX and NT?

On UNIX, Oracle opens files using the O_SYNC flag to bypass the filesystem buffer cache. In the current Win32 API, the equivalent flags are FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH. The goal in both cases is the same: to ensure that data has been posted to disk before assuming that the write has been successful.

7. How does Oracle utilise the features and characteristics of SMP systems?

Oracle utilises as many CPUs as are available. This is completely transparent to the Oracle user or application developer. The only init.ora parameter that may need setting is SPIN_COUNT, which influences how long a process will wait for a latch to become available. Before and after tests should always be carried out to determine the effects of setting this parameter on database performance. See [NOTE:30832.1] for details of SPIN_COUNT.

8. How does Oracle utilise the features and characteristics of MPP systems?

Oracle Parallel Server (OPS) is available for MPP systems from all the major vendors. The way in which it works varies between platforms; for example, whether the vendor's lock manager is used, or an Oracle-supplied one. 9. How does Oracle work on clustered systems?

As on MPP systems, this is very platform-specific, ranging from the more traditional type of VAX/OpenVMS or Alpha/OpenVMS clusters to emerging technologies such as NT clusters. Usually, OPS will be used, to take advantage of the independent processing capability of the different cluster nodes. The Oracle Fail Safe product is available for selected two-node NT clusters running MS Cluster Server. If one of the nodes fails, the other can take over its workload. This product is not related to OPS. 10. The different types of SQL*Net protocol adapters and their environments. The major protocol adapter for use with SQL*Net is TCP/IP. This is true for all platforms. Other adapters that are supplied with the Oracle server depend on the platform. For example, adapters that can be used on NT are Named Pipes (Microsoft networking), SPX (for use in Novell environments), TCP/IP, Bequeath (for local databases) and IPC (for local processes). On UNIX, TCP/IP, SPX, Bequeath and IPC are again supported. Oracle on many

UNIX platforms also includes adapters for the DECnet protocol and IBM LU6.2 protocol. Note that no DECnet protocol adapter is shipped with Oracle8. The most important point about protocol adapters is that no protocol adapter should be installed if the underlying protocol is not present on the machine, e.g. do not install the IPX adapter if you are not using the IPX/SPX protocol. Failure to observe this can lead to serious problems on UNIX. 11. How are Oracle environment variables set on different systems? The variables used by Oracle for the SID, ORACLE_HOME and so on are the same on different platforms, but stored differently. On OpenVMS, logical names are used; on UNIX and NT, environment variables. UNIX environment variables are set differently depending on whether the C-shell or Bourne/Korn shell is in use. On NT, environment variables may be set in one of three ways: 1. In a similar way to Bourne or Korn environment variables on UNIX. For example, in a command window, enter: C:\>set ORACLE_SID=ORC8 Such a setting only has effect in the command window where it was made. 2. In Start > Settings > Control Panel > System > Environment by entering either a System or User Variable name and value. This updates the Registry. 3. By running the Registry Editor (REGEDT32) directly, and entering a new Key Value (variable name) and Value Data (value) in the appropriate subtree. Care should always be taken when editing the registry directly. 12. How is the ORACLE_HOME location chosen on different systems? On most systems, including UNIX, the value of ORACLE_HOME is chosen by the DBA doing the install, based on knowledge of available disk space. On NT, the Oracle Installer offers as default the disk with the greatest amount of free space. This can be over-ridden by the DBA doing the install. 13. What are the limitations on choosing ORACLE_SID? The SID should consist of four or fewer alphanumeric characters. 
This is to avoid problems with filename length restrictions on some platforms, e.g. the 8.3 restriction on DOS, which is still present on NT if using DOS-style names (which Oracle requires). So the initialisation file for a database called ORCL will be called initORCL.ora, representing the longest possible filename. 14. How are operating system and Oracle libraries used on different systems? On UNIX, there is a general library for all products, plus separate libraries for each product under that product's directory. These directories contain a mixture of objects (which have a .o suffix) and archive libraries (which have a .a suffix). Before a product can be used, it must be built, using the make utility. This can lead to very large executables, as the relevant libraries have to be built in to the image. For example, a very small program such as one which simply prints hello may be 16K in size. This will not apply if shared libraries are used.

On NT, executables tend to be much smaller, because of the Windows usage of dynamic link libraries (DLLs). These are very similar to shared libraries on UNIX, or shareable images on OpenVMS. They are dynamically linked with the executable at runtime.

15. What accounts and groups are used for Oracle installation?

On UNIX, a dba group (the default name is "dba") and Oracle user (no default, a good choice is something like "oracle" or "ora7") are required. Oracle cannot be installed by the root user. On NT, the account used for Oracle installation and maintenance must either be the Administrator account, or an account in the Administrators group.

16. Finding out the size of the System Global Area (SGA).

The size of the SGA may be obtained as follows:
(a) On startup.
(b) By entering SHOW SGA when connected internal to Server Manager.

Both of these show something like:

Total System Global Area    4830836 bytes
Fixed Size                    46596 bytes
Variable Size               3948656 bytes
Database Buffers             819200 bytes
Redo Buffers                  16384 bytes
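As a sanity check, the component sizes in a SHOW SGA listing should sum to the reported total. The sample figures above can be verified with shell arithmetic:

```shell
# sum the SGA components from the sample SHOW SGA output:
# fixed size + variable size + database buffers + redo buffers
total=$((46596 + 3948656 + 819200 + 16384))
echo "$total bytes"    # prints: 4830836 bytes
```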

17. How are different releases of Oracle distinguished on UNIX and NT?

On UNIX, there can be as many installations of Oracle as the machine's resources will allow. They are distinguished by the environment variables ORACLE_HOME and ORACLE_SID. The executables for different versions have the same name.

On NT, all Oracle server installations must be in the same ORACLE_HOME (this is no longer the case from 8.0.4). Versions are distinguished by having the first two digits of the version at the end of their names, e.g. ORACLE73, EXP80. This means that installation of one minor release, e.g. 7.3.2.3, will overwrite another, e.g. 7.3.2.1, which may not have been the intended result.

18. What is meant by the terms "upgrade" and "migrate"?

The term upgrade refers to moving from one minor release to a higher minor release, e.g. 7.2.2.4 to 7.2.3. The term migrate refers to moving from one major release to a higher major release, e.g. 7.3.4 to 8.0.4.

19. What are the routes for transporting data between different platforms?

The tool for extracting data from an Oracle database is export. The file it creates is written in a special, proprietary format, which can only be read by the import utility. The latter may reside on another platform, so this route allows an Oracle database to be moved to a different platform. A less sophisticated way of extracting data is to issue the appropriate SELECT query in SQL*Plus, spooling the output to a file.
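The SQL*Plus spool route can be sketched as a small script in the same style as the earlier notes. This is an illustration only: the table name emp and the scott/tiger login are assumptions, and the sqlplus invocation is commented out because it needs a running instance:

```shell
# write a SQL*Plus script that spools a table's rows to a flat file
cat << 'EOF' > spool_emp.sql
set pagesize 0
set feedback off
spool emp.lst
select * from emp;
spool off
exit
EOF

# run it against a real instance (assumed credentials and table):
# sqlplus scott/tiger @spool_emp.sql
```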

20. What is the difference between a server option and a server cartridge?

An option is a component of the Oracle Server which, when installed, becomes part of the server kernel. The term cartridge denotes an option which provides a user interface to the kernel, and which may, in a future release of the server, become object-based. In some cases, for example ConText, what was an option in Oracle7 is referred to as a cartridge in Oracle8. In other cases, for example Parallel Query, what was an option has been included as a standard part of the database.
