
What diagnostic information to collect for ADVM/ACFS related issues [ID 885363.1]

Modified: Apr 18, 2013    Type: TROUBLESHOOTING    Status: PUBLISHED    Priority: 1

In this Document
  Purpose
  Troubleshooting Steps
    1. What's New in 11.2.0.3
    2. Overview
    3. ASM Instance Alert Log and Process Trace Files
    4. Operating System Log/Event files
    5. ACFS/ADVM Specific Log Files
    6. Panics / Crash Dumps / Hangs
    7. ACFS Replication Diagnostic Information
    8. Information Checklist When Reporting a Problem
    Appendix A - Persistent OKS logs (11.2.0.3 & onwards)
    Appendix B - diagcollection.pl script automatically collects most ACFS information (11.2.0.3.4 & onwards)
  References

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 11.2.0.4 [Release 11.2]
Information in this document applies to any platform.

Purpose

The purpose of this document is to provide guidance on what to collect when troubleshooting ACFS/ADVM related issues.

Troubleshooting Steps

1. What's New in 11.2.0.3

a) Starting with the 11.2.0.3.4 release, the diagcollection.pl script automatically collects most ACFS information (please see Appendix B below for additional information).

b) Starting with the 11.2.0.3.0 release, OKS logs are automatically written to disk. This functionality is in addition to the existing in-memory OKS log. The in-memory OKS log features and functionality have not been altered in any way (please see Appendix A below for additional information).

2. Overview

The ACFS/ADVM product consists of several processes in the ASM instance, three separate kernel device drivers and several command and utility programs. The three kernel drivers that make up the ACFS product are:

- Oracle ACFS - ASM Cluster File System
- Oracle ADVM - ASM Dynamic Volume Manager
- Oracle OKS  - Oracle Kernel Services

When a support issue arises, it is important to collect all of the pertinent data related to ACFS/ADVM. Diagnostic information related to ACFS/ADVM is located in several areas, including:

- System configuration (e.g. OS version, platform type, etc.)
- ASM alert log and process trace files
- CSS trace files
- Operating system log/event files
- CRS resource status
- ACFS/ADVM specific log files
- Operating system crash dumps
- ACFS Replication trace and log files

3. ASM Instance Alert Log and Process Trace Files

Diagnostic information related to the ASM instance consists of the alert log along with the process trace files for the ADVM processes. ASM process trace files are typically named <ORACLE_SID>_<process_name>.<pid>.trc. In general, the entire contents of the ASM diagnostic area should be collected for support issues (i.e. diag/asm/+asm/+ASM/*). CSS trace files ($ORACLE_HOME/log/<hostname>/*) should be included as well.

Files that have ADVM/ACFS specific information include:

- Alert log (i.e. alert_+ASM.log)
  The alert log contains general information related to the ASM instance.

- VDBG process trace file (i.e. +ASM_vdbg_<pid>.trc)
  The VDBG trace file contains information related to the VDBG process.

- VBG process trace files (i.e. +ASM_vbg0_<pid>.trc)
  The VBG trace files contain information related to the various VBG processes.

- VMB process trace file (i.e. +ASM_vmb_<pid>.trc)
  The VMB trace file contains information related to the VMB process.

- Foreground trace files (i.e. +ASM_ora_<pid>.trc)
  The foreground trace files contain information related to a foreground process, typically processing SQL.

- Slave trace files (i.e. +ASM_v000_<pid>.trc)
  Slave trace files contain information related to slave processes that are spawned at volume creation, whose purpose is to zero out parts of the volume storage.

- ACFS process trace file (i.e. +ASM_acfs_<pid>.trc)
  The ACFS process trace file contains information related to the ACFS process.

The ACFS/ADVM process trace files may or may not exist in the ASM diagnostic area. By default, minimal logging is enabled for these processes. To increase the amount of data being logged to these files, the following event should be set for the ASM instance:

   event="15199 trace name context forever, level 2097159"

4. Operating System Log/Event files

ACFS/ADVM logs some information to the base operating system logs as described below. These messages are high level in nature and targeted to system administrators. The ACFS/ADVM messages are labeled with either Oracle ADVM, Oracle ACFS or Oracle OKS for easy identification.

- Linux (OEL5)
  /var/log/messages (should also include the /var/log/messages.<n> files as well)

- Windows
  System Event log

- Solaris
  /var/adm/messages (should also include the /var/adm/messages.<n> files as well)

- AIX
  The default for AIX is to not log messages, but AIX can direct console messages to a file via descriptors in /etc/syslog.conf. Look in that file for a line like:

     kern.debug /var/adm/messages rotate size 100k files 4

  to determine the file names for console messages (if any), and then copy, in this case, the files "/var/adm/messages*". Comments in /etc/syslog.conf explain the syntax of the description lines.

Please check "How To Gather The OS Logs For Each Specific OS Platform (Doc ID 1349613.1)" for details.

5. ACFS/ADVM Specific Log Files

The kernel drivers maintain an internal memory-based trace log for diagnostic purposes (aka the OKS log). The acfsutil command has a function to control the amount of data logged and another to copy the trace log to a file.

If running 11.2.0.1 or 11.2.0.2, these trace log messages must be manually copied from the kernel via the acfsutil log command described below.

The trace log will be written by the kernel itself under some unusual circumstances such as initialization problems or file system inconsistencies. A trace log dumped by the kernel will go to a file with a name of the form <p>oks.log.<n>, where n=0-9 and <p> is a one-character product Id (F for ACFS and V for ADVM). Each time the kernel drivers are loaded, the serial number <n> is reset to 0. Pre-existing log files are overwritten. The files are written to a platform-specific directory. File creation will be noted in a system log file.

The log levels are small values, one for each driver. Larger values log more information. To log diagnostic information useful for analysis, use these commands:

   acfsutil log -p oks -l 5
   acfsutil log -p ofs -l 5
   acfsutil log -p avd -l 5

To restore the normal values, use:

   acfsutil log -p oks -l 2
   acfsutil log -p ofs -l 2
   acfsutil log -p avd -l 3

N.B. oks is used for OKS, ofs is used for ACFS, and avd is used for ADVM. On Windows, use /p and /l instead of -p and -l.

To copy the trace log, follow the appropriate instructions:

Linux (OEL5)

   On Linux, the memory log can be dumped to a file via:

      cat /proc/oks/log > filename

   Files written by kernel drivers will be created in the /var/log directory and a message will be written to the /var/log/messages file.

Windows

   On Windows, the memory log can be dumped to a file via:

      acfsutil log [/f filename]

   The acfsutil log command creates a file in the current directory. The default file name is oks.log, and can be overridden via the /f option.

   Files written by kernel drivers will be created in the %systemroot%\system32\drivers\acfs directory and an event will be posted to the Windows Event Log.

Solaris

   On Solaris, the memory log can be dumped to a file via:

      acfsutil log [-f filename]

   The acfsutil log command creates a file in the current directory. The default file name is oks.log, and can be overridden via the -f option.

   Files written by kernel drivers will be created in the /var/log directory and a message will be written to the /var/adm/messages file.

   If acfsutil log is unresponsive, then the memory log can be dumped to a file (as root) via:

      echo "ks_log_buf/K | ::print KS_CHUNK_RING_BUFS ks_bufs | /K | /s" | sudo mdb -k > oks.log

AIX

   On AIX, the memory log can be dumped to a file via:

      acfsutil log [-f filename]

   The acfsutil log command creates a file in the current directory. The default file name is oks.log, and can be overridden via the -f option.

   Files written by kernel drivers will be created in the /var/log directory and a message will be written to the /var/adm/messages file.

6. Panics / Crash Dumps / Hangs

Some issues relating to ACFS/ADVM may result in system panics. In order to diagnose these rare occurrences, the system must be configured to save crash dumps. The following steps are needed to enable crash dumps for a system:

Linux (OEL)

   - Install the debuginfo rpms for your kernel (optional)
   - Add the following lines to /etc/kdump.conf:

        ext3 LABEL=/                           (assumes / is the root disk)
        core_collector makedumpfile -c -d 31   (only if using debuginfo kernels)
        path /var/crash

   - Append crashkernel=128M@16M to the end of the kernel lines in /boot/grub/grub.conf
     (the node needs to be rebooted after this change)
   - Enable and start the kdump service:

        sudo /sbin/chkconfig kdump on
        sudo /sbin/service kdump restart
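Once kdump has been enabled, it is worth confirming that the crash kernel is actually loaded before relying on it. A minimal check on OEL5-era systems (service and sysfs names can vary by release) might look like:

        # confirm the kdump service is enabled and running
        sudo /sbin/chkconfig --list kdump
        sudo /sbin/service kdump status

        # a value of 1 means a crash (kexec) kernel is loaded, so a panic should produce a dump
        cat /sys/kernel/kexec_crash_loaded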

Windows

   Windows systems should be configured for kernel memory dumps. From Control Panel, run the System utility to bring up the System Properties window. Select the Advanced tab and then the Settings button. In the Write debugging information area, select Kernel Memory Dump and check the Overwrite any existing file check box.

Solaris

   Solaris is normally configured to capture crash dumps when starting up after a system crash. They normally appear in /var/crash/<hostname>. The location used to save crash dumps may be changed. To query the crash dump configuration, run the following command:

      /usr/sbin/dumpadm

   Crash dump files consist of core (vmcore.XX) and symbol (unix.XX) files. The XX suffix is a numeric identifier that is increased every time a crash dump is saved.

AIX

   AIX systems may not be configured to take crash dumps automatically. First, dump space must be available; this should be on an LVM partition of type sysdump or paging, e.g.:

   # sysdumpdev
   primary              /dev/lg_dumplv
   secondary            /dev/sysdumpnull
   copy directory       /var/adm/ras
   forced copy flag     TRUE
   always allow dump    FALSE
   dump compression     ON
   type of dump         traditional
   ...

   # lsvg -l rootvg | egrep 'TYPE|dump|paging'
   LV NAME     TYPE      LPs   PPs   PVs  ...
   hd6         paging    31    31    1
   lg_dumplv   sysdump   16    16    1

If the sysdump partition lg_dumplv exists, it may not be big enough. While it should be possible to increase its size, it may be easier to use a paging volume like hd6, above. To use partition hd6, consider the following command usage first, then check with the customer and/or AIX support:

   sysdumpdev -p /dev/hd6 -P

ACFS may crash on its own. If it doesn't, you can force a crash and dump with the acfsutil panic command discussed below, or with AIX's:

   sysdumpstart -p

If crash dumps aren't saved after the system restarts, use:

   savecore /var/adm/ras

to save the dump as file /var/adm/ras/vmcore.XX.BZ and the boot image as file /var/adm/ras/vmunix.XX, where the XX suffix is a numeric identifier that is increased every time a crash dump is saved. It's also helpful to copy kdb and the ACFS drivers:

   cp /usr/lib/drivers/oracle*.ext /usr/sbin/kdb /var/adm/ras

Hangs

If there is a system or process hang, you may force a crash dump via the following undocumented command as root (Linux/UNIX) or Administrator (Windows):

   acfsutil panic              (panic the node)
   acfsutil panic [/c | -c]    (panic all the nodes in the cluster)

See the acfsutil man page to understand this command before using it.

   On Linux, if you cannot tolerate crashing a node (or multiple nodes), you can try the following to write stack traces to the system log:

   On each node, either type ALT-SysRq-t on the physical keyboard, or run "echo t > /proc/sysrq-trigger" from a shell prompt.
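   Note that on many Linux systems the SysRq facility is disabled by default, so the trigger above will do nothing until it is enabled. A quick check/enable sequence (run as root; this only affects the current boot) might be:

      # 0 means SysRq is disabled; 1 enables all SysRq functions
      cat /proc/sys/kernel/sysrq
      echo 1 > /proc/sys/kernel/sysrq

      # request the task dump; the stack traces appear in the system log (e.g. /var/log/messages)
      echo t > /proc/sysrq-trigger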

On Solaris, if you cannot tolerate crashing a node (or multiple nodes), you can try the following to write stack traces to the system log:

   On each node (as root), run:

      echo "::threadlist -v" | mdb -k
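   When collecting this output for a support case, it can help to redirect it to a uniquely named file on each node rather than leaving it on the screen. A small sketch (the output path and file name are only examples):

      echo "::threadlist -v" | mdb -k > /var/tmp/threadlist_`hostname`_`date '+%Y%m%d%H%M'`.out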

7. ACFS Replication Diagnostic Information

In addition to all the pertinent data collected for ACFS/ADVM, the following will also need to be collected for ACFS Replication.

Output from the following commands:

   acfsutil repl info -c /<primary_mountpoint>
   acfsutil repl info -c /<standby_mountpoint>
   acfsutil repl info /<primary_mountpoint>
   acfsutil repl info /<standby_mountpoint>
   acfsutil repl bg info /<primary_mountpoint>
   acfsutil repl bg info /<standby_mountpoint>
   ls -l /<primary_mountpoint>/.ACFS/repl/ready
   ls -l /<standby_mountpoint>/.ACFS/repl/ready
   ls -l /<primary_mountpoint>/.ACFS/repl/processed
   ls -l /<standby_mountpoint>/.ACFS/repl/processed

Output from the following command on all primary nodes:

   ps -ef | grep acfsrepl

Output from the following command on all standby nodes:

   ps -ef | grep acfsrepl

Alert and trace logs:

   <grid_home>/log/<host>/alert<host>.log
   <grid_home>/log/<host>/client/acfsutil.log
   <grid_home>/log/<host>/crsd/oraagent/oraagent.log
   <grid_home>/log/<host>/crsd/orarootagent/orarootagent.log
   All files located in <grid_home>/log/<host>/acfsrepl/*
   All files located in <grid_home>/log/<host>/acfsreplroot/*

For cluster problems, we need all of the above for each node.

8. Information Checklist When Reporting a Problem

Complete OS version information
   - Linux:   output of uname -a
   - Solaris: contents of /etc/release; output of uname -a
   - AIX:     output of oslevel -s

System message/event logs
   - Linux:   /var/log/messages
   - Solaris: recent entries at the end of /var/adm/messages
   - AIX:     recent entries at the end of /var/adm/messages

Stats on mounted file systems
   - Output from acfsutil info fs
   - Output from mount (Linux)

For CRS resource issues:
   - Output from crsctl stat res
   - Output from crsctl stat res -init
   - Output from crsctl stat res -p
   - Output from crsctl stat res -p -init

OKS logs
   - All OKS logs generated by the system
   - Output from acfsutil log

ASM trace files

Grid Infrastructure log/trace files collected by $GI_HOME/bin/diagcollection.sh

Crash dump, if available

ACFS Replication Diagnostic Information from Section 7

For cluster problems, we need all of the above for each node.

Appendix A - Persistent OKS logs (11.2.0.3 & onwards)

OKS logs are now automatically written to disk. This functionality is in addition to the existing in-memory OKS log. The in-memory OKS log features and functionality have not been altered in any way.

OKS persistent log features:

1. Automatically collects up to 10 log files, each with a maximum size of 100MB, for a total of 1GB (defaults).
2. If more data is collected, the oldest log file is overwritten.
3. The OKS persistent log hijacks the previously existing OKS console logging.
   3.1. It does this by reusing the -n flag of the acfsutil log command.
   3.2. Instead of (previously) sending the -n data to the console (and /var/log/messages on Linux), it sends the data to the persistent log instead.
   3.3. You can continue to set separate log levels for the in-memory log and the persistent log.
   3.4. An acfsutil log example is:

           acfsutil log -p avd -l 4 -n 3 -c 0xfff

        where -l is the in-memory log level and -n is the persistent log level.
4. All persistent log parameters are tunable.
5. A new acfsutil command, plogconfig, allows you to do this.
6. OKS persistent logging is started by the standard utilities (acfsload, usm_load_from_view.sh, etc.) after the drivers are loaded.
7. The persistent OKS logs are stored in the grid home at: <grid home>/log/<node name>/acfs/.

Appendix B - diagcollection.pl script automatically collects most ACFS information (11.2.0.3.4 & onwards)

Starting with the 11.2.0.3.4 (GI PSU) release, the diagcollection.pl script will automatically collect most ACFS information; check "Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) And Real Application Cluster (RAC) Issues (Doc ID 289690.1)" for details. If running on older code, you must collect the ACFS information manually as described in this document.

The OKS log information is now stored persistently in files, and these files are located in the ACFS directory of the grid home (<grid home>/log/<node name>/acfs). More information can be found in Appendix A (above).

For information on which platform versions are supported by ACFS, please refer to the following document:

   ACFS Supported On OS Platforms (Doc ID 1369107.1)
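To pull several of the items in the checklist above together on a Linux node, a minimal collection sketch is shown below. The staging directory and output file names are only examples, and <grid_home>/<node name> must be replaced with the real values; run as root so the OKS and system logs are readable:

   mkdir /tmp/acfs_diag_`hostname`
   cd /tmp/acfs_diag_`hostname`

   # current in-memory OKS trace log (see section 5)
   cat /proc/oks/log > oks_memory.log

   # persistent OKS logs written since 11.2.0.3 (see Appendix A)
   cp <grid_home>/log/<node name>/acfs/* .

   # OS messages and mounted file system stats
   cp /var/log/messages* .
   acfsutil info fs > acfsutil_info_fs.out
   mount > mount.out

   cd /tmp && tar cf acfs_diag_`hostname`.tar acfs_diag_`hostname`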

################################################################################

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.3 - Release: 11.2 to 11.2
Information in this document applies to any platform.

Symptoms

1) The ACFS filesystem cannot be mounted on one RAC node due to the ACFS-02017 error:

   [root@rzdb-rac-02-01 ~]# /bin/mount -t acfs /dev/asm/acf_002003-494 /u01/app/oracle/acfsmounts
   mount.acfs: CLSU-00100: Operating System function: open64 failed with error data: 2
   mount.acfs: CLSU-00101: Operating System error message: No such file or directory
   mount.acfs: CLSU-00103: error location: OOF_1
   mount.acfs: CLSU-00104: additional error information: open64 (/dev/asm/acf_002003-494)
   mount.acfs: ACFS-02017: Failed to open volume /dev/asm/acf_002003-494. Verify the volume exists.

2) This is a 4-node RAC-ACFS configuration.

Cause

1) The associated ADVM volume (e.g. /dev/asm/acf_002003-494) is not present (at the OS level) on node #1, but it is present on the other 3 RAC nodes:

Node #1:

   [root@rzdb-rac-02-01 ~]# ls -l /dev/asm/acf_002003-494
   ls: /dev/asm/acf_002003-494: No such file or directory                        <(==== *** Not present ***

   [root@rzdb-rac-02-01 ~]# ls -ld /u01/app/oracle/acfsmounts
   drwxr-xr-x 2 root root 4096 Sep 21 12:12 /u01/app/oracle/acfsmounts

   [oracle@rzdb-rac-02-01 ~]$ asmcmd
   ASMCMD [+] > volinfo -a
   Diskgroup Name: DAT_002003
     Volume Name: ACF_002003
     Volume Device: /dev/asm/acf_002003-494
     State: DISABLED
     Size (MB): 11612160
     Resize Unit (MB): 256
     Redundancy: UNPROT
     Stripe Columns: 4
     Stripe Width (K): 128
     Usage: ACFS
     Mountpath: /u01/app/oracle/acfsmounts

Node #2:

   [root@rzdb-rac-02-02 ~]# ls -l /dev/asm/acf_002003-494
   brwxrwx--- 1 root oinstall 252, 252929 Jan 12 14:06 /dev/asm/acf_002003-494   <(==== *** Present ***

   [root@rzdb-rac-02-02 ~]# ls -ld /u01/app/oracle/acfsmounts
   drwxrwx--- 6 root osasm 4096 Feb 14 13:30 /u01/app/oracle/acfsmounts

   [oracle@rzdb-rac-02-02 ~]$ asmcmd
   ASMCMD [+] > volinfo -a
   Diskgroup Name: DAT_002003
     Volume Name: ACF_002003
     Volume Device: /dev/asm/acf_002003-494
     State: ENABLED
     Size (MB): 11612160
     Resize Unit (MB): 256
     Redundancy: UNPROT
     Stripe Columns: 4
     Stripe Width (K): 128
     Usage: ACFS
     Mountpath: /u01/app/oracle/acfsmounts

Node #3:

   [root@rzdb-rac-02-03 ~]# ls -l /dev/asm/acf_002003-494
   brwxrwx--- 1 root oinstall 252, 252929 Jan 12 14:06 /dev/asm/acf_002003-494   <(==== *** Present ***

   [root@rzdb-rac-02-03 ~]# ls -ld /u01/app/oracle/acfsmounts
   drwxrwx--- 6 root osasm 4096 Feb 14 13:30 /u01/app/oracle/acfsmounts

   [oracle@rzdb-rac-02-03 ~]$ asmcmd
   ASMCMD [+] > volinfo -a
   Diskgroup Name: DAT_002003
     Volume Name: ACF_002003
     Volume Device: /dev/asm/acf_002003-494
     State: ENABLED
     Size (MB): 11612160
     Resize Unit (MB): 256
     Redundancy: UNPROT
     Stripe Columns: 4
     Stripe Width (K): 128
     Usage: ACFS
     Mountpath: /u01/app/oracle/acfsmounts

Node #4:

   [root@rzdb-rac-02-04 ~]# ls -l /dev/asm/acf_002003-494
   brwxrwx--- 1 root oinstall 252, 252929 Jan 12 14:06 /dev/asm/acf_002003-494   <(==== *** Present ***

   [root@rzdb-rac-02-04 ~]# ls -ld /u01/app/oracle/acfsmounts
   drwxrwx--- 6 root osasm 4096 Feb 14 13:30 /u01/app/oracle/acfsmounts

   [oracle@rzdb-rac-02-04 ~]$ asmcmd
   ASMCMD [+] > volinfo -a
   Diskgroup Name: DAT_002003
     Volume Name: ACF_002003
     Volume Device: /dev/asm/acf_002003-494
     State: ENABLED
     Size (MB): 11612160
     Resize Unit (MB): 256
     Redundancy: UNPROT
     Stripe Columns: 4
     Stripe Width (K): 128
     Usage: ACFS
     Mountpath: /u01/app/oracle/acfsmounts

2) This occurred because the associated volume (/dev/asm/acf_002003-494) was disabled on the affected node #1:

   [oracle@rzdb-rac-02-01 ~]$ asmcmd
   ASMCMD [+] > volinfo -a
   Diskgroup Name: DAT_002003

     Volume Name: ACF_002003
     Volume Device: /dev/asm/acf_002003-494
     State: DISABLED                                  <(==== *******
     Size (MB): 11612160
     Resize Unit (MB): 256
     Redundancy: UNPROT
     Stripe Columns: 4
     Stripe Width (K): 128
     Usage: ACFS
     Mountpath: /u01/app/oracle/acfsmounts

Solution

1) Please enable the associated volume on the affected node through ASMCMD as follows:

   ASMCMD> volenable -a

2) Then check the new state:

   ASMCMD> volinfo -a

3) Finally, mount the ACFS filesystem on the associated/affected node #1, either manually (using the mount OS command) or during the CRS stack startup. Example (AIX):

   # /usr/sbin/mount -v acfs /dev/asm/acf_002003-494 /u01/app/oracle/acfsmounts
   # df -k

Note: For additional information about the mount options for Solaris, Linux & AIX, please check the following manuals (http://www.oracle.com/pls/db112/homepage):

   Oracle ACFS Command-line Tools for Linux and UNIX Environments
   Oracle ACFS Command-line Tools for the Solaris Environment
   Oracle ACFS Command-line Tools for the AIX Environment

################################################################################

Check the cluster running state:

   [root@multdb01tst-sm public]# crsctl stat res -t -init

################################################################################

Connect an interactive window of the ASM configuration:

   Oracle instance(s) RUNNING on this node are:
   +ASM1

   Oracle instance(s) CONFIGURED in oratab are:
   +ASM1    (home: /app/grid/11.2.0.3)
   cchtst   (home: /app/oracle/product/11.2.0.3_cchtst)

   Enter Oracle SID you require : ORACLE_SID = [+ASM1] ?

   +-------------------------------------------------------------------------+
   | This is your current Oracle environment                                 |
   +-------------------------------------------------------------------------+
   ORACLE_SID ........ = +ASM1
   ORACLE_BASE ....... = /app/oracle
   ORACLE_HOME ....... = /app/grid/11.2.0.3
   LD_LIBRARY_PATH ... = /app/grid/11.2.0.3/lib
   TNS_ADMIN ......... = /app/grid/11.2.0.3/network/admin
   CRS_HOME .......... = /app/oracle/product/crs
   +-------------------------------------------------------------------------+

   [oracle@multdb01tst-sm ~]$ OH
   [oracle@multdb01tst-sm 11.2.0.3]$ cd bin
   [oracle@multdb01tst-sm bin]$ ./asmca
