(Network - Clusterware - RAC) Troubleshooting Without ADDM V1

Cloug Meeting 2
June 8, 2016
Network/Clusterware/RAC
Troubleshooting Without ADDM
César Sáez León
• CLOUG board member since 2012
• OCP since Oracle 8i
• OCE RAC Administrator
• OCE Performance Tuning
• Oracle University instructor since 2001
• Business and Marketing Manager at Datactiva
Agenda
• Three real-world cases of problem analysis and resolution on clustered platforms are covered:
– Clusterware
– Network
– RAC (database)
• Only free tools are used (STATSPACK, logs, dynamic performance views, CVU, MOS, etc.).
• The platforms are RAC 10gR2 and RAC 11gR2. Oracle 12cR1 still has very low adoption in the Chilean market, so there are no cases to show yet; the methodology, however, is the same.
Slowness - 10gR2 - STATSPACK
"Customer reports slowness in a business process"
Release 10.2.0.4.0 - 64bit
RAC YES
Num CPUs 24
Phys Memory (MB) 60,413
Platform RHEL AS release 4 (Nahant Update 8)
Case 1 RAC
STATSPACK
• Instance 1
Snapshot       Snap Id     Snap Time      Sessions Curs/Sess
~~~~~~~~    ---------- ------------------ -------- ---------
Begin Snap:       2030 18-Jan-11 14:00:03      125      34.4
  End Snap:       2043 18-Jan-11 17:00:01      171      38.2
   Elapsed:     179.97 (mins)
• Instance 2
Snapshot       Snap Id     Snap Time      Sessions Curs/Sess
~~~~~~~~    ---------- ------------------ -------- ---------
Begin Snap:       2031 18-Jan-11 14:00:01      119      36.3
  End Snap:       2034 18-Jan-11 17:00:02      169      41.8
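Before comparing the two reports, it helps to confirm that both cover practically the same window. The following is a minimal sketch that recomputes the elapsed minutes from the snap times shown above; the function name `elapsed_minutes` is hypothetical and the date format is assumed from the report header.

```python
from datetime import datetime

# Date format assumed from the STATSPACK header, e.g. "18-Jan-11 14:00:03".
FMT = "%d-%b-%y %H:%M:%S"


def elapsed_minutes(begin: str, end: str) -> float:
    """Return the minutes between two STATSPACK snap times."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(begin, FMT)
    return delta.total_seconds() / 60.0


# Snap times copied from the two instance headers above.
inst1 = elapsed_minutes("18-Jan-11 14:00:03", "18-Jan-11 17:00:01")
inst2 = elapsed_minutes("18-Jan-11 14:00:01", "18-Jan-11 17:00:02")
print(f"instance 1: {inst1:.2f} min, instance 2: {inst2:.2f} min")
```

The value for instance 1 agrees with the `Elapsed` line of the report, and the two windows differ by only a few seconds, so the per-instance figures that follow can reasonably be compared side by side.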
Instance Efficiency Percentages
• Instance 1
Buffer Nowait %: 99.72   Redo NoWait %: 100.00
Buffer Hit %: 97.86   In-memory Sort %: 100.00
Library Hit %: 97.75   Soft Parse %: 89.50
Execute to Parse %: 87.27   Latch Hit %: 99.83
Parse CPU to Parse Elapsd %: 38.79   % Non-Parse CPU: 97.54
• Instance 2
Buffer Nowait %: 100.00   Redo NoWait %: 100.00
Top 5 Timed Events
• Instance 1
Top 5 Timed Events                                             Avg %Total
~~~~~~~~~~~~~~~~~~                                            wait   Call
Event                                     Waits    Time (s)   (ms)   Time
---------------------------------- ------------ ----------- ------ ------
CPU time                                              4,105          60.3
db file sequential read                 293,699       1,123      4   16.5
db file scattered read                   91,186         374      4    5.5
SQL*Net break/reset to client           710,360         316      0    4.6
db file parallel read                    26,613         107      4    1.6
Top 5 Timed Events
• Instance 2
Top 5 Timed Events                                              Avg %Total
~~~~~~~~~~~~~~~~~~                                             wait   Call
Event                                      Waits    Time (s)   (ms)   Time
------------------------------------- ---------- ----------- ------ ------
CPU time                                               8,929          39.8
gc cr multi block request             58,289,682       4,735      0   21.1
db file scattered read                   717,216       4,203      6   18.8
db file sequential read                  311,881       1,510      5    6.7
gc buffer busy                         1,497,373       1,348      1    6.0
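The `Avg wait (ms)` column is rounded to whole milliseconds, which hides the difference between a sub-millisecond event and a 1 ms event. As a rough sketch, the average can be recomputed from the `Waits` and `Time (s)` columns of the instance 2 table; the helper name `avg_wait_ms` is hypothetical.

```python
def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average wait in milliseconds: total wait time divided by wait count."""
    return time_s * 1000.0 / waits


# (waits, time in seconds) copied from the instance 2 table above.
instance2 = {
    "gc cr multi block request": (58_289_682, 4_735),
    "db file scattered read": (717_216, 4_203),
    "db file sequential read": (311_881, 1_510),
    "gc buffer busy": (1_497_373, 1_348),
}

avg_ms = {event: avg_wait_ms(w, t) for event, (w, t) in instance2.items()}
for event, ms in avg_ms.items():
    print(f"{event:<28} {ms:6.2f} ms")
```

Seen this way, each `gc cr multi block request` wait is very short; what makes the event expensive on instance 2 is the sheer number of waits (more than 58 million in three hours).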
Time Model System Stats
• Instance 1
Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time                         5,668.7         91.7
DB CPU                                           4,198.4         67.9
parse time elapsed                                 427.6          6.9
hard parse elapsed time                            238.3          3.9
PL/SQL execution elapsed time                      144.6          2.3
failed parse elapsed time                          116.9          1.9
• Instance 2
Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
DB CPU                                           9,051.5         45.3
hard parse elapsed time                            280.1          1.4
PL/SQL execution elapsed time                      243.4          1.2
This indicates that the execution phase of SQL statements is what consumes the most time.
Data (block) access takes place during this phase.
Global Cache Efficiency Percentages
• Instance 1
Buffer access - local cache %: 99.64
Buffer access - remote cache %: 0.12
Buffer access - disk %: 0.24
• Instance 2
Buffer access - local cache %: 86.21
More than 11% of the blocks accessed by instance 2 come from instance 1.
Global Cache Load Profile
• Instance 1
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second  Per Transaction
                                      ---------------  ---------------
  Global Cache blocks received:                 60.26             5.63
    Global Cache blocks served:              5,779.27           540.19
     GCS/GES messages received:             11,659.76         1,089.83
         GCS/GES messages sent:                279.52            26.13
            DBWR Fusion writes:                  1.74             0.16
Estd Interconnect traffic (KB):             49,048.19
• Instance 2
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second  Per Transaction
                                      ---------------  ---------------
  Global Cache blocks received:              5,779.04           282.22
    Global Cache blocks served:                 60.26             2.94
     GCS/GES messages received:                279.57            13.65
         GCS/GES messages sent:             11,660.55           569.44
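The `Estd Interconnect traffic (KB)` figure can be approximated from the other rows of the load profile. The sketch below assumes an 8 KB database block size and roughly 200 bytes per GCS/GES message; neither value is stated in the slides, and the function name is hypothetical, but with these assumptions the result lands very close to the reported number for instance 1.

```python
BLOCK_SIZE = 8192  # assumption: 8 KB db_block_size (not stated in the slides)
MSG_SIZE = 200     # assumption: about 200 bytes per GCS/GES message


def estd_interconnect_kb(blocks_rx: float, blocks_tx: float,
                         msgs_rx: float, msgs_tx: float) -> float:
    """Estimate interconnect traffic in KB/s from per-second rates."""
    bytes_per_s = ((blocks_rx + blocks_tx) * BLOCK_SIZE
                   + (msgs_rx + msgs_tx) * MSG_SIZE)
    return bytes_per_s / 1024.0


# Per-second values copied from the instance 1 load profile above.
inst1_kb = estd_interconnect_kb(60.26, 5779.27, 11659.76, 279.52)
inst1_mbit = inst1_kb * 1024 * 8 / 1_000_000
print(f"{inst1_kb:,.2f} KB/s (about {inst1_mbit:.0f} Mbit/s)")
```

Almost all of this traffic is blocks served by instance 1 to instance 2, which is consistent with the `gc cr multi block request` waits seen on instance 2.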
Global Cache Services - Workload Characteristics
• Instance 1
Avg global enqueue get time (ms): 0.0
Avg global cache cr block receive time (ms): 0.6
Avg global cache current block receive time (ms): 0.4
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Avg global cache cr block flush time (ms): 0.5
Global cache log flushes for cr blocks served %: 7.5
Avg global cache current block pin time (ms): 0.0
Avg global cache current block send time (ms): 0.0
Avg global cache current block flush time (ms): 1.6
Global cache log flushes for current blocks served %: 0.0
• Instance 2
Avg global enqueue get time (ms): 0.0
Avg global cache cr block receive time (ms): 0.5
Avg global cache cr block build time (ms): 0.0
Avg global cache current block pin time (ms): 0.0
SQL
• Instance 2
On instance 2, one and the same query is the one that:
• Consumes the most CPU
• Takes the most time
• Has the most waits on cluster events
This query was executed 1,309 times during the period.
a) SQL ordered by CPU  DB/Inst: instancia/instancia2
       CPU                 CPU per            Elapsd                        Old
  Time (s)   Executions   Exec (s) %Total   Time (s)     Buffer Gets Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
   4388.08        1,309       3.35   48.5    8870.58      88,555,035 4280002066
b) SQL ordered by Elapsed  DB/Inst: instancia/instancia2
   Elapsed                Elap per               CPU                        Old
  Time (s)   Executions   Exec (s) %Total   Time (s)  Physical Reads Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
   8870.58        1,309       6.78   44.4    4388.08           3,799 4280002066
c) SQL ordered by Cluster Wait Time  DB/Inst: instancia/instancia2
      Cluster    CWT % of      Elapsd         CPU                       Old
Wait Time (s) Elapsd Time    Time (s)    Time (s)     Executions Hash Value
------------- ----------- ----------- ----------- -------------- ----------
     6,053.21        68.2    8,870.58    4,388.08          1,309 4280002066
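The per-execution and percentage columns in the three listings follow directly from the totals. A minimal sketch of that arithmetic, using only figures from the tables above (the variable names are illustrative):

```python
# Totals for hash value 4280002066 on instance 2, copied from the listings.
cpu_s = 4388.08
elapsed_s = 8870.58
cluster_wait_s = 6053.21
executions = 1309

cpu_per_exec = cpu_s / executions              # "CPU per Exec (s)"
elapsed_per_exec = elapsed_s / executions      # "Elap per Exec (s)"
cluster_pct = 100.0 * cluster_wait_s / elapsed_s  # "CWT % of Elapsd Time"

print(f"CPU per exec:     {cpu_per_exec:.2f} s")
print(f"Elapsed per exec: {elapsed_per_exec:.2f} s")
print(f"Cluster wait:     {cluster_pct:.1f} % of elapsed time")
```

Roughly two thirds of the elapsed time of this statement is cluster wait, which is why the analysis focuses on how the statement's blocks move between the two instances.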
Additional Information About the Statement
SQL_ID        | SQL_FULLTEXT                                                                     | HASH_VALUE | OLD_HASH_VALUE
------------- | -------------------------------------------------------------------------------- | ---------- | --------------
gpzfhxtdgzzmx | SELECT IMAGEN.IMAG_CODIGO, IMAG_EXPEDICION, IMAG_ORIGEN, IMAG_DESTINO, IMAG_REME | 1526726269 | 4280002066
SQL_ID gpzfhxtdgzzmx
--------------------
SELECT IMAGEN.IMAG_CODIGO, IMAG_EXPEDICION, IMAG_ORIGEN, IMAG_DESTINO, IMAG_REMESA, IMAG_FECHA_EXPEDICION,
       IMAG_NUMERO_ENVIO, IMAG_RUTA_PRIVADA, DOCUMENTO.DOCU_NOMBRE
FROM IMAGEN, DOCUMENTO
WHERE IMAG_FECHA_BAJA IS NULL AND
      DOCU_FECHA_BAJA IS NULL AND
      (IMAGEN.DOCU_CODIGO = :CODIGO_DOCUMENTO OR :CODIGO_DOCUMENTO IS NULL) AND
      (IMAGEN.DOCU_CODIGO IN (SELECT DOCU_CODIGO FROM ACCESO_DOCUMENTO
                              WHERE ACCD_FECHA_BAJA IS NULL AND USUA_CODIGO = :CODIGO_USUARIO)) AND
      (IMAG_ORIGEN = :ORIGEN_EXPEDICION OR :ORIGEN_EXPEDICION IS NULL) AND
      (IMAG_DESTINO = :DESTINO_EXPEDICION OR :DESTINO_EXPEDICION IS NULL) AND
      (IMAG_REMESA = :REMESA_EXPEDICION OR :REMESA_EXPEDICION IS NULL) AND
      (IMAG_EXPEDICION = :EXPEDICION OR :EXPEDICION IS NULL) AND
      (IMAGEN.DELE_CODIGO = :CODIGO_DELEGACION OR :CODIGO_DELEGACION = :"SYS_B_0") AND
      (IMAG_NUMERO_ENVIO = :NUMERO_ENVIO OR :NUMERO_ENVIO IS NULL) AND
      (IMAGEN.DOCU_CODIGO = DOCUMENTO.DOCU_CODIGO) AND
      (DOCUMENTO.GRUP_CODIGO = :CODIGO_GRUPO OR :CODIGO_GRUPO IS NULL) AND
      (IMAG_FECHA_EXPEDICION >= :EXPEDICION_DESDE OR :EXPEDICION_DESDE IS NULL) AND
      (IMAG_FECHA_EXPEDICION <= :EXPEDICION_HASTA OR :EXPEDICION_HASTA IS NULL) AND
      (IMAG_FECHA_ALTA >= :ESCANEO_DESDE OR :ESCANEO_DESDE IS NULL) AND
      (IMAG_FECHA_ALTA <= :ESCANEO_HASTA OR :ESCANEO_HASTA IS NULL)
Additional Information About the Statement
| Id | Operation                      | Name             | Rows | Bytes | Cost (%CPU) | Time     |
|  0 | SELECT STATEMENT               |                  |    1 |       | 25014 (100) |          |
|  1 |  NESTED LOOPS SEMI             |                  |    1 |   123 | 25014   (2) | 00:05:01 |
|  2 |   NESTED LOOPS                 |                  |    1 |   113 | 25013   (2) | 00:05:01 |
|  3 |    TABLE ACCESS FULL           | IMAGEN           |    1 |    84 | 25012   (2) | 00:05:01 |
|  4 |    TABLE ACCESS BY INDEX ROWID | DOCUMENTO        |    1 |    29 |     1   (0) | 00:00:01 |
|  5 |     INDEX UNIQUE SCAN          | DOCU_PK          |    1 |       |     0   (0) |          |
|  6 |   TABLE ACCESS BY INDEX ROWID  | ACCESO_DOCUMENTO |    4 |    40 |     1   (0) | 00:00:01 |
|  7 |    INDEX RANGE SCAN            | ACCD_USUARIO     |    4 |       |     0   (0) |          |
Reads vs Changes
CONCLUSIONS AND RECOMMENDATIONS
The problem occurs in the w3wp.exe module, with the query SQL_ID=gpzfhxtdgzzmx:
• The query runs on one node of the cluster while the base tables are being modified from the other node (or from both).
• It must be determined whether it is normal for this query to run concurrently with modifications to the base tables.
• If it is normal, the course of action would be to run the query and the modifications from the same node.
• Raise STATSPACK to level 7 to collect segment-level statistics.
Slowness - 10gR2 - STATSPACK
"Customer reports slowness on their platform"
Release 10.2.0.4.0 - 64bit
RAC YES
Num CPUs 24
Phys Memory (MB) 60,413
Platform RHEL AS release 4 (Nahant Update 8)
STATSPACK, MOS, Documentation
Case 2 Network
STATSPACK
• Instance 1
Snapshot       Snap Id     Snap Time      Sessions Curs/Sess
~~~~~~~~    ---------- ------------------ -------- ---------
Begin Snap:      10240 15-Jan-16 09:00:02      216      31.3
  End Snap:      10252 15-Jan-16 11:00:05      294      35.4
• Instance 2
Snapshot       Snap Id     Snap Time      Sessions Curs/Sess
~~~~~~~~    ---------- ------------------ -------- ---------
Begin Snap:      10241 15-Jan-16 09:00:05      209      41.0
  End Snap:      10243 15-Jan-16 11:00:04      283      38.4
Instance Efficiency Percentages
• Instance 1
Buffer Nowait %: 99.99   Redo NoWait %: 100.00
• Instance 2
Buffer Nowait %: 100.00   Redo NoWait %: 100.00
Top 5 Timed Events
• Instance 1
Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
enq: TX - row lock contention                   34,746      16,783    483   26.9
gc cr multi block request                    5,519,578      12,631      2   20.2
CPU time                                                     5,941           9.5
gc buffer busy                                 123,818       5,105     41    8.2
gc cr grant 2-way                              315,907       4,103     13    6.6
• Instance 2
Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
enq: TX - row lock contention                   38,470      18,611    484   26.6
db file sequential read                      2,248,154      13,128      6   18.8
gc current block 2-way                         640,072       9,328     15   13.3
CPU time                                                     9,246          13.2
gc cr grant 2-way                              300,460       4,068     14    5.8
Time Model System Stats
• Instance 1
Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
inbound PL/SQL rpc elapsed time                  7,002.8         11.1
DB CPU                                           6,107.4          9.7
PL/SQL execution elapsed time                    3,871.2          6.2
• Instance 2
Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
DB CPU                                           9,456.3         13.3
inbound PL/SQL rpc elapsed time                  6,290.4          8.8
PL/SQL execution elapsed time                    4,208.1          5.9
Global Cache Load Profile
• Instance 1
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second  Per Transaction
                                      ---------------  ---------------
     GCS/GES messages received:                726.71            37.72
         GCS/GES messages sent:              2,049.10           106.36
• Instance 2
~~~~~~~~~~~~~~~~~~~~~~~~~                  Per Second  Per Transaction
                                      ---------------  ---------------
     GCS/GES messages received:              2,047.36            86.42
         GCS/GES messages sent:                725.80            30.64
There is a difference of 13.17/s
Global Cache Efficiency Percentages
• Instance 1
Buffer access - local cache %: 99.28
• Instance 2
Buffer access - local cache %: 99.67
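Global Cache Services - Workload Characteristics
• Instance 1
Avg global enqueue get time (ms): 6.2
Avg global cache cr block receive time (ms): 17.5
Avg global cache cr block build time (ms): 0.0
Avg global cache current block pin time (ms): 2612.1
• Instance 2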
Typical Latencies for RAC Operations
gc blocks lost
Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
gc blocks lost                                 7,525            1.0          0.1
gc blocks lost                                     0            0.0          0.0
Reference: Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
"Any block loss indicates a problem in network packet processing and should be investigated"
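The `per Second` column of the `gc blocks lost` row is simply the total divided by the length of the snapshot window. The sketch below assumes the 7,525 figure belongs to the two-hour window between the Case 2 snapshots (09:00:02 to 11:00:05 for instance 1); the slide does not label which instance each row comes from, so that attribution is an assumption.

```python
# Assumption: the window is the instance 1 snapshot interval of Case 2,
# 09:00:02 to 11:00:05, i.e. two hours plus three seconds.
elapsed_s = 2 * 3600 + 3

blocks_lost = 7525  # "gc blocks lost" total copied from the table above

lost_per_second = blocks_lost / elapsed_s
print(f"gc blocks lost: {lost_per_second:.2f} per second")
```

About one lost block per second may look small, but according to the quoted MOS note any non-zero value is worth investigating, because every lost block forces a retransmission and inflates the global cache receive times.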
Global Cache Block Loss Diagnostic Guide
• Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows
(RAC01)
[root@rac1 ~]# netstat -su
Udp:
    123437667 packets received
    171908 packets to unknown port received.
    6828 packet receive errors
    2081851993 packets sent
(RAC02)
[root@rac2 ~]# netstat -su
Udp:
    387660084 packets received
    155302 packets to unknown port received.
    0 packet receive errors
    468987970 packets sent
• UDP packet losses are found on node 1, which translates into higher transfer latencies and, consequently, delays in Oracle's processing.
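When this check has to be repeated on several nodes, the relevant counters can be extracted from the `netstat -su` text automatically. The following is a small sketch that parses the two outputs shown above; the function name `udp_receive_errors` is hypothetical, and the regular expressions assume the exact wording printed by this version of `netstat`.

```python
import re
from typing import Tuple


def udp_receive_errors(netstat_su: str) -> Tuple[int, int]:
    """Return (packets received, packet receive errors) from `netstat -su` text."""
    received = int(re.search(r"(\d+) packets received", netstat_su).group(1))
    errors = int(re.search(r"(\d+) packet receive errors", netstat_su).group(1))
    return received, errors


# Outputs copied from the slide above.
RAC01 = """Udp:
    123437667 packets received
    171908 packets to unknown port received.
    6828 packet receive errors
    2081851993 packets sent
"""
RAC02 = """Udp:
    387660084 packets received
    155302 packets to unknown port received.
    0 packet receive errors
    468987970 packets sent
"""

for node, text in (("RAC01", RAC01), ("RAC02", RAC02)):
    received, errors = udp_receive_errors(text)
    print(f"{node}: {errors} receive errors in {received} packets")
```

The counters are cumulative since boot, so in practice it is more informative to sample them twice and look at the growth of `packet receive errors` during the slow period.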
Diagnostic Guide
• Interconnect LAN non-dedicated
[root@rac1 ~]# cat /etc/hosts
# PRIVATE NETWORK
10.180.23.26
10.180.23.27
(RAC01)
eth3  inet addr:10.180.23.26  Bcast:10.180.23.255  Mask:255.255.255.0
      RX bytes:52287574010569 (47.5 TiB)  TX bytes:65881057016247 (59.9 TiB)
(RAC02)
eth3  inet addr:10.180.23.27  Bcast:10.180.23.255  Mask:255.255.255.0
      RX bytes:580766305300 (540.8 GiB)  TX bytes:2772380340803 (2.5 TiB)
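One way to read these counters: on a private network that carries only interconnect traffic between two nodes, what one node receives should be roughly what the other node sends. The sketch below makes that comparison with the byte counters shown above. It assumes both interfaces have been counting over a comparable period, which the slides do not confirm (counters reset on reboot), so it is only a rough indication.

```python
TIB = 2 ** 40

# Byte counters for eth3, copied from the interface output above.
rac01 = {"rx": 52287574010569, "tx": 65881057016247}
rac02 = {"rx": 580766305300, "tx": 2772380340803}


def tib(n_bytes: int) -> float:
    """Convert a byte count to TiB."""
    return n_bytes / TIB


# Bytes received by RAC01 that cannot have been sent by RAC02.
unexplained_rx = rac01["rx"] - rac02["tx"]

print(f"RAC01 received {tib(rac01['rx']):.1f} TiB, "
      f"RAC02 sent {tib(rac02['tx']):.1f} TiB")
print(f"Not explained by the other node: {tib(unexplained_rx):.1f} TiB")
```

Under that assumption, most of what RAC01 received on eth3 did not come from RAC02, which points to other hosts using the private network.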
Diagnostic Guide
• Limited capacity and over-saturated bandwidth
(RAC01)
[root@rac1 ~]# ethtool eth3
Settings for eth3:
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on
(RAC02)
[root@rac2 ~]# ethtool eth3
Settings for eth3:
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on
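To put the 100 Mb/s figure in perspective, the negotiated speed can be turned into an upper bound on block transfers per second. This is a back-of-the-envelope sketch: the function name is hypothetical, the 8 KB block size is an assumption not stated in the slides, and protocol overhead is ignored, so the real ceiling is lower.

```python
import re


def link_speed_mbps(ethtool_output: str) -> int:
    """Extract the negotiated link speed in Mb/s from `ethtool` output."""
    match = re.search(r"Speed:\s*(\d+)Mb/s", ethtool_output)
    if match is None:
        raise ValueError("no 'Speed: <n>Mb/s' line found")
    return int(match.group(1))


# Output copied from the slide above (identical on both nodes).
SAMPLE = """Settings for eth3:
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on
"""

BLOCK_SIZE = 8192  # assumption: 8 KB db_block_size

speed = link_speed_mbps(SAMPLE)
max_blocks_per_s = speed * 1_000_000 / 8 / BLOCK_SIZE
print(f"{speed} Mb/s link: at most about {max_blocks_per_s:.0f} blocks/s")
print("meets 1000 Mb/s minimum:", speed >= 1000)
```

For comparison, instance 1 in Case 1 served more than 5,700 blocks per second, a rate that a 100 Mb/s link could not sustain under these assumptions.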
CONCLUSIONS AND RECOMMENDATIONS
CONCLUSIONS
The UDP configuration is not adequate.
The private network is being used for traffic other than the cluster interconnect.
The private network is not correctly configured.
RECOMMENDATIONS
The private network must be used for the interconnect only.
The network interfaces must be configured for at least 1000 Mbps.
Grid does not start on node 2 - 11gR2 - Logs, CVU
"Clusterware does not come up on node 2"
[root@server2 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
Release 11.2.0.1.0 - 64bit
RAC YES
Num CPUs 16
Phys Memory (GB) 35
Platform Linux x86 64-bit
• OS logs, Oracle Grid Infrastructure logs, CVU, MOS
Case 3 Clusterware
OS Log
• The /var/log/messages file was reviewed from the maintenance date (a weekend) up to the present, confirming that the problems associated with the motherboard failure had indeed been corrected.
• In other words, the OS boots without any errors.
Logs Oracle Grid Infrastructure
• 11gR2 Clusterware and Grid Home - What You Need to Know (Doc ID 1053147.1)
• Important Log Locations
• Clusterware daemon logs are all under <GRID_HOME>/log/<nodename>. Structure under <GRID_HOME>/log/<nodename>:
• alert<NODENAME>.log - look here first for most clusterware issues
alertnode.log
• /u01/app/11.2.0/grid_1/log/server2/alertserver2.log
• The first step was to review the node's alert log, where the following entry stands out:
2015-12-14 17:27:44.559:
[cssd(20301)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid_1/log/server2/cssd/ocssd.log
• This points to a serious problem with the Cluster Synchronization Services daemon.
ocssd.log
• Reviewing the Cluster Synchronization Services daemon log reveals the following entries, which point to a problem with the voting disk file in ASM:
2015-12-14 18:07:54.474: [ CSSD][3058187552]clssnmReadDiscoveryProfile: voting file discovery string(ORCL:*,/voting_file)
2015-12-14 18:07:54.477: [ SKGFD][1101998400]OSS discovery with :ORCL:*:
2015-12-14 18:07:54.477: [ SKGFD][1101998400]Discovery skipping bad asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so:
2015-12-14 18:07:54.477: [ SKGFD][1101998400]Discovery advancing to nxt string :/voting_file:
2015-12-14 18:07:54.477: [ SKGFD][1101998400]UFS discovery with :/voting_file:
2015-12-14 18:07:54.508: [ CSSD][1101998400]clssnmvDiskVerify: discovered a potential voting file
2015-12-14 18:07:54.523: [ SKGFD][1101998400]Handle 0x120368a0 from lib :UFS:: for disk :/voting_file/voting_file3:
2015-12-14 18:07:54.525: [ CSSD][1101998400]clssnmvDiskVerify: Successful discovery for disk /voting_file/voting_file3, UID 883ff2bf-f1a94f85-bfcfd521-c908cb0b, Pending CIN 0:1425783924:0, Committed CIN 0:1425783924:0
2015-12-14 18:07:54.526: [ SKGFD][1101998400]Lib :UFS:: closing handle 0x120368a0 for disk :/voting_file/voting_file3:
2015-12-14 18:07:55.070: [ CSSD][3058187552]ASSERT clssnml.c 453
2015-12-14 18:07:55.070: [ CSSD][3058187552]clssnmlgetleasehdls: Do not have sufficient voting files, found 1 of 2 configured files, needed at least 2
2015-12-14 18:07:55.070: [ CSSD][3058187552]###################################
2015-12-14 18:07:55.070: [ CSSD][3058187552]clssscExit: CSSD aborting from thread Main
2015-12-14 18:07:55.070: [ CSSD][3058187552]###################################
2015-12-14 18:07:55.070: [ CSSD][3058187552](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2015-12-14 18:07:55.070: [ CSSD][3058187552]
• The log shows that CSSD does not recognize any voting disk inside ASM (the ASMLib library is skipped as "bad"), but it does find the one outside ASM.
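The message "found 1 of 2 configured files, needed at least 2" is consistent with a strict-majority rule: a node must be able to access more than half of the configured voting files. A minimal sketch of that rule follows; the function names are hypothetical and this is an illustration of the arithmetic, not Oracle's implementation.

```python
def voting_files_needed(configured: int) -> int:
    """Strict majority: more than half of the configured voting files."""
    return configured // 2 + 1


def css_can_start(found: int, configured: int) -> bool:
    """True when enough voting files were discovered to form a majority."""
    return found >= voting_files_needed(configured)


# Situation in the log above: 2 configured, only /voting_file/voting_file3 found.
print("needed:", voting_files_needed(2))
print("can start with 1 of 2:", css_can_start(1, 2))
# With three voting files, losing one would still leave a majority.
print("can start with 2 of 3:", css_can_start(2, 3))
```

With an even number of voting files, losing a single one is already fatal, which is one reason an odd number of voting files is normally configured.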
Voting Disk
• From the node that is working correctly, the voting disks defined in the cluster configuration are reviewed:
[root@server1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   754ab903282a4f88bf48eb7090d42b4c (ORCL:OCR2_2) [OCRVOD]
 2. ONLINE   883ff2bff1a94f85bfcfd521c908cb0b (/voting_file/voting_file3) [OCRVOD]
Located 2 voting disk(s).
• This confirms that there is a problem with ASM on node 2, since that node cannot read the copy of the voting disk stored there.
CVU
• To detect problems beyond those in ASM, a full check of both cluster machines was run to verify that they meet the requirements for hosting an Oracle Grid Infrastructure installation:
[oracle@server1 ~]$ cd /u01/app/11.2.0/grid_1/bin/
[oracle@server1 bin]$ ./cluvfy stage -pre crsinst -n server1,server2
• Everything passed except the ASMLib configuration:
Checking ASMLib configuration.
ERROR:
PRVF-10109 : ASMLib is not configured correctly on the nodes:
Check failed on nodes:
server2,server1
Check for ASMLib configuration failed.
ASM
• The ASM review followed the note "ASMLib Devices Not Discovered with Diskstring as 'ORCL:*' (Doc ID 1444115.1)".
• The problem shows up in the following output:
[root@server2 sbin]# ./oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=oraacle
ORACLEASM_GID=oinstall
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm-"
ORACLEASM_SCANEXCLUDE="sd"
[oracle@server1 sbin]$ ./oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=dba
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm-"
ORACLEASM_SCANEXCLUDE="sd"
• server2 does not have the user and group that own ASM configured correctly: the user is misspelled (oraacle) and the group (oinstall) differs from the one on server1 (dba).
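A difference like this is easy to miss by eye, especially a single extra letter. As a sketch, the two `oracleasm configure` outputs can be parsed into dictionaries and compared key by key; `parse_config` and `config_diff` are hypothetical helper names, and the input is the text shown above.

```python
def parse_config(text: str) -> dict:
    """Parse KEY=value lines (values optionally double-quoted) into a dict."""
    config = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip().strip('"')
    return config


def config_diff(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# Outputs copied from the slide above.
SERVER2 = """ORACLEASM_ENABLED=true
ORACLEASM_UID=oraacle
ORACLEASM_GID=oinstall
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm-"
ORACLEASM_SCANEXCLUDE="sd"
"""
SERVER1 = """ORACLEASM_ENABLED=true
ORACLEASM_UID=oracle
ORACLEASM_GID=dba
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm-"
ORACLEASM_SCANEXCLUDE="sd"
"""

differences = config_diff(parse_config(SERVER2), parse_config(SERVER1))
for key, (on_server2, on_server1) in differences.items():
    print(f"{key}: server2={on_server2!r} server1={on_server1!r}")
```

Only the owner user and group differ between the nodes, which matches the finding above.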
SOLUTION
• ASMLib is reconfigured on node 2:
[root@server2 sbin]# ./oracleasm configure -i

Configuring the Oracle ASM library driver.
This will configure the on-boot properties of the Oracle ASM library
driver. The following questions will determine whether the driver is
loaded on boot and what permissions it will have. The current values
will be shown in brackets ('[]'). Hitting <ENTER> without typing an
answer will keep that current value. Ctrl-C will abort.
Default user to own the driver interface [oraacle]: oracle

Default group to own the driver interface [oinstall]: dba
Start Oracle ASM library driver on boot (y/n) [y]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done
FINAL REVIEW
• server2 is rebooted and everything is confirmed to come up correctly and automatically:
[root@server2 server2]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@server2 ~]# ps -fea| grep pmon

oracle 22944 1 0 16:26 ? 00:00:00 asm_pmon_+ASM2
oracle 24432 1 0 16:26 ? 00:00:00 ora_pmon_instancia2
root 24622 21665 0 16:26 pts/1 00:00:00 grep pmon
[root@server2 ~]# su - oracle
[oracle@server2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Tue Dec 15 16:28:06 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
SQL> select status from gv$instance;
STATUS
------------
OPEN
OPEN
CONCLUSIONS AND RECOMMENDATIONS
• It is highly unusual to have a copy of the voting disk outside ASM. Only file systems that are supported for an Oracle RAC solution should be used, namely:
• ASM (Automatic Storage Management)
• OCFS (Oracle Cluster File System)
• It is recommended to review the current configuration and change it to follow the vendor's best practices.
References
• Oracle® Database Performance Tuning Guide, 11g Release 2 (11.2), Part Number E16638-05
• Oracle® Real Application Clusters Administration and Deployment Guide
• Oracle® Database Reference
• Statistics Package (STATSPACK) Guide (Doc ID 394937.1)
• How to Use AWR Reports to Diagnose Database Performance Issues (Doc ID 1359094.1)
• Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
• 11gR2 Clusterware and Grid Home - What You Need to Know (Doc ID 1053147.1)
• ASMLib Devices Not Discovered with Diskstring as 'ORCL:*' (Doc ID 1444115.1)
Questions
