
What Are Oracle Real Application Clusters?

• Multiple instances accessing the same database
• One instance per node
• Physical or logical access to each database file
• Software-controlled data access spread across nodes

[Diagram: cluster nodes, one instance per node, sharing a cache over the interconnect and accessing the same database files]
RAC Architecture

[Diagram: each node runs Oracle Clusterware, an ASM instance, a database instance, a listener, and a VIP on the public network, with services defined on top; nodes communicate over the cluster interconnect and share storage holding the database and control files and the redo/archive logs of all instances (managed by ASM), plus the OCR and voting disks on raw devices]

Global Resources Coordination

[Diagram: each instance holds part of the Global Resource Directory (GRD) and its cache and runs the LMON, LMD0, LMSx, LCK0, and DIAG background processes; global resources are coordinated across the interconnect by the Global Cache Services (GCS) and the Global Enqueue Services (GES)]


RAC Software

[Diagram: on each node, the instance (its cache and the LMON, LMD0, LMSx, LCK0, and DIAG processes) runs on top of Oracle Clusterware; the Clusterware processes (CRSD & RACGIMON, EVMD, OCSSD & OPROCD) communicate over the cluster interface and manage the node applications (ASM, database, services, OCR, VIP, ONS, EMD, listener), with global management through SRVCTL, DBCA, and OEM]
RAC Software Storage

[Diagram: two software storage layouts. In the first, each node keeps ORACLE_HOME, CRS_HOME, and ASM_HOME on local storage, with only the voting files and OCR files on shared storage; in the second, CRS_HOME stays local while ORACLE_HOME and ASM_HOME reside on shared storage.]

• Local software homes permit rolling patch upgrades
• Software is not a single point of failure
RAC Database Storage

[Diagram: each node keeps its archived log files on local storage; shared storage holds the data files, temp files, control files, flash recovery area files, change tracking file, SPFILE, TDE wallet, and, per instance, the undo tablespace files and online redo log files]
Automatic Storage Management

• Eliminates the need for a conventional file system and volume manager
• Capacity on demand
  • Add/drop disks online
• Automatic I/O load balancing
  • Stripes data across disks to balance load
  • Best I/O throughput
• Automatic mirroring
• Easy to manage
Automatic Storage Management

• Simplifies and automates database storage management (see the sketch below)
  • A fraction of the time is needed to manage database files
• Increases storage utilization
  • Eliminates over-provisioning and maximizes storage resource utilization
• Predictably delivers on service level agreements
  • Never gets out of tune, delivering higher performance than raw devices and file systems over time
  • Uncompromised availability enables reliable deployment on low-cost storage
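A minimal sketch of how an ASM disk group is created and grown online; the disk group name and disk paths are hypothetical:

    -- Run in the ASM instance: create a mirrored disk group from candidate disks.
    CREATE DISKGROUP data NORMAL REDUNDANCY
      DISK '/dev/raw/raw1', '/dev/raw/raw2', '/dev/raw/raw3', '/dev/raw/raw4';

    -- Capacity on demand: add a disk online; ASM rebalances data automatically.
    ALTER DISKGROUP data ADD DISK '/dev/raw/raw5';

    -- Database files are then placed in the disk group, for example:
    --   CREATE TABLESPACE dw_data DATAFILE '+DATA' SIZE 100G;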
Clusters and Scalability

[Diagram: in the SMP model, CPUs with private caches share memory and rely on cache coherency; in the RAC model, instances (SGAs and their background processes, BGP) share storage and rely on Cache Fusion]


Real Application Clusters Benefits

• Highest availability
• On-demand, flexible scalability
• Lower computing costs
• World-record performance
Levels of Scalability

• Hardware: Disk input/output (I/O)
• Internode communication: High bandwidth and low latency
• Operating system: Number of CPUs
• Database management system: Synchronization
• Application: Design
Scaleup and Speedup

[Diagram: the original system completes 100% of the task on one set of hardware in a given time. With scaleup, the cluster uses additional hardware to handle up to 200% or 300% of the task in the same time; with speedup, the cluster completes 100% of the task in half the time]
Speedup/Scaleup and Workloads

Workload                    Speedup     Scaleup
--------------------------  ----------  -------
OLTP and Internet           No          Yes
DSS with parallel query     Yes         Yes
Batch (mixed)               Possible    Yes


Definition of a Data Warehouse

“An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”
Data Warehouse - Characteristics

• What is data warehousing today?
  • Not a simple batch query and analytical engine anymore
  • Large user population with diverse query and analytical needs
    • Thousands of users accessing data both internally and externally
  • Large size: 10 TB and upwards of 100 TB
  • Not a simple schema with a few tables
    • Multiple applications sharing a common copy of enterprise data
  • Strict performance and operational SLAs
• Adaptable to growing business needs
  • Constantly evolving with more business units and functionality
  • Constant requirement to scale users and data
Data Warehouse - Characteristics

• Large, complex database operations
  • Complex SQL and calculations
• Updated through a controlled process
  • Extract, Transform, Load (ETL)
• Heterogeneous workload
  • ETL processing
  • Scheduled reporting
  • Ad hoc queries
  • Aggregations, etc.
• Peak usage of different workload patterns at different times
  • Systems have to be sized appropriately
Data Warehouse - Requirements

• High availability and reliability
• Deliver real-time data for real-time queries
  • Get more in-time, accurate data
  • Stay informed
  • Have the ability to make decisions and take action
  • Have a lag time of hours or minutes
• High performance and throughput
• Capability to scale quickly as the business grows
• Flexibility to meet diverse, shifting demands
RAC and Data Warehouse
Physical Considerations
Configure for a Balanced System
• "The weakest link" defines the performance
• Balance these components:
  • CPU
  • HBA (host bus adapter)
  • NICs and interconnect protocol
  • Switch speed
  • Controllers
  • Disks

[Diagram: four nodes, each with two HBAs, connected through FC-Switch1 and FC-Switch2 to disk arrays 1 through 8; the nodes are linked by the cluster interconnects]
Grid Component* Dependencies

Rules of thumb:
• 200 MB/s per CPU
• Number of HBAs per node = number of CPUs per node
• Number of controllers = number of HBAs
• Maximal number of switch ports = number of HBAs + number of controllers
• Minimum number of disks = number of controllers x 4
• Interconnect: GigE for up to 8 nodes, otherwise InfiniBand

[Diagram: dependencies between the CPU/node, host bus adapter, switch, controller, disk, and interconnect layers]

* 2Gbit based
I/O Design

• Optimal storage design
  • Support workloads that perform sequential I/O
    • Expressed as bandwidth (MB/sec)
    • Large multi-block I/Os
    • Table/index scans
  • Support workloads that perform random I/O
    • Expressed as I/O operations per second (IOPS)
    • Single-block I/O requests
  • Estimation should include requirements for both normal and backup I/O
I/O Design

1. Estimate the aggregated throughput and IOPS
   (e.g., 2 GB/sec, or 30,000 IOPS)

2. Calculate the total bandwidth requirement per node
   (e.g., 2 GB/sec across 16 nodes = 128 MB/sec per node,
   or 30,000 / 16 = 1875 IOPS per node)

3. Choose the appropriate storage class and build the configuration
   (e.g., at 120 IOPS per spindle, 16-way striped = 1920 IOPS per LUN;
   16 LUNs)
I/O Design

• DW-specific best practices
  • Plan for 50-60% utilization per HBA
  • Target 30-50 MB/sec per CPU core
  • Use ASM
    • Managing ultra-large databases becomes fairly simple
    • Eliminates contention by evenly spreading I/O
    • Expanding storage needs are addressed easily
    • Rebalancing keeps I/O performance constant
  • Create optimally sized LUNs
    • Small LUNs for multi-terabyte databases are sub-optimal
    • Pay attention to the initial storage layout when increasing cluster nodes exponentially
  • Offset the partition table to the stripe width of the storage array
Interconnect Design

• In a DW environment, the primary users of the interconnect are:
  • Inter-node parallel query
    • Typical message size: PARALLEL_EXECUTION_MESSAGE_SIZE, default 2 KB
  • Global Cache Fusion
    • Two types of messages:
      • Short 256-byte messages
      • Block transfers of DB_BLOCK_SIZE
Interconnect Design

• Interconnect bandwidth estimation
  • Messages received (M)
    • 256 * (GES messages + GCS messages)
  • Blocks received (B)
    • (db_block_size * (CR blocks received + current blocks received)) / MTU size
  • PQ messages received (P)
    • (parallel_execution_message_size * number of PQ remote messages received) / MTU size
  • Total bandwidth required
    • (messages received + blocks received + PQ messages received) / maximum network transmit capacity
    • (M + B + P) / 85000
Interconnect design – Cache Traffic

• Example from AWR:


Global Cache Load Profile Per Sec Per Trans
------------------------------- ---------- ---------
Global Cache blocks received: 2.70 2.23
Global Cache blocks served: 2.84 2.36
GCS/GES messages received: 164.07 136.03
GCS/GES messages sent: 136.96 113.56
DBWR Fusion writes: 0.22 0.18
Estd Interconnect traffic (KB): 103.08

• This DW system primarily uses PQ
  • Global cache traffic is minimal
  • Mostly dictionary blocks
Interconnect Design – IPQ traffic

• Example from AWR:


Statistic Total per Sec per Trans
--------------------------- --------- -------- ----------
PX local messages recv'd 104 0.1 0.1
PX local messages sent 104 0.1 0.1
PX remote messages recv'd 200271 200.2 151.1
PX remote messages sent 213267 213.2 156.1

• Per second, this system receives about 200 remote PX messages
  • The PQ message size is 8182 bytes
  • Usage is about 1.5 MB/sec
  • For this workload, GigE should be optimal
Interconnect Design

• DW-specific best practices
  • Plan for 50-70% utilization of the network bandwidth
  • GigE performs very well when IPQ usage is low
    • Multiplexed GigE is the choice for many customers
  • For high IPQ usage
    • InfiniBand, if available on your platform
    • RDS on Linux offers good performance over IB
Temporary Tablespace Design

• Large sorts in a data warehouse use temp space
  • For performance reasons, temp space allocation is managed through the SGA
  • Unless requested, space allocated in one instance is not returned to the common pool
  • Space reclamation is done under the SS and CI enqueues
    • This can cause a slowdown if space is reclaimed constantly
  • A few queries with excessive temp space requirements can cause an imbalance of usage among instances
Temporary Tablespace Design

• DW-specific best practices
  • Make sure enough temp space is allocated, combining all instances' usage
  • Allocate a separate temp tablespace for users who perform large sorts (see the sketch below)
  • For each temp tablespace, create as many temp files as the number of instances
    • This eliminates 'buffer busy' waits on the temp file header
  • If an imbalance is found, use the following command to release the excessive allocation:
    • alter session set events 'immediate trace name drop_segments level <TS number + 1>';
  • See Metalink Note 465840.1 for more details
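A minimal sketch of the suggested tempfile layout, assuming a four-instance cluster, an ASM disk group named +DATA, and a user etl_user (all hypothetical):

    -- Dedicated temp tablespace for heavy sort users, one tempfile per instance.
    CREATE TEMPORARY TABLESPACE temp_dw
      TEMPFILE '+DATA' SIZE 32G;

    ALTER TABLESPACE temp_dw ADD TEMPFILE '+DATA' SIZE 32G;
    ALTER TABLESPACE temp_dw ADD TEMPFILE '+DATA' SIZE 32G;
    ALTER TABLESPACE temp_dw ADD TEMPFILE '+DATA' SIZE 32G;

    -- Assign the tablespace to the users running large sorts.
    ALTER USER etl_user TEMPORARY TABLESPACE temp_dw;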
RAC and Data Warehouse
Database Technologies
Automatic Workload Management: Services

• Application workloads can be defined as services
  • Individually managed and controlled
  • Assigned to instances during normal startup
  • On instance failure, automatically re-assigned
  • Service performance individually tracked
  • Finer-grained control with Resource Manager
  • Integrated with other Oracle tools and facilities (e.g., Scheduler, Streams)
  • Managed by Oracle Clusterware
Many Services, One Database

[Diagram: six nodes hosting one database, with the Queries, Aggregations, ETL1, ETL2, and Backup services spread across different nodes]
How to define a service

• 1. SRVCTL
     srvctl add service -d ORA -s APP1 -r INSTANCE1,INSTANCE2
     srvctl add service -d ORA -s APP2 -r INSTANCE3,INSTANCE4
                         (db)   (service)  (preferred instances)
• 2. Using OEM Grid Control
• 3. DBMS_SERVICE (for single instance)
     EXEC DBMS_SERVICE.CREATE_SERVICE(service_name => 'APP1', network_name => 'APP1');
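To verify where the services defined above are currently running, a query along these lines can be used (gv$active_services is the cluster-wide view of active services):

    -- Show which instance is offering each service across the cluster.
    SELECT inst_id, name
    FROM   gv$active_services
    ORDER  BY inst_id, name;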
Partitioning

• Powerful functionality for partitioning objects into smaller pieces
• Beneficial for any environment with large volumes of data
• A business decision, not hardware-based (a top-down design approach, NOT bottom-up)
Partitioning Strategies

• Range Partitioning
• Hash Partitioning
• List Partitioning
• Composite Partitioning
  • Composite Range-Range Partitioning
  • Composite Range-Hash Partitioning
  • Composite Range-List Partitioning
  • Composite List-Range Partitioning
  • Composite List-Hash Partitioning
  • Composite List-List Partitioning
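A minimal sketch of a range strategy and a composite range-hash strategy; the table and column names are hypothetical and echo the pruning example on the next slide:

    -- Range partitioning by month on a hypothetical sales table.
    CREATE TABLE sales (
      sales_id     NUMBER,
      cust_id      NUMBER,
      sales_date   DATE,
      sales_amount NUMBER
    )
    PARTITION BY RANGE (sales_date) (
      PARTITION p_2005_01 VALUES LESS THAN (TO_DATE('01-FEB-2005','DD-MON-YYYY')),
      PARTITION p_2005_02 VALUES LESS THAN (TO_DATE('01-MAR-2005','DD-MON-YYYY')),
      PARTITION p_2005_03 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY'))
    );

    -- Composite range-hash: range by date, sub-partitioned by hash on cust_id.
    CREATE TABLE sales_rh (
      sales_id     NUMBER,
      cust_id      NUMBER,
      sales_date   DATE,
      sales_amount NUMBER
    )
    PARTITION BY RANGE (sales_date)
    SUBPARTITION BY HASH (cust_id) SUBPARTITIONS 4 (
      PARTITION p_2005_q1 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY')),
      PARTITION p_2005_q2 VALUES LESS THAN (TO_DATE('01-JUL-2005','DD-MON-YYYY'))
    );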
Query Performance: Partition Pruning

Only the relevant partitions are accessed:

    select sum(sales_amount)
    from   sales
    where  sales_date between
           to_date('01-MAR-2005','DD-MON-YYYY')
           and
           to_date('31-MAY-2005','DD-MON-YYYY')

[Diagram: the sales table partitioned by month (05-Jan through 05-Jun); only the 05-Mar, 05-Apr, and 05-May partitions are scanned]
Partition-wise Joins

• Partition-wise joins may provide significant performance improvements
• Partition-wise joins are supported for range, hash, and composite partitioning
• The optimizer chooses partition-wise joins whenever possible
• The degree of parallelism is not correlated to the number of partitions
Full Partition-wise Joins

When joining two tables that are partitioned on the join key, Oracle may choose to join on a per-partition basis.

[Diagram: Lineitem and Orders are equi-partitioned on the join key; each pair of matching partitions (Sub-1, Sub-2, Sub-3) is joined on its own node (Node 1, Node 2, Node 3)]
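A minimal sketch of tables laid out for a full partition-wise join; the table names, the join key order_id, and the parallel degree are hypothetical:

    -- Both tables hash-partitioned on the join key with the same partition
    -- count, so the join can proceed partition pair by partition pair.
    CREATE TABLE orders (
      order_id   NUMBER,
      order_date DATE
    )
    PARTITION BY HASH (order_id) PARTITIONS 16;

    CREATE TABLE lineitem (
      order_id NUMBER,
      item_id  NUMBER,
      amount   NUMBER
    )
    PARTITION BY HASH (order_id) PARTITIONS 16;

    -- A join on the partitioning key is eligible for a full partition-wise join.
    SELECT /*+ PARALLEL(o, 4) PARALLEL(l, 4) */ o.order_id, SUM(l.amount)
    FROM   orders o JOIN lineitem l ON o.order_id = l.order_id
    GROUP  BY o.order_id;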
Partial Partition-wise Joins
Partial partition-wise join: if Lineitem is partitioned by the join key, then Orders can be redistributed to enable a partition-wise join.

[Diagram: Lineitem is partitioned on the join key; Orders is redistributed so that each piece is sent to the node holding the matching Lineitem partition (Sub-1 on Node 1, Sub-2 on Node 2, Sub-3 on Node 3)]
What is Parallelism

• Breaking a single task into multiple smaller, distinct units
• Instead of one process doing all the work, multiple processes work concurrently on the smaller units
• Independent of the number of nodes
How Parallel Execution Works?

• With serial execution, only one process is used
• With parallel execution:
  • One parallel execution coordinator process
  • Many parallel execution servers
  • The table may be dynamically partitioned

[Diagram: SELECT COUNT(*) FROM sales executed by a single serial process, versus a coordinator dispatching the scan of SALES to multiple parallel execution servers]
Parallel Operations

SELECT cust_last_name, cust_first_name
FROM   customers
ORDER  BY cust_last_name;

[Diagram: with DOP=3, the coordinator dispatches the work to two sets of execution servers: three producers scan granules of the table on disk (the table's dynamic partitioning), three consumers sort the ranges A-K, L-S, and T-Z, and the sorted results return to the coordinator. Each set exhibits intra-operation parallelism; the producer-to-consumer flow is inter-operation parallelism]
How Parallel Execution Servers Communicate

• Row distribution methods:
  • PARTITION
  • HASH
  • RANGE
  • ROUND-ROBIN
  • BROADCAST
  • QC (ORDER)
  • QC (RANDOM)

[Diagram: parallel execution server set 1 redistributes rows to parallel execution server set 2 (DOP=3), which returns results to the query coordinator (QC)]
Degree of Parallelism (DOP)

• The number of parallel execution servers used by one parallel operation
• Applies only to intra-operation parallelism
• If inter-operation parallelism is used, the number of parallel execution servers can be twice the DOP
• No more than two sets of parallel execution servers can be used for one parallelized statement
• When using partition granules, use a relatively high number of partitions
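A few common ways to request a DOP, sketched with hypothetical object names; the optimizer and the adaptive multiuser feature may still adjust the DOP actually used:

    -- Set a default DOP on the table; parallel scans of sales will use it.
    ALTER TABLE sales PARALLEL 8;

    -- Or request a DOP for a single statement with a hint.
    SELECT /*+ PARALLEL(s, 8) */ COUNT(*)
    FROM   sales s;

    -- Or revert the table to serial execution.
    ALTER TABLE sales NOPARALLEL;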
Parallel Execution with RAC

• Execution slaves have node affinity with the execution coordinator, but will expand to other nodes if needed

[Diagram: four nodes on shared disks; the execution coordinator runs on one node and its parallel execution servers run there first, spilling over to the other nodes when more are needed]
Adaptive Parallelism

• The adaptive multiuser feature adjusts the DOP based on user load
• Enabled by default: PARALLEL_ADAPTIVE_MULTI_USER=TRUE

[Diagram: a two-node cluster, initially with no workload. The 1st user logs on and issues a query that runs at parallel 8; a 2nd user logs on and their query runs at parallel 4; when the 3rd and 4th users log on, their queries also run at parallel 4]
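If the automatic downgrade is not desired for a given system, the parameter can be changed dynamically; a minimal sketch, assuming an spfile-managed RAC database:

    -- Check the current setting.
    SHOW PARAMETER parallel_adaptive_multi_user

    -- Disable adaptive DOP adjustment on all instances (TRUE is the default).
    ALTER SYSTEM SET parallel_adaptive_multi_user = FALSE SID='*';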
Inter-node Parallel Query – Oracle 10g

• Parallel execution slaves are allocated on instances without regard for services
• The benefits of services are greatly reduced when using parallel execution
• Workaround: instance groups (see the sketch below)
  • instance_groups=ig1,ig2,ig3 (not dynamic)
  • parallel_instance_group=ig2 (dynamic)
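A sketch of the 10g workaround with hypothetical group and instance names; instance_groups is static per instance (spfile change plus restart), while parallel_instance_group can be switched per session:

    -- Tag each instance with the groups it belongs to (static, spfile only).
    ALTER SYSTEM SET instance_groups = 'ig1','ig3' SCOPE=SPFILE SID='rac1';
    ALTER SYSTEM SET instance_groups = 'ig2','ig3' SCOPE=SPFILE SID='rac2';

    -- A session then restricts its PQ slaves to the instances in one group.
    ALTER SESSION SET parallel_instance_group = 'ig2';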
Inter-node Parallel Query – Oracle 11g

• Parallel execution slaves are only allocated on instances offering the service that the user session is connected to
• All services have equivalent, dynamic instance groups
• Services can be created for different IPQ user groups
• The preferred and available characteristics of services can be exploited
• IPQ SLAs can be guaranteed through service failover
Overview: Parallel Join Execution

• EMP and DEPT joined on deptno
• Repartition EMP and DEPT on deptno
• Join each partition

[Diagram: data flow operator (DFO) tree: each table scan feeds a hash redistribution send; the corresponding receives feed the hash join, whose result is sent to the query coordinator (QC)]
Parallel Hash-Join with 8 Slaves

[Diagram: the eight slaves are spread across Node 1 and Node 2, so the redistributed rows cross the interconnect]

Interconnect Can Become a Bottleneck

Pre-filtering can reduce communication.

[Diagram: DFO tree for the hash join with a shared Bloom filter: one input's scan creates the filter (Filter Create), the filter is shared across the slaves, and the other input's scan tests its rows against the filter (Filter Use) before sending them over the interconnect to the join]
11gR1: Extended to Serial Execution

[Diagram: serial plan with a local Bloom filter: the hash join's build side (Scan Dept, feeding a view with a group by) creates the filter (Filter Create), and the probe side (Scan Emp) tests its rows against the filter (Filter Use)]
Parallel Execution on RAC

• The Bloom filter needs to be merged across the cluster
  • A potentially costly operation
• Prior to merging, each node contains a private, incomplete Bloom filter
• Merging is done in parallel
  • Each producer splits its Bloom filter into pieces
  • Each piece is sent to a single consumer on each other node
  • Each consumer merges the received pieces into its local Bloom filter
• After merging, the Bloom filter is complete and can be used for filtering
Two Approaches to Parallelism and Partitioning

Shared everything:
• Parallel degree independent of the number of nodes
• Data partitioning independent of the number of nodes

Shared nothing:
• Static parallel degree dependent on the number of nodes
• Static data partitioning dependent on the number of nodes

[Diagram: shared everything: all nodes access data A-Z; shared nothing: each node owns one hash partition (Hash 1 through Hash 4)]
Oracle Advanced Compression

• Oracle 9i compresses data only during bulk load; useful for DW and ILM
• Oracle 11g also compresses with inserts and updates
• Trades some CPU for disk and I/O efficiency
• Compress large application tables
  • Transaction processing, data warehousing
• Compress all data types: structured and unstructured
• Savings cascade to all database copies: test, dev, standby, mirrors, archiving, backup, etc.
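A minimal sketch of both forms, using a hypothetical sales table; the 11gR1 keyword for OLTP compression was COMPRESS FOR ALL OPERATIONS (later releases call it COMPRESS FOR OLTP):

    -- Bulk-load (direct-path) compression, as available before 11g.
    CREATE TABLE sales_hist COMPRESS
    AS SELECT * FROM sales;

    -- 11g Advanced Compression: rows stay compressed through
    -- conventional inserts and updates as well.
    CREATE TABLE sales_oltp (
      sales_id     NUMBER,
      sales_date   DATE,
      sales_amount NUMBER
    ) COMPRESS FOR ALL OPERATIONS;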
Let’s Talk About
RAC & Data Warehouse
The Key Question

How should I design and configure my Oracle Data Warehouse?

Answer: It depends…
Few Large Nodes or Many Small Nodes?
Manageability

• Many nodes are more difficult to manage:
  • Increased maintenance
  • Performance problems are harder to diagnose
  • Statistics gathering is more challenging
• However, computing power lost during planned and unplanned outages has less impact:
  • 16 x 2 grid → 6% less power
  • 4 x 8 grid → 25% less power
• Many nodes are more flexible for distributing different workloads
Scalability: Scale-Out

• Easy scale-out
  • Simply add nodes with no reconfiguration of the database, but:
    • Keep a balanced system
    • Watch out for the number of slots in the switch
• We recommend adding only nodes with similar performance characteristics: CPUs, HBAs, NICs, etc.
• The scale-out increment is one node
  • 16 x 2 grid → 6% increase in computing power
  • 4 x 8 grid → 25% increase in computing power
How can I run different workload types?

• Managing and partitioning the workload using Services
  - Services provide a single system image for managing the workload
  - A service spans one or more instances of the database; an instance can support multiple services. The number of instances offering a service is managed by the DBA, independent of the application
  - How many services do I need to define?
  - How many instances will offer a service?
  - Are there services that should run on one instance for performance reasons (contention on resources, for example)?

• Managing the workload using Resource Manager (see the sketch below)
  - Oracle Database Resource Manager facilitates meeting SLAs and provides effective control of the system resources used by an Oracle database instance
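A minimal Resource Manager sketch; the consumer group, plan name, and CPU percentages are hypothetical and only illustrate the mechanism:

    BEGIN
      DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

      -- A consumer group for ETL sessions and a plan that caps its CPU share.
      DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('ETL_GROUP', 'ETL sessions');
      DBMS_RESOURCE_MANAGER.CREATE_PLAN('DW_PLAN', 'Daytime DW plan');

      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan => 'DW_PLAN', group_or_subplan => 'ETL_GROUP',
        comment => 'ETL limited to 30% CPU', cpu_p1 => 30);
      DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan => 'DW_PLAN', group_or_subplan => 'OTHER_GROUPS',
        comment => 'Everything else', cpu_p1 => 70);

      DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
      DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
    END;
    /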
What is the optimal partitioning strategy?

• Partitions are the foundation for achieving effective performance in a large or very large data warehouse, and other features depend on partitioning to deliver their benefits
• Important criteria when choosing a partitioning strategy:
  - Performance (the primary motivation)
  - Ease of administration/management
  - Data purge
  - Data archiving
  - Data movement
  - Data lifecycle management
  - Efficiency of backup
What is the optimal partitioning strategy?

• Grouping data by value for pruning (range, list)
• Balancing data distribution (hash partitioning)
• Dividing data across parallel processes to balance the workload (partition-wise joins)
• Combining different partitioning mechanisms (composite partitioning)
Which degree of parallelism?

• Different scenarios can be used for parallel query:
  - Standard use of parallel query for large data sets. In this scenario, the degree of parallelism can be defined to utilize all of the available resources across the cluster.
  - Use of restricted parallel query. This scenario restricts the processing to specific nodes in the cluster, so nodes can be logically grouped for specific types of operations. This can be done by using services and/or PARALLEL_INSTANCE_GROUP.
Which degree of parallelism?

• The downside of parallel operations is the exhaustion of server resources:
  - If I/O bottlenecks already exist, parallel operations may exacerbate them
  - If CPU utilization is relatively high on one node, using more instances for parallel query may help

• Parallel operations in a RAC environment provide the flexibility to utilize all of the server hardware that is part of the cluster architecture. Using instance groups, database administrators can further control the allocation of these resources based on application requirements or service level agreements.
Summary – DW on RAC Best Practices

• Design to support Business Needs
• Implement and test
• Partition the Data
• Partition the Workload
• Configure Parallel Query
• Measure and Monitor
