Larry Chen
SRS2, Inc.
One Market St, Spear Tower, Suite 2260
San Francisco, CA 94105
USA
Executive Summary
Customers using Siebel Customer Relationship Management (CRM) solutions are
increasingly demanding greater scalability and high availability to support mission-
critical operations and continued business growth. Such environments are characterized
by 24x7 operations and several thousand concurrent users. In addition, many large
Siebel customers are interested in leveraging the Linux operating system to decrease the
cost of deployment for mission-critical enterprise software.
Increasing the overall capacity of a Siebel CRM environment requires scaling in both the
middle tier and database tier of Siebel's N-tier architecture. The middle tier is designed to
scale up by adding more CPUs to a single Siebel Application Server and to scale out
easily through the addition of Siebel Application Servers. Traditionally, the database tier
could only be scaled up by replacing one proprietary computing platform with another
more powerful platform to get more database performance. The availability of Oracle 10g
Real Application Clusters (RAC) changes this paradigm, allowing the database tier to
scale out through the addition of lower cost servers, including those running Linux.
Siebel partnered with SRS2 in collaboration with Oracle and several hardware vendors to
test the viability of this approach in Siebel environments under real-world conditions. In
benchmarks utilizing a single server, a two-node Oracle 10g RAC cluster and a four-node
Oracle 10g RAC cluster, the results show that Oracle 10g RAC offers 80% scalability up
to 4 nodes. A single medium-sized database node supported up to 2,500 users, a 2-node
cluster supported 4,000 users, and a 4-node cluster supported 8,000 users (all servers at
75% CPU utilization), proving the viability of the scale-out approach.
This paper discusses the details of this investigation carried out using Egenera
BladeFrame servers and Network Appliance Unified Storage.
Because databases have traditionally been constrained to run only on a single server,
Siebel customers have typically followed a scale-up strategy for the database part of
the Siebel IT infrastructure. Whenever the database server becomes a bottleneck to
overall application performance, the server is replaced with a larger, faster machine.
While this approach is well understood, it can be highly disruptive to ongoing business operations.
Oracle 10g Real Application Clusters (Oracle 10g RAC) provides a potential alternative
approach for scaling database performance for Siebel applications. Since Oracle 10g
RAC is supported on Linux and Microsoft Windows 2003 Advanced Server, lower-cost,
industry-standard hardware platforms can be used. Oracle 10g RAC is designed to scale
through the addition of server hardware to a database cluster. Each server runs against the
same database, allowing the database infrastructure to be scaled out as needs grow
while also providing high availability. This approach promises to be less disruptive to
ongoing business operations, more reliable, and less expensive to implement.
This paper explores the results of these studies, proving the viability of the scale-out
approach and providing a set of best practices for implementing Oracle 10g RAC for use
with Siebel Applications.
The test infrastructure had to meet the following requirements:
- Minimal IT support
- Easy to deploy/provision/re-configure
- Flexibility to allow new servers to be added to any tier
- Small footprint
- Centralized administration
- Ability to run multiple tests against different databases in parallel
The test infrastructure also had to meet the requirements of Oracle 10g RAC, which
utilizes clustered hardware to run multiple nodes (Oracle instances) against a single
database. If one cluster node fails, the other nodes continue to provide uninterrupted
access to the database. Database files are stored on shared storage that is physically or
logically connected to each node. To maintain the consistency of the database, Oracle
10g RAC software coordinates all database modifications between cluster nodes. A
cluster interconnect enables database instances to pass control information and data to
each other.
Two Control Blades (C Blades) in the Egenera BladeFrame were used for external I/O
and IP networks, while two Switch Blades (S Blades) were used for point-to-point
connections between Processing Blades and Control Blades over an integrated switched
fabric network inside the BladePlane running at 2.5 Gigabits/sec.
The BladeFrame architecture was designed for mission-critical applications and has built-
in High Availability with N+1 failover. With no local disk, Processing Blades have no
permanent identity. This proved ideal for testing since individual blades could be
assigned different tasks and re-assigned as testing conditions were altered.
The individual blades in the Egenera BladeFrame were allocated to support one of four
different functions:
1. Load Generators: Configured with Windows 2003 Advanced Server and used
during testing to simulate user loads, thereby taking the place of the client tier in
Siebel's N-tier architecture. Each blade was capable of simulating up to 2,000
users.
2. Application Servers: Configured as Siebel Application servers running Siebel
7.7 software on top of Windows 2003 Advanced Server. These Application
servers were divided into two clusters of 5 servers each for high availability.
3. Web servers: Configured as web servers running Windows 2003 Advanced
Server. All requests from the load generators are sent to the web servers as HTTP
requests, resulting in calls to the Application servers.
4. Database servers: Configured with Oracle 10g RAC and Red Hat Linux
Advanced Server 2.1 (kernel 2.4.9-e.25). Each database server was
configured with 4 network interfaces, two for message passing between nodes,
and two for external network connections to Siebel Application Servers. In each
case, one interface served as primary and the second as backup to ensure high
availability.
NetApp provided a highly available FAS980 cluster for Siebel testing. Each FAS980 has
a maximum disk capacity of 32 TB, dual 2.8GHz Intel Pentium 4 CPUs with 2MB of
Level 3 cache, 8GB of system memory, and 512MB of NVRAM (nonvolatile RAM). The
cluster consisted of two FAS980 systems in an active-active configuration. Both storage
systems are active during normal operation. Should one system fail, the other takes over
the workload and physical storage of the failed system.
The NetApp storage cluster was connected to the Egenera BladeFrame via a fibre channel
SAN. The NetApp storage cluster provided block-level storage for each of the Egenera
blades serving as operating system "boot" devices. Block-level storage for the Oracle
databases and Siebel Application Servers utilized during testing was also provided by the
NetApp cluster.
Enterprise-class data protection capabilities, including on-demand backup and restore, are key features of the NetApp storage systems.
SAN Configuration
A pair of Brocade Silkworm 4100 2Gbit/sec fibre channel switches was used to provide
the SAN fabric between the NetApp cluster and the BladeFrame. The SAN was designed
to ensure no single points of failure for high availability.
Figure 2 shows how the hardware is deployed at Siebel's data center. The twenty-four
servers required for the testing were housed in a single, neatly organized BladeFrame,
significantly reducing the wiring and data center space required.
The following diagram shows the logical layout of Egenera blades and NetApp storage
systems.
Test Workload
The test workload combined two Siebel modules: Siebel Call Center and Siebel eChannel.
Most Siebel customers deploy only one of these two modules, so combining both produces
heavier workloads than most customers will see in their deployments. Both are OM-based
products providing a broad array of services and functionality, and both have a large
installed user base, typically with heavy user load.
Test Database
A Siebel engineering scalability database was used for all tests. This database is a
representation of the cross-functional nature of Siebel Industry Solutions; the data shape
is as close to production shapes as can be simulated with a synthetically-generated
database, and is used for Siebel Performance/Scalability/Reliability testing. The total size
of the database was 200GB. The main tables involved in the test (S_SRV_BU, S_OPTY,
etc.) contain 4-6 million rows of data each.
Test Scenarios
Three database configurations were tested:
1. A single active RAC node. A second RAC node was configured but inactive.
Simulated user load was added to drive the active node to 60% to 70% CPU
utilization on average. This test provided the baseline against which the scalability
of multi-node configurations was judged.
2. Two active RAC nodes. Simulated user load was added until both nodes reached 60%
to 70% CPU utilization on average.
3. Four active RAC nodes. Simulated user load was added until all nodes reached 75%
to 80% CPU utilization on average with acceptable performance.
Note that similar testing was performed with Oracle 9i RAC on an HP cluster. Those
results are not reported here. In general, customers can expect similar performance with
Oracle 9i RAC.
Each run included a ramp-up phase, a steady-state phase, and a ramp down phase, as
shown in Figure 4.
Database CPU usage was gathered during the steady state phase using the Linux vmstat
command. Oracle Statspack snapshots were also taken for database analysis and tuning.
At the end of each run, the data was analyzed and the workload was deemed to have
passed or failed against a set of predefined criteria. Table 3 summarizes the number of
simulated users supported in each configuration.
Number of RAC Nodes | Avg. DB Server CPU (%) | Total Number of Users
1                   | 65%-70%                | 2,500
2                   | 65%-70%                | 4,000
4                   | 65%-70%                | 8,000
Table 3: Total number of simulated users supported
Adding nodes to the Oracle 10g RAC configuration results in approximately 80% scalability. In
other words, 2 RAC nodes can support 4,000 users and 4 RAC nodes can support 8,000 users,
versus the 2,500 users supported by a single node.
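As an illustrative check of the 80% scalability figure (this arithmetic is derived from the user counts above rather than from additional measured data), the supported user counts scale from the single-node baseline of 2,500 users as follows:

    users(2 nodes) ≈ 2 x 2,500 x 0.80 = 4,000
    users(4 nodes) ≈ 4 x 2,500 x 0.80 = 8,000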
Table 4 summarizes the total number of transactions completed per second for CallCenter
and eChannel in each configuration and details the average response time for new service
requests and new opportunities.
Based on these results, Oracle 10g RAC provides Siebel users an alternative to the
traditional scale-up database strategy. A customer can start with a modest configuration
consisting of 1 or 2 nodes and expect good scaling with each node added. Please note that
Siebel Remote requires ORDERED sequences, which conflict with Oracle 10g RAC
scalability requirements. Therefore, Siebel does not support Siebel Remote with Oracle
10g RAC.
Our testing also derived substantial benefits from the hardware architecture described
above. The Egenera BladeFrame made it simple to deploy and manage not only database
servers, but also the Siebel Application servers, web servers, and load generators needed
to create a complete environment and to ensure high availability for all components.
Individual blade servers could easily be re-purposed as testing needs changed, with
a minimum of cabling and configuration alterations. This flexibility should be of benefit
in almost any dynamic IT environment.
The highly available NetApp storage cluster complemented the Egenera BladeFrame,
providing great flexibility, simplified management, and innovative features that made
testing easier. NetApp unified storage allowed us to support SAN and NAS connections
from the same storage system. During testing the test team was able to quickly and easily
reconfigure the SAN storage as needs changed. The ability to clone, delete and move
LUNs and file systems as necessary simplified the team's work, and should provide
similar benefits in any busy Siebel environment.
The tuning performed for this project was based on data from statspack reports taken at
regular intervals during steady state testing and also from closely monitoring CPU and
I/O activity on each database node. The first two pages of a statspack report briefly
describe the workload. Statistics and wait events presented there were monitored over
time to proactively detect performance issues and to ensure performance was optimal.
SAR and vmstat provided measures of CPU usage and I/O performance during the
benchmark.
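For readers unfamiliar with Statspack, the snapshots and reports referred to above are typically gathered with the standard Oracle-supplied package and scripts, roughly as sketched below (the PERFSTAT schema and script path are Oracle defaults, not details specific to this benchmark):

    -- Connect as the PERFSTAT user (created by spcreate.sql) and take a
    -- snapshot at the start and end of the steady-state window
    EXECUTE statspack.snap;
    -- ... workload runs at steady state ...
    EXECUTE statspack.snap;

    -- Generate a report covering the interval between two snapshot IDs
    @?/rdbms/admin/spreport.sql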
The tuning changes described here apply to the workload created by this particular
benchmark. The workload consisted of approximately 30% inserts and 70% selects. Care
should be taken to verify that your workload is similar before applying these changes.
Tuning Recommendations
SEQUENCES
Sequences are used in the Siebel application in order to create sequential numbers.
Sequences have two performance-related properties: caching and ordering. For optimal
performance in RAC, sequences should use the CACHE clause with a reasonably large cache
size and NOORDER, unless gapless or ordered sequences are required.
In the benchmark, high wait time was observed on the SQ enqueue. The insert rate was
very high, with multiple nodes inserting data simultaneously, causing leaf block contention
on the index.
Tuning:
Changed the sequences to use large CACHE sizes (10000) and NOORDER.
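As an illustration only, a change of this kind is applied with standard Oracle DDL; the sequence name below is hypothetical:

    -- Cache 10,000 sequence values per instance and do not enforce
    -- cross-instance ordering of the generated numbers
    ALTER SEQUENCE siebel.s_srv_req_seq CACHE 10000 NOORDER;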
HASH PARTITIONING
The benchmark experienced some contention on index blocks experiencing high insertion
rates. This contention shows up in statspack as high wait times for the following wait
events and affects response times:
enq: TX - index contention
gc buffer busy
gc current block busy
buffer busy waits
Tuning:
The tables and indexes under contention were identified and hash partitioned. A right-
growing index is a characteristic hot spot for OLTP applications, due mainly to the fact
that monotonically increasing key values concentrate all new inserts in the right-most
leaf blocks of the index.
It should be noted that SQL execution could be affected when partitioning tables and
indexes because index range scans may need to access all index partitions. This could be
addressed by ensuring that the partition key is used in the where clause of the query to
perform partition elimination.
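For illustration, an index hash-partitioned in the way described above might be created as follows (the table, column, and partition count are hypothetical and would need to be sized against the real workload):

    -- Spread inserts into a right-growing index across multiple index segments
    CREATE INDEX siebel.s_srv_req_num_idx
        ON siebel.s_srv_req (sr_num)
        GLOBAL PARTITION BY HASH (sr_num)
        PARTITIONS 16;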
_gc_defer_time:
The parameter _gc_defer_time represents the time a block is deferred (if a cleanout is
pending for the block) before being shipped to the requesting node. The default is 30ms.
The parameter generally increases the local affinity for a particular block.
In the current benchmark, waits were observed on the global cache null to X event, along
with a high ratio of current block defers (greater than 0.3).
This event is waited on when one instance wants to modify a current block and does not
find it in its local cache in the appropriate access mode. After sending the request to the
other node the session waits for the current block transfer from the remote cache. The
latency of this operation is strongly influenced by the time it takes the serving instance to
release the block and one of the main delaying factors is the _gc_defer_time.
Tuning:
Turn off _gc_defer_time by setting it to 0. The current block is not deferred and is
shipped immediately to the requesting node. Note that this tuning is specific to the
current workload and might not work for all workloads.
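As a sketch of how such a change is typically made (underscore-prefixed parameters are hidden and should normally only be modified under the guidance of Oracle Support):

    -- Ship current blocks immediately rather than deferring them
    ALTER SYSTEM SET "_gc_defer_time" = 0 SCOPE=SPFILE SID='*';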
Hyperthreading
Oracle runs without modification on any O/S that recognizes a hyper-threading enabled
system, and it will take full advantage of the logical CPUs that the O/S exposes.
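One simple way to confirm how many logical CPUs the database sees is to query the cpu_count parameter (a standard dynamic view; the value reported depends entirely on the host):

    -- Number of CPUs, including hyper-threaded logical CPUs, seen by Oracle
    SELECT name, value FROM v$parameter WHERE name = 'cpu_count';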
Tuning:
During the benchmark, enabling hyper-threading helped eliminate some high run queues.
HW ENQUEUE CONTENTION
Statspack showed significant contention on the HW enqueue (enq: HW - contention). In Oracle,
the high-water mark (HWM) of a segment is a pointer to a data block up to which free
blocks are formatted and are available to insert new data. If data is inserted at a high rate,
new blocks may have to be made available after a search of the freelists is unable to
return any space. This involves formatting the blocks, inserting them into a segment
header or bitmap block and raising the HWM.
Tuning:
The fast-growing segments were identified. Uniform, large extent sizes were used for the
locally managed, automatic segment space managed tablespaces holding segments subject to
high-volume inserts. This alleviated some of the HW enqueue contention.
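For illustration, a tablespace with these properties might be created as follows (the name, datafile location, and sizes are hypothetical and would need to match actual segment growth rates):

    -- Locally managed tablespace with uniform, large extents and automatic
    -- segment space management (ASSM) for high-volume insert segments
    CREATE TABLESPACE siebel_data_big
        DATAFILE '/oradata/siebel/siebel_data_big01.dbf' SIZE 10G
        EXTENT MANAGEMENT LOCAL UNIFORM SIZE 100M
        SEGMENT SPACE MANAGEMENT AUTO;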
Acknowledgements
The authors would like to acknowledge the following individuals for their contributions
to this project: