WebSphere Business Process Management 7.0.0.1
Performance Report
IBM Corporation
WebSphere Business Process Management Performance Team
March 2010
This publication is unclassified, but it is not intended for general or broad public circulation.
The purpose is to provide detailed performance data, best practices, and tuning information for
the products covered. The target audience is software services and technical support specialists.
The expected usage is to provide guidance in making rational configuration choices for proofs of
concept and for product deployments.
Though the content can be shared with customers, preferably in a one-on-one discussion, the
information is not intended as general sales material.
1 INTRODUCTION .......................................................... 1
1.1 OVERVIEW ............................................................ 1
1.2 ADDITIONS IN THIS REPORT ............................................ 3
1.3 SUMMARY OF KEY MEASUREMENTS ......................................... 4
1.4 DOCUMENT STRUCTURE AND USAGE GUIDELINES ............................. 6
1.4.1 Document Structure ................................................ 6
1.4.2 Measurement Usage Guidelines ...................................... 7
3.7.1 Use Asynchrony judiciously ........................................ 28
3.7.2 Set the Preferred Interaction Style to Sync whenever possible ..... 28
3.7.3 Avoid Asynchronous Invocation of Synchronous Services in a FanOut / FanIn Block ... 29
3.8 MEDIATION FLOW CONSIDERATIONS ....................................... 30
3.8.1 Use mediations that benefit from WESB optimizations ............... 30
3.8.2 Usage of XSLTs vs. BO Maps ........................................ 32
3.8.3 Configure WESB Resources .......................................... 32
3.9 LARGE OBJECT BEST PRACTICES ......................................... 33
3.9.1 Avoid lazy cleanup of resources ................................... 33
3.9.2 Avoid tracing when processing large BOs ........................... 33
3.9.3 Avoid buffer-doubling code ........................................ 33
3.9.4 Make use of deferred-parsing friendly mediations for XML docs ..... 33
3.10 WICS MIGRATION CONSIDERATIONS ...................................... 34
3.11 WID CONSIDERATIONS ................................................. 35
3.11.1 Leverage Hardware Advantages ..................................... 35
3.11.2 Make use of WAS shared libraries in order to reduce memory consumption ... 35
3.12 FABRIC CONSIDERATIONS .............................................. 35
3.12.1 Only specify pertinent context properties in context specifications ... 35
3.12.2 Bound the range of values for context keys ....................... 35
11.3 SOABENCH 2008 MEDIATION FACET ...................................... 221
11.3.1 Transformation Mediations ........................................ 221
11.3.2 Routing Mediations ............................................... 221
11.3.3 Composite mediation .............................................. 222
11.3.4 Chained mediation ................................................ 224
11.4 SOABENCH 2008 MEDIATION FACET MESSAGE SIZES ........................ 225
1 Introduction
1.1 Overview
This document is the fifth in a series of detailed performance reports for the WebSphere Business
Process Management (WebSphere BPM) product line. The report is authored by the IBM
WebSphere BPM performance team, with members in Austin, Texas; Böblingen, Germany; and
Hursley, England. It explores the performance characteristics of the following products:
- WebSphere Process Server (WPS)
- WebSphere Enterprise Service Bus (WESB)
- WebSphere Integration Developer (WID)
- WebSphere Business Monitor
- WebSphere Business Modeler
- WebSphere Business Services Fabric
These products represent an integrated development and runtime environment based on a key set
of Service-Oriented Architecture (SOA) and Business Process Management (BPM) technologies:
Service Component Architecture (SCA), Service Data Object (SDO), and Business Process
Execution Language for Web Services (BPEL). These technologies in turn build on the core
capabilities of the WebSphere Application Server (WAS) 7.0 product.
A short description of each product covered in this report follows:
WebSphere Business Monitor provides the ability to monitor business processes in real time,
providing a visual display of business process status, business performance metrics,
and key business performance indicators, together with alerts and notifications to key
users that enable continuous improvement of business processes.
WebSphere Business Modeler is IBM's premier business process modeling and analysis
tool for business users. It offers process modeling, simulation, and analysis capabilities
to help business users understand, document, and deploy business processes for
continuous improvement.
In addition to performance results, this document discusses the performance implications of the
supporting runtime environment, and describes best practices and tuning and configuration
parameters for the different software technologies involved.
We expect this report to be read by a wide variety of groups, both within IBM (development,
services, technical sales, etc.) and by customers. Please note that this document should not be
considered a comprehensive sizing or capacity planning guide, though it serves as a useful
reference for those activities.
The systems used to obtain measurements are intended to be representative mixes of potential
development and deployment systems running Windows, AIX, or Linux (note that there is a
separate performance report for WebSphere BPM products on z/OS). While we report results in
many cases on more than one hardware platform, this report is not intended to evaluate relative
hardware performance between platforms. Many configurations are run with some of the
processor cores disabled, hyperthreading disabled, or both. While these changes are noted on the
charts, the reader should take them into account before attempting any comparisons.
Finally, the workloads used to obtain measurements in this report are internal workloads (i.e., not
publicly available) that are designed to mimic customer usage patterns. Please see the workload
descriptions in this document for further information.
For those who are either considering or are in the very early stages of implementing a solution
incorporating these products, this document should prove a useful reference, both in terms of
best practices during application development and deployment, and as a reference for setup,
tuning and configuration information. It provides a useful introduction to many of the issues
influencing each product's performance, and can serve as a guide for making rational first choices
in terms of configuration and performance settings.
Similarly, those who have already implemented a solution utilizing these products might
use the information presented here to match, to the extent possible, their own workload
characteristics to those described in this report. By relating these characteristics to their
own workloads, the user is much more likely to gain insight into what performance they might
expect, what inhibitors to better performance may be present, and how their overall
integrated solution performance may be improved.
All of these products build on the capabilities of the WAS infrastructure, which runs on Java
Virtual Machines (JVMs), so BPM solutions also benefit from the tuning, configuration, and best
practices information for WAS and the corresponding platform JVMs (documented in the References
appendix). The reader is encouraged to use this report in conjunction with these references.
Please address questions or comments about this document to Mike Collins at
mcollin@us.ibm.com or Mike Collins/Austin/IBM.
1.2 Additions in This Report
The following directed studies are either added or enhanced relative to the 6.2.0 report:
- Partitioning large systems: the effect of utilizing a single instance vs. clustering,
  and the performance of a single cluster deployment pattern
- WPS performance for a 32-bit JVM on 32-bit and 64-bit Windows systems
- SMP scaling data that demonstrates outstanding vertical scaling on AIX systems, as shown
  by SOABench 2008 Automated Approval Mode 8-core scaling of 7.3x and 16-core scaling of
  11.9x, delivering throughput of over 2,000 transactions per second.
- Measurements on Red Hat Enterprise Linux 5.2 that show a throughput rate of 665
  transactions per second using SOABench 2008 Automated Approval Mode on an 8-core Intel
  system, an SMP scaling factor of 6.2x.
- Support for 10,000 concurrent users with sub-second response times for long-running
  processes, including Query Task, Claim Task, and Complete Task operations.
- Business Space response time improved by up to 55% relative to the 6.2.0.2-based Feature
  Pack, assessed using Human Workflow widgets.
- 2.7x faster deployment of the BPM@Work model from WebSphere Business Modeler to WPS
  7.0.0.1.
- Clean & Build response time for the Customer Service workspace shows a 45% improvement
  from version 6.2.0.
- Peak memory utilization while building the Customer Service workspace shows a 32%
  improvement compared with WPS 6.2.0.
- Response time to publish the Loan Processing workspace with Resources on Server shows a
  1.9x improvement compared with version 6.2.0.
- JAX-WS binding is now faster than JAX-RPC binding for Web Services.
- Development Best Practices: guidelines for solution developers that will lead to
  high-performing systems.
- WPS Performance Results: measurements for the SOABench 2008 Choreography Facet workload.
- WESB Performance Results: measurements for the SOABench 2008 Mediation Facet workload.
- WPS Core Workloads: a detailed description of the workloads used to measure the
  performance characteristics of WPS.
- WESB Core Workloads: a detailed description of the workloads used to measure the
  performance characteristics of WESB.
Data is presented for multiple hardware platforms, including POWER6, POWER7, Intel
Pentium IV Xeon, and Intel multi-core technologies. This is done to provide
representative coverage for WebSphere BPM production topologies. However, this data
should not be used to compare the relative performance of different hardware platforms.
The intent of this document is to show how the BPM stack performs on representative
configurations, not to compare hardware environments.
- Use a high-performance disk subsystem. In virtually any realistic topology, a server-class
  disk subsystem (e.g., a RAID adapter with multiple physical disks) is required on the
  tier(s) that host the message and data stores to achieve acceptable performance. This point
  cannot be overstated; the authors have seen many cases where the overall performance of a
  solution was improved by several factors simply by utilizing an appropriate disk subsystem.
- Set an appropriate Java heap size to deliver optimal throughput and response time. JVM
  verbosegc output will greatly help in determining the optimal settings. Further information
  is available in Section 4.4.2.
- Use DB2 instead of the default Derby DBMS. DB2 is a high-performing, industrial-strength
  database designed to handle high levels of throughput and concurrency, scale well, and
  deliver excellent response time.
- Tune your database for optimal performance. Proper tuning and deployment choices for
  databases can greatly increase overall system throughput. For details, see Section 4.5.10.
- Disable tracing. Tracing is clearly important when debugging, but the overhead of tracing
  severely impacts performance. More information is available in Section 4.5.1.
- For task and process list queries, use composite query tables. Query tables are designed
  to produce excellent response times for high-volume task and process list queries. For
  details, see Section 2.3.2.
- Use work-manager-based navigation to improve throughput for long-running processes. This
  optimization reduces the number of objects allocated, the number of objects retrieved from
  the database, and the number of messages sent for Business Process Choreographer
  messaging. For further information, see Section 4.5.6.1.
- Avoid overly granular transaction boundaries in SCA and BPEL. Every transaction commit
  results in expensive database and/or messaging operations. Design your transactions with
  care, as described in Section 3.6.
2.3 Modeling
2.3.1 Choose non-interruptible over interruptible (long-running) processes whenever possible

Use interruptible processes, a.k.a. macroflows or long-running processes, only when required
(e.g., for long-running service invocations and human tasks). Non-interruptible processes,
a.k.a. microflows or short-running processes, exhibit much better performance at runtime. A
non-interruptible process instance is executed in one J2EE transaction with no persistence of
state, while an interruptible process instance is typically executed in several J2EE
transactions, requiring that state be persisted in a database at transaction boundaries.

Whenever possible, utilize synchronous interactions for non-interruptible processes. A
non-interruptible process is much more efficient than an interruptible process since it does
not have to persist state in the backing database system.

A process is interruptible if the "Process is long-running" checkbox is set in the WebSphere
Integration Developer (WID) via Properties > Details for the process.

If interruptible processes are required for some capabilities, separate the processes such
that the most frequent scenarios can be executed in non-interruptible processes and
exceptional cases are handled in interruptible processes.
2.3.2 Choose query tables over standard query API for task list and process list queries

Query tables were introduced in WPS 6.2.0. They are designed to provide good response times
for high-volume task list and process list queries. Query tables offer improved query
performance:
- Improved access to work items reduces the complexity of the database query.
- Configurable high-performance filters on tasks, process instances, and work items allow
  for efficient filtering.
- Composite query tables can be configured to bypass authorization through work items.
- Composite query tables allow the definition of query tables that reflect the information
  displayed on task lists and process lists presented to users.
Query improvements due to Query Tables are shown in Section 9.6.1. For further information,
please see the references below:
WebSphere Process Server Query Table Builder
http://www.ibm.com/support/docview.wss?uid=swg24021440
Query Tables in Business Process Choreography in the WPS 7.0 Info Center:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/c6bpel_querytables.html
A business process and its individual steps should have business significance and not try to
mimic programming-level granularity. Use programming techniques like POJOs (Plain Old Java
Objects) or Java snippets for logic without business significance. This topic is discussed
further in the "Software components: coarse-grained versus fine-grained" paper, available here:
http://www.ibm.com/developerworks/library/ws-soa-granularity/index.html
2.4 Topology
2.4.1 Deploy appropriate hardware
It is very important to pick a hardware configuration that contains the resources necessary to
achieve high performance in a WebSphere BPM environment. Here are some key considerations
in picking a hardware configuration:
- Cores: Ensure that WPS and WESB are installed on a modern server system with multiple
  cores. WPS and WESB scale well, both vertically in terms of SMP scaling and horizontally
  in terms of clustering.
- Memory: WPS and WESB benefit from both a robust memory subsystem and an ample amount of
  physical memory. Ensure that the chosen system has server-class memory controllers and as
  large as possible L2 and L3 caches (optimally, use a system with at least a 4 MB L3
  cache). Make sure there is enough physical memory for all the applications (JVMs) expected
  to run concurrently on the system; 2 GB per WPS/WESB JVM is a rough rule of thumb.
- Disk: Ensure that the systems hosting the message and data stores, typically the database
  tiers, have fast storage. This means utilizing RAID adapters with writeback caches and
  disk arrays with many physical drives.
- Network: Ensure that the network is sufficiently fast to not be a system bottleneck. As an
  example, a dedicated Gigabit Ethernet network is a good choice.
- Virtualization: Take care when using virtualization such as AIX dynamic logical
  partitioning or VMware virtual machines. Ensure sufficient processor, memory, and I/O
  resources are allocated to each virtual machine or LPAR. Avoid over-committing resources.
We highly recommend to our readers the IBM Redbooks publication on WebSphere BPM 7.0
production topologies (http://www.redbooks.ibm.com/redpieces/abstracts/sg247854.html), a
comprehensive guide to selecting appropriate topologies for both scalability and high
availability. It is not the intent of this section to repeat content from that publication.
Rather, we distill some of the key considerations when trying to scale up a topology for
maximum performance.
2.4.4.1 Use the remote messaging and remote support deployment environment pattern for maximum flexibility in scaling
See link:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.wps.doc/doc/cpln_topologypat.html
This topology (formerly known as the "Golden Topology") prescribes the use of separate
clusters for applications, messaging engines, and support applications like the CEI (Common
Event Infrastructure) server and the Business Rules Manager. This allows independent control
of resources to support the load on each of these elements of the infrastructure.
Note: As with many system choices, flexibility comes with some cost. For example, synchronous
CBE (Common Base Event) emission between an application and the CEI server in this topology
is a remote call, which is heavier than a local call. The benefit is the independent ability to scale
the application and support cluster. We assume the reader is familiar with these kinds of system
tradeoffs, as they occur in most server middleware.
2.4.4.2 Single Server vs. Clustered Topology Considerations
In general, there are two primary reasons to consider when evaluating a move from a single
server configuration to a clustered topology: scalability / load balancing to improve overall
performance and throughput, and high availability / failover to prevent loss of service due
to hardware or software failures. Although these are not mutually exclusive, there are
considerations applicable to each. In this report, the focus is on the performance
(throughput) aspects of clustering, not on the high availability aspects.
When considering the tradeoffs between a single server and a clustered configuration, an
interesting study can be found in section 9.10 of this document, "Single Server vs. Clustered
WPS". Significant gains in throughput are measured with the workloads in this study due to
utilizing a clustered topology. It can be expected that most single server workloads that are
driving resources to saturation would benefit to some degree from moving to a clustered
topology.
- the number of requests that the target application can process at the same time (concurrency)
If each of these performance aspects of the target applications can be established, then a
rough estimate of the maximum throughput capacity can be calculated. Similarly, if average
throughput is known, then either of these two aspects can be roughly calculated as well. For
example, a target application that can process 10 requests per second with an average
response time of 1 second can process approximately 10 requests at the same time
(throughput × response time = concurrency).
The throughput capacity of target applications is critical to projecting the end-to-end
throughput of an entire application. Also, the concurrency of target applications should be
considered when tuning the concurrency levels of the upstream WPS-based components. For
example, if a target application can process 10 requests at the same time, the WPS components
that invoke this application should be tuned so that the number of simultaneous requests from
WPS at least matches the concurrency capabilities of the target. Additionally, overloading
target applications should be avoided, since such configurations will not result in any
increase in overall application throughput. For example, if 100 requests are sent to a target
application that can only process 10 requests at the same time, no throughput improvement
will be realized versus tuning such that the number of requests made matches the concurrency
capabilities of the target.
Finally, for service providers that may take a long time to reply, either as part of mainline
processing or in exception cases, do not utilize synchronous invocations that require a
response. This avoids tying up the WPS business process, and its resources, until the service
provider replies.
1. Java Heap Size Limits
The 32-bit address space imposes a practical heap size limit of around 1.4 GB for 32-bit
JVMs. The heap size limit is much higher on 64-bit JVMs, and is typically less of a gating
factor on modern hardware configurations than the amount of available physical memory.
2. Size of In-Memory Business Objects
Business Objects (BOs), when represented as Java objects, are much larger than in wire
format. For example, a BO that consumes 10 MB on an input JMS message queue may result in
allocations of up to 90 MB on the Java heap. The reason is that there are many allocations of
large and small Java objects as the BO flows through the adapters and WPS or WESB. A number
of factors affect the in-memory expansion of BOs:
- The BO may contain many small elements and attributes, each requiring a few unique Java
  objects to represent its name, value, and other properties.
- Every Java object, even the smallest, has a fixed overhead due to an internal object
  header that is 12 bytes long on most 32-bit JVMs, and larger on 64-bit JVMs.
- Java objects are padded in order to align on 8-byte or 16-byte address boundaries.
- As the BO flows through the system, it may be modified or copied, and multiple copies may
  exist at any given time during the end-to-end transaction. This means the Java heap must
  be large enough to host all of these BO copies for the transaction to complete
  successfully.
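To illustrate the per-object costs with a rough, hypothetical calculation: a single element
holding a 10-character string value can easily consume on the order of 100 bytes or more on
the heap (an object header for the element itself, separate String objects and underlying
character arrays for its name and value, plus alignment padding), compared with a few dozen
bytes in wire format.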
Note that certain adapters, like the Flat Files JCA Adapter, can be configured to use a
SplitBySize mode with a SplitCriteria set to the size of each individual object. In this case
a large object is split into chunks of the size specified by SplitCriteria, reducing peak
memory usage.
2.5.2.2 Claim Check pattern: when only a small portion of an input message is used by the workload
When the input BO is too large to be carried around in a system, and only a few attributes
are actually needed by the process or mediation, one can exploit a pattern called the claim
check pattern. Applied to a BO, the claim check pattern has the following steps:
- Persist the large data payload to a datastore, and store the claim check as a reference in
  the control BO.
- Process the smaller control BO, which has a smaller memory footprint.
- At the point where the solution needs the whole large payload again, check out the large
  payload from the datastore using the key.
- Merge the attributes in the control BO with the large payload, taking the changed
  attributes in the control BO into account.
The claim check pattern requires custom code and snippets in the solution. A less
developer-intensive variant would be to make use of custom data bindings to generate the
control BO. This approach suffers from the disadvantage of being limited to certain
export/import bindings, and the full payload still must be allocated in the JVM.
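As an illustration, here is a minimal sketch of the pattern in Java. The PayloadStore
interface, the ControlBO class, and the field names are hypothetical stand-ins for whatever
datastore and business object definitions a real solution would use:

// Hypothetical datastore abstraction; in practice this could be a database
// table or a cache, keyed by the claim check.
interface PayloadStore {
    String checkIn(byte[] largePayload);  // persist payload, return claim check key
    byte[] checkOut(String claimCheck);   // retrieve payload by key
}

// Small control BO that flows through the process instead of the payload.
class ControlBO {
    String claimCheck;   // reference to the persisted payload
    String customerId;   // example of an attribute the flow actually needs
}

class ClaimCheckExample {
    private final PayloadStore store;

    ClaimCheckExample(PayloadStore store) {
        this.store = store;
    }

    // At the input edge: persist the large payload and build the control BO.
    ControlBO checkIn(byte[] largePayload, String customerId) {
        ControlBO control = new ControlBO();
        control.claimCheck = store.checkIn(largePayload);
        control.customerId = customerId;
        return control;
    }

    // Where the full payload is needed again: check it out and merge any
    // attributes that were changed on the control BO during processing.
    byte[] checkOut(ControlBO control) {
        byte[] payload = store.checkOut(control.claimCheck);
        // ... merge control.customerId and other changed attributes here ...
        return payload;
    }
}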
64-bit mode is an excellent choice for applications whose live set approaches or exceeds the
32-bit limits. Such applications either experience OutOfMemoryExceptions or spend excessive
time in GC (we consider anything over 10% of time in GC excessive). These applications
exhibit much better performance when allowed to run with the larger heaps they need. However,
there must always be sufficient physical memory on the system to back the Java heap size.
64-bit mode is also a good choice for applications that, though well behaved on 32-bit,
could be algorithmically modified to perform much better with larger heaps. An example
would be an application that frequently persists data to a data store to avoid maintaining a
very large in-memory cache, even if such a cache would greatly improve throughput.
Recoding such an application to trade the additional space available in 64-bit heaps for
reduced execution time would yield much better performance.

Moving to 64-bit can still cause some degradation in throughput. If a 32-bit application fits
well within a 1.5-2.5 GB heap, and is not expected to grow significantly, 32-bit BPM servers
can still be a better choice than 64-bit.
2.7.2 Dashboard
The platform requirements of the Business Space, Dashboard, and Alphablox stack are
relatively modest compared to those of the Monitor server and the database server. The most
important consideration for good Dashboard performance is to size and configure the database
server correctly. Be sure it has enough CPU capacity for anticipated data mining queries,
enough RAM for bufferpools, and plenty of disk arms.
- Use the Audit Logging property for business processes only if you need to log events in
  the BPE database. This property can be set at the activity or process level; if set at the
  process level, the setting is inherited by all activities.
- For long-running processes, disable the "Enable persistence and queries of
  business-relevant data" flag under the Properties > Server tab, both for the process and
  for each individual BPEL activity. Enabling this flag causes details of the execution of
  the activity to be stored in the BPC database, which increases the load on the database
  and the amount of data stored for each process instance. Use this setting only if this
  specific information will need to be retrieved later.
- Human tasks can be specified in business processes (e.g., process administrators), invoke
  activities, and receive activities. Specify these tasks only if needed. Also, when
  multiple users are involved, use group work items (people assignment criterion: Group)
  instead of individual work items for group members (people assignment criterion: Group
  Members).
- If the caller was started by a persistent message, then upon server restart the caller's
  transaction is rolled back and retried. However, the result of the execution of the
  long-running process on the server is not rolled back, since it was committed before the
  server failure. As a result, the long-running process on the server is executed twice.
  This duplication will cause functional problems in the application unless corrected
  manually.
- If the caller was not started by a persistent message, and the response of the
  long-running process was not yet submitted, it will end in the failed event queue.
- Use as few variables as possible, and minimize the size and number of Business Objects
  (BOs) used. In long-running processes, each commit saves modified variables to the
  database (to save context), and multiple variables or large BOs make this very costly.
  Smaller BOs are also more efficient to process when emitting monitor events.
- Use transformations (maps or assigns) to produce smaller BOs by mapping only the fields
  necessary for the business logic.
- Use group work items for large groups (people assignment criterion: Group) instead of
  individual work items for group members (people assignment criterion: Group Members).
- Where possible, use native properties on the task object rather than custom properties.
  For example, use the priority field instead of creating a new custom property "priority".
- Set the transactional behavior to "Commit after" if the task is not part of a page-flow.
  This improves the response time of task complete API calls.
- APIs that provide task details and process details, such as htm.getTask(), should not be
  called frequently. Use these methods only when required, for instance to display the
  details of a single task.
- In EJB applications, make sure that transactions are not too time-consuming: long-running
  transactions create long-lasting locks in the database, which prevent other applications
  and clients from continuing processing.
- In a J2EE environment, use the HTM and BFM EJB APIs. If the client application is running
  on a WPS server, use the local EJB interface.
- In an application that runs remote to the process container, the Web services API is an
  option.
- Applications that assign the next available task to the user can use the
  claim(String queryTableName, ...) method on the Human Task Manager EJB interface. It
  implements a performance-optimized mechanism to handle claim collisions.
- Don't put asynchronous invocations between two steps of a page-flow, because the response
  time of asynchronous services increases as the load on the system increases.
- Where possible, do not invoke long-running sub-processes between two steps of a page-flow,
  because long-running sub-processes are invoked using asynchronous messaging.
Clients that present task lists and process lists to the user should consider the following:
- Use query tables for task list and process list queries. See the directed study in section
  9.6.1 for further information.
- Do not loop over the tasks displayed in the task or process list and execute an additional
  remote call for each object. This prevents the application from providing good response
  times and good scalability.
- Design the application such that during task list and process list retrieval, all
  information is retrieved from a single query table. For instance, do not make additional
  calls to retrieve the input message during task list or process list creation.
In user-driven scenarios, improving response time may require more granular transaction
boundaries, even at the cost of throughput.
Transactions can span across synchronous invocations, but cannot span asynchronous
invocations.
The transactional behavior settings on an activity instruct the process flow container to
start a new transaction before executing the activity, after executing the activity, or both
before and after.
In general, the "Participates" attribute provides the best throughput and should be used
wherever possible. This is true for both synchronous and asynchronous activities. In the
two-way asynchronous case, it is important to understand that the calling transaction always
commits after sending the request. The "Participates" setting refers to the transaction
started by the process engine for the response: when set, it allows the next activity to
continue on the same transaction.

In special cases, the other transaction settings may be preferable. Please refer to the
InfoCenter link above for details.
Use "Commit before" in parallel activities that start new branches to ensure parallelism. As
noted in the InfoCenter, there are other constraints to be considered.

Use "Commit after" for inline human tasks to increase responsiveness to human users. When
this option is chosen, after a human task is completed, the thread/transaction handling the
task completion is also used to resume navigation of the human task activity in the process
flow. The user's task completion action will not complete until the process engine commits
the transaction. By contrast, if the "Participates" setting is used, the commit is delayed,
resulting in a longer response time for the user. This is a classic response time versus
throughput tradeoff.

Note that starting with the 6.2.0 release, Receive and Pick activities in a BPEL flow are
allowed to define their own transactional behavior property values. If not set, the default
value for an initiating Receive or Pick activity is "Commit after". Consider using
"Participates" where possible, since it performs better.
The invocation logic of processes is explained in more detail in the WPS InfoCenter at:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/cprocess_transaction.html
Some additional considerations are listed below:
- At the input boundary to a module, exports that represent asynchronous transports, like
  MQ, JMS, or JCA (with async delivery set), will set the interaction style to Async. This
  can cause downstream invocations to be async if the Preferred interaction style is left at
  "Any".
- For an SCA import, its Preferred interaction style can be used to specify whether the
  cross-module call should be Sync or Async.
- For other imports that represent asynchronous transports, like MQ or JMS, it is not
  necessary to set the Preferred interaction style to Async. Doing so introduces an
  unnecessary async hop between the calling module and the invocation of the transport.
There are additional operational considerations; for example, asynchronous invocations use
the SIBus messaging infrastructure, which uses a database for persistence. Synchronous
invocations perform well with basic tuning of the JVM heap size and thread pools, but for
asynchronous invocations the SCA artifacts require review and tuning. This includes tuning of
the SCA messaging engine (see section 4.4.7), the datasources (section 4.4.6), and the
database itself. For the datasource, the tunings for JMS bindings in this report can be used
as guidance, as the considerations are the same.
If multiple synchronous services with large latencies are being called, then asynchronous
invocations can reduce the overall response time of the mediation flow, at the expense of
increasing the internal response time of each individual service call. This assumes that
asynchronous callouts have been configured, along with parallel waiting in the FanOut section
of the flow: in the case of iterating over an array, configure the FanOut to "check for
asynchronous responses after all/N messages have been fired".
If there are a number of services in a fan-out section of a mediation flow, then calling
these synchronously results in an overall response time equal to the sum of the individual
service response times.

Calling the services asynchronously (with parallel waiting configured) results in a response
time equal to at least the largest individual service response time in WESB, plus the sum of
the time taken by WESB to process the remaining service callout responses residing on the
messaging engine queue.

For a FanOut/FanIn block, the processing time for any primitives before or after the service
invocations must be added in both cases.
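As a rough, hypothetical illustration: for three parallel callouts with latencies of 3, 2,
and 1 seconds, synchronous invocation would take roughly 3 + 2 + 1 = 6 seconds, while
asynchronous invocation with parallel waiting would take a little over 3 seconds (the largest
individual latency plus the time to process the queued responses).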
To optimize the overall response time when calling services asynchronously in a FanOut/FanIn
section of a mediation flow, invoke the services in order of expected latency, if known
(highest latency first).

There is a tradeoff between parallelism and additional asynchronous processing to consider.
The suitability of asynchronous processing will depend on the size of the messages being
processed, the latency of the target services, the number of services being invoked, and any
response time requirements expressed in service level agreements. Running performance
evaluations on mediation flows that include fan-outs with high-latency services is strongly
recommended if asynchronous invocations are being considered.
The default quality of service on service references is Assured Persistent. A substantial
reduction in asynchronous processing time can be gained by changing this to Best Effort
(non-persistent), which eliminates I/O to the persistence store; however, the application
MUST tolerate the possibility of lost request or response messages, since this SIBus
reliability level can discard messages under load and may require tuning.
The optimization is known as deferred parsing; as the name implies, parsing the message can be
deferred until absolutely required, and in several cases (described below) parsing can be avoided
altogether.
There are three categories of mediation primitives in WESB that benefit to a greater or lesser
degree from these internal optimizations:
Category 1 (greatest benefit):
- Custom Mediation
- Database Lookup
- BO Mapper
- Fan Out
- Fan In
- Message Logger
There is therefore an ideal pattern of usage in which these mediation primitives can take
advantage of a 'fastpath' through the code. Fully fastpathed flows can contain any of the
category 1 mediation primitives above, e.g.:

--> XSLT Primitive(/body) --> Route On Header --> EndPointLookup (non-XPath) -->

Partially fastpathed flows can contain a route-on-body filter primitive (category 2) and any
number of category 1 primitives, e.g.:

--> XSLT Primitive(/body) --> Route on body -->
In addition to the above optimizations, the ordering of primitives can be important. If the
mediation flow contains an XSLT primitive (with a root of /body, i.e., the category 1
variant) and category 3 primitives, then the XSLT primitive should be placed ahead of the
other primitives. So

--> Route On Header --> XSLT Primitive(/body) --> Custom Primitive -->

is preferable to

--> Route On Header --> Custom Primitive --> XSLT Primitive(/body) -->

It should be understood that there are costs associated with any primitive, regardless of
whether the flow is optimally configured. If an Event Emitter primitive is using event
distribution, or a Message Logger primitive is included, there are associated infrastructure
overheads for such remote communications. Large messages increase processing requirements
proportionally for primitives (especially those accessing the body), and a custom mediation
may contain code that is not optimally written. The above guidelines can help in designing
for performance, but they cannot guarantee speed.
- no further need for creating resources using scripts or the Admin Console
- the ability to change the majority of performance tuning options, as they are now exposed
  in the tooling

In our performance tests we use pre-configured resources, because segregating the performance
tuning from the business logic allows the configuration for different scenarios to be
maintained in a single script. It is also easier to adjust these parameters once the
applications have been deployed.
The only cases where this pattern has not been followed are Generic JMS bindings. In these
scenarios, where resources have already been configured by the third-party JMS provider
software (MQ 6.0.2.2 for all instances in this report), the tooling-created resources are
used to locate the externally defined resources.
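For example, guard trace statements so that a large BO is serialized only when tracing is
actually enabled; the call to bo.toString() below is the expensive part:

// Serialize the BO only when tracing is enabled; toString() on a large BO is costly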
if (tracing_on) System.out.println(bo.toString());
- Utilize JCA adapters in place of WBIA adapters, where possible. Migrated workloads making
  use of custom or legacy WBIA adapters interact with the WPS server through JMS, which is
  slower than the JCA adapters.
- Some WBIA technology adapters, like HTTP and Web Services, are migrated by the WICS
  migration wizard into native WPS SCA bindings, which is a better-performing approach. For
  WBIA adapters that are not migrated automatically to available SCA bindings, development
  effort spent manually migrating to an SCA binding will remove the dependency on a legacy
  adapter as well as deliver better performance.
- The WICS Migration Wizard in WID 7.0 offers a feature to merge the connector and
  collaboration modules together. Enable this option, if possible, as it increases
  performance by reducing cross-module SCA calls.
- WICS Collaborations are migrated into WPS BPEL processes. The resultant BPEL processes can
  be further customized and made more efficient as follows:
  - The generated BPEL flows still make use of the ICS API to perform BO and Collaboration
    level tasks. Development effort spent cleaning up the migrated BPEL to replace these
    APIs will result in better performance and better maintainability.
- Reduce memory pressure by splitting the shared library generated by the migration wizard.
  The migration wizard creates a single shared library and puts all migrated Business
  Objects, maps, and relationships in it. This library is then shared, by copy, by all the
  migrated modules. This can cause memory bloat in cases where the shared library is very
  large and a large number of modules are present. The solution is to manually refactor the
  shared library into multiple libraries, based on functionality or usage, and modify the
  modules to reference only the shared libraries they need.
- If the original WICS maps contain many custom map steps, then development effort spent
  rewriting such map steps will result in better performance. The WICS Migration Wizard in
  WID 7.0 generates maps that make use of ICS APIs, which is a translation layer above WPS
  technologies. Removing this layer by making direct use of WPS APIs avoids the cost of
  translation and hence produces better performance.
3.11.1 Leverage Hardware Advantages
Importing and building an enterprise application is, in itself, a resource-intensive
activity. Recent improvements in desktop hardware architecture have greatly improved the
responsiveness of Import and Build activities, as demonstrated in Section 9.20.4. In
particular, Intel Core2 Duo cores perform much better than the older PentiumD architecture,
even when the Core2 Duo runs at a slower clock rate. Also, for I/O-intensive activities (like
Import), a faster disk drive reduces total response time, as demonstrated in Section 9.19.2.
3.11.2 Make use of WAS shared libraries in order to reduce memory consumption
For applications containing many projects utilizing a WPS shared library, server memory
consumption is reduced by defining the library as a WAS shared library as described in the
technote found at
http://www-01.ibm.com/support/docview.wss?uid=swg21298478.
Section 9.20.3 demonstrates some results obtained using this approach.
3.12.2 Bound the range of values for context keys
The possible values of a context key should be bound to either a finite set or a minimum and
maximum value. The Fabric runtime caches metadata based on the contexts defined as required
or optional in the context specification. Thus, a context key that can take an unbounded
integer as its value will result in too many potential cache entries, making the cache less
efficient. Consider using classes of possible values rather than absolute numbers. For
example, for credit scores, group the possible values under Poor, Average, Good, and
Excellent rather than using the actual values. The actual values should then be placed in one
of these categories, and the category should be passed as the context instead of the actual
value.
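As a simple illustration of such bucketing, the following Java sketch maps a raw score to one
of four categories; the boundary values here are hypothetical and would be chosen to suit the
business domain:

class CreditScoreContext {
    // Map a raw credit score to a bounded category so that the Fabric context
    // key takes one of four values rather than an unbounded integer.
    // The boundary values below are hypothetical.
    static String creditScoreCategory(int score) {
        if (score < 580) return "Poor";
        if (score < 670) return "Average";
        if (score < 740) return "Good";
        return "Excellent";
    }
}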
The methodology for tuning can be stated very simply as an iterative loop:
- Monitor the system to obtain metrics that indicate whether performance is being limited.
- Use the tuning checklist in the next section as a systematic way to set parameters.
- For specific initial values, consult Appendix A for the settings used for the various
  workloads that were run; these can be considered as starting points.
For each physical machine in the topology, including front-end and back-end servers like web
servers and DB servers:
- For each JVM process started on a physical machine (i.e., WPS server, ME server, etc.),
  use tools like ps or equivalent to get core and memory usage per process.
- For each WPS or ME JVM, use TPV (Tivoli Performance Viewer) to monitor the following:
  - For each thread pool (Web Container, default, work managers), the thread pool
    utilization.

Excessive utilization of physical resources like processor cores, disk, memory, etc. can be
resolved either by adding more physical resources or by rebalancing the load more evenly
across the available resources.
- Move databases from the default Derby to a high-performance DBMS such as DB2.
- Do not use the Unit Test Environment (UTE) for performance measurement.
- Tune external service providers and external interfaces to ensure they are not the system
  bottleneck.
- Configure data sources: connection pool size, prepared statement cache size. Consider
  using non-XA data sources for CEI data when that data is non-critical.
- If work-manager-based navigation is used, also optimize the message pool size and
  inter-transaction cache size.
- Optimize the database configuration for the Business Process Choreographer database
  (BPEDB).
- Optimize indexes for SQL statements that result from task and process list queries, using
  database tools like the DB2 design advisor.
- Turn off state observers that are not needed, e.g., turn off audit logging.
The new area is where newly allocated objects reside; the tenured space is where long-lived
objects reside. The total heap size is the sum of the new area and the tenured space. The new
area size can be set independently from the total heap size. Typically, the new area size
should be set between 1/4 and 1/2 of the total heap size. The relevant parameters are
-Xmn<size> (fixed new area size), or -Xmns<size> and -Xmnx<size> (initial and maximum new
area sizes).
MDB ActivationSpec parameters can be accessed in the Admin Console via either of the
following paths:
- Resources > Resource Adapters > J2C activation specifications > ActivationSpec name
- Resources > Resource Adapters > Resource adapters > resource adapter name > Additional
  properties > J2C activation specifications > ActivationSpec name

Two custom properties of the MDB ActivationSpec have considerable performance implications:
maxConcurrency and maxBatchSize. These are discussed further in Section 4.5.3.2.
Thread pool sizes are configured via Servers > Server Types > WebSphere application servers >
server > Thread pools. The following thread pools typically need to be tuned:
- Default
- ORB.thread.pool
- WebContainer

In addition, thread pools used by Work Managers are configured separately via:
Resources > Asynchronous beans > Work managers > work manager name > Thread pool properties
The following Work Managers typically need to be tuned:
- DefaultWorkManager
- BPENavigationWorkManager
There are a few ways of accessing the JMS connection factories and JMS queue connection
factories from the WebSphere Admin Console:
- Resources > Resource Adapters > J2C connection factories > factory name
- Resources > JMS > Queue connection factories > factory name
- Resources > Resource Adapters > Resource adapters > resource adapter name (e.g., SIB JMS
  Resource Adapter) > Additional properties > J2C connection factories > factory name
From the connection factory admin panel, open Additional Properties > Connection pool
properties, and set the Maximum connections property to the maximum size of the connection
pool. Data source connection pools are configured analogously via:
Resources > JDBC Providers > JDBC provider name > Additional Properties > Data sources >
datasource name
- sib.msgstore.discardableDataBufferSize (default is 320 KB)
- sib.msgstore.cachedDataBufferSize (default is 320 KB)

The properties can be accessed under Service Integration > Buses > bus name > Messaging
Engines > messaging engine name > Additional properties > Custom properties.
Full details of these are given in the Info Center at the following location:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc
.doc/concepts/cjk_learning.html
Note that higher concurrent processing means higher resource requirements (memory and number
of threads) on the server. It needs to be balanced with other tuning objectives, such as the
handling of large objects, handling large numbers of users, and providing good response time.
4.5.3.1 Tune edge components for concurrency
The first step is to ensure that Business Objects are handled concurrently at the edge components
of WPS or WESB. If the input BOs come from an adapter, ensure the adapter is tuned for
concurrent delivery of input messages. See Section 4.5.8 for more details on tuning adapters.
If the input BOs come from the WebServices export binding or direct invocation from a JSP or
Servlet, make sure the WebContainer thread pool is correctly sized. To allow for 100 in-flight
requests handled concurrently, the maximum size of the WebContainer thread pool needs to be
set to 100 or larger.
If the input BOs come from messaging, the ActivationSpec (MDB bindings) and Listener ports
(MQ or MQJMS bindings) need to be tuned to handle sufficient concurrency.
4.5.3.2 Tune MDB ActivationSpec properties
For each JMS export component, there is an MDB and its corresponding ActivationSpec (JNDI
name: module name/export component name_AS). The default value for maxConcurrency of the
JMS export MDB is 10, meaning up to 10 BOs from the JMS queue can be delivered to the MDB
threads concurrently. Change it to 100 if a concurrency of 100 is desired.
Note that the Tivoli Performance Viewer (TPV) can be used to monitor the maxConcurrency
parameter. For each message being processed by an MDB, there is a message on the queue marked
as being locked inside a transaction (which is removed once the onMessage completes); these
messages are classed as "unavailable". There is a PMI metric, "UnavailableMessageCount", that
gives the number of unavailable messages on each queue point (resource_name > SIB Service >
SIB Messaging Engines > bus_name > Destinations > Queues). If any queue has at least
maxConcurrency unavailable messages, the number of messages on the queue is currently running
higher than the MDB's concurrency maximum. If this occurs, increase the maxConcurrency
setting for that MDB.
The maximum batch size in the activation spec also has an impact on performance. The default
value is 1. The maximum batch size value determines how many messages are taken from the
messaging layer and delivered to the application layer in a single step (note that this does NOT
mean that this work is done within a single transaction, and therefore this setting does not
influence transactional scope). Increase this value, for example to 8, for activation specs
associated with SCA modules and long-running business processes to improve performance and
scalability, especially for large multi-core systems.
4.5.3.3 Configure thread pool sizes
The sizes of thread pools have a direct impact on a server's ability to run applications
concurrently. For maximum concurrency, the thread pool sizes need to be set to optimal
values. Increasing the maxConcurrency or Maximum sessions parameters only enables the
concurrent delivery of BOs from the JMS or MQ queues. In order for the WPS or WESB server to
process multiple requests concurrently, it is also necessary to increase the corresponding
thread pool sizes to allow higher concurrent execution of the Message Driven Bean (MDB)
threads.
MDB work is dispatched to threads allocated from the Default thread pool. Note that all MDBs
in the application server share this thread pool, unless a different thread pool is
specified. This means that the Default thread pool size needs to be larger, probably
significantly larger, than the maxConcurrency of any individual MDB.
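As a hypothetical sizing illustration: if two JMS export MDBs that share the Default pool are
each tuned to a maxConcurrency of 100, the Default pool maximum would need to be well above
100, approaching 200 if both MDBs are expected to run at peak concurrency simultaneously.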
Threads in the Web Container thread pool are used for handling incoming HTTP and Web Services
requests. Again, this thread pool is shared by all applications deployed on the server. As
discussed earlier, it needs to be tuned, likely to a higher value than the default.

ORB thread pool threads are employed for running ORB requests, e.g., remote EJB calls. The
thread pool size needs to be large enough to handle requests coming in through the EJB
interface, such as certain Human Task Manager APIs.
4.5.3.4 Configure dedicated thread pools for MDBs
The Default thread pool is shared by many WebSphere Application Server tasks. It is sometimes
desirable to separate the execution of JMS MDBs to a dedicated thread pool. Follow the steps
below to change the thread pool used for JMS MDB threads:
1) Create a new thread pool, say MDBThreadPool, on the server via Servers > Server Types >
   WebSphere application servers > server > Thread pools, and then click New.
2) Open the Service Integration Bus (SIB) JMS Resource Adapter admin panel with server scope
   from Resources > Resource Adapters > Resource adapters. If the adapter is not shown, go
   to Preferences and check the "Show built-in resources" checkbox.
3) Change "Thread pool alias" from Default to MDBThreadPool.
4) Repeat steps 2 and 3 for the SIB JMS Resource Adapters with node and cell scope.
5) Restart the server for the change to take effect.
SCA module MDBs for asynchronous SCA calls use a separate resource adapter, the Platform
Messaging Component SPI Resource Adapter. Follow the same steps as above to change its thread
pool, if so desired.

Note that even with a dedicated thread pool, all MDBs associated with the resource adapter
still share that same pool. However, they do not have to compete with other WebSphere
Application Server tasks that also use the Default thread pool.
4.5.3.5 Tune intermediate components for concurrency
If the input BO is handled by a single thread from end to end, the tuning for the edge components
is normally adequate. In many situations, however, there are multiple thread switches during the
end to end execution path. It is important to tune the system to ensure adequate concurrency for
each asynchronous segment of the execution path.
Asynchronous invocations of an SCA component utilize an MDB to listen for incoming events
that arrive on the associated input queue. Each SCA module defines an MDB and its
corresponding activation spec (JNDI name: sca/module name/ActivationSpec). Note that the SCA
module MDB is shared by all asynchronous SCA components within the module, including SCA
export components. Take this into account when configuring the ActivationSpec's
maxConcurrency property value. SCA module MDBs use the same Default thread pool as those for
JMS exports.
The asynchrony in a long-running business process occurs at transaction boundaries (see
Section 3.6 for more details on settings that affect transaction boundaries). BPE defines an
internal MDB and activation spec to handle this asynchronous process navigation; its
concurrency should be tuned in the same manner as described above.
Message Engine persistence is usually backed by a database. Starting with the 6.2.0 release,
a standalone configuration of WPS or WESB can have the persistence storage of the BPE and SCA
buses backed by the file system (filestore). The choice of filestore has to be made at
profile creation time. Use the Profile Management Tool to create a new "Standalone enterprise
service bus" profile or "Standalone process server" profile: choose Profile Creation Options
> Advanced profile creation > Database Configuration, and select the checkbox "Use a file
store for Messaging Engines (MEs)". When this profile is used, filestores are used for the
BPE and SCA service integration buses.
4.5.4.2 Set Data Buffer Sizes (Discardable or Cached)
The DiscardableDataBufferSize is the size in bytes of the data buffer used when processing best
effort non persistent messages. The purpose of the discardable data buffer is to hold message data
in memory, since this data is never written to the data store for this Quality of Service. Messages
which are too large to fit into this buffer will be discarded.
The CachedDataBufferSize is the size in bytes of the data buffer used when processing all
messages other than best effort non persistent messages. The purpose of the cached data buffer is
to optimize performance by caching in memory data that might otherwise need to be read from
the data store.
The DiscardableDataBufferSize and CachedDataBufferSize can be set under Service Integration ->
Buses -> bus name -> Messaging Engines -> messaging engine name -> Additional properties ->
Custom properties.
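The underlying custom property names are typically sib.msgstore.discardableDataBufferSize and
sib.msgstore.cachedDataBufferSize; confirm the names against your WebSphere version. For
example, with illustrative values rather than recommendations:
sib.msgstore.discardableDataBufferSize = 2097152
sib.msgstore.cachedDataBufferSize = 4194304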
4.5.4.3 Move Message Engine datastores to a High Performance DBMS
For better performance, the Message Engine datastores should use production quality databases,
such as DB2, rather than the default Derby. The choice can be made at profile creation time
using the advanced profile creation options. If the profile has already been created with Derby as
the ME datastore, the following method can be used to change the datastore to an alternative
database.
After the Profile Creation Wizard has finished and Business Process Choreographer is
configured, the system should contain four buses with one message engine each. The example
below shows the buses in WPS installed on machine box01; the node and cell names are the
defaults.
Bus                                    Messaging Engine
SCA.SYSTEM.box01Node01Cell.Bus         box01-server1.SCA.SYSTEM.box01Node01Cell.Bus
SCA.APPLICATION.box01Node01Cell.Bus    box01-server1.SCA.APPLICATION.box01Node01Cell.Bus
CommonEventInfrastructure_Bus          box01-server1.CommonEventInfrastructure_Bus
BPC.box01Node01Cell.Bus                box01-server1.BPC.box01Node01Cell.Bus
Each of these message engines is by default configured to use a datastore in Derby. Each
datastore is located in its own database. For DB2, this is not optimal from an administrative point
of view. There are already many databases in the system and adding four more databases
increases the maintenance and tuning effort substantially. The solution proposed here uses a
single DB2 database for all four datastores. The individual datastores/tables are completely
separate and each message engine acquires an exclusive lock on its set of tables during startup.
Each message engine uses a unique schema name to identify its set of tables.
Instead of having a DB2 database per messaging engine we put all messaging engines into the
same database using different schemas to separate them.
Schema    Messaging Engine
SCASYS    box01-server1.SCA.SYSTEM.box01Node01Cell.Bus
SCAAPP    box01-server1.SCA.APPLICATION.box01Node01Cell.Bus
CEIMSG    box01-server1.CommonEventInfrastructure_Bus
BPCMSG    box01-server1.BPC.box01Node01Cell.Bus
Create one schema definition for each message engine with the following command on Windows.
In the example below, <WAS Install> represents the WPS Installation directory, <user>
represents the user name, and <path> represents the fully qualified path to the referenced file.
<WAS Install>\bin\sibDDLGenerator.bat -system db2 -version 8.1 -platform windows -statementend ; -schema BPCMSG -user <user> > createSIBSchema_BPCMSG.ddl
Repeat for each schema/messaging engine.
To be able to distribute the database across several disks, edit the created schema definitions and
put each table in a tablespace named after the schema used, e.g. SCAAPP becomes
SCAAPP_TS, CEIMSG becomes CEIMSG_TS, and so on. The schema definition should look
like this after editing:
CREATE SCHEMA CEIMSG;
CREATE TABLE CEIMSG.SIBOWNER (
  ME_UUID VARCHAR(16),
  INC_UUID VARCHAR(16),
  VERSION INTEGER,
  MIGRATION_VERSION INTEGER
) IN CEIMSG_TS;
Create a new JDBC provider DB2 Universal JDBC Driver Provider for the non-XA
datasources first if it is missing. The XA DB2 JDBC Driver Provider should exist if BPC was
configured correctly for DB2.
Create four new JDBC datasources, one for CEI as an XA datasource, the remaining three as
single-phase commit (non-XA) datasources.
The following table provides the new names:
Name of datasource    JNDI Name                 JDBC Provider
CEIMSG_sib            jdbc/sib/CEIMSG           DB2 Universal (XA)
SCAAPP_sib            jdbc/sib/SCAAPPLICATION   DB2 Universal
SCASYSTEM_sib         jdbc/sib/SCASYSTEM        DB2 Universal
BPCMSG_sib            jdbc/sib/BPCMSG           DB2 Universal
For each new datasource: uncheck the checkbox named Use this Data Source in container managed
persistence (CMP); set the database name to the name used for the database created earlier for
messaging, e.g. SIB; and select a driver type of 2 or 4. Per DB2 recommendations, use JDBC
Universal Driver Type 2 connectivity to access local databases and Type 4 connectivity to
access remote databases. Note that a Type 4 driver requires a hostname and valid port to be
configured for the database.
In the Navigation Panel go to Service Integration -> Buses and change the datastores for
each Bus/Messaging Engine displayed.
Put in the new JNDI and schema name for each datastore. Uncheck the checkbox Create
Tables since the tables have been created already.
The server immediately restarts the message engine; the SystemOut.log shows the results
of the change and also shows if the message engine starts successfully.
Restart the server and validate that all systems come up using the updated configuration.
The last remaining task is tuning the database; please see Sections 4.5.10 and 4.5.11 for further
information on database and DB2-specific tuning, respectively.
Do not use a wildcard (*) for the host name of the Web Container port. Replace it with
the hostname or IP address. The property can be accessed from Application servers >
server name > Container Settings > Web Container Settings > Web container >
Additional Properties > Web container transport chains > WCInboundDefault > TCP
inbound channel (TCP_2) > Related Items > Ports > WC_defaulthost > Host
Use localhost instead of the host name in the Web Services client binding. If the actual
hostname is used, this optimization is disabled, even if the hostname is aliased to localhost.
The property can be accessed from Enterprise Applications > application name >
Manage Modules > application EJB jar > Web services client bindings > Preferred port
mappings > binding name. Use localhost (e.g. localhost:9080) in the URL.
Make sure there is not an entry for your server's hostname and IP address in the server's
hosts file; such an entry inhibits this optimization by adding name resolution overhead.
There are several parameters that control usage of these two optimizations. The first set of these
parameters is found by going to
Application Servers > server name > Business Integration > Business Process Choreographer
> Business Flow Manager > Business Process Navigation Performance
The key parameters are:
Check Enable advanced performance optimization to enable both the Work-Manager-based navigation and InterTransactionCache optimizations.
Work-Manager-Based Navigation Message Pool Size: this property specifies the size of
the cache used for navigation messages that cannot be processed immediately, provided
Work-Manager-based navigation has been enabled. The cache defaults to a size of (10 *
thread pool size of the BPENavigationWorkManager) messages. Note that if this cache
reaches its limit, WPS uses JMS-based navigation for new messages, so for optimal
performance ensure this Message Pool Size is set to a sufficiently high value.
InterTransaction Cache Size: this property specifies the size of the cache used to store
process state information that has also been written to the BPE database. It should be set
to twice the number of parallel running process instances. The default value for this
property is the thread pool size of the BPENavigationWorkManager.
In addition, customize the number of threads for the work manager using:
Resources -> Asynchronous Beans -> Work Managers -> BPENavigationWorkManager
The minimum and maximum number of threads should be increased from their default values of
5 and 12, respectively, using the methodology outlined below in the section titled Tuning for
Maximum Concurrency. If the thread pool size is modified, then the work request queue size
should also be modified and set to be twice the maximum number of threads.
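For example, if the maximum number of threads is raised to 30, the work request queue size
should be set to 60 (twice the maximum thread count).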
4.5.6.2 Tuning the business process container for JMS navigation
If JMS-based navigation is configured, the following resources need to be optimized for efficient
navigation of business processes:
JMS connection factory BPECFC: set the connection pool size to the number of threads
in the BPEInternalActivationSpec + 10%. This resource can be found at:
Resources > JMS > Connection factories > BPECFC > Connection pool properties.
Note that this connection factory is also used when work-manager based navigation is in
use, but only for error situations or if the server is highly overloaded.
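For example, if the BPEInternalActivationSpec allows a maximum of 60 concurrent threads, a
BPECFC connection pool size of 66 (60 plus 10%) is a reasonable starting point.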
Up-to-date database statistics are key for good SQL query response times.
Databases offer tools to tune SQL queries. In most cases, additional indexes improve
query performance with potentially some impact on process navigation performance. For
DB2, the DB2 design advisor can be used to guide in choosing indexes.
API calls for task list and process list queries may take more time to respond, depending
on the tuning of the database and the amount of data in the database.
Ensure that concurrency (parallelism) is sufficiently high to handle the load and to utilize
the CPU. However, increasing the parallelism of API call execution beyond what is
necessary can negatively influence response times. Also, increased parallelism can put
excessive load on the BPC database. When tuning the parallelism of API calls, measure
response times before and after tuning, and adjust the parallelism if necessary.
If you are using persistent messaging, the configuration of your database becomes important. Use
a remote DB2 instance with a fast disk array as the DB server. You may also find benefit in
tuning the connection pooling and statement cache of the DataSource. Please see sections 4.5.10
and 4.5.11 for further information on tuning DB2, and also note the relevant References at the
end of this document.
4.5.7.2 Disable Event Distribution if Not Required
The Event Server which manages events can be configured to distribute events and/or log them to
the event database. Some mediations only require events to be logged to a database; for these
cases, performance is improved by disabling event distribution. Since the event server may be
used by other applications, verify that none of them require event monitoring (which depends on
event distribution) before disabling it.
Event distribution can be disabled from Service integration > Common Event Infrastructure >
Event service > Event services > Default Common Event Infrastructure event server -> uncheck
Enable event distribution.
4.5.7.3 Configure WSRR Cache Timeout
WebSphere Service Registry and Repository (WSRR) is used by WESB for endpoint lookup.
When accessing the WSRR (e.g. using the endpoint lookup mediation primitive), results from the
registry are cached in WESB. The lifetime of the cached entries can be configured via Service
Integration->WSRR Definitions-><your WSRR definition name>->Timeout of Cache
Validate that the timeout is a sufficiently large value; the default timeout of 300 seconds is
reasonable from a performance perspective. Too low a value will result in frequent lookups to the
WSRR which can be expensive (especially if retrieving a list of results), and will also include the
associated network latency if the registry is located on a remote machine.
If deploying more than one cluster member (JVM) on a single physical system, it is
important to monitor not just the resource utilization (Core, disk, network, etc) of the
system as a whole, but also the utilization by each cluster member. This allows the
detection of a system bottleneck due to a particular cluster member.
If all members of a cluster are bottlenecked, scaling can be achieved by adding one or
more members to the cluster, backed by appropriate physical hardware.
If a singleton server or cluster member is the bottleneck, there are some additional
considerations:
A messaging engine in a cluster with a One of N policy (used to preserve event ordering)
may become the bottleneck.
A database (DB) server may become the bottleneck. One approach to consider: if the DB
server is hosting multiple active DBs (for example, the BPEDB and the MEDB), host
each DB on a separate server.
The default maximum heap size in most implementations of Java is too small for many of the
servers in this configuration. The Monitor Launchpad installs Monitor and its prerequisite servers
with larger heap sizes, but you should verify that these sizes are appropriate for your hardware and
workload. We use a maximum heap size of 1536 MB for our performance measurements.
4.5.9.2 Configure CEI
By default, when an event arrives at CEI, it is delivered to the registered consumer (in this case a
particular Monitor Model) and also into an additional, default queue. Performance is improved
by avoiding this double-store, which can be done using the WAS Admin Console by removing
the All Events event group found via:
Service Integration -> Common Event Infrastructure -> Event Service -> Event Services ->
Default Common Event Infrastructure event server -> Event Groups
Beyond its persistent delivery of events to registered consumers, CEI offers the ability to
explicitly store events in a database. This has significant performance overhead and should be
avoided if this additional functionality is not needed. The CEI Data Store is also configured in
the WAS Admin Console:
Service Integration -> Common Event Infrastructure -> Event Service -> Event Services ->
Default Common Event Infrastructure event server: deselect Enable Data Store
4.5.9.3 Configure Message Consumption Batch Size
Consuming events in large batches is much more efficient than one at a time. Up to some limit,
the larger the batch size, the higher event processing throughput will be. But there is a trade-off:
Consuming events, processing them, and persisting them to the Monitor database is done as a
transaction. So while a larger batch size yields better throughput, it will cost more if you have to
roll back. If you experience frequent rollbacks, consider reducing the batch size. This can be
done in the WAS Admin Console in Server Scope:
Applications -> Monitor Models -> <version> -> Runtime Configuration -> Tuning -> Message
Consumption Batch size: <default 100>
4.5.9.4 Enable KPI Caching
The cost of calculating aggregate KPI values increases as completed process instances
accumulate in the database. A KPI Cache is available to reduce the overhead of these
calculations, at the cost of some staleness in the results. The refresh interval is configurable via
the WAS Admin Console:
Applications -> Monitor Models -> <version> -> Runtime Configuration -> KPI -> KPI Cache
Refresh Interval
A value of zero (the default) disables the cache.
4.5.10 Database: General Tuning
A further advantage can be gained on some operating systems such as AIX by using concurrent
I/O. This bypasses per-file locking, shifting responsibility for concurrency control to the database
and in some cases allowing more useful work to be offered to the adapter or the device.
An important exception to this guideline occurs for large objects (LOB, BLOB, CLOB, etc.)
which are not buffered by the database itself. In this case it can be advantageous to arrange for
file system caching, preferably only for files which back large objects.
4.5.10.6 Refine Table Indexes as Required
WebSphere BPM products typically provide a reasonable set of indexes for the database tables
they use. In general, creating indexes involves a tradeoff between the cost of queries and the cost
of statements which insert, update, or delete data. For query intensive workloads, it makes sense
to provide a rich variety of indexes as required to allow rapid access to data. For update intensive
workloads, it is often helpful to minimize the number of indexes defined, as each row
modification may require changes to multiple indexes. Note that indexes are kept current even
when they are infrequently used, so rarely used indexes still add update cost.
Index design therefore involves compromises. The default set of indexes may not be optimal for
the database traffic generated by a BPM product in a specific situation. If database CPU or disk
utilization is high or there are concerns with database response time, it may be helpful to consider
changes to indexes.
As described below, DB2 and Oracle databases provide assistance in this area by analyzing
indexes in the context of a given workload. Recommendations are given to add, modify, or
remove indexes. One caveat is that if the workload does not capture all relevant database activity
then a necessary index might appear unused, leading to a recommendation that it be dropped. If
the index is not present, future database activity could suffer as a result.
4.5.11 DB2-Specific Tuning
Providing a comprehensive DB2 tuning guide is beyond the scope of this report. However, there
are a few general rules of thumb that can assist in improving the performance of DB2
environments. In the sections below, we discuss these rules, and provide pointers to more
detailed information. The complete set of current DB2 manuals (including database tuning
guidelines) can be found by using the DB2 Information Center:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp.
Another excellent reference is Best practices for DB2 for Linux, UNIX, and Windows which is
available here:
http://www.ibm.com/developerworks/data/bestpractices/.
4.5.11.1 Update Database Statistics
DB2 provides an Automatic Table Maintenance feature, which runs the RUNSTATS command in
the background as required to ensure that the correct statistics are collected and maintained. This
is controlled by the database configuration parameter auto_runstats, and is enabled by default for
databases created by DB2 V9.1 and beyond. See also the Configure Automatic Maintenance...
wizard at the database level in the DB2 Control Center.
One approach to manually updating statistics on all tables in the database is to use the REORGCHK
command. Dynamic SQL, such as that produced by JDBC, will immediately take the new
statistics into account. Static SQL, like that in stored procedures, must be explicitly rebound in
the context of the new statistics. Here is an example which performs these steps to gather basic
statistics on database DBNAME:
db2 connect to DBNAME
db2 reorgchk update statistics on table all
db2 connect reset
db2rbind DBNAME all
The REORGCHK and rebind (db2rbind) should be executed when the system is relatively idle so
that a stable sample may be acquired and to avoid possible deadlocks in the catalog tables.
It is generally better to gather additional statistics, so also consider the following command for
every table requiring attention:
runstats on table <schema>.<table> with distribution and detailed indexes
4.5.11.2 Set Buffer Pool Sizes Correctly
A buffer pool is an area of memory into which database pages are read, modified, and held during
processing. Buffer pools improve database performance. If a needed page of data is already in
the buffer pool, that page is accessed faster than if the page had to be read directly from disk. As
a result, the size of the DB2 buffer pools is critical to performance.
The amount of memory used by a buffer pool depends upon two factors: the size of buffer pool
pages and the number of pages allocated. Buffer pool page size is fixed at creation time and may
be set to 4, 8, 16 or 32 KB. The most commonly used buffer pool is IBMDEFAULTBP which
has a 4 KB page size.
Note that all buffer pools reside in database global memory, allocated on the database machine.
The buffer pools must coexist with other data structures and applications, all without exhausting
available memory. In general, having larger buffer pools will improve performance up to a point
by reducing I/O activity. Beyond that point, allocating additional memory no longer improves
performance.
DB2 V9.1 and beyond provide self tuning memory management, which includes managing buffer
pool sizes. This is controlled globally by the self_tuning_mem database level parameter, which is
ON by default. Individual buffer pools can be enabled for self tuning using SIZE AUTOMATIC
at CREATE or ALTER time.
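For example, a minimal sketch (assuming a database named DBNAME) that enables self tuning
for the default buffer pool:
db2 connect to DBNAME
db2 alter bufferpool IBMDEFAULTBP size automatic
db2 connect reset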
To choose appropriate buffer pool size settings manually, monitor database container I/O activity,
by using system tools or by using DB2 buffer pool snapshots. Be careful to avoid configuring
large buffer pool size settings which lead to paging activity on the system.
4.5.11.3 Maintain Proper Table Indexing
The DB2 Design Advisor, available from the Control Center, provides recommendations for
schema changes, including changes to indexes. It can be launched from the menu presented when
right-clicking on a database in the left column.
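The Design Advisor can also be invoked from the command line. A minimal sketch, where
DBNAME and workload.sql (a file of representative SQL statements) are placeholders:
db2advis -d DBNAME -i workload.sql -t 5
The -t option limits the advisor run time, in minutes.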
4.5.11.4 Size Log Files Appropriately
When using circular logging, it is important that the available log space permits dirty pages in the
bufferpool to be cleaned at a reasonably low rate. Changes to the database are immediately
written to the log, but a well tuned database will coalesce multiple changes to a page before
eventually writing that modified page back to disk. Naturally, changes recorded only in the log
cannot be overwritten by circular logging. DB2 detects this condition and forces the immediate
cleaning of dirty pages required to allow switching to a new log file. While this mechanism
protects the changes recorded in the log, all application logging must be suspended until the
necessary pages are cleaned.
DB2 works to avoid pauses when switching log files by proactively triggering page cleaning
under control of the database level softmax parameter. The default value of 100 for softmax
begins background cleaning activities when the gap between the current head of the log and the
oldest log entry recording a change to a dirty page exceeds 100% of one log file in size. In
extreme cases this asynchronous page cleaning cannot keep up with log activity, leading to log
switch pauses which degrade performance.
Increasing the available log space gives asynchronous page cleaning more time to write dirty
bufferpool pages and avoid log switch pauses. A longer interval between cleanings allows
multiple changes to be coalesced on a page before it is written, which reduces the required write
throughput by making page cleaning more efficient.
Available log space is governed by the product of the log file size and the number of primary log
files, both configured at the database level. logfilsiz is the number of 4K pages in each log file;
logprimary controls the number of primary log files. The Control Center also provides a
Configure Database Logging... wizard.
As a starting point, try using 10 primary log files which are large enough that they do not wrap
for at least a minute in normal operation.
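For example (illustrative values; size the logs for your own workload), the following command
configures ten primary log files of 16384 4K pages (64 MB) each:
db2 update db config for yourDatabaseName using logfilsiz 16384 logprimary 10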
Increasing the primary log file size does have implications for database recovery. Assuming a
constant value for softmax, larger log files mean that recovery may take more time. The softmax
parameter can be lowered to counter this, but keep in mind that more aggressive page cleaning
may also be less efficient. Increasing softmax gives additional opportunities for write coalescing
at the cost of longer recovery time.
The default value of softmax is 100, meaning that the database manager will attempt to clean pages
such that a single log file needs to be processed during recovery. For best performance, we
recommend increasing this to 300, meaning that 3 log files may need processing during recovery:
db2 update db config for yourDatabaseName using softmax 300
4.5.11.5 Use SMS for Tablespaces Containing Large Objects
When creating REGULAR or LARGE tablespaces in DB2 V9.5 (and above) which contain
performance critical LOB data, we recommend specifying MANAGED BY SYSTEM to gain the
advantages of cached LOB handling in SMS.
Among WebSphere BPM products, this consideration applies to:
-- WPS: the Process Choreographer database, sometimes called BPEDB.
-- WPS and WESB: databases backing service integration bus message engine data stores.
For background, see the section Avoid Double Buffering above. A detailed explanation follows.
DB2 tablespaces can be configured with NO FILE SYSTEM CACHING, which in many cases
improves system performance.
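For example, a minimal sketch of creating an SMS tablespace for LOB data; the tablespace name
and container path are illustrative:
db2 connect to BPEDB
db2 "CREATE TABLESPACE LOB_TS MANAGED BY SYSTEM USING ('/db2/bpedb/lobts01')"
db2 connect reset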
DB2 Version 9.7 supports new query semantics which always return the committed value of the
data at the time the query is submitted. This support is ON by default for newly created
databases. We found that performance improved in some cases when we disabled the new
behavior, reverting to the original DB2 query semantics:
db2 update db config for yourDatabaseName using cur_commit disabled
4.5.11.11 Additional References
The following link discusses "Specifying initial DB2 database settings" with examples of creating
SMS tablespaces for the BPEDB. It also contains useful links for "Planning the BPEDB
database" and "Fine-tuning the Business Process Choreographer database"
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t5tuneint_spec_init_db_settings.html
This link discusses "Creating a DB2 for Linux, UNIX, and Windows database for Business
Process Choreographer" and gives details on BPEDB database creation, including pointers to
useful creation scripts for a production environment.
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t2codbdb.html
For our SOABench2008 OutSourced Mode workload, we achieved better throughput by dropping
several indexes from the ACTIVITY_INSTANCE_B_T table, as recommended by the Design
Advisor. This is a concrete example of how proper indexing is workload dependent. These same
indexes may be important for many other Process Choreographer workloads.
4.5.12 Oracle-Specific Tuning
As with DB2, providing a comprehensive Oracle database tuning guide is beyond the scope of
this report. However, there are a few general rules of thumb that can assist in improving the
performance of Oracle environments. In the sections below, we discuss these rules, and provide
pointers to more detailed information. In addition, the following references are useful:
Oracle Database 11g Release 1 documentation (includes a Performance Tuning Guide):
http://www.oracle.com/pls/db111/homepage
A white paper discussing Oracle on AIX:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883
4.5.12.1 Update Database Statistics
Oracle provides an automatic statistics gathering facility, which is enabled by default.
One approach to manually updating statistics on all tables in a schema is to use the dbms_stats
utility:
execute dbms_stats.gather_schema_stats( -
ownname => 'your_schema_name', -
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
cascade => TRUE, -
method_opt => 'FOR ALL COLUMNS SIZE AUTO', -
degree => 15);
workload after adjusting our schema such that the SERVICE_CONTEXT column of the
PROCESS_CONTEXT_T table was CACHED, e.g.:
alter table process_context_t modify lob (service_context) (cache);
4.5.13 Java Virtual Machine (JVM) Tuning
Because the WebSphere BPM product set is written in Java, the performance of the Java Virtual
Machine (JVM) has a significant impact on the performance delivered by these products. JVMs
externalize multiple tuning parameters that may be used to improve both authoring and runtime
performance. The most important of these are related to garbage collection and setting the Java
heap size. This section will deal with these topics in detail.
Note that the products covered in this report utilize IBM JVMs on most platforms (AIX, Linux,
Windows, etc.), and the HotSpot JVMs on selected other systems, such as Solaris and HP/UX.
Vendor specific JVM implementation details and settings will be discussed as appropriate. Also
note that all BPM v7 products in this document use Java 6. Its characteristics are similar to the
Java 5 used in the BPM v6.1 and v6.2.0 products, but quite different from the Java 1.4.2 used by
V6.0.2.x and earlier releases. For brevity, only Java 6 tuning is discussed here.
Following is a link to the IBM Java 6 Diagnostics Guide:
http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp
The guide referenced above discusses many more tuning parameters than those discussed in this
report, but most are for specific situations and are not of general use. For a more detailed
description of IBM Java 6 garbage collection algorithms, please see Section Memory
Management in the chapter titled Understanding the IBM SDK for Java.
Sun HotSpot JVM references follow:
The following URL provides a useful summary of HotSpot JVM options for Solaris:
http://java.sun.com/docs/hotspot/VMOptions.html
The following URL provides a useful FAQ about the Solaris HotSpot JVM:
http://java.sun.com/docs/hotspot/PerformanceFAQ.html#20
For more performance tuning information about Sun's HotSpot JVM, follow the URL below.
http://java.sun.com/docs/performance/
same system (for example, if you run both WPS and WID on the same system), then you should
also read the next section, 4.5.13.3. If your objective is to support large Business Objects, read
Section 4.5.2.
For most production applications, the IBM JVM Java heap size defaults are too small and should
be increased. In general the HotSpot JVM default heap and nursery size are also too small and
should be increased (we will show how to set these parameters later).
There are several approaches to setting optimal heap sizes. We describe here the approach that
most applications should use when running the IBM JVM on AIX. The essentials can be applied
to other systems. Set the initial heap size (-Xms option) to something reasonable (for example,
256 MB), and the maximum heap size (-Xmx) option to something reasonable, but large (for
example, 1024 MB). Of course, the maximum heap size should never force the heap to page. It
is imperative that the heap always stays in physical memory. The JVM will then try to keep the
GC time within reasonable limits by growing and shrinking the heap. The output from verbosegc
should then be used to monitor GC activity.
If Generational Concurrent GC is used (-Xgcpolicy:gencon), the new area size can also be set to
specific values. By default, the new size is a quarter of the total heap size or 64 MB, whichever is
smaller. For better performance, the nursery size should be approximately 1/2 of the heap size or
larger, and it should not be capped at 64 MB. New area sizes are set by the JVM options
-Xmn<size>, -Xmns<initialSize>, and -Xmnx<maxSize>.
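For example, the following generic JVM arguments (illustrative values only) combine these
settings for a 1024 MB maximum heap with a 512 MB nursery:
-Xms256m -Xmx1024m -Xgcpolicy:gencon -Xmn512m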
A similar process can be used to set the size of HotSpot heaps. In addition to setting the minimum
and maximum heap size, you should also increase the nursery size to approximately 1/2 of the
heap size. Note that you should never increase the nursery to more than 1/2 the full heap. The
nursery size is set using the MaxNewSize and NewSize parameters (that is,
-XX:MaxNewSize=128m, -XX:NewSize=128m).
After the heap sizes are set, verbosegc traces should then be used to monitor GC activity. After
analyzing the output, modify the heap settings accordingly. For example, if the percentage of time
in GC is high and the heap has grown to its maximum size, throughput may be improved by
increasing the maximum heap size. As a rule of thumb, greater than 10% of the total time spent in
GC is generally considered high. Note that increasing the maximum size of the Java heap may
not always solve this type of problem, as it could be a memory over-usage problem.
Conversely, if response times are too long due to GC pause times, decrease the heap size. If both
problems are observed, an analysis of the application heap usage is required.
4.5.13.3 Setting the Heap Size when running multiple JVMs on one system
Each running Java program has a heap associated with it. Therefore, if you have a configuration
where more than one Java program is running on a single physical system, setting the heap sizes
appropriately is of particular importance. An example of one such configuration is when the
WID is on the same physical system as WPS. Each of these is a separate Java program that has
its own Java heap. If the sum of all of the virtual memory usage (including both Java Heaps as
well as all other virtual memory allocations) exceeds the size of physical memory, the Java heaps
will be subject to paging. As previously noted, this causes total system performance to degrade
significantly. To minimize the possibility of this occurring, use the following guidelines:
Based on the verbosegc trace output, set the initial heap size to a relatively low value.
For example, assume that the verbosegc trace output shows that the heap size grows
quickly to 256 MB, and then grows more slowly to 400 MB and stabilizes at that
point. Based on this, set the initial heap size to 256 MB (-Xms256m).
Based on the verbosegc trace output, set the maximum heap size appropriately. Care
must be taken to not set this value too low, or Out Of Memory errors will occur; the
maximum heap size must be large enough to allow for peak throughput. Using the
above example, a maximum heap size of 768 MB might be appropriate (-Xmx768m).
This gives the Java heap headroom to expand beyond its current size of 400
MB if required. Note that the Java heap will only grow if required (e.g. if a period of
peak activity drives a higher throughput rate), so setting the maximum heap size
somewhat higher than current requirements is generally a good policy.
Be careful to not set the heap sizes too low, or garbage collections will occur
frequently, which might reduce throughput. Again, a verbosegc trace will assist in
determining this. A balance must be struck so that the heap sizes are large enough
that garbage collections do not occur too often, while still ensuring that the heap sizes
are not cumulatively so large as to cause the heap to page. This balancing act will, of
course, be configuration dependent.
The IBM JVM threading and synchronization components are based upon the AIX POSIX
compliant Pthreads implementation. The following environment variables have been found to
improve Java performance in many situations and have been used for the workloads in this
document. The variables control the mapping of Java threads to AIX native threads, turn off
thread debug options, and allow for spinning on mutex (mutually exclusive) locks.
export AIXTHREAD_COND_DEBUG=OFF
export AIXTHREAD_MUTEX_DEBUG=OFF
export AIXTHREAD_RWLOCK_DEBUG=OFF
export AIXTHREAD_SCOPE=S
export SPINLOOPTIME=2000
4.5.14 Power Management Tuning
Power management is becoming common in processor technology; both Intel and Power core
processors now have this capability. This capability delivers obvious benefits, but it can also
decrease system performance when a system is under high load, so consider whether or not to
enable power management. Using POWER6 hardware as an example, ensure that Power Saver
Mode is not enabled, unless desired. One way to modify or check this setting on AIX is through
the Power Management window on the HMC.
4.5.15 Tuning for WICS-Migrated Workloads
Note that the tuning below is unique to workloads migrated using the WICS migration wizard in
the WID. In addition to the tuning specified below, please follow the other WPS tuning
recommendations detailed in this document.
For JMS based messaging used to communicate with legacy WBIA adapters or custom
adapters, make use of non-persistent queues when possible.
For JMS based messaging used to communicate with legacy WBIA adapters or custom
adapters, make use of WebSphere MQ based queues if available. By default, the adapters
use the MQ APIs to connect to the SIB based destinations via MQ Link. MQ Link is a
protocol translation layer which converts messages to and from MQ based clients. By
switching to WebSphere MQ based queues, MQ Link translation costs are eliminated
and performance improves.
Turn off server logs for verbose workloads. Some workloads emit log entries for every
transaction, causing constant disk writes that reduce overall throughput. Explore the
possibility of turning off server logs to avoid this degradation for such workloads.
1 core Win2008: 109 CCPS, about the same as Linux (108 CCPS).
4 cores Win2008: 390 CCPS, 3% faster than Linux (379 CCPS), indicating an SMP
scalability factor for Win2008 of 3.6x and for Linux a scalability factor of 3.5x.
8 cores Win2008: 694 CCPS, 4% faster than Linux (665 CCPS), indicating an SMP
scalability factor for Win2008 of 6.4x and for Linux a scalability factor of 6.2x.
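These SMP scalability factors are the ratio of multi-core to single-core throughput; for example,
694 CCPS on 8 cores divided by 109 CCPS on 1 core yields the 6.4x Win2008 factor.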
[Chart: throughput (CCPS) for Win2008 and Linux at 1, 4, and 8 cores; CPU utilization 96%-100%
across all bars; SMP scaling factor shown on each multi-processor bar; hyperthreading not
supported. Measurement configuration: WPS; SOABench Driver; SOABench Services systems 1 and 2.]
4 cores Win2008: 26.6 CCPS, 17% slower than Linux (32.0 CCPS), indicating an SMP
scalability factor for Win2008 of 2.9x and for Linux a scalability factor of 3.4x.
To achieve optimal throughput, changes were made to the indexes of the BPE DB by following
the recommendations of the DB2 Design Advisor.
[Chart: throughput (CCPS) for Win2008 and Linux at 1 and 4 cores; CPU utilization 97%-100%
across all bars; SMP scaling factor shown on each multi-processor bar; hyperthreading not
supported. Measurement configuration: WPS; SOABench Driver; SOABench Services systems 1
and 2; DB2.]
[Chart: throughput at 4, 8, and 16 cores, with CPU utilization and scaling factor shown above
each bar (4 cores: 98%, 4.0x; 8 cores: 95%, 7.3x; 16 cores: 90%, 11.9x); Simultaneous
Multithreading (SMT) enabled. Measurement configuration: HTTP Server and SOABench Driver;
SOABench Services, active MEs, and ME DB; WPS application cluster members on POWER6
4.7 GHz - D.]
[Topology diagram: SOABench Automated Driver and IBM HTTP Server with WebSphere Plugin
on 8-core Power5 systems; AppCluster (SOABench BPEL application, microflow) and SOABench
Services on 8- and 16-core systems; MECluster with the active MEs on a 16-core system;
separate DB2 servers for the BPE, ME, and WPS databases.]
[Chart: throughput at 4 and 8 cores, with CPU utilization and scaling factor shown above each
bar (4 cores: 98%, 4.0x; 8 cores: 93%, 6.8x). Measurement configuration: SOABench Driver,
HTTP Server, and SOABench Agent/Outsourced Services; SOABench Services, active MEs, and
ME DB on POWER6 4.7 GHz - D.]
[Topology diagram: SOABench Outsourced Controller (Driver) and SOABench Agent and
Outsourced Services on 8-core Power5 systems; IBM HTTP Server with WebSphere Plugin;
AppCluster (SOABench BPEL application, microflow and macroflow) on 8- and 16-core systems;
MECluster with the active MEs (asynchronous) on a 16-core system; separate DB2 servers for
the ME, BPE, and WPS databases.]
[Chart: throughput at 1, 2, 4, 6, and 8 nodes, with CPU utilization and scaling factor shown above
each bar (scaling of 2.0x, 4.0x, and 5.9x at 97%-98% CPU utilization). Measurement
configuration: HTTP Server and SOABench Driver; SOABench Services, active MEs, and ME DB;
WPS application cluster members on POWER6 4.7 GHz - A.]
[Topology diagram (8-node configuration): SOABench Automated Driver and IBM HTTP Server
with WebSphere Plugin on 8-core Power5 systems; ServicesCluster (SOABench Services);
AppCluster (SOABench BPEL application, microflow) on 8- and 16-core systems; MECluster
with the active MEs on a 4-core system; separate DB2 servers for the ME, BPE, and WPS
databases.]
[Chart: throughput at 1, 2, 4, and 6 nodes, with CPU utilization and scaling factor shown above
each bar (scaling of 2.0x at 96% and 3.9x at 95%; 98% at 1 node). Measurement configuration:
SOABench Driver, HTTP Server, and SOABench Agent/Outsourced Services; WPS application
cluster members.]
[Topology diagram (6-node configuration): SOABench Outsourced Controller (Driver) and
SOABench Agent and Outsourced Services on 8-core Power5 systems; AppCluster (SOABench
BPEL application, microflow and macroflow) on 8- and 16-core systems; MECluster with the
active MEs (asynchronous) on a 4-core system; IBM HTTP Server with WebSphere Plugin;
separate DB2 servers for the ME, BPE, and WPS databases.]
-1, but we did not use resource sets to bind processes to processors, and we also did not use
memory affinity. It is likely that further tuning will produce better POWER7 results.
5.1.5.2 Results
The SOABench 2008 Automated Approval workload was used in this study. Results are as
follows:
[Chart: throughput at 1, 2, 4, and 6 cores on POWER6 and POWER7; CPU utilization (97%-100%)
and scaling factors shown above each bar (1.92x-1.98x at 2 cores, 3.72x-3.97x at 4 cores,
5.39x-5.62x at 6 cores). Measurement configuration: WebSphere Process Server; Driver.]
[Measurement configuration: WebSphere ESB; systems: Intel 2.8 GHz - C, Intel 2.93 GHz - C,
Intel 3.5 GHz - B.]
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are largely the same but the name of an element differs and the
two schemas have different namespaces. The request and response flows are eligible for deferred
parsing.
This chart highlights a major performance improvement between the 6.2 and 7.0.0.1
releases. The improvement affects all mediations that have JAX-WS bindings on the Export
and Import components and that are eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context header and set in the response message. The use of the context header means
that the transform operates on the root of the message rather than the body and hence the request
and response flows are not eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a string in the SOAP or JMS header. The Web Services workload uses the
Internationalization Context header. The JMS workload uses the JMSCorrelationId header field.
The request message processing does not use the message body which is therefore not parsed.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message using the Message
Element Setter primitive. The request message processing is not eligible for deferred parsing. The
response message is passed through unmediated and as a result is not parsed.
If your mediation flow is otherwise eligible for deferred parsing, an alternative to the Message
Element Setter primitive is an XSL Transform primitive that sets the required field. The flow
would then be eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation uses the Business Object Map primitive to map the body of the request message
into a new Business Object. The request message processing is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
Service Invoke Mediation - Windows
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation invokes the SOABench target from a Service Invoke primitive. The request
message is unprocessed but is wired to a service invoke which in turn is wired directly to the
inputResponse node. The request message processing is eligible for deferred parsing and there is
no separate response flow. A similar functional flow can be achieved by having separate request
and response flows with no mediation primitives. Testing has shown that these two approaches
are equivalent in performance terms.
[Chart: requests per second for the Fan Out scenario with 4 fans, V6.2 vs V7.0.0.1, Base in/Base
out, 16 cores; CPU utilization shown above each bar.]
This mediation invokes multiple SOABench services, sets a field in each response, merges the
responses and transforms the merged response. The request message processing examines a field
in the message to establish the number of fan outs (service calls) to invoke. Some additional
processing primitives are wired into the flow (see section 11.1 for details) and the response from
the fan-in is wired directly to the inputResponse node as the service calls have already been
made. There is no separate response flow. The mediation is not eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation processes the request message with several mediation primitives: first, a message
filter checks a field for authentication; next, a custom mediation logs the message to the console;
this is followed by a routing filter using a value in the body of the message; and finally an XSLT
primitive transforms the message. The request message is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
[Measurement configuration: JMS Producer/Consumer; WebSphere ESB; systems: Intel 2.8 GHz - B,
Intel 3.0 GHz - D.]
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries the JMSCorrelationId property in the JMS header. The request message processing
does not use the message body which is therefore not parsed.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context. The use of the context header means that the transform operates on the root of
the message rather than the body and hence the request flow is not eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation uses a Filter primitive to query a field in the body of the request message; this is
followed by a Custom Mediation primitive which could be used for custom logging (no logging
takes place in this scenario, to prevent IO contention). A further Filter primitive is then used to
route the message to one of two XSLT Transformation primitives. For this scenario the
transformation as detailed in the Schema Mediation is used.
The request message processing is not eligible for deferred parsing (because the body is parsed
for the initial Filter) or native form reuse (as the body may have been changed in the Custom
Mediation).
[Measurement configuration: JMS Producer/Consumer; WebSphere ESB; DB2; systems: Intel
2.8 GHz - B, Intel 3.0 GHz - D, Intel 3.5 GHz - A.]
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries the JMSCorrelationId property in the JMS header. The request message processing
does not use the message body which is therefore not parsed.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context. The use of the context header means that the transform operates on the root of
the message rather than the body and hence the request flow is not eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation uses a Filter primitive to query a field in the body of the request message; this is
followed by a Custom Mediation primitive which could be used for custom logging (no logging
takes place in this scenario, to prevent IO contention). A further Filter primitive is then used to
route the message to one of two XSLT Transformation primitives. For this scenario the
transformation as detailed in the Schema Mediation is used.
The request message processing is not eligible for deferred parsing (because the body is parsed
for the initial Filter) or native form reuse (as the body may have been changed in the Custom
Mediation).
[Measurement configuration: Web Services Client; WebSphere ESB; systems: Intel 3.67 GHz - C,
PPC 4.2 GHz - A, PPC 4.2 GHz - B.]
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are largely the same but the name of an element differs and the
two schemas have different namespaces. The request and response flows are eligible for deferred
parsing.
This chart highlights a major performance improvement between the 6.2.0 and 7.0.0.1
releases. The improvement affects all mediations which have a deferred parsing eligible
transform and which use document literal WSDL.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context header and set in the response message. The use of the context header means
that the transform operates on the root of the message rather than the body and hence the request
and response flows are not eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a string in the SOAP or JMS header. The Web Services workload uses the
Internationalization Context header. The JMS workload uses the JMSCorrelationId header field.
The request message processing does not use the message body which is therefore not parsed.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message using the Message
Element Setter primitive. The request message processing is not eligible for deferred parsing. The
response message is passed through unmediated and as a result is not parsed.
If your mediation flow is otherwise eligible for deferred parsing, an alternative to the Message
Element Setter primitive is an XSL Transform primitive that sets the required field. The flow
would then be eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation processes the request message with several mediation primitives: first, a message
filter checks a field for authentication; next, a custom mediation logs the message to the console;
this is followed by a routing filter using a value in the body of the message; and finally an XSLT
primitive transforms the message. The request message is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: Requests/sec, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown above each bar]
This mediation is identical in function to the preceding composite mediation but the primitives
are in separate modules linked by SCA bindings. The request message is not eligible for deferred
parsing.
The response message is passed through unmediated but unlike the composite mediation it is not
eligible for deferred parsing as a result of passing back through the SCA bindings.
[Chart: Requests/sec, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown above each bar]
This mediation uses the Business Object Map primitive to map the body of the request message
into a new Business Object. The request message processing is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: Requests/sec, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown above each bar]
This mediation invokes the SOABench target from a Service Invoke primitive. The request
message is unprocessed but is wired to a service invoke which in turn is wired directly to the
inputResponse node. The request message processing is eligible for deferred parsing and there is
no separate response flow. A similar functional flow can be achieved by having separate request
and response flows with no mediation primitives. Testing has shown that these two approaches
are equivalent in performance terms.
Measurement Configuration: JMS Producer/Consumer on Intel 3.67GHz - A; WebSphere ESB on PPC 4.2GHz - A
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation routes the request message to the target service using the Filter primitive. The XPath queries a field in the body of the request message. The request message processing is not eligible for deferred parsing; however, since the message is unchanged, the native form is reused.
Measurement Configuration: JMS Producer/Consumer on Intel 3.67GHz - A; WebSphere ESB on PPC 4.2GHz - A; DB2 on PPC 4.2GHz - B
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation routes the request message to the target service using the Filter primitive. The XPath queries a field in the body of the request message. The request message processing is not eligible for deferred parsing; however, since the message is unchanged, the native form is reused.
6.2.4
The results below are for the SOABench 2008 Transform Schema workload running on AIX
using the 10K message lengths. They show the SMP scaling achieved when running the ESB
Server in 3 different configurations: 1-way, 4-way and 8-way. Note that simultaneous multithreading (SMT) is enabled for all measurements.
The measurement configuration below was used for all measurements:
Measurement Configuration: Web Services Client on Intel 3.67GHz - C; WebSphere ESB on PPC 4.2GHz - A and PPC 4.2GHz - B
[Chart: Requests/sec for Transform Schema (10K in/10K out) at 1, 4, and 8 cores; 8-core scaling of 3.8x over 1 core; CPU utilization shown above each bar]
The model used contains 3 human tasks, 7 KPIs, 11 metrics, and 3 cubes.
Durations reported here are averages of multiple measurements, gathered from an analysis of
messages logged in the server during deployment. The first deploy operation after startup is not
included in the average. This reflects the typical user experience during interactive process
design. We note that the first deploy operation after startup, while taking somewhat longer due to
one-time initialization costs, also benefits substantially from the improvements delivered in
V7.0.0.0.
In the topology used for these measurements, WB Modeler client and WB Monitor server
machines are connected to the same subnet of a shared (non-private) network at 100 Mbps.
Measurement Configuration (V7.0): WB Modeler Client; WB Monitor Server
Deployment time is reduced in V7.0.0.0 to less than half of the time needed in V6.2 due to
several improvements, notably:
Exploiting the new EJB 3.0 support available in the WebSphere V7 Application Server
which underpins the runtime of WB Monitor V7. This eliminates the need for a separate
EJB deploy step.
[Chart: time in seconds by WID version, 2 cores: WID 6.1.0.0: 156; WID 6.1.0.1: 124; WID 6.1.2: 99; WID 6.2.0.1: 98; WID 7.0.0.1: 69]
[Chart: memory in MB by WID version, 2 cores: WID 6.1.0.0: 276; WID 6.1.0.1: 272; WID 6.1.2: 245; WID 6.2.0.1: 240; WID 7.0.0.1: 215]
[Chart: time in seconds by WID version, 2 cores: WID 6.1.0.0: 407; WID 6.1.0.1: 203; WID 6.1.2: 182; WID 6.2.0.1: 99; WID 7.0.0.1: 86]
[Chart: memory in MB by WID version, 2 cores: WID 6.1.0.0: 281; WID 6.1.0.1: 281; WID 6.1.2: 278; WID 6.2.0.1: 273; WID 7.0.0.1: 227]
[Chart: time in seconds by WID version, 2 cores: WID 6.2.0: 164; WID 6.2.0.1: 144; WID 7.0.0.1: 91]
[Chart: memory in MB by WID version, 2 cores: WID 6.2.0: 417; WID 6.2.0.1: 324; WID 7.0.0.1: 283]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz A
[Chart: time in seconds, 2 cores: BPM 6.2 - RoS: 1018; BPM 7.0.0.1: 611]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
[Chart: time in seconds, 2 cores; bar values 713, 521, and 538; 3.02x improvement noted; BPM 6.2 - RoS baseline]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
[Chart: BPM@Work Workload, Deploy Response Time - Windows, 2 cores: BPM 6.2: 412 seconds; BPM 7.0.0.1: 153 seconds]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
9 Directed Studies
This section provides a more detailed exploration of some features, along with development and
deployment options, within WPS, WESB, and WID. Generally, these studies are motivated by
lessons learned in the course of performance analysis of these products, or direct interaction with
WebSphere Business Process Management customers. Each of these studies is meant to illustrate
a set of issues that may be of interest, but is not intended to provide an exhaustive analysis of the
component in question. Several of the studies also support points made in the Architecture Best
Practices and Development Best Practices sections above.
Note that some of the directed studies below contain the same information as was presented in
earlier versions of the performance report; these studies were not repeated using WebSphere
BPM 6.2.0 since the conclusion would not change significantly. The charts and section headers
are clearly labeled to indicate this.
[Chart: throughput at 4 cores, Win2008 32-bit operating system vs Win2008 64-bit operating system; bar values 28.3 and 26.6]
Measurement Configuration: WPS; Driver, SOABench Services 1; SOABench Services 2; DB2
[Chart: throughput, 6.2.0.1 vs 7.0.0.1, 4 cores]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver on Intel Xeon 2.93GHz - A; DB2 on POWER6 4.7 GHz D
[Chart: throughput, 32-bit vs 64-bit, WPS 6.2.0.1 and 7.0.0.1; bar values 338, 230, 223, and 248]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver on Intel Xeon 2.93GHz - A; DB2 on POWER6 4.7 GHz D
9.3.1 Introduction
The SOABench 2008 InHouse Claim Processing workload, described in Section 10.4.4, is
evaluated on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core) running Windows 2008
Server. This workload is used to demonstrate the throughput and response time characteristics of
WebSphere Process Server business choreography as an increasing number of users are
concurrently processing insurance claims. Before the workload runs, 50,000 process instances
representing existing insurance claim activity are preloaded into the business process
choreography database. The insurance claims are divided equally among 125 regions. Users
belong to a single region and can only process insurance claims from their region, which is
enforced via authentication by a Tivoli Directory Server. Within a region, users are divided into two groups, adjusters and underwriters. Of the four human tasks required to complete an insurance claim, two are done by adjusters and two are done by underwriters.
Users query active process instances for a list of work that they can perform. A work item is
claimed (selected from the list) and then completed by the user. Users think between query,
claim, and complete activities. The think time is random but averages a total of 180 seconds per
human task. The time a user waits for responses to their human task queries, claims and
completes is recorded as response time. The rate at which insurance claims are completed is the
throughput. Once an entire insurance claim is finished, another is added to the region to maintain
active process instances at a constant level.
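To make the closed-loop workload model concrete, the following is a minimal Python sketch of one simulated user's loop. The api object and its query, claim, and complete methods are hypothetical placeholders for the SOABench client agent's calls, not actual product APIs, and the think-time distribution is illustrative only.

    import random
    import time

    AVERAGE_THINK_SECONDS = 60.0  # 180 seconds per task, split across three pauses

    def think():
        # Random think time averaging AVERAGE_THINK_SECONDS
        time.sleep(random.expovariate(1.0 / AVERAGE_THINK_SECONDS))

    def simulate_user(api):
        # Closed loop: query for eligible work, claim one item, complete it.
        # Only the query/claim/complete calls are timed as response time.
        while True:
            work_list = api.query()          # list work items for the user's region
            think()
            item = api.claim(work_list[0])   # claim (select) a work item
            think()
            api.complete(item)               # complete the human task
            think()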
A multi-tier topology was used for this study:
A WPS Server which runs the processes involved in the application scenario.
2 support systems which each run workload generators (client agents) under the direction
of a single client controller. One support system handles asynchronous service requests
and the other handles synchronous service requests by the business processes running on
the WPS Server.
[Chart: throughput and query response time in ms vs. user load; 59% CPU at 6000 users, 85% CPU at 8400 users, 99% CPU at 8880 users]
Preload: 125 regions with 400 tasks each = 50,000 total tasks; 80 users per region. Total task think time: 180 seconds.
Measurement Configuration: WebSphere Process Server on Intel Xeon 2.93GHz - A; Driver 1; Driver 2; DB2
[Chart: throughput and query response time in ms vs. user load; 31% CPU at 6000 users, 56% CPU at 10000 users]
Preload: 125 regions with 400 tasks each = 50,000 total tasks; 80 users per region. Total task think time: 180 seconds.
Measurement Configuration: WebSphere Process Server on Intel Xeon 2.93GHz - A; Driver 1; Driver 2; DB2
4. Log out
The measurement for the initial iteration of the above steps is discarded, so the results below
utilized a primed browser cache. The results in this study show the average of the subsequent
eight measurement iterations in the browser.
Client hardware
OS:
Windows XP (32bit)
CPU:
Memory:
2 GB RAM
Network:
OS:
CPU:
FSB:
1333 MHz
Memory:
16 GB RAM
HDD:
Network:
Software environment
WPS: Version 7.0.0.1
DB:
The total elapsed time of the migration grows linearly with the number of migrated instances, as expected: migrating 100 process instances takes 4.5 seconds, 1,000 process instances takes 44.7 seconds, and 10,000 process instances takes 453.8 seconds.
OS:
CPU:
FSB:
1333 MHz
Memory:
16 GB RAM
HDD:
Timings are based on the duration of the synchronous migrate() method call. A migration is considered complete after this call completes.
[Chart: migration elapsed time in seconds, log scale: 100 instances: 4.5; 1,000 instances: 44.7; 10,000 instances: 453.8]
changes to the physical representation of work items (that is, changes to the BPC
database schema)
The measurements in sections 9.6.1 and 9.6.2 have been made on the following machine setup:
Two physical machines: WPS server (standalone setup) and remote database (DB2 v9)
IBM xSeries 3650, 4x3.0 GHz, 16 MB cache, 16 GB memory (DB2 server and WPS server)
The measurements are CPU intensive and do not lead to an I/O bottleneck
All measurements have been made with a preloaded database with ~250,000 process instances.
250,000 business processes, each with one human task in the ready state, are available in the database. Group work items are used to assign human tasks. 1,000 users are defined, divided into 200 groups. A limit of 50 human tasks is returned by each query.
10 simulated users continually execute queries during the measurement interval in order
to measure the average response time. Therefore, the database is executing 10 parallel
queries continuously during the measurement interval.
No special tuning has been applied to WPS beyond that recommended in this report.
Note that the database is the bottleneck for these measurements, running at 100% CPU
utilization. Standard BPC database tuning was applied, described in the following:
WebSphere Process Server V6.1 Business Process Choreographer: Performance
Tuning Automatic Business Processes for Production Scenarios with DB2
http://www.ibm.com/support/docview.wss?uid=swg27012639
Improving the performance of complex BPC API queries on DB2
http://www.ibm.com/support/docview.wss?uid=swg21299450
The following figure shows a screenshot of the query table used for the QueryProperties query
workload:
The following chart summarizes the query response times achieved using WPS 7.0 with query tables versus the response times achieved using WPS 6.1.2 with the standard query API. As demonstrated below, WPS 7.0 queries are up to 20 times faster than WPS 6.1.2 due to the query table optimization. In addition, these results were obtained without expert-level database tuning, using only the standard tuning described in this document and in the links above.
[Chart: WPS 7.0 Query Tables vs. WPS 6.1.2 Standard Query API, response time in seconds: ExternalData query workload: 5.7 (6.1.2) vs 0.2 (7.0); QueryProperties query workload: 3.7 (6.1.2) vs 0.27 (7.0)]
Figure 2: Query workload results (response time in seconds)
The default set of indexes provided with the WPS 6.2.0 installation was used; no additional indexes were created.
Figure 3 shows BPC Explorer query response times obtained using a pre-filled BPC database
with the following characteristics:
100,000 processes with a human task assigned to a group (group work item)
These results demonstrate that BPC Explorer query response times are significantly improved in
WPS 6.2.0 by a factor of up to 7.5 times when compared to WPS 6.1.2.
[Chart (Figure 3): BPC Explorer query response times in seconds, 6.1.2 index structure vs 6.2.0, for My ToDos (Tasks), Administered By Me (Tasks), and Instance Details (Processes); bar values include 26, 9, 4, 3, and 1]
[Chart: normalized throughput in percent for WPS 6.0.2.1, 6.1.0.0, 6.2.0.1, and 7.0.0.1, 4 cores]
Note that the chart above uses two versions of the SOABench workload: SOABench 2005 Automated Approval Mode and SOABench 2008 Automated Approval Mode. The 2005 version was used previously to obtain the WPS 6.0.2.1 and 6.1.0 results. The bridge between the two versions of the workload was built by running WPS 6.2.0.1 on both versions, and then running WPS 7.0.0.1 on the 2008 version. Therefore, the results presented above are normalized throughput rather than raw throughput, since the two versions of the workload do not produce comparable throughput; SOABench 2008 is more complex, as shown in the workload descriptions referenced above.
WPS 6.2.0 is 3.8 times faster than WPS 6.0.0. In addition, WPS 6.2.0 is 10% faster than WPS 6.1.0. Note that we expect WPS 7.0 to perform similarly to WPS 6.2.0.
Tuning parameter settings for Banking are described in Appendix A - Banking Settings. One key configuration difference starting with WPS 6.1.0 is the usage of filestores for the messaging buses, as opposed to the local databases used in previous releases. Another key difference is the use of WorkManager-based navigation and the gencon garbage collection policy in 6.2.0.
[Chart: relative throughput for WPS 6.0.0, 6.0.1, 6.0.1.1, 6.0.2, 6.1.0, and 6.2.0; CPU utilization shown above each bar]
Measurement Configuration: WebSphere Process Server and DB2 on Intel 3.0 GHz A
A varying number of active process instances are preloaded into the business process choreography database. An active process instance is defined as one not yet completed. It can be in-flight, but it can also be persisted into the business process choreography database if it is waiting for a response from an outbound service call. The client driver maintains a constant number of active process instances by issuing new 3 KB requests as processes in the system are completed.
A three-tier topology was used for this study:
A WPS Server which runs the processes involved in the application scenario.
Two client systems. One runs a client driver and an application to handle asynchronous
service requests. The other runs an application to handle synchronous service requests.
As shown below, throughput remains essentially constant as the active number of process
instances is varied between 2,500 and 1,000,000. With 2,500 and 25,000 preloaded process
instances, WPS 7.0.0.1 runs the workload at a rate of 28.4 Claims Completed per Second (CCPS).
With 125,000 and 250,000 process instances preloaded, the workload runs at nearly the same rate,
28.2 and 28.3 CCPS respectively. With 500,000 and 1,000,000 preloaded process instances, the
rate dips very slightly to 28.1 and 27.9 CCPS, respectively.
[Chart: Claims Completed per Second vs. preloaded process instances, 2.5K to 1000K]
Measurement Configuration: WebSphere Process Server; Driver 1; Driver 2; DB2
[Table: workload throughput at each preload level: 28.4, 28.4, 28.2, 28.3, 28.1, and 27.9 CCPS, with per-tier CPU utilization percentages]
The amount of disk storage needed for the Business Process Choreographer database as the
number of process instances increases is shown in the chart below. This information was obtained
using the DB2 Control Center Storage Manager. The second chart shows the size of a database
backup at various preloads. The backups were created using the command: DB2 BACKUP
DATABASE database.
For both charts a 2x growth in the preloaded tasks results in a 2x growth in the storage
requirements, reaching approximately 77 Gigabytes at 1,000,000 preloaded tasks.
[Chart: BPC database size in GB vs. preload, from a DB2 Storage Manager snapshot: 2.5K: 0.30; 25K: 2.15; 125K: 10.19; 250K: 19.19; 500K: 38.63; 1000K: 76.84. Trend: as preload doubles, size approximately doubles]
[Chart: database backup size in GB vs. preload, from a backup saved to disk: 2.5K: 0.32; 25K: 2.17; 125K: 11.14; 250K: 19.11; 500K: 39.71; 1000K: 77.15. Trend: as preload doubles, size approximately doubles]
The growth of the Business Process Choreographer database depends on the data passing through
the process. As seen above, since the requests passing into the system did not change, database
growth behavior is predictable as more requests are preloaded into the system.
An additional consideration for growth is the definition of the process being handled. A more
complex process can result in greater storage requirements. Numerous tables in the Business
Process Choreographer database are involved in process instance storage.
The pie chart below shows the kilobytes used per task by tables in the Business Process Choreographer database. The data was extrapolated from a database with 25,000 preloaded SOABench 2008 Outsourced Claim Processing tasks. The storage per task is 91 KB. Thirteen tables make up the majority of storage used. The SCOPED_VARIABLE_INSTANCE_B_T table and the ACTIVITY_INSTANCE_B_T table account for 58 KB (64%) of the storage used.
[Pie chart: storage in KB per task by BPC table; the largest slices are SCOPED_VARIABLE_INSTANCE_B_T and ACTIVITY_INSTANCE_B_T; the remainder is spread across PROCESS_CONTEXT_T, WORK_ITEM_T, RESTART_EVENT_B_T, EVENT_INSTANCE_B_T, CORRELATION_SET_PROPERTIES_B_T, TASK_INSTANCE_T, INVOKE_RESULT2_B_T, PROCESS_INSTANCE_B_T, QUERYABLE_VARIABLE_INSTANCE_T, SCOPE_INSTANCE_B_T, RETRIEVED_USER_T, and other tables]
The number of rows in these tables depends on the process definition. The chart below shows the
number of rows in various database tables needed to store a single process instance for this study.
The ACTIVITY_INSTANCE_B_T table uses 16 rows to hold its portion of the process instance.
This corresponds to the 16 activity blocks in the process definition. The
SCOPED_VARIABLE_INSTANCE_B_T table uses 24 rows per process instance. This
corresponds to the number of assignments done by the process.
[Chart: rows per process instance by BPC table: ACTIVITY_INSTANCE_B_T: 16; SCOPED_VARIABLE_INSTANCE_B_T: 24; the remaining tables (CORRELATION_SET_INSTANCE_B_T, CORRELATION_SET_PROPERTIES_B_T, EVENT_INSTANCE_B_T, INVOKE_RESULT2_B_T, PARTNER_LINK_INSTANCE_B_T, PROCESS_CONTEXT_T, PROCESS_INSTANCE_B_T, QUERYABLE_VARIABLE_INSTANCE_T, RESTART_EVENT_B_T, RETRIEVED_USER_T, SCOPE_INSTANCE_B_T, TASK_INST_LDESC_T, TASK_INSTANCE_T, WORK_ITEM_T) use between 1 and 5 rows each]
Measured throughput at three BO sizes:
3 KB requests and 3 KB responses: 390 CCPS using WPS 7.0.0.1, a 23% improvement over WPS 6.2.0.1 (318 CCPS).
10 KB requests and 10 KB responses: 177 CCPS on WPS 7.0.0.1, a 57% improvement over WPS 6.2.0.1 (113 CCPS).
100 KB requests and 100 KB responses: 23.5 CCPS using WPS 7.0.0.1, an 86% improvement over WPS 6.2.0.1 (12.6 CCPS).
In addition to the improvements delivered in WPS 7.0.0.1, the other conclusion to draw from the above data is that throughput drops significantly as BO size increases.
The bar labels on the chart below show the throughput improvement delivered in WPS 7.0.0.1 vs. 6.2.0.1, rounded to the nearest tenth.
[Chart: CCPS, WPS 6.2.0.1 vs 7.0.0.1, 4 cores: 3k-3k: 1.2x; 10k-10k: 1.6x; 100k-100k: 1.9x improvement]
Measurement Configuration: WPS; Driver, SOABench Services 1; SOABench Services 2; DB2
Overview
The ability of a single Java Virtual Machine to efficiently use processor cores at high utilization
diminishes as the number of cores increases. To demonstrate this, this study directly compares
the vertical (SMP) and horizontal (clustered) measurements of SOABench 2008 on POWER6
running AIX using data shown previously in sections 5.1.3 and 5.1.4 respectively.
The same numbers of processor cores were used to run both Automated Approval and
OutSourced Modes. Although impressive throughput and scaling rates were achieved in the
single server topology, both workloads demonstrated significant performance gains by applying a
clustered topology where the same number of cores were divided among separate hardware
partitions on which multiple WPS JVMs worked together as cluster members (nodes).
Note that when additional hardware partitions are added, underlying resources are also added
such as: Java heaps, WebSphere log streams, network adapters, TCP stacks, disk adapters, file
systems, etc.
9.10.2 Automated Approval Mode
[Chart: throughput: 8 cores as 1 node vs 2 nodes x 4 cores; 16 cores as 1 node vs 4 nodes x 4 cores; CPU utilization shown above each bar]
9.10.3 OutSourced Mode
[Chart: throughput, OutSourced Mode: 8 cores as 1 node vs 2 nodes x 4 cores; CPU utilization shown above each bar]
Overview
It was recommended earlier in this report that the remote messaging and remote support deployment environment pattern be used for maximum flexibility in scaling. However, there is a new capability in WAS 7.0 that affects message-driven bean (MDB) connection behavior and that is interesting to examine.
This section studies the impact of this MDB connection behavior on performance when measured
in the context of SOABench 2008 OutSourced Mode with a single cluster deployment
environment pattern. For comparison, measurements with a remote messaging and remote
support deployment environment pattern were shown in section 5.1.4.3.
9.11.2
With the single cluster deployment environment pattern, when an MDB application is installed in the same cluster as the messaging engine, its MDB connection behavior depends on the value of the alwaysActivateAllMDBs property of the appropriate activation specification.
See this link for more information:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc
.nd.doc/concepts/cjn_mdb_endpt_overview.html
When this property has a value of false, the MDB will only connect to an active message engine
within the same JVM. When this property has a value of true, the MDB will also connect to an
active message engine on a separate JVM in the cluster. These two behaviors are depicted in the
following two charts.
[Diagram: single cluster topology with alwaysActivateAllMDBs=false: each node's MDBs connect only to a messaging engine in the same JVM; the node hosting only failover MEs has no active ME connection]
[Diagram: single cluster topology with alwaysActivateAllMDBs=true: MDBs on both nodes connect to the active messaging engine, including across JVMs]
9.11.3 Topology
For this study, a single cluster contains the application and messaging engine, and this cluster has two cluster members (nodes). The messaging engines run as failover on one node (the left node) and active on the other node (the right node).
Depending on the property value, the MDB in the left node will or will not connect to the active message engine in the other JVM. The MDB in the right node will always connect to the active message engine because it is within the same JVM.
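For reference, a hedged wsadmin (Jython) sketch of setting the property follows. The command and parameter names are modeled on the documented SIB JMS activation specification admin commands; the scope and specification selection are placeholders, so verify the exact syntax against the WAS 7.0 InfoCenter before use.

    # Hypothetical sketch: enable cross-JVM MDB connections on a SIB JMS
    # activation specification. Scope and names are placeholders.
    scope = AdminConfig.getid('/Node:myNode/Server:server1/')
    specs = AdminTask.listSIBJMSActivationSpecs(scope).splitlines()
    AdminTask.modifySIBJMSActivationSpec(specs[0],
        '[-alwaysActivateAllMDBs true]')
    AdminConfig.save()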
[Diagram: measurement topology: SOABench Outsourced Controller (driver) and SOABench Agent and Outsourced Services on 8-core Power5 machines; an IBM HTTP Server with WebSphere plugin spraying requests to a single cluster of two 4-core nodes running the SOABench BPEL application (micro flows, async, macro flows); SOABench Services on a 16-core machine; DB2 databases for BPE, MEs, and WPS; failover MEs on the left node, active MEs on the right node]
9.11.4 Workload
The SOABench 2008 OutSourced Mode workload is not purely MDB driven. A full description of the workload can be found in section 10.4.3. A significant portion of load is driven via Web Services invocations, which are sprayed across the nodes from the IBM HTTP Server pictured in the topology above.
This is an important point, because even when the MDB of a particular node is unable to connect
to an active message engine, there is still a significant amount of work for it to perform.
9.11.5 Results
Reading from left to right, the 1st bar in the chart below is provided as a baseline for comparison.
For this measurement bar, the left node is stopped and the right node is started and handling all
workload traffic.
The 2nd bar shows pre-WAS 7.0 MDB behavior, where the alwaysActivateAllMDBs property is set to false. Again, because this workload is not purely MDB driven, the left node still handles some workload traffic; however, its CPU utilization is only 59% while the right node runs at a very high 97% CPU.
The 3rd bar shows the performance improvement achieved when the property is set to true and the left node is now able to perform additional work via its MDB connection to the active ME in the right node, raising the left node's CPU utilization to 81%. However, because of the very high CPU utilization (98%) of the right node, the left node has trouble taking more work from the ME to drive its CPU utilization even higher.
The 4th bar shows further performance gains obtained by adjusting the weights on the HTTP sprayer to favor the left node for the non-MDB traffic, thus driving higher overall workload throughput and better balance of CPU utilization between the left and right nodes. However, if the input traffic varies significantly, the CPU utilization could become imbalanced one way or the other until the HTTP sprayer weight is adjusted. In practice, this would need to be monitored closely and adjusted accordingly.
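One way to apply such weights is through the cluster member weight attribute, which feeds the generated plugin configuration. A hedged wsadmin (Jython) sketch follows, with placeholder cluster and member names; the 5-4 split mirrors the measurement below.

    # Hypothetical sketch: bias the HTTP sprayer toward the left node by
    # raising its cluster member weight. Names are placeholders.
    left = AdminConfig.getid('/ServerCluster:SingleCluster/ClusterMember:leftMember/')
    right = AdminConfig.getid('/ServerCluster:SingleCluster/ClusterMember:rightMember/')
    AdminConfig.modify(left, [['weight', '5']])
    AdminConfig.modify(right, [['weight', '4']])
    AdminConfig.save()  # regenerate and propagate plugin-cfg.xml afterwards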
[Chart: Single Cluster, SOABench 2008 OutSourced Mode - AIX, 4 cores per node: 1 node: 98% CPU; 2 nodes with pre-V7 MDB behavior: 1.5x, 59%/97% CPU; 2 nodes: 1.7x, 81%/98% CPU; 2 nodes with http sprayer weight 5-4: 1.9x, 94%/97% CPU]
Although not measured here, we predict that adding more nodes to this single cluster topology would further increase performance as long as the HTTP sprayer weights are adjusted to achieve good balance and the active message engine node does not become the bottleneck due to excessive CPU utilization. Potentially, with enough cluster members, the HTTP sprayer weight for the active message engine node would have to be set to 0 (lowest) so that it handles only messaging engine related work. However, well before such maintenance-intensive adjustments of the HTTP sprayer's weights are made, an alternate cluster topology should be considered.
9.11.6 Summary
A single cluster deployment environment is now more viable due to WAS 7.0 MDB
enhancements, especially for workloads heavily dependent on MDBs.
However, as this study illustrates, due to the imbalance of CPU utilization across nodes related to
where the active message engines are running, such a configuration should be considered
carefully for anything but the simplest of implementations.
WID Considerations
Topology Considerations
Clustering Tuning
Finally, here are discussions on issues for high volume runtime deployments:
Tuning Checklist
Tuning Methodology
WebSphere Integration Developer (WID) provides a wizard and command line utility that enable users to migrate WebSphere InterChange Server (WICS) content to equivalent artifacts on WebSphere Process Server (WPS). This wizard can, with minimal developer input, generate fully functional WPS artifacts. Please note that migration is a complex topic with many different aspects; for a complete discussion please see the IBM WebSphere InterChange Server Migration to WebSphere Process Server Redbook at the following location:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247415.pdf
This section evaluates the performance of WID 7.0.0.1-generated migration artifacts running on
WPS 7.0.0.1 by comparing it with the performance of an equivalent workload running on WICS
4.3.0.6 and an equivalent WPS workload run on previous versions of WID/WPS (6.1.0 & 6.2.0).
The workload used for evaluation is Contact Manager with a Web Services binding. The Contact
Manager workload is described in section 10.2. There are 4 workloads used to evaluate the
performance, each of which is different but semantically equivalent.
WICS version: utilizes the WebSphere Business Integration Adapters (WBIA) Web
Services adapter to act as the source of Business Objects, and an emulated Clarify adapter
as the destination. The Web Services adapter interacts with the WICS server using
WebSphere MQ and the emulated Clarify adapter is connected to the WICS server via
IIOP.
WPS 6.1.0 version: developed by making use of the WICS Migration Wizard in WID
6.1.0 to migrate the WICS workload described above. This wizard migrates the Web
Services adapter to still be the WBIA Web Services adapter (but to be run in a standalone
JMS mode) and migrates the emulated Clarify adapter to a custom adapter which
interfaces with WPS using JMS. The workload was subsequently modified to remove the
relationship map step from the maps and to post an async one way JMS message for each
interaction with the emulated Clarify adapter. This is to ensure that the workload driver
can drive enough work into the system to maximize throughput. The generated workload
is measured on WPS 6.1.0.
WPS 6.2.0 version: developed like the WPS 6.1.0 version by using the WID 6.2.0 WICS migration wizard. This wizard differs from the 6.1.0 version in that it migrates the WBIA Web Services adapter to an HTTP SCA binding with a custom data handler. Post-migration modifications performed are the same as in the WPS 6.1.0 version. The workload is then measured on WPS 6.2.0.
WPS 7.0.0.1 version: developed like the WPS 6.2.0 workload but using the WID 7.0.0.1 WICS migration wizard. The 7.0.0.1 wizard offers the option of merging the connector and collaboration modules during migration. Post-migration, the workload was changed to incorporate the Migration Development Best Practices and to post an async one-way JMS message for each interaction with the emulated Clarify adapter. The workload is then measured on WPS 7.0.0.1.
All four workloads described above are evaluated on an IBM pSeries model 9117-MMA, 4.7
GHz (8-way SMP) running AIX 6.1 to demonstrate the throughput characteristics. Measurements
are shown in the chart below.
On the above specified setup with all eight cores enabled, the WID 7.0.0.1 migrated workload
runs on WPS 7.0.0.1 at a rate of 1004 Business Transactions Per Second (BTPS), which is a 54%
improvement over WPS 6.2.0. WPS 6.2.0 runs the WID 6.2.0 migrated workload at a rate of 650
BTPS which is an 8.3x improvement over 6.1.0. WID 6.1.0 migrated workload runs on WPS
6.1.0 at a rate of 78 BTPS.
On the same setup as above, WICS 4.3.0.6 runs its workload at a rate of 1,049 BTPS. A few notes
on this data are relevant:
WPS 7.0.0.1 delivers comparable throughput as WICS for the same workload.
WICS 4.3.0 only utilizes 54% of the available cores, even after comprehensive tuning was done. This is due to limitations in the WICS runtime architecture, notably a single-threaded listener path for processing incoming events. WPS does not have this limitation and therefore has superior SMP scaling, as is demonstrated in the chart below.
The data presented below is for a single server configuration, since WICS does not support clustering. WPS can deliver higher throughput rates than are shown below via clustering.
[Chart: BTPS at 8 cores, WICS 4.3.0.6 vs WPS releases; CPU utilization shown above each bar]
Measurement Configuration: WICS, WPS server; DB2; Driver on Intel 3.5GHz - D
This section contains a series of studies exploring the behavior of a system in the presence of a
large input event (BO). Data is shown for WPS 7.0.0.1 and WESB 7.0.0.1.
For any application, the maximum size input object that it can support depends on a number of
factors. The amount of processing required to complete a transaction and the representation of the
input event internal to the application are clearly important as they affect the number of copies of
the event required to be held in memory and the nature of the objects held in the Java Heap
(whether they are contiguous or composed of a set of smaller, discrete objects).
Also, the ability to process large input events usually depends on the transactional nature of the
processing involved. Some data processing systems are able to break a large transaction into
multiple smaller transactions that are processed (or committed) independently, while others are
not. Whenever possible it is advisable to design a solution that does not depend on processing
input events of arbitrarily large size. Please refer to the Best Practices described in Section 2.5 for
more information related to processing Large Business Objects.
The sections that follow display a wide variety of results. While it may be tempting to do so,
please do not view the data as a fundamental product limit for the largest input event size. Rather,
these sections are a set of case studies intended to explore the factors affecting the ability of a
solution to successfully process a large input event.
9.14.2
The SOABench 2008 Automated Approval workload (see section 10.4.2) was used to explore the
ability to handle large objects within a business process running in WPS 7.0.0.1. The purpose of
this study was to find the maximum object size that the system can handle repeatedly (20 times
for this study) without exceptions. The system evaluated to find the maximum size is an AIX 6.1
system with 31 GB of RAM running a 32 bit version of WPS 7.0.0.1. In addition an evaluation of
an AIX 64 bit version of WPS 7.0.0.1 was done for a single 500 MB request.
Large Object requests were produced in the client driver by creating additional customer detail
fields in the claim request which is referred to as the "payload." Note that the charts below show
the client driver's input object size and not the actual size processed by WPS; the generation of
the payload results in an actual request size 6% larger than the client reports. For example a 100
MB request is actually 106 MB in WPS (110 MB on the wire with packet overhead).
Responses from the server are constant at 3 KB in size. The SOABench 2008 automated approval
workload implementation used for this study holds 7 copies of the payload for use during the
various steps of the process flow resulting in many large contiguous memory objects contending
for Java heap space. Note: the SOABench 2005 automated approval workload, used in previous
versions of this performance report, holds 5 copies of the payload so maximum object size should
not be compared between the two workload versions.
The maximum Java heap size required was determined by repeated experiments to balance the
memory needed for native memory versus the Java heap as large object sizes were increased. On
AIX the optimal maximum heap was determined to be 2600 MB but to achieve this it was
necessary to set an operating system variable:
"export LDR_CNTRL=MAXDATA=0xB0000000@DSA"
in the session starting the WPS server to provide additional memory segments for user processes.
For the AIX WPS 7.0.0.1 64 bit system study, the maximum Java heap was set to 9800 MB with
no additional AIX system variable settings required. In all cases, native heap space was
preserved by using type 4 JDBC drivers for WPS datasources. See reference:
http://www-128.ibm.com/developerworks/eserver/articles/aix4java1.html
The chart below shows that the 32-bit WPS maximum object size was 150 MB for WPS 6.2.0.1 and 170 MB for WPS 7.0.0.1, a 20 MB improvement. The 64-bit WPS 7.0.0.1 was able to handle the 500 MB object request submitted. Note that this was the largest size attempted; finding the maximum size for this system was not attempted.
Transaction completion time also improves on large requests in WPS 7.0.0.1. 150 MB requests on the IBM pSeries POWER6 4.7 GHz 4-core AIX 6.1 system took 542 seconds each on 32-bit WPS 6.2.0.1, but the larger 170 MB request took only 490 seconds on 32-bit WPS 7.0.0.1. The 500 MB request on this hardware running 64-bit WPS 7.0.0.1 took 1,376 seconds to complete.
Due to the response times shown above, it was necessary to increase several timeout settings for
both the SOABench client driver and the WPS server running the workload. These include:
Increasing the Application Server Transaction Service timeouts for Total transaction
lifetime, Async response, Client inactivity, and Maximum transaction.
Increasing the SOABench BPEL EJB module web service client bindings request timeout.
Increasing socket read and write timeouts for both the SOABench client and server invocations using the JVM properties (values in seconds) "-Dcom.ibm.ws.webservices.readTimeout=" and "-Dcom.ibm.ws.webservices.writeTimeout=".
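As an illustration of the last item, the following hedged wsadmin (Jython) sketch sets the two web services timeout properties as generic JVM arguments on a server. Cell, node, and server names and the 600-second value are placeholders, and this simplified form overwrites any existing generic JVM arguments.

    # Hypothetical sketch: set web services read/write timeouts (in seconds).
    server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
    jvm = AdminConfig.list('JavaVirtualMachine', server).splitlines()[0]
    args = ('-Dcom.ibm.ws.webservices.readTimeout=600 '
            '-Dcom.ibm.ws.webservices.writeTimeout=600')
    AdminConfig.modify(jvm, [['genericJvmArguments', args]])
    AdminConfig.save()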
[Chart: maximum large object size in MB, 32-bit WPS: 6.2.0.1: 150 MB; 7.0.0.1: 170 MB; request size shown above each bar]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver on Intel Xeon 2.93GHz - A; DB2 on POWER6 4.7 GHz D
[Chart: object size in MB, WPS 7.0.0.1 on AIX: 32-bit maximum: 170; 64-bit achieved: 500 (maximum not determined)]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver; DB2
9.14.3
The JMS binding and Web Services scenarios were evaluated with large messages to determine
the largest message which could be processed in sustained operation. The tests were run for a
period of 2 hours.
These tests were run using the Transform Value mediation and a Custom mediation which
transforms the value of a single field in the request message. These mediations were chosen as
they represent a simple case requiring little processing and a complex case which will cause the
request to be serialized, respectively. For details of the mediations see section 11.3. For details of
the topology used see section 11.1 and 11.2.
The Java heap was set to a fixed size of 1536 MB for these measurements.
9.14.3.1 Web Services Binding large messages
The chart below shows that the maximum request size ranges from 82 MB to 96 MB, and the
maximum response size ranges from 91 MB to 110 MB, depending on the processing done in the
mediation.
[Chart: maximum message size in MB, V6.2 vs V7.0, for the Transform Value mediation request and response messages, 16 cores]
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
[Chart: maximum message size in MB, 6.2 vs 7.0.0.1, 4 cores: Transform Value mediation and Custom mediation, non-persistent and persistent]
Measurement Configuration: JMS Producer/Consumer, WebSphere ESB, DB2; machines: Intel 2.8GHz - B, Intel 3.0GHz - D, Intel 3.50GHz - A
For non-persistent messaging, using the default messaging provider within WESB (WebSphere Platform Messaging) is 38% faster than the MQ JMS provider using the Base message size (1.2 KB). MQ JMS provides equivalent messaging performance to the MQ binding for the same scenario.
For persistent messaging, the default messaging provider is 49% faster than the MQ JMS
provider using the Base message size. MQ JMS messaging outperforms the MQ binding
by 7% for the same scenario.
Note: Generic JMS was not tested in V7.0.0.1. V6.2 tests showed the performance to be identical
to MQ JMS.
9.15.1
The following charts compare the throughput for the different non persistent messaging bindings
using the Transform Value mediation and a range of message sizes. For details of the mediations
and request sizes see sections 11.3 and 11.4. All data is obtained on a 4-way hyper-threaded
WESB server machine. For details of the topologies used see section 11.2.
[Chart: Requests/sec for non-persistent messaging bindings (JMS, MQ JMS, MQ) at Base, 10 KB, and 100 KB message sizes, 4 cores, Hyper-Threading enabled; CPU utilization shown above each bar]
Measurement Configuration: JMS Producer/Consumer on Intel 2.8GHz - B; WebSphere ESB on Intel 3.0GHz - D
9.15.2
The following charts compare the throughput for the different persistent messaging bindings
using the Transform Value mediations and a range of message sizes. For details of the mediations
and request sizes see sections 11.3 and 11.4. All data is obtained on a 4-way hyper-threaded
WESB server machine. For details of the topologies used see section 11.2.
[Chart: Requests/sec for persistent messaging bindings (JMS, MQ JMS, MQ) at Base, 10 KB, and 100 KB message sizes, 4 cores; CPU utilization shown above each bar]
Measurement Configuration: JMS Producer/Consumer on Intel 2.8GHz - B; WebSphere ESB on Intel 3.0GHz - D; DB2 on Intel 3.50GHz - A
[Chart: Requests/sec, XSLT vs BOMap mediation, Base in/Base out, 16 cores; CPU utilization shown above each bar]
The XSLT mediation sets the value of a single element in the request message and copies all
other elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
The BOMap mediation uses the Business Object Map primitive to map the body of the request
message into a new Business Object and sets the value of a single element. The request message
processing is not eligible for deferred parsing.
[Chart: Requests/sec, ElemSet XSLT vs ElemSet BOMap mediation, Base in/Base out, 16 cores; CPU utilization shown above each bar]
The two mediation flows in this chart are the same as in the chart above but with a Message
Element Setter primitive inserted into the mediation flows before the XSL Transform and
BOMap primitives. The Message Element Setter primitive is included to force a parse of the
message so that the flow is not eligible for deferred parsing.
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
This study compares a composite mediation (all primitives in one mediation flow component), a chained mediation (primitives in separate modules linked by SCA bindings), and a variation linking primitives in a single module using multiple MFCs. For each of the three cases all the mediation code is still running in a single JVM.
As the chart below shows, using a composite mediation is significantly cheaper than the chained variation, as less data conversion (with an associated reduction in heap usage) takes place. Splitting the primitives across multiple MFCs in the same module has a lower overhead, with the proportional cost decreasing as message size increases.
[Chart: Composite vs Chained Mediation - Windows: Requests/sec for Composite, Composite (Multi MFC), and Chained mediations, Base in/Base out, 16 cores; CPU utilization shown above each bar]
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
[Chart: Requests/sec, JAX-RPC vs JAX-WS, Base in/Base out, 16 cores; CPU utilization shown above each bar]
[Chart: Requests/sec, JAX-RPC vs JAX-WS, Base in/Base out, 16 cores; CPU utilization shown above each bar]
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
The studies presented in the following sections explore issues relevant to the performance of WebSphere Process Server and WebSphere Integration Developer 7.0.0.1 when used in an authoring environment.
From these studies, the following observations can be made:
1. Deployment to a production server is expected to be as much as twice as fast as what is experienced in a development environment.
2. When using wsadmin to install SCA Modules, installing multiple modules in a WAS
Session and then saving the configuration change together is faster than installing (and
saving) each of the Modules individually.
3. In addition to memory savings, defining Shared Libraries according to the technote
(available at: http://www-01.ibm.com/support/docview.wss?uid=swg21298478) reduces
total application install time.
9.19.2
For this study, we compare response time when publishing the Loan Processing workload from WID 7.0.0.1 to WPS 7.0.0.1 on a variety of hardware configurations. Two different machine types are used: a desktop system resembling a typical developer's workstation and a server system resembling a typical production server. Additionally, each of the two systems is measured in several configurations, varying the number of cores available to the system as well as the configuration of the disk subsystem.
The results from the Model 9196 Desktop system indicate that addition of a second processing
core improves publish responsiveness from 738 seconds to 612 seconds (a 17% improvement).
Addition of a second physical disk drive (and installing WID & WPS to that drive, isolating its
activities from those associated with the operating system) delivers an additional 12%
improvement.
The results from the Model 7233 Server System indicate that, even with only a single processing
core active, the presence of a fast disk subsystem (RAID Disk array combined with filesystem
improvements available in the server operating system) leads to improved publish responsiveness.
Addition of a second core further improves responsiveness. Additional cores beyond the second
would lead to only a small improvement in responsiveness.
From this data it would be reasonable to expect deployment to a production server to be as much
as twice as fast as deployments that developers experience on their workstations, due simply to
the hardware differences typical in the two environments.
[Chart: publish response time in seconds, bar labels showing response time and average CPU utilization: Model 9196: 1 Core, 1 Disk: 738 (51%); 2 Core, 1 Disk: 612 (35%); 2 Core, 2 Disk: 538 (41%); Model 7233: 1 Core, RAID: 477 (96%); 2 Core, RAID: 374 (66%)]
Measurement Configuration: Model 9196 (desktop); Model 7233 (server)
9.19.3
In this study we use the 60 Modules in the Loan Processing application to demonstrate the
relative performance of some of the options available when deploying Modules via the wsadmin
tool.
First, we use a wsadmin install script that saves the changes made under the configuration session
multiple times when executing the install. Each of the 60 Modules is installed, saved & started
independently, before proceeding to the next Module. This installation operation completes in
466 seconds as shown in the Multiple WS Saves measurement in the chart below.
Second, we use a wsadmin install script that installs all 60 of the Modules, with a single save
operation after all of the Modules are installed. Then, each of the Modules is started. This
operation completes in 382 seconds, 18% faster than the Multiple WS Saves measurement. This
data appears as the Single WS Save measurement in this data chart.
Finally, the shared libraries technique described in the technote,
http://www-01.ibm.com/support/docview.wss?uid=swg21298478, is used in conjunction with the
Single WS Save technique described here. In addition to the memory savings that shared libraries
provides, it delivers an additional 15% savings in install response time, for a total install time of
326 seconds (30% faster than the Multiple WS Saves approach).
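A minimal wsadmin (Jython) sketch of the Single WS Save technique follows. The EAR paths, install options, application naming, and target server are placeholders; the point of the sketch is the single AdminConfig.save() after all installs.

    # Hypothetical sketch: install all modules in one session, save once, then start.
    ears = ['/builds/Module%02d.ear' % i for i in range(1, 61)]
    for ear in ears:
        AdminApp.install(ear, '[-usedefaultbindings]')
    AdminConfig.save()  # single save covering all 60 installs
    appMgr = AdminControl.queryNames('type=ApplicationManager,process=server1,*')
    for ear in ears:
        appName = ear.split('/')[-1][:-len('.ear')]  # assumed naming convention
        AdminControl.invoke(appMgr, 'startApplication', appName)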
[Chart: install time in seconds, 2 cores: Multiple WS Saves: 466; Single WS Save: 382; Single WS Save with shared libraries: 326]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
9.20.1
In WebSphere Application Server Version 6.1, the Security Configuration Wizard enables you to
configure application or Java 2 security. For further information, please see the IBM InfoCenter:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.wsf
ep.multiplatform.doc/info/ae/ae/usec_secureadminappinfra.html
In order to run an application with Java 2 security enabled, required permissions have to be
granted in the was.policy file of the application ears. Please see the IBM InfoCenter for more
details:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.wsf
ep.multiplatform.doc/info/ae/ae/csec_rsecmgr2.html
Following is a screen shot of the admin console page with Java 2 security enabled:
The automated approval workload of the Choreography facet, described in section 10.5.2, is
evaluated on an IBM xSeries 3950 M2 2.93 GHz Xeon (4 quad-core processors), running with 4
cores enabled on Windows Server 2008, to demonstrate the throughput characteristics of
WebSphere Process Server in this configuration. 3 KB requests and 3 KB responses are utilized.
The workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
With no security enabled, WPS 6.2.0 runs the workload at a rate of 556 Business Transactions per Second (BTPS). With application security enabled, WPS runs the workload at a rate of 524 BTPS, a degradation of 6% compared to no security. When Java 2 security is enabled in addition to application security, the rate drops to 360 BTPS, a degradation of 35% compared to no security.
[Chart: BTPS at 4 cores: WPS 6.2: 556; WPS 6.2 + application security: 524; WPS 6.2 + application security + Java 2 security: 360; 99% CPU utilization]
Measurement Configuration: WebSphere Process Server; Driver; DB2
The manual approval workload of the Choreography facet, described in section 10.5.3, is
executed on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core), running with 4 cores
enabled on Windows Server 2008, to demonstrate the throughput characteristics of WebSphere
Process Server in this configuration. 3 KB requests and 3 KB responses are utilized. The
workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
With no security enabled, WPS 6.2.0 runs the workload at a rate of 44 Business Transactions per Second (BTPS). With application security enabled, WPS runs the workload at a rate of 34 BTPS, a degradation of 23% compared to no security. When Java 2 security is enabled in addition to application security, the rate drops to 29 BTPS, a degradation of 34% compared to no security.
[Chart: BTPS at 4 cores: WPS 6.2: 44; WPS 6.2 + application security: 34; WPS 6.2 + application security + Java 2 security: 29; 99% CPU utilization]
Measurement Configuration: WebSphere Process Server; Driver; DB2
9.20.2 Remote Messaging Deployment Environment Startup Time and Footprint
The Loan Processing workload described in Section 12.2 was used to quantify the startup time
and footprint improvements in WPS 6.2.0 when running in a remote messaging deployment
environment with many application modules installed in the cell.
See this link for an overview of various deployment environment patterns including remote
messaging:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r2mx/index.jsp?topic=/com.ibm.webspher
e.wps.620.doc/doc/cpln_topologypat.html
There is a significant reduction in the time it takes to start the Message Engine associated with
WPS 6.2.0 when using this workload, as shown in the chart below. Message Engine startup time
is reduced by a factor of 6.4 times.
[Chart: startup time in seconds for the WPS APP and WPS ME JVMs, 64-bit WPS 6.1.0.1 vs 6.2.0; ME startup reduced from 1016 seconds to 159 seconds]
There is also a significant reduction in memory footprint after startup in both the Message Engine JVM and the WPS 6.2.0 JVM with this workload installed, as is demonstrated in the chart below. The system memory footprint is reduced from 903 MB to 624 MB, an improvement of 31%.
[Chart: live bytes after startup in MB: 64-bit WPS 6.1.0.1: APP 520 + ME 383; WPS 6.2.0: APP 423 + ME 201]
Measurement Configuration: APP and ME on a 4 core LPAR on PPC 1.9 GHz - A; DB2
9.20.3
When a WPS application makes use of data-type or interface definitions defined in a library
module, WID copies the artifacts from the library into the application module so that those types
may be available to the runtime. If many application modules make use of a library, its artifacts
are copied many times, increasing the memory pressure on the Server runtime. A technote
(available at: http://www-01.ibm.com/support/docview.wss?uid=swg21298478) describes a
technique that declares the library modules as WAS Shared Libraries and allows their artifacts to
be shared among WPS modules.
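As a companion to the technote, a hedged wsadmin (Jython) sketch of defining one such shared library at cell scope follows. The library name and class path are placeholders, and the technote's full procedure, including associating the library with the applications, still applies.

    # Hypothetical sketch: define a WAS shared library so library artifacts
    # are loaded once rather than copied into every module.
    cell = AdminConfig.getid('/Cell:myCell/')
    AdminConfig.create('Library', cell,
                       [['name', 'ContactLibV1'],
                        ['classPath', '/opt/sharedlibs/ContactLibV1.jar']])
    AdminConfig.save()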
In this study, we examine the memory reduction realized when rebuilding the Loan Processing
application to make use of the technique described in the technote. We prepared deployment code
using Java EE Prepare For Deploy and exported the application from WID as a set of EAR files
and then used a jacl script to deploy the EARs to the WPS server via wsadmin.
This application makes moderate use of sharing; 2 shared libraries are used by all 62 modules,
and 20 other shared libraries are used by approximately 5 modules each.
The chart below shows that the peak live memory within the WPS Java heap when publishing the Loan Processing application via the standard mechanism is 378 MB. When using the WAS Shared Library technique described in the technote, peak memory is reduced by 11% to 335 MB.
[Chart: peak live memory during publish; Standard Deployment 378 MB versus WAS Shared Library technique 335 MB; 2 cores.]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
One of the steps described in the WAS Shared Library technote instructs the administrator to copy Shared Library files to the <WAS_HOME>/lib/ext directory for deployment and then to delete those files when the deployment is complete. The chart below shows the importance of deleting the Shared Library files from this temporary location. When using the standard deployment technique, the WPS Java heap contains 339 MB of live data after restart. When using the WAS Shared Library technique, the WPS live set is reduced by 19% to 275 MB. However, if the temporary library files are not deleted, the memory reduction is only 13%.
[Chart: live memory (MB) after restart; Standard Deployment 339 MB, WAS Shared Library technique with temporary files left in place 294 MB, WAS Shared Library technique 275 MB; 2 cores.]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
9.20.4
For this study, we selected four different machine types to run key measurements of the Loan Processing workload. An additional run on one machine with no anti-virus software installed was also made to ease comparison with the measurements presented in Chapter 8 of this report. Newer machines showed significant improvements. For each data chart in this section, the percentages at the top of each bar indicate the average system CPU utilization during the measurement.
9.20.4.1 Impact on Import time
Using a new workspace, WebSphere Integration Developer 6.2 was opened, the Build automatically preference was disabled, and the Loan Processing workload was imported. Measurement started when the Import began and stopped as soon as the Import was complete and the processor cores became idle. This was done seven times on each machine, with the results below being the average.
As can be seen in the following chart, the newer machines can finish the import much more
quickly than the older machines. Comparing the laptops, the T60p completed the Import in 259
seconds, 2.1 times faster than the T42p. Among the desktops, the model 9196 completed the
Import in 215 seconds, 2.5 times faster than the model 8212.
[Chart: Import time in seconds, with average CPU utilization: T42p 555 (82%), 8212 544 (66%), T60p 259 (68%), 9196 215 (60%), 9196 no AV 154 (64%).]
[Chart: time in seconds (chart title lost in extraction), with average CPU utilization: T42p 413 (92%), 8212 425 (62%), T60p 268 (61%), 9196 205 (60%), 9196 no AV 179 (63%).]
[Chart: time in seconds (chart title lost in extraction), with average CPU utilization: T42p 324 (58%), 8212 268 (93%), T60p 182 (59%), 9196 136 (59%), 9196 no AV 127 (58%).]
Measurement Configuration: T42p: Intel 2.0 GHz - A; 8212: Intel 2.8 GHz - D; T60p: Intel 2.16 GHz - A; 9196: Intel 2.66 GHz - A
9.20.5
The following chart compares two types of routing based on a value in the SOAP header for a Web Services scenario. In both cases the value retrieved from the header is used to determine the target service endpoint. The Route on Header mediation selects the service endpoint by routing to a hard-wired callout node based on the header value extracted in a filter primitive. For each alternative endpoint, a user would need to wire in additional nodes for the filter primitive to access.
In contrast, the dynamic endpoint lookup mediation uses the value from the header (accessed by the endpoint lookup primitive itself) to look up the endpoint from a WSRR repository. This value is cached by WESB, so the performance data below does not show the cost of the WSRR lookup; it shows the performance of routing to the target service using the previously cached endpoint.
The chart shows that the cost of using the dynamic endpoint lookup primitive to route, rather than wiring in alternative targets (a less flexible approach), is minimal.
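To make the caching behavior concrete, the following is a minimal sketch, in plain Java, of the look-up-then-cache pattern described above. The Registry interface is a hypothetical stand-in for the WSRR query performed by the endpoint lookup primitive; the real caching happens inside WESB, not in application code.

    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a cached endpoint lookup. Registry stands in for the
    // WSRR query made by the endpoint lookup primitive.
    public class CachedEndpointLookup {
        interface Registry {
            String lookupEndpoint(String headerValue); // expensive remote query
        }

        private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        private final Registry registry;

        public CachedEndpointLookup(Registry registry) {
            this.registry = registry;
        }

        public String endpointFor(String headerValue) {
            // Only the first request for a given header value pays the lookup
            // cost; subsequent requests route using the cached endpoint.
            return cache.computeIfAbsent(headerValue, registry::lookupEndpoint);
        }
    }

Because later requests are served from the in-memory cache, the measured overhead of dynamic routing versus hard-wired targets stays small.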
[Chart: requests per second for Route On Header versus dynamic endpoint lookup routing at message sizes Base in/Base out, 10 in/Base out, and 10 in/10 out; CPU utilization 85-98%; 4 cores.]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
9.20.6
In this study two WESB mediations (Transform Value and Route on Body) were driven by an increasing client load to assess the following scaling characteristics:
1. Horizontal Client Scaling: an initial load of x clients, each making y requests per second, is increased by adding more clients (increasing x).
2. Vertical Client Scaling: an initial load of x clients, each making y requests per second, is increased by speeding up the clients (increasing y).
Warm-up periods were applied for all of the measurements described below to ensure that the code had settled to a consistent level of performance.
All client scaling measurements were run with a message size combination of Base/10.
For details of the mediations and request/response sizes, see sections 11.3 and 11.4. All data is obtained using Web services bindings on a 4 core WESB server machine with Hyper-Threading (HT) disabled. For details of the topology used, see section 11.1.
[Chart: XformValue horizontal client scaling; requests per second and server CPU percentage versus number of clients (up to 2500).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C
The following chart shows that CPU consumption per request remained consistent across the evaluation (apart from a larger value at the lowest throughput measurement, which was probably skewed by timer tasks). Response time increases in a linear fashion until the server system approaches CPU saturation; at this point any further increase in clients has a more direct impact on latency.
[Chart: XformValue horizontal client scaling; response time (seconds) and CPU per request (seconds) versus number of clients (up to 1600).]
Measurement Configuration: Web Services Client; WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
The next two charts show that server CPU consumption, request rates, and response times for the Route On Body mediation result in a similar profile to the XformValue evaluation above.
[Chart: Route On Body horizontal client scaling; requests per second and server CPU percentage versus number of clients (up to 1800).]
[Chart: Route On Body horizontal client scaling; response time (seconds) and CPU per request (seconds) versus number of clients (up to 1600).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C
[Chart: XformValue vertical client scaling; requests per second and server CPU percentage versus per-client request rate (logarithmic scales).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
The next chart shows that response times grow progressively, with a sharp increase at the CPU saturation point. CPU per request is reasonably flat apart from the initial spike evident in some of the scaling tests at very low utilization.
[Chart: XformValue vertical client scaling; response time (seconds) and CPU per request (seconds) versus per-client request rate (logarithmic scale).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
The next two charts show that the Route On Body results for vertical scaling produced a similar profile to the XformValue vertical tests above.
[Charts: Route On Body vertical client scaling; requests per second and server CPU percentage, and response time (seconds) and CPU per request (seconds), versus per-client request rate (logarithmic scales).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
9.20.7
The results shown in this section compare local and remote bindings using the same hardware configuration and the Contact Manager workload. For remote bindings, a total of 3 JVMs are used, 2 of which are WPS instances while the third JVM hosts the Messaging Engine (not a factor in this study). The SAP Emulator module runs on the first WPS instance, and the Contact Manager and Clarify Emulator modules run on the other WPS instance. Therefore, the remote binding between the SAP Emulator module and Contact Manager module crosses the boundary between two separate WPS instances. There are two key findings in this study.
There is a significant throughput difference between local and remote bindings. The
throughput of Contact Manager using the local Synchronous SCA binding is 198 BTPS,
over 3.1x better than the remote Synchronous SCA binding. The difference between
local and remote Web Services bindings is smaller, but still significant. The throughput
of Contact Manager with an optimized local Web Services binding is 110 BTPS,
compared with 88 BTPS for a remote Web Services binding, a difference of 25%.
There is significant benefit due to local Web Services binding optimization, as discussed
in Section 4.5.5, if the Web Services target is hosted on the same JVM. The optimized
throughput of 110 BTPS is 15% higher than the unoptimized throughput of 96 BTPS.
[Chart: ContactManager on Windows 2000, local versus remote bindings; BTPS for SCA Sync and Web Services bindings with 1 WPS JVM versus 2 WPS JVMs; all measurements at 100% CPU utilization.]
Measurement Configuration: WebSphere Process Server; DB2; Intel 2.8GHz A
3 roles, 1 managed and 2 not managed, so that 2 relationship cross references are
created
1 service call
10.2.1
In this implementation of the Contact Manager workload, the Contact Manager Application
receives Business Objects (BOs) from the SAP Client Module via synchronous cross-module
SCA invocation, i.e., synchronously invoking an import bound to the corresponding export with
an SCA binding. Its first task is to transform the input BOs from the SAP format to a Generic
format via an Interface Map SCA Component. These generic BOs are then passed to a Business
Process component which contains logic responsible for determining whether the Business Event
requires creation of a new Contact, or updating an existing one and then routing the event to the
destination application. For all of the Business Events measured, a new contact was created. On
the way to the destination, the BO must be mapped again from generic format to the format
understood by the destination application. The destination application, simulated by the Clarify
Client Module, is also invoked via a cross-module, synchronous SCA binding. This module
simulates destination application work, including generation of a new unique identifier, and then
returns a modified BO to the Contact Manager Module. This return BO is mapped again from
Clarify to Generic format before the response is returned to the SAP Client Module.
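Reduced to its essentials, the flow is a map / invoke / map sequence. The sketch below is illustrative plain Java; the BO classes and interfaces are hypothetical stand-ins for the generated Interface Map components and SCA bindings.

    // Hedged sketch of the Contact Manager map / invoke / map flow; all type
    // and method names are illustrative, not the actual generated artifacts.
    public class ContactManagerFlow {
        static class SapBO {}
        static class GenericBO {}
        static class ClarifyBO {}

        interface Maps {
            GenericBO sapToGeneric(SapBO in);         // SAPToGeneric Interface Map
            ClarifyBO genericToClarify(GenericBO in); // GenericToClarify Interface Map
            GenericBO clarifyToGeneric(ClarifyBO in); // ClarifyToGeneric Interface Map
        }

        interface ClarifyService {
            ClarifyBO create(ClarifyBO in);           // synchronous cross-module SCA call
        }

        GenericBO handle(SapBO request, Maps maps, ClarifyService clarify) {
            GenericBO generic = maps.sapToGeneric(request);
            ClarifyBO outbound = maps.genericToClarify(generic);
            ClarifyBO result = clarify.create(outbound); // simulated destination work
            return maps.clarifyToGeneric(result);        // mapped back before the reply
        }
    }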
[Figure: Contact Manager module structure; the SAP Emulator Module invokes the Contact Manager Module (SAPToGeneric Interface Map, Contact Manager Process, GenericTOClarify and ClarifyTOGeneric Interface Maps, with MAP, REL, and BMK components), which invokes the Clarify Emulator Module; all cross-module invocations use synchronous SCA bindings.]
10.2.2
SCA components may expose their interfaces as Web Services via the Web Services binding. This capability is modeled for performance purposes by changing the synchronous SCA binding between the SAP Emulator Module and ContactManager Module to a Web Services binding, as depicted in Figure 2. For measurement purposes, the Web Services client can be either local or remote; in the remote case the client resides on a different physical machine from the remainder of the application.
[Figure 2: Contact Manager module structure with the SAP Emulator Module invoking the Contact Manager Module over a SOAP/HTTP Web Services binding; the Clarify Emulator Module is still invoked via a synchronous SCA binding.]
10.3 Banking
10.3.1
[Figure: Banking workload; a Transaction Generator drives the Banking business process via JMS, and the process invokes Java services (POJOs) synchronously or asynchronously.]
The workload setup consists of a Transaction Generator, which generates the load, and a Banking
process, which contains a scenario and outbound services. The Banking measurement run starts
when the workload driver places a large number of mortgage request instances onto a JMS queue.
Instances of the banking process are started via JMS messages. A Banking measurement run
concludes when the workload driver determines that all process instances have completed
processing.
A business transaction in this workload is a completed mortgage loan.
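As a rough illustration of how a driver can start process instances this way, the sketch below uses the standard JMS 1.1 API to put request messages on a queue. The JNDI names, payload, and request count are hypothetical; the actual workload driver is not shown in this report.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    // Hedged sketch of a JMS workload driver; JNDI names and payload are illustrative.
    public class MortgageRequestProducer {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/BankingCF"); // hypothetical
            Queue queue = (Queue) ctx.lookup("jms/MortgageRequests");               // hypothetical

            Connection conn = cf.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                for (int i = 0; i < 10000; i++) { // "a large number of mortgage request instances"
                    TextMessage msg = session.createTextMessage("<mortgageRequest id=\"" + i + "\"/>");
                    producer.send(msg);
                }
            } finally {
                conn.close(); // closes the session and producer as well
            }
        }
    }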
10.3.2
Banking Scenarios
The Banking scenarios differ in the setting of the transactional behavior of the invoke activities.
When using the synchronous SCA binding, the process component wired to sync services has the
transactional behavior flag on invokes set to commit after. When using the SCA asynchronous
or JMS binding, the process component wired to async services has the transactional behavior
flag on invokes set to participates.
The BPEL process is shown in the following diagrams:
Loop2
Loop3
Invoke activities
1 receive activity
1 reply activity
1 correlation set
10.3.3
Banking Services
Depending upon which binding option is used, the Banking process component is wired in one of the following fashions:
BankingProcessJMS: Banking process wired to a JMS MDB using an import with a JMS binding (transactional behavior flag set to participates).
BankingProcessJavaSync: Banking process wired to a synchronous POJO (transactional behavior flag set to commit after).
BankingProcessJavaAsync: Banking process wired to an asynchronous POJO (transactional behavior flag set to participates).
BankingProcessEJBSOAP: Banking process wired to an EJB session bean wrapped as a SOAP web service.
BankingProcessEJB: Banking process wired to an EJB session bean using a self-written mapper. This is required because business process components always have w-typed (WSDL-typed) references, a BPEL restriction, while session bean imports always have j-typed (Java-typed) interfaces. The self-written mapper mediates between the j-typed and w-typed interfaces by calling the session bean import, and also handles data mapping.
The diagram which follows illustrates these choices. Note that in this report, measurements are shown only for the JMS binding.
Overview
The SOABench 2008 workload is used in numerous studies in this report. It is an implementation
of the SOABench 2008 specification. SOABench 2008 replaces an earlier version, SOABench
2005, which was used in previous editions of the BPM Performance Report. Similar to the 2005
version, the 2008 version models the business processes of an automobile insurance company and
is intended to evaluate the performance of a distributed application implemented using a Service
Oriented Architecture (SOA).
The 2008 implementation extends the scope of the 2005 version in several ways. The Automated Approval (microflow only) scenario performs more synchronous service calls than the previous version. The Manual Approval (microflow + macroflow pattern) scenario in the previous version is now implemented in two ways: an OutSourced scenario which does claim approval via asynchronous Web Service calls, and an InHouse scenario which uses human tasks to approve claims. In addition, the InHouse scenario divides work among users and groups, adds think time to user activity in human tasks, and tracks response time of human task actions as well as recording throughput. This makes the InHouse scenario very useful for evaluating response time and throughput across a range of active concurrent users. Finally, the 2008 version also includes the use of preloaded Process Choreography tasks in both the OutSourced and InHouse scenarios.
The following diagram illustrates the workload architecture flow.
10.4.2
One of the modes of operation for SOABench 2008 in handling insurance claim requests is using
automated approval. No human or asynchronous tasks take place in this scenario; the flow is
implemented as a microflow that makes synchronous service invocations. All of the service
invocations are to service providers that return cached responses; this prevents bottlenecks in the
service providers while exercising the process server.
A claim request is sent to the HandleClaimMicro business process which performs an operation
called CreateClaim followed by FraudCheck. This scenario then follows the FastpathApproval
path which performs synchronous services calls for ApproveClaim, InformPolicyHolder, and
CompleteClaim. The process finishes by sending a response back to the requestor.
The Business Object (BO) size for the input request is variable. By default, a 3 KB request size is
used. The BO size for the reply is fixed at 3 KB.
The BPEL process is shown in the following diagram.
1 Receive
1 Reply
1 Choice
10.4.3
The SOABench 2008 OutSourced scenario is one of two scenarios that utilize long running processes (macroflow) for manual approval of insurance claims. OutSourced mode uses both a microflow and a macroflow; the microflow is the same process shown for Automated Approval mode above, but in this mode the logic does not follow the fast path approval path. Instead a long running process is invoked, which performs claim approval via asynchronous Web Service calls.
[Figure: OutSourced approval BPEL process; 1 parallel activity.]
10.4.4
The SOABench 2008 InHouse scenario is one of two scenarios that utilize long running processes
(macroflow) for manual approval of insurance claims. InHouse Mode uses both a microflow and
a macroflow; the microflow is the same process shown for Automated Approval Mode above, but
in this mode the logic does not follow the fast path approval path. Instead it invokes a long
running business process called HandleClaimHuman.
As in the Automated Approval Scenario, all of the service invocations are to service providers
that return cached responses which prevents bottlenecks in that area while exercising the process
server.
Claims enter the system via client requests to the HandleClaimMicro process. Synchronous web service invocations are then made to CreateClaim, FraudCheck, and RecoverVehicle. No human or asynchronous tasks take place in the HandleClaimMicro process except for an invocation of InvokeInHouseLong for the InHouse claim processing workload. This process finishes by sending a response back to the requestor, but the claim is not complete until the long running process invoked by InvokeInHouseLong is finished.
Before running this scenario the system is preloaded with Insurance claim requests in various
stages of completion. The insurance claims are assigned equally to regions. Human task
processing is done by users belonging to a single region and those users can only process
insurance claims from their region which is enforced via authentication. Within a region, users
are divided into 2 groups, adjusters and underwriters. Of the four human tasks required to
complete an insurance claim, two are done by adjusters and two are done by underwriters.
Users query existing processes for a list of work that they can perform. A work item is claimed (selected from the list) and then completed by the user. Users think between the query, claim, and complete activities. The think time is random but averages a total of 180 seconds per human task. The time a user waits for responses to their human task queries, claims, and completes is recorded as response time. The rate at which insurance claims are completed is the throughput. Once an entire insurance claim is finished, another is added to the region to maintain its work at the preloaded level.
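The query / claim / complete loop with think time can be pictured with the Java sketch below. The TaskList interface is a hypothetical stand-in for the human task API the real workload drives, and the think-time distribution is illustrative; only the 180-second average per task comes from the workload description.

    import java.util.Random;

    // Hedged sketch of a simulated user in the InHouse scenario.
    public class SimulatedUser implements Runnable {
        interface TaskList {
            String query();           // list available work, return a work item id
            void claim(String id);    // claim the work item
            void complete(String id); // complete the work item
        }

        private final TaskList tasks;
        private final Random random = new Random();

        public SimulatedUser(TaskList tasks) {
            this.tasks = tasks;
        }

        // Three pauses per task; a uniform draw averaging 60 s each, 180 s in total.
        private void think() throws InterruptedException {
            Thread.sleep((long) (random.nextDouble() * 120_000));
        }

        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String id = tasks.query(); // waits here are recorded as response time
                    think();
                    tasks.claim(id);
                    think();
                    tasks.complete(id);
                    think();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }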
The BPEL for the HandleClaimMicro process is shown in the Automated Approval section. The
path to InvokeExternalLong contains the following activities:
The second, long running, process, named HandleClaimHuman, is called via InvokeInHouseLong. Early in this process three parallel activities take place: one asynchronous, one-way web service invoke and two human tasks performed by users in the adjusters group. When all three activities complete, the process continues to the two-way UpdateClaim web service invocation, followed by (for this scenario) two human tasks called FirstApprovalTask and SecondApprovalTask which are performed by users in the underwriters group. Upon completion of SecondApprovalTask, all claims for this scenario take the approval path, where three more two-way calls to web services are performed to complete the claim and the process.
The HandleClaimLongExternal process contains the following elements:
4 Human Tasks
1 parallel activity
Overview
The SOABench 2005 workload was used in previous BPM performance reports; the description
is included in this report as a bridge since this report contains the initial set of measurements for
the SOABench 2008 workload (described above).
The SOABench 2005 workload is an implementation of the SOABench 2005 specification and
models the business processes of an automobile insurance company. SOABench 2005 is intended
to evaluate the performance of a distributed application implemented using a Service Oriented
Architecture (SOA). SOABench 2005 uses a driver that produces a complex workload similar to
a real production system. The complex driver workload is made up of several subset technologies
called facets which can be included or excluded from performance evaluations. Examples of
SOABench 2005 facets include Services (use of service components), Mediation (use of
mediation to transform requests and responses), and Choreography (application implementation
using service choreography).
By combining facets, SOABench 2005 implements two aspects of the IT systems of an insurance company called SOAAssure. The first is the Claims application, which combines the
Choreography and Services facets to process insurance claims. The second is realized using the
Mediation and Services facets and provides a third-party gateway which enables another
company to establish whether coverage exists for an existing policy. The following diagram
illustrates the workload architecture flow.
[Figure: SOABench 2005 architecture. A SOABench Client simulates service requestors and drives two paths: Submit Claim requests go to Process Choreography Integration, containing the Handle Claim process (macroflow and microflow), a Fraud Check SCA component, a Claim Approval business rule, and a Human Tasks Simulator with Adjuster business data; Check Coverage requests go to the Enterprise Service Bus mediations. Both paths invoke claim service implementations (Web services) on back-end service providers.]
The SOABench 2005 Client can drive the workload with mediation or business process claim
requests. The minimum request and response size is 3 KB but this can be increased by the user.
The client driver also provides for an infrastructure mode to make interactions with the backend
Service providers trivial. The Human Tasks Simulator handles both adjuster and underwriter
tasks generated during the Choreography facet manual approval process.
10.5.2
One of the workloads in the SOABench 2005 Choreography facet is the handling of an insurance claim using automated approval. No human or asynchronous tasks take place in this scenario; the flow is implemented as a microflow. A claim request is sent to a business process which performs an operation called HandleClaim. HandleClaim does Submit Claim to create the claim, checks the claim for validity via FraudCheck_SCA, then approves and invokes the Complete Claim operation. The process finishes by sending a response back to the requestor.
The BPEL process is shown in the following diagram.
1 java invoke
10.5.3
Another workload in the SOABench 2005 Choreography facet is the handling of an insurance claim using manual approval. Depending on the claim amount, either 1 or 2 human tasks are performed. For data in this report the second task occurs for 40% of claim requests. The workload starts in the process used in the Automated Approval scenario (a microflow), as described in the previous section. A claim request is sent to the process, which performs HandleClaim. HandleClaim does Submit Claim to create the claim, skips the claim validity check, then calls a long running (macroflow) process to perform more work on the claim.
The long running process does a fraud check on the claim via FraudCheck_SCA. A claims adjuster also looks at the claim via the Adjuster human task, and the claim is updated through a web service call. For the workload measured, all claims are marked valid and then checked by a business rule to determine if an underwriter needs to evaluate the claim. Forty percent of the claims are checked by the Underwriter human task. At this point all claims are processed for claim amount and approved using 2 more web service calls. The long running process then calls back the microflow process to perform the FinishClaim operation, which performs a web service call to complete the claim.
An adjuster and underwriter simulator is used to process human tasks for the long running process.
The BPEL process is shown in the following diagram.
1 java invoke
2 process calls
2 java snippets
JMS bindings
MQ JMS bindings
MQ bindings
The tests make use of the mediations and Web Services from the SOABench 2008 workload.
SOABench 2008 is a workload intended to evaluate the performance of a distributed application
implemented using a Service-Oriented Architecture. For a description of the SOABench 2008
workload, please see section 11.3.
[Figure: Web services topology; 50 HTTP clients drive a WESB mediation hosted on WebSphere 7.0, which invokes a SOABench 2008 service.]
11.1.1
The Fan Out mediation allows you to iterate over a repeating element in the request message. On each iteration, the flow invokes a service and then uses a Message Element Setter to update the shared context with some data from the response; the shared context was created on the Input node. The Fan In mediation then waits for all iterations to complete before using an XSLT mediation to create a response message, which it returns.
This test was then executed with different request messages so that we would get a different
number of Fan Out iterations. It was executed with requests that would result in 1, 2 and 4 Fan
Out iterations. Note that each iteration is run sequentially, rather than in parallel.
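The sequential fan-out / fan-in control flow can be summarized with the short Java sketch below. The Service interface and the final aggregation are stand-ins for the Service Invoke and XSLT primitives; this illustrates the control flow only, not the mediation implementation.

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of the sequential Fan Out / Fan In control flow.
    public class FanOutFanIn {
        interface Service {
            String invoke(String element); // stand-in for the Service Invoke primitive
        }

        public String process(List<String> repeatingElements, Service service) {
            List<String> sharedContext = new ArrayList<>(); // models the shared context
            for (String element : repeatingElements) {      // one sequential iteration per element
                String response = service.invoke(element);
                sharedContext.add(response);                // Message Element Setter step
            }
            // Fan In: all iterations are complete; build the response (XSLT step stand-in)
            return String.join(",", sharedContext);
        }
    }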
[Figure: Fan Out scenario; 50 HTTP clients drive a flow of Fan Out, Service Invoke, Message Element Setter, Fan In, and XSLT mediations hosted on WebSphere 7.0, which invokes a SOABench 2008 service.]
The tests use a standalone JMS producer and consumer. The JMSPerfHarness workload program is used for this, as it can be configured to run standalone JMS producers and consumers and to measure the rate at which messages are processed by the consumers. The producer and consumer are within the same JVM and therefore co-located on one machine.
11.2.1
[Figure: JMS topology; a JMS producer sends messages to a JMS queue behind a JMS export, the WESB XSLT transformation mediation processes each message, and a JMS import delivers it to an outbound JMS queue read by a JMS consumer; DB2 hosts the messaging data store.]
11.2.2
The MQ JMS and MQ bindings are both used to connect to an MQ Queue Manager. Messages are delivered into WESB from the MQ inbound queue and sent to an MQ outbound queue. No internal SIB queues are used in this scenario. The MQ Queue Manager is deployed on the same machine as the WESB server.
[Figure: MQ topology; a JMS producer and a JMS consumer exchange messages through an MQ Queue Manager, with the mediation sitting between an MQ JMS export and an MQ JMS import.]
11.3.1
Transformation Mediations
These are mediations which transform requests and in some cases responses. There are various
levels of complexity of transformation possible.
XSLT Value transform mediation
Transforms the value of a single element in the request message using XSLT.
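For reference, applying such a stylesheet in plain Java via JAXP looks roughly like the sketch below. The file names are hypothetical, and the real mediation applies the transform inside the WESB runtime rather than in application code.

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // Hedged sketch: apply a single-element value transform with JAXP.
    // valueTransform.xsl and request.xml are hypothetical file names.
    public class ValueTransform {
        public static void main(String[] args) throws Exception {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource("valueTransform.xsl"));
            transformer.transform(new StreamSource("request.xml"), new StreamResult(System.out));
        }
    }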
XSLT Namespace transform mediation
Transforms request and response messages from one schema to another using XSLT. The
schemas are largely the same but the name of an element differs and the two schemas have
different namespaces.
XSLT Schema transform mediation
Transforms request and response messages from one schema to another using XSLT. The
schemas are completely different but contain similar data which is mapped from one to the other.
In addition to the transform, a value from the request is transferred to the response by storing it in a context header.
Message element setter mediation
Transforms the value of a single element in the request message using the Message Element
Setter primitive.
Business Object Mapper mediation
Uses the Business Object Mapper mediation to map the entire body of the request into a new
Business Object.
11.3.2
Routing Mediations
These are mediations which route requests to different services based on content.
Route on header mediation
Route the request based on the presence of a string in the SOAP or JMS header. The Web
Services workload does not use any standard headers, so we use an optional one called
Internationalization Context. The JMS workload introspects the JMSCorrelationId header field.
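A routing decision of this kind reduces to a simple test on a header field. The sketch below shows the JMS variant in plain Java; the routing rule is invented for illustration, the endpoint names simply reuse service names from this chapter, and the real mediation performs the test in a filter primitive inside WESB.

    import javax.jms.JMSException;
    import javax.jms.Message;

    // Hedged sketch of routing on the JMSCorrelationID header; the rule and
    // endpoint names are illustrative only.
    public class HeaderRouter {
        public String selectEndpoint(Message request) throws JMSException {
            String correlationId = request.getJMSCorrelationID();
            if (correlationId != null && correlationId.startsWith("LEGACY")) { // hypothetical rule
                return "LegacySureService";
            }
            return "SOAAssureService";
        }
    }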
Route on body mediation
Route the request based on the content of a field in the body of the request.
Service Invoke mediation
Uses the Service Invoke primitive to invoke a Web Service, and then returns the response.
[Figure: Service Invoke scenario; 50 HTTP clients drive the Service Invoke mediation hosted on WebSphere 7.0, which invokes a SOABench 2008 service.]
11.3.3
Composite mediation
The composite mediation consists of four mediation primitives wired together inside a single mediation module. This avoids inter-module call overhead, but at the expense of the ability to administer the pieces of the overall mediation individually. The Authorisation mediation is a routing mediation which checks a password field in the request body.
No logging is performed in either the JMS or Web Services implementations of this scenario.
[Figure: composite mediation; 50 HTTP clients drive a single module containing the Authorisation, Logging, Route on Body, and Transform Schema primitives hosted on WebSphere 7.0, routing to the SOABench 2008 SOAAssure and LegacySure services.]
11.3.4
Chained mediation
The chained mediation performs the same function as the composite mediation but the four types
of mediation primitives are each packaged as separate mediation modules, which are then joined
together using bindings.
[Figure: chained mediation; 50 HTTP clients drive the Authorisation, Logging, Transform, and Route on Body mediation modules joined by bindings, hosted on WebSphere 7.0 and routing to the SOABench 2008 SOAAssure and LegacySure services.]
SOABench 2008 client message sizes:

Message size    SOAP Request    SOAP Response    JMS payload
Base            1.8 K           0.8 K            1.2 K
10              9.1 K           8.3 K            8.5 K
100             107.3 K         106.5 K          106.7 K
orders for furniture, scheduling the orders for shipment to the customer, shipping the
orders to the customer, and maintaining the inventory of the company.
Order Processing contains 25 business integration modules, 2 business integration libraries, 57
interfaces, 150 data types and makes use of the full spectrum of SCA component kinds available
in WPS.
12.4 BPM@Work
BPM@Work is a Business Process Modeler workload modeling a software development
storyline. It contains a single, complex business process that results in 11 independent process
models that get installed via direct deploy from Modeler to the WPS server.
[Table: common server configuration settings and values; setting names lost in extraction.]
These settings are common for measurements at all core counts and numbers of nodes, except for the following additional changes that were made for vertical scaling measurements:
WebContainer Thread Pool Min, Max: 100, 100
com.ibm.websphere.webservices.http.maxConnection: 50
Production Template
Security disabled
Business Process support established with bpeconfig.jacl (note that this sets the Data sources > BPEDataSourceDb2 > WebSphere Application Server data source properties statement cache to 300)
PMI disabled
HTTP maxPersistentRequests to -1
GC policy set to Xgcpolicy:gencon (see table below for nursery setting Xmn)
Remote DB2 databases (connection type 4) used for BPE, SIB System, and SIB BPC
databases
[Table: tuning variations by core count for the SOABench 2008 Automated Approval and OutSourced Approval scenarios, covering JVM heap (1280/640 MB) and nursery (768 MB) sizes, thread and connection pool sizes, the SYSTEM ME database connection pool maximum, the BPEInternalActivationSpec batch size (10), the SOABenchBPELMod2_AS batch size (32), message pool size (4000), and allowPerformanceOptimizations; most row and column labels were lost in extraction.]
The DB2 database server has 3 databases defined for use by the WPS server. The database logs
and tablespaces were spread across a RAID array to distribute disk utilization. The database used
for the BPC.cellname..Bus data store was not tuned. The SCA.SYSTEM.cellname.BUS database
and the BPE database were tuned as follows.
The SCA.SYSTEM.cellname.BUS database:
o db2 update db cfg for sysdb using logbufsz 512 logfilsiz 8000 logprimary 20 logsecond 20 auto_runstats off
The BPE database:
o db2 update db cfg for bpedb using logbufsz 512 logfilsiz 10000 logprimary 20 logsecond 10 auto_runstats off
o A WPS Server which runs the processes involved in the application scenario.
o A Tivoli Directory Server with LDAP database for user authentication. This ran on the support system below with the client controller.
o 2 support systems which each run workload generators (client agents) under the direction of a single client controller. One support system handles asynchronous service requests and the other handles synchronous service requests made by the business processes running on the WPS Server.
The database system was tuned in a similar fashion as for the SOABench 2008 OutSourced scenario measurements. In addition, unused indexes were deleted per the DB2 advisor.
The client systems were tuned with two considerations in mind. The first was maintaining load on the WPS server running the workload, which involved Java, thread pool, and work manager tuning. The second was to avoid problems preloading the numerous process tasks into the system. The latter involved increasing timeouts and resources to maintain connectivity during the preloading.
Client tuning:
o Transaction Service > tran lifetime timeout: 9000
o Transaction Service > async response timeout: 9000
o Transaction Service > client inactivity timeout: 9000
o Transaction Service > max tran timeout: 9000
o Java > max Heap: 1280
o Java > -Xgcpolicy: gencon
o Java > -Xmn: 512M
o Java Custom > com.ibm.websphere.webservices.http.maxConnection: unlimited
o Java Custom > com.ibm.ws.webservices.writeTimeout: 9000
o Java Custom > com.ibm.ws.webservices.readTimeout: 9000
o port 9080 > TCP inbound > Max open connections: 30000
o port 9080 > TCP inbound > Inactivity timeout: 60
o port 9080 > HTTP inbound > Max persistent req: unlimited
o port 9080 > HTTP inbound > read timeout: 6000
o port 9080 > HTTP inbound > write timeout: 6000
o port 9080 > HTTP inbound > persistent timeout: 3000
o Thread Pool Default min, max: 50 to 300
o Thread Pool ORB min, max: 10 to 100
o Thread Pool WebContainer min, max: 100 to 400
o Thread Pool TCPChannel min, max: 5 to 50
For the system running the directory server, the following setting was updated through the LDAP server admin console:
o Server Administration > Manage Server properties > Search Settings > Search Size Limit: "unlimited"
The WPS server tuning parameters for this workload are as follows.
o Transaction Service > tran lifetime timeout: 900
o Transaction Service > async response timeout: 900
o Transaction Service > client inactivity timeout: 900
o Transaction Service > max tran timeout: 900
o Business Flow Manager > Allow Perf optimizations: yes
o Business Flow Manager > Message Pool Size: 4000
o Business Flow Manager > max age for stalled messages: 360
o Business Flow Manager > max process time on thread: 360
o Business Flow Manager > Intertransaction cache size: 400
o Business Flow Manager > DataCompressionOptimization: false
o Java > Heap: 1280
o Java > -Xgcpolicy: gencon
o Java > -Xmn: 768M
o Java Custom > com.ibm.websphere.webservices.http.maxConnection: 150
o Java Custom > com.ibm.ws.webservices.writeTimeout: 9000
o Java Custom > com.ibm.ws.webservices.readTimeout: 9000
o Java Custom > com.ibm.websphere.webservices.http.waitingThreadsThreshold
o port 9080 > TCP inbound > pool > WebContainer: yes
o port 9080 > TCP inbound > Max open connections: 20000
o port 9080 > TCP inbound > Inactivity timeout: 60
o port 9080 > HTTP inbound > Max persistent req: unlimited
o port 9080 > HTTP inbound > read timeout: 60
o port 9080 > HTTP inbound > write timeout: 60
o port 9080 > HTTP inbound > persistent timeout: 60
o Thread Pool Default: 50 to 200
o Thread Pool ORB: 10 to 50
o Thread Pool WebContainer: 10 to 300
o Thread Pool TCPChannel: 5 to 20
o connection pool BPE DB: 25 to 350
Security related tuning for WPS running the InHouse scenario includes the following database considerations:
o MAXAPPLS, which must be large enough to accommodate connections from all possible JDBC Connection Pool threads, and
o the default buffer pool sizes (number of 4K pages in IBMDEFAULTBP) for each database, which are set so that each pool is 256 MB in size (65,536 pages x 4 KB = 256 MB).
The following table shows the parameter settings used for this report.
Parameter Name       BPEDB Setting
APP_CTL_HEAP_SZ      144
APPGROUP_MEM_SZ      13001
CATALOGCACHE_SZ      521
CHNGPGS_THRESH       55
DBHEAP               600
LOCKLIST             500
LOCKTIMEOUT          30
LOGBUFSZ             245
LOGFILSIZ            1024
LOGPRIMARY           11
LOGSECOND            10
MAXAPPLS             90
MAXLOCKS             57
MINCOMMIT
NUM_IOCLEANERS
NUM_IOSERVERS        10
PCKCACHESZ           915
SOFTMAX              440
SORTHEAP             228
STMTHEAP             2048
DFT_DEGREE
DFT_PREFETCH_SZ      32
UTIL_HEAP_SZ         11663
IBMDEFAULTBP         65536
In addition to these database level parameter settings, several other parameters were also
modified using the WAS Admin Console, mostly those affecting concurrency (i.e., thread
settings).
o Database connection pool size for the BPEDB was increased to 60, and the statement cache size for the BPEDB was increased to 300.
o The maximum connections property for JMS connection pools was set to 40.
o Connectivity to the local database is via the DB2 JDBC Universal Driver Type 2 driver.
Tracing is disabled
Security is disabled
Java Heap size is fixed at 1280 MB for Windows and 1280 MB for AIX
Gencon garbage collection policy enabled, setting the nursery heap size to 1024 MB.
WebContainer Thread pool inactivity timeouts for thread pools set to 3500
Otherwise, unless specifically noted in the workload description, the default settings as supplied
by the product installer were used.
2 GB RAM
100Mbit Ethernet
Software
3 GB RAM
100Mbit Ethernet
Software
3.0 GB RAM
1Gbit Ethernet
Software
4 GB RAM
100Mbit Ethernet
Software
4 GB RAM
1Gbit Ethernet
Software
4 x 2.8GHz Pentium 4
Hyperthreading disabled
4 GB RAM
100Mbit Ethernet
Software
WPS 6.1.0
2MB L3 cache
3.5 GB RAM
1 Gbit Ethernet
Software
JMSPerfHarness
1.5 GB RAM
1 Gbit Ethernet
Software
3 GB RAM
L1 2 x 16 KB, L2 2 x 1 MB caches
100Mbit Ethernet
Software
1.3.10
Intel 2.93GHz A
Hardware
24GB RAM
L1 (Primary cache): 32K Instruction (I) + 32K Data (D) per processor, L2 (Secondary
cache): 8MB I+D per processor (4MB shared per 2 cores)
1 Gigabit Ethernet
Software
1.3.11
Intel 2.93GHz B
Hardware
24 GB RAM
1 Gigabit Ethernet
Software
WPS 7.0.0.1
1.3.12
Intel 2.93GHz C
Hardware
40 GB RAM
1 Gigabit Ethernet
Software
WESB 7.0.0.1
1.3.13
Intel 2.93GHz D
Hardware
24 GB RAM
1 Gigabit Ethernet
Software
1.3.14
Intel 3.0GHz A
Hardware
6 GB RAM, 4 MB L3 Cache
Software
1.3.15
Intel 3.0GHz - B
Hardware
6 GB RAM, 4 MB L3 Cache
Software
IBM WebSphere Process Server, 6.0.2.0 Build m0649.11 with 6.0.2-WS-WPS-ESBWinX32-CritFixes.zip packaged 13 DEC 2006
1.3.16
Intel 3.0GHz - C
Hardware
4MB L3 cache
4 GB RAM
back cache
1 Gbit Ethernet
Software
1.3.17
Intel 3.0GHz - D
Hardware
4MB L3 cache
3.5 GB RAM
1 Gbit Ethernet
Software
WebSphere MQ V6.0.2.2
1.3.18
Hardware
4.0 GB RAM
1Gbit Ethernet
Software
1.3.19
Intel 3.5GHz - A
Hardware
3 GB RAM
1 Gbit Ethernet
Software
1.3.20
Intel 3.5GHz - B
Hardware
16 GB RAM
1 Gbit Ethernet
Software
1.3.21
Intel 3.5GHz C
Hardware
10 GB RAM
1 Gb Ethernet
Software
1.3.22
Hardware
Hyperthreading disabled
16GB RAM
Software
1.3.23
Intel 3.67GHz - A
Hardware
3.25 GB RAM
1 Gbit Ethernet
Software
JMSPerfHarness
1.3.24
Hardware
Hyper-threading disabled
4 GB RAM
1 Gbit Ethernet
Software
1.3.25
Intel 3.67GHz - C
Hardware
2.0 GB RAM
1Gbit Ethernet
Software
1.3.26
Hardware
64GB RAM
1 Gb Ethernet
Software
AIX 5300-11-01-0944
WPS 7.0.0.1
1.3.27
Hardware
16GB RAM
1 Gb Ethernet
Software
AIX 5300-07-01-0748
1.3.28
Hardware
32GB RAM
1 Gb Ethernet
Software
AIX 5300-07-01
1.3.29
Hardware
16GB RAM
Software
AIX 5300-07-01-0748
1.3.30
PPC 4.2GHz - A
Hardware
64 GB RAM
1 Gbit Ethernet
Software
AIX 6.1.0.0
1.3.31
PPC 4.2GHz - B
Hardware
64 GB RAM
1 Gbit Ethernet
Software
AIX 6.1.0.0
1.3.32
Hardware
64GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.33
Hardware
64GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.34
Hardware
32GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.35
Hardware
64GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.36
Hardware
32GB RAM
1 Gb Ethernet
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
DB2 9.5 FP 3
1.3.37
Hardware
64GB RAM
Software
AIX 6100-00-03-0808
WPS 6.2.0
WebSphere MQ 6.0.2.5
1.3.38
Hardware
128 GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
DB2 9.7 FP 1
1.3.39
Hardware
12 GB RAM
Software
AIX 6.1
WPS 7.0.0.1
1.3.40
Hardware
12 GB RAM
Software
AIX 6.1
Appendix B References
1. WebSphere BPM Performance References
https://w3quickplace.lotus.com/QuickPlace/wasperf/PageLibrary852569AF00670F15.nsf/h_Toc/3648196DB48799C7852570EE00730294/?OpenDocument&Form=h_PageUI
2. WebSphere BPM Version 7.0 information center
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp
3. WebSphere Application Server Performance Best Practices and Resources
https://w3quickplace.lotus.com/QuickPlace/wasperf/Main.nsf/h_Toc/e600a81c8a82722085256efb000b5116/?OpenDocument
4. WebSphere Application Server Performance URL
http://www.ibm.com/software/webservers/appserv/was/performance.html
5. WebSphere Application Server 7.0 information center (including Tuning Guide)
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/welcome_base.html
6. Setting up a Data Store in the Messaging Engine
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc.nd.multiplatform.doc/tasks/tjm0005_.html
7. DB2 Best Practices for Linux, UNIX, and Windows
http://www.ibm.com/developerworks/data/bestpractices/?&S_TACT=105AGX11&S_CMP=FP
8. DB2 Version 9.7 Info Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp
9. DB2 Version 9.5 Info Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp
10. Redbook: WebSphere BPM v7 Production Topologies
http://www.redbooks.ibm.com/redpieces/abstracts/sg247854.html
11. Redbook: IBM WebSphere InterChange Server Migration to WebSphere Process Server
http://www.redbooks.ibm.com/redbooks/pdfs/sg247415.pdf