WebSphere Business Process Management 7.0.0.1
Performance Report
IBM Corporation
WebSphere Business Process Management Performance Team
March 2010
This publication is unclassified, but it is not intended for general or broad public circulation.
The purpose is to provide detailed performance data, best practices, and tuning information for
the products covered. The target audience is software services and technical support specialists.
The expected usage is to provide guidance in making rational configuration choices for proofs of
concept and for product deployments.
Though the content can be shared with customers, preferably in a one-on-one discussion, the
information is not intended as general sales material.
1 INTRODUCTION .......................................................... 1
1.1 OVERVIEW ............................................................ 1
1.2 ADDITIONS IN THIS REPORT ............................................ 3
1.3 SUMMARY OF KEY MEASUREMENTS ......................................... 4
1.4 DOCUMENT STRUCTURE AND USAGE GUIDELINES ............................. 6
1.4.1 Document Structure ................................................ 6
1.4.2 Measurement Usage Guidelines ...................................... 7
3.7.1 Use Asynchrony judiciously ........................................ 28
3.7.2 Set the Preferred Interaction Style to Sync whenever possible ..... 28
3.7.3 Avoid Asynchronous Invocation of Synchronous Services in a FanOut / FanIn Block ... 29
3.8 MEDIATION FLOW CONSIDERATIONS ....................................... 30
3.8.1 Use mediations that benefit from WESB optimizations ............... 30
3.8.2 Usage of XSLTs vs. BO Maps ........................................ 32
3.8.3 Configure WESB Resources .......................................... 32
3.9 LARGE OBJECT BEST PRACTICES ......................................... 33
3.9.1 Avoid lazy cleanup of resources ................................... 33
3.9.2 Avoid tracing when processing large BOs ........................... 33
3.9.3 Avoid buffer-doubling code ........................................ 33
3.9.4 Make use of deferred-parsing friendly mediations for XML docs ..... 33
3.10 WICS MIGRATION CONSIDERATIONS ...................................... 34
3.11 WID CONSIDERATIONS ................................................. 35
3.11.1 Leverage Hardware Advantages ..................................... 35
3.11.2 Make use of WAS shared libraries in order to reduce memory consumption ... 35
3.12 FABRIC CONSIDERATIONS .............................................. 35
3.12.1 Only specify pertinent context properties in context specifications ... 35
3.12.2 Bound the range of values for context keys ....................... 35
11.3 SOABENCH 2008 MEDIATION FACET ...................................... 221
11.3.1 Transformation Mediations ........................................ 221
11.3.2 Routing Mediations ............................................... 221
11.3.3 Composite mediation .............................................. 222
11.3.4 Chained mediation ................................................ 224
11.4 SOABENCH 2008 MEDIATION FACET MESSAGE SIZES ........................ 225
1 Introduction
1.1 Overview
This document is the fifth in a series of detailed performance reports for the WebSphere Business
Process Management (WebSphere BPM) product line. The report is authored by the IBM
WebSphere BPM performance team, with members in Austin, Texas; Böblingen, Germany; and
Hursley, England. It explores the performance characteristics of the following products:
- WebSphere Process Server (WPS)
- WebSphere Enterprise Service Bus (WESB)
- WebSphere Integration Developer (WID)
- WebSphere Business Monitor
- WebSphere Business Modeler
- WebSphere Business Services Fabric
These products represent an integrated development and runtime environment based on a key set
of Service-Oriented Architecture (SOA) and Business Process Management (BPM) technologies:
Service Component Architecture (SCA), Service Data Object (SDO), and Business Process
Execution Language for Web Services (BPEL). These technologies in turn build on the core
capabilities of the WebSphere Application Server (WAS) 7.0 product.
A short description of each product covered in this report follows:
WebSphere Business Monitor provides the ability to monitor business processes in real time,
providing a visual display of business process status, business performance metrics,
and key business performance indicators, together with alerts and notifications to key
users that enable continuous improvement of business processes.
WebSphere Business Modeler is IBM's premier business process modeling and analysis
tool for business users. It offers process modeling, simulation, and analysis capabilities
to help business users understand, document, and deploy business processes for
continuous improvement.
In addition to performance results, this document discusses the performance implications of the
supporting runtime environment, and describes best practices and tuning and configuration
parameters for the different software technologies involved.
We expect this report to be read by a wide variety of groups, both within IBM (development,
services, technical sales, etc.) and by customers. Please note that this document should not be
considered a comprehensive sizing or capacity planning guide, though it serves as a useful
reference for those activities.
The systems used to obtain measurements are intended to be representative mixes of potential
development and deployment systems running Windows, AIX, or Linux (note that there is a
separate performance report for WebSphere BPM products on z/OS). While we report results in
many cases on more than one hardware platform, this report is not intended to evaluate relative
hardware performance between platforms. Many configurations are run with some of the
processor cores disabled, hyperthreading disabled, or both. While these changes are noted on the
charts, the reader should take them into account before attempting any comparisons.
Finally, the workloads used to obtain measurements in this report are internal workloads (i.e., not
publicly available) that are designed to mimic customer usage patterns. Please see the workload
descriptions in this document for further information.
For those who are either considering or are in the very early stages of implementing a solution
incorporating these products, this document should prove a useful reference, both in terms of
best practices during application development and deployment, and as a reference for setup,
tuning and configuration information. It provides a useful introduction to many of the issues
influencing each product's performance, and can serve as a guide for making rational first choices
in terms of configuration and performance settings.
Similarly, those who have already implemented a solution utilizing these products might
use the information presented here to match, to the extent possible, their own workload
characteristics to those described in this report. By relating these characteristics to their
own workloads, the user is much more likely to gain insight into what performance they might
expect, what inhibitors to better performance may be present, and how their overall
integrated solution performance may be improved.
All of these products build on the capabilities of the WAS infrastructure, which runs on Java
Virtual Machines (JVMs), so BPM solutions also benefit from the tuning, configuration, and best
practices information for WAS and the corresponding platform JVMs (documented in the References
appendix). The reader is encouraged to use this report in conjunction with these references.
Please address questions or comments about this document to Mike Collins at
mcollin@us.ibm.com or Mike Collins/Austin/IBM.
1.2 Additions in This Report
The following directed studies are either added or enhanced relative to the 6.2.0 report:
- Partitioning large systems: the effect of utilizing a single instance vs. clustering,
  and the performance of a single cluster deployment pattern
- WPS performance for a 32-bit JVM on 32-bit and 64-bit Windows systems
- SMP scaling data that demonstrates outstanding vertical scaling on AIX systems, as shown
  by SOABench 2008 Automated Approval Mode 8-core scaling of 7.3x and 16-core scaling of
  11.9x, delivering throughput of over 2,000 transactions per second.
- Measurements on Red Hat Enterprise Linux 5.2 that show a throughput rate of 665
  transactions per second using SOABench 2008 Automated Approval Mode on an 8-core Intel
  system, an SMP scaling factor of 6.2x.
- Support for 10,000 concurrent users with sub-second response times for long-running
  processes, including Query Task, Claim Task, and Complete Task operations.
- Business Space response time improved by up to 55% relative to the 6.2.0.2-based Feature
  Pack, assessed using Human Workflow widgets.
- 2.7x faster deployment of the BPM@Work model from WebSphere Business Modeler to WPS
  7.0.0.1.
- Clean & Build response time for the Customer Service workspace shows a 45% improvement
  from version 6.2.0.
- Peak memory utilization while building the Customer Service workspace shows a 32%
  improvement compared with WPS 6.2.0.
- Response time to publish the Loan Processing workspace with Resources on Server shows a
  1.9x improvement compared with version 6.2.0.
- JAX-WS binding is now faster than JAX-RPC binding for Web Services.
- Development Best Practices: guidelines for solution developers that will lead to
  high-performing systems.
- WPS Performance Results: measurements for the SOABench 2008 Choreography Facet workload.
- WESB Performance Results: measurements for the SOABench 2008 Mediation Facet workload.
- WPS Core Workloads: a detailed description of the workloads used to measure the
  performance characteristics of WPS.
- WESB Core Workloads: a detailed description of the workloads used to measure the
  performance characteristics of WESB.
Data is presented for multiple hardware platforms, including POWER6, POWER7, Intel
Pentium IV Xeon, and Intel multi-core technologies. This is done to provide
representative coverage for WebSphere BPM production topologies. However, this data
should not be used to compare the relative performance of different hardware platforms.
The intent of this document is to show how the BPM stack performs on representative
configurations, not to compare hardware environments.
- Use a high-performance disk subsystem. In virtually any realistic topology, a server-class
  disk subsystem (e.g., a RAID adapter with multiple physical disks) is required on the
  tier(s) that host the message and data stores to achieve acceptable performance. This point
  cannot be overstated; the authors have seen many cases where the overall performance of a
  solution was improved by several factors simply by utilizing an appropriate disk subsystem.
- Set an appropriate Java heap size to deliver optimal throughput and response time. JVM
  verbosegc output will greatly help in determining the optimal settings. Further information
  is available in Section 4.4.2.
- Use DB2 instead of the default Derby DBMS. DB2 is a high-performing, industrial-strength
  database designed to handle high levels of throughput and concurrency, scale well, and
  deliver excellent response time.
- Tune your database for optimal performance. Proper tuning and deployment choices for
  databases can greatly increase overall system throughput. For details, see Section 4.5.10.
- Disable tracing. Tracing is clearly important when debugging, but the overhead of tracing
  severely impacts performance. More information is available in Section 4.5.1.
- For task and process list queries, use composite query tables. Query tables are designed
  to produce excellent response times for high-volume task and process list queries. For
  details, see Section 2.3.2.
- Use work-manager-based navigation to improve throughput for long-running processes. This
  optimization reduces the number of objects allocated, the number of objects retrieved from
  the database, and the number of messages sent for Business Process Choreographer
  messaging. For further information, see Section 4.5.6.1.
- Avoid overly granular transaction boundaries in SCA and BPEL. Every transaction commit
  results in expensive database and/or messaging operations. Design your transactions with
  care, as described in Section 3.6.
2.3 Modeling
2.3.1 Choose non-interruptible over interruptible (long-running) processes whenever possible

Use interruptible processes, a.k.a. macroflows or long-running processes, only when required
(e.g., for long-running service invocations and human tasks). Non-interruptible processes,
a.k.a. microflows or short-running processes, exhibit much better performance at runtime. A
non-interruptible process instance is executed in one J2EE transaction with no persistence of
state, while an interruptible process instance is typically executed in several J2EE
transactions, requiring that state be persisted in a database at transaction boundaries.

Whenever possible, utilize synchronous interactions for non-interruptible processes. A
non-interruptible process is much more efficient than an interruptible process since it does
not have to persist state in the backing database system.

A process is interruptible if the "Process is long-running" checkbox is set in the WebSphere
Integration Developer (WID) via Properties > Details for the process.

If interruptible processes are required for some capabilities, separate the processes such
that the most frequent scenarios can be executed in non-interruptible processes and
exceptional cases are handled in interruptible processes.
2.3.2 Choose query tables over standard query API for task list and process list queries

Query tables were introduced in WPS 6.2.0. They are designed to provide good response times
for high-volume task list and process list queries. Query tables offer improved query
performance:
- Improved access to work items reduces the complexity of the database query.
- Configurable high-performance filters on tasks, process instances, and work items allow
  for efficient filtering.
- Composite query tables can be configured to bypass authorization through work items.
- Composite query tables allow the definition of query tables that reflect the information
  displayed on task lists and process lists presented to users.
Query improvements due to Query Tables are shown in Section 9.6.1. For further information,
please see the references below:
WebSphere Process Server Query Table Builder
http://www.ibm.com/support/docview.wss?uid=swg24021440
Query Tables in Business Process Choreography in the WPS 7.0 Info Center:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/c6bpel_querytables.html
A business process and its individual steps should have business significance and not try to
mimic programming-level granularity. Use programming techniques like POJOs (Plain Old Java
Objects) or Java snippets for logic without business significance. This topic is discussed
further in the "Software components: coarse-grained versus fine-grained" paper, available here:
http://www.ibm.com/developerworks/library/ws-soa-granularity/index.html
2.4 Topology
2.4.1 Deploy appropriate hardware
It is very important to pick a hardware configuration that contains the resources necessary to
achieve high performance in a WebSphere BPM environment. Here are some key considerations
in picking a hardware configuration:
- Cores: Ensure that WPS and WESB are installed on a modern server system with multiple
  cores. WPS and WESB scale well, both vertically in terms of SMP scaling and horizontally
  in terms of clustering.
- Memory: WPS and WESB benefit from both a robust memory subsystem and an ample amount of
  physical memory. Ensure that the chosen system has server-class memory controllers and as
  large as possible L2 and L3 caches (optimally, use a system with at least a 4 MB L3
  cache). Make sure there is enough physical memory for all the applications (JVMs) expected
  to run concurrently on the system; 2 GB per WPS/WESB JVM is a rough rule of thumb.
- Disk: Ensure that the systems hosting the message and data stores, typically the database
  tiers, have fast storage. This means utilizing RAID adapters with writeback caches and
  disk arrays with many physical drives.
- Network: Ensure that the network is sufficiently fast to not be a system bottleneck. As an
  example, a dedicated Gigabit Ethernet network is a good choice.
- Virtualization: Take care when using virtualization such as AIX dynamic logical
  partitioning or VMware virtual machines. Ensure sufficient processor, memory, and I/O
  resources are allocated to each virtual machine or LPAR. Avoid over-committing resources.
We highly recommend to our readers the IBM Redbooks publication on WebSphere BPM 7.0
production topologies (http://www.redbooks.ibm.com/redpieces/abstracts/sg247854.html), a
comprehensive guide to selecting appropriate topologies for both scalability and high
availability. It is not the intent of this section to repeat content from that publication.
Rather, we distill some of the key considerations when trying to scale up a topology for
maximum performance.
2.4.4.1 Use the remote messaging and remote support deployment environment pattern for maximum flexibility in scaling
See link:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.wps.doc/doc/cpln_topologypat.html
This topology (formerly known as the "Golden Topology") prescribes the use of separate
clusters for applications, messaging engines, and support applications like the CEI (Common
Event Infrastructure) server and the Business Rules Manager. This allows independent control
of resources to support the load on each of these elements of the infrastructure.
Note: As with many system choices, flexibility comes with some cost. For example, synchronous
CBE (Common Base Event) emission between an application and the CEI server in this topology
is a remote call, which is heavier than a local call. The benefit is the independent ability to scale
the application and support cluster. We assume the reader is familiar with these kinds of system
tradeoffs, as they occur in most server middleware.
2.4.4.2 Single Server vs. Clustered Topology Considerations
In general, there are two primary reasons to consider when evaluating a move from a single
server configuration to a clustered topology: scalability / load balancing to improve overall
performance and throughput, and high availability / failover to prevent loss of service due
to hardware or software failures. Although these are not mutually exclusive, there are
considerations applicable to each. In this report, the focus is on the performance
(throughput) aspects of clustering, not on the high availability aspects.
When considering the tradeoffs between a single server and a clustered configuration, an
interesting study can be found in section 9.10 of this document, "Single Server vs. Clustered
WPS". Significant gains in throughput are measured with the workloads in this study due to
utilizing a clustered topology. It can be expected that most single server workloads that are
driving resources to saturation would benefit to some degree from moving to a clustered
topology.
- the number of requests that the target application can process at the same time (concurrency)
If each of these performance aspects of the target applications can be established, then a
rough estimate of the maximum throughput capacity can be calculated. Similarly, if average
throughput is known, then either of these two aspects can be roughly calculated as well. For
example, a target application that can process 10 requests per second with an average
response time of 1 second can process approximately 10 requests at the same time
(throughput × response time = concurrency).
The throughput capacity of target applications is critical to projecting the end-to-end
throughput of an entire application. Also, the concurrency of target applications should be
considered when tuning the concurrency levels of the upstream WPS-based components. For
example, if a target application can process 10 requests at the same time, the WPS components
that invoke this application should be tuned so that the number of simultaneous requests from
WPS at least matches the concurrency capabilities of the target. Additionally, overloading
target applications should be avoided, since such configurations will not result in any
increase in overall application throughput. For example, if 100 requests are sent to a target
application that can only process 10 requests at the same time, no throughput improvement
will be realized versus tuning such that the number of requests made matches the concurrency
capabilities of the target.
Finally, for service providers that may take a long time to reply, either as part of mainline
processing or in exception cases, do not utilize synchronous invocations that require a
response. This avoids tying up the WPS business process, and its resources, until the service
provider replies.
1. Java Heap Size Limits
The 32-bit address space imposes a practical heap size limit of around 1.4 GB for 32-bit
JVMs. The heap size limit is much higher on 64-bit JVMs, and is typically less of a gating
factor on modern hardware configurations than the amount of available physical memory.
2. Size of In-Memory Business Objects
Business Objects (BOs), when represented as Java objects, are much larger than in wire
format. For example, a BO that consumes 10 MB on an input JMS message queue may result in
allocations of up to 90 MB on the Java heap. The reason is that there are many allocations of
large and small Java objects as the BO flows through the adapters and WPS or WESB. A number
of factors affect the in-memory expansion of BOs:
- The BO may contain many small elements and attributes, each requiring a few unique Java
  objects to represent its name, value, and other properties.
- Every Java object, even the smallest, has a fixed overhead due to an internal object
  header that is 12 bytes long on most 32-bit JVMs, and larger on 64-bit JVMs.
- Java objects are padded in order to align on 8-byte or 16-byte address boundaries.
- As the BO flows through the system, it may be modified or copied, and multiple copies may
  exist at any given time during the end-to-end transaction. This means the Java heap must
  be large enough to host all of these BO copies for the transaction to complete
  successfully.
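To illustrate the per-object costs with a rough, hypothetical calculation: a single element
holding a 10-character string value can easily consume on the order of 100 bytes or more on
the heap (an object header for the element itself, separate String objects and underlying
character arrays for its name and value, plus alignment padding), compared with a few dozen
bytes in wire format.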
Note that certain adapters, like the Flat Files JCA Adapter, can be configured to use a
SplitBySize mode with a SplitCriteria set to the size of each individual object. In this case
a large object is split into chunks of the size specified by SplitCriteria, reducing peak
memory usage.
2.5.2.2 Claim Check pattern: when only a small portion of an input message is used by the workload
When the input BO is too large to be carried around in a system, and only a few attributes
are actually needed by the process or mediation, one can exploit a pattern called the claim
check pattern. Applied to a BO, the claim check pattern has the following steps:
- Persist the large data payload to a datastore, and store the claim check as a reference in
  the control BO.
- Process the smaller control BO, which has a smaller memory footprint.
- At the point where the solution needs the whole large payload again, check out the large
  payload from the datastore using the key.
- Merge the attributes in the control BO with the large payload, taking the changed
  attributes in the control BO into account.
The claim check pattern requires custom code and snippets in the solution. A less
developer-intensive variant would be to make use of custom data bindings to generate the
control BO. This approach suffers from the disadvantage of being limited to certain
export/import bindings, and the full payload still must be allocated in the JVM.
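As an illustration, here is a minimal sketch of the pattern in Java. The PayloadStore
interface, the ControlBO class, and the field names are hypothetical stand-ins for whatever
datastore and business object definitions a real solution would use:

// Hypothetical datastore abstraction; in practice this could be a database
// table or a cache, keyed by the claim check.
interface PayloadStore {
    String checkIn(byte[] largePayload);  // persist payload, return claim check key
    byte[] checkOut(String claimCheck);   // retrieve payload by key
}

// Small control BO that flows through the process instead of the payload.
class ControlBO {
    String claimCheck;   // reference to the persisted payload
    String customerId;   // example of an attribute the flow actually needs
}

class ClaimCheckExample {
    private final PayloadStore store;

    ClaimCheckExample(PayloadStore store) {
        this.store = store;
    }

    // At the input edge: persist the large payload and build the control BO.
    ControlBO checkIn(byte[] largePayload, String customerId) {
        ControlBO control = new ControlBO();
        control.claimCheck = store.checkIn(largePayload);
        control.customerId = customerId;
        return control;
    }

    // Where the full payload is needed again: check it out and merge any
    // attributes that were changed on the control BO during processing.
    byte[] checkOut(ControlBO control) {
        byte[] payload = store.checkOut(control.claimCheck);
        // ... merge control.customerId and other changed attributes here ...
        return payload;
    }
}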
64-bit mode is an excellent choice for applications whose live set approaches or exceeds the
32-bit limits. Such applications either experience OutOfMemoryExceptions or spend excessive
time in GC (we consider anything over 10% of time in GC excessive). These applications
exhibit much better performance when allowed to run with the larger heaps they need. However,
there must always be sufficient physical memory on the system to back the Java heap size.
64-bit mode is also a good choice for applications that, though well behaved on 32-bit,
could be algorithmically modified to perform much better with larger heaps. An example
would be an application that frequently persists data to a data store to avoid maintaining a
very large in-memory cache, even if such a cache would greatly improve throughput.
Recoding such an application to trade the additional space available in 64-bit heaps for
reduced execution time would yield much better performance.

Moving to 64-bit can still cause some degradation in throughput. If a 32-bit application fits
well within a 1.5-2.5 GB heap, and is not expected to grow significantly, 32-bit BPM servers
can still be a better choice than 64-bit.
2.7.2 Dashboard
The platform requirements of the Business Space, Dashboard, and Alphablox stack are
relatively modest compared to those of the Monitor server and the database server. The most
important consideration for good Dashboard performance is to size and configure the database
server correctly. Be sure it has enough CPU capacity for anticipated data mining queries,
enough RAM for bufferpools, and plenty of disk arms.
- Use the Audit Logging property for business processes only if you need to log events in
  the BPE database. This property can be set at the activity or process level; if set at the
  process level, the setting is inherited by all activities.
- For long-running processes, disable the "Enable persistence and queries of
  business-relevant data" flag under the Properties > Server tab, both for the process and
  for each individual BPEL activity. Enabling this flag causes details of the execution of
  the activity to be stored in the BPC database, which increases the load on the database
  and the amount of data stored for each process instance. Use this setting only if this
  specific information will need to be retrieved later.
- Human tasks can be specified in business processes (e.g., process administrators), invoke
  activities, and receive activities. Specify these tasks only if needed. Also, when
  multiple users are involved, use group work items (people assignment criterion: Group)
  instead of individual work items for group members (people assignment criterion: Group
  Members).
- If the caller was started by a persistent message, then upon server restart the caller's
  transaction is rolled back and retried. However, the result of the execution of the
  long-running process on the server is not rolled back, since it was committed before the
  server failure. As a result, the long-running process on the server is executed twice.
  This duplication will cause functional problems in the application unless corrected
  manually.
- If the caller was not started by a persistent message, and the response of the
  long-running process was not yet submitted, it will end in the failed event queue.
- Use as few variables as possible, and minimize the size and number of Business Objects
  (BOs) used. In long-running processes, each commit saves modified variables to the
  database (to save context), and multiple variables or large BOs make this very costly.
  Smaller BOs are also more efficient to process when emitting monitor events.
- Use transformations (maps or assigns) to produce smaller BOs by mapping only the fields
  necessary for the business logic.
- Use group work items for large groups (people assignment criterion: Group) instead of
  individual work items for group members (people assignment criterion: Group Members).
- Where possible, use native properties on the task object rather than custom properties.
  For example, use the priority field instead of creating a new custom property "priority".
- Set the transactional behavior to "Commit after" if the task is not part of a page-flow.
  This improves the response time of task complete API calls.
- APIs that provide task details and process details, such as htm.getTask(), should not be
  called frequently. Use these methods only when required, for instance to display the
  details of a single task.
- In EJB applications, make sure that transactions are not too time-consuming: long-running
  transactions create long-lasting locks in the database, which prevent other applications
  and clients from continuing processing.
- In a J2EE environment, use the HTM and BFM EJB APIs. If the client application is running
  on a WPS server, use the local EJB interface.
- In an application that runs remote to the process container, the Web services API is an
  option.
- Applications that assign the next available task to the user can use the
  claim(String queryTableName, ...) method on the Human Task Manager EJB interface. It
  implements a performance-optimized mechanism to handle claim collisions.
- Don't put asynchronous invocations between two steps of a page-flow, because the response
  time of asynchronous services increases as the load on the system increases.
- Where possible, do not invoke long-running sub-processes between two steps of a page-flow,
  because long-running sub-processes are invoked using asynchronous messaging.
Clients that present task lists and process lists to the user should consider the following:
- Use query tables for task list and process list queries. See the directed study in section
  9.6.1 for further information.
- Do not loop over the tasks displayed in the task or process list and execute an additional
  remote call for each object. This prevents the application from providing good response
  times and good scalability.
- Design the application such that during task list and process list retrieval, all
  information is retrieved from a single query table. For instance, do not make additional
  calls to retrieve the input message during task list or process list creation.
In user-driven scenarios, improving response time may require more granular transaction
boundaries, even at the cost of throughput.
Transactions can span across synchronous invocations, but cannot span asynchronous
invocations.
The transactional behavior settings on an activity instruct the process flow container to
start a new transaction before executing the activity, after executing the activity, or both
before and after.
In general, the "Participates" attribute provides the best throughput and should be used
wherever possible. This is true for both synchronous and asynchronous activities. In the
two-way asynchronous case, it is important to understand that the calling transaction always
commits after sending the request. The "Participates" setting refers to the transaction
started by the process engine for the response: when set, it allows the next activity to
continue on the same transaction.

In special cases, the other transaction settings may be preferable. Please refer to the
InfoCenter link above for details.
Use "Commit before" in parallel activities that start new branches to ensure parallelism. As
noted in the InfoCenter, there are other constraints to be considered.

Use "Commit after" for inline human tasks to increase responsiveness to human users. When
this option is chosen, after a human task is completed, the thread/transaction handling the
task completion is also used to resume navigation of the human task activity in the process
flow. The user's task completion action will not complete until the process engine commits
the transaction. By contrast, if the "Participates" setting is used, the commit is delayed,
resulting in a longer response time for the user. This is a classic response time versus
throughput tradeoff.

Note that starting with the 6.2.0 release, Receive and Pick activities in a BPEL flow are
allowed to define their own transactional behavior property values. If not set, the default
value for an initiating Receive or Pick activity is "Commit after". Consider using
"Participates" where possible, since it performs better.
The invocation logic of processes is explained in more detail in the WPS InfoCenter at:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/cprocess_transaction.html
Some additional considerations are listed below:
- At the input boundary to a module, exports that represent asynchronous transports, like
  MQ, JMS, or JCA (with async delivery set), will set the interaction style to Async. This
  can cause downstream invocations to be async if the Preferred interaction style is left at
  "Any".
- For an SCA import, its Preferred interaction style can be used to specify whether the
  cross-module call should be Sync or Async.
- For other imports that represent asynchronous transports, like MQ or JMS, it is not
  necessary to set the Preferred interaction style to Async. Doing so introduces an
  unnecessary async hop between the calling module and the invocation of the transport.
There are additional operational considerations; for example, asynchronous invocations use
the SIBus messaging infrastructure, which uses a database for persistence. Synchronous
invocations perform well with basic tuning of the JVM heap size and thread pools, but for
asynchronous invocations the SCA artifacts require review and tuning. This includes tuning of
the SCA messaging engine (see section 4.4.7), the datasources (section 4.4.6), and the
database itself. For the datasource, the tunings for JMS bindings in this report can be used
as guidance, as the considerations are the same.
If multiple synchronous services with large latencies are being called, then asynchronous
invocations can reduce the overall response time of the mediation flow, at the expense of
increasing the internal response time of each individual service call. This assumes that
asynchronous callouts have been configured, along with parallel waiting in the FanOut section
of the flow: in the case of iterating over an array, configure the FanOut to "check for
asynchronous responses after all/N messages have been fired".
If there are a number of services in a fan-out section of a mediation flow, then calling
these synchronously results in an overall response time equal to the sum of the individual
service response times.

Calling the services asynchronously (with parallel waiting configured) results in a response
time equal to at least the largest individual service response time in WESB, plus the sum of
the time taken by WESB to process the remaining service callout responses residing on the
messaging engine queue.

For a FanOut/FanIn block, the processing time for any primitives before or after the service
invocations must be added in both cases.
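As a rough, hypothetical illustration: for three parallel callouts with latencies of 3, 2,
and 1 seconds, synchronous invocation would take roughly 3 + 2 + 1 = 6 seconds, while
asynchronous invocation with parallel waiting would take a little over 3 seconds (the largest
individual latency plus the time to process the queued responses).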
To optimize the overall response time when calling services asynchronously in a FanOut/FanIn
section of a mediation flow, invoke the services in order of expected latency, if known
(highest latency first).

There is a tradeoff between parallelism and additional asynchronous processing to consider.
The suitability of asynchronous processing will depend on the size of the messages being
processed, the latency of the target services, the number of services being invoked, and any
response time requirements expressed in service level agreements. Running performance
evaluations on mediation flows that include fan-outs with high-latency services is strongly
recommended if asynchronous invocations are being considered.
The default quality of service on service references is Assured Persistent. A substantial
reduction in asynchronous processing time can be gained by changing this to Best Effort
(non-persistent), which eliminates I/O to the persistence store; however, the application
MUST tolerate the possibility of lost request or response messages, since this SIBus
reliability level can discard messages under load and may require tuning.
The optimization is known as deferred parsing; as the name implies, parsing the message can be
deferred until absolutely required, and in several cases (described below) parsing can be avoided
altogether.
There are three categories of mediation primitives in WESB that benefit to a greater or lesser
degree from these internal optimizations:
Category 1 (greatest benefit):
- Custom Mediation
- Database Lookup
- BO Mapper
- Fan Out
- Fan In
- Message Logger
There is therefore an ideal pattern of usage in which these mediation primitives can take
advantage of a 'fastpath' through the code. Fully fastpathed flows can contain any of the
category 1 mediation primitives above, e.g.:

--> XSLT Primitive(/body) --> Route On Header --> EndPointLookup (non-XPath) -->

Partially fastpathed flows can contain a route-on-body filter primitive (category 2) and any
number of category 1 primitives, e.g.:

--> XSLT Primitive(/body) --> Route on body -->
In addition to the above optimizations, the ordering of primitives can be important. If the
mediation flow contains an XSLT primitive (with a root of /body, i.e., the category 1
variant) and category 3 primitives, then the XSLT primitive should be placed ahead of the
other primitives. So

--> Route On Header --> XSLT Primitive(/body) --> Custom Primitive -->

is preferable to

--> Route On Header --> Custom Primitive --> XSLT Primitive(/body) -->

It should be understood that there are costs associated with any primitive, regardless of
whether the flow is optimally configured. If an Event Emitter primitive is using event
distribution, or a Message Logger primitive is included, there are associated infrastructure
overheads for such remote communications. Large messages increase processing requirements
proportionally for primitives (especially those accessing the body), and a custom mediation
may contain code that is not optimally written. The above guidelines can help in designing
for performance, but they cannot guarantee speed.
- no further need for creating resources using scripts or the Admin Console
- the ability to change the majority of performance tuning options, as they are now exposed
  in the tooling

In our performance tests we use pre-configured resources, because segregating the performance
tuning from the business logic allows the configuration for different scenarios to be
maintained in a single script. It is also easier to adjust these parameters once the
applications have been deployed.
The only cases where this pattern has not been followed are Generic JMS bindings. In these
scenarios, where resources have already been configured by the third-party JMS provider
software (MQ 6.0.2.2 for all instances in this report), the tooling-created resources are
used to locate the externally defined resources.
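For example, guard trace statements so that a large BO is serialized only when tracing is
actually enabled; the call to bo.toString() below is the expensive part:

// Serialize the BO only when tracing is enabled; toString() on a large BO is costly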
if (tracing_on) System.out.println(bo.toString());
- Utilize JCA adapters in place of WBIA adapters, where possible. Migrated workloads making
  use of custom or legacy WBIA adapters interact with the WPS server through JMS, which is
  slower than the JCA adapters.
- Some WBIA technology adapters, like HTTP and Web Services, are migrated by the WICS
  migration wizard into native WPS SCA bindings, which is a better-performing approach. For
  WBIA adapters that are not migrated automatically to available SCA bindings, development
  effort spent manually migrating to an SCA binding will remove the dependency on a legacy
  adapter as well as deliver better performance.
- The WICS Migration Wizard in WID 7.0 offers a feature to merge the connector and
  collaboration modules together. Enable this option, if possible, as it increases
  performance by reducing cross-module SCA calls.
- WICS Collaborations are migrated into WPS BPEL processes. The resultant BPEL processes can
  be further customized and made more efficient as follows:
  - The generated BPEL flows still make use of the ICS API to perform BO and Collaboration
    level tasks. Development effort spent cleaning up the migrated BPEL to replace these
    APIs will result in better performance and better maintainability.
- Reduce memory pressure by splitting the shared library generated by the migration wizard.
  The migration wizard creates a single shared library and puts all migrated Business
  Objects, maps, and relationships in it. This library is then shared, by copy, by all the
  migrated modules. This can cause memory bloat in cases where the shared library is very
  large and a large number of modules are present. The solution is to manually refactor the
  shared library into multiple libraries, based on functionality or usage, and modify the
  modules to reference only the shared libraries they need.
- If the original WICS maps contain many custom map steps, then development effort spent
  rewriting such map steps will result in better performance. The WICS Migration Wizard in
  WID 7.0 generates maps that make use of ICS APIs, which is a translation layer above WPS
  technologies. Removing this layer by making direct use of WPS APIs avoids the cost of
  translation and hence produces better performance.
3.11.1 Leverage Hardware Advantages
Importing and building an enterprise application is, in itself, a resource-intensive
activity. Recent improvements in desktop hardware architecture have greatly improved the
responsiveness of Import and Build activities, as demonstrated in Section 9.20.4. In
particular, Intel Core2 Duo cores perform much better than the older PentiumD architecture,
even when the Core2 Duo runs at a slower clock rate. Also, for I/O-intensive activities (like
Import), a faster disk drive reduces total response time, as demonstrated in Section 9.19.2.
3.11.2 Make use of WAS shared libraries in order to reduce memory consumption
For applications containing many projects utilizing a WPS shared library, server memory
consumption is reduced by defining the library as a WAS shared library as described in the
technote found at
http://www-01.ibm.com/support/docview.wss?uid=swg21298478.
Section 9.20.3 demonstrates some results obtained using this approach.
3.12.2 Bound the range of values for context keys
The possible values of a context key should be bound to either a finite set or a minimum and
maximum value. The Fabric runtime caches metadata based on the contexts defined as required
or optional in the context specification. Thus, a context key that can take an unbounded
integer as its value will result in too many potential cache entries, making the cache less
efficient. Consider using classes of possible values rather than absolute numbers. For
example, for credit scores, group the possible values under Poor, Average, Good, and
Excellent rather than using the actual values. The actual values should then be placed in one
of these categories, and the category should be passed as the context instead of the actual
value.
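As a simple illustration of such bucketing, the following Java sketch maps a raw score to one
of four categories; the boundary values here are hypothetical and would be chosen to suit the
business domain:

class CreditScoreContext {
    // Map a raw credit score to a bounded category so that the Fabric context
    // key takes one of four values rather than an unbounded integer.
    // The boundary values below are hypothetical.
    static String creditScoreCategory(int score) {
        if (score < 580) return "Poor";
        if (score < 670) return "Average";
        if (score < 740) return "Good";
        return "Excellent";
    }
}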
The methodology for tuning can be stated very simply as an iterative loop:
- Monitor the system to obtain metrics that indicate whether performance is being limited.
- Use the tuning checklist in the next section as a systematic way to set parameters.
- For specific initial values, consult Appendix A for the settings used for the various
  workloads that were run; these can be considered as starting points.
For each physical machine in the topology, including front-end and back-end servers like web
servers and DB servers:
- For each JVM process started on a physical machine (i.e., WPS server, ME server, etc.),
  use tools like ps or equivalent to get core and memory usage per process.
- For each WPS or ME JVM, use TPV (Tivoli Performance Viewer) to monitor the following:
  - For each thread pool (Web Container, default, work managers), the thread pool
    utilization.

Excessive utilization of physical resources like processor cores, disk, memory, etc. can be
resolved either by adding more physical resources or by rebalancing the load more evenly
across the available resources.
- Move databases from the default Derby to a high-performance DBMS such as DB2.
- Do not use the Unit Test Environment (UTE) for performance measurement.
- Tune external service providers and external interfaces to ensure they are not the system
  bottleneck.
- Configure data sources: connection pool size, prepared statement cache size. Consider
  using non-XA data sources for CEI data when that data is non-critical.
- If work-manager-based navigation is used, also optimize the message pool size and
  inter-transaction cache size.
- Optimize the database configuration for the Business Process Choreographer database
  (BPEDB).
- Optimize indexes for SQL statements that result from task and process list queries, using
  database tools like the DB2 design advisor.
- Turn off state observers that are not needed, e.g., turn off audit logging.
The new area is where newly allocated objects reside; the tenured space is where long-lived
objects reside. The total heap size is the sum of the new area and the tenured space. The new
area size can be set independently from the total heap size. Typically, the new area size
should be set between 1/4 and 1/2 of the total heap size. The relevant parameters are
-Xmn<size> (fixed new area size), or -Xmns<size> and -Xmnx<size> (initial and maximum new
area sizes).
MDB ActivationSpec parameters can be accessed in the Admin Console via either of the
following paths:
- Resources > Resource Adapters > J2C activation specifications > ActivationSpec name
- Resources > Resource Adapters > Resource adapters > resource adapter name > Additional
  properties > J2C activation specifications > ActivationSpec name

Two custom properties of the MDB ActivationSpec have considerable performance implications:
maxConcurrency and maxBatchSize. These are discussed further in Section 4.5.3.2.
Thread pool sizes are configured via Servers > Server Types > WebSphere application servers >
server > Thread pools. The following thread pools typically need to be tuned:
- Default
- ORB.thread.pool
- WebContainer

In addition, thread pools used by Work Managers are configured separately via:
Resources > Asynchronous beans > Work managers > work manager name > Thread pool properties
The following Work Managers typically need to be tuned:
- DefaultWorkManager
- BPENavigationWorkManager
There are a few ways of accessing the JMS connection factories and JMS queue connection
factories from the WebSphere Admin Console:
- Resources > Resource Adapters > J2C connection factories > factory name
- Resources > JMS > Queue connection factories > factory name
- Resources > Resource Adapters > Resource adapters > resource adapter name (e.g., SIB JMS
  Resource Adapter) > Additional properties > J2C connection factories > factory name
From the connection factory admin panel, open Additional Properties > Connection pool
properties, and set the Maximum connections property to the maximum size of the connection
pool. Data source connection pools are configured analogously via:
Resources > JDBC Providers > JDBC provider name > Additional Properties > Data sources >
datasource name
- sib.msgstore.discardableDataBufferSize (default is 320 KB)
- sib.msgstore.cachedDataBufferSize (default is 320 KB)

The properties can be accessed under Service Integration > Buses > bus name > Messaging
Engines > messaging engine name > Additional properties > Custom properties.
Full details of these are given in the Info Center at the following location:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc
.doc/concepts/cjk_learning.html
Note that higher concurrent processing means higher resource requirements (memory and number
of threads) on the server. It needs to be balanced with other tuning objectives, such as the
handling of large objects, handling large numbers of users, and providing good response time.
4.5.3.1 Tune edge components for concurrency
The first step is to ensure that Business Objects are handled concurrently at the edge components
of WPS or WESB. If the input BOs come from an adapter, ensure the adapter is tuned for
concurrent delivery of input messages. See Section 4.5.8 for more details on tuning adapters.
If the input BOs come from the WebServices export binding or direct invocation from a JSP or
Servlet, make sure the WebContainer thread pool is correctly sized. To allow for 100 in-flight
requests handled concurrently, the maximum size of the WebContainer thread pool needs to be
set to 100 or larger.
If the input BOs come from messaging, the ActivationSpec (MDB bindings) and Listener ports
(MQ or MQJMS bindings) need to be tuned to handle sufficient concurrency.
4.5.3.2 Tune MDB ActivationSpec properties
For each JMS export component, there is an MDB and its corresponding ActivationSpec (JNDI
name: module name/export component name_AS). The default value for maxConcurrency of the
JMS export MDB is 10, meaning up to 10 BOs from the JMS queue can be delivered to the MDB
threads concurrently. Change it to 100 if a concurrency of 100 is desired.
Note that the Tivoli Performance Viewer (TPV) can be used to monitor the maxConcurrency
parameter. For each message being processed by an MDB, there is a message on the queue marked
as being locked inside a transaction (which is removed once the onMessage completes); these
messages are classed as "unavailable". There is a PMI metric, "UnavailableMessageCount", that
gives the number of unavailable messages on each queue point (resource_name > SIB Service >
SIB Messaging Engines > bus_name > Destinations > Queues). If any queue has at least
maxConcurrency unavailable messages, the number of messages on the queue is currently running
higher than the MDB's concurrency maximum. If this occurs, increase the maxConcurrency
setting for that MDB.
The maximum batch size in the activation spec also has an impact on performance. The default
value is 1. The maximum batch size value determines how many messages are taken from the
messaging layer and delivered to the application layer in a single step (note that this does NOT
mean that this work is done within a single transaction, and therefore this setting does not
influence transactional scope). Increase this value, for example to 8, for activation specs
associated with SCA modules and long-running business processes to improve performance and
scalability, especially for large multi-core systems.
4.5.3.3 Configure thread pool sizes
The sizes of thread pools have a direct impact on a server's ability to run applications
concurrently. For maximum concurrency, the thread pool sizes need to be set to optimal
values. Increasing the maxConcurrency or Maximum sessions parameters only enables the
concurrent delivery of BOs from the JMS or MQ queues. In order for the WPS or WESB server to
process multiple requests concurrently, it is also necessary to increase the corresponding
thread pool sizes to allow higher concurrent execution of the Message Driven Bean (MDB)
threads.
MDB work is dispatched to threads allocated from the Default thread pool. Note that all MDBs
in the application server share this thread pool, unless a different thread pool is
specified. This means that the Default thread pool size needs to be larger, probably
significantly larger, than the maxConcurrency of any individual MDB.
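As a hypothetical sizing illustration: if two JMS export MDBs that share the Default pool are
each tuned to a maxConcurrency of 100, the Default pool maximum would need to be well above
100, approaching 200 if both MDBs are expected to run at peak concurrency simultaneously.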
Threads in the Web Container thread pool are used for handling incoming HTTP and Web Services
requests. Again, this thread pool is shared by all applications deployed on the server. As
discussed earlier, it needs to be tuned, likely to a higher value than the default.

ORB thread pool threads are employed for running ORB requests, e.g., remote EJB calls. The
thread pool size needs to be large enough to handle requests coming in through the EJB
interface, such as certain Human Task Manager APIs.
4.5.3.4 Configure dedicated thread pools for MDBs
The Default thread pool is shared by many WebSphere Application Server tasks. It is sometimes
desirable to separate the execution of JMS MDBs to a dedicated thread pool. Follow the steps
below to change the thread pool used for JMS MDB threads:
1) Create a new thread pool, say MDBThreadPool, on the server via Servers > Server Types >
   WebSphere application servers > server > Thread pools, and then click New.
2) Open the Service Integration Bus (SIB) JMS Resource Adapter admin panel with server scope
   from Resources > Resource Adapters > Resource adapters. If the adapter is not shown, go
   to Preferences and check the "Show built-in resources" checkbox.
3) Change "Thread pool alias" from Default to MDBThreadPool.
4) Repeat steps 2 and 3 for the SIB JMS Resource Adapters with node and cell scope.
5) Restart the server for the change to take effect.
SCA module MDBs for asynchronous SCA calls use a separate resource adapter, the Platform
Messaging Component SPI Resource Adapter. Follow the same steps as above to change its thread
pool, if so desired.

Note that even with a dedicated thread pool, all MDBs associated with the resource adapter
still share that same pool. However, they do not have to compete with other WebSphere
Application Server tasks that also use the Default thread pool.
4.5.3.5 Tune intermediate components for concurrency
If the input BO is handled by a single thread from end to end, the tuning for the edge components
is normally adequate. In many situations, however, there are multiple thread switches during the
end to end execution path. It is important to tune the system to ensure adequate concurrency for
each asynchronous segment of the execution path.
Asynchronous invocations of an SCA component utilize an MDB to listen for incoming events
that arrive on the associated input queue. Each SCA module defines an MDB and its
corresponding activation spec (JNDI name: sca/module name/ActivationSpec). Note that the SCA
module MDB is shared by all asynchronous SCA components within the module, including SCA
export components. Take this into account when configuring the ActivationSpec's
maxConcurrency property value. SCA module MDBs use the same Default thread pool as those for
JMS exports.
The asynchrony in a long-running business process occurs at transaction boundaries (see
Section 3.6 for more details on settings that affect transaction boundaries). BPE defines an
internal MDB and activation spec to handle this asynchronous process navigation; its
concurrency should be tuned in the same manner as described above.
Message Engine persistence is usually backed by a database. Starting with the 6.2.0 release,
a standalone configuration of WPS or WESB can have the persistence storage of the BPE and SCA
buses backed by the file system (filestore). The choice of filestore has to be made at
profile creation time. Use the Profile Management Tool to create a new "Standalone enterprise
service bus" profile or "Standalone process server" profile: choose Profile Creation Options
> Advanced profile creation > Database Configuration, and select the checkbox "Use a file
store for Messaging Engines (MEs)". When this profile is used, filestores are used for the
BPE and SCA service integration buses.
4.5.4.2 Set Data Buffer Sizes (Discardable or Cached)
The DiscardableDataBufferSize is the size in bytes of the data buffer used when processing best
effort non persistent messages. The purpose of the discardable data buffer is to hold message data
in memory, since this data is never written to the data store for this Quality of Service. Messages
which are too large to fit into this buffer will be discarded.
The CachedDataBufferSize is the size in bytes of the data buffer used when processing all
messages other than best effort non persistent messages. The purpose of the cached data buffer is
to optimize performance by caching in memory data that might otherwise need to be read from
the data store.
The DiscardableDataBufferSize and CachedDataBufferSize can be set under Service Integration ->
Buses -> bus name -> Messaging Engines -> messaging engine name -> Additional properties ->
Custom properties.
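The underlying custom property names are typically sib.msgstore.discardableDataBufferSize and
sib.msgstore.cachedDataBufferSize; confirm the names against your WebSphere version. For
example, with illustrative values rather than recommendations:
sib.msgstore.discardableDataBufferSize = 2097152
sib.msgstore.cachedDataBufferSize = 4194304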
4.5.4.3 Move Message Engine datastores to a High Performance DBMS
For better performance, the Message Engine datastores should use production quality databases,
such as DB2, rather than the default Derby. The choice can be made at profile creation time
using the advanced profile creation options. If the profile has already been created with Derby as
the ME datastore, the following method can be used to change the datastore to an alternative
database.
After the Profile Creation Wizard has finished and Business Process Choreographer is
configured, the system should contain four buses with one message engine each. The example
below shows the buses in WPS installed on machine box01; the node and cell names are the
defaults.
Bus                                    Messaging Engine
SCA.SYSTEM.box01Node01Cell.Bus         box01-server1.SCA.SYSTEM.box01Node01Cell.Bus
SCA.APPLICATION.box01Node01Cell.Bus    box01-server1.SCA.APPLICATION.box01Node01Cell.Bus
CommonEventInfrastructure_Bus          box01-server1.CommonEventInfrastructure_Bus
BPC.box01Node01Cell.Bus                box01-server1.BPC.box01Node01Cell.Bus
Each of these message engines is by default configured to use a datastore in Derby. Each
datastore is located in its own database. For DB2, this is not optimal from an administrative point
of view. There are already many databases in the system and adding four more databases
increases the maintenance and tuning effort substantially. The solution proposed here uses a
single DB2 database for all four datastores. The individual datastores/tables are completely
separate and each message engine acquires an exclusive lock on its set of tables during startup.
Each message engine uses a unique schema name to identify its set of tables.
Instead of having a DB2 database per messaging engine we put all messaging engines into the
same database using different schemas to separate them.
Schema    Messaging Engine
SCASYS    box01-server1.SCA.SYSTEM.box01Node01Cell.Bus
SCAAPP    box01-server1.SCA.APPLICATION.box01Node01Cell.Bus
CEIMSG    box01-server1.CommonEventInfrastructure_Bus
BPCMSG    box01-server1.BPC.box01Node01Cell.Bus
Create one schema definition for each message engine with the following command on Windows.
In the example below, <WAS Install> represents the WPS Installation directory, <user>
represents the user name, and <path> represents the fully qualified path to the referenced file.
<WAS Install>\bin\sibDDLGenerator.bat -system db2 -version 8.1 -platform windows -statementend ; -schema BPCMSG -user <user> > createSIBSchema_BPCMSG.ddl
Repeat for each schema/messaging engine.
To be able to distribute the database across several disks, edit the created schema definitions and
put each table in a tablespace named after the schema used, e.g. SCAAPP becomes
SCAAPP_TS, CEIMSG becomes CEIMSG_TS, and so on. The schema definition should look
like this after editing:
CREATE SCHEMA CEIMSG;
CREATE TABLE CEIMSG.SIBOWNER (
  ME_UUID VARCHAR(16),
  INC_UUID VARCHAR(16),
  VERSION INTEGER,
  MIGRATION_VERSION INTEGER
) IN CEIMSG_TS;
Create a new JDBC provider DB2 Universal JDBC Driver Provider for the non-XA
datasources first if it is missing. The XA DB2 JDBC Driver Provider should exist if BPC was
configured correctly for DB2.
Create four new JDBC datasources, one for CEI as an XA datasource, the remaining three as
single-phase commit (non-XA) datasources.
The following table provides the new names:
Name of datasource    JNDI Name                 JDBC Provider
CEIMSG_sib            jdbc/sib/CEIMSG           DB2 Universal (XA)
SCAAPP_sib            jdbc/sib/SCAAPPLICATION   DB2 Universal
SCASYSTEM_sib         jdbc/sib/SCASYSTEM        DB2 Universal
BPCMSG_sib            jdbc/sib/BPCMSG           DB2 Universal
For each new datasource: uncheck the checkbox named Use this Data Source in container managed
persistence (CMP); set the database name to the name used for the database created earlier for
messaging, e.g. SIB; and select a driver type of 2 or 4. Per DB2 recommendations, use JDBC
Universal Driver Type 2 connectivity to access local databases and Type 4 connectivity to
access remote databases. Note that a Type 4 driver requires a hostname and valid port to be
configured for the database.
In the Navigation Panel go to Service Integration -> Buses and change the datastores for
each Bus/Messaging Engine displayed.
Put in the new JNDI and schema name for each datastore. Uncheck the checkbox Create
Tables since the tables have been created already.
The server immediately restarts the message engine; the SystemOut.log shows the results
of the change and also shows if the message engine starts successfully.
Restart the server and validate that all systems come up using the updated configuration.
The last remaining task is tuning the database; please see Sections 4.5.10 and 4.5.11 for further
information on database and DB2-specific tuning, respectively.
Do not use a wildcard (*) for the host name of the Web Container port. Replace it with
the hostname or IP address. The property can be accessed from Application servers >
server name > Container Settings > Web Container Settings > Web container >
Additional Properties > Web container transport chains > WCInboundDefault > TCP
inbound channel (TCP_2) > Related Items > Ports > WC_defaulthost > Host
Use localhost instead of the host name in the Web Services client binding. If the actual
hostname is used, this optimization is disabled, even if the hostname is aliased to localhost.
The property can be accessed from Enterprise Applications > application name >
Manage Modules > application EJB jar > Web services client bindings > Preferred port
mappings > binding name. Use localhost (e.g. localhost:9080) in the URL.
Make sure there is not an entry for your server's hostname and IP address in the server's
hosts file; such an entry inhibits this optimization by adding name resolution overhead.
There are several parameters that control usage of these two optimizations. The first set of these
parameters is found by going to
Application Servers > server name > Business Integration > Business Process Choreographer
> Business Flow Manager > Business Process Navigation Performance
The key parameters are:
Check Enable advanced performance optimization to enable both the Work-Manager-based navigation and InterTransactionCache optimizations.
Work-Manager-Based Navigation Message Pool Size: this property specifies the size of
the cache used for navigation messages that cannot be processed immediately, provided
Work-Manager-based navigation has been enabled. The cache defaults to a size of (10 *
thread pool size of the BPENavigationWorkManager) messages. Note that if this cache
reaches its limit, WPS uses JMS-based navigation for new messages, so for optimal
performance ensure this Message Pool Size is set to a sufficiently high value.
InterTransaction Cache Size: this property specifies the size of the cache used to store
process state information that has also been written to the BPE database. It should be set
to twice the number of parallel running process instances. The default value for this
property is the thread pool size of the BPENavigationWorkManager.
In addition, customize the number of threads for the work manager using:
Resources -> Asynchronous Beans -> Work Managers -> BPENavigationWorkManager
The minimum and maximum number of threads should be increased from their default values of
5 and 12, respectively, using the methodology outlined below in the section titled Tuning for
Maximum Concurrency. If the thread pool size is modified, then the work request queue size
should also be modified and set to be twice the maximum number of threads.
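For example, if the maximum number of threads is raised to 30, the work request queue size
should be set to 60 (twice the maximum thread count).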
4.5.6.2 Tuning the business process container for JMS navigation
If JMS-based navigation is configured, the following resources need to be optimized for efficient
navigation of business processes:
JMS connection factory BPECFC: set the connection pool size to the number of threads
in the BPEInternalActivationSpec + 10%. This resource can be found at:
Resources > JMS > Connection factories > BPECFC > Connection pool properties.
Note that this connection factory is also used when work-manager based navigation is in
use, but only for error situations or if the server is highly overloaded.
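For example, if the BPEInternalActivationSpec allows a maximum of 60 concurrent threads, a
BPECFC connection pool size of 66 (60 plus 10%) is a reasonable starting point.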
Up-to-date database statistics are key for good SQL query response times.
Databases offer tools to tune SQL queries. In most cases, additional indexes improve
query performance with potentially some impact on process navigation performance. For
DB2, the DB2 design advisor can be used to guide in choosing indexes.
API calls for task list and process list queries may take more time to respond, depending
on the tuning of the database and the amount of data in the database.
Ensure that concurrency (parallelism) is sufficiently high to handle the load and to utilize
the CPU. However, increasing the parallelism of API call execution beyond what is
necessary can negatively influence response times. Also, increased parallelism can put
excessive load on the BPC database. When tuning the parallelism of API calls, measure
response times before and after tuning, and adjust the parallelism if necessary.
If you are using persistent messaging, the configuration of your database becomes important. Use
a remote DB2 instance with a fast disk array as the DB server. You may also find benefit in
tuning the connection pooling and statement cache of the DataSource. Please see sections 4.5.10
and 4.5.11 for further information on tuning DB2, and also note the relevant References at the
end of this document.
4.5.7.2 Disable Event Distribution if Not Required
The Event Server which manages events can be configured to distribute events and/or log them to
the event database. Some mediations only require events to be logged to a database; for these
cases, performance is improved by disabling event distribution. Since the event server may be
used by other applications, verify that none of them require event monitoring (which depends on
event distribution) before disabling it.
Event distribution can be disabled from Service integration > Common Event Infrastructure >
Event service > Event services > Default Common Event Infrastructure event server -> uncheck
Enable event distribution.
4.5.7.3 Configure WSRR Cache Timeout
WebSphere Service Registry and Repository (WSRR) is used by WESB for endpoint lookup.
When accessing the WSRR (e.g. using the endpoint lookup mediation primitive), results from the
registry are cached in WESB. The lifetime of the cached entries can be configured via Service
Integration->WSRR Definitions-><your WSRR definition name>->Timeout of Cache
Validate that the timeout is a sufficiently large value; the default timeout of 300 seconds is
reasonable from a performance perspective. Too low a value will result in frequent lookups to the
WSRR which can be expensive (especially if retrieving a list of results), and will also include the
associated network latency if the registry is located on a remote machine.
If deploying more than one cluster member (JVM) on a single physical system, it is
important to monitor not just the resource utilization (Core, disk, network, etc) of the
system as a whole, but also the utilization by each cluster member. This allows the
detection of a system bottleneck due to a particular cluster member.
If all members of a cluster are bottlenecked, scaling can be achieved by adding one or
more members to the cluster, backed by appropriate physical hardware.
If a singleton server or cluster member is the bottleneck, there are some additional
considerations:
A messaging engine in a cluster with a One of N policy (used to preserve event ordering)
may become the bottleneck.
A database (DB) server may become the bottleneck. One approach to consider: if the DB
server is hosting multiple active DBs (for example, the BPEDB and the MEDB), host
each DB on a separate server.
The default maximum heap size in most implementations of Java is too small for many of the
servers in this configuration. The Monitor Launchpad installs Monitor and its prerequisite servers
with larger heap sizes, but you should verify that these sizes are appropriate for your hardware and
workload. We use a maximum heap size of 1536 MB for our performance measurements.
4.5.9.2 Configure CEI
By default, when an event arrives at CEI, it is delivered to the registered consumer (in this case a
particular Monitor Model) and also into an additional, default queue. Performance is improved
by avoiding this double-store, which can be done using the WAS Admin Console by removing
the All Events event group found via:
Service Integration -> Common Event Infrastructure -> Event Service -> Event Services ->
Default Common Event Infrastructure event server -> Event Groups
Beyond its persistent delivery of events to registered consumers, CEI offers the ability to
explicitly store events in a database. This has significant performance overhead and should be
avoided if this additional functionality is not needed. The CEI Data Store is also configured in
the WAS Admin Console:
Service Integration -> Common Event Infrastructure -> Event Service -> Event Services ->
Default Common Event Infrastructure event server: deselect Enable Data Store
4.5.9.3 Configure Message Consumption Batch Size
Consuming events in large batches is much more efficient than one at a time. Up to some limit,
the larger the batch size, the higher event processing throughput will be. But there is a trade-off:
Consuming events, processing them, and persisting them to the Monitor database is done as a
transaction. So while a larger batch size yields better throughput, it will cost more if you have to
roll back. If you experience frequent rollbacks, consider reducing the batch size. This can be
done in the WAS Admin Console in Server Scope:
Applications -> Monitor Models -> <version> -> Runtime Configuration -> Tuning -> Message
Consumption Batch size: <default 100>
4.5.9.4 Enable KPI Caching
The cost of calculating aggregate KPI values increases as completed process instances
accumulate in the database. A KPI Cache is available to reduce the overhead of these
calculations, at the cost of some staleness in the results. The refresh interval is configurable via
the WAS Admin Console:
Applications -> Monitor Models -> <version> -> Runtime Configuration -> KPI -> KPI Cache
Refresh Interval
A value of zero (the default) disables the cache.
4.5.10 Database: General Tuning
A further advantage can be gained on some operating systems such as AIX by using concurrent
I/O. This bypasses per-file locking, shifting responsibility for concurrency control to the database
and in some cases allowing more useful work to be offered to the adapter or the device.
An important exception to this guideline occurs for large objects (LOB, BLOB, CLOB, etc.)
which are not buffered by the database itself. In this case it can be advantageous to arrange for
file system caching, preferably only for files which back large objects.
4.5.10.6 Refine Table Indexes as Required
WebSphere BPM products typically provide a reasonable set of indexes for the database tables
they use. In general, creating indexes involves a tradeoff between the cost of queries and the cost
of statements which insert, update, or delete data. For query intensive workloads, it makes sense
to provide a rich variety of indexes as required to allow rapid access to data. For update intensive
workloads, it is often helpful to minimize the number of indexes defined, as each row
modification may require changes to multiple indexes. Note that indexes are kept current even
when they are infrequently used, so rarely used indexes still add update cost.
Index design therefore involves compromises. The default set of indexes may not be optimal for
the database traffic generated by a BPM product in a specific situation. If database CPU or disk
utilization is high or there are concerns with database response time, it may be helpful to consider
changes to indexes.
As described below, DB2 and Oracle databases provide assistance in this area by analyzing
indexes in the context of a given workload. Recommendations are given to add, modify, or
remove indexes. One caveat is that if the workload does not capture all relevant database activity
then a necessary index might appear unused, leading to a recommendation that it be dropped. If
the index is not present, future database activity could suffer as a result.
4.5.11 DB2-Specific Tuning
Providing a comprehensive DB2 tuning guide is beyond the scope of this report. However, there
are a few general rules of thumb that can assist in improving the performance of DB2
environments. In the sections below, we discuss these rules, and provide pointers to more
detailed information. The complete set of current DB2 manuals (including database tuning
guidelines) can be found by using the DB2 Information Center:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp.
Another excellent reference is Best practices for DB2 for Linux, UNIX, and Windows which is
available here:
http://www.ibm.com/developerworks/data/bestpractices/.
4.5.11.1 Update Database Statistics
DB2 provides an Automatic Table Maintenance feature, which runs the RUNSTATS command in
the background as required to ensure that the correct statistics are collected and maintained. This
is controlled by the database configuration parameter auto_runstats, and is enabled by default for
databases created by DB2 V9.1 and beyond. See also the Configure Automatic Maintenance...
wizard at the database level in the DB2 Control Center.
One approach to manually updating statistics on all tables in the database is to use the REORGCHK
command. Dynamic SQL, such as that produced by JDBC, will immediately take the new
statistics into account. Static SQL, like that in stored procedures, must be explicitly rebound in
the context of the new statistics. Here is an example which performs these steps to gather basic
statistics on database DBNAME:
db2 connect to DBNAME
db2 reorgchk update statistics on table all
db2 connect reset
db2rbind DBNAME all
The REORGCHK and rebind (db2rbind) should be executed when the system is relatively idle so
that a stable sample may be acquired and to avoid possible deadlocks in the catalog tables.
It is generally better to gather additional statistics, so also consider the following command for
every table requiring attention:
runstats on table <schema>.<table> with distribution and detailed indexes
4.5.11.2 Set Buffer Pool Sizes Correctly
A buffer pool is an area of memory into which database pages are read, modified, and held during
processing. Buffer pools improve database performance. If a needed page of data is already in
the buffer pool, that page is accessed faster than if the page had to be read directly from disk. As
a result, the size of the DB2 buffer pools is critical to performance.
The amount of memory used by a buffer pool depends upon two factors: the size of buffer pool
pages and the number of pages allocated. Buffer pool page size is fixed at creation time and may
be set to 4, 8, 16 or 32 KB. The most commonly used buffer pool is IBMDEFAULTBP which
has a 4 KB page size.
Note that all buffer pools reside in database global memory, allocated on the database machine.
The buffer pools must coexist with other data structures and applications, all without exhausting
available memory. In general, having larger buffer pools will improve performance up to a point
by reducing I/O activity. Beyond that point, allocating additional memory no longer improves
performance.
DB2 V9.1 and beyond provide self tuning memory management, which includes managing buffer
pool sizes. This is controlled globally by the self_tuning_mem database level parameter, which is
ON by default. Individual buffer pools can be enabled for self tuning using SIZE AUTOMATIC
at CREATE or ALTER time.
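For example, a minimal sketch (assuming a database named DBNAME) that enables self tuning
for the default buffer pool:
db2 connect to DBNAME
db2 alter bufferpool IBMDEFAULTBP size automatic
db2 connect reset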
To choose appropriate buffer pool size settings manually, monitor database container I/O activity,
by using system tools or by using DB2 buffer pool snapshots. Be careful to avoid configuring
large buffer pool size settings which lead to paging activity on the system.
4.5.11.3 Maintain Proper Table Indexing
The DB2 Design Advisor, available from the Control Center, provides recommendations for
schema changes, including changes to indexes. It can be launched from the menu presented when
right-clicking on a database in the left column.
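The Design Advisor can also be invoked from the command line. A minimal sketch, where
DBNAME and workload.sql (a file of representative SQL statements) are placeholders:
db2advis -d DBNAME -i workload.sql -t 5
The -t option limits the advisor run time, in minutes.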
4.5.11.4 Size Log Files Appropriately
When using circular logging, it is important that the available log space permits dirty pages in the
bufferpool to be cleaned at a reasonably low rate. Changes to the database are immediately
written to the log, but a well tuned database will coalesce multiple changes to a page before
eventually writing that modified page back to disk. Naturally, changes recorded only in the log
cannot be overwritten by circular logging. DB2 detects this condition and forces the immediate
cleaning of dirty pages required to allow switching to a new log file. While this mechanism
protects the changes recorded in the log, all application logging must be suspended until the
necessary pages are cleaned.
DB2 works to avoid pauses when switching log files by proactively triggering page cleaning
under control of the database level softmax parameter. The default value of 100 for softmax
begins background cleaning activities when the gap between the current head of the log and the
oldest log entry recording a change to a dirty page exceeds 100% of one log file in size. In
extreme cases this asynchronous page cleaning cannot keep up with log activity, leading to log
switch pauses which degrade performance.
Increasing the available log space gives asynchronous page cleaning more time to write dirty
bufferpool pages and avoid log switch pauses. A longer interval between cleanings allows
multiple changes to be coalesced on a page before it is written, which reduces the required write
throughput by making page cleaning more efficient.
Available log space is governed by the product of the log file size and the number of primary log
files, both configured at the database level. logfilsiz is the number of 4K pages in each log file;
logprimary controls the number of primary log files. The Control Center also provides a
Configure Database Logging... wizard.
As a starting point, try using 10 primary log files which are large enough that they do not wrap
for at least a minute in normal operation.
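For example (illustrative values; size the logs for your own workload), the following command
configures ten primary log files of 16384 4K pages (64 MB) each:
db2 update db config for yourDatabaseName using logfilsiz 16384 logprimary 10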
Increasing the primary log file size does have implications for database recovery. Assuming a
constant value for softmax, larger log files mean that recovery may take more time. The softmax
parameter can be lowered to counter this, but keep in mind that more aggressive page cleaning
may also be less efficient. Increasing softmax gives additional opportunities for write coalescing
at the cost of longer recovery time.
The default value of softmax is 100, meaning that the database manager will attempt to clean pages
such that a single log file needs to be processed during recovery. For best performance, we
recommend increasing this to 300, meaning that 3 log files may need processing during recovery:
db2 update db config for yourDatabaseName using softmax 300
4.5.11.5 Use SMS for Tablespaces Containing Large Objects
When creating REGULAR or LARGE tablespaces in DB2 V9.5 (and above) which contain
performance critical LOB data, we recommend specifying MANAGED BY SYSTEM to gain the
advantages of cached LOB handling in SMS.
Among WebSphere BPM products, this consideration applies to:
-- WPS: the Process Choreographer database, sometimes called BPEDB.
-- WPS and WESB: databases backing service integration bus message engine data stores.
For background, see the section Avoid Double Buffering above. A detailed explanation follows.
DB2 tablespaces can be configured with NO FILE SYSTEM CACHING, which in many cases
improves system performance.
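For example, a minimal sketch of creating an SMS tablespace for LOB data; the tablespace name
and container path are illustrative:
db2 connect to BPEDB
db2 "CREATE TABLESPACE LOB_TS MANAGED BY SYSTEM USING ('/db2/bpedb/lobts01')"
db2 connect reset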
DB2 Version 9.7 supports new query semantics which always return the committed value of the
data at the time the query is submitted. This support is ON by default for newly created
databases. We found that performance improved in some cases when we disabled the new
behavior, reverting to the original DB2 query semantics:
db2 update db config for yourDatabaseName using cur_commit disabled
4.5.11.11 Additional References
The following link discusses "Specifying initial DB2 database settings" with examples of creating
SMS tablespaces for the BPEDB. It also contains useful links for "Planning the BPEDB
database" and "Fine-tuning the Business Process Choreographer database"
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t5tuneint_spec_init_db_settings.html
This link discusses "Creating a DB2 for Linux, UNIX, and Windows database for Business
Process Choreographer" and gives details on BPEDB database creation, including pointers to
useful creation scripts for a production environment.
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t2codbdb.html
For our SOABench2008 OutSourced Mode workload, we achieved better throughput by dropping
several indexes from the ACTIVITY_INSTANCE_B_T table, as recommended by the Design
Advisor. This is a concrete example of how proper indexing is workload dependent. These same
indexes may be important for many other Process Choreographer workloads.
4.5.12 Oracle-Specific Tuning
As with DB2, providing a comprehensive Oracle database tuning guide is beyond the scope of
this report. However, there are a few general rules of thumb that can assist in improving the
performance of Oracle environments. In the sections below, we discuss these rules, and provide
pointers to more detailed information. In addition, the following references are useful:
Oracle Database 11g Release 1 documentation (includes a Performance Tuning Guide):
http://www.oracle.com/pls/db111/homepage
A white paper discussing Oracle on AIX:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883
4.5.12.1 Update Database Statistics
Oracle provides an automatic statistics gathering facility, which is enabled by default.
One approach to manually updating statistics on all tables in a schema is to use the dbms_stats
utility:
execute dbms_stats.gather_schema_stats( -
ownname => 'your_schema_name', -
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
cascade => TRUE, -
method_opt => 'FOR ALL COLUMNS SIZE AUTO', -
degree => 15);
workload after adjusting our schema such that the SERVICE_CONTEXT column of the
PROCESS_CONTEXT_T table was CACHED, e.g.:
alter table process_context_t modify lob (service_context) (cache);
4.5.13 Java Virtual Machine (JVM) Tuning
Because the WebSphere BPM product set is written in Java, the performance of the Java Virtual
Machine (JVM) has a significant impact on the performance delivered by these products. JVMs
externalize multiple tuning parameters that may be used to improve both authoring and runtime
performance. The most important of these are related to garbage collection and setting the Java
heap size. This section will deal with these topics in detail.
Note that the products covered in this report utilize IBM JVMs on most platforms (AIX, Linux,
Windows, etc.), and the HotSpot JVMs on selected other systems, such as Solaris and HP/UX.
Vendor specific JVM implementation details and settings will be discussed as appropriate. Also
note that all BPM v7 products in this document use Java 6. Its characteristics are similar to the
Java 5 used in the BPM v6.1 and v6.2.0 products, but quite different from the Java 1.4.2 used by
V6.0.2.x and earlier releases. For brevity, only Java 6 tuning is discussed here.
Following is a link to the IBM Java 6 Diagnostics Guide:
http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp
The guide referenced above discusses many more tuning parameters than those discussed in this
report, but most are for specific situations and are not of general use. For a more detailed
description of IBM Java 6 garbage collection algorithms, please see Section Memory
Management in the chapter titled Understanding the IBM SDK for Java.
Sun HotSpot JVM references follow:
The following URL provides a useful summary of HotSpot JVM options for Solaris:
http://java.sun.com/docs/hotspot/VMOptions.html
The following URL provides a useful FAQ about the Solaris HotSpot JVM:
http://java.sun.com/docs/hotspot/PerformanceFAQ.html#20
For more performance tuning information about Sun's HotSpot JVM, follow the URL below.
http://java.sun.com/docs/performance/
same system (for example, if you run both WPS and WID on the same system), then you should
also read the next section, 4.5.13.3. If your objective is to support large Business Objects, read
Section 4.5.2.
For most production applications, the IBM JVM Java heap size defaults are too small and should
be increased. In general the HotSpot JVM default heap and nursery size are also too small and
should be increased (we will show how to set these parameters later).
There are several approaches to setting optimal heap sizes. We describe here the approach that
most applications should use when running the IBM JVM on AIX. The essentials can be applied
to other systems. Set the initial heap size (-Xms option) to something reasonable (for example,
256 MB), and the maximum heap size (-Xmx) option to something reasonable, but large (for
example, 1024 MB). Of course, the maximum heap size should never force the heap to page. It
is imperative that the heap always stays in physical memory. The JVM will then try to keep the
GC time within reasonable limits by growing and shrinking the heap. The output from verbosegc
should then be used to monitor GC activity.
If Generational Concurrent GC is used (-Xgcpolicy:gencon), the new area size can also be set to
specific values. By default, the new size is a quarter of the total heap size or 64 MB, whichever is
smaller. For better performance, the nursery size should be approximately 1/2 of the heap size or
larger, and it should not be capped at 64 MB. New area sizes are set by the JVM options
-Xmn<size>, -Xmns<initialSize>, and -Xmnx<maxSize>.
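For example, the following generic JVM arguments (illustrative values only) combine these
settings for a 1024 MB maximum heap with a 512 MB nursery:
-Xms256m -Xmx1024m -Xgcpolicy:gencon -Xmn512m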
A similar process can be used to set the size of HotSpot heaps. In addition to setting the minimum
and maximum heap size, you should also increase the nursery size to approximately 1/2 of the
heap size. Note that you should never increase the nursery to more than 1/2 the full heap. The
nursery size is set using the MaxNewSize and NewSize parameters (that is,
-XX:MaxNewSize=128m, -XX:NewSize=128m).
After the heap sizes are set, verbosegc traces should then be used to monitor GC activity. After
analyzing the output, modify the heap settings accordingly. For example, if the percentage of time
in GC is high and the heap has grown to its maximum size, throughput may be improved by
increasing the maximum heap size. As a rule of thumb, greater than 10% of the total time spent in
GC is generally considered high. Note that increasing the maximum size of the Java heap may
not always solve this type of problem, as it could be a memory over-usage problem.
Conversely, if response times are too long due to GC pause times, decrease the heap size. If both
problems are observed, an analysis of the application heap usage is required.
4.5.13.3 Setting the Heap Size when running multiple JVMs on one system
Each running Java program has a heap associated with it. Therefore, if you have a configuration
where more than one Java program is running on a single physical system, setting the heap sizes
appropriately is of particular importance. An example of one such configuration is when the
WID is on the same physical system as WPS. Each of these is a separate Java program that has
its own Java heap. If the sum of all of the virtual memory usage (including both Java Heaps as
well as all other virtual memory allocations) exceeds the size of physical memory, the Java heaps
will be subject to paging. As previously noted, this causes total system performance to degrade
significantly. To minimize the possibility of this occurring, use the following guidelines:
Based on the verbosegc trace output, set the initial heap size to a relatively low value.
For example, assume that the verbosegc trace output shows that the heap size grows
quickly to 256 MB, and then grows more slowly to 400 MB and stabilizes at that
point. Based on this, set the initial heap size to 256 MB (-Xms256m).
Based on the verbosegc trace output, set the maximum heap size appropriately. Care
must be taken to not set this value too low, or Out Of Memory errors will occur; the
maximum heap size must be large enough to allow for peak throughput. Using the
above example, a maximum heap size of 768 MB might be appropriate (-Xmx768m).
This gives the Java heap headroom to expand beyond its current size of 400
MB if required. Note that the Java heap will only grow if required (e.g. if a period of
peak activity drives a higher throughput rate), so setting the maximum heap size
somewhat higher than current requirements is generally a good policy.
Be careful to not set the heap sizes too low, or garbage collections will occur
frequently, which might reduce throughput. Again, a verbosegc trace will assist in
determining this. A balance must be struck so that the heap sizes are large enough
that garbage collections do not occur too often, while still ensuring that the heap sizes
are not cumulatively so large as to cause the heap to page. This balancing act will, of
course, be configuration dependent.
The IBM JVM threading and synchronization components are based upon the AIX POSIX
compliant Pthreads implementation. The following environment variables have been found to
improve Java performance in many situations and have been used for the workloads in this
document. The variables control the mapping of Java threads to AIX native threads, turn off
thread debug options, and allow for spinning on mutex (mutually exclusive) locks.
export AIXTHREAD_COND_DEBUG=OFF
export AIXTHREAD_MUTEX_DEBUG=OFF
export AIXTHREAD_RWLOCK_DEBUG=OFF
export AIXTHREAD_SCOPE=S
export SPINLOOPTIME=2000
4.5.14 Power Management Tuning
Power management is becoming common in processor technology; both Intel and Power core
processors now have this capability. This capability delivers obvious benefits, but it can also
decrease system performance when a system is under high load, so consider whether or not to
enable power management. Using POWER6 hardware as an example, ensure that Power Saver
Mode is not enabled, unless desired. One way to modify or check this setting on AIX is through
the Power Management window on the HMC.
4.5.15 Tuning for WICS-Migrated Workloads
Note that the tuning below is unique to workloads migrated using the WICS migration wizard in
the WID. In addition to the tuning specified below, please follow the other WPS tuning
recommendations detailed in this document.
For JMS based messaging used to communicate with legacy WBIA adapters or custom
adapters, make use of non-persistent queues when possible.
For JMS based messaging used to communicate with legacy WBIA adapters or custom
adapters, make use of WebSphere MQ based queues if available. By default, the adapters
use the MQ APIs to connect to the SIB based destinations via MQ Link. MQ Link is a
protocol translation layer which converts messages to and from MQ based clients. By
switching to WebSphere MQ based queues, MQ Link translation costs are eliminated
and performance improves.
Turn off server logs for verbose workloads. Some workloads emit log entries for every
transaction, causing constant disk writes that reduce overall throughput. Explore the
possibility of turning off server logs to avoid this degradation for such workloads.
1 core Win2008: 109 CCPS, about the same as Linux (108 CCPS).
4 cores Win2008: 390 CCPS, 3% faster than Linux (379 CCPS), indicating an SMP
scalability factor for Win2008 of 3.6x and for Linux a scalability factor of 3.5x.
8 cores Win2008: 694 CCPS, 4% faster than Linux (665 CCPS), indicating an SMP
scalability factor for Win2008 of 6.4x and for Linux a scalability factor of 6.2x.
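These SMP scalability factors are the ratio of multi-core to single-core throughput; for example,
694 CCPS on 8 cores divided by 109 CCPS on 1 core yields the 6.4x Win2008 factor.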
[Chart: throughput (CCPS) for Win2008 and Linux at 1, 4, and 8 cores; CPU utilization 96%-100%
across all bars; SMP scaling factor shown on each multi-processor bar; hyperthreading not
supported. Measurement configuration: WPS; SOABench Driver; SOABench Services systems 1 and 2.]
4 cores Win2008: 26.6 CCPS, 17% slower than Linux (32.0 CCPS), indicating an SMP
scalability factor for Win2008 of 2.9x and for Linux a scalability factor of 3.4x.
To achieve optimal throughput, changes were made to the indexes of the BPE DB by following
the recommendations of the DB2 Design Advisor.
[Chart: throughput (CCPS) for Win2008 and Linux at 1 and 4 cores; CPU utilization 97%-100%
across all bars; SMP scaling factor shown on each multi-processor bar; hyperthreading not
supported. Measurement configuration: WPS; SOABench Driver; SOABench Services systems 1
and 2; DB2.]
[Chart: throughput at 4, 8, and 16 cores, with CPU utilization and scaling factor shown above
each bar (4 cores: 98%, 4.0x; 8 cores: 95%, 7.3x; 16 cores: 90%, 11.9x); Simultaneous
Multithreading (SMT) enabled. Measurement configuration: HTTP Server and SOABench Driver;
SOABench Services, active MEs, and ME DB; WPS application cluster members on POWER6
4.7 GHz - D.]
[Topology diagram: SOABench Automated Driver and IBM HTTP Server with WebSphere Plugin
on 8-core Power5 systems; AppCluster (SOABench BPEL application, microflow) and SOABench
Services on 8- and 16-core systems; MECluster with the active MEs on a 16-core system;
separate DB2 servers for the BPE, ME, and WPS databases.]
[Chart: throughput at 4 and 8 cores, with CPU utilization and scaling factor shown above each
bar (4 cores: 98%, 4.0x; 8 cores: 93%, 6.8x). Measurement configuration: SOABench Driver,
HTTP Server, and SOABench Agent/Outsourced Services; SOABench Services, active MEs, and
ME DB on POWER6 4.7 GHz - D.]
[Topology diagram: SOABench Outsourced Controller (Driver) and SOABench Agent and
Outsourced Services on 8-core Power5 systems; IBM HTTP Server with WebSphere Plugin;
AppCluster (SOABench BPEL application, microflow and macroflow) on 8- and 16-core systems;
MECluster with the active MEs (asynchronous) on a 16-core system; separate DB2 servers for
the ME, BPE, and WPS databases.]
[Chart: throughput at 1, 2, 4, 6, and 8 nodes, with CPU utilization and scaling factor shown above
each bar (scaling of 2.0x, 4.0x, and 5.9x at 97%-98% CPU utilization). Measurement
configuration: HTTP Server and SOABench Driver; SOABench Services, active MEs, and ME DB;
WPS application cluster members on POWER6 4.7 GHz - A.]
[Topology diagram (8-node configuration): SOABench Automated Driver and IBM HTTP Server
with WebSphere Plugin on 8-core Power5 systems; ServicesCluster (SOABench Services);
AppCluster (SOABench BPEL application, microflow) on 8- and 16-core systems; MECluster
with the active MEs on a 4-core system; separate DB2 servers for the ME, BPE, and WPS
databases.]
[Chart: throughput at 1, 2, 4, and 6 nodes, with CPU utilization and scaling factor shown above
each bar (scaling of 2.0x at 96% and 3.9x at 95%; 98% at 1 node). Measurement configuration:
SOABench Driver, HTTP Server, and SOABench Agent/Outsourced Services; WPS application
cluster members.]
[Topology diagram (6-node configuration): SOABench Outsourced Controller (Driver) and
SOABench Agent and Outsourced Services on 8-core Power5 systems; AppCluster (SOABench
BPEL application, microflow and macroflow) on 8- and 16-core systems; MECluster with the
active MEs (asynchronous) on a 4-core system; IBM HTTP Server with WebSphere Plugin;
separate DB2 servers for the ME, BPE, and WPS databases.]
-1, but we did not use resource sets to bind processes to processors, and we also did not use
memory affinity. It is likely that further tuning will produce better POWER7 results.
5.1.5.2 Results
The SOABench 2008 Automated Approval workload was used in this study. Results are as
follows:
[Chart: throughput at 1, 2, 4, and 6 cores on POWER6 and POWER7; CPU utilization (97%-100%)
and scaling factors shown above each bar (1.92x-1.98x at 2 cores, 3.72x-3.97x at 4 cores,
5.39x-5.62x at 6 cores). Measurement configuration: WebSphere Process Server; Driver.]
[Measurement configuration: WebSphere ESB; systems: Intel 2.8 GHz - C, Intel 2.93 GHz - C,
Intel 3.5 GHz - B.]
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are largely the same but the name of an element differs and the
two schemas have different namespaces. The request and response flows are eligible for deferred
parsing.
This chart highlights a major performance improvement between the 6.2 and 7.0.0.1
releases. The improvement affects all mediations that have JAX-WS bindings on the Export
and Import components and that are eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context header and set in the response message. The use of the context header means
that the transform operates on the root of the message rather than the body and hence the request
and response flows are not eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a string in the SOAP or JMS header. The Web Services workload uses the
Internationalization Context header. The JMS workload uses the JMSCorrelationId header field.
The request message processing does not use the message body which is therefore not parsed.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message using the Message
Element Setter primitive. The request message processing is not eligible for deferred parsing. The
response message is passed through unmediated and as a result is not parsed.
If your mediation flow is otherwise eligible for deferred parsing, an alternative to the Message
Element Setter primitive is an XSL Transform primitive that sets the required field. The flow
would then be eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation uses the Business Object Map primitive to map the body of the request message
into a new Business Object. The request message processing is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
Service Invoke Mediation - Windows
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation invokes the SOABench target from a Service Invoke primitive. The request
message is unprocessed but is wired to a service invoke which in turn is wired directly to the
inputResponse node. The request message processing is eligible for deferred parsing and there is
no separate response flow. A similar functional flow can be achieved by having separate request
and response flows with no mediation primitives. Testing has shown that these two approaches
are equivalent in performance terms.
[Chart: requests per second for the Fan Out scenario with 4 fans, V6.2 vs V7.0.0.1, Base in/Base
out, 16 cores; CPU utilization shown above each bar.]
This mediation invokes multiple SOABench services, sets a field in each response, merges the
responses and transforms the merged response. The request message processing examines a field
in the message to establish the number of fan outs (service calls) to invoke. Some additional
processing primitives are wired into the flow (see section 11.1 for details) and the response from
the fan-in is wired directly to the inputResponse node as the service calls have already been
made. There is no separate response flow. The mediation is not eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
This mediation processes the request message with several mediation primitives: first, a message
filter checks a field for authentication; next, a custom mediation logs the message to the console;
this is followed by a routing filter using a value in the body of the message; and finally an XSLT
primitive transforms the message. The request message is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 16 cores; CPU utilization shown
above each bar.]
[Measurement configuration: JMS Producer/Consumer; WebSphere ESB; systems: Intel 2.8 GHz - B,
Intel 3.0 GHz - D.]
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries the JMSCorrelationId property in the JMS header. The request message processing
does not use the message body which is therefore not parsed.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context. The use of the context header means that the transform operates on the root of
the message rather than the body and hence the request flow is not eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation uses a Filter primitive to query a field in the body of the request message; this is
followed by a Custom Mediation primitive which could be used for custom logging (no logging
takes place in this scenario, to prevent IO contention). A further Filter primitive is then used to
route the message to one of two XSLT Transformation primitives. For this scenario the
transformation as detailed in the Schema Mediation is used.
The request message processing is not eligible for deferred parsing (because the body is parsed
for the initial Filter) or native form reuse (as the body may have been changed in the Custom
Mediation).
[Measurement configuration: JMS Producer/Consumer; WebSphere ESB; DB2; systems: Intel
2.8 GHz - B, Intel 3.0 GHz - D, Intel 3.5 GHz - A.]
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries the JMSCorrelationId property in the JMS header. The request message processing
does not use the message body which is therefore not parsed.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context. The use of the context header means that the transform operates on the root of
the message rather than the body and hence the request flow is not eligible for deferred parsing.
[Chart: requests per second, 6.2.0.1 vs 7.0.0.1, at 1K, 10K, and 100K message sizes, 4 cores;
CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]
This mediation uses a Filter primitive to query a field in the body of the request message; this is
followed by a Custom Mediation primitive which could be used for custom logging (no logging
takes place in this scenario, to prevent IO contention). A further Filter primitive is then used to
route the message to one of two XSLT Transformation primitives. For this scenario the
transformation as detailed in the Schema Mediation is used.
The request message processing is not eligible for deferred parsing (because the body is parsed
for the initial Filter) or native form reuse (as the body may have been changed in the Custom
Mediation).
[Measurement configuration: Web Services Client; WebSphere ESB; systems: Intel 3.67 GHz - C,
PPC 4.2 GHz - A, PPC 4.2 GHz - B.]
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are largely the same but the name of an element differs and the
two schemas have different namespaces. The request and response flows are eligible for deferred
parsing.
This chart highlights a major performance improvement between the 6.2.0 and 7.0.0.1
releases. The improvement affects all mediations which have a deferred parsing eligible
transform and which use document literal WSDL.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context header and set in the response message. The use of the context header means
that the transform operates on the root of the message rather than the body and hence the request
and response flows are not eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a string in the SOAP or JMS header. The Web Services workload uses the
Internationalization Context header. The JMS workload uses the JMSCorrelationId header field.
The request message processing does not use the message body which is therefore not parsed.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
The response message is passed through unmediated and as a result is not parsed.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation sets the value of a single element in the request message using the Message
Element Setter primitive. The request message processing is not eligible for deferred parsing. The
response message is passed through unmediated and as a result is not parsed.
If your mediation flow is otherwise eligible for deferred parsing, an alternative to the Message
Element Setter primitive is an XSL Transform primitive that sets the required field. The flow
would then be eligible for deferred parsing.
[Chart: requests per second, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown
above each bar.]
This mediation processes the request message with several mediation primitives: first, a message
filter checks a field for authentication; next, a custom mediation logs the message to the console;
this is followed by a routing filter using a value in the body of the message; and finally an XSLT
primitive transforms the message. The request message is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: Requests/sec, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown above each bar]
This mediation is identical in function to the preceding composite mediation but the primitives
are in separate modules linked by SCA bindings. The request message is not eligible for deferred
parsing.
The response message is passed through unmediated but unlike the composite mediation it is not
eligible for deferred parsing as a result of passing back through the SCA bindings.
[Chart: Requests/sec, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown above each bar]
This mediation uses the Business Object Map primitive to map the body of the request message
into a new Business Object. The request message processing is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: Requests/sec, V6.2 vs V7.0.0.1, Base in/Base out, 1 core; CPU utilization shown above each bar]
This mediation invokes the SOABench target from a Service Invoke primitive. The request
message is unprocessed but is wired to a service invoke which in turn is wired directly to the
inputResponse node. The request message processing is eligible for deferred parsing and there is
no separate response flow. A similar functional flow can be achieved by having separate request
and response flows with no mediation primitives. Testing has shown that these two approaches
are equivalent in performance terms.
Measurement Configuration: JMS Producer/Consumer on Intel 3.67GHz - A; WebSphere ESB on PPC 4.2GHz - A
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation routes the request message to the target service using the Filter primitive. The XPath queries a field in the body of the request message. The request message processing is not eligible for deferred parsing; however, since the message is unchanged, the native form is reused.
Measurement Configuration: JMS Producer/Consumer on Intel 3.67GHz - A; WebSphere ESB on PPC 4.2GHz - A; DB2 on PPC 4.2GHz - B
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.
[Chart: Requests/sec, 6.2 vs 7.0.0.1, Base message size, 8 cores, SMT enabled; CPU utilization shown above each bar]
This mediation routes the request message to the target service using the Filter primitive. The XPath queries a field in the body of the request message. The request message processing is not eligible for deferred parsing; however, since the message is unchanged, the native form is reused.
6.2.4
The results below are for the SOABench 2008 Transform Schema workload running on AIX
using the 10K message lengths. They show the SMP scaling achieved when running the ESB
Server in 3 different configurations: 1-way, 4-way and 8-way. Note that simultaneous multithreading (SMT) is enabled for all measurements.
The measurement configuration below was used for all measurements:
Measurement Configuration: Web Services Client on Intel 3.67GHz - C; WebSphere ESB on PPC 4.2GHz - A and PPC 4.2GHz - B
[Chart: Requests/sec for Transform Schema (10K in/10K out) at 1, 4, and 8 cores; 8-core scaling of 3.8x over 1 core; CPU utilization shown above each bar]
The model used contains 3 human tasks, 7 KPIs, 11 metrics, and 3 cubes.
Durations reported here are averages of multiple measurements, gathered from an analysis of
messages logged in the server during deployment. The first deploy operation after startup is not
included in the average. This reflects the typical user experience during interactive process
design. We note that the first deploy operation after startup, while taking somewhat longer due to
one-time initialization costs, also benefits substantially from the improvements delivered in
V7.0.0.0.
In the topology used for these measurements, WB Modeler client and WB Monitor server
machines are connected to the same subnet of a shared (non-private) network at 100 Mbps.
Measurement Configuration (V7.0): WB Modeler Client; WB Monitor Server
Deployment time is reduced in V7.0.0.0 to less than half of the time needed in V6.2 due to
several improvements, notably:
Exploiting the new EJB 3.0 support available in the WebSphere V7 Application Server
which underpins the runtime of WB Monitor V7. This eliminates the need for a separate
EJB deploy step.
[Chart: time in seconds by WID version, 2 cores: WID 6.1.0.0: 156; WID 6.1.0.1: 124; WID 6.1.2: 99; WID 6.2.0.1: 98; WID 7.0.0.1: 69]
[Chart: memory in MB by WID version, 2 cores: WID 6.1.0.0: 276; WID 6.1.0.1: 272; WID 6.1.2: 245; WID 6.2.0.1: 240; WID 7.0.0.1: 215]
[Chart: time in seconds by WID version, 2 cores: WID 6.1.0.0: 407; WID 6.1.0.1: 203; WID 6.1.2: 182; WID 6.2.0.1: 99; WID 7.0.0.1: 86]
[Chart: memory in MB by WID version, 2 cores: WID 6.1.0.0: 281; WID 6.1.0.1: 281; WID 6.1.2: 278; WID 6.2.0.1: 273; WID 7.0.0.1: 227]
[Chart: time in seconds by WID version, 2 cores: WID 6.2.0: 164; WID 6.2.0.1: 144; WID 7.0.0.1: 91]
[Chart: memory in MB by WID version, 2 cores: WID 6.2.0: 417; WID 6.2.0.1: 324; WID 7.0.0.1: 283]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz A
[Chart: time in seconds, 2 cores: BPM 6.2 - RoS: 1018; BPM 7.0.0.1: 611]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
[Chart: time in seconds, 2 cores; bar values 713, 521, and 538; 3.02x improvement noted; BPM 6.2 - RoS baseline]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
[Chart: BPM@Work Workload, Deploy Response Time - Windows, 2 cores: BPM 6.2: 412 seconds; BPM 7.0.0.1: 153 seconds]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
9 Directed Studies
This section provides a more detailed exploration of some features, along with development and
deployment options, within WPS, WESB, and WID. Generally, these studies are motivated by
lessons learned in the course of performance analysis of these products, or direct interaction with
WebSphere Business Process Management customers. Each of these studies is meant to illustrate
a set of issues that may be of interest, but is not intended to provide an exhaustive analysis of the
component in question. Several of the studies also support points made in the Architecture Best
Practices and Development Best Practices sections above.
Note that some of the directed studies below contain the same information as was presented in
earlier versions of the performance report; these studies were not repeated using WebSphere
BPM 6.2.0 since the conclusion would not change significantly. The charts and section headers
are clearly labeled to indicate this.
[Chart: throughput at 4 cores, Win2008 32-bit operating system vs Win2008 64-bit operating system; bar values 28.3 and 26.6]
Measurement Configuration: WPS; Driver, SOABench Services 1; SOABench Services 2; DB2
[Chart: throughput, 6.2.0.1 vs 7.0.0.1, 4 cores]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver on Intel Xeon 2.93GHz - A; DB2 on POWER6 4.7 GHz D
[Chart: throughput, 32-bit vs 64-bit, WPS 6.2.0.1 and 7.0.0.1; bar values 338, 230, 223, and 248]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver on Intel Xeon 2.93GHz - A; DB2 on POWER6 4.7 GHz D
9.3.1 Introduction
The SOABench 2008 InHouse Claim Processing workload, described in Section 10.4.4, is
evaluated on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core) running Windows 2008
Server. This workload is used to demonstrate the throughput and response time characteristics of
WebSphere Process Server business choreography as an increasing number of users are
concurrently processing insurance claims. Before the workload runs, 50,000 process instances
representing existing insurance claim activity are preloaded into the business process
choreography database. The insurance claims are divided equally among 125 regions. Users
belong to a single region and can only process insurance claims from their region, which is
enforced via authentication by a Tivoli Directory Server. Within a region, users are divided into two groups, adjusters and underwriters. Of the four human tasks required to complete an insurance claim, two are done by adjusters and two are done by underwriters.
Users query active process instances for a list of work that they can perform. A work item is
claimed (selected from the list) and then completed by the user. Users think between query,
claim, and complete activities. The think time is random but averages a total of 180 seconds per
human task. The time a user waits for responses to their human task queries, claims and
completes is recorded as response time. The rate at which insurance claims are completed is the
throughput. Once an entire insurance claim is finished, another is added to the region to maintain
active process instances at a constant level.
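To make the closed-loop workload model concrete, the following is a minimal Python sketch of one simulated user's loop. The api object and its query, claim, and complete methods are hypothetical placeholders for the SOABench client agent's calls, not actual product APIs, and the think-time distribution is illustrative only.

    import random
    import time

    AVERAGE_THINK_SECONDS = 60.0  # 180 seconds per task, split across three pauses

    def think():
        # Random think time averaging AVERAGE_THINK_SECONDS
        time.sleep(random.expovariate(1.0 / AVERAGE_THINK_SECONDS))

    def simulate_user(api):
        # Closed loop: query for eligible work, claim one item, complete it.
        # Only the query/claim/complete calls are timed as response time.
        while True:
            work_list = api.query()          # list work items for the user's region
            think()
            item = api.claim(work_list[0])   # claim (select) a work item
            think()
            api.complete(item)               # complete the human task
            think()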
A multi-tier topology was used for this study:
A WPS Server which runs the processes involved in the application scenario.
2 support systems which each run workload generators (client agents) under the direction
of a single client controller. One support system handles asynchronous service requests
and the other handles synchronous service requests by the business processes running on
the WPS Server.
[Chart: throughput and query response time in ms vs. user load; 59% CPU at 6000 users, 85% CPU at 8400 users, 99% CPU at 8880 users]
Preload: 125 regions with 400 tasks each = 50,000 total tasks; 80 users per region. Total task think time: 180 seconds.
Measurement Configuration: WebSphere Process Server on Intel Xeon 2.93GHz - A; Driver 1; Driver 2; DB2
[Chart: throughput and query response time in ms vs. user load; 31% CPU at 6000 users, 56% CPU at 10000 users]
Preload: 125 regions with 400 tasks each = 50,000 total tasks; 80 users per region. Total task think time: 180 seconds.
Measurement Configuration: WebSphere Process Server on Intel Xeon 2.93GHz - A; Driver 1; Driver 2; DB2
4. Log out
The measurement for the initial iteration of the above steps is discarded, so the results below
utilized a primed browser cache. The results in this study show the average of the subsequent
eight measurement iterations in the browser.
Client hardware
OS:
Windows XP (32bit)
CPU:
Memory:
2 GB RAM
Network:
OS:
CPU:
FSB:
1333 MHz
Memory:
16 GB RAM
HDD:
Network:
Software environment
WPS: Version 7.0.0.1
DB:
The total elapsed time of the migration grows linearly with the number of migrated instances, as expected: migrating 100 process instances takes 4.5 seconds, 1,000 process instances takes 44.7 seconds, and 10,000 process instances takes 453.8 seconds.
OS:
CPU:
FSB:
1333 MHz
Memory:
16 GB RAM
HDD:
Timings are based on the duration of the synchronous migrate() method call. A migration is considered complete after this call completes.
[Chart: migration elapsed time in seconds, log scale: 100 instances: 4.5; 1,000 instances: 44.7; 10,000 instances: 453.8]
changes to the physical representation of work items (that is, changes to the BPC
database schema)
The measurements in sections 9.6.1 and 9.6.2 have been made on the following machine setup:
Two physical machines: WPS server (standalone setup) and remote database (DB2 v9)
IBM xSeries 3650, 4x3.0 GHz, 16 MB cache, 16 GB memory (DB2 server and WPS server)
The measurements are CPU intensive and do not lead to an I/O bottleneck
All measurements have been made with a preloaded database with ~250,000 process instances.
250,000 business processes, each with one human task in the ready state, are available in the database. Group work items are used to assign human tasks. 1,000 users are defined, divided into 200 groups. A limit of 50 human tasks is returned by each query.
10 simulated users continually execute queries during the measurement interval in order
to measure the average response time. Therefore, the database is executing 10 parallel
queries continuously during the measurement interval.
No special tuning has been applied to WPS beyond that recommended in this report.
Note that the database is the bottleneck for these measurements, running at 100% CPU
utilization. Standard BPC database tuning was applied, described in the following:
WebSphere Process Server V6.1 Business Process Choreographer: Performance
Tuning Automatic Business Processes for Production Scenarios with DB2
http://www.ibm.com/support/docview.wss?uid=swg27012639
Improving the performance of complex BPC API queries on DB2
http://www.ibm.com/support/docview.wss?uid=swg21299450
The following figure shows a screenshot of the query table used for the QueryProperties query
workload:
The following chart summarizes the query response times achieved using WPS 7.0 with query tables versus the response times achieved using WPS 6.1.2 with the standard query API. As demonstrated below, WPS 7.0 queries are up to 20 times faster than WPS 6.1.2 due to the query table optimization. In addition, these results were obtained without expert-level database tuning, using only the standard tuning described in this document and in the links above.
[Chart: WPS 7.0 Query Tables vs. WPS 6.1.2 Standard Query API, response time in seconds: ExternalData query workload: 5.7 (6.1.2) vs 0.2 (7.0); QueryProperties query workload: 3.7 (6.1.2) vs 0.27 (7.0)]
Figure 2: Query workload results (response time in seconds)
The default set of indexes provided with the WPS 6.2.0 installation was used; no additional indexes were created.
Figure 3 shows BPC Explorer query response times obtained using a pre-filled BPC database
with the following characteristics:
100,000 processes with a human task assigned to a group (group work item)
These results demonstrate that BPC Explorer query response times are significantly improved in
WPS 6.2.0 by a factor of up to 7.5 times when compared to WPS 6.1.2.
[Chart (Figure 3): BPC Explorer query response times in seconds, 6.1.2 index structure vs 6.2.0, for My ToDos (Tasks), Administered By Me (Tasks), and Instance Details (Processes); bar values include 26, 9, 4, 3, and 1]
[Chart: normalized throughput in percent for WPS 6.0.2.1, 6.1.0.0, 6.2.0.1, and 7.0.0.1, 4 cores]
Note that the chart above uses two versions of the SOABench workload: SOABench 2005 Automated Approval Mode and SOABench 2008 Automated Approval Mode. The 2005 version was used previously to obtain the WPS 6.0.2.1 and 6.1.0 results. The bridge between the two versions of the workload was built by running WPS 6.2.0.1 on both versions, and then running WPS 7.0.0.1 on the 2008 version. Therefore, the results presented above are normalized throughput rather than raw throughput, since the two versions of the workload do not produce comparable throughput; SOABench 2008 is more complex, as shown in the workload descriptions referenced above.
WPS 6.2.0 is 3.8 times faster than WPS 6.0.0. In addition, WPS 6.2.0 is 10% faster than WPS 6.1.0. Note that we expect WPS 7.0 to perform similarly to WPS 6.2.0.
Tuning parameter settings for Banking are described in Appendix A - Banking Settings. One key configuration difference starting with WPS 6.1.0 is the usage of filestores for the messaging buses, as opposed to the local databases used in previous releases. Another key difference is the use of WorkManager-based navigation and the gencon garbage collection policy in 6.2.0.
[Chart: relative throughput for WPS 6.0.0, 6.0.1, 6.0.1.1, 6.0.2, 6.1.0, and 6.2.0; CPU utilization shown above each bar]
Measurement Configuration: WebSphere Process Server and DB2 on Intel 3.0 GHz A
A varying number of active process instances are preloaded into the business process choreography database. An active process instance is defined as one not yet completed. It can be in-flight, but it can also be persisted into the business process choreography database if it is waiting for a response from an outbound service call. The client driver maintains a constant number of active process instances by issuing new 3 KB requests as processes in the system are completed.
A three-tier topology was used for this study:
A WPS Server which runs the processes involved in the application scenario.
Two client systems. One runs a client driver and an application to handle asynchronous
service requests. The other runs an application to handle synchronous service requests.
As shown below, throughput remains essentially constant as the active number of process
instances is varied between 2,500 and 1,000,000. With 2,500 and 25,000 preloaded process
instances, WPS 7.0.0.1 runs the workload at a rate of 28.4 Claims Completed per Second (CCPS).
With 125,000 and 250,000 process instances preloaded, the workload runs at nearly the same rate,
28.2 and 28.3 CCPS respectively. With 500,000 and 1,000,000 preloaded process instances, the
rate dips very slightly to 28.1 and 27.9 CCPS, respectively.
[Chart: Claims Completed per Second vs. preloaded process instances, 2.5K to 1000K]
Measurement Configuration: WebSphere Process Server; Driver 1; Driver 2; DB2
[Table: workload throughput at each preload level: 28.4, 28.4, 28.2, 28.3, 28.1, and 27.9 CCPS, with per-tier CPU utilization percentages]
The amount of disk storage needed for the Business Process Choreographer database as the
number of process instances increases is shown in the chart below. This information was obtained
using the DB2 Control Center Storage Manager. The second chart shows the size of a database
backup at various preloads. The backups were created using the command: DB2 BACKUP
DATABASE database.
For both charts a 2x growth in the preloaded tasks results in a 2x growth in the storage
requirements, reaching approximately 77 Gigabytes at 1,000,000 preloaded tasks.
[Chart: BPC database size in GB vs. preload, from a DB2 Storage Manager snapshot: 2.5K: 0.30; 25K: 2.15; 125K: 10.19; 250K: 19.19; 500K: 38.63; 1000K: 76.84. Trend: as preload doubles, size approximately doubles]
[Chart: database backup size in GB vs. preload, from a backup saved to disk: 2.5K: 0.32; 25K: 2.17; 125K: 11.14; 250K: 19.11; 500K: 39.71; 1000K: 77.15. Trend: as preload doubles, size approximately doubles]
The growth of the Business Process Choreographer database depends on the data passing through
the process. As seen above, since the requests passing into the system did not change, database
growth behavior is predictable as more requests are preloaded into the system.
An additional consideration for growth is the definition of the process being handled. A more
complex process can result in greater storage requirements. Numerous tables in the Business
Process Choreographer database are involved in process instance storage.
The pie chart below shows the kilobytes used per task by tables in the Business Process Choreographer database. The data was extrapolated from a database with 25,000 preloaded SOABench 2008 Outsourced Claim Processing tasks. The storage per task is 91 KB. Thirteen tables make up the majority of storage used. The SCOPED_VARIABLE_INSTANCE_B_T table and the ACTIVITY_INSTANCE_B_T table account for 58 KB (64%) of the storage used.
[Pie chart: storage in KB per task by BPC table; the largest slices are SCOPED_VARIABLE_INSTANCE_B_T and ACTIVITY_INSTANCE_B_T; the remainder is spread across PROCESS_CONTEXT_T, WORK_ITEM_T, RESTART_EVENT_B_T, EVENT_INSTANCE_B_T, CORRELATION_SET_PROPERTIES_B_T, TASK_INSTANCE_T, INVOKE_RESULT2_B_T, PROCESS_INSTANCE_B_T, QUERYABLE_VARIABLE_INSTANCE_T, SCOPE_INSTANCE_B_T, RETRIEVED_USER_T, and other tables]
The number of rows in these tables depends on the process definition. The chart below shows the
number of rows in various database tables needed to store a single process instance for this study.
The ACTIVITY_INSTANCE_B_T table uses 16 rows to hold its portion of the process instance.
This corresponds to the 16 activity blocks in the process definition. The
SCOPED_VARIABLE_INSTANCE_B_T table uses 24 rows per process instance. This
corresponds to the number of assignments done by the process.
[Chart: rows per process instance by BPC table: ACTIVITY_INSTANCE_B_T: 16; SCOPED_VARIABLE_INSTANCE_B_T: 24; the remaining tables (CORRELATION_SET_INSTANCE_B_T, CORRELATION_SET_PROPERTIES_B_T, EVENT_INSTANCE_B_T, INVOKE_RESULT2_B_T, PARTNER_LINK_INSTANCE_B_T, PROCESS_CONTEXT_T, PROCESS_INSTANCE_B_T, QUERYABLE_VARIABLE_INSTANCE_T, RESTART_EVENT_B_T, RETRIEVED_USER_T, SCOPE_INSTANCE_B_T, TASK_INST_LDESC_T, TASK_INSTANCE_T, WORK_ITEM_T) use between 1 and 5 rows each]
Measured throughput at three BO sizes:
3 KB requests and 3 KB responses: 390 CCPS using WPS 7.0.0.1, a 23% improvement over WPS 6.2.0.1 (318 CCPS).
10 KB requests and 10 KB responses: 177 CCPS on WPS 7.0.0.1, a 57% improvement over WPS 6.2.0.1 (113 CCPS).
100 KB requests and 100 KB responses: 23.5 CCPS using WPS 7.0.0.1, an 86% improvement over WPS 6.2.0.1 (12.6 CCPS).
In addition to the improvements delivered in WPS 7.0.0.1, the other conclusion to draw from the above data is that throughput drops significantly as BO size increases.
The bar labels on the chart below show the throughput improvement delivered in WPS 7.0.0.1 vs. 6.2.0.1, rounded to the nearest tenth.
[Chart: CCPS, WPS 6.2.0.1 vs 7.0.0.1, 4 cores: 3k-3k: 1.2x; 10k-10k: 1.6x; 100k-100k: 1.9x improvement]
Measurement Configuration: WPS; Driver, SOABench Services 1; SOABench Services 2; DB2
Overview
The ability of a single Java Virtual Machine to efficiently use processor cores at high utilization
diminishes as the number of cores increases. To demonstrate this, this study directly compares
the vertical (SMP) and horizontal (clustered) measurements of SOABench 2008 on POWER6
running AIX using data shown previously in sections 5.1.3 and 5.1.4 respectively.
The same numbers of processor cores were used to run both Automated Approval and
OutSourced Modes. Although impressive throughput and scaling rates were achieved in the
single server topology, both workloads demonstrated significant performance gains by applying a
clustered topology where the same number of cores were divided among separate hardware
partitions on which multiple WPS JVMs worked together as cluster members (nodes).
Note that when additional hardware partitions are added, underlying resources are also added
such as: Java heaps, WebSphere log streams, network adapters, TCP stacks, disk adapters, file
systems, etc.
9.10.2 Automated Approval Mode
[Chart: throughput: 8 cores as 1 node vs 2 nodes x 4 cores; 16 cores as 1 node vs 4 nodes x 4 cores; CPU utilization shown above each bar]
9.10.3 OutSourced Mode
[Chart: throughput, OutSourced Mode: 8 cores as 1 node vs 2 nodes x 4 cores; CPU utilization shown above each bar]
Overview
It was recommended earlier in this report that the remote messaging and remote support deployment environment pattern be used for maximum flexibility in scaling. However, there is a new capability in WAS 7.0 that affects message-driven bean (MDB) connection behavior and that is interesting to examine.
This section studies the impact of this MDB connection behavior on performance when measured
in the context of SOABench 2008 OutSourced Mode with a single cluster deployment
environment pattern. For comparison, measurements with a remote messaging and remote
support deployment environment pattern were shown in section 5.1.4.3.
9.11.2
With the single cluster deployment environment pattern, when an MDB application is installed in the same cluster as the messaging engine, its MDB connection behavior depends on the value of the alwaysActivateAllMDBs property of the appropriate activation specification.
See this link for more information:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc
.nd.doc/concepts/cjn_mdb_endpt_overview.html
When this property has a value of false, the MDB will only connect to an active message engine
within the same JVM. When this property has a value of true, the MDB will also connect to an
active message engine on a separate JVM in the cluster. These two behaviors are depicted in the
following two charts.
[Diagram: single cluster topology with alwaysActivateAllMDBs=false: each node's MDBs connect only to a messaging engine in the same JVM; the node hosting only failover MEs has no active ME connection]
[Diagram: single cluster topology with alwaysActivateAllMDBs=true: MDBs on both nodes connect to the active messaging engine, including across JVMs]
9.11.3 Topology
For this study, a single cluster contains the application and messaging engine, and this cluster has two cluster members (nodes). The messaging engines run as failover on one node (the left node) and active on the other node (the right node).
Depending on the property value, the MDB in the left node will or will not connect to the active message engine in the other JVM. The MDB in the right node will always connect to the active message engine because it is within the same JVM.
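For reference, a hedged wsadmin (Jython) sketch of setting the property follows. The command and parameter names are modeled on the documented SIB JMS activation specification admin commands; the scope and specification selection are placeholders, so verify the exact syntax against the WAS 7.0 InfoCenter before use.

    # Hypothetical sketch: enable cross-JVM MDB connections on a SIB JMS
    # activation specification. Scope and names are placeholders.
    scope = AdminConfig.getid('/Node:myNode/Server:server1/')
    specs = AdminTask.listSIBJMSActivationSpecs(scope).splitlines()
    AdminTask.modifySIBJMSActivationSpec(specs[0],
        '[-alwaysActivateAllMDBs true]')
    AdminConfig.save()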
[Diagram: measurement topology: SOABench Outsourced Controller (driver) and SOABench Agent and Outsourced Services on 8-core Power5 machines; an IBM HTTP Server with WebSphere plugin spraying requests to a single cluster of two 4-core nodes running the SOABench BPEL application (micro flows, async, macro flows); SOABench Services on a 16-core machine; DB2 databases for BPE, MEs, and WPS; failover MEs on the left node, active MEs on the right node]
9.11.4 Workload
The SOABench 2008 OutSourced Mode workload is not purely MDB driven. A full description of the workload can be found in section 10.4.3. A significant portion of load is driven via Web Services invocations, which are sprayed across the nodes from the IBM HTTP Server pictured in the topology above.
This is an important point, because even when the MDB of a particular node is unable to connect
to an active message engine, there is still a significant amount of work for it to perform.
9.11.5 Results
Reading from left to right, the 1st bar in the chart below is provided as a baseline for comparison.
For this measurement bar, the left node is stopped and the right node is started and handling all
workload traffic.
The 2nd bar shows pre-WAS 7.0 MDB behavior, where the alwaysActivateAllMDBs property is set to false. Again, because this workload is not purely MDB driven, the left node still handles some workload traffic; however, its CPU utilization is only 59% while the right node runs at a very high 97% CPU.
The 3rd bar shows the performance improvement achieved when the property is set to true and the left node is now able to perform additional work via its MDB connection to the active ME in the right node, raising the left node's CPU utilization to 81%. However, because of the very high CPU utilization (98%) of the right node, the left node has trouble taking more work from the ME to drive its CPU utilization even higher.
The 4th bar shows further performance gains obtained by adjusting the weights on the HTTP sprayer to favor the left node for the non-MDB traffic, thus driving higher overall workload throughput and better balance of CPU utilization between the left and right nodes. However, if the input traffic varies significantly, the CPU utilization could become imbalanced one way or the other until the HTTP sprayer weight is adjusted. In practice, this would need to be monitored closely and adjusted accordingly.
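One way to apply such weights is through the cluster member weight attribute, which feeds the generated plugin configuration. A hedged wsadmin (Jython) sketch follows, with placeholder cluster and member names; the 5-4 split mirrors the measurement below.

    # Hypothetical sketch: bias the HTTP sprayer toward the left node by
    # raising its cluster member weight. Names are placeholders.
    left = AdminConfig.getid('/ServerCluster:SingleCluster/ClusterMember:leftMember/')
    right = AdminConfig.getid('/ServerCluster:SingleCluster/ClusterMember:rightMember/')
    AdminConfig.modify(left, [['weight', '5']])
    AdminConfig.modify(right, [['weight', '4']])
    AdminConfig.save()  # regenerate and propagate plugin-cfg.xml afterwards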
[Chart: Single Cluster, SOABench 2008 OutSourced Mode - AIX, 4 cores per node: 1 node: 98% CPU; 2 nodes with pre-V7 MDB behavior: 1.5x, 59%/97% CPU; 2 nodes: 1.7x, 81%/98% CPU; 2 nodes with http sprayer weight 5-4: 1.9x, 94%/97% CPU]
Although not measured here, we predict that adding more nodes to this single cluster topology would further increase performance as long as the HTTP sprayer weights are adjusted to achieve good balance and the active message engine node does not become the bottleneck due to excessive CPU utilization. Potentially, with enough cluster members, the HTTP sprayer weight for the active message engine node would have to be set to 0 (lowest) so that it handles only messaging engine related work. However, well before such maintenance-intensive adjustments of the HTTP sprayer's weights are made, an alternate cluster topology should be considered.
9.11.6 Summary
A single cluster deployment environment is now more viable due to WAS 7.0 MDB
enhancements, especially for workloads heavily dependent on MDBs.
However, as this study illustrates, due to the imbalance of CPU utilization across nodes related to
where the active message engines are running, such a configuration should be considered
carefully for anything but the simplest of implementations.
WID Considerations
Topology Considerations
Clustering Tuning
Finally, here are discussions on issues for high volume runtime deployments:
Tuning Checklist
Tuning Methodology
WebSphere Integration Developer (WID) provides a wizard and command line utility that enable users to migrate WebSphere InterChange Server (WICS) content to equivalent artifacts on WebSphere Process Server (WPS). This wizard can, with minimal developer input, generate fully functional WPS artifacts. Please note that migration is a complex topic with many different aspects; for a complete discussion please see the IBM WebSphere InterChange Server Migration to WebSphere Process Server Redbook at the following location:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247415.pdf
This section evaluates the performance of WID 7.0.0.1-generated migration artifacts running on
WPS 7.0.0.1 by comparing it with the performance of an equivalent workload running on WICS
4.3.0.6 and an equivalent WPS workload run on previous versions of WID/WPS (6.1.0 & 6.2.0).
The workload used for evaluation is Contact Manager with a Web Services binding. The Contact
Manager workload is described in section 10.2. There are 4 workloads used to evaluate the
performance, each of which is different but semantically equivalent.
WICS version: utilizes the WebSphere Business Integration Adapters (WBIA) Web
Services adapter to act as the source of Business Objects, and an emulated Clarify adapter
as the destination. The Web Services adapter interacts with the WICS server using
WebSphere MQ and the emulated Clarify adapter is connected to the WICS server via
IIOP.
WPS 6.1.0 version: developed by making use of the WICS Migration Wizard in WID
6.1.0 to migrate the WICS workload described above. This wizard migrates the Web
Services adapter to still be the WBIA Web Services adapter (but to be run in a standalone
JMS mode) and migrates the emulated Clarify adapter to a custom adapter which
interfaces with WPS using JMS. The workload was subsequently modified to remove the
relationship map step from the maps and to post an async one way JMS message for each
interaction with the emulated Clarify adapter. This is to ensure that the workload driver
can drive enough work into the system to maximize throughput. The generated workload
is measured on WPS 6.1.0.
WPS 6.2.0 version: developed like the WPS 6.1.0 version by using the WID 6.2.0 WICS migration wizard. This wizard differs from the 6.1.0 version in that it migrates the WBIA Web Services adapter to an HTTP SCA binding with a custom data handler. Post-migration modifications performed are the same as in the WPS 6.1.0 version. The workload is then measured on WPS 6.2.0.
WPS 7.0.0.1 version: developed like the WPS 6.2.0 workload but using the WID 7.0.0.1 WICS migration wizard. The 7.0.0.1 wizard offers the option of merging the connector and collaboration modules during migration. Post-migration, the workload was changed to incorporate the Migration Development Best Practices and to post an async one-way JMS message for each interaction with the emulated Clarify adapter. The workload is then measured on WPS 7.0.0.1.
All four workloads described above are evaluated on an IBM pSeries model 9117-MMA, 4.7
GHz (8-way SMP) running AIX 6.1 to demonstrate the throughput characteristics. Measurements
are shown in the chart below.
On the above specified setup with all eight cores enabled, the WID 7.0.0.1 migrated workload
runs on WPS 7.0.0.1 at a rate of 1004 Business Transactions Per Second (BTPS), which is a 54%
improvement over WPS 6.2.0. WPS 6.2.0 runs the WID 6.2.0 migrated workload at a rate of 650
BTPS which is an 8.3x improvement over 6.1.0. WID 6.1.0 migrated workload runs on WPS
6.1.0 at a rate of 78 BTPS.
On the same setup as above, WICS 4.3.0.6 runs its workload at a rate of 1,049 BTPS. A few notes
on this data are relevant:
WPS 7.0.0.1 delivers comparable throughput as WICS for the same workload.
WICS 4.3.0 only utilizes 54% of the available cores, even after comprehensive tuning was done. This is due to limitations in the WICS runtime architecture, notably a single-threaded listener path for processing incoming events. WPS does not have this limitation and therefore has superior SMP scaling, as is demonstrated in the chart below.
The data presented below is for a single server configuration, since WICS does not support clustering. WPS can deliver higher throughput rates than are shown below via clustering.
[Chart: BTPS at 8 cores, WICS 4.3.0.6 vs WPS releases; CPU utilization shown above each bar]
Measurement Configuration: WICS, WPS server; DB2; Driver on Intel 3.5GHz - D
This section contains a series of studies exploring the behavior of a system in the presence of a
large input event (BO). Data is shown for WPS 7.0.0.1 and WESB 7.0.0.1.
For any application, the maximum size input object that it can support depends on a number of
factors. The amount of processing required to complete a transaction and the representation of the
input event internal to the application are clearly important as they affect the number of copies of
the event required to be held in memory and the nature of the objects held in the Java Heap
(whether they are contiguous or composed of a set of smaller, discrete objects).
Also, the ability to process large input events usually depends on the transactional nature of the
processing involved. Some data processing systems are able to break a large transaction into
multiple smaller transactions that are processed (or committed) independently, while others are
not. Whenever possible it is advisable to design a solution that does not depend on processing
input events of arbitrarily large size. Please refer to the Best Practices described in Section 2.5 for
more information related to processing Large Business Objects.
The sections that follow display a wide variety of results. While it may be tempting to do so,
please do not view the data as a fundamental product limit for the largest input event size. Rather,
these sections are a set of case studies intended to explore the factors affecting the ability of a
solution to successfully process a large input event.
9.14.2
The SOABench 2008 Automated Approval workload (see section 10.4.2) was used to explore the
ability to handle large objects within a business process running in WPS 7.0.0.1. The purpose of
this study was to find the maximum object size that the system can handle repeatedly (20 times
for this study) without exceptions. The system evaluated to find the maximum size is an AIX 6.1
system with 31 GB of RAM running a 32 bit version of WPS 7.0.0.1. In addition an evaluation of
an AIX 64 bit version of WPS 7.0.0.1 was done for a single 500 MB request.
Large Object requests were produced in the client driver by creating additional customer detail
fields in the claim request which is referred to as the "payload." Note that the charts below show
the client driver's input object size and not the actual size processed by WPS; the generation of
the payload results in an actual request size 6% larger than the client reports. For example a 100
MB request is actually 106 MB in WPS (110 MB on the wire with packet overhead).
Responses from the server are constant at 3 KB in size. The SOABench 2008 automated approval
workload implementation used for this study holds 7 copies of the payload for use during the
various steps of the process flow resulting in many large contiguous memory objects contending
for Java heap space. Note: the SOABench 2005 automated approval workload, used in previous
versions of this performance report, holds 5 copies of the payload so maximum object size should
not be compared between the two workload versions.
The maximum Java heap size required was determined by repeated experiments to balance the
memory needed for native memory versus the Java heap as large object sizes were increased. On
AIX the optimal maximum heap was determined to be 2600 MB but to achieve this it was
necessary to set an operating system variable:
"export LDR_CNTRL=MAXDATA=0xB0000000@DSA"
in the session starting the WPS server to provide additional memory segments for user processes.
For the AIX WPS 7.0.0.1 64 bit system study, the maximum Java heap was set to 9800 MB with
no additional AIX system variable settings required. In all cases, native heap space was
preserved by using type 4 JDBC drivers for WPS datasources. See reference:
http://www-128.ibm.com/developerworks/eserver/articles/aix4java1.html
The chart below shows that the 32-bit WPS maximum object size was 150 MB for WPS 6.2.0.1 and 170 MB for WPS 7.0.0.1, a 20 MB improvement. The 64-bit WPS 7.0.0.1 was able to handle the 500 MB object request submitted. Note that this was the largest size attempted; finding the maximum size for this system was not attempted.
Transaction completion time also improves on large requests in WPS 7.0.0.1. 150 MB requests on the IBM pSeries POWER6 4.7 GHz 4-core AIX 6.1 system took 542 seconds each on 32-bit WPS 6.2.0.1, but the larger 170 MB request took only 490 seconds on 32-bit WPS 7.0.0.1. The 500 MB request on this hardware running 64-bit WPS 7.0.0.1 took 1,376 seconds to complete.
Due to the response times shown above, it was necessary to increase several timeout settings for
both the SOABench client driver and the WPS server running the workload. These include:
Increasing the Application Server Transaction Service timeouts for Total transaction
lifetime, Async response, Client inactivity, and Maximum transaction.
Increasing the SOABench BPEL EJB module web service client bindings request timeout.
Increasing socket read and write timeouts for both the SOABench client and server invocations using the JVM properties (values in seconds) "-Dcom.ibm.ws.webservices.readTimeout=" and "-Dcom.ibm.ws.webservices.writeTimeout=".
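As an illustration of the last item, the following hedged wsadmin (Jython) sketch sets the two web services timeout properties as generic JVM arguments on a server. Cell, node, and server names and the 600-second value are placeholders, and this simplified form overwrites any existing generic JVM arguments.

    # Hypothetical sketch: set web services read/write timeouts (in seconds).
    server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
    jvm = AdminConfig.list('JavaVirtualMachine', server).splitlines()[0]
    args = ('-Dcom.ibm.ws.webservices.readTimeout=600 '
            '-Dcom.ibm.ws.webservices.writeTimeout=600')
    AdminConfig.modify(jvm, [['genericJvmArguments', args]])
    AdminConfig.save()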
[Chart: maximum large object size in MB, 32-bit WPS: 6.2.0.1: 150 MB; 7.0.0.1: 170 MB; request size shown above each bar]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver on Intel Xeon 2.93GHz - A; DB2 on POWER6 4.7 GHz D
[Chart: object size in MB, WPS 7.0.0.1 on AIX: 32-bit maximum: 170; 64-bit achieved: 500 (maximum not determined)]
Measurement Configuration: WebSphere Process Server on POWER6 4.7 GHz D; Driver; DB2
9.14.3
The JMS binding and Web Services scenarios were evaluated with large messages to determine
the largest message which could be processed in sustained operation. The tests were run for a
period of 2 hours.
These tests were run using the Transform Value mediation and a Custom mediation which
transforms the value of a single field in the request message. These mediations were chosen as
they represent a simple case requiring little processing and a complex case which will cause the
request to be serialized, respectively. For details of the mediations see section 11.3. For details of
the topology used see section 11.1 and 11.2.
The Java heap was set to a fixed size of 1536 MB for these measurements.
9.14.3.1 Web Services Binding large messages
The chart below shows that the maximum request size ranges from 82 MB to 96 MB, and the
maximum response size ranges from 91 MB to 110 MB, depending on the processing done in the
mediation.
[Chart: maximum message size in MB, V6.2 vs V7.0, for the Transform Value mediation request and response messages, 16 cores]
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
[Chart: maximum message size in MB, 6.2 vs 7.0.0.1, 4 cores: Transform Value mediation and Custom mediation, non-persistent and persistent]
Measurement Configuration: JMS Producer/Consumer, WebSphere ESB, DB2; machines: Intel 2.8GHz - B, Intel 3.0GHz - D, Intel 3.50GHz - A
For non-persistent messaging, using the default messaging provider within WESB (WebSphere Platform Messaging) is 38% faster than the MQ JMS provider using the Base message size (1.2 KB). MQ JMS provides equivalent messaging performance to the MQ binding for the same scenario.
For persistent messaging, the default messaging provider is 49% faster than the MQ JMS
provider using the Base message size. MQ JMS messaging outperforms the MQ binding
by 7% for the same scenario.
Note: Generic JMS was not tested in V7.0.0.1. V6.2 tests showed the performance to be identical
to MQ JMS.
9.15.1
The following charts compare the throughput for the different non persistent messaging bindings
using the Transform Value mediation and a range of message sizes. For details of the mediations
and request sizes see sections 11.3 and 11.4. All data is obtained on a 4-way hyper-threaded
WESB server machine. For details of the topologies used see section 11.2.
[Chart: Requests/sec for non-persistent messaging bindings (JMS, MQ JMS, MQ) at Base, 10 KB, and 100 KB message sizes, 4 cores, Hyper-Threading enabled; CPU utilization shown above each bar]
Measurement Configuration: JMS Producer/Consumer on Intel 2.8GHz - B; WebSphere ESB on Intel 3.0GHz - D
9.15.2
The following charts compare the throughput for the different persistent messaging bindings
using the Transform Value mediations and a range of message sizes. For details of the mediations
and request sizes see sections 11.3 and 11.4. All data is obtained on a 4-way hyper-threaded
WESB server machine. For details of the topologies used see section 11.2.
[Chart: Requests/sec for persistent messaging bindings (JMS, MQ JMS, MQ) at Base, 10 KB, and 100 KB message sizes, 4 cores; CPU utilization shown above each bar]
Measurement Configuration: JMS Producer/Consumer on Intel 2.8GHz - B; WebSphere ESB on Intel 3.0GHz - D; DB2 on Intel 3.50GHz - A
[Chart: Requests/sec, XSLT vs BOMap mediation, Base in/Base out, 16 cores; CPU utilization shown above each bar]
The XSLT mediation sets the value of a single element in the request message and copies all
other elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
The BOMap mediation uses the Business Object Map primitive to map the body of the request
message into a new Business Object and sets the value of a single element. The request message
processing is not eligible for deferred parsing.
[Chart: Requests/sec, ElemSet XSLT vs ElemSet BOMap mediation, Base in/Base out, 16 cores; CPU utilization shown above each bar]
The two mediation flows in this chart are the same as in the chart above but with a Message
Element Setter primitive inserted into the mediation flows before the XSL Transform and
BOMap primitives. The Message Element Setter primitive is included to force a parse of the
message so that the flow is not eligible for deferred parsing.
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
This study compares a composite mediation (all primitives in one mediation flow component), a chained mediation (primitives in separate modules linked by SCA bindings), and a variation linking primitives in a single module using multiple MFCs. For each of the three cases all the mediation code is still running in a single JVM.
As the chart below shows, using a composite mediation is significantly cheaper than the chained variation, as less data conversion (with an associated reduction in heap usage) takes place. Splitting the primitives across multiple MFCs in the same module has a lower overhead, with the proportional cost decreasing as message size increases.
[Chart: Composite vs Chained Mediation - Windows: Requests/sec for Composite, Composite (Multi MFC), and Chained mediations, Base in/Base out, 16 cores; CPU utilization shown above each bar]
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
[Chart: Requests/sec, JAX-RPC vs JAX-WS, Base in/Base out, 16 cores; CPU utilization shown above each bar]
[Chart: Requests/sec, JAX-RPC vs JAX-WS, Base in/Base out, 16 cores; CPU utilization shown above each bar]
Measurement Configuration: Web Services Client, WebSphere ESB; machines: Intel 2.8GHz - C, Intel 2.93GHz - C, Intel 3.5GHz - B
The studies presented in the following sections explore issues relevant to the performance of WebSphere Process Server and WebSphere Integration Developer 7.0.0.1 when used in an authoring environment.
From these studies, the following observations can be made:
1. Deployment to a production server is expected to be as much as twice as fast as what is experienced in a development environment.
2. When using wsadmin to install SCA Modules, installing multiple modules in a WAS
Session and then saving the configuration change together is faster than installing (and
saving) each of the Modules individually.
3. In addition to memory savings, defining Shared Libraries according to the technote
(available at: http://www-01.ibm.com/support/docview.wss?uid=swg21298478) reduces
total application install time.
9.19.2
For this study, we compare response time when publishing the Loan Processing workload from WID 7.0.0.1 to WPS 7.0.0.1 on a variety of hardware configurations. Two different machine types are used: a desktop system resembling a typical developer's workstation and a server system resembling a typical production server. Additionally, each of the two systems is measured in several configurations, varying the number of cores available to the system as well as the configuration of the disk subsystem.
The results from the Model 9196 Desktop system indicate that addition of a second processing
core improves publish responsiveness from 738 seconds to 612 seconds (a 17% improvement).
Addition of a second physical disk drive (and installing WID & WPS to that drive, isolating its
activities from those associated with the operating system) delivers an additional 12%
improvement.
The results from the Model 7233 Server System indicate that, even with only a single processing
core active, the presence of a fast disk subsystem (RAID Disk array combined with filesystem
improvements available in the server operating system) leads to improved publish responsiveness.
Addition of a second core further improves responsiveness. Additional cores beyond the second
would lead to only a small improvement in responsiveness.
From this data it would be reasonable to expect deployment to a production server to be as much
as twice as fast as deployments that developers experience on their workstations, due simply to
the hardware differences typical in the two environments.
[Chart: publish response time in seconds, bar labels showing response time and average CPU utilization: Model 9196: 1 Core, 1 Disk: 738 (51%); 2 Core, 1 Disk: 612 (35%); 2 Core, 2 Disk: 538 (41%); Model 7233: 1 Core, RAID: 477 (96%); 2 Core, RAID: 374 (66%)]
Measurement Configuration: Model 9196 (desktop); Model 7233 (server)
9.19.3
In this study we use the 60 Modules in the Loan Processing application to demonstrate the
relative performance of some of the options available when deploying Modules via the wsadmin
tool.
First, we use a wsadmin install script that saves the changes made under the configuration session
multiple times when executing the install. Each of the 60 Modules is installed, saved & started
independently, before proceeding to the next Module. This installation operation completes in
466 seconds as shown in the Multiple WS Saves measurement in the chart below.
Second, we use a wsadmin install script that installs all 60 of the Modules, with a single save
operation after all of the Modules are installed. Then, each of the Modules is started. This
operation completes in 382 seconds, 18% faster than the Multiple WS Saves measurement. This
data appears as the Single WS Save measurement in this data chart.
Finally, the shared libraries technique described in the technote,
http://www-01.ibm.com/support/docview.wss?uid=swg21298478, is used in conjunction with the
Single WS Save technique described here. In addition to the memory savings that shared libraries
provides, it delivers an additional 15% savings in install response time, for a total install time of
326 seconds (30% faster than the Multiple WS Saves approach).
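A minimal wsadmin (Jython) sketch of the Single WS Save technique follows. The EAR paths, install options, application naming, and target server are placeholders; the point of the sketch is the single AdminConfig.save() after all installs.

    # Hypothetical sketch: install all modules in one session, save once, then start.
    ears = ['/builds/Module%02d.ear' % i for i in range(1, 61)]
    for ear in ears:
        AdminApp.install(ear, '[-usedefaultbindings]')
    AdminConfig.save()  # single save covering all 60 installs
    appMgr = AdminControl.queryNames('type=ApplicationManager,process=server1,*')
    for ear in ears:
        appName = ear.split('/')[-1][:-len('.ear')]  # assumed naming convention
        AdminControl.invoke(appMgr, 'startApplication', appName)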
[Chart: install time in seconds, 2 cores: Multiple WS Saves: 466; Single WS Save: 382; Single WS Save with shared libraries: 326]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
9.20.1
In WebSphere Application Server Version 6.1, the Security Configuration Wizard enables you to
configure application or Java 2 security. For further information, please see the IBM InfoCenter:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.wsf
ep.multiplatform.doc/info/ae/ae/usec_secureadminappinfra.html
In order to run an application with Java 2 security enabled, required permissions have to be
granted in the was.policy file of the application ears. Please see the IBM InfoCenter for more
details:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.wsf
ep.multiplatform.doc/info/ae/ae/csec_rsecmgr2.html
Following is a screen shot of the admin console page with Java 2 security enabled:
The automated approval workload of the Choreography facet, described in section 10.5.2, is
evaluated on an IBM xSeries 3950 M2 2.93 GHz Xeon (4 quad-core processors), running with 4
cores enabled on Windows Server 2008, to demonstrate the throughput characteristics of
WebSphere Process Server in this configuration. 3 KB requests and 3 KB responses are utilized.
The workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
With no security enabled, WPS 6.2.0 runs the workload at a rate of 556 Business Transactions per Second (BTPS). With application security enabled, WPS runs the workload at a rate of 524 BTPS, a degradation of 6% compared to no security. When Java 2 security is enabled in addition to application security, the rate drops to 360 BTPS, a degradation of 35% compared to no security.
[Chart: BTPS at 4 cores: WPS 6.2: 556; WPS 6.2 + application security: 524; WPS 6.2 + application security + Java 2 security: 360; 99% CPU utilization]
Measurement Configuration: WebSphere Process Server; Driver; DB2
The manual approval workload of the Choreography facet, described in section 10.5.3, is
executed on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core), running with 4 cores
enabled on Windows Server 2008, to demonstrate the throughput characteristics of WebSphere
Process Server in this configuration. 3 KB requests and 3 KB responses are utilized. The
workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
With no security enabled, WPS 6.2.0 runs the workload at a rate of 44 Business Transactions per Second (BTPS). With application security enabled, WPS runs the workload at a rate of 34 BTPS, a degradation of 23% compared to no security. When Java 2 security is enabled in addition to application security, the rate drops to 29 BTPS, a degradation of 34% compared to no security.
[Chart: BTPS at 4 cores: WPS 6.2: 44; WPS 6.2 + application security: 34; WPS 6.2 + application security + Java 2 security: 29; 99% CPU utilization]
Measurement Configuration: WebSphere Process Server; Driver; DB2
9.20.2 Remote Messaging Deployment Environment Startup Time and Footprint
The Loan Processing workload described in Section 12.2 was used to quantify the startup time
and footprint improvements in WPS 6.2.0 when running in a remote messaging deployment
environment with many application modules installed in the cell.
See this link for an overview of various deployment environment patterns including remote
messaging:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r2mx/index.jsp?topic=/com.ibm.webspher
e.wps.620.doc/doc/cpln_topologypat.html
There is a significant reduction in the time it takes to start the Message Engine associated with
WPS 6.2.0 when using this workload, as shown in the chart below. Message Engine startup time
is reduced by a factor of 6.4 times.
[Chart: startup time in seconds for the WPS APP and WPS ME JVMs, 64-bit WPS 6.1.0.1 vs 6.2.0; ME startup reduced from 1016 seconds to 159 seconds]
There is also a significant reduction in memory footprint after startup in both the Message Engine JVM and the WPS 6.2.0 JVM with this workload installed, as is demonstrated in the chart below. The system memory footprint is reduced from 903 MB to 624 MB, an improvement of 31%.
[Chart: live bytes after startup in MB: 64-bit WPS 6.1.0.1: APP 520 + ME 383; WPS 6.2.0: APP 423 + ME 201]
Measurement Configuration: APP and ME on a 4 core LPAR on PPC 1.9 GHz - A; DB2
9.20.3
When a WPS application makes use of data-type or interface definitions defined in a library
module, WID copies the artifacts from the library into the application module so that those types
may be available to the runtime. If many application modules make use of a library, its artifacts
are copied many times, increasing the memory pressure on the Server runtime. A technote
(available at: http://www-01.ibm.com/support/docview.wss?uid=swg21298478) describes a
technique that declares the library modules as WAS Shared Libraries and allows their artifacts to
be shared among WPS modules.
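As a companion to the technote, a hedged wsadmin (Jython) sketch of defining one such shared library at cell scope follows. The library name and class path are placeholders, and the technote's full procedure, including associating the library with the applications, still applies.

    # Hypothetical sketch: define a WAS shared library so library artifacts
    # are loaded once rather than copied into every module.
    cell = AdminConfig.getid('/Cell:myCell/')
    AdminConfig.create('Library', cell,
                       [['name', 'ContactLibV1'],
                        ['classPath', '/opt/sharedlibs/ContactLibV1.jar']])
    AdminConfig.save()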
In this study, we examine the memory reduction realized when rebuilding the Loan Processing
application to make use of the technique described in the technote. We prepared deployment code
using Java EE Prepare For Deploy and exported the application from WID as a set of EAR files
and then used a jacl script to deploy the EARs to the WPS server via wsadmin.
This application makes moderate use of sharing; 2 shared libraries are used by all 62 modules,
and 20 other shared libraries are used by approximately 5 modules each.
The chart below shows that the peak live memory within the WPS Java heap when publishing the Loan Processing application via the standard mechanism is 378 MB. When using the WAS Shared Library technique described in the technote, peak memory is reduced by 11% to 335 MB.
[Chart: peak live memory during publish; Standard Deployment 378 MB versus WAS Shared Library technique 335 MB; 2 cores.]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
One of the steps described in the WAS Shared Library technote instructs the administrator to copy Shared Library files to the <WAS_HOME>/lib/ext directory for deployment and then to delete those files when the deployment is complete. The chart below shows the importance of deleting the Shared Library files from this temporary location. When using the standard deployment technique, the WPS Java heap contains 339 MB of live data after restart. When using the WAS Shared Library technique, the WPS live set is reduced by 19% to 275 MB. However, if the temporary library files are not deleted, the memory reduction is only 13%.
[Chart: live memory (MB) after restart; Standard Deployment 339 MB, WAS Shared Library technique with temporary files left in place 294 MB, WAS Shared Library technique 275 MB; 2 cores.]
Measurement Configuration: WebSphere Integration Developer on Intel 2.66GHz B
9.20.4
For this study, we selected four different machine types to run key measurements of the Loan Processing workload. An additional run on one machine with no anti-virus software installed was also made to ease comparison with the measurements presented in Chapter 8 of this report. Newer machines showed significant improvements. For each data chart in this section, the percentages at the top of each bar indicate the average system CPU utilization during the measurement.
9.20.4.1 Impact on Import time
Using a new workspace, WebSphere Integration Developer 6.2 was opened, the Build automatically preference was disabled, and the Loan Processing workload was imported. Measurement started when the Import began and stopped as soon as the Import was complete and the processor cores became idle. This was done seven times on each machine, with the results below being the average.
As can be seen in the following chart, the newer machines can finish the import much more
quickly than the older machines. Comparing the laptops, the T60p completed the Import in 259
seconds, 2.1 times faster than the T42p. Among the desktops, the model 9196 completed the
Import in 215 seconds, 2.5 times faster than the model 8212.
[Chart: Import time in seconds, with average CPU utilization: T42p 555 (82%), 8212 544 (66%), T60p 259 (68%), 9196 215 (60%), 9196 no AV 154 (64%).]
[Chart: time in seconds (chart title lost in extraction), with average CPU utilization: T42p 413 (92%), 8212 425 (62%), T60p 268 (61%), 9196 205 (60%), 9196 no AV 179 (63%).]
[Chart: time in seconds (chart title lost in extraction), with average CPU utilization: T42p 324 (58%), 8212 268 (93%), T60p 182 (59%), 9196 136 (59%), 9196 no AV 127 (58%).]
Measurement Configuration: T42p: Intel 2.0 GHz - A; 8212: Intel 2.8 GHz - D; T60p: Intel 2.16 GHz - A; 9196: Intel 2.66 GHz - A
9.20.5
The following chart compares two types of routing based on a value in the SOAP header for a Web Services scenario. In both cases the value retrieved from the header is used to determine the target service endpoint. The Route on Header mediation selects the service endpoint by routing to a hard-wired callout node based on the header value extracted in a filter primitive. For each alternative endpoint, a user would need to wire in additional nodes for the filter primitive to access.
In contrast, the dynamic endpoint lookup mediation uses the value from the header (accessed by the endpoint lookup primitive itself) to look up the endpoint from a WSRR repository. This value is cached by WESB, so the performance data below does not show the cost of the WSRR lookup; it shows the performance of routing to the target service using the previously cached endpoint.
The chart shows that the cost of using the dynamic endpoint lookup primitive to route, rather than wiring in alternative targets (a less flexible approach), is minimal.
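To make the caching behavior concrete, the following is a minimal sketch, in plain Java, of the look-up-then-cache pattern described above. The Registry interface is a hypothetical stand-in for the WSRR query performed by the endpoint lookup primitive; the real caching happens inside WESB, not in application code.

    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a cached endpoint lookup. Registry stands in for the
    // WSRR query made by the endpoint lookup primitive.
    public class CachedEndpointLookup {
        interface Registry {
            String lookupEndpoint(String headerValue); // expensive remote query
        }

        private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        private final Registry registry;

        public CachedEndpointLookup(Registry registry) {
            this.registry = registry;
        }

        public String endpointFor(String headerValue) {
            // Only the first request for a given header value pays the lookup
            // cost; subsequent requests route using the cached endpoint.
            return cache.computeIfAbsent(headerValue, registry::lookupEndpoint);
        }
    }

Because later requests are served from the in-memory cache, the measured overhead of dynamic routing versus hard-wired targets stays small.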
[Chart: requests per second for Route On Header versus dynamic endpoint lookup routing at message sizes Base in/Base out, 10 in/Base out, and 10 in/10 out; CPU utilization 85-98%; 4 cores.]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
9.20.6
In this study two WESB mediations (Transform Value and Route on Body) were driven by an increasing client load to assess the following scaling characteristics:
1. Horizontal Client Scaling: an initial load of x clients, each making y requests per second, is increased by adding more clients (increasing x).
2. Vertical Client Scaling: an initial load of x clients, each making y requests per second, is increased by speeding up the clients (increasing y).
Warm-up periods were applied for all of the measurements described below to ensure that the code had settled to a consistent level of performance.
All client scaling measurements were run with a message size combination of Base/10.
For details of the mediations and request/response sizes, see sections 11.3 and 11.4. All data is obtained using Web services bindings on a 4 core WESB server machine with Hyper-Threading (HT) disabled. For details of the topology used, see section 11.1.
[Chart: XformValue horizontal client scaling; requests per second and server CPU percentage versus number of clients (up to 2500).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C
The following chart shows that CPU consumption per request remained consistent across the evaluation (apart from a larger value at the lowest throughput measurement, which was probably skewed by timer tasks). Response time increases in a linear fashion until the server system approaches CPU saturation; at this point any further increase in clients has a more direct impact on latency.
[Chart: XformValue horizontal client scaling; response time (seconds) and CPU per request (seconds) versus number of clients (up to 1600).]
Measurement Configuration: Web Services Client; WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
The next two charts show that server CPU consumption, request rates, and response times for the Route On Body mediation result in a similar profile to the XformValue evaluation above.
[Chart: Route On Body horizontal client scaling; requests per second and server CPU percentage versus number of clients (up to 1800).]
[Chart: Route On Body horizontal client scaling; response time (seconds) and CPU per request (seconds) versus number of clients (up to 1600).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C
[Chart: XformValue vertical client scaling; requests per second and server CPU percentage versus per-client request rate (logarithmic scales).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
The next chart shows that response times grow progressively, with a sharp increase at the CPU saturation point. CPU per request is reasonably flat apart from the initial spike evident in some of the scaling tests at very low utilization.
[Chart: XformValue vertical client scaling; response time (seconds) and CPU per request (seconds) versus per-client request rate (logarithmic scale).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
The next two charts show that the Route On Body results for vertical scaling produced a similar profile to the XformValue vertical tests above.
[Charts: Route On Body vertical client scaling; requests per second and server CPU percentage, and response time (seconds) and CPU per request (seconds), versus per-client request rate (logarithmic scales).]
Measurement Configuration: WebSphere ESB; machines: Intel 2.8GHz - C, Intel 3.0GHz - C, Intel 3.5GHz - B
9.20.7
The results shown in this section compare local and remote bindings using the same hardware configuration and the Contact Manager workload. For remote bindings, a total of 3 JVMs are used, 2 of which are WPS instances while the third JVM hosts the Messaging Engine (not a factor in this study). The SAP Emulator module runs on the first WPS instance, and the Contact Manager and Clarify Emulator modules run on the other WPS instance. Therefore, the remote binding between the SAP Emulator module and Contact Manager module crosses the boundary between two separate WPS instances. There are two key findings in this study.
There is a significant throughput difference between local and remote bindings. The
throughput of Contact Manager using the local Synchronous SCA binding is 198 BTPS,
over 3.1x better than the remote Synchronous SCA binding. The difference between
local and remote Web Services bindings is smaller, but still significant. The throughput
of Contact Manager with an optimized local Web Services binding is 110 BTPS,
compared with 88 BTPS for a remote Web Services binding, a difference of 25%.
There is significant benefit due to local Web Services binding optimization, as discussed
in Section 4.5.5, if the Web Services target is hosted on the same JVM. The optimized
throughput of 110 BTPS is 15% higher than the unoptimized throughput of 96 BTPS.
[Chart: ContactManager on Windows 2000, local versus remote bindings; BTPS for SCA Sync and Web Services bindings with 1 WPS JVM versus 2 WPS JVMs; all measurements at 100% CPU utilization.]
Measurement Configuration: WebSphere Process Server; DB2; Intel 2.8GHz A
3 roles, 1 managed and 2 not managed, so that 2 relationship cross references are
created
1 service call
10.2.1
In this implementation of the Contact Manager workload, the Contact Manager Application
receives Business Objects (BOs) from the SAP Client Module via synchronous cross-module
SCA invocation, i.e., synchronously invoking an import bound to the corresponding export with
an SCA binding. Its first task is to transform the input BOs from the SAP format to a Generic
format via an Interface Map SCA Component. These generic BOs are then passed to a Business
Process component which contains logic responsible for determining whether the Business Event
requires creation of a new Contact, or updating an existing one and then routing the event to the
destination application. For all of the Business Events measured, a new contact was created. On
the way to the destination, the BO must be mapped again from generic format to the format
understood by the destination application. The destination application, simulated by the Clarify
Client Module, is also invoked via a cross-module, synchronous SCA binding. This module
simulates destination application work, including generation of a new unique identifier, and then
returns a modified BO to the Contact Manager Module. This return BO is mapped again from
Clarify to Generic format before the response is returned to the SAP Client Module.
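Reduced to its essentials, the flow is a map / invoke / map sequence. The sketch below is illustrative plain Java; the BO classes and interfaces are hypothetical stand-ins for the generated Interface Map components and SCA bindings.

    // Hedged sketch of the Contact Manager map / invoke / map flow; all type
    // and method names are illustrative, not the actual generated artifacts.
    public class ContactManagerFlow {
        static class SapBO {}
        static class GenericBO {}
        static class ClarifyBO {}

        interface Maps {
            GenericBO sapToGeneric(SapBO in);         // SAPToGeneric Interface Map
            ClarifyBO genericToClarify(GenericBO in); // GenericToClarify Interface Map
            GenericBO clarifyToGeneric(ClarifyBO in); // ClarifyToGeneric Interface Map
        }

        interface ClarifyService {
            ClarifyBO create(ClarifyBO in);           // synchronous cross-module SCA call
        }

        GenericBO handle(SapBO request, Maps maps, ClarifyService clarify) {
            GenericBO generic = maps.sapToGeneric(request);
            ClarifyBO outbound = maps.genericToClarify(generic);
            ClarifyBO result = clarify.create(outbound); // simulated destination work
            return maps.clarifyToGeneric(result);        // mapped back before the reply
        }
    }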
[Figure: Contact Manager module structure; the SAP Emulator Module invokes the Contact Manager Module (SAPToGeneric Interface Map, Contact Manager Process, GenericTOClarify and ClarifyTOGeneric Interface Maps, with MAP, REL, and BMK components), which invokes the Clarify Emulator Module; all cross-module invocations use synchronous SCA bindings.]
10.2.2
SCA components may expose their interfaces as Web Services via the Web Services binding. This capability is modeled for performance purposes by changing the synchronous SCA binding between the SAP Emulator Module and ContactManager Module to a Web Services binding, as depicted in Figure 2. For measurement purposes, the Web Services client can be either local or remote; in the remote case the client resides on a different physical machine from the remainder of the application.
[Figure 2: Contact Manager module structure with the SAP Emulator Module invoking the Contact Manager Module over a SOAP/HTTP Web Services binding; the Clarify Emulator Module is still invoked via a synchronous SCA binding.]
10.3 Banking
10.3.1
[Figure: Banking workload; a Transaction Generator drives the Banking business process via JMS, and the process invokes Java services (POJOs) synchronously or asynchronously.]
The workload setup consists of a Transaction Generator, which generates the load, and a Banking
process, which contains a scenario and outbound services. The Banking measurement run starts
when the workload driver places a large number of mortgage request instances onto a JMS queue.
Instances of the banking process are started via JMS messages. A Banking measurement run
concludes when the workload driver determines that all process instances have completed
processing.
A business transaction in this workload is a completed mortgage loan.
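As a rough illustration of how a driver can start process instances this way, the sketch below uses the standard JMS 1.1 API to put request messages on a queue. The JNDI names, payload, and request count are hypothetical; the actual workload driver is not shown in this report.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    // Hedged sketch of a JMS workload driver; JNDI names and payload are illustrative.
    public class MortgageRequestProducer {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/BankingCF"); // hypothetical
            Queue queue = (Queue) ctx.lookup("jms/MortgageRequests");               // hypothetical

            Connection conn = cf.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                for (int i = 0; i < 10000; i++) { // "a large number of mortgage request instances"
                    TextMessage msg = session.createTextMessage("<mortgageRequest id=\"" + i + "\"/>");
                    producer.send(msg);
                }
            } finally {
                conn.close(); // closes the session and producer as well
            }
        }
    }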
10.3.2
Banking Scenarios
The Banking scenarios differ in the setting of the transactional behavior of the invoke activities.
When using the synchronous SCA binding, the process component wired to sync services has the
transactional behavior flag on invokes set to commit after. When using the SCA asynchronous
or JMS binding, the process component wired to async services has the transactional behavior
flag on invokes set to participates.
The BPEL process is shown in the following diagrams:
Loop2
Loop3
Invoke activities
1 receive activity
1 reply activity
1 correlation set
10.3.3
Banking Services
Depending upon which binding option is used, the Banking process component is wired in one of the following fashions:
BankingProcessJMS: Banking process wired to a JMS MDB using an import with a JMS binding (transactional behavior flag set to participates).
BankingProcessJavaSync: Banking process wired to a synchronous POJO (transactional behavior flag set to commit after).
BankingProcessJavaAsync: Banking process wired to an asynchronous POJO (transactional behavior flag set to participates).
BankingProcessEJBSOAP: Banking process wired to an EJB session bean wrapped as a SOAP web service.
BankingProcessEJB: Banking process wired to an EJB session bean using a self-written mapper. This is required because business process components always have w-typed (WSDL-typed) references, a BPEL restriction, while session bean imports always have j-typed (Java-typed) interfaces. The self-written mapper mediates between the j-typed and w-typed interfaces by calling the session bean import, and also handles data mapping.
The diagram which follows illustrates these choices. Note that in this report, measurements are shown only for the JMS binding.
Overview
The SOABench 2008 workload is used in numerous studies in this report. It is an implementation
of the SOABench 2008 specification. SOABench 2008 replaces an earlier version, SOABench
2005, which was used in previous editions of the BPM Performance Report. Similar to the 2005
version, the 2008 version models the business processes of an automobile insurance company and
is intended to evaluate the performance of a distributed application implemented using a Service
Oriented Architecture (SOA).
The 2008 implementation extends the scope of the 2005 version in several ways. The Automated Approval (microflow only) scenario performs more synchronous service calls than the previous version. The Manual Approval (microflow + macroflow pattern) scenario in the previous version is now implemented in two ways: an OutSourced scenario which does claim approval via asynchronous Web Service calls, and an InHouse scenario which uses human tasks to approve claims. In addition, the InHouse scenario divides work among users and groups, adds think time to user activity in human tasks, and tracks response time of human task actions as well as recording throughput. This makes the InHouse scenario very useful for evaluating response time and throughput across a range of active concurrent users. Finally, the 2008 version also includes the use of preloaded Process Choreography tasks in both the OutSourced and InHouse scenarios.
The following diagram illustrates the workload architecture flow.
10.4.2
One of the modes of operation for SOABench 2008 in handling insurance claim requests is using
automated approval. No human or asynchronous tasks take place in this scenario; the flow is
implemented as a microflow that makes synchronous service invocations. All of the service
invocations are to service providers that return cached responses; this prevents bottlenecks in the
service providers while exercising the process server.
A claim request is sent to the HandleClaimMicro business process which performs an operation
called CreateClaim followed by FraudCheck. This scenario then follows the FastpathApproval
path which performs synchronous services calls for ApproveClaim, InformPolicyHolder, and
CompleteClaim. The process finishes by sending a response back to the requestor.
The Business Object (BO) size for the input request is variable. By default, a 3 KB request size is
used. The BO size for the reply is fixed at 3 KB.
The BPEL process is shown in the following diagram.
1 Receive
1 Reply
1 Choice
10.4.3
The SOABench 2008 OutSourced scenario is one of two scenarios that utilize long running processes (macroflow) for manual approval of insurance claims. OutSourced mode uses both a microflow and a macroflow; the microflow is the same process shown for Automated Approval mode above, but in this mode the logic does not follow the fast path approval path. Instead a long running process is invoked, which performs claim approval via asynchronous Web Service calls.
[Figure: OutSourced approval BPEL process; 1 parallel activity.]
10.4.4
The SOABench 2008 InHouse scenario is one of two scenarios that utilize long running processes
(macroflow) for manual approval of insurance claims. InHouse Mode uses both a microflow and
a macroflow; the microflow is the same process shown for Automated Approval Mode above, but
in this mode the logic does not follow the fast path approval path. Instead it invokes a long
running business process called HandleClaimHuman.
As in the Automated Approval Scenario, all of the service invocations are to service providers
that return cached responses which prevents bottlenecks in that area while exercising the process
server.
Claims enter the system via client requests to the HandleClaimMicro process. Synchronous web service invocations are then made to CreateClaim, FraudCheck, and RecoverVehicle. No human or asynchronous tasks take place in the HandleClaimMicro process except for an invocation of InvokeInHouseLong for the InHouse claim processing workload. This process finishes by sending a response back to the requestor, but the claim is not complete until the long running process invoked by InvokeInHouseLong is finished.
Before running this scenario the system is preloaded with Insurance claim requests in various
stages of completion. The insurance claims are assigned equally to regions. Human task
processing is done by users belonging to a single region and those users can only process
insurance claims from their region which is enforced via authentication. Within a region, users
are divided into 2 groups, adjusters and underwriters. Of the four human tasks required to
complete an insurance claim, two are done by adjusters and two are done by underwriters.
Users query existing processes for a list of work that they can perform. A work item is claimed (selected from the list) and then completed by the user. Users think between the query, claim, and complete activities. The think time is random but averages a total of 180 seconds per human task. The time a user waits for responses to their human task queries, claims, and completes is recorded as response time. The rate at which insurance claims are completed is the throughput. Once an entire insurance claim is finished, another is added to the region to maintain its work at the preloaded level.
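The query / claim / complete loop with think time can be pictured with the Java sketch below. The TaskList interface is a hypothetical stand-in for the human task API the real workload drives, and the think-time distribution is illustrative; only the 180-second average per task comes from the workload description.

    import java.util.Random;

    // Hedged sketch of a simulated user in the InHouse scenario.
    public class SimulatedUser implements Runnable {
        interface TaskList {
            String query();           // list available work, return a work item id
            void claim(String id);    // claim the work item
            void complete(String id); // complete the work item
        }

        private final TaskList tasks;
        private final Random random = new Random();

        public SimulatedUser(TaskList tasks) {
            this.tasks = tasks;
        }

        // Three pauses per task; a uniform draw averaging 60 s each, 180 s in total.
        private void think() throws InterruptedException {
            Thread.sleep((long) (random.nextDouble() * 120_000));
        }

        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String id = tasks.query(); // waits here are recorded as response time
                    think();
                    tasks.claim(id);
                    think();
                    tasks.complete(id);
                    think();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }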
The BPEL for the HandleClaimMicro process is shown in the Automated Approval section. The
path to InvokeExternalLong contains the following activities:
The second, long running, process, named HandleClaimHuman, is called via InvokeInHouseLong. Early in this process three parallel activities take place: one asynchronous, one-way web service invoke and two human tasks performed by users in the adjusters group. When all three activities complete, the process continues to the two-way UpdateClaim web service invocation, followed by (for this scenario) two human tasks called FirstApprovalTask and SecondApprovalTask which are performed by users in the underwriters group. Upon completion of SecondApprovalTask, all claims for this scenario take the approval path, where three more two-way calls to web services are performed to complete the claim and the process.
The HandleClaimLongExternal process contains the following elements:
4 Human Tasks
1 parallel activity
Overview
The SOABench 2005 workload was used in previous BPM performance reports; the description
is included in this report as a bridge since this report contains the initial set of measurements for
the SOABench 2008 workload (described above).
The SOABench 2005 workload is an implementation of the SOABench 2005 specification and
models the business processes of an automobile insurance company. SOABench 2005 is intended
to evaluate the performance of a distributed application implemented using a Service Oriented
Architecture (SOA). SOABench 2005 uses a driver that produces a complex workload similar to
a real production system. The complex driver workload is made up of several subset technologies
called facets which can be included or excluded from performance evaluations. Examples of
SOABench 2005 facets include Services (use of service components), Mediation (use of
mediation to transform requests and responses), and Choreography (application implementation
using service choreography).
By combining facets, SOABench 2005 implements two aspects of the IT systems of an insurance company called SOAAssure. The first is the Claims application, which combines the
Choreography and Services facets to process insurance claims. The second is realized using the
Mediation and Services facets and provides a third-party gateway which enables another
company to establish whether coverage exists for an existing policy. The following diagram
illustrates the workload architecture flow.
[Figure: SOABench 2005 architecture. A SOABench Client simulates service requestors and drives two paths: Submit Claim requests go to Process Choreography Integration, containing the Handle Claim process (macroflow and microflow), a Fraud Check SCA component, a Claim Approval business rule, and a Human Tasks Simulator with Adjuster business data; Check Coverage requests go to the Enterprise Service Bus mediations. Both paths invoke claim service implementations (Web services) on back-end service providers.]
The SOABench 2005 Client can drive the workload with mediation or business process claim
requests. The minimum request and response size is 3 KB but this can be increased by the user.
The client driver also provides for an infrastructure mode to make interactions with the backend
Service providers trivial. The Human Tasks Simulator handles both adjuster and underwriter
tasks generated during the Choreography facet manual approval process.
10.5.2
One of the workloads in the SOABench 2005 Choreography facet is the handling of an insurance claim using automated approval. No human or asynchronous tasks take place in this scenario; the flow is implemented as a microflow. A claim request is sent to a business process which performs an operation called HandleClaim. HandleClaim does Submit Claim to create the claim, checks the claim for validity via FraudCheck_SCA, then approves and invokes the Complete Claim operation. The process finishes by sending a response back to the requestor.
The BPEL process is shown in the following diagram.
1 java invoke
10.5.3
Another workload in the SOABench 2005 Choreography facet is the handling of an insurance claim using manual approval. Depending on the claim amount, either 1 or 2 human tasks are performed. For data in this report the second task occurs for 40% of claim requests. The workload starts in the process used in the Automated Approval scenario (a microflow), as described in the previous section. A claim request is sent to the process, which performs HandleClaim. HandleClaim does Submit Claim to create the claim, skips the claim validity check, then calls a long running (macroflow) process to perform more work on the claim.
The long running process does a fraud check on the claim via FraudCheck_SCA. A claims adjuster also looks at the claim via the Adjuster human task, and the claim is updated through a web service call. For the workload measured, all claims are marked valid and then checked by a business rule to determine if an underwriter needs to evaluate the claim. Forty percent of the claims are checked by the Underwriter human task. At this point all claims are processed for claim amount and approved using 2 more web service calls. The long running process then calls back the microflow process to perform the FinishClaim operation, which performs a web service call to complete the claim.
An adjuster and underwriter simulator is used to process human tasks for the long running process.
The BPEL process is shown in the following diagram.
1 java invoke
2 process calls
2 java snippets
JMS bindings
MQ JMS bindings
MQ bindings
The tests make use of the mediations and Web Services from the SOABench 2008 workload.
SOABench 2008 is a workload intended to evaluate the performance of a distributed application
implemented using a Service-Oriented Architecture. For a description of the SOABench 2008
workload, please see section 11.3.
[Figure: Web services topology; 50 HTTP clients drive a WESB mediation hosted on WebSphere 7.0, which invokes a SOABench 2008 service.]
11.1.1
The Fan Out mediation allows you to iterate over a repeating element in the request message. On each iteration, the flow invokes a service and then uses a Message Element Setter to update the shared context with some data from the response; the shared context was created on the Input node. The Fan In mediation then waits for all iterations to complete before using an XSLT mediation to create a response message, which it returns.
This test was then executed with different request messages so that we would get a different
number of Fan Out iterations. It was executed with requests that would result in 1, 2 and 4 Fan
Out iterations. Note that each iteration is run sequentially, rather than in parallel.
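The sequential fan-out / fan-in control flow can be summarized with the short Java sketch below. The Service interface and the final aggregation are stand-ins for the Service Invoke and XSLT primitives; this illustrates the control flow only, not the mediation implementation.

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of the sequential Fan Out / Fan In control flow.
    public class FanOutFanIn {
        interface Service {
            String invoke(String element); // stand-in for the Service Invoke primitive
        }

        public String process(List<String> repeatingElements, Service service) {
            List<String> sharedContext = new ArrayList<>(); // models the shared context
            for (String element : repeatingElements) {      // one sequential iteration per element
                String response = service.invoke(element);
                sharedContext.add(response);                // Message Element Setter step
            }
            // Fan In: all iterations are complete; build the response (XSLT step stand-in)
            return String.join(",", sharedContext);
        }
    }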
[Figure: Fan Out scenario; 50 HTTP clients drive a flow of Fan Out, Service Invoke, Message Element Setter, Fan In, and XSLT mediations hosted on WebSphere 7.0, which invokes a SOABench 2008 service.]
The tests use a standalone JMS producer and consumer. The JMSPerfHarness workload program is used for this, as it can be configured to run standalone JMS producers and consumers and to measure the rate at which messages are processed by the consumers. The producer and consumer are within the same JVM and therefore co-located on one machine.
11.2.1
[Figure: JMS topology; a JMS producer sends messages to a JMS queue behind a JMS export, the WESB XSLT transformation mediation processes each message, and a JMS import delivers it to an outbound JMS queue read by a JMS consumer; DB2 hosts the messaging data store.]
11.2.2
The MQ JMS and MQ bindings are both used to connect to an MQ Queue Manager. Messages are delivered into WESB from the MQ inbound queue and sent to an MQ outbound queue. No internal SIB queues are used in this scenario. The MQ Queue Manager is deployed on the same machine as the WESB server.
[Figure: MQ topology; a JMS producer and a JMS consumer exchange messages through an MQ Queue Manager, with the mediation sitting between an MQ JMS export and an MQ JMS import.]
11.3.1
Transformation Mediations
These are mediations which transform requests and in some cases responses. There are various
levels of complexity of transformation possible.
XSLT Value transform mediation
Transforms the value of a single element in the request message using XSLT.
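For reference, applying such a stylesheet in plain Java via JAXP looks roughly like the sketch below. The file names are hypothetical, and the real mediation applies the transform inside the WESB runtime rather than in application code.

    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    // Hedged sketch: apply a single-element value transform with JAXP.
    // valueTransform.xsl and request.xml are hypothetical file names.
    public class ValueTransform {
        public static void main(String[] args) throws Exception {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource("valueTransform.xsl"));
            transformer.transform(new StreamSource("request.xml"), new StreamResult(System.out));
        }
    }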
XSLT Namespace transform mediation
Transforms request and response messages from one schema to another using XSLT. The
schemas are largely the same but the name of an element differs and the two schemas have
different namespaces.
XSLT Schema transform mediation
Transforms request and response messages from one schema to another using XSLT. The
schemas are completely different but contain similar data which is mapped from one to the other.
In addition to the transform, a value from the request is transferred to the response by storing it in a context header.
Message element setter mediation
Transforms the value of a single element in the request message using the Message Element
Setter primitive.
Business Object Mapper mediation
Uses the Business Object Mapper mediation to map the entire body of the request into a new
Business Object.
11.3.2
Routing Mediations
These are mediations which route requests to different services based on content.
Route on header mediation
Route the request based on the presence of a string in the SOAP or JMS header. The Web
Services workload does not use any standard headers, so we use an optional one called
Internationalization Context. The JMS workload introspects the JMSCorrelationId header field.
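A routing decision of this kind reduces to a simple test on a header field. The sketch below shows the JMS variant in plain Java; the routing rule is invented for illustration, the endpoint names simply reuse service names from this chapter, and the real mediation performs the test in a filter primitive inside WESB.

    import javax.jms.JMSException;
    import javax.jms.Message;

    // Hedged sketch of routing on the JMSCorrelationID header; the rule and
    // endpoint names are illustrative only.
    public class HeaderRouter {
        public String selectEndpoint(Message request) throws JMSException {
            String correlationId = request.getJMSCorrelationID();
            if (correlationId != null && correlationId.startsWith("LEGACY")) { // hypothetical rule
                return "LegacySureService";
            }
            return "SOAAssureService";
        }
    }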
Route on body mediation
Route the request based on the content of a field in the body of the request.
Service Invoke mediation
Uses the Service Invoke primitive to invoke a Web Service, and then returns the response.
[Figure: Service Invoke scenario; 50 HTTP clients drive the Service Invoke mediation hosted on WebSphere 7.0, which invokes a SOABench 2008 service.]
11.3.3
Composite mediation
The composite mediation consists of four mediation primitives wired together inside a single mediation module. This avoids inter-module call overhead, but at the expense of the ability to administer the pieces of the overall mediation individually. The Authorisation mediation is a routing mediation which checks a password field in the request body.
No logging is performed in either the JMS or Web Services implementations of this scenario.
[Figure: composite mediation; 50 HTTP clients drive a single module containing the Authorisation, Logging, Route on Body, and Transform Schema primitives hosted on WebSphere 7.0, routing to the SOABench 2008 SOAAssure and LegacySure services.]
11.3.4
Chained mediation
The chained mediation performs the same function as the composite mediation but the four types
of mediation primitives are each packaged as separate mediation modules, which are then joined
together using bindings.
[Figure: chained mediation; 50 HTTP clients drive the Authorisation, Logging, Transform, and Route on Body mediation modules joined by bindings, hosted on WebSphere 7.0 and routing to the SOABench 2008 SOAAssure and LegacySure services.]
SOABench 2008 client message sizes:

Message size    SOAP Request    SOAP Response    JMS payload
Base            1.8 K           0.8 K            1.2 K
10              9.1 K           8.3 K            8.5 K
100             107.3 K         106.5 K          106.7 K
orders for furniture, scheduling the orders for shipment to the customer, shipping the
orders to the customer, and maintaining the inventory of the company.
Order Processing contains 25 business integration modules, 2 business integration libraries, 57
interfaces, 150 data types and makes use of the full spectrum of SCA component kinds available
in WPS.
12.4 BPM@Work
BPM@Work is a Business Process Modeler workload modeling a software development
storyline. It contains a single, complex business process that results in 11 independent process
models that get installed via direct deploy from Modeler to the WPS server.
[Table: common server configuration settings and values; setting names lost in extraction.]
These settings are common for measurements at all core counts and numbers of nodes, except for the following additional changes that were made for vertical scaling measurements:
WebContainer Thread Pool Min, Max: 100, 100
com.ibm.websphere.webservices.http.maxConnection: 50
Production Template
Security disabled
Business Process support established with bpeconfig.jacl (note that this sets the Data sources > BPEDataSourceDb2 > WebSphere Application Server data source properties statement cache to 300)
PMI disabled
HTTP maxPersistentRequests to -1
GC policy set to Xgcpolicy:gencon (see table below for nursery setting Xmn)
Remote DB2 databases (connection type 4) used for BPE, SIB System, and SIB BPC
databases
[Table: tuning variations by core count for the SOABench 2008 Automated Approval and OutSourced Approval scenarios, covering JVM heap (1280/640 MB) and nursery (768 MB) sizes, thread and connection pool sizes, the SYSTEM ME database connection pool maximum, the BPEInternalActivationSpec batch size (10), the SOABenchBPELMod2_AS batch size (32), message pool size (4000), and allowPerformanceOptimizations; most row and column labels were lost in extraction.]
The DB2 database server has 3 databases defined for use by the WPS server. The database logs
and tablespaces were spread across a RAID array to distribute disk utilization. The database used
for the BPC.cellname..Bus data store was not tuned. The SCA.SYSTEM.cellname.BUS database
and the BPE database were tuned as follows.
The SCA.SYSTEM.cellname.BUS database:
o db2 update db cfg for sysdb using logbufsz 512 logfilsiz 8000 logprimary 20 logsecond 20 auto_runstats off
The BPE database:
o db2 update db cfg for bpedb using logbufsz 512 logfilsiz 10000 logprimary 20 logsecond 10 auto_runstats off
o A WPS Server which runs the processes involved in the application scenario.
o A Tivoli Directory Server with LDAP database for user authentication. This ran on the support system below with the client controller.
o 2 support systems which each run workload generators (client agents) under the direction of a single client controller. One support system handles asynchronous service requests and the other handles synchronous service requests made by the business processes running on the WPS Server.
The database system was tuned in a similar fashion as for the SOABench 2008 OutSourced scenario measurements. In addition, unused indexes were deleted per the DB2 advisor.
The client systems were tuned with two considerations in mind. The first was maintaining load on the WPS server running the workload, which involved Java, thread pool, and work manager tuning. The second was to avoid problems preloading the numerous process tasks into the system. The latter involved increasing timeouts and resources to maintain connectivity during the preloading.
Client tuning:
o Transaction Service > tran lifetime timeout: 9000
o Transaction Service > async response timeout: 9000
o Transaction Service > client inactivity timeout: 9000
o Transaction Service > max tran timeout: 9000
o Java > max Heap: 1280
o Java > -Xgcpolicy: gencon
o Java > -Xmn: 512M
o Java Custom > com.ibm.websphere.webservices.http.maxConnection: unlimited
o Java Custom > com.ibm.ws.webservices.writeTimeout: 9000
o Java Custom > com.ibm.ws.webservices.readTimeout: 9000
o port 9080 > TCP inbound > Max open connections: 30000
o port 9080 > TCP inbound > Inactivity timeout: 60
o port 9080 > HTTP inbound > Max persistent req: unlimited
o port 9080 > HTTP inbound > read timeout: 6000
o port 9080 > HTTP inbound > write timeout: 6000
o port 9080 > HTTP inbound > persistent timeout: 3000
o Thread Pool Default min, max: 50 to 300
o Thread Pool ORB min, max: 10 to 100
o Thread Pool WebContainer min, max: 100 to 400
o Thread Pool TCPChannel min, max: 5 to 50
For the system running the directory server, the following setting was updated through the LDAP server admin console:
o Server Administration > Manage Server properties > Search Settings > Search Size Limit: "unlimited"
The WPS server tuning parameters for this workload are as follows.
o Transaction Service > tran lifetime timeout: 900
o Transaction Service > async response timeout: 900
o Transaction Service > client inactivity timeout: 900
o Transaction Service > max tran timeout: 900
o Business Flow Manager > Allow Perf optimizations: yes
o Business Flow Manager > Message Pool Size: 4000
o Business Flow Manager > max age for stalled messages: 360
o Business Flow Manager > max process time on thread: 360
o Business Flow Manager > Intertransaction cache size: 400
o Business Flow Manager > DataCompressionOptimization: false
o Java > Heap: 1280
o Java > -Xgcpolicy: gencon
o Java > -Xmn: 768M
o Java Custom > com.ibm.websphere.webservices.http.maxConnection: 150
o Java Custom > com.ibm.ws.webservices.writeTimeout: 9000
o Java Custom > com.ibm.ws.webservices.readTimeout: 9000
o Java Custom > com.ibm.websphere.webservices.http.waitingThreadsThreshold
o port 9080 > TCP inbound > pool > WebContainer: yes
o port 9080 > TCP inbound > Max open connections: 20000
o port 9080 > TCP inbound > Inactivity timeout: 60
o port 9080 > HTTP inbound > Max persistent req: unlimited
o port 9080 > HTTP inbound > read timeout: 60
o port 9080 > HTTP inbound > write timeout: 60
o port 9080 > HTTP inbound > persistent timeout: 60
o Thread Pool Default: 50 to 200
o Thread Pool ORB: 10 to 50
o Thread Pool WebContainer: 10 to 300
o Thread Pool TCPChannel: 5 to 20
o connection pool BPE DB: 25 to 350
Security related tuning for WPS running the InHouse scenario includes the following database considerations:
o MAXAPPLS, which must be large enough to accommodate connections from all possible JDBC Connection Pool threads, and
o the default buffer pool sizes (number of 4K pages in IBMDEFAULTBP) for each database, which are set so that each pool is 256 MB in size (65,536 pages x 4 KB = 256 MB).
The following table shows the parameter settings used for this report.
Parameter Name       BPEDB Setting
APP_CTL_HEAP_SZ      144
APPGROUP_MEM_SZ      13001
CATALOGCACHE_SZ      521
CHNGPGS_THRESH       55
DBHEAP               600
LOCKLIST             500
LOCKTIMEOUT          30
LOGBUFSZ             245
LOGFILSIZ            1024
LOGPRIMARY           11
LOGSECOND            10
MAXAPPLS             90
MAXLOCKS             57
MINCOMMIT
NUM_IOCLEANERS
NUM_IOSERVERS        10
PCKCACHESZ           915
SOFTMAX              440
SORTHEAP             228
STMTHEAP             2048
DFT_DEGREE
DFT_PREFETCH_SZ      32
UTIL_HEAP_SZ         11663
IBMDEFAULTBP         65536
In addition to these database level parameter settings, several other parameters were also
modified using the WAS Admin Console, mostly those affecting concurrency (i.e., thread
settings).
o Database connection pool size for the BPEDB was increased to 60, and the statement cache size for the BPEDB was increased to 300.
o The maximum connections property for JMS connection pools was set to 40.
o Connectivity to the local database is via the DB2 JDBC Universal Driver Type 2 driver.
Tracing is disabled
Security is disabled
Java Heap size is fixed at 1280 MB for Windows and 1280 MB for AIX
Gencon garbage collection policy enabled, setting the nursery heap size to 1024 MB.
WebContainer Thread pool inactivity timeouts for thread pools set to 3500
Otherwise, unless specifically noted in the workload description, the default settings as supplied
by the product installer were used.
2 GB RAM
100Mbit Ethernet
Software
3 GB RAM
100Mbit Ethernet
Software
3.0 GB RAM
1Gbit Ethernet
Software
4 GB RAM
100Mbit Ethernet
Software
4 GB RAM
1Gbit Ethernet
Software
4 x 2.8GHz Pentium 4
Hyperthreading disabled
4 GB RAM
100Mbit Ethernet
Software
WPS 6.1.0
2MB L3 cache
3.5 GB RAM
1 Gbit Ethernet
Software
JMSPerfHarness
1.5 GB RAM
1 Gbit Ethernet
Software
3 GB RAM
L1 2 x 16 KB, L2 2 x 1 MB caches
100Mbit Ethernet
Software
1.3.10
Intel 2.93GHz A
Hardware
24GB RAM
L1 (Primary cache): 32K Instruction (I) + 32K Data (D) per processor, L2 (Secondary
cache): 8MB I+D per processor (4MB shared per 2 cores)
1 Gigabit Ethernet
Software
1.3.11
Intel 2.93GHz B
Hardware
24 GB RAM
1 Gigabit Ethernet
Software
WPS 7.0.0.1
1.3.12
Intel 2.93GHz C
Hardware
40 GB RAM
1 Gigabit Ethernet
Software
WESB 7.0.0.1
1.3.13
Intel 2.93GHz D
Hardware
24 GB RAM
1 Gigabit Ethernet
Software
1.3.14
Intel 3.0GHz A
Hardware
6 GB RAM, 4 MB L3 Cache
Software
1.3.15
Intel 3.0GHz - B
Hardware
6 GB RAM, 4 MB L3 Cache
Software
IBM WebSphere Process Server, 6.0.2.0 Build m0649.11 with 6.0.2-WS-WPS-ESBWinX32-CritFixes.zip packaged 13 DEC 2006
1.3.16
Intel 3.0GHz - C
Hardware
4MB L3 cache
4 GB RAM
back cache
1 Gbit Ethernet
Software
1.3.17
Intel 3.0GHz - D
Hardware
4MB L3 cache
3.5 GB RAM
1 Gbit Ethernet
Software
WebSphere MQ V6.0.2.2
1.3.18
Hardware
4.0 GB RAM
1Gbit Ethernet
Software
1.3.19
Intel 3.5GHz - A
Hardware
3 GB RAM
1 Gbit Ethernet
Software
1.3.20
Intel 3.5GHz - B
Hardware
16 GB RAM
1 Gbit Ethernet
Software
1.3.21
Intel 3.5GHz C
Hardware
10 GB RAM
1 Gb Ethernet
Software
1.3.22
Hardware
Hyperthreading disabled
16GB RAM
Software
1.3.23
Intel 3.67GHz - A
Hardware
3.25 GB RAM
1 Gbit Ethernet
Software
JMSPerfHarness
1.3.24
Hardware
Hyper-threading disabled
4 GB RAM
1 Gbit Ethernet
Software
1.3.25
Intel 3.67GHz - C
Hardware
2.0 GB RAM
1Gbit Ethernet
Software
1.3.26
Hardware
64GB RAM
1 Gb Ethernet
Software
AIX 5300-11-01-0944
WPS 7.0.0.1
1.3.27
Hardware
16GB RAM
1 Gb Ethernet
Software
AIX 5300-07-01-0748
1.3.28
Hardware
32GB RAM
1 Gb Ethernet
Software
AIX 5300-07-01
1.3.29
Hardware
16GB RAM
Software
AIX 5300-07-01-0748
1.3.30
PPC 4.2GHz - A
Hardware
64 GB RAM
1 Gbit Ethernet
Software
AIX 6.1.0.0
1.3.31
PPC 4.2GHz - B
Hardware
64 GB RAM
1 Gbit Ethernet
Software
AIX 6.1.0.0
1.3.32
Hardware
64GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.33
Hardware
64GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.34
Hardware
32GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.35
Hardware
64GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
1.3.36
Hardware
32GB RAM
1 Gb Ethernet
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
DB2 9.5 FP 3
1.3.37
Hardware
64GB RAM
Software
AIX 6100-00-03-0808
WPS 6.2.0
WebSphere MQ 6.0.2.5
1.3.38
Hardware
128 GB RAM
Software
AIX 6100-04-01-0944
WPS 7.0.0.1
DB2 9.7 FP 1
1.3.39
Hardware
12 GB RAM
Software
AIX 6.1
WPS 7.0.0.1
1.3.40
Hardware
12 GB RAM
Software
AIX 6.1
Appendix B References
1. WebSphere BPM Performance References
https://w3quickplace.lotus.com/QuickPlace/wasperf/PageLibrary852569AF00670F15.nsf/h_Toc/3648196DB48799C7852570EE00730294/?OpenDocument&Form=h_PageUI
2. WebSphere BPM Version 7.0 information center
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp
3. WebSphere Application Server Performance Best Practices and Resources
https://w3quickplace.lotus.com/QuickPlace/wasperf/Main.nsf/h_Toc/e600a81c8a82722085256efb000b5116/?OpenDocument
4. WebSphere Application Server Performance URL
http://www.ibm.com/software/webservers/appserv/was/performance.html
5. WebSphere Application Server 7.0 information center (including Tuning Guide)
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/welcome_base.html
6. Setting up a Data Store in the Messaging Engine
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc.nd.multiplatform.doc/tasks/tjm0005_.html
7. DB2 Best Practices for Linux, UNIX, and Windows
http://www.ibm.com/developerworks/data/bestpractices/?&S_TACT=105AGX11&S_CMP=FP
8. DB2 Version 9.7 Info Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp
9. DB2 Version 9.5 Info Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp
10. Redbook: WebSphere BPM v7 Production Topologies
http://www.redbooks.ibm.com/redpieces/abstracts/sg247854.html
11. Redbook: IBM WebSphere InterChange Server Migration to WebSphere Process Server
http://www.redbooks.ibm.com/redbooks/pdfs/sg247415.pdf