
WebSphere Business Process Management (BPM)

7.0.0.1
Performance Report

Best Practices, Tuning and Configuration, and Measurements for the following products:
WebSphere Process Server (WPS) 7.0.0.1
WebSphere Enterprise Service Bus (WESB) 7.0.0.1
WebSphere Integration Developer (WID) 7.0.0.1
WebSphere Business Monitor (Monitor) 7.0.0.0
WebSphere Business Modeler (Modeler) 7.0.0.1

IBM Corporation
WebSphere Business Process Management Performance Team

March 2010


Copyright IBM Corporation 2005, 2010. All rights reserved.

This publication is unclassified, but it is not intended for general or broad public circulation.
The purpose is to provide detailed performance data, best practices, and tuning information for
the products covered. The target audience is software services and technical support specialists.
The expected usage is to provide guidance in making rational configuration choices for proofs of
concept and for product deployments.
Though the content can be shared with customers, preferably in a one-on-one discussion, the
information is not intended as general sales material.


1 INTRODUCTION
  1.1 OVERVIEW
  1.2 ADDITIONS IN THIS REPORT
  1.3 SUMMARY OF KEY MEASUREMENTS
  1.4 DOCUMENT STRUCTURE AND USAGE GUIDELINES
    1.4.1 Document Structure
    1.4.2 Measurement Usage Guidelines

2 ARCHITECTURE BEST PRACTICES
  2.1 OVERVIEW
  2.2 TOP TUNING AND DEPLOYMENT GUIDELINES
  2.3 MODELING
    2.3.1 Choose non-interruptible over interruptible (long running) processes whenever possible
    2.3.2 Choose query tables over standard query API for task list and process list queries
    2.3.3 Choose the appropriate granularity for a process
    2.3.4 Use Events Judiciously
    2.3.5 Choose efficient Meta-Data management
    2.3.6 Considerations when choosing between business processes and business state machines
    2.3.7 Minimize state transitions in BSM
  2.4 TOPOLOGY
    2.4.1 Deploy appropriate hardware
    2.4.2 Use a high performing database (such as DB2)
    2.4.3 Deploy local modules in the same server
    2.4.4 Best Practices for Clustering
    2.4.5 Evaluate service providers and external interfaces
  2.5 LARGE OBJECTS
    2.5.1 Factors Affecting Large Object Size Processing
    2.5.2 Large Object Design Patterns
  2.6 64-BIT CONSIDERATIONS
  2.7 WEBSPHERE BUSINESS MONITOR
    2.7.1 Event Processing
    2.7.2 Dashboard
    2.7.3 Database Server

3 DEVELOPMENT BEST PRACTICES
  3.1 INTRODUCTION
  3.2 SCA CONSIDERATIONS
    3.2.1 Cache results of ServiceManager.locateService()
    3.2.2 Reduce the number of SCA Modules, when appropriate
    3.2.3 Use synchronous SCA bindings across local modules
    3.2.4 Utilize multi-threaded SCA clients to achieve concurrency
    3.2.5 Add Quality of Service Qualifiers at appropriate level
  3.3 BUSINESS PROCESS CONSIDERATIONS
    3.3.1 Modeling best practices for activities in a business process
    3.3.2 Do not use 2-way synchronous invocation of long running business processes
    3.3.3 Minimize number and size of BPEL variables and BOs
  3.4 HUMAN TASK CONSIDERATIONS
  3.5 BUSINESS PROCESS AND HUMAN TASKS CLIENT CONSIDERATIONS
  3.6 TRANSACTIONALITY CONSIDERATIONS
    3.6.1 Exploit SCA transaction qualifiers
    3.6.2 Avoid two-way synchronous invocation of an asynchronous target
    3.6.3 Exploit transactional attributes for BPEL activities in long-running processes
  3.7 INVOCATION STYLE CONSIDERATIONS
    3.7.1 Use Asynchrony judiciously
    3.7.2 Set the Preferred Interaction Style to Sync whenever possible
    3.7.3 Avoid Asynchronous Invocation of Synchronous Services in a FanOut / FanIn Block
  3.8 MEDIATION FLOW CONSIDERATIONS
    3.8.1 Use mediations that benefit from WESB optimizations
    3.8.2 Usage of XSLTs vs. BO Maps
    3.8.3 Configure WESB Resources
  3.9 LARGE OBJECT BEST PRACTICES
    3.9.1 Avoid lazy cleanup of resources
    3.9.2 Avoid tracing when processing large BOs
    3.9.3 Avoid buffer-doubling code
    3.9.4 Make use of deferred-parsing friendly mediations for XML docs
  3.10 WICS MIGRATION CONSIDERATIONS
  3.11 WID CONSIDERATIONS
    3.11.1 Leverage Hardware Advantages
    3.11.2 Make use of WAS shared libraries in order to reduce memory consumption
  3.12 FABRIC CONSIDERATIONS
    3.12.1 Only specify pertinent context properties in context specifications
    3.12.2 Bound the range of values for context keys
4 PERFORMANCE TUNING AND CONFIGURATION
  4.1 INTRODUCTION
  4.2 PERFORMANCE TUNING METHODOLOGY
  4.3 TUNING CHECKLIST
  4.4 TUNING PARAMETERS
    4.4.1 Tracing and Logging flags
    4.4.2 Java tuning parameters
    4.4.3 MDB ActivationSpec
    4.4.4 Thread Pool Sizes
    4.4.5 JMS Connection Pool Sizes
    4.4.6 JDBC DataSource Parameters
    4.4.7 Messaging Engine Properties
    4.4.8 Run production servers in production
  4.5 ADVANCED TUNING
    4.5.1 Tracing and Monitoring considerations
    4.5.2 Tuning for Large Objects
    4.5.3 Tuning for Maximum Concurrency
    4.5.4 Messaging Tuning
    4.5.5 Web Services Tuning
    4.5.6 Business Process Choreographer Tuning
    4.5.7 WESB Tuning
    4.5.8 Clustered Topology Tuning
    4.5.9 WebSphere Business Monitor Tuning
    4.5.10 Database: General Tuning
    4.5.11 Database: DB2 Specific Tuning
    4.5.12 Database: Oracle Specific Tuning
    4.5.13 Advanced Java Heap Tuning
    4.5.14 Power Management Tuning
    4.5.15 WPS Tuning for WICS migrated workloads

5 WEBSPHERE PROCESS SERVER 7.0.0.1 PERFORMANCE RESULTS
  5.1 SOABENCH 2008 CHOREOGRAPHY FACET
    5.1.1 Automated Approval on Windows 2008 and RHE Linux 5.2
    5.1.2 OutSourced on Windows 2008 and RHE Linux 5.2
    5.1.3 Vertical (SMP) scaling on AIX POWER6
    5.1.4 Horizontal (clustered) scaling on AIX POWER6
    5.1.5 Automated Approval on AIX POWER7

6 WEBSPHERE ESB 7.0.0.1 PERFORMANCE RESULTS
  6.1 WINDOWS RESULTS
    6.1.1 Web Services Binding
    6.1.2 JMS Binding - Non Persistent
    6.1.3 JMS Binding - Persistent
  6.2 AIX RESULTS
    6.2.1 Web Services Binding
    6.2.2 JMS Binding - Non Persistent
    6.2.3 JMS Binding - Persistent
    6.2.4 Web Services Binding SMP scaling

7 WEBSPHERE BUSINESS MONITOR 7.0.0.0 PERFORMANCE RESULTS
  7.1 INTERACTIVE PROCESS DESIGN IMPROVEMENTS

8 WID 7.0.0.1 AND MODELER 7.0.0.1 PERFORMANCE RESULTS
  8.1 BUILD ACTIVITIES
    8.1.1 Order Processing Workload
    8.1.2 Loan Processing Workload
    8.1.3 Customer Service Workload
  8.2 PUBLISH ACTIVITIES
    8.2.1 Publish Including Generation of Deploy Code
    8.2.2 Publish with Deploy Code Cached in the Application
  8.3 DIRECT DEPLOY ACTIVITIES

9 DIRECTED STUDIES
  9.1 THROUGHPUT FOR 32-BIT JVM ON 32-BIT AND 64-BIT WINDOWS
  9.2 THROUGHPUT AND MEMORY USAGE FOR 64-BIT JVM ON AIX
    9.2.1 Introduction
    9.2.2 Throughput Results
    9.2.3 Memory Footprint Results
  9.3 THROUGHPUT AND RESPONSE TIME FOR UP TO 10,000 CONCURRENT USERS
    9.3.1 Introduction
    9.3.2 Results - 4 WPS server cores
    9.3.3 Results - 8 WPS server cores
  9.4 BUSINESS SPACE RESPONSE TIME FOR HUMAN WORKFLOW
  9.5 PROCESS INSTANCE MIGRATION PERFORMANCE
  9.6 BPC QUERY RESPONSE TIME
    9.6.1 Query Table Response Time
    9.6.2 BPC Explorer Response Time (WPS 6.2.0 data)
  9.7 WPS RELEASE-TO-RELEASE IMPROVEMENTS
    9.7.1 SOABench 2008 Automated Approval (microflow)
    9.7.2 Banking (macroflow)
  9.8 IMPACT OF VARYING NUMBER OF ACTIVE BUSINESS PROCESS INSTANCES
    9.8.1 Throughput as Preloaded Process Instances Increase
    9.8.2 Database System Behavior
  9.9 IMPACT OF BUSINESS OBJECT SIZE ON THROUGHPUT
  9.10 TOPOLOGY STUDY: SMP VS. CLUSTERED WPS
    9.10.1 Overview
    9.10.2 Automated Approval Mode
    9.10.3 OutSourced Mode
  9.11 SINGLE CLUSTER DEPLOYMENT ENVIRONMENT PATTERN
    9.11.1 Overview
    9.11.2 MDB Connection Behavior
    9.11.3 Topology
    9.11.4 Workload
    9.11.5 Results
    9.11.6 Summary
  9.12 SCALING UP PRODUCTION DEPLOYMENTS
  9.13 WICS TO WPS MIGRATION
  9.14 LARGE OBJECT SIZE STUDY
    9.14.1 Introduction and Caveats
    9.14.2 Large Objects in WPS
    9.14.3 Large Objects in WESB
  9.15 MESSAGING BINDING COMPARISON USING WESB
    9.15.1 Messaging Binding Comparison - Non Persistent
    9.15.2 Messaging Binding Comparison - Persistent
  9.16 XSL TRANSFORM (XSLT) VS. BO MAP PRIMITIVES USING WESB
  9.17 MODULARITY IMPACT - COMPOSITE VS. CHAINED MEDIATIONS
  9.18 THROUGHPUT USING JAX-WS VS. JAX-RPC FOR WEB SERVICES
  9.19 AUTHORING STUDIES
    9.19.1 Summary of Key Measurements
    9.19.2 Hardware Study - Server vs. Desktop systems
    9.19.3 Deployment Strategy Study
  9.20 BPM 6.2.0 DIRECTED STUDIES
    9.20.1 Impact of Enabling Security at Runtime
    9.20.2 Remote Messaging Deployment Environment Startup Time and Footprint
    9.20.3 Authoring - Shared Libraries Study
    9.20.4 Authoring - Hardware Comparison Study
    9.20.5 Dynamic/Static Routing Comparison using WESB
    9.20.6 WESB Client Scaling
    9.20.7 Local versus remote SCA bindings - WPS 6.1.0 data
10 WEBSPHERE PROCESS SERVER CORE WORKLOADS
  10.1 INTRODUCTION
  10.2 CONTACT MANAGER
    10.2.1 SCA Synchronous Binding
    10.2.2 Web Services Binding
  10.3 BANKING
    10.3.1 Banking Workload Description
    10.3.2 Banking Scenarios
    10.3.3 Banking Services
  10.4 SOABENCH 2008 CHOREOGRAPHY FACET
    10.4.1 Overview
    10.4.2 Automated Approval Scenario details
    10.4.3 Outsourced Scenario details
    10.4.4 InHouse Scenario details
  10.5 SOABENCH 2005 (USED IN PREVIOUS PERFORMANCE REPORTS)
    10.5.1 Overview
    10.5.2 Choreography facet: Automated Approval
    10.5.3 Choreography facet: Manual Approval

11 WEBSPHERE ESB CORE WORKLOADS
  11.1 WEB SERVICES TEST SCENARIO
    11.1.1 Web Services Fan Out / Fan In Mediation
  11.2 JMS TEST SCENARIOS
    11.2.1 JMS Binding test topology
    11.2.2 MQ JMS and MQ Binding Test topology
  11.3 SOABENCH 2008 MEDIATION FACET
    11.3.1 Transformation Mediations
    11.3.2 Routing Mediations
    11.3.3 Composite mediation
    11.3.4 Chained mediation
  11.4 SOABENCH 2008 MEDIATION FACET MESSAGE SIZES

12 WID AND MODELER CORE WORKLOADS
  12.1 ORDER PROCESSING
  12.2 LOAN PROCESSING
  12.3 CUSTOMER SERVICE
  12.4 BPM@WORK

APPENDIX A - MEASUREMENT CONFIGURATIONS
  1.1 WPS SETTINGS
    1.1.1 SOABench 2008 Automated Approval and OutSourced Mode Settings: AIX
    1.1.2 SOABench 2008 Automated Approval and OutSourced Mode Settings: Windows and Linux
    1.1.3 SOABench 2008 InHouse Settings
    1.1.4 Banking Settings
  1.2 WESB SETTINGS
    1.2.1 WESB Common Settings
    1.2.2 WESB Settings for Web Services measurements
    1.2.3 WESB Settings for JMS measurements
    1.2.4 DB2 Settings for JMS persistent measurements
  1.3 INDIVIDUAL MEASUREMENT SYSTEM DESCRIPTIONS
    1.3.1 Intel 2.0GHz - A
    1.3.2 Intel 2.16GHz - A
    1.3.3 Intel 2.2GHz - D2D1
    1.3.4 Intel 2.66GHz - A
    1.3.5 Intel 2.66GHz - B
    1.3.6 Intel 2.8GHz - A
    1.3.7 Intel 2.8GHz - B
    1.3.8 Intel 2.8GHz - C
    1.3.9 Intel 2.8GHz - D
    1.3.10 Intel 2.93GHz - A
    1.3.11 Intel 2.93GHz - B
    1.3.12 Intel 2.93GHz - C
    1.3.13 Intel 2.93GHz - D
    1.3.14 Intel 3.0GHz - A
    1.3.15 Intel 3.0GHz - B
    1.3.16 Intel 3.0GHz - C
    1.3.17 Intel 3.0GHz - D
    1.3.18 Intel 3.0GHz - D2D2
    1.3.19 Intel 3.5GHz - A
    1.3.20 Intel 3.5GHz - B
    1.3.21 Intel 3.5GHz - C
    1.3.22 Intel 3.5GHz - D
    1.3.23 Intel 3.67GHz - A
    1.3.24 Intel 3.67GHz - B
    1.3.25 Intel 3.67GHz - C
    1.3.26 PPC 1.9GHz - A
    1.3.27 PPC 2.2GHz - A
    1.3.28 PPC 2.2GHz - B
    1.3.29 PPC 2.2GHz - C
    1.3.30 PPC 4.2GHz - A
    1.3.31 PPC 4.2GHz - B
    1.3.32 POWER6 4.7GHz - A
    1.3.33 POWER6 4.7GHz - B
    1.3.34 POWER6 4.7GHz - C
    1.3.35 POWER6 4.7GHz - D
    1.3.36 POWER6 4.7GHz - E
    1.3.37 POWER6 4.7GHz - F
    1.3.38 POWER6 4.7GHz - G
    1.3.39 POWER7 3.55GHz - A
    1.3.40 POWER7 3.55GHz - B

APPENDIX B - REFERENCES


1 Introduction
1.1 Overview
This document is the fifth in a series of detailed performance reports for the WebSphere Business
Process Management (WebSphere BPM) product line. The report is authored by the IBM
WebSphere BPM performance team, with members in Austin, Texas; Böblingen, Germany; and
Hursley, England. It explores the performance characteristics of the following products:

• WebSphere Process Server (WPS) 7.0.0.1
• WebSphere Enterprise Service Bus (WESB) 7.0.0.1
• WebSphere Integration Developer (WID) 7.0.0.1
• WebSphere Business Monitor (Monitor) 7.0.0.0
• WebSphere Business Modeler (Modeler) 7.0.0.1

These products represent an integrated development and runtime environment based on a key set
of Service-Oriented Architecture (SOA) and Business Process Management (BPM) technologies:
Service Component Architecture (SCA), Service Data Object (SDO), and Business Process
Execution Language for Web Services (BPEL). These technologies in turn build on the core
capabilities of the WebSphere Application Server (WAS) 7.0 product.
A short description of each product covered in this report follows:

• WebSphere Process Server allows the deployment of standards-based business integration
applications in a service-oriented architecture (SOA), which takes everyday business
applications and breaks them down into individual business functions and processes,
rendering them as services. Based on the robust J2EE infrastructure and associated platform
services provided by WebSphere Application Server, WebSphere Process Server can help you
meet current business integration challenges. This includes, but is not limited to, business
process automation.

• WebSphere Enterprise Service Bus provides the capabilities of a standards-based enterprise
service bus. WESB manages the flow of messages between service requesters and service
providers. Mediation modules within WESB handle mismatches between requesters and
providers, including protocol or interaction style, interface, and quality of service mismatches.

• WebSphere Integration Developer is the development environment for building WebSphere
BPM solutions. It is a common tool for building service-oriented architecture (SOA)-based
integration solutions across WebSphere Process Server, WebSphere Enterprise Service Bus,
and other WebSphere BPM products.

• WebSphere Business Monitor provides the ability to monitor business processes in real time,
providing a visual display of business process status, business performance metrics, and key
business performance indicators, together with alerts and notifications to key users that enable
continuous improvement of business processes.


• WebSphere Business Modeler is IBM's premier business process modeling and analysis tool
for business users. It offers process modeling, simulation, and analysis capabilities to help
business users understand, document, and deploy business processes for continuous
improvement.

In addition to performance results, this document discusses the performance implications of the
supporting runtime environment, and describes best practices and tuning and configuration
parameters for the different software technologies involved.
We envision this report to be read by a wide variety of groups, both within IBM (development,
services, technical sales, etc.) and by customers. Please note that this document should not be
considered as a comprehensive sizing or capacity planning guide, though the document serves as
a useful reference for these activities.
The systems used to obtain measurements are intended to be representative mixes of potential
development and deployment systems running Windows, AIX, or Linux (note that there is a
separate performance report for WebSphere BPM products on z/OS). While we report results in
many cases on more than one hardware platform, this report is not intended for the purpose of
evaluating relative hardware performance between platforms. Many configurations are run
with some of the processor cores disabled, hyperthreading disabled, or both. While these changes
are marked on the charts, the reader should consider these before attempting any comparisons.
Finally, the workloads used to obtain measurements in this report are internal workloads (i.e., not
publicly available) that are designed to mimic customer usage patterns. Please see the workload
descriptions in this document for further information.
For those who are either considering or are in the very early stages of implementing a solution
incorporating these products, this document should prove a useful reference, both in terms of
best practices during application development and deployment, and as a reference for setup,
tuning and configuration information. It provides a useful introduction to many of the issues
influencing each product's performance, and can serve as a guide for making rational first choices
in terms of configuration and performance settings.
Similarly, those who have already implemented a solution utilizing these products might
effectively use the information presented here to attempt to match, to the extent possible, their
own workload characteristics to those presented here. By relating these characteristics to their
own workloads, the user is much more likely to gain insight as to what performance they might
expect, what possible inhibitors to better performance may be present, and how their overall
integrated solution performance may be improved.
All of these products build on the capabilities of the WAS infrastructure which runs on Java
Virtual Machines (JVMs), so BPM solutions also benefit from tuning, configuration, and best
practices information for WAS and corresponding platform JVMs (documented in the References
appendix). The reader is encouraged to use this report in conjunction with these references.
Please address questions or comments about this document to Mike Collins at
mcollin@us.ibm.com or Mike Collins/Austin/IBM.


1.2 Additions in this report


As in previous editions of this report, measurements, best practices, and tuning guidance are
provided for WPS, WESB, WID, and Monitor. In addition, this report adds performance
information for the products and capabilities shown below. The Table of Contents shows specific
links to each of these.

• SOABench 2008 is introduced, and is used extensively to demonstrate the performance
characteristics of WPS and WESB. SOABench 2008 replaces SOABench 2005, which was used
in previous reports. Note that the content of the Choreography Facet in SOABench 2008 is quite
different than the 2005 version, so the reader should not compare performance across the
different versions of the SOABench workload.
• POWER7 data for AIX systems is shown for WPS.
• WebSphere Business Modeler response times when deploying models to WPS or WB Monitor.
• The following directed studies are either added or enhanced relative to the 6.2.0 report:
  o Business Space response time for Human Workflow scenarios
  o Response time and throughput for up to 10,000 concurrent WPS users
  o Partitioning large systems - the effect of utilizing a single instance vs. clustering, and the
    performance of a single cluster deployment pattern
  o WICS to WPS migration best practices, tuning, and performance
  o Scaling up production deployments
  o 64-bit improvements in throughput and memory utilization
  o Large Object performance and capability
  o The performance effect of a varying number of process instances
  o Process instance migration performance
  o The relationship between BO size and throughput
  o WPS performance for a 32-bit JVM on 32-bit and 64-bit Windows systems
  o A comparison of the performance of various messaging bindings
  o A comparison of web services performance using JAX-WS and JAX-RPC
  o A comparison of dynamic vs. static routing in mediations
  o XSL Transform vs. BO Map performance
  o A modularity study comparing composite (single module) vs. chained (multiple modules)
    performance
  o Authoring performance using a variety of workloads, hardware platforms, and deployment
    strategies


1.3 Summary of Key Measurements


This document is the fifth version of WebSphere Business Process Management (WebSphere
BPM) Performance Reports; the previous report presented the performance of the 6.2.0 versions
of these products. Highlights in this report include:

• WPS 7.0.0.1 improved dramatically in several areas, including:
  o POWER7 throughput that is up to 50% faster than a POWER6 system, with a throughput
    rate of 1,408 Claims Completed Per Second (CCPS) using SOABench 2008 Automated
    Approval on a 6 core system, demonstrating an SMP scaling factor of 5.4x out of 6.
  o Clustering data that shows near-linear horizontal scaling with up to 8 nodes, delivering
    5,400 CCPS using SOABench 2008 Automated Approval Mode, 7.8 times better
    throughput than on a single node.
  o SMP scaling data that demonstrates outstanding vertical scaling using AIX systems, as
    shown by SOABench 2008 Automated Approval Mode 8 core scaling of 7.3x and 16 core
    scaling of 11.9x, delivering throughput over 2,000 transactions per second.
  o Continuing the drumbeat of release-to-release improvements, delivering a 23%
    improvement over WPS 6.2.0.1, and a 2.5x improvement over WPS 6.0.2.1, measured
    using SOABench 2008 Automated Approval.
  o Measurements on Red Hat Enterprise Linux 5.2 that show a throughput rate of 665
    transactions per second using SOABench 2008 Automated Approval Mode on an 8 core
    Intel system, an SMP scaling factor of 6.2x.
  o Support for 10,000 concurrent users with sub-second response times for long running
    processes, including Query Task, Claim Task, and Complete Task operations.
  o Business Space response time improved by up to 55% relative to the 6.2.0.2-based
    Feature Pack, assessed using Human Workflow widgets.
• Migrated WICS application performance that demonstrates a WICS application (Contact
Manager) migrated to WPS using the 7.0.0.1 tooling delivers 1,004 business transactions per
second on an 8 core AIX POWER6 system, a 54% improvement over a WPS 6.2.0 migrated
workload. Further, the WPS 7.0.0.1 throughput is essentially equivalent to the throughput
delivered by the original WICS application.
• Dramatic improvements in Direct To Deploy response time:
  o 2.7x faster deploying the BPM@Work model from WB Modeler to WPS 7.0.0.1
  o 2x faster deploying the Vacation Process model from WB Modeler to WB Monitor 7.0.0.0
• Authoring build and publish data, highlighted by:
  o Clean & Build response time of the Customer Service workspace shows a 45%
    improvement from version 6.2.0
  o Peak memory utilization while building the Customer Service workspace shows a 32%
    improvement compared with WPS 6.2.0.
  o Response time to publish the Loan Processing workspace with Resources on Server
    shows a 1.9x improvement compared with version 6.2.0
• WESB 7.0.0.1 measurements, including:
  o Web Services binding delivers throughput improvements of up to 150% for the JAX-WS
    transform namespace mediation
  o JAX-WS binding now faster than JAX-RPC binding for Web Services
  o JMS binding delivers throughput improvements of up to 22%.


1.4 Document Structure and Usage Guidelines


1.4.1 Document Structure
As previously stated, this document contains information that pertains to several WebSphere
BPM products. As such, it is not necessarily intended that the document be read end to end,
although some readers will find that useful. The document is structured such that information for
a specific product is easy to find by using the Table of Contents (e.g. if a reader is only interested
in WESB information, scan the Table of Contents for "WESB").
Following is the structure of this document. There are 11 major chapters and 2 appendices
following this Introduction. The first 3 chapters cover Best Practices and Tuning Considerations
for three different phases of WebSphere BPM projects: Architecture, Development, and
Deployment. At least one of these chapters will be of interest to any reader of this document, and
many will find value in all 3 chapters. Following these chapters are 4 chapters showing the
performance of each of the major products covered in this document, followed by 3 chapters
describing the workloads used to obtain these results. There is also a chapter of Directed Studies,
which contains additional data for each of the products covered in this report. Finally, the
document concludes with 2 appendices: one detailing measurement configurations and the other
providing a list of useful references. Here is the structure in linear order:

• Architecture Best Practices: recommendations for architecture and topology decisions that
will produce high performing and scalable solutions.
• Development Best Practices: guidelines for solution developers that will lead to high
performing systems.
• Performance Tuning and Configuration: a discussion of the configuration parameters and
settings for the major software components which comprise a business process management
solution.
• WPS Performance Results: measurements for the SOABench 2008 Choreography Facet
workload.
• WESB Performance Results: measurements for the SOABench 2008 Mediation Facet
workload.
• WebSphere Integration Developer and WebSphere Business Modeler Performance Results:
measurements for a representative set of WID and Modeler workloads.
• WebSphere Business Monitor Performance Results: measurements for deploying a Monitor
model to WB Monitor.
• Directed Studies: a series of specific performance related investigations, each addressing a
specific aspect of one of the products.
• WPS Core Workloads: a detailed description of the workloads used to measure the
performance characteristics of WPS.
• WESB Core Workloads: a detailed description of the workloads used to measure the
performance characteristics of WESB.
• WebSphere Integration Developer and WebSphere Business Modeler Core Workloads: a
detailed description of the workloads used to measure the performance characteristics of WID
and WB Modeler.
• Appendix A - Measurement Configurations: details of the hardware and software
configurations used for all measurements.
• Appendix B - References: links to best practices, performance information, and product
information for both the products in this report and related products such as WebSphere
Application Server, DB2, etc.

1.4.2 Measurement Usage Guidelines


There are several important points to understand in order to properly interpret the measurements
presented in this document:

• Data is presented for multiple hardware platforms, including POWER6, POWER7, Intel
Pentium IV Xeon, and Intel multi-core technologies. This is done to provide representative
coverage for WebSphere BPM production topologies. However, this data should not be used
to compare the relative performance of different hardware platforms. The intent of this
document is to show how the BPM stack performs on representative configurations, not to
compare hardware environments.
• Hardware threading can significantly impact system performance. Two different technologies
are relevant: POWER's Simultaneous Multithreading (SMT) and Intel's Hyper-Threading.
Where applicable, hardware threading usage is indicated on the measurement charts.
• Multi-core technology is prevalent in current hardware systems. As such, in this document we
use the term Core (or processor Core) in lieu of CPU to refer to a physical processor. The
term CPU Utilization is still used to represent the aggregate utilization of all cores in the
system.
• WebSphere BPM 7.0.0.1 provides significant performance improvements in many areas;
highlighting these is the focus of the measurements in this report. However, for some
scenarios WebSphere BPM 6.2.0 data is still relevant and is included in this report as well.
These scenarios are clearly labeled in the Table of Contents and section titles.


2 Architecture Best Practices


2.1 Overview
This section provides guidance on how to architect a high-performing and scalable WebSphere
BPM solution. Many of these best practices are illustrated in the Directed Studies chapter of this
document.
The purpose of this chapter is to highlight the best practices associated specifically with the
technologies and features delivered in the WebSphere BPM products covered in this report.
However, these products are built on top of existing technologies like WAS (WebSphere
Application Server), Platform Messaging, and DB2. Each of these technologies has associated
best practices that apply. It is not our intent to enumerate these here. Instead the reader is referred
to Appendix B for a set of references and pointers to this information.


2.2 Top Tuning and Deployment Guidelines


The remainder of this chapter details architectural best practices for WebSphere BPM solutions.
Development Best Practices and Performance Tuning and Configuration are covered in
subsequent chapters. The reader is strongly encouraged to read these chapters, since the authors
have found this information to be very beneficial for numerous customers over the years.
However, if you read nothing else in this document, please read and adhere to the following
key tuning and deployment guidelines, since they are relevant in virtually all performance
sensitive customer engagements.

Use a high performance disk subsystem. In virtually any realistic topology, a server-class
disk subsystem (e.g. RAID adapter with multiple physical disks) is required on the tier(s)
that host the message and data stores to achieve acceptable performance. This point
cannot be overstated; the authors have seen many cases where the overall performance of
a solution is improved by several factors simply by utilizing appropriate disk subsystems.

Set an appropriate Java heap size to deliver optimal throughput and response time. JVM
verbosegc output will greatly help in determining the optimal settings. Further
information is available in Section 4.4.2.

Where possible, utilize non-interruptible processes (microflows) instead of long running


processes (macroflows). Macroflows are required for many processes (e.g, if human
tasks are employed, or state needs to be persisted). However, there is significant
performance overhead associated with macroflows. Further, if macroflows are needed
for some portion of the solution, separate the solution into both microflows and
macroflows to maximize utilization of microflows. For details, see Section 2.3.1.

Use DB2 instead of the default Derby DBMS. DB2 is a high-performing, industrial
strength database designed to handle high levels of throughput and concurrency, scale
well, and deliver excellent response time.

Tune your database for optimal performance. Proper tuning, and deployment, choices for
databases can greatly increase overall system throughput. For details, see Section 4.5.10.

Disable tracing. Tracing is clearly important when debugging, but the overhead of tracing
severely impacts performance. More information is available in Section 4.5.1.

Configure thread and connection pools to enable sufficient concurrency. This is
especially important for high volume, highly concurrent workloads, since the thread pool
settings directly influence how much work can be concurrently processed by the server.
For more information, see Section 4.5.3.3.

For task and process list queries, use composite query tables. Query tables are designed to
produce excellent response times for high-volume task and process list queries. For
details, see Section 2.3.2.

Use work-manager based navigation to improve throughput for long running processes.
This optimization reduces the number of objects allocated, the number of objects
retrieved from the database, and the number of messages sent for Business Process
Choreographer messaging. For further information, see Section 4.5.6.1.

Avoid unnecessary usage of asynchronous invocations. Asynchronous invocation is
often needed on the edges of modules, but not within a module. Utilize synchronous
preferred interaction styles, as is described in Section 3.7.2.

Avoid too granular transaction boundaries in SCA and BPEL. Every transaction commit
results in expensive database and/or messaging operations. Design your transactions with
care, as described in Section 3.6.

2.3 Modeling
2.3.1 Choose non-interruptible over interruptible (long running)
processes whenever possible
Use interruptible processes, a.k.a. macroflows or long running processes, only when required
(e.g. long running service invocations and human tasks). Non-interruptible processes, a.k.a.
microflows or short running processes, exhibit much better performance at runtime. A non-interruptible process instance is executed in one J2EE transaction with no persistence of state,
while an interruptible process instance is typically executed in several J2EE transactions,
requiring that state be persisted in a database at transaction boundaries.
Whenever possible, utilize synchronous interactions for non-interruptible processes. A non-interruptible process is much more efficient than an interruptible process since it does not have to
utilize state or persistence in the backing database system.
A process is interruptible if the checkbox "Process is long-running" is set in the WebSphere
Integration Developer (WID) via Properties > Details for the process.
If interruptible processes are required for some capabilities, separate the processes such that the
most frequent scenarios can be executed in non-interruptible processes and exceptional cases are
handled in interruptible processes.

2.3.2 Choose query tables over standard query API for task list and
process list queries
Query tables were introduced in WPS 6.2.0. Query tables are designed to provide good response
times for high-volume task list and process list queries. Query tables offer improved query
performance:

Improved access to work items reduces the complexity of the database query.

Configurable high-performance filters on tasks, process instances, and work items allow
for efficient filtering.

Composite query tables can be configured to bypass authorization through work items.

Composite query tables allow the definition of query tables that reflect the information
which is displayed on task lists and process lists presented to users.

Query improvements due to Query Tables are shown in Section 9.6.1. For further information,
please see the references below:
WebSphere Process Server Query Table Builder
http://www.ibm.com/support/docview.wss?uid=swg24021440
Query Tables in Business Process Choreography in the WPS 7.0 Info Center:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/c6bpel_querytables.html

2.3.3 Choose the appropriate granularity for a process

A business process and its individual steps should have business significance and not try to
mimic programming level granularity. Use programming techniques like POJOs (Plain Old Java
Objects) or Java snippets for logic without business significance. This topic is discussed further
in the Software components: coarse-grained versus fine-grained paper available here:
http://www.ibm.com/developerworks/library/ws-soa-granularity/index.html

2.3.4 Use Events Judiciously


The purpose of CBE (Common Base Event) emission in WPS is for business activity monitoring.
Since CBE emission uses a persistent mechanism, it is inherently heavy weight. One should
utilize CBE only for events that have business relevance. Further, emitting CBEs to a database is
not recommended; instead CBE emission should be done via the messaging infrastructure.
Finally, do not confuse business activity monitoring and IT monitoring. The Performance
Monitoring Infrastructure (PMI) is far more appropriate for the latter.
With this in mind, the following generally holds for most customers:
Customers are concerned about the state of their business and their processes. Therefore
events that signify changes in state are important. For long-running and human task
activities, this is fairly natural: use events to track when long-running activities complete,
when human tasks change state, etc.
For short running flows that complete within seconds, it is usually sufficient to know that
a flow completed, perhaps with the associated data. It usually makes no sense to
distinguish events within a microflow that are only milliseconds or seconds apart.
Therefore, 2 events (start, end) are usually sufficient for a microflow.

2.3.5 Choose efficient Meta-Data management


2.3.5.1 Follow Java Language Specification for Complex DataType Names
While WebSphere BPM v7 allows characters in Business Object type names that would not be
permissible in Java class names, the internal data representation of complex data type names does
make use of Java types. As such, performance is better if BO types follow the Java naming
standards, because if valid Java naming syntax is used then no additional translation is required.
2.3.5.2 Avoid use of anonymous derived types in XSDs
Some XSD features (restrictions on the primitive string type, for example) result in modifications
to the type that require a new sub-type to be generated. If these types are not explicitly declared,
then a new sub-type (a derived type) is generated at runtime. Performance is generally better if
this can be avoided. So, avoid adding restrictions to elements of primitive type where possible. If
a restriction is unavoidable, consider creating a new, concrete SimpleType that extends the
primitive type to include the restriction. Then XSD elements may utilize that type without
degraded performance.
2.3.5.3 Avoid referencing elements from one XSD in another XSD
If A.xsd defines an element AElement:
<xs:element name="AElement">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="0" />
      <xs:maxLength value="8" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
It may be referenced from another file, B.xsd as:
<xs:element ref="AElement" minOccurs="0" />
This has been shown to perform poorly. It is much better to define the type concretely and then
make any new elements use this type. So, A.xsd becomes:
<xs:simpleType name="AElementType">
  <xs:restriction base="xs:string">
    <xs:minLength value="0" />
    <xs:maxLength value="8" />
  </xs:restriction>
</xs:simpleType>

and B.xsd becomes:


<xs:element name="BElement" type="AElementType" minOccurs="0" />
2.3.5.4 Reuse Data Object type metadata where possible
Within application code, it is common to refer to types, for instance when creating a new
Business Object. It is possible to refer to a Business Object type by name for instance in the
method DataFactory.create(String uri, String typeName). It is also possible to refer to the type by
a direct reference as in the method DataFactory.create(Type type). In cases where a Type is likely
to be used more than once, it is usually faster to retain the Type (for instance, via
DataObject.getType()) and reuse that type for the second and future uses.
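To make this concrete, the sketch below (not taken from the product documentation) caches the Type obtained from the first name-based create and reuses it for later creates; the namespace URI, type name, and class name are illustrative assumptions.

import commonj.sdo.DataObject;
import commonj.sdo.Type;
import commonj.sdo.helper.DataFactory;

public class CustomerBOFactory {
    // Cached Type reference; resolved once, then reused for later creates.
    private static Type customerType;

    public static DataObject createCustomer() {
        if (customerType == null) {
            // First call: create by namespace URI and type name, then retain the Type.
            DataObject first = DataFactory.INSTANCE.create("http://example.com/bo", "Customer");
            customerType = first.getType();
            return first;
        }
        // Subsequent calls: reuse the cached Type and avoid the name-based lookup.
        return DataFactory.INSTANCE.create(customerType);
    }
}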

2.3.6 Considerations when choosing between business processes and business state machines
Business state machines (BSM) provide an attractive way of implementing business flow logic.
For some applications, it is more intuitive to model the business logic as a state machine, and the
resultant artifacts are easy to understand. However, BSM is implemented using the business
process infrastructure, so there will always be a performance impact when choosing BSM over
business processes. If an application can be modeled using either BSM or business processes and
performance is a differentiating factor, choose business processes. There are also more options
available for optimizing business process performance than there are for BSM performance.

2.3.7 Minimize state transitions in BSM


Where possible, minimize external events to drive state transitions in business state machines.
External event driven state transitions are very costly from a performance perspective. In fact, the
total time taken to execute a BSM is proportional to the number of state transitions that occur
during the life span of the state machine. For example, if a state machine transitions through
states A -> B -> B -> B -> C, (4 transitions), it is twice as time consuming as making transitions
through states A -> B -> C (2 transitions). Take this into consideration when designing a BSM.
Also, automatic state transitions are much less costly than event driven state transitions.

2.4 Topology
2.4.1 Deploy appropriate hardware
It is very important to pick a hardware configuration that contains the resources necessary to
achieve high performance in a WebSphere BPM environment. Here are some key considerations
in picking a hardware configuration:

Cores: Ensure that WPS and WESB are installed on a modern server system with
multiple cores. WPS and WESB scale well, both vertically in terms of SMP scaling, and
horizontally, in terms of clustering.

Memory: WPS and WESB benefit from both a robust memory subsystem as well as an
ample amount of physical memory. Ensure that the chosen system has server-class
memory controllers and as large as possible L2 and L3 caches (optimally, use a system
with at least a 4 MB L3 cache). Make sure there is enough physical memory for all the
applications (JVMs) combined that are expected to run concurrently on the system. 2 GB
per WPS/WESB JVM is a rough rule of thumb.

Disk: Ensure that the systems hosting the message and data stores, typically the database
tiers, have fast storage. This means utilizing RAID adapters with writeback caches and
disk arrays with many physical drives.

Network: Ensure that the network is sufficiently fast to not be a system bottleneck. As an
example, a dedicated Gigabit Ethernet network is a good choice.

Virtualization: Take care when using virtualization such as AIX dynamic logical
partitioning or VMWare virtual machines. Ensure sufficient processor, memory, and I/O
resources are allocated to each virtual machine or lpar. Avoid over-committing
resources.

2.4.2 Use a high performing database (such as DB2)


WPS, WESB, and Monitor are packaged with the Derby database, an open source database
designed for ease-of-use and platform neutrality. If performance and reliability are important, use
an industrial strength database (such as IBM's DB2) for any performance measurement or
production installation. Examples of databases that can be moved to DB2 include the BPE
database, Relationship databases, and the WebSphere Platform Messaging (WPM) Messaging
Engine data stores.
The conversion requires the administrator to create new JDBC providers in the admin console
under Resources > JDBC Providers. Once created, a data source can be added to connect to a
database using the new provider.

2.4.3 Deploy local modules in the same server


If planning to deploy modules on the same physical server, better performance will be achieved
by deploying the modules to the same application server JVM, as this allows the server to exploit
this locality. Section 9.17 demonstrates this benefit.

2.4.4 Best Practices for Clustering

We highly recommend to our readers the IBM Redbook on WebSphere BPM 7.0 Production Topologies
(http://www.redbooks.ibm.com/redpieces/abstracts/sg247854.html), which is a
comprehensive guide to selecting appropriate topologies for both scalability and high availability.
It is not the intent of this section to repeat any content from the above. Rather, we will distill
some of the key considerations when trying to scale up a topology for maximum performance.
2.4.4.1 Use the remote messaging and remote support deployment environment
pattern for maximum flexibility in scaling
See link:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.wps.doc/doc/cpln_topologypat.html
This topology (formerly known as the "Golden Topology") prescribes the use of separate clusters
for applications, messaging engines, and support applications like the CEI (Common Event
Infrastructure) server, and the Business Rules Manager. This allows independent control of
resources to support the load on each of these elements of the infrastructure.
Note: As with many system choices, flexibility comes with some cost. For example, synchronous
CBE (Common Base Event) emission between an application and the CEI server in this topology
is a remote call, which is heavier than a local call. The benefit is the independent ability to scale
the application and support cluster. We assume the reader is familiar with these kinds of system
tradeoffs, as they occur in most server middleware.
2.4.4.2 Single Server vs. Clustered Topology Considerations
In general, there are 2 primary reasons to consider when evaluating a move to a clustered
topology from a single server configuration: scalability / load balancing in order to improve
overall performance and throughput, and high availability / failover to prevent loss of service due
to hardware or software failures. Although not mutually exclusive, there are considerations
applicable to each. In this report, the focus is on the performance (throughput) related aspects of
clustering, and not on the high availability aspects.
When considering the tradeoffs between a single server and a clustered configuration, an
interesting study can be found in section 9.10 of this document, "Single Server vs. Clustered
WPS." Significant gains in throughput are measured with the workloads in this study due to
utilizing a clustered topology. It can be expected that most single server workloads that are
driving resources to saturation would benefit to some degree by moving to a clustered topology.

2.4.5 Evaluate service providers and external interfaces


One of the typical usage patterns for WPS is as an integration layer between incoming requests
and backend systems for the business (target applications or service providers). In these
scenarios, the throughput will be limited by the layer with the lowest throughput capacity.
Considering the simple case where there is only one target application, the WPS based integration
solution cannot achieve throughput rates higher than the throughput capacity of the target
application regardless of the efficiency of the WPS based implementation or the size or speed of
the system hosting WPS. Thus, it is critical to understand the throughput capacity of all target
applications and service providers, and apply this information when designing the end-to-end
solution.
There are 2 key aspects of the throughput capacity of a target application or service provider:

response time, both for typical cases and exception cases

number of requests that the target application can process at the same time (concurrency)

If each of these performance aspects of the target applications can be established, then a rough
estimate of the maximum throughput capacity can be calculated. Similarly, if average throughput
is known, then either one of these 2 aspects can be roughly calculated as well. For example, a
target application that can process 10 requests per second with an average response time of 1
second can process approximately 10 requests at the same time (throughput / response time =
concurrency).
The throughput capacity of target applications is critical to projecting the end-to-end throughput
of an entire application. Also, the concurrency of target applications should be considered when
tuning the concurrency levels of the upstream WPS based components. For example, if a target
application can process 10 requests at the same time, the WPS components that invoke this
application should be tuned so that the simultaneous requests from WPS at least match the
concurrency capabilities of the target. Additionally, overloading target applications should be
avoided since such configurations will not result in any increase in overall application throughput.
For example, if 100 requests are sent to a target application that can only process 10 requests at
the same time, no throughput improvement will be realized versus tuning such that the number of
requests made matches the concurrency capabilities of the target.
Finally, for service providers that may take a long time to reply, either as part of main line
processing or in exception cases, do not utilize synchronous invocations that require a response.
This is to avoid tying up the WPS business process, and its resources, until the service provider
replies.

2.5 Large Objects


An issue frequently encountered by field personnel is trying to identify the largest object size that
WPS, WESB, and the corresponding adapters can effectively and efficiently process. There are a
number of factors affecting large object processing in each of these products. We present both a
discussion of the issues involved as well as practical guidelines for the v7 releases of these
products.
The single most important factor affecting large object processing is the JVM. WebSphere BPM
V7 uses the Java 6 JVM, which is substantially different than the 1.4.2 JVM that was used in
WebSphere BPM V6.0.2 and earlier. As such, this section has been rewritten and the
recommendations and best practices differ from WebSphere BPM V6.0.2 and earlier.
In general, objects 5 MB or larger may be considered "large" and require special attention.
Objects 100 MB or larger are "very large" and generally require significant tuning to be processed
successfully.

2.5.1 Factors Affecting Large Object Size Processing


Stated at a high level, the object size capacity for a given installation depends on the size of the
Java heap and the load placed on that heap (that is, the live set) by the current level of incoming
work; the larger the heap, the larger the business object that can be successfully processed.
In order to be able to apply this somewhat general statement, one must first understand that the
object size limit is based on three fundamental implementation facts of Java Virtual Machines:
1. Java Heap Size Limitations
The limit for the size of the Java heap is operating system dependent. Further details on
maximum heap sizes are given in section 4.5.2.1, but it is not unusual to have a heap size
limit of around 1.4 GB for 32-bit JVMs. The heap size limit is much higher on 64-bit JVMs,
and is typically less of a gating factor on modern hardware configurations than the amount of
available physical memory.
2. Size of In-Memory Business Objects
Business Objects (BO), when represented as Java objects, are much larger in size than when
represented in wire format. For example, a BO that consumes 10 MB on an input JMS
message queue may result in allocations of up to 90 MB on the Java heap. The reason is that
there are many allocations of large and small Java objects as the BO flows through the
adapters and WPS or WESB. There are a number of factors that affect the in-memory
expansion of BOs.

The single-byte binary wire representation is generally converted to multi-byte
character representations (e.g. Unicode), resulting in an expansion factor of 2.

The BO may contain many small elements and attributes, each requiring a few
unique Java objects to represent its name, value, and other properties.

Every Java object, even the smallest, has a fixed overhead due to an internal object
header that is 12 bytes long on most 32-bit JVMs, and larger on 64-bit JVMs.

Java objects are padded in order to align on 8-byte or 16-byte address boundaries.

As the BO flows through the system, it may be modified or copied, and multiple
copies may exist at any given time during the end-to-end transaction. What this
means is that the Java heap must be large enough to host all these BO copies in order
for the transaction to complete successfully.

3. Number of Concurrent Objects Being Processed


The largest object that can be successfully processed is inversely proportional to the number
of requests being processed simultaneously. This is due to the fact that each request will have
its own memory usage profile (liveset) as it makes its way through the system. So,
simultaneously processing multiple large objects dramatically increases the amount of
memory required, since the sum total of each request's liveset has to fit into the configured
heap.

2.5.2 Large Object Design Patterns


There are 2 proven design patterns for processing large objects successfully; each is described
below. In cases where neither can be applied, 64-bit mode should be considered. See the next
section for details.
2.5.2.1 Batched Inputs: Send Large Objects as Multiple Small Objects
If a large object needs to be processed then the solutions engineer must find a way to limit the
number of large Java objects that are allocated. The primary technique for doing this involves
decomposing large business objects into smaller objects and submitting them individually.
If the large objects are actually a collection of small objects as assumed above, the solution is to
group the smaller objects into conglomerate objects less than 1 MB in size. This has been done at
a variety of customer sites and has produced good results. If there are temporal dependencies or
an "all-or-nothing" requirement for the individual objects then the solution becomes more
complex. Implementations at customer sites have shown that dealing with this complexity is
worth the effort as demonstrated by both increased performance and stability.

Note that certain adapters like the Flat Files JCA Adapter can be configured to use a
"SplitBySize" mode with a SplitCriteria set to the size of each individual object. In this case a
large object would be split in chunks of the size specified by SplitCriteria to reduce peak memory
usage.
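As an illustration of the grouping technique described above, the following self-contained sketch batches individually serialized records into conglomerates below a size threshold. The method name and the byte-array representation are assumptions made for illustration, not part of any product API.

import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    // Hypothetical helper: groups individually serialized records into batches whose
    // combined size stays under the given limit (e.g. 1 MB), so each batch can be
    // submitted as its own small message instead of one very large object.
    public static List<List<byte[]>> batchBySize(List<byte[]> records, int maxBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<List<byte[]>>();
        List<byte[]> current = new ArrayList<byte[]>();
        int currentBytes = 0;
        for (byte[] record : records) {
            if (!current.isEmpty() && currentBytes + record.length > maxBatchBytes) {
                batches.add(current);               // close the current batch
                current = new ArrayList<byte[]>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);                   // last, partially filled batch
        }
        return batches;
    }
}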
2.5.2.2 Claim Check pattern: when only a small portion of an input message is
used by the workload
When the input BO is too large to be carried around in a system and there are only a few
attributes that are actually needed by that process or mediation, one can exploit a pattern called
the "claim check" pattern. The claim check pattern applied to a BO has the following steps:

Detach the data payload from the message.

Extract the required attributes into a smaller "control BO".

Persist the larger data payload to a datastore and store the "claim check" as a reference in
the control BO.

Process the smaller control BO, which has a smaller memory footprint.

At the point where the solution needs the whole large payload again, check out the large
payload from the datastore using the key.

Delete the large payload from the datastore.

Merge the attributes in the control BO with the large payload, taking the changed
attributes in the control BO into account.

The Claim-Check pattern requires custom code and snippets in the solution. A less developer-intensive variant would be to make use of custom data bindings to generate the control BO. This
approach suffers from the disadvantage of being limited to certain export/import bindings, and
the full payload still must be allocated in the JVM.
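The following minimal sketch illustrates the check-in / check-out flow of the claim check pattern. PayloadStore, ControlBO, and the 10 MB payload are purely illustrative assumptions; a real implementation would persist the payload to a database or other datastore and would typically carry the control data in an SDO business object.

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a real datastore (e.g. a database table).
class PayloadStore {
    private final Map<String, byte[]> store = new HashMap<String, byte[]>();
    void save(String key, byte[] payload) { store.put(key, payload); }
    byte[] load(String key) { return store.get(key); }
    void delete(String key) { store.remove(key); }
}

// Small control object that travels through the process instead of the large payload.
class ControlBO {
    String claimCheck;   // reference ("claim check") to the persisted payload
    String customerId;   // example of an attribute the process actually needs
}

public class ClaimCheckSketch {
    public static void main(String[] args) {
        PayloadStore store = new PayloadStore();
        byte[] largePayload = new byte[10 * 1024 * 1024];   // stands in for a 10 MB input BO

        // Check in: persist the payload, keep only the key and the needed attributes.
        ControlBO control = new ControlBO();
        control.claimCheck = UUID.randomUUID().toString();
        control.customerId = "C-0001";
        store.save(control.claimCheck, largePayload);

        // ... the process or mediation now works only with the small ControlBO ...

        // Check out: retrieve the full payload when it is finally needed, clean up,
        // and merge any attributes that were changed on the control BO.
        byte[] restored = store.load(control.claimCheck);
        store.delete(control.claimCheck);
        System.out.println("Restored payload size: " + restored.length);
    }
}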

2.6 64-bit Considerations


Since WPS 6.1.0, full 64-bit support has been available in WPS. However, applications can
continue to be run in either 32-bit or 64-bit mode. In 32-bit mode, the maximum heap size is
limited by the 4GB address space size, and in most 32-bit operating systems, the practical limit
varies between 1.5-2.5 GB. In contrast, while maximum heap size is essentially limitless in 64-bit
mode, standard Java best practices still apply. The sum of the maximum heap sizes of all the Java
processes running on a system should not exceed the physical memory available on the system.
BPM v7 brought further improvement to its 64-bit implementation. The memory footprint of a
64-bit runtime server is now about the same as the 32-bit version. What this means is that there is
no longer a memory footprint penalty for utilizing 64-bit if the heap size is lower than 27 GB.
This was not the case for BPM v6.1 and v6.2; see section 9.2 for details.
Here are the factors to consider when determining which of these modes to run in:

64-bit mode is an excellent choice for applications whose liveset approaches or exceeds
the 32-bit limits. Such applications either experience OutOfMemoryExceptions or suffer
excessive time in GC. We consider anything > 10% of time in GC as excessive. These
applications will exhibit much better performance when allowed to run with the larger
heaps they need. However, there must always be sufficient physical memory on the
system to back the Java heap size.

64-bit mode is also a good choice for applications that, though well behaved on 32-bit,
could be algorithmically modified to perform much better with larger heaps. An example
would be an application that frequently persists data to a data store to avoid maintaining a
very large in-memory cache, even if such a cache would greatly improve throughput.
Recoding such an application to tradeoff the more space available in 64-bit heaps for less
execution time would yield much better performance.

Moving to 64-bit still causes some degradation in throughput. If a 32-bit application fits
well within a 1.5-2.5GB heap, and the application is not expected to grow significantly,
32-bit BPM servers can still be a better choice than 64-bit.

2.7 WebSphere Business Monitor


2.7.1 Event Processing
A major factor in event processing performance is the tuning of the Monitor Database. Attention
should be paid especially to adequate bufferpool sizes to minimize disk reading activity and the
placement of the database logs which ideally should be on a physically separate disk subsystem
from the database tablespaces.
By default, events are delivered directly from CEI to the monitor database, bypassing an
intermediate queue. We recommend using this default delivery style for better performance, as it
avoids an additional persistence step in the flow. For additional background see the topic
"Bypassing the JMS Queue" in the WebSphere Business Monitor Information Center at:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/topic/com.ibm.btools.help.monitor.in
st.doc/inst/cfg_qb.html

2.7.2 Dashboard
The platform requirements of the Business Space, Dashboard, and Alphablox stack are relatively
modest compared to those of Monitor server and the database server. The most important
consideration for good Dashboard performance is to size and configure the DB server correctly.
Be sure it has enough CPU capacity for anticipated data mining queries, enough RAM for
bufferpools etc., and plenty of disk arms.

2.7.3 Database Server


Both event processing and Dashboard rely on a fast, well-tuned database server for good
performance. The design of Monitor assumes that any customer using it has strong on-site DB
administrator skills. We strongly advise that the database tuning advice and recommendations
beginning in section 4.5.10 be read and followed.

3 Development Best Practices


3.1 Introduction
This section discusses best practices that are relevant to the solution developer. It primarily
addresses modeling, design, and development choices that are made while designing and
implementing a WebSphere BPM solution. The WebSphere Integration Developer (WID) tool is
used to implement the vast majority of these Best Practices.

3.2 SCA Considerations


3.2.1 Cache results of ServiceManager.locateService()
When writing Java code to locate an SCA service, either within a Java component or a Java
snippet, consider caching the result for future use, as service location is a relatively expensive
operation. Note that WID-generated code does not do this, so editing is required to cache the
locateService result.
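A minimal sketch of this practice is shown below; "CreditServicePartner" is a hypothetical reference name, and the surrounding class stands in for a WID-generated Java component implementation.

import com.ibm.websphere.sca.Service;
import com.ibm.websphere.sca.ServiceManager;

public class CreditCheckImpl {
    // Cache the located service; ServiceManager.locateService() is relatively expensive.
    private Service creditService;

    private Service getCreditService() {
        if (creditService == null) {
            // "CreditServicePartner" is a hypothetical reference name on this component.
            creditService = (Service) ServiceManager.INSTANCE.locateService("CreditServicePartner");
        }
        return creditService;
    }
}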

3.2.2 Reduce the number of SCA Modules, when appropriate


WPS components are assembled into modules for deployment. When assembling modules we
recognize that many factors come into play. Performance is one key factor, but maintainability,
versioning requirements and module ownership must be considered as well. In addition, more
modules can allow for better distribution across servers and nodes. Still, it is important to
recognize that modularization also has a cost. When components will be placed together in a
single server instance, it is best to package them within a single module for best performance.

3.2.3 Use synchronous SCA bindings across local modules


For cross-module invocations, where the modules are likely to be deployed locally, i.e. within the
same server JVM, we recommend using the synchronous SCA binding. This binding has been
optimized for module locality and will outperform other bindings. Note that synchronous SCA is
as expensive as other bindings when invocations are made between modules located in different
WPS or WESB servers; this is shown in section 9.20.7.

3.2.4 Utilize multi-threaded SCA clients to achieve concurrency


Synchronous components that are invoked locally, i.e. from a caller in the same server JVM,
execute in the context of the caller's thread. Thus concurrency, if desired, must be provided by
the caller in the form of multiple threads.
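The sketch below shows one way a caller can supply that concurrency with a simple thread pool; the invocation body is a placeholder for the actual synchronous SCA call, and the pool size and request count are arbitrary illustrative values.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentClientSketch {
    public static void main(String[] args) throws InterruptedException {
        // The caller supplies the concurrency: a pool of 10 threads, each making its
        // own synchronous invocation of the (placeholder) target service.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 100; i++) {
            final int requestId = i;
            pool.submit(new Runnable() {
                public void run() {
                    invokeTargetServiceSynchronously(requestId);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }

    private static void invokeTargetServiceSynchronously(int requestId) {
        // Placeholder for the real synchronous SCA invocation (e.g. via a located
        // Service reference); each call executes on the submitting pool thread.
    }
}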

3.2.5 Add Quality of Service Qualifiers at appropriate level


Quality of Service (QoS) qualifiers such as Business Object Instance Validation can be added at
the interface level, or at an operation level within an interface. Since there is additional overhead
associated with QoS qualifiers, do not apply a qualifier at the interface level if it is not needed for
all operations of the interface.

3.3 Business Process Considerations


3.3.1 Modeling best practices for activities in a business process

Use the "Audit Logging" property for Business Processes only if you need to log events in
the BPE database. This property can be set at the activity or process level; if set at the
process level the setting is inherited by all activities.

For long-running processes, disable the "Enable persistence and queries of business-relevant data" flag under the Properties -> Server tab, for both the Process and for each
individual BPEL activity. Enabling this flag causes details of the execution of this
activity to be stored in the BPC database. This increases the load on the database and the
amount of data stored for each process instance. This setting should be used only if this
specific information will need to be retrieved later.

For long-running processes, a setting of "participates" on all activities generally provides
the best throughput performance. See section 3.6.2 for more details.

Human tasks can be specified in business processes (e.g. process administrators), invoke
activities, and receive activities. Specify these tasks only if needed. Also, when multiple
users are involved use group work items (people assignment criterion: Group) instead of
individual work items for group members (people assignment criterion: Group
Members).

3.3.2 Do not use 2-way synchronous invocation of long running business processes
When designing long-running business process components, ensure that callers of a 2-way
(request/response) interface do not use synchronous semantics, as this ties up the caller's
resources (thread, transaction, etc.) until the process completes. Instead, such processes should
either be invoked asynchronously, or via a 1-way synchronous call, where no response is expected.
In addition, calling a 2-way interface of a long-running business process synchronously
introduces difficulties when exceptions occur. Suppose a non-interruptible process calls a long-running process using the 2-way request/response semantics, and the server fails after the long-running process has completed, but before the caller's transaction is committed:

If the caller was started by a persistent message, upon server restart the caller's
transaction is rolled back and then retried. However, the result of the execution of the
long-running process on the server is not rolled back, since it was committed before the
server failure. As a result, the long-running process on the server is executed twice. This
duplication will cause functional problems in the application unless corrected manually.

If the caller was not started by a persistent message, and the response of the long-running
process was not submitted yet, it will end up in the failed event queue.

3.3.3 Minimize number and size of BPEL variables and BOs

Use as few variables as possible and minimize the size and the number of Business
Objects (BOs) used. In long-running processes, each commit saves modified variables to
the database (to save context), and multiple variables or large BOs make this very costly.
Smaller BOs are also more efficient to process when emitting monitor events.

Specify variables as Data Type variables. This improves runtime performance.

Use transformations (maps or assigns) to produce smaller BOs by only mapping fields
necessary for the business logic.

3.4 Human Task Considerations

Use group work items for large groups (people assignment criterion: Group) instead of
individual work items for group members (people assignment criterion: Group
Members).

Where possible, use native properties on the task object rather than custom properties.
For example, use the "priority" field instead of creating a new custom property "priority".

Set the transactional behavior to "commit after" if the task is not part of a page-flow. This
improves the response time of task complete API calls.

3.5 Business Process and Human Tasks Client Considerations


General considerations:

APIs that provide task details and process details, such as htm.getTask(), should not be
called frequently. Use these methods only when required to display the task details of a
single task, for instance.

Do not put too much work into a single client transaction:

o In servlet applications, a global transaction is typically not available. If the servlet calls the HTM and BFM APIs directly, transaction size is typically not a concern.

o In EJB applications, make sure that transactions are not too time consuming: long-running transactions create long-lasting locks in the database, which prevent other applications and clients from continuing processing.

Choose the protocol which best suits your needs:

o In a J2EE environment, use the HTM and BFM EJB APIs. If the client application is running on a WPS server, use the local EJB interface.

o In a Web 2.0 application, use the REST API.

o In an application that runs remote to the process container, the Web services API is an option.

Clients that follow a page-flow pattern should consider the following:

Use the completeAndClaimSuccessor() API if possible. This provides optimal response time.

Applications that assign the next available task to the user can use the claim(String
queryTableName, ...) method on the Human Task Manager EJB interface. It implements a
performance-optimized mechanism to handle claim collisions.

Don't put asynchronous invocations between two steps of a page-flow, because the
response time of asynchronous services increases as the load on the system increases.

Where possible, do not invoke long-running sub-processes between two steps of a page-flow, because long-running sub-processes are invoked using asynchronous messaging.

Clients that present task lists and process lists to the user should consider the following:

Use query tables for task list and process list queries. See the directed study in section
9.6.1 for further information.

Do not loop over the tasks displayed in the task or process list and execute an additional
remote call for each object. This will prevent the application from providing good
response times and good scalability.

Design the application such that during task list and process list retrieval, all information
is retrieved from a single query table. For instance, do not make calls to retrieve the input
message for task list or process list creation.

3.6 Transactionality Considerations


One of the strengths of the WebSphere Process Server platform is the precise control it provides
for specifying transactional behavior. We strongly recommend that when modeling a process or
mediation assembly, the modeler should carefully design their desired transaction boundaries as
dictated by the application's needs. Transaction boundaries are expensive in system resources;
hence the objective of this section is to guide the modeler in avoiding unnecessary transaction
boundaries.
There are some general guiding principles at work here:

The throughput of a particular usage scenario is inversely related to the number of
transaction boundaries traversed in the scenario, so fewer transactions means higher throughput.

In user-driven scenarios, improving response time may require more granular transaction
boundaries, even at the cost of throughput.

Transactions can span across synchronous invocations, but cannot span asynchronous
invocations.

Avoid synchronous invocation of a two-way asynchronous target. The caller transaction's
failure recovery can be problematic.

We will see this in more detail in the following sections.

3.6.1 Exploit SCA transaction qualifiers


In an SCA assembly, the number of transaction boundaries can be reduced by allowing
transactions to propagate across components. For any pair of components where this is desired,
we recommend using the following "golden path":
SuspendTransaction = false, for the calling component's reference
joinTransaction = true, for the called component's interface
Transaction = any|global, for both components' implementation
The above assumes that the first component in such a chain either starts or participates in a global
transaction.

3.6.2 Avoid two-way synchronous invocation of an asynchronous target
If the target component has to be invoked asynchronously and its interface is of two-way
request/response style, the target cannot be safely invoked through synchronous SCA calls. After
the caller sends the request to the target, it then waits for response from the target. Upon
receiving the request, the asynchronous target starts a new transaction, and upon completion of
the request processing returns the response asynchronously to the caller through the response
queue. If system failure occurs after the caller successfully sent the request but before receiving
the response, the caller transaction is rolled back and then retried. As a result, the target will be
invoked a second time.

3.6.3 Exploit transactional attributes for BPEL activities in long-running processes


While SCA qualifiers control component level transactional behavior, there are additional
transactional considerations in long-running business processes which can cause activities to be
run in multiple transactions. The scope of those transactions and the number of transactions can
be changed with the transactional behavior settings on Java Snippet, Human Task, and Invoke
activities. Please see the WPS InfoCenter for a detailed description of these settings at:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/cprocess_transaction.html
There are four choices: "Commit before", "Commit after", "Participates", and "Requires own". Only
the "Participates" setting does not require a new transaction boundary; the other three require the
process flow container to start a new transaction before executing the activity, after executing the
activity, or both before and after.
In general, the "Participates" attribute provides the best throughput and should be used wherever
possible. This is true for both synchronous and asynchronous activities. In the two-way
asynchronous case, it is important to understand that the calling transaction always commits after
sending the request. The "Participates" setting refers to the transaction started by the process
engine for the response: when set, this allows the next activity to continue on the same
transaction.
In special cases, the other transaction settings may be preferable. Please refer to the InfoCenter
link above for details.
Use "Commit before" in parallel activities which start new branches to ensure parallelism. As
noted in the InfoCenter, there are other constraints to be considered.
Use "Commit after" for inline human tasks to increase responsiveness to human users. When this
option is chosen, after a human task is completed the thread/transaction handling the task
completion is also used to resume navigation of the human task activity in the process flow. The
user's task completion action will not complete until the process engine commits the transaction.
By contrast, if the "Participates" setting is used, the commit will get delayed and result in longer
response time for the user. This is a classic response time versus throughput tradeoff.
Note that starting with the 6.2.0 release, Receive and Pick activities in a BPEL flow are now
allowed to define their own transactional behavior property values. If not set, the default value for
an initiating Receive or Pick activity is "Commit after". Consider using "Participates" where
possible, since "Participates" will perform better.

3.7 Invocation Style Considerations


3.7.1 Use Asynchrony judiciously
Components and modules may be wired to each other either synchronously or asynchronously.
The choice of interaction style can have a profound impact on performance and care should be
exercised when making this choice.

3.7.2 Set the Preferred Interaction Style to Sync whenever possible


Many WPS component types like interface maps or business rules invoke their target components
based on the target interface's setting of preferred interaction style. Since synchronous cross-component invocations are better performing, it is recommended to set the Preferred Interaction
Style to Sync whenever possible. Only in specific cases, for example when invoking a long-running business process, or more generally whenever the target component requires
asynchronous invocation, should this be set to Async.
In WID 6.2, when a new component is added to an Assembly Diagram, its Preferred Interaction
Style is set to "synchronous", "asynchronous", or "any" based on the component. In previous
releases of the WID, the default initial setting of Preferred Interaction Style is "any" unless
explicitly changed by the user. If a component's Preferred Interaction Style is set to "any", how
the component is invoked is determined by the caller's context. If the caller is a long running
business process, a Preferred Interaction Style setting of "any" is treated as asynchronous. If the
caller is a non-interruptible business flow, a Preferred Interaction Style setting of "any" is treated
as synchronous.

The invocation logic of processes is explained in more detail in the WPS InfoCenter at:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.webspher
e.bpc.doc/doc/bpc/cprocess_transaction.html
Some additional considerations are listed below:

When setting an interface's Preferred interaction style to Async, it is important to realize
the downstream implications. Any components invoked downstream will inherit the
async interaction style unless they explicitly set Preferred interaction style to Sync.

At the input boundary to a module, exports that represent asynchronous transports like
MQ, JMS, or JCA (with async delivery set) will set the interaction style to Async. This
can cause downstream invocations to be async if the Preferred interaction style is left at
"Any".

For an SCA import, its Preferred interaction style can be used to specify whether the
cross-module call should be Sync or Async.

For other imports that represent asynchronous transports like MQ or JMS, it is not
necessary to set the Preferred interaction style to Async. Doing so will introduce an
unnecessary async hop between the calling module and the invocation of the transport.

3.7.2.1 Avoid unnecessary cross-component asynchronous invocations within a module
It is important to realize that asynchronous invocations are intended to provide a rich set of
qualities of service, including transactions, persistence, and recoverability. Hence, an
asynchronous invocation should be thought of as a full messaging hop to its target. When the
intended target of the invocation is in the same module, a synchronous invocation will yield much
higher performance.
Some qualities of service, such as event sequencing and store-and-forward, can only be associated
with asynchronous SCA calls. Consider the performance impact of asynchronous invocations
when setting these qualities of service.

3.7.3 Avoid Asynchronous Invocation of Synchronous Services in a FanOut / FanIn Block
Do not select asynchronous (deferred response interaction) service invocations for services with
synchronous bindings (e.g. Web Services) unless there is an overriding need for this, and the non-performance implications for this style of invocation are well understood.
Apart from the performance implications of calling a synchronous service asynchronously there
are reliability and transactional aspects to be considered. Make sure you understand these non-performance implications of using asynchronous callouts before considering their use. Generally,
asynchronous callouts should only be used for idempotent query type services. If you need to
guarantee that the service is only called once do not use asynchronous invocation. It is beyond the
scope of this performance report to provide complete guidance on the functional applicability of
using asynchronous callouts in your mediation flow; more information can be found in the WID
help documentation and WPS/WESB InfoCenters.
Assuming that asynchronous callouts are functionally applicable for you, there may be a
performance reason for invoking a service in this style, but it should be understood that
asynchronous processing is inherently more expensive in terms of CPU cycles due to the
additional messaging overhead incurred by calling a service this way.

There are additional operational considerations; for example, asynchronous invocations use the
SIBus messaging infrastructure, which uses a database for persistence. Synchronous invocations
will perform well with basic tuning of the JVM heap size and thread pools, but for asynchronous
invocations SCA artifacts require review and tuning. This will include tuning of the SCA
messaging engine (see section 4.4.7), datasources (section 4.4.6) and the database itself. For the
datasource, the tunings for JMS bindings in this report can be used as guidance as the
considerations are the same.
If multiple synchronous services with large latencies are being called then asynchronous
invocations can reduce the overall response time of the mediation flow at the expense of
increasing the internal response time of each individual service call. This assumes that
asynchronous callouts have been configured along with parallel waiting in the FanOut section of
the flow:

In the case of iteration over an array - configuring the FanOut to "check for asynchronous
responses after all/N messages have been fired"

In the case of extra wires/FlowOrder primitive - by default.

If there are a number of services in a fan-out section of a mediation flow then calling these
synchronously will result in an overall response time equal to the sum of the individual service
response times.
Calling the services asynchronously (with parallel waiting configured) will result in a response
time equal to at least the largest individual service response time in WESB plus the sum of the
time taken by WESB to process the remaining service callout responses residing on the
messaging engine queue.
For a FanOut/FanIn block the processing time for any primitives before or after the service
invocations will need to be added in both cases.
To optimise the overall response time when calling services asynchronously in a FanOut/FanIn
section of a mediation flow you should invoke the services in the order of expected latency if
known (highest latency first).
There is a trade off between parallelism and additional asynchronous processing to consider. The
suitability of asynchronous processing will depend on the size of the messages being processed,
the latency of the target services, the number of services being invoked and any response time
requirements expressed in service level agreements. Running performance evaluations on
mediations flows including fan-outs with high latency services is strongly recommended if
asynchronous invocations are being considered.
The default quality of service on service references is Assured Persistent. A substantial reduction
in asynchronous processing time can be gained by changing this to Best Effort (non-persistent),
which eliminates I/O to the persistence store, but the application MUST tolerate the possibility of
lost request or response messages. This level of reliability for SIBus can discard messages under
load and may require tuning.

3.8 Mediation Flow considerations


3.8.1 Use mediations that benefit from WESB optimizations
Certain types of mediations benefit from internal optimization in WebSphere ESB, and deliver
improved performance. This specialized optimization can be regarded as a kind of 'fastpath'
through the code and is in addition to any general optimization of the WESB mediation code.

The optimization is known as deferred parsing; as the name implies, parsing the message can be
deferred until absolutely required, and in several cases (described below) parsing can be avoided
altogether.
There are three categories of mediation primitives in WESB that benefit to a greater or lesser
degree from these internal optimizations:
Category 1 (greatest benefit)

Route on Message Header (Message Filter Primitive)

XSLT Primitive (Transforming on /body as the root)

EndpointLookup without Xpath user properties.

Event Emitter (CBE Header Only)

Category 2 (medium benefit)

Route on Message Body (Message Filter Primitive)

Category 3 (lowest benefit)

Custom Mediation

Database Lookup

Message Element Setter

BO Mapper

Fan Out

Fan In

Set Message Type

Message Logger

Event Emitter (Except for CBE Header only)

EndpointLookup utilising Xpath user properties

XSLT Primitive (with a non /body root)

There is therefore an ideal pattern of usage in which these mediation primitives can take
advantage of a 'fastpath' through the code. Fully fastpathed flows can contain any of the above
mediation primitives in category 1 above, e.g.:
--> XSLT Primitive(/body) --> Route On Header --> EndPointLookup (non-Xpath) -->
Partially fastpathed flows can contain a route on body filter primitive (category 2) and any
number of category 1 primitives, e.g.
--> XSLT Primitive(/body) --> Route on body -->

In addition to the above optimizations, the ordering of primitives can be important. If the
mediation flow contains an XSLT primitive (with a root of /body - i.e. the category 1 variant) and
category 3 primitives then the XSLT primitive should be placed ahead of the other primitives. So
--> Route On Header --> XSLT Primitive(/body) --> Custom Primitive -->
is preferable to
--> Route On Header --> Custom Primitive --> XSLT Primitive(/body) -->
It should be understood that there are costs associated with any primitive regardless of whether
the flow is optimally configured or not. If an Event Emitter primitive is using event distribution
or a Message Logger primitive is included there are associated infrastructure overheads for such
remote communications. Large messages increase processing requirements proportionally for
primitives (especially those accessing the body) and a custom mediation will contain code which
may not be optimally written. The above guidelines can help in designing for performance but
they cannot guarantee speed.

3.8.2 Usage of XSLTs vs. BO Maps


In a mediation flow which is eligible for deferred parsing (detailed above), the XSL Transform
primitive gives better performance than the Business Object Map primitive. However in a
mediation flow where the message is already being parsed the Business Object Map primitive
gives better performance than the XSL Transform primitive.
Note that if you are transforming from the root (/) then the Business Object Map will always
perform better.
Section 9.16 contains a detailed discussion of this topic, along with performance results to
support the conclusions.

3.8.3 Configure WESB Resources


When creating resources using the WebSphere Integration Developer (WID) Tooling, the
application developer is given the choice to use pre-configured WESB resources or to let the
Tooling generate the Mediation Flow related resources that it requires. Both approaches have
their advantages and disadvantages.
Pre-configured resources support:

existing resources to be used

external creation/tuning scripts to be applied

easier post deployment adjustment

Tooling created resources support:

no further need for creating resources using scripts or the Admin Console

the ability to change the majority of performance tuning options, as they are now exposed
in the Tooling

In our performance tests we use pre-configured resources because, by segregating the
performance tuning from the business logic, the configuration for different scenarios can be
maintained in a single script. It is also easier to adjust these parameters once the applications have
been deployed.
The only case where this pattern has not been followed is for Generic JMS bindings. In these
scenarios, where resources have already been configured by the 3rd-party JMS provider software
(MQ 6.0.2.2 for all instances in this report), the Tooling-created resources are used to locate the
externally defined resources.

3.9 Large Object Best Practices


3.9.1 Avoid lazy cleanup of resources
Lazy cleanup of resources adds to the liveset required when processing large objects. Clean up
any resources that can be released (e.g. by dropping object references when they are no longer
required) as soon as is practical.
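The following minimal sketch (a hypothetical helper class, not code from this report) illustrates
the idea: the reference to a large intermediate object is dropped as soon as its last use is complete,
rather than being held live for the remainder of a long-running method.

public class LargePayloadHandler {
    public void process(byte[] largePayload) {
        String summary = summarize(largePayload); // last use of the large payload
        largePayload = null; // release this method's reference promptly; if callers do
                             // likewise, the GC can reclaim the payload before the
                             // remaining (possibly long-running) work completes
        store(summary);
    }
    private String summarize(byte[] payload) { return "length=" + payload.length; }
    private void store(String summary) { System.out.println(summary); }
}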

3.9.2 Avoid tracing when processing large BOs


Tracing and logging can add significant memory overhead. A typical tracing activity is to dump
the BO payload. Creating a string representation of a large BO can trigger allocation of many
large and small Java objects in the Java heap. Avoid turning on tracing when processing large
BO payloads in production environments.
Also, avoid constructing trace messages outside of a conditional guard statement. For example, the
sample code below will create a large String object even if tracing is disabled.
String boTrace = bo.toString();
While this pattern is always inefficient, it hurts performance even more if the BO size is large.
To avoid unnecessarily creating this String when tracing is disabled, move the String construction
inside an if statement, as shown below:

if (tracing_on) System.out.println(bo.toString());
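A slightly fuller sketch using java.util.logging is shown below. This is illustrative only (the
report's own example uses a simple boolean guard), but the same principle applies: a level check
such as isLoggable() ensures the trace String is never built unless FINE tracing is actually enabled.

import java.util.logging.Level;
import java.util.logging.Logger;

public class BoTracer {
    private static final Logger LOGGER = Logger.getLogger(BoTracer.class.getName());

    public void handle(Object bo) {              // 'bo' stands in for the business object
        if (LOGGER.isLoggable(Level.FINE)) {     // no String is built when FINE is off
            LOGGER.fine("Inbound BO: " + bo.toString());
        }
        // ... normal processing of the business object ...
    }
}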

3.9.3 Avoid buffer-doubling code


Study the memory implications of using Java data structures which expand their capacity based
on input (e.g. StringBuffer, ByteArrayOutputStream). Such data structures usually double their
capacity when they run out of space; this doubling can produce significant memory pressure
when processing large objects. If possible, always assign an initial size to such data structures.
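As a minimal sketch (the buffer size below is hypothetical, not a recommendation from this
report), pre-sizing expandable buffers when the approximate payload size is known up front
avoids repeated capacity doubling and the transient copies it creates:

import java.io.ByteArrayOutputStream;

public class PresizedBuffers {
    public static void main(String[] args) {
        int expectedSize = 8 * 1024 * 1024;                    // e.g. an ~8 MB payload is expected
        ByteArrayOutputStream out = new ByteArrayOutputStream(expectedSize);
        StringBuilder text = new StringBuilder(expectedSize);  // same idea for character data
        // ... fill 'out' and 'text' without triggering repeated internal re-allocations ...
        System.out.println(out.size() + " " + text.length());
    }
}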

3.9.4 Make use of deferred-parsing-friendly mediations for XML docs


Certain mediations can reduce memory pressure because they retain the document in its native
form and avoid inflating it into its full BO representation. These mediations are listed above in
Section 3.8.1. Where possible, use these mediations.


3.10 WICS Migration considerations

Utilize JCA adapters to replace WBIA adapters, where possible. Migrated workloads
making use of custom WBIA adapters or legacy WBIA adapters result in interaction with
the WPS server through JMS, which is slower than the JCA adapters.

Some WBIA technology adapters, like HTTP and Web services, are migrated by the WICS
migration wizard into native WPS SCA bindings, which is a better performing approach.
For WBIA adapters which are not migrated automatically to available SCA bindings,
development effort spent manually migrating to an SCA binding will remove the
dependency on a legacy adapter as well as deliver better performance.

The WICS Migration Wizard in WID 7.0 offers a feature to merge the connector &
collaboration module together. Enable this option, if possible, as it increases performance
by reducing cross-module SCA calls.

WICS Collaborations are migrated into WPS BPEL processes. The resultant BPEL
processes can be further customized and made more efficient as follows:
o Migrated BPEL processes enable support for compensation by default. If the
migrated workload does not make use of compensation, this support can be
disabled to gain performance. The relevant flag can be found in WID under
process name -> Properties -> Details -> Require a compensation sphere context to
be passed in.
o The generated BPEL flows still make use of the ICS API to perform BO and
Collaboration level tasks. Development effort spent cleaning up the migrated
BPEL to replace these APIs will result in better performance and better
maintainability.
o Investigate the possibility of replacing BPEL processes produced by migration
with other artifacts. All WICS collaborations currently get migrated into BPEL
processes. For certain scenarios other WPS artifacts may be better choices (e.g.
Business Rules). Analyze the BPEL processes produced by migration to ensure
the processes are the best fit for your scenario.

Disable Message Logger calls in migration-generated MFC components. The WICS
Migration Wizard in WID 7.0 generates a Mediation Flow Component (MFC) to deal
with the mapping details of a connector: it contains the code handling
synchronous/asynchronous calls to maps that transform Application Specific BOs to/from
Generic BOs and vice versa. The generated MFC contains embedded MessageLogger calls
which log the message to a database. Disable these calls (select the MessageLogger instance,
choose the details panel, and uncheck the Enabled checkbox) if they are not required in your
business scenario. This reduces writes to the database and thus improves performance.

Reduce memory pressure by splitting the shared library generated by the migration
wizard. The migration wizard creates a single shared library and puts all migrated
Business Objects, maps and relationships in it. This library is then shared by copy by all
the migrated modules. This can cause memory bloat for cases where the shared library is
very large and a large number of modules are present. The solution is to manually refactor the shared library into multiple libraries based on functionality or usage and
modify modules to only reference the shared libraries that are needed.

If original WICS maps contain many custom map steps, then development effort spent in
rewriting such map steps will result in better performance. The WICS Migration Wizard


in WID 7.0 generates maps that make use of ICS APIs, which form a translation layer above
WPS technologies. Removing this layer by making direct use of WPS APIs avoids the
cost of translation and hence produces better performance.

3.11 WID Considerations


This section describes recommendations intended to improve the performance of activities
commonly encountered by application developers during the development of an enterprise
application, primarily Import, Build & Publish of an application workspace.

3.11.1 Leverage Hardware Advantages

Importing and building an enterprise application is, in itself, a resource-intensive activity. Recent
improvements in desktop hardware architecture have greatly improved the responsiveness of
Import and Build activities, as demonstrated in Section 9.20.4. In particular, Intel Core2 Duo
cores perform much better than the older PentiumD architecture, even when the Core2 Duo runs
at a slower clock rate. Also, for I/O intensive activities (like Import) a faster disk drive reduces
total response time, as demonstrated in Section 9.19.2.

3.11.2 Make use of WAS shared libraries in order to reduce memory consumption
For applications containing many projects utilizing a WPS shared library, server memory
consumption is reduced by defining the library as a WAS shared library as described in the
technote found at
http://www-01.ibm.com/support/docview.wss?uid=swg21298478.
Section 9.20.3 demonstrates some results obtained using this approach.

3.12 Fabric Considerations


3.12.1 Only specify pertinent context properties in context specifications
The effectiveness of Fabric's runtime caching of metadata is governed by the number of context
properties explicitly listed in a context specification. Thus care should be taken to limit cached
content by using only the context properties that are pertinent to a particular dynamic decision.
For example, if a credit score context property is not used in a particular dynamic decision, then
don't list that context property in the associated context specification.
Note that this applies to strict context specifications, which is the preferred mechanism.

3.12.2 Bound the range of values for context keys

The possible values of a context key should be bound to either a finite set, or a minimum and
maximum value. The Fabric runtime caches metadata based on the contexts defined as required
or optional in the context specification. Thus having a context key which can take an unbounded
integer as its value will result in too many potential cache entries, which will make the cache less
efficient. Consider using classes of possible values rather than absolute numbers. For example,


for credit scores, group the possible values under Poor, Average, Good, and Excellent, rather than
using the actual values. The actual values should then be placed in one of these categories and
the category should be passed as the context instead of the actual values.
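The following minimal sketch shows one way of collapsing an unbounded numeric credit score
into a small, bounded set of context values. The class name and the score thresholds are
hypothetical and purely illustrative; they are not taken from this report.

public final class CreditScoreCategory {
    private CreditScoreCategory() {}

    // Map an unbounded score onto a bounded set of categories (illustrative thresholds).
    public static String categorize(int score) {
        if (score < 580) return "Poor";
        if (score < 670) return "Average";
        if (score < 740) return "Good";
        return "Excellent";
    }

    public static void main(String[] args) {
        // Pass the category, not the raw score, as the context key value.
        System.out.println(categorize(715)); // prints "Good" with these sample thresholds
    }
}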


4 Performance Tuning and Configuration


4.1 Introduction
In order to optimize performance it is usually necessary to configure the system differently than
the default settings. This chapter lists several areas to consider during system tuning. This
includes tuning the WebSphere BPM products, and also other products in the system (e.g. DB2).
The documentation for each of these products contains a wealth of information regarding
performance, capacity planning and configuration. This documentation would likely offer the
best guidance for performance considerations in a variety of operational environments.
Assuming that all these issues have been addressed from the perspective of the actual product,
additional levels of performance implications are introduced at the interface between these
products and the products covered in this report.
A number of configuration parameters are available to the system administrator. While this
chapter identifies several specific parameters observed to affect performance, it does not address
all available parameters. For a complete list of configuration parameters and possible settings
please see the relevant product documentation.
The next section describes a methodology to use when tuning a deployed system. It is followed
by a basic tuning checklist that enumerates the major components and their associated tuning
concepts. The subsections that follow address tuning in more detail, first describing several
tuning parameters and their suggested settings (where appropriate), and finally providing advanced
tuning guidelines for key areas of the system. While there is no
guarantee that following the guidance in this chapter will immediately provide acceptable
performance, degraded performance can be expected if these parameters are
set incorrectly.
Finally, the last section of this document contains References to related documentation that may
prove valuable when tuning a particular configuration.


4.2 Performance Tuning Methodology


We recommend a system-wide approach to performance tuning of a WebSphere BPM
environment. Please note that the art of system performance tuning, which requires training and
experience, is not going to be exhaustively described here. Rather, we will highlight some key
aspects of tuning that are particularly important.
It is important to note that tuning encompasses every element of the deployment topology:

Physical hardware topology choices

Operating System parameters tuning

WPS, WAS, and ME tuning

The methodology for tuning can be stated very simply as an iterative loop:

Pick a set of reasonable initial parameter settings.

Run the system.

Monitor the system to obtain metrics that indicate whether performance is being limited.

Use monitoring data to guide further tuning changes.

Repeat until done.

We will now examine each in turn:

Pick a set of reasonable initial parameter settings.

Use the tuning checklist in the next section for a systematic way to set parameters.

For specific initial values, consult Appendix A for settings that were used for the
various workloads that were run. These values can be used as starting points.

Monitor the system. We recommend monitoring the system(s) to determine system


health, as well as to determine the need for further tuning. The following should be
monitored:

For each physical machine in the topology, including front-end and back-end servers
like web servers and DB servers:
o Monitor Core utilization, memory utilization, disk utilization, and network
utilization using relevant OS tools like vmstat, iostat, netstat, or equivalent.

For each JVM process started on a physical machine (i.e. WPS server, ME server,
etc.):
o Use tools like ps or equivalent to get Core and memory usage per process.
o Collect verbosegc statistics.

For each WPS or ME JVM, use TPV (Tivoli Performance Viewer) to monitor the
following:
o For each data source, the data connection pool utilization.
o For each thread pool (Web Container, default, work managers), the thread pool
utilization.

Use monitoring data to guide further tuning changes


This is a vast topic which requires skill and experience. In general, this phase of tuning
requires the analyst to look at the collected monitoring data, detect performance bottlenecks,
and do further tuning. The key characteristic about this phase of tuning is that it is driven by
the monitoring data collected in the previous phase.
Examples of performance bottlenecks include, but are not limited to:

Excessive utilization of physical resources like processor cores, disk, memory etc. These
can be resolved either by adding more physical resources, or rebalancing the load more
evenly across the available resources.

Excessive utilization of virtual resources. Examples include heap memory, connection


pools, thread pools, etc. For these, tuning parameters should be used to remove the
bottlenecks.


4.3 Tuning Checklist


This checklist serves as a guide, or "to do" list, when tuning a WebSphere BPM solution. Each of
these topics is covered in more detail in the remainder of this chapter.
Common

Disable Tracing and Monitoring when possible

Move databases from the default Derby to a high performance DBMS such as DB2

If security is required, use Application security, not Java2 security.

Use appropriate hardware configuration for performance measurement, e.g.


ThinkPads and desktops are not appropriate for realistic performance evaluations.

If hardware virtualization is used, ensure adequate processor, memory, and I/O


resources are allocated to each virtual machine. Avoid over-committing resources.

Do not run production server in development mode or with development profile

Do not use the Unit Test Environment (UTE) for performance measurement

Tune external service providers and external interfaces to ensure they are not the
system bottleneck.

Configure MDB Activation Specs

Configure for clustering (where applicable)

Configure Thread Pool sizes

Configure Data Sources: Connection Pool size, Prepared Statement Cache size.
Consider using non-XA data sources for CEI data when that data is non-critical.

Business Process Choreographer

Use work-manager based navigation for long running processes


o

If work-manager based navigation is used, also optimize message pool size and
intertransaction cache size

Use Query Tables to optimize query response time

Optimize Business Flow Manager resources: database connection (BPEDB), activation


specification (BPEInternalActivationSpec), and JMS connection (BPECF and BPECFC);

Optimize the database configuration for the Business Process Choreographer database
(BPEDB)

Optimize indexes for SQL statements that result from task and process list queries using
database tools like the DB2 design advisor

Turn off state observers that are not needed, e.g. turn off audit logging

Messaging and Message Bindings

Optimize Activation Specification (JMS, MQJMS, MQ)


Optimize Queue Connection Factory (JMS, MQJMS, MQ)

Configure Connection Pool Size (JMS, MQJMS, MQ)

Configure SIBus Data Buffer Sizes

Database

Place Database Tablespaces and Logs on a Fast Disk Subsystem

Place Logs on Separate Device from Tablespace Containers

Maintain Current Indexes on Tables

Update Database Statistics

Set Log File Sizes Correctly

Optimize Buffer Pool Size (DB2) or Buffer Cache Size (Oracle)

Java

Set the Heap / Nursery Sizes to manage memory efficiently

Choose the appropriate garbage collection policy (generally, -Xgcpolicy:gencon)

Monitor

Configure CEI

Set message consumption batch size


4.4 Tuning parameters


4.4.1 Tracing and Logging flags
Tracing and logging are often necessary when setting up a system or debugging issues. However,
these capabilities produce performance overhead that is often significant; minimize their use
when evaluating performance or in production environments.
This section lists tracing parameters used in the products covered in this report. Some flags or
checkboxes are common to all or a subset of the products, while others are specific to a particular
product. Unless stated otherwise, all of these parameters can be set via the Admin Console.
To enable or disable tracing, go to Troubleshooting > Logs and Trace > server name > Change
Log Detail Levels and set both the Configuration and Runtime to *=all=disabled.
To change the PMI level go to Monitoring and Tuning -> Performance Monitoring Infrastructure
-> server name and select none.
In addition, Cross-Component Tracing (XCT) is very useful for problem determination, enabling
correlation of SCA component information with log entries. However, XCT should not be used
in production or while obtaining performance data. There are two levels of XCT settings: enable
or enable with data snapshot. Both incur significant performance overhead. Enable with data
snapshot is particularly costly because of the additional I/O involved in saving snapshots in files.
To enable or disable Cross-Component Trace, go to Troubleshooting > Cross-Component Trace.
Select the XCT setting from three options, disable, enable, or enable with data snapshot, in
Configuration and/or Runtime. Changes to Runtime take effect immediately, while changes to
Configuration require a server restart to take effect.

4.4.2 Java tuning parameters


In this section we list a few frequently used Java Virtual Machine (JVM) tuning parameters. For
a complete list, consult the JVM tuning guide offered by the JVM supplier.
The JVM admin panel can be accessed from Servers > Application Servers > your server name >
Server Infrastructure > Java and Process Management > Process Definition > Additional
Properties > Java Virtual Machine.
4.4.2.1 Java GC policy
The default garbage collection algorithm on platforms with an IBM JVM is a generational
concurrent collector (specified via -Xgcpolicy:gencon under Generic JVM arguments on the Java
Virtual Machine admin panel). Our results show that this garbage collection policy usually
delivers better performance with a tuned nursery size as discussed in the next section.
4.4.2.2 Java Heap sizes
To change the default Java heap sizes, set the Initial Heap Size and Maximum Heap Size
explicitly on the Java Virtual Machine admin panel.
If Generational Concurrent Garbage Collector is used, the Java heap is divided into a new area
(nursery) where new objects are allocated and an old area (tenured space) where longer lived


objects reside. The total heap size is the sum of the new area and the tenured space. The new area
size can be set independently from the total heap size. Typically the new area size should be set
between 1/4 and 1/2 of the total heap size. The relevant parameters are:

-Xmns<size> : initial new area size

-Xmnx<size> : maximum new area size

-Xmn<size> : fixed new area size
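For illustration only (the sizes below are hypothetical and should be derived from verbosegc
analysis of your own workload): with the Initial and Maximum Heap Size both set to 1024 MB
on the admin panel, the nursery could be bounded to roughly one quarter to one half of the heap
by adding the following Generic JVM arguments:

-Xmns256m -Xmnx512m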

4.4.3 MDB ActivationSpec


There are a few shortcuts to access the MDB ActivationSpec tuning parameters.

Resources > Resource Adapters > J2C activation specifications > ActivationSpec name

Resources > JMS > Activation specifications > ActivationSpec name

Resources > Resource Adapters > Resource adapters > resource adapter name >
Additional properties > J2C activation specifications > ActivationSpec name

Two custom properties (shown below) in the MDB ActivationSpec have considerable
performance implications. These are discussed further in Section 4.5.3.2.

maxConcurrency

maxBatchSize

4.4.4 Thread Pool Sizes


WebSphere uses thread pools to manage concurrent tasks. The Maximum Size property of a
thread pool can be set under
Servers > Application servers > server name > Additional Properties > Thread Pools > thread
pool name.
The following thread pools typically need to be tuned:

Default

ORB.thread.pool

WebContainer

In addition, thread pools used by Work Managers are configured separately via:
Resources > Asynchronous beans > Work managers > work manager name > Thread pool
properties
The following Work Managers typically need to be tuned:

DefaultWorkManager

BPENavigationWorkManager

4.4.5 JMS Connection Pool Sizes


There are a few ways of accessing the JMS connection factories and JMS queue connection factories
from the WebSphere admin console.

Resources > Resource Adapters > J2C connection factories > factory name

Resources > JMS > Connection factories > factory name

Resources > JMS > Queue connection factories > factory name

Resources > Resource Adapters > Resource adapters > resource adapter name (e.g. SIB
JMS Resource Adapter) > Additional properties > J2C connection factories > factory
name

From the connection factory admin panel, open Additional Properties > Connection pool
properties. Set the Maximum connections property to the max size of the connection pool.

4.4.6 JDBC DataSource Parameters


DataSources can be accessed from either of these paths:

Resources > JDBC > Data sources > datasource name

Resources > JDBC Providers > JDBC provider name > Additional Properties > Data
sources > datasource name

4.4.6.1 Connection Pool Size


The maximum size of the DataSource connection pool is limited by the value of the Maximum
connections property, which can be configured from the DataSource panel's Additional
Properties -> Connection pool properties.
The following DataSources typically need to be tuned:

BPEDataSource for BPE DB

SCA Application Bus ME DataSource

SCA System Bus ME DataSource

CEI Bus ME DataSource

4.4.6.2 Prepared Statement Cache Size


The DataSource prepared statement cache size can be configured from the DataSource's
Additional properties > WebSphere Application Server data source properties.
For WPS, the BPEDB datasource should typically be tuned to a higher value; 300 is suggested as
an initial value.

4.4.7 Messaging Engine Properties


Two message engine custom properties may impact the messaging engine performance:

sib.msgstore.discardableDataBufferSize
o In-memory buffer for best effort nonpersistent messages.
o Default is 320K.
o Once full, messages will be discarded to allow newer messages to be written to
the buffer.

sib.msgstore.cachedDataBufferSize
o In-memory cache for messages other than best effort nonpersistent messages.
o Default is 320K.

The properties can be accessed under Service Integration > Buses > bus name > Messaging
Engines > messaging engine name > Additional properties > Custom properties.
Full details of these are given in the Info Center at the following location:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc
.doc/concepts/cjk_learning.html

4.4.8 Run production servers in production mode


WebSphere application servers can be run in development mode which may reduce startup time
for the server by using JVM settings to disable bytecode verification and reduce JIT compilation
time. This setting should not be used on production servers, however, since it is not designed to
produce optimal runtime performance. Make sure the checkbox "Run in development mode" on
the Admin Console panel Servers > Application Servers > your server name > Configuration is
unchecked.
Server profiles may also be created with production or development templates. Use production
profile templates for production servers.

4.5 Advanced Tuning


4.5.1 Tracing and Monitoring considerations
The ability to configure tracing and monitoring at different levels for a variety of system
components has proven to be extremely valuable during periods of system analysis or debugging.
The WebSphere BPM product set provides rich monitoring capabilities, both in terms of business
monitoring via the Common Event Interface (CEI) and audit logging, and system performance
monitoring via the Performance Monitoring Infrastructure (PMI) and the Application Response
Measurement (ARM) infrastructure. While these capabilities provide insight into the performance
of the running solution, these features can degrade overall system performance and throughput.
Therefore, it is recommended that tracing and monitoring be used judiciously and, when
possible, turned off entirely to ensure optimal performance.
Most tracing and monitoring is controlled via the WAS Admin console. Please validate that the
appropriate level of tracing/monitoring is set for PMI Monitoring, Logging, and Tracing via the
Admin Console.
Further, use the Admin Console to validate that the "Audit logging" and "Common Event
Infrastructure logging" check boxes are disabled in the Business Flow Manager and the Human
Task Manager, unless these capabilities are required for business reasons.
The WebSphere Integration Developer (WID) is also used to control event monitoring. Please
check the Event Monitor tab for your Components and Business Processes to ensure that event
monitoring is applied judiciously.

4.5.2 Tuning for Large Objects


4.5.2.1 Heap Limitations: Increase the Java Heap to its maximum


One of the key factors affecting large object processing is the maximum size of the Java heap. In
this section we discuss how to set the heap size as big as possible on two commonly used
platforms. For more comprehensive heap setting techniques, consult Section 4.5.13.
Windows:
Due to address space limitations in the Windows 32 bit operating system, the largest heap that
can be obtained is around 1.4 GB to 1.6 GB for 32-bit JVMs. When using a 64-bit Windows
JVM, however, the heap size is only limited by the available physical memory.
AIX:
Using the normal Java heap settings, the Java 5 and Java 6 JVMs support heaps of 2 GB to 2.4 GB
on 32-bit systems. Note that since the 4 GB address space allowed by the 32-bit system is shared
with other resources, the actual limit of the heap size depends on memory usage by resources
such as thread stacks, JIT compiled code, loaded classes, shared libraries, buffers used by OS
system services, etc. An extremely large heap squeezes the address space reserved for other resources
and may cause runtime failures. On 64-bit systems, the available address space is practically
unlimited, so the heap size is usually limited only by available memory.
4.5.2.2 Reduce or eliminate other processing within WPS, WESB and Adapters
while processing a large object.
One way to allow for larger object sizes is to limit the concurrent processing within the JVM.
One should not expect to be able to process a steady stream of the largest objects possible
concurrently with other WPS, WESB, and WebSphere Adapters activities. The operational
assumption that needs to be made when considering large objects is that not all objects will be
large or very large and that large objects will not arrive very often, perhaps once or twice per
day. If more than one very large object is being processed concurrently the likelihood of
failure increases dramatically.
The size and number of the normally arriving smaller objects will affect the amount of Java
heap memory consumption in the system. Generally speaking, the heavier the load on a system
when a large object is being processed, the more likely it is that memory problems will be
encountered.
For adapters, the amount of concurrent processing can be influenced by setting the pollPeriod and
pollQuantity parameters. To allow for larger object sizes, set a relatively high value for
pollPeriod (e.g. 10 seconds) and low value for pollQuantity (e.g. 1) to minimize the amount of
concurrent processing that occurs. Note that these settings are not optimal for peak throughput,
so if a given adapter instance must support both high throughput for smaller objects interspersed
with occasional large objects, then trade-offs must be made.

4.5.3 Tuning for Maximum Concurrency


For most high volume deployments on server-class hardware, there will be many operations
which take place simultaneously. Tuning for maximum concurrency ensures that the server will
accept enough load to saturate its Core(s). One sign of an inadequately tuned configuration is
when additional load does not result in additional Core utilization, while the Cores are not fully
utilized. To optimize these operations for maximum concurrency, the general guideline is to
follow the execution flow and remove bottlenecks one at a time.


Note that higher concurrent processing means higher resource requirements (memory and number
of threads) on the server. It needs to be balanced with other tuning objectives, such as the
handling of large objects, handling large numbers of users, and providing good response time.
4.5.3.1 Tune edge components for concurrency
The first step is to ensure that Business Objects are handled concurrently at the edge components
of WPS or WESB. If the input BOs come from an adapter, ensure the adapter is tuned for
concurrent delivery of input messages. See Section 4.5.8 for more details on tuning adapters.
If the input BOs come from the WebServices export binding or direct invocation from a JSP or
Servlet, make sure the WebContainer thread pool is correctly sized. To allow for 100 in-flight
requests handled concurrently, the maximum size of the WebContainer thread pool needs to be
set to 100 or larger.
If the input BOs come from the messaging, the ActivationSpec (MDB bindings) and Listener
ports (MQ or MQJMS bindings) need to be tuned to handle sufficient concurrency.
4.5.3.2 Tune MDB ActivationSpec properties
For each JMS export component, there is an MDB and its corresponding ActivationSpec (JNDI
name: module name/export component name_AS). The default value for maxConcurrency of the
JMS export MDB is 10, meaning up to 10 BOs from the JMS queue can be delivered to the MDB
threads concurrently. Change it to 100 if a concurrency of 100 is desired.
Note that the Tivoli Performance Viewer (TPV) can be used to monitor the maxConcurrency
parameter. For each message being processed by an MDB there will be a message on the queue
marked as being locked inside a transaction (which will be removed once the onMessage
completes), these messages are classed as "unavailable". There is a PMI metric that gives you the
number of unavailable messages on each queue point (resource_name > SIB Service > SIB
Messaging Engines > bus_name > Destinations > Queues), called "UnavailableMessageCount".
If any queue has at least maxConcurrency unavailable messages it would imply that the number
of messages on the queue is currently running higher than the MDB's concurrency maximum. If
this occurs, increase the maxConcurrency setting for that MDB.
The maximum batch size in the activation spec also has an impact on performance. The default
value is 1. The maximum batch size value determines how many messages are taken from the
messaging layer and delivered to the application layer in a single step (note that this does NOT
mean that this work is done within a single transaction, and therefore this setting does not
influence transactional scope). Increase this value, for example to 8, for activation specs
associated with SCA modules and long-running business processes to improve performance and
scalability, especially for large multi-core systems.
4.5.3.3 Configure Thread pool sizes
The sizes of thread pools have a direct impact on a server's ability to run applications
concurrently. For maximum concurrency, the thread pool sizes need to be set to optimal values.
Increasing the maxConcurrency or Maximum sessions parameters only enables the concurrent
delivery of BOs from the JMS or MQ queues. In order for the WPS or WESB server to process
multiple requests concurrently, it is also necessary to increase the corresponding thread pool sizes
to allow higher concurrent execution of these Message Driven Beans (MDB) threads.
MDB work is dispatched to threads allocated from the Default thread pool. Note that all MDBs
in the application server share this thread pool, unless a different thread pool is specified. This


means that the Default thread pool size needs to be larger, probably significantly larger, than the
maxConcurrency of any individual MDB.
Threads in the Web Container thread pool are used for handling incoming HTTP and Web
Services requests. Again, this thread pool is shared by all applications deployed on the server. As
discussed earlier, it needs to be tuned, likely to a higher value than the default.
ORB thread pool threads are employed for running ORB requests, e.g. remote EJB calls. The
thread pool size needs to be large enough to handle requests coming in through the EJB interface,
such as certain human task manager APIs.
4.5.3.4 Configure dedicated thread pools for MDBs
The Default thread pool is shared by many WebSphere Application Server tasks. It is sometimes
desirable to separate the execution of JMS MDBs to a dedicated thread pool. Follow the steps
below to change the thread pool used for JMS MDB threads.
1) Create a new thread pool, say MDBThreadPool, on the server by following
Servers > Server Types > WebSphere application servers > server > Thread pools
and then click on New
2) Open the Service Integration Bus (SIB) JMS Resource Adapter admin panel with
server scope from Resources > Resource Adapters > Resource adapters. If the
adapter is not shown, go to Preferences, and set the Show built-in resources
checkbox.
3) Change Thread pool alias from Default to MDBThreadPool.
4) Repeat steps 2 and 3 for the SIB JMS Resource Adapters with node and cell scope.
5) Restart the server for the change to be effective.
SCA Module MDBs for asynchronous SCA calls use a separate resource adapter, the Platform
Messaging Component SPI Resource Adapter. Follow the same step as above to change the
thread pool to a different one, if so desired.
Note that even with a dedicated thread pool, all MDBs associated with the resource adapter still
share the same thread pool. However, they do not have to compete with other WebSphere
Application Server tasks that also use the Default thread pool.
4.5.3.5 Tune intermediate components for concurrency
If the input BO is handled by a single thread from end to end, the tuning for the edge components
is normally adequate. In many situations, however, there are multiple thread switches during the
end to end execution path. It is important to tune the system to ensure adequate concurrency for
each asynchronous segment of the execution path.
Asynchronous invocations of an SCA component utilize an MDB to listen for incoming events
that arrive in the associated input queue. Each SCA module defines an MDB and its
corresponding activation spec (JNDI name: sca/module name/ActivationSpec). Note that the SCA
module MDB is shared by all asynchronous SCA components within the module, including SCA
export components. Take this into account when configuring the ActivationSpec's
maxConcurrency property value. SCA module MDBs use the same Default thread pool as those
for JMS exports.
The asynchrony in a long running business process occurs at transaction boundaries (see Section
3.6 for more details on settings that affect transaction boundaries). BPE defines an internal MDB


and its ActivationSpec: BPEInternalActivationSpec. The maxConcurrency parameter needs to be
tuned following the same guideline as for SCA module and JMS export MDBs (described above).
The only catch is that there is one BPEInternalActivationSpec in the WPS server.
4.5.3.6 Configure JMS and JMS queue connection factories
Multiple concurrently running threads may bottleneck on resources such as JMS and database
connection pools if such resources are not tuned properly. The Maximum Connections pool size
specifies the maximum number of physical connections that can be created in this pool. These are
the physical connections to the backend resource, for example a DB2 database. Once the
connection pool limit is reached, no new physical connections can be created and the requester
waits until a physical connection that is currently in use is returned to the pool, or a
ConnectionWaitTimeout exception is issued.
For example, if the Maximum Connections value is set to 5, and there are five physical
connections in use, the pool manager waits for the amount of time specified in Connection
Timeout for a physical connection to become free. The threads waiting for connections to
underlying resource are blocked until the connections are freed up and allocated to them by the
pool manager. If no connection is freed in the specified interval, a ConnectionWaitTimeout
exception is issued.
If Maximum Connections is set to 0, the connection pool is allowed to grow infinitely. This also
has the side effect of causing the Connection Timeout value to be ignored.
The general guideline for tuning connection factories is that their max connection pool size needs
to match the number of concurrent threads multiplied by the number of simultaneous connections
per thread.
For each JMS, MQ, or MQJMS Import, there is a Connection Factory created during application
deployment. The maximum connections property of the JMS Connection Factory's connection
pool should be large enough to provide connections for all threads concurrently executing in the
import component. For example, if 100 threads are expected to run in a given module, the
maximum connections property should be set to 100. The default is 10.
From the connection factory admin panel, open Additional Properties > Connection pool
properties. Set the Maximum connections property to the max size of the connection pool.
4.5.3.7 Configure DataSource options
The maximum connections property of DataSources should be large enough to allow concurrent
access to the databases from all threads. Typically there are a number of DataSources configured
in WPS/WESB servers, e.g. BPEDB datasource, WPSDB datasource, and Message Engine DB
datasources. Set each DataSource's maximum connections property to match the maximum
concurrency of other system resources as discussed previously in this chapter.
4.5.3.8 Set DataSource prepared statement cache size
The BPC container uses prepared statements extensively. The statement cache sizes should be
large enough to avoid repeatedly preparing statements for accessing the databases.
The prepared statement cache for the BPEDB datasource should be at least 300.

4.5.4 Messaging Tuning


4.5.4.1 For Message Engines, choose datastore or filestore


Message Engine persistence is usually backed by a database. Starting with the 6.2.0 release, a
standalone configuration of WPS or WESB can have the persistence storage of BPE and SCA
buses backed by the file system (filestore). The choice of filestore has to be made at profile
creation time. Use the Profile Management Tool to create a new Standalone enterprise service
bus profile or Standalone process server profile. Choose Profile Creation Options ->
Advanced profile creation -> Database Configuration, select checkbox Use a file store for
Messaging Engine (MEs). When this profile is used, filestores will be used for BPE and SCA
service integration buses.
4.5.4.2 Set Data Buffer Sizes (Discardable or Cached)
The DiscardableDataBufferSize is the size in bytes of the data buffer used when processing best
effort non persistent messages. The purpose of the discardable data buffer is to hold message data
in memory, since this data is never written to the data store for this Quality of Service. Messages
which are too large to fit into this buffer will be discarded.
The CachedDataBufferSize is the size in bytes of the data buffer used when processing all
messages other than best effort non persistent messages. The purpose of the cached data buffer is
to optimize performance by caching in memory data that might otherwise need to be read from
the data store.
The DiscardableDataBufferSize and CachedDataBufferSize can be set under Service Integration ->
Buses -> bus name -> Messaging Engines -> messaging engine name -> Additional properties ->
Custom properties.
4.5.4.3 Move Message Engine datastores to a High Performance DBMS
For better performance, the Message Engine datastores should use production quality databases,
such as DB2, rather than the default Derby. The choice can be made at profile creation time
using the advanced profile creation option. If the profile has already been created with Derby as
the ME datastore, the following method can be used to change the datastore to an alternative
database.
After the Profile Creation Wizard has finished and Business Process Choreographer is
configured, the system should contain four buses with one message engine each. The example
below shows the Buses in WPS installed on machine box01; the node and cell names are the
default
Bus                                    Messaging Engine
SCA.SYSTEM.box01Node01Cell.Bus         box01-server1.SCA.SYSTEM.box01Node01Cell.Bus
SCA.APPLICATION.box01Node01Cell.Bus    box01-server1.SCA.APPLICATION.box01Node01Cell.Bus
CommonEventInfrastructure_Bus          box01-server1.CommonEventInfrastructure_Bus
BPC.box01Node01Cell.Bus                box01-server1.BPC.box01Node01Cell.Bus


Each of these message engines is by default configured to use a datastore in Derby. Each
datastore is located in its own database. For DB2, this is not optimal from an administrative point
of view. There are already many databases in the system and adding four more databases
increases the maintenance and tuning effort substantially. The solution proposed here uses a
single DB2 database for all four datastores. The individual datastores/tables are completely
separate and each message engine acquires an exclusive lock on its set of tables during startup.
Each message engine uses a unique schema name to identify its set of tables.

4.5.4.3.1 Setting up the data stores for the messaging engines


For information on setting up a data store see Configuring a messaging engine to use a data
store in the WAS 7.0 Info Center at the following link:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc
.nd.multiplatform.doc/tasks/tjm0005_.html
4.5.4.3.2 Create the DB2 database and load the datastore schemas

Instead of having a DB2 database per messaging engine we put all messaging engines into the
same database using different schemas to separate them.
Schema    Messaging Engine
SCASYS    box01-server1.SCA.SYSTEM.box01Node01Cell.Bus
SCAAPP    box01-server1.SCA.APPLICATION.box01Node01Cell.Bus
CEIMSG    box01-server1.CommonEventInfrastructure_Bus
BPCMSG    box01-server1.BPC.box01Node01Cell.Bus

Create one schema definition for each message engine with the following command on Windows.
In the example below, <WAS Install> represents the WPS Installation directory, <user>
represents the user name, and <path> represents the fully qualified path to the referenced file.
<WAS Install>\bin\sibDDLGenerator.bat -system db2 -version 8.1 -platform windows -statementend ; -schema BPCMSG -user <user> > createSIBSchema_BPCMSG.ddl
Repeat for each schema/messaging engine.
To be able to distribute the database across several disks, edit the created schema definitions and
put each table in a tablespace named after the schema used, i.e. SCAAPP becomes SCAAPP_TB,
CEIMSG becomes CEIMSG_TB, and so on. The schema definition should look
like this after editing:
CREATE SCHEMA CEIMSG;
CREATE TABLE CEIMSG.SIBOWNER (
ME_UUID VARCHAR(16),


INC_UUID VARCHAR(16),
VERSION INTEGER,
MIGRATION_VERSION INTEGER
) IN CEIMSG_TB;

CREATE TABLE CEIMSG.SIBCLASSMAP (


CLASSID INTEGER NOT NULL,
URI VARCHAR(2048) NOT NULL,
PRIMARY KEY(CLASSID)
) IN CEIMSG_TB;
...
It is possible to provide separate tablespaces for the various tables here. Optimal distribution
depends on application structure and load characteristics. In this example one tablespace per
datastore was used.
After creating all schema definitions and defining the tablespaces for the tables, create a database
named SIB.
Create the tablespaces and distribute the containers across available disks by issuing the
following command for a system managed tablespace:
DB2 CREATE TABLESPACE CEIMSG_TB MANAGED BY SYSTEM USING(
'<path>\CEIMSG_TB' )
Place the database log on a separate disk if possible.
Create the schema of the database by loading the four schema definitions into the database.
Please see Sections 4.5.10 and 4.5.11 for further information on database and DB2-specific
tuning, respectively.

4.5.4.3.3 Create the datasources for the messaging engines


Create a datasource for each message engine and configure each message engine to use the new
datastore using the admin console.
The following table shows the default state:
Messaging Engine                                     JDBC Provider
box01-server1.SCA.SYSTEM.box01Node01Cell.Bus         Derby JDBC Provider (XA)
box01-server1.SCA.APPLICATION.box01Node01Cell.Bus    Derby JDBC Provider
box01-server1.CommonEventInfrastructure_Bus          Derby JDBC Provider
box01-server1.BPC.box01Node01Cell.Bus                Derby JDBC Provider

Create a new JDBC provider DB2 Universal JDBC Driver Provider for the non-XA
datasources first if it is missing. The XA DB2 JDBC Driver Provider should exist if BPC was
configured correctly for DB2.
Create four new JDBC datasources, one for CEI as an XA datasource, the remaining three as
single-phase commit (non-XA) datasources.
The following table provides new names.
Name of datasource    JNDI Name                  Type of JDBC provider
CEIMSG_sib            jdbc/sib/CEIMSG            DB2 Universal (XA)
SCAAPP_sib            jdbc/sib/SCAAPPLICATION    DB2 Universal
SCASYSTEM_sib         jdbc/sib/SCASYSTEM         DB2 Universal
BPCMSG_sib            jdbc/sib/BPCMSG            DB2 Universal

When creating a datasource:

Uncheck the checkbox named Use this Data Source in container managed
persistence (CMP)

Set a Component-managed authentication alias

Set the database name to the name used for the database created earlier for messaging,
e.g. SIB

Select a driver type: 2 or 4. Per DB2 recommendations, use the JDBC Universal
Driver Type 2 connectivity to access local databases and Type 4 connectivity to
access remote databases. Note that a driver of Type 4 requires a hostname and valid
port to be configured for the database.

4.5.4.3.4 Change the datastores of the messaging engines


Use the Admin Console to change the datastores of the messaging engines.

In the Navigation Panel go to Service Integration -> Buses and change the datastores for
each Bus/Messaging Engine displayed.

Put in the new JNDI and schema name for each datastore. Uncheck the checkbox Create
Tables since the tables have been created already.

The server immediately restarts the message engine; the SystemOut.log shows the results
of the change and also shows if the message engine starts successfully.

Restart the server and validate that all systems come up using the updated configuration.


The last remaining task is tuning the database; please see Sections 4.5.10 and 4.5.11 for further
information on database and DB2-specific tuning, respectively.

4.5.5 Web Services Tuning


If the target of the Web Services import binding is hosted locally in the same application server,
the performance can be further improved by exploiting the optimized communication path
provided by the Web container. Normally requests from the Web Services clients are sent
through the network connection between the client and the service provider. For local Web
Services calls, however, WAS offers a direct communication channel bypassing the network layer
completely. Follow the steps below to enable this optimization. Use the WAS Admin Console to
make these changes.

Set Web container custom property enableInProcessConnections to true at Application


servers > server name > Container Settings > Web Container Settings > Web container >
Additional Properties > Custom Properties

Do not use wildcard (*) for the host name of the Web Container port. Replace it with
the hostname or IP address. The property can be accessed from Application servers >
server name > Container Settings > Web Container Settings > Web container >
Additional Properties > Web container transport chains > WCInboundDefault > TCP
inbound channel (TCP_2) > Related Items > Ports > WC_defaulthost > Host

Use localhost instead of host name in the Web Services client binding. If the actual
hostname is used and even if it is aliased to localhost, this optimization will be disabled.
The property can be accessed from Enterprise Applications > application name >
Manage Modules > application EJB jar > Web services client bindings > Preferred port
mappings > binding name. Use localhost (e.g. localhost:9080) in the URL.

Make sure there is not an entry for your server hostname and IP address in your server's
hosts file for name resolution. An entry in the hosts file inhibits this optimization by
adding name resolution overhead.

4.5.6 Business Process Choreographer Tuning


4.5.6.1 Tuning Work-Manager-based navigation for business processes
Starting with WPS 7.0, work-manager-based navigation is the default navigation mode for WPS
(versus JMS-based navigation).
Work-Manager-based navigation provides two performance optimizations, while keeping the quality of
service of process navigation with persistent messaging (JMS-based navigation):

Work-Manager-based navigation. A WorkManager is a thread pool for J2EE threads.


WorkManager process navigation exploits an underlying capability of WAS to start the
processing of ready-to-navigate business flow activities without using messaging as
provided by JMS providers.

The InterTransactionCache, a part of the Work-Manager-based navigation mode, holds
process instance state information in memory, reducing the need to retrieve
information from the BPE database.

There are several parameters that control usage of these two optimizations. The first set of these
parameters are found by going to


Application Servers > server name > Business Integration > Business Process Choreographer
> Business Flow Manager > Business Process Navigation Performance
The key parameters are:

Check Enable advanced performance optimization to enable both the Work-Manager-based navigation and InterTransactionCache optimizations.

Work-Manager-Based Navigation Message Pool Size: this property specifies the size of
the cache used for navigation messages that cannot be processed immediately, provided
Work-Manager-based navigation has been enabled. The cache defaults to a size of (10 *
thread pool size of the BPENavigationWorkManager) messages. Note that if this cache
reaches its limit, WPS uses JMS-based navigation for new messages, so for optimal
performance ensure this Message Pool size is set to a sufficiently high value.

InterTransaction Cache Size: this property specifies the size of the cache used to store
process state information that has also been written to the BPE database. It should be set
to twice the number of parallel running process instances. The default value for this
property is the thread pool size of the BPENavigationWorkManager.

In addition, customize the number of threads for the work manager using:
Resources -> Asynchronous Beans -> Work Managers -> BPENavigationWorkManager
The minimum and maximum number of threads should be increased from their default values of
5 and 12, respectively, using the methodology outlined in the section titled Tuning for
Maximum Concurrency (Section 4.5.3). If the thread pool size is modified, then the work request queue size
should also be modified and set to be twice the maximum number of threads.
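As an illustrative sizing only (these numbers are not measured recommendations from this
report): if the BPENavigationWorkManager maximum thread pool size is raised to 30, the work
request queue size would be set to 60 (twice the maximum number of threads), the message pool
would default to 300 messages (10 * 30) unless explicitly set higher, and an InterTransaction
Cache sized for roughly 150 parallel process instances would be set to 300.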
4.5.6.2 Tuning the business process container for JMS navigation
If JMS-based navigation is configured, the following resources need to be optimized for efficient
navigation of business processes:

Activation specification BPEInternalActivationSpec: The maximum concurrent endpoints


parameter specifies the parallelism that is used for process navigation across all process
instances. Increase the value of this parameter to increase the number of business
processes executed concurrently. This resource can be found at:
Resources > Activation Specifications > BPEInternalActivationSpec.

JMS connection factory BPECFC: set the connection pool size to the number of threads
in the BPEInternalActivationSpec + 10%. This resource can be found at:
Resources > JMS > Connection factories > BPECFC > Connection pool properties.
Note that this connection factory is also used when work-manager based navigation is in
use, but only for error situations or if the server is highly overloaded.

4.5.6.3 Tuning task list and process list queries


Task list and process list queries in Business Process Choreographer applications are made using
the standard query API (query() and queryAll() APIs, and related REST and Web services
interfaces), and the query table API (queryEntities() and queryRows() APIs). All task list and
process list queries result in SQL queries against the Business Process Choreographer database.
These SQL queries might need special tuning in order to provide optimal response times:


Up-to-date database statistics are key for good SQL query response times.

Databases offer tools to tune SQL queries. In most cases, additional indexes improve
query performance, with potentially some impact on process navigation performance. For
DB2, the DB2 design advisor can be used to guide the choice of indexes (see the example below).
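As a sketch only (the database name, input file, and advise time below are illustrative), the DB2
design advisor can be run against a file of captured task list and process list SQL statements to
suggest candidate indexes:

db2advis -d BPEDB -i tasklist_queries.sql -t 5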

4.5.6.4 Tuning Business Process Choreographer API calls


Business Process Choreographer API calls are triggered by requests external to the WPS runtime.
Examples are remote EJB requests, Web service requests, Web requests over HTTP, requests that
come through the SCA layer, or JMS requests. The connection pools associated with each of
these communication mechanisms may need to be tuned. Consider the following hints when
tuning the connection pools:

API calls for task list and process list queries may take more time to respond, depending
on the tuning of the database and the amount of data in the database.

Ensure that concurrency (parallelism) is sufficiently high to handle the load and to utilize
the CPU. However, increasing the parallelism of API call execution beyond what is
necessary can negatively influence response times. Also, increased parallelism can put
excessive load on the BPC database. When tuning the parallelism of API calls, measure
response times before and after tuning, and adjust the parallelism if necessary.

4.5.7 WESB Tuning


Following are additional configuration options that are relevant to tuning WESB. Please see
Appendix A - WESB settings for a list of the values used to obtain the WESB measurements
shown in this document.
4.5.7.1 Tune the database, if using persistent messaging

If you are using persistent messaging the configuration of your database becomes important. Use
a remote DB2 instance with a fast disk array as the DB server. You may also find benefit in
tuning the connection pooling and statement cache of the DataSource. Please see sections 4.5.10
and 4.5.11 for further information on tuning DB2, and also note the relevant References at the
end of this document.
4.5.7.2 Disable event distribution for CEI

The Event Server, which manages events, can be configured to distribute events and/or log them to
the event database. Some mediations only require events to be logged to a database; in these
cases, performance is improved by disabling event distribution. Because the event server may be
used by other applications, check that none of them rely on event monitoring that requires event
distribution before disabling it.
Event distribution can be disabled from Service integration > Common Event Infrastructure >
Event service > Event services > Default Common Event Infrastructure event server > uncheck
Enable event distribution.
4.5.7.3 Configure WSRR Cache Timeout
WebSphere Service Registry and Repository (WSRR) is used by WESB for endpoint lookup.
When accessing the WSRR (e.g. using the endpoint lookup mediation primitive), results from the
registry are cached in WESB. The lifetime of the cached entries can be configured via Service
Integration->WSRR Definitions-><your WSRR definition name>->Timeout of Cache


Validate that the timeout is a sufficiently large value; the default timeout is 300 seconds, which is
reasonable from a performance perspective. Too low a value will result in frequent lookups to the
WSRR, which can be expensive (especially if retrieving a list of results), and will also incur the
associated network latency if the registry is located on a remote machine.

4.5.8 Clustered Topology Tuning


One reason for deploying a clustered topology is to be able to add more resources to system
components that are bottlenecked due to increasing load. Ideally, it should be possible to scale up
a topology arbitrarily to match the required load. The WPS Network Deployment (ND)
infrastructure provides this capability. However, effective scaling still requires standard
performance monitoring and bottleneck analysis techniques to be used.
Here are some considerations, and tuning guidelines, when expanding or tuning a clustered
topology. In the discussion below, we assume additional cluster members also imply additional
server hardware.

- If deploying more than one cluster member (JVM) on a single physical system, it is
  important to monitor not just the resource utilization (core, disk, network, etc.) of the
  system as a whole, but also the utilization by each cluster member. This allows the
  detection of a system bottleneck due to a particular cluster member.
- If all members of a cluster are bottlenecked, scaling can be achieved by adding one or
  more members to the cluster, backed by appropriate physical hardware.
- If a singleton server or cluster member is the bottleneck, there are some additional
  considerations:
  - A messaging engine in a cluster with a One of N policy (to preserve event ordering)
    may become the bottleneck. Scaling options include:
    o Hosting the active cluster member on a more powerful hardware server, or
      removing extraneous load from the existing server.
    o If the Message Engine (ME) cluster is servicing multiple busses, and messaging
      traffic is spread across these busses, consider employing a separate ME cluster
      per bus.
    o If a particular bus is a bottleneck, consider whether destinations on that bus can
      tolerate out of order events, in which case the cluster policy can be changed to
      allow workload balancing with partitioned destinations. Partitioning a bus also
      has considerations for balancing work across the ME cluster members. For
      further information, please see the following:
      http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc.nd.doc/concepts/cjt0014_.html
  - A database (DB) server may become the bottleneck. Approaches to consider are:
    o If the DB server is hosting multiple DBs that are active (for example, the BPEDB
      and the MEDB), consider hosting each DB on a separate server.
    o If a single DB is driving load, consider a more powerful DB server.
    o Beyond the above, database partitioning and clustering capabilities can be
      exploited.


4.5.9 WebSphere Business Monitor Tuning


4.5.9.1 Configure Java heap sizes

The default maximum heap size in most implementations of Java is too small for many of the
servers in this configuration. The Monitor Launchpad installs Monitor and its prerequisite servers
with larger heap sizes, but you should verify that these sizes are appropriate for your hardware and
workload. We use a maximum heap size of 1536M for our performance measurements.
4.5.9.2 Configure CEI
By default, when an event arrives at CEI, it is delivered to the registered consumer (in this case a
particular Monitor Model) and also into an additional, default queue. Performance is improved
by avoiding this double-store, which can be done using the WAS Admin Console by removing
the All Events event group found via:
Service Integration -> Common Event Infrastructure -> Event Service -> Event Services ->
Default Common Event Infrastructure event server -> Event Groups
Beyond its persistent delivery of events to registered consumers, CEI offers the ability to
explicitly store events in a database. This has significant performance overhead and should be
avoided if this additional functionality is not needed. The CEI Data Store is also configured in
the WAS Admin Console:
Service Integration -> Common Event Infrastructure -> Event Service -> Event Services ->
Default Common Event Infrastructure event server: deselect Enable Data Store
4.5.9.3 Configure Message Consumption Batch Size
Consuming events in large batches is much more efficient than one at a time. Up to some limit,
the larger the batch size, the higher event processing throughput will be. But there is a trade-off:
Consuming events, processing them, and persisting them to the Monitor database is done as a
transaction. So while a larger batch size yields better throughput, it will cost more if you have to
roll back. If you experience frequent rollbacks, consider reducing the batch size. This can be
done in the WAS Admin Console in Server Scope:
Applications -> Monitor Models -> <version> -> Runtime Configuration -> Tuning -> Message
Consumption Batch size: <default 100>
4.5.9.4 Enable KPI Caching
The cost of calculating aggregate KPI values increases as completed process instances
accumulate in the database. A KPI Cache is available to reduce the overhead of these
calculations, at the cost of some staleness in the results. The refresh interval is configurable via
the WAS Admin Console:
Applications -> Monitor Models -> <version> -> Runtime Configuration -> KPI -> KPI Cache
Refresh Interval
A value of zero (the default) disables the cache.

4.5.10 Database: General Tuning

4.5.10.1 Provide Adequate Statistics For Optimization


Databases often have a wide variety of available choices when determining the best approach to
accessing data. Statistics, which describe the shape of the data, are used to guide the selection
of a low-cost data access strategy. Statistics are maintained on tables and indexes. Examples of
statistics include the number of rows in a table and the number of distinct values in a certain
column.
Gathering statistics can be expensive, but fortunately for many workloads a set of representative
statistics will allow good performance over a large span of time. It may be necessary to refresh
statistics periodically if the data population shifts dramatically.
4.5.10.2 Place Database Log files on a Fast Disk Subsystem
Databases are designed for high availability, transactional processing and recoverability. Since
for performance reasons changes to table data may not be written immediately to disk, these
changes are made recoverable by writing to the database log. Updates are made to database log
files when the log buffer fills, at transaction commit time, and for some implementations after a
maximum interval of time. As a result, database log files may be heavily utilized. More
importantly, log writes hold commit operations pending, meaning that the application is
synchronously waiting for the write to complete. Therefore write access performance to the
database log files is critical to overall system performance. We recommend that database log
files be placed on a fast disk subsystem with write back cache.
4.5.10.3 Place Logs on Separate Device from Tablespace Containers
A basic strategy for all database storage configurations is to place the database logs on dedicated
physical disks, ideally on a dedicated disk adapter. This reduces disk access contention between
I/O to the tablespace containers and I/O to the database logs and preserves the mostly sequential
access pattern of the log stream. Such separation also improves recoverability when log archival
is employed.
4.5.10.4 Provide Sufficient Physical Memory
Accessing data in memory is of course much faster than reading it from disk. With 64-bit
hardware being readily available and memory prices continuing to fall, for many performance
critical workloads it makes sense to provision enough memory to avoid most disk reads in steady
state.
Great care should be taken to avoid virtual memory paging in the database machine. The
database manages its memory with the assumption that it is never paged, and does not cooperate
well should the operating system decide to swap some of its pages to disk.
4.5.10.5 Avoid Double Buffering
Since the database attempts to keep frequently accessed data in memory, in most cases there is no
benefit to using file system caching. In fact, performance typically improves by using direct I/O,
when files read by the database bypass the file system cache and only one copy of the data is held
in memory. This allows more memory to be given to the database and avoids overheads in the
file system as it manages its cache.


A further advantage can be gained on some operating systems such as AIX by using concurrent
I/O. This bypasses per-file locking, shifting responsibility for concurrency control to the database
and in some cases allowing more useful work to be offered to the adapter or the device.
An important exception to this guideline occurs for large objects (LOB, BLOB, CLOB, etc.)
which are not buffered by the database itself. In this case it can be advantageous to arrange for
file system caching, preferably only for files which back large objects.
4.5.10.6 Refine Table Indexes as Required
WebSphere BPM products typically provide a reasonable set of indexes for the database tables
they use. In general, creating indexes involves a tradeoff between the cost of queries and the cost
of statements which insert, update, or delete data. For query intensive workloads, it makes sense
to provide a rich variety of indexes as required to allow rapid access to data. For update intensive
workloads, it is often helpful to minimize the number of indexes defined, as each row
modification may require changes to multiple indexes. Note that indexes are kept current even
when they are infrequently used.
Index design therefore involves compromises. The default set of indexes may not be optimal for
the database traffic generated by a BPM product in a specific situation. If database CPU or disk
utilization is high or there are concerns with database response time, it may be helpful to consider
changes to indexes.
As described below, DB2 and Oracle databases provide assistance in this area by analyzing
indexes in the context of a given workload. Recommendations are given to add, modify, or
remove indexes. One caveat is that if the workload does not capture all relevant database activity
then a necessary index might appear unused, leading to a recommendation that it be dropped. If
the index is not present, future database activity could suffer as a result.

4.5.11 Database: DB2 Specific Tuning

Providing a comprehensive DB2 tuning guide is beyond the scope of this report. However, there
are a few general rules of thumb that can assist in improving the performance of DB2
environments. In the sections below, we discuss these rules, and provide pointers to more
detailed information. The complete set of current DB2 manuals (including database tuning
guidelines) can be found by using the DB2 Information Center:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp.
Another excellent reference is Best practices for DB2 for Linux, UNIX, and Windows which is
available here:
http://www.ibm.com/developerworks/data/bestpractices/.
4.5.11.1 Update Database Statistics
DB2 provides an Automatic Table Maintenance feature, which runs the RUNSTATS command in
the background as required to ensure that the correct statistics are collected and maintained. This
is controlled by the database configuration parameter auto_runstats, and is enabled by default for
databases created by DB2 V9.1 and beyond. See also the Configure Automatic Maintenance...
wizard at the database level in the DB2 Control Center.
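Should automatic statistics collection have been disabled, it can be re-enabled with a database
configuration update (the database name is a placeholder):
db2 update db config for yourDatabaseName using auto_maint on auto_tbl_maint on auto_runstats on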
One approach to manually updating statistics on all tables in the database is to use the REORGCHK
command. Dynamic SQL, such as that produced by JDBC, will immediately take the new
statistics into account. Static SQL, like that in stored procedures, must be explicitly rebound in
the context of the new statistics. Here is an example which performs these steps to gather basic
statistics on database DBNAME:
db2 connect to DBNAME
db2 reorgchk update statistics on table all
db2 connect reset
db2rbind DBNAME -l db2rbind.log all
The REORGCHK and rebind (db2rbind) should be executed when the system is relatively idle so
that a stable sample may be acquired and to avoid possible deadlocks in the catalog tables.
It is generally better to gather additional statistics, so also consider the following command for
every table requiring attention:
db2 runstats on table <schema>.<table> with distribution and detailed indexes all
4.5.11.2 Set Buffer Pool Sizes Correctly
A buffer pool is an area of memory into which database pages are read, modified, and held during
processing. Buffer pools improve database performance. If a needed page of data is already in
the buffer pool, that page is accessed faster than if the page had to be read directly from disk. As
a result, the size of the DB2 buffer pools is critical to performance.
The amount of memory used by a buffer pool depends upon two factors: the size of buffer pool
pages and the number of pages allocated. Buffer pool page size is fixed at creation time and may
be set to 4, 8, 16 or 32 KB. The most commonly used buffer pool is IBMDEFAULTBP which
has a 4 KB page size.
Note that all buffer pools reside in database global memory, allocated on the database machine.
The buffer pools must coexist with other data structures and applications, all without exhausting
available memory. In general, having larger buffer pools will improve performance up to a point
by reducing I/O activity. Beyond that point, allocating additional memory no longer improves
performance.
DB2 V9.1 and beyond provide self tuning memory management, which includes managing buffer
pool sizes. This is controlled globally by the self_tuning_mem database level parameter, which is
ON by default. Individual buffer pools can be enabled for self tuning using SIZE AUTOMATIC
at CREATE or ALTER time.
To choose appropriate buffer pool size settings manually, monitor database container I/O activity,
by using system tools or by using DB2 buffer pool snapshots. Be careful to avoid configuring
large buffer pool size settings which lead to paging activity on the system.
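As a sketch, both the self-tuning and the manual approaches use standard DB2 commands; the buffer
pool name and page count below are examples only. To enable self tuning for the default buffer pool:
db2 alter bufferpool IBMDEFAULTBP size automatic
Or to set an explicit size of 100000 4 KB pages (about 400 MB) manually:
db2 alter bufferpool IBMDEFAULTBP immediate size 100000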
4.5.11.3 Maintain Proper Table Indexing
The DB2 Design Advisor, available from the Control Center, provides recommendations for
schema changes, including changes to indexes. It can be launched from the menu presented when
right-clicking on a database in the left column.
4.5.11.4 Size Log Files Appropriately
When using circular logging, it is important that the available log space permits dirty pages in the
bufferpool to be cleaned at a reasonably low rate. Changes to the database are immediately
written to the log, but a well tuned database will coalesce multiple changes to a page before
eventually writing that modified page back to disk. Naturally, changes recorded only in the log
cannot be overwritten by circular logging. DB2 detects this condition and forces the immediate
cleaning of dirty pages required to allow switching to a new log file. While this mechanism
protects the changes recorded in the log, all application logging must be suspended until the
necessary pages are cleaned.
DB2 works to avoid pauses when switching log files by proactively triggering page cleaning
under control of the database level softmax parameter. The default value of 100 for softmax
begins background cleaning activities when the gap between the current head of the log and the
oldest log entry recording a change to a dirty page exceeds 100% of one log file in size. In
extreme cases this asynchronous page cleaning cannot keep up with log activity, leading to log
switch pauses which degrade performance.
Increasing the available log space gives asynchronous page cleaning more time to write dirty
bufferpool pages and avoid log switch pauses. A longer interval between cleanings allows
multiple changes to be coalesced on a page before it is written, which reduces the required write
throughput by making page cleaning more efficient.
Available log space is governed by the product of log file size and the number of primary log files,
which are configured at the database level. logfilsiz is the number of 4K pages in each log file.
logprimary controls the number of primary log files. The Control Center also provides a
Configure Database Logging... wizard.
As a starting point, try using 10 primary log files which are large enough that they do not wrap
for at least a minute in normal operation.
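As an illustrative example only (the values should be derived from the observed log write rate), 10
primary log files of 16384 4 KB pages (64 MB) each could be configured as follows:
db2 update db config for yourDatabaseName using logfilsiz 16384 logprimary 10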
Increasing the primary log file size does have implications for database recovery. Assuming a
constant value for softmax, larger log files mean that recovery may take more time. The softmax
parameter can be lowered to counter this, but keep in mind that more aggressive page cleaning
may also be less efficient. Increasing softmax gives additional opportunities for write coalescing
at the cost of longer recovery time.
The default value of softmax is 100, meaning that the database manager will attempt to clean pages
such that a single log file needs to be processed during recovery. For best performance, we
recommend increasing this to 300, meaning that 3 log files may need processing during recovery:
db2 update db config for yourDatabaseName using softmax 300
4.5.11.5 Use SMS for Tablespaces Containing Large Objects
When creating REGULAR or LARGE tablespaces in DB2 V9.5 (and above) which contain
performance critical LOB data, we recommend specifying MANAGED BY SYSTEM to gain the
advantages of cached LOB handling in SMS.
Among WebSphere BPM products, this consideration applies to:
-- WPS: the Process Choreographer database, sometimes called BPEDB.
-- WPS and WESB: databases backing service integration bus message engine data stores.
For background, see the section Avoid Double Buffering (4.5.10.5) above. A
detailed explanation follows.
DB2 tablespaces can be configured with NO FILE SYSTEM CACHING, which in many cases
improves system performance.


If a tablespace is MANAGED BY SYSTEM, then it uses System Managed Storage (SMS),
which provides desirable special case handling for LOB data with regard to caching. Even if NO
FILE SYSTEM CACHING is in effect (by default or as specified), access to LOB data still uses
the file system cache.
If a tablespace is MANAGED BY DATABASE, then it uses Database Managed Storage (DMS)
which does not differentiate between LOB and non-LOB data with regard to caching. In
particular, NO FILE SYSTEM CACHING means that LOB access will be directly to disk for
both reads and writes. Unconditionally reading LOBs from disk can cause high disk utilization
and poor database performance.
Since V9.1, DB2 has by default created databases which use automatic storage (AUTOMATIC
STORAGE YES), meaning that the database manages disk space allocations itself from one or
more pools of available file system space called storage paths. If automatic storage is enabled,
CREATE TABLESPACE will use it by default (MANAGED BY AUTOMATIC STORAGE).
For non-temporary tablespaces, REGULAR and LARGE, automatic storage is implemented using
DMS on files.
Before DB2 V9.5 the default caching strategy for tablespaces was FILE SYSTEM CACHING. In
V9.5, this was changed to NO FILE SYSTEM CACHING for platforms where direct I/O or
concurrent I/O is available. Taking defaults on V9.5 we now have a database with
AUTOMATIC STORAGE YES, and a tablespace which is MANAGED BY AUTOMATIC
STORAGE and in many cases NO FILE SYSTEM CACHING. Such a tablespace, which is
implemented using DMS, will not cache LOBs in the buffer pool or the file system.
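Putting this together, a hedged example of creating an SMS tablespace for performance critical LOB
data follows; the tablespace name and container path are placeholders, and the product-provided
creation scripts referenced in section 4.5.11.11 remain the preferred starting point:
CREATE TABLESPACE BPELOBSPACE MANAGED BY SYSTEM USING ('/db2data/bpelobspace')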
4.5.11.6 Ensure that sufficient locking resources are available
Locks are allocated from a common pool controlled by the database level parameter locklist,
which is the number of 4K pages set aside for this use. A second database level parameter,
maxlocks, bounds the percentage of the lock pool held by a single application. When an
application attempts to allocate a lock which exceeds the fraction allowed by maxlocks, or when
the free lock pool is exhausted, DB2 performs lock escalation to replenish the supply of available
locks. Lock escalation involves replacing many row locks with a single table-level lock.
While lock escalation addresses the immediate problem of lock pool overuse or starvation, it can
lead to database deadlocks, and so should not occur frequently during normal operation. In some
cases, application behavior can be altered to reduce pressure on the lock pool by breaking up
large transactions which lock many rows into smaller transactions. It is usually simpler to try
tuning the database first.
Beginning with Version 9, DB2 adjusts the locklist and maxlocks parameters automatically by
default. To manually tune these, first observe whether lock escalations are occurring either by
examining db2diag.log or by using the system monitor to gather snapshots at the database level.
If the initial symptom is database deadlocks, also consider whether these are initiated by lock
escalations.
Check the Lock escalations count in the output from:
db2 get snapshot for database yourDatabaseName
Current values for locklist and maxlocks can be obtained by examining the output from:
db2 get db config for yourDatabaseName
These values can be altered, for example to 100 and 20, like this:


db2 update db config for yourDatabaseName using locklist 100 maxlocks 20


When increasing the locklist size, consider the impacts of the additional memory allocation
required. Often the locklist is relatively small compared with memory dedicated to buffer pools,
but the total memory required must not lead to virtual memory paging.
When increasing the maxlocks fraction, consider whether a larger value will allow a few
applications to drain the free lock pool, leading to a new cause of escalations as other applications
needing relatively few locks encounter a depleted free lock pool. Often it is better to start by
increasing locklist size alone.
4.5.11.7 Bound the size of the Catalog Cache for Clustered Applications
The Catalog Cache is used to avoid repeating expensive activities, notably preparing execution
plans for dynamic SQL. Thus it is important that the cache be sized appropriately.
By default, several 4 KB pages of memory are allocated for each possible application as defined
by the MAXAPPLS database parameter. The multiplier is 4 for DB2 9, and 5 for DB2 9.5 and
beyond. MAXAPPLS is AUTOMATIC by default, and its value is adjusted to roughly match the
peak number of applications connected at runtime.
When running clustered applications, such as those deployed in the Process Choreographer in
WPS, we have observed a value of more than 1000 for MAXAPPLS, meaning that at least 4000
pages would be allocated for the catalog cache given default tuning. For the same workload, 500
pages were sufficient:
db2 update db config for yourDatabaseName using catalogcache_sz 500
The default behavior assumes heterogeneous use of database connections. A clustered
application will typically have more homogeneous use across connections, allowing a smaller
catalog cache to be effective. Bounding the catalog cache size frees up memory for other more
valuable uses.
To manually tune the CATALOGCACHE_SZ database parameter, see the recommendations
documented here:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.admin.doc/doc/r0000338.htm.
4.5.11.8 (Before DB2 V9.5) Size the Database Heap Appropriately
DB2 Version 9.5 and beyond provide AUTOMATIC tuning of the database heap by default. We
recommend using this when available.
To manually tune the DBHEAP database parameter, see the recommendations documented here:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.admin.doc/doc/r0000276.htm.
4.5.11.9 (Before DB2 V9.7) Size the Log Buffer Appropriately
Before DB2 Version 9.7 the default LOGBUFSZ is only 8 pages. We recommend setting this to
256, which is the default in Version 9.7:
db2 update db config for yourDatabaseName using logbufsz 256
4.5.11.10 (DB2 V9.7 and beyond) Consider disabling Current Commit

DB2 Version 9.7 supports new query semantics which always return the committed value of the
data at the time the query is submitted. This support is ON by default for newly created
databases. We found that performance improved in some cases when we disabled the new
behavior, reverting to the original DB2 query semantics:
db2 update db config for yourDatabaseName using cur_commit disabled
4.5.11.11 Recommendations for WPS

The following link discusses "Specifying initial DB2 database settings" with examples of creating
SMS tablespaces for the BPEDB. It also contains useful links for "Planning the BPEDB
database" and "Fine-tuning the Business Process Choreographer database":
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t5tuneint_spec_init_db_settings.html
This link discusses "Creating a DB2 for Linux, UNIX, and Windows database for Business
Process Choreographer" and gives details on BPEDB database creation, including pointers to
useful creation scripts for a production environment.
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t2codbdb.html
For our SOABench2008 OutSourced Mode workload, we achieved better throughput by dropping
several indexes from the ACTIVITY_INSTANCE_B_T table, as recommended by the Design
Advisor. This is a concrete example of how proper indexing is workload dependent. These same
indexes may be important for many other Process Choreographer workloads.

4.5.12 Database: Oracle Specific Tuning

As with DB2, providing a comprehensive Oracle database tuning guide is beyond the scope of
this report. However, there are a few general rules of thumb that can assist in improving the
performance of Oracle environments. In the sections below, we discuss these rules, and provide
pointers to more detailed information. In addition, the following references are useful:
Oracle Database 11g Release 1 documentation (includes a Performance Tuning Guide):
http://www.oracle.com/pls/db111/homepage
A white paper discussing Oracle on AIX:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883
4.5.12.1 Update Database Statistics
Oracle provides an automatic statistics gathering facility, which is enabled by default.
One approach to manually updating statistics on all tables in a schema is to use the dbms_stats
utility:
execute dbms_stats.gather_schema_stats( ownname => 'your_schema_name', -
  options          => 'GATHER AUTO', -
  estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, -
  cascade          => TRUE, -
  method_opt       => 'FOR ALL COLUMNS SIZE AUTO', -
  degree           => 15);

4.5.12.2 Set Buffer Cache Sizes Correctly


Oracle provides automatic memory management for buffer caches. For additional discussion on
configuring automatic memory management and for guidance on manually setting buffer cache
sizes, please see the following references.
For Oracle 10g R2:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/memory.htm#i29118
For Oracle 11g R1:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/memory.htm#i29118
4.5.12.3 Maintain Proper Table Indexing
The SQL Access Advisor, available from the Enterprise Manager, provides recommendations for
schema changes, including changes to indexes. It can be found starting at the database home
page, then following the Advisor Central link in the Related Links section at the bottom of the
page.
4.5.12.4 Size Log Files Appropriately
Unlike DB2, Oracle performs an expensive checkpoint operation when switching logs. The
checkpoint involves writing all dirty pages in the buffer cache to disk. Therefore, it is important
to make the log files large enough that switching occurs infrequently. Applications which
generate a high volume of log traffic need larger log files to achieve this goal.
4.5.12.5 Recommendations for WPS
The following link discusses "Specifying initial Oracle database settings":
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t5tuneint_spec_init_db_oracle.html
This link discusses "Creating an Oracle database for Business Process Choreographer" and gives
details on BPEDB database creation, including pointers to useful creation scripts for a production
environment.
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/t2codbdb.html
The default Oracle policy for large objects (LOB) is to store the data within the row, when the
size of the object does not exceed a threshold. In some cases, workloads have LOBs which
regularly exceed this threshold. By default, such LOB accesses bypass the buffer cache, meaning
that LOB reads are exposed to disk I/O latencies when using the preferred direct or concurrent
path to storage. We achieved better performance for the SOABench2008 OutSourced Mode
workload after adjusting our schema such that the SERVICE_CONTEXT column of the
PROCESS_CONTEXT_T table was CACHED, e.g.:
alter table process_context_t modify lob (service_context) (cache)

4.5.13 Advanced Java Heap Tuning

Because the WebSphere BPM product set is written in Java, the performance of the Java Virtual
Machine (JVM) has a significant impact on the performance delivered by these products. JVMs
externalize multiple tuning parameters that may be used to improve both authoring and runtime
performance. The most important of these are related to garbage collection and setting the Java
heap size. This section will deal with these topics in detail.
Note that the products covered in this report utilize IBM JVMs on most platforms (AIX, Linux,
Windows, etc.), and the HotSpot JVMs on selected other systems, such as Solaris and HP/UX.
Vendor specific JVM implementation details and settings will be discussed as appropriate. Also
note that all BPM v7 products in this document use Java 6. It has characteristics similar to Java 5,
used in the BPM v6.1 and v6.2.0 products, but differs considerably from Java 1.4.2, used by V6.0.2.x
and earlier releases. For brevity, only Java 6 tuning is discussed here.
Following is a link to the IBM Java 6 Diagnostics Guide:
http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp

The guide referenced above discusses many more tuning parameters than those discussed in this
report, but most are for specific situations and are not of general use. For a more detailed
description of IBM Java 6 garbage collection algorithms, please see Section Memory
Management in the chapter titled Understanding the IBM SDK for Java.
Sun HotSpot JVM references follow:
The following URL provides a useful summary of HotSpot JVM options for Solaris:
http://java.sun.com/docs/hotspot/VMOptions.html
The following URL provides a useful FAQ about the Solaris HotSpot JVM:
http://java.sun.com/docs/hotspot/PerformanceFAQ.html#20
For more performance tuning information about Sun's HotSpot JVM, see the URL below:
http://java.sun.com/docs/performance/

4.5.13.1 Monitoring Garbage Collection


In order to set the heap correctly, you must first determine how the heap is being used. This is
done by collecting a verbosegc trace. A verbosegc trace prints garbage collection actions and
statistics to stderr in IBM JVMs and stdout in Sun HotSpot JVMs. The verbosegc trace is
activated by using the Java run-time option -verbose:gc. Output from verbosegc is different for
the HotSpot and IBM JVMs, as shown by the following examples:
Example IBM JVM verbosegc trace output
<af type="tenured" id="12" timestamp="Fri Jan 18 15:46:15 2008" intervalms="86.539">
<minimum requested_bytes="3498704" />
<time exclusiveaccessms="0.103" />


<tenured freebytes="80200400" totalbytes="268435456" percent="29" >
<soa freebytes="76787560" totalbytes="255013888" percent="30" />
<loa freebytes="3412840" totalbytes="13421568" percent="25" />
</tenured>
<gc type="global" id="12" totalid="12" intervalms="87.124">
<refs_cleared soft="2" threshold="32" weak="0" phantom="0" />
<finalization objectsqueued="0" />
<timesms mark="242.029" sweep="14.348" compact="0.000" total="256.598" />
<tenured freebytes="95436688" totalbytes="268435456" percent="35" >
<soa freebytes="87135192" totalbytes="252329472" percent="34" />
<loa freebytes="8301496" totalbytes="16105984" percent="51" />
</tenured>
</gc>
<tenured freebytes="91937984" totalbytes="268435456" percent="34" >
<soa freebytes="87135192" totalbytes="252329472" percent="34" />
<loa freebytes="4802792" totalbytes="16105984" percent="29" />
</tenured>
<time totalms="263.195" />
</af>
Example Solaris HotSpot JVM verbosegc trace output (young and old)
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K <- live data (776768K), 1.8479984 secs]
Sun HotSpot JVM verbosegc output can be made more detailed by setting additional options:
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps.
It is tedious to parse the verbosegc output using a text editor. There are very good visualization
tools on the Web that can be used for more effective Java heap analysis. The IBM Pattern
Modeling and Analysis Tool (PMAT) for Java Garbage Collector is one such tool. It is available
for free download at IBM alphaWorks through this URL:
http://www.alphaworks.ibm.com/tech/pmat.
PMAT supports the verbosegc output formats of JVMs offered by major JVM vendors such as
IBM, Sun and HP.
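As a sketch, verbose GC output can also be directed to a dedicated log file for later analysis with a tool
such as PMAT; the file paths below are examples only:
IBM JVM: -verbose:gc -Xverbosegclog:/tmp/verbosegc.log
HotSpot JVM: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/verbosegc.log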

4.5.13.2 Setting the Heap Size for most configurations


This section contains guidelines for determining the appropriate Java heap size for most
configurations. If your configuration requires that more than one JVM runs concurrently on the
same system (for example, if you run both WPS and WID on the same system), then you should
also read the next section, 4.5.13.3. If your objective is to support large Business Objects, read
Section 4.5.2.
For most production applications, the IBM JVM Java heap size defaults are too small and should
be increased. In general the HotSpot JVM default heap and nursery size are also too small and
should be increased (we will show how to set these parameters later).
There are several approaches to setting optimal heap sizes. We describe here the approach that
most applications should use when running the IBM JVM on AIX. The essentials can be applied
to other systems. Set the initial heap size (-Xms option) to something reasonable (for example,
256 MB), and the maximum heap size (-Xmx) option to something reasonable, but large (for
example, 1024 MB). Of course, the maximum heap size should never force the heap to page. It
is imperative that the heap always stays in physical memory. The JVM will then try to keep the
GC time within reasonable limits by growing and shrinking the heap. The output from verbosegc
should then be used to monitor GC activity.
If Generational Concurrent GC is used (-Xgcpolicy:gencon), the new area size can also be set to
specific values. By default, the new size is a quarter of the total heap size or 64 MB, whichever is
smaller. For better performance, the nursery size should be 1/2 of the heap size or larger, and
it should not be capped at 64 MB. New area sizes are set by JVM options: -Xmn<size>,
-Xmns<initialSize>, and -Xmnx<maxSize>.
A similar process can be used to set the size of HotSpot heaps. In addition to setting the minimum
and maximum heap size, you should also increase the nursery size to approximately 1/2 of the
heap size. Note that you should never increase the nursery to more than 1/2 the full heap. The
nursery size is set using the MaxNewSize and NewSize parameters (that is,
-XX:MaxNewSize=128m, -XX:NewSize=128m).
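Putting the preceding guidelines together, an illustrative set of IBM JVM options (the sizes are
examples only, not recommendations for any particular workload) might look like:
-Xms256m -Xmx1024m -Xgcpolicy:gencon -Xmns256m -Xmnx512m
Such options are typically entered in the Generic JVM arguments field of the server's Java Virtual
Machine settings in the WAS Admin Console.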
After the heap sizes are set, verbosegc traces should then be used to monitor GC activity. After
analyzing the output, modify the heap settings accordingly. For example, if the percentage of time
in GC is high and the heap has grown to its maximum size, throughput may be improved by
increasing the maximum heap size. As a rule of thumb, greater than 10% of the total time spent in
GC is generally considered high. Note that increasing the maximum size of the Java heap may
not always solve this type of problem as it is could be a memory over-usage problem.
Conversely, if response times are too long due to GC pause times, decrease the heap size. If both
problems are observed, an analysis of the application heap usage is required.
4.5.13.3 Setting the Heap Size when running multiple JVMs on one system
Each running Java program has a heap associated with it. Therefore, if you have a configuration
where more than one Java program is running on a single physical system, setting the heap sizes
appropriately is of particular importance. An example of one such configuration is when the
WID is on the same physical system as WPS. Each of these is a separate Java program that has
its own Java heap. If the sum of all of the virtual memory usage (including both Java Heaps as
well as all other virtual memory allocations) exceeds the size of physical memory, the Java heaps
will be subject to paging. As previously noted, this causes total system performance to degrade
significantly. To minimize the possibility of this occurring, use the following guidelines:

- Collect a verbosegc trace for each running JVM.
- Based on the verbosegc trace output, set the initial heap size to a relatively low value.
  For example, assume that the verbosegc trace output shows that the heap size grows
  quickly to 256 MB, and then grows more slowly to 400 MB and stabilizes at that
  point. Based on this, set the initial heap size to 256 MB (-Xms256m).

- Based on the verbosegc trace output, set the maximum heap size appropriately. Care
must be taken to not set this value too low, or Out Of Memory errors will occur; the
maximum heap size must be large enough to allow for peak throughput. Using the
above example, a maximum heap size of 768 MB might be appropriate (-Xmx768m).
This is to give the Java heap head room to expand beyond its current size of 400
MB if required. Note that the Java heap will only grow if required (e.g. if a period of
peak activity drives a higher throughput rate), so setting the maximum heap size
somewhat higher than current requirements is generally a good policy.

- Be careful to not set the heap sizes too low, or garbage collections will occur
frequently, which might reduce throughput. Again, a verbosegc trace will assist in
determining this. A balance must be struck so that the heap sizes are large enough
that garbage collections do not occur too often, while still ensuring that the heap sizes
are not cumulatively so large as to cause the heap to page. This balancing act will, of
course, be configuration dependent.

4.5.13.4 Reduce or Increase Heap Size if Out Of Memory Errors Occur


The java.lang.OutOfMemoryError is used by the JVM in a variety of circumstances, making
it sometimes difficult to track down the source of the exception. There is no conclusive
mechanism for telling the difference between these potential error sources, but a good start is to
collect a trace using verbosegc. If the problem is a lack of memory in the heap, then this is easily
seen in this output. Please see section 4.5.13.1 for further information about verbosegc output.
Many garbage collections that produce very little free heap space will generally occur preceding
this exception. If this is the problem then one should increase the size of the heap.
If, however, there is enough free memory when the java.lang.OutOfMemoryError is thrown,
the next item to check is the finalizer count from the verbosegc (only the IBM JVM will give this
information). If these appear high then a subtle effect may be occurring whereby resources
outside the heap are held by objects within the heap and being cleaned by finalizers. Reducing
the size of the heap can alleviate this situation, by increasing the frequency with which finalizers
are run. In addition, examine your application, to determine if the finalizers can be avoided, or
minimized.
Note that Out Of Memory errors can also occur for issues unrelated to JVM heap usage, such as
running out of certain system resources. Examples of this include insufficient file handles or
thread stack sizes that are too small.
In some cases, you can tune the configuration to avoid running out of native heap: try reducing
the stack size for threads (the -Xss parameter). However, deeply nested methods may force a
thread stack overflow if there is insufficient stack size.
For middleware products, if you are using an in-process version of the JDBC driver, it is usually
possible to find an out-of-process driver that can have a significant effect on the native memory
requirements. For example, you can use Type 4 JDBC drivers (DB2's "Net" drivers, Oracle's
"Thin" drivers), MQSeries can be switched from Bindings mode to Client mode, and so on.
Refer to the documentation for the products in question for more details.
4.5.13.5 Set AIX Threading Parameters


The IBM JVM threading and synchronization components are based upon the AIX POSIX
compliant Pthreads implementation. The following environment variables have been found to
improve Java performance in many situations and have been used for the workloads in this
document. The variables control the mapping of Java threads to AIX Native threads, turn off
mapping information, and allow for spinning on mutex (mutually exclusive) locks.
export AIXTHREAD_COND_DEBUG=OFF
export AIXTHREAD_MUTEX_DEBUG=OFF
export AIXTHREAD_RWLOCK_DEBUG=OFF
export AIXTHREAD_SCOPE=S
export SPINLOOPTIME=2000

4.5.14 Power Management Tuning

Power management is becoming common in processor technology; both Intel and Power core
processors now have this capability. This capability delivers obvious benefits, but it can also
decrease system performance when a system is under high load, so consider whether or not to
enable power management. Using POWER6 hardware as an example, ensure that Power Saver
Mode is not enabled, unless desired. One way to modify or check this setting on AIX is through
the Power Management window on the HMC.

4.5.15 WPS Tuning for WICS migrated workloads

Note that the tuning below is unique to workloads migrated using the WICS migration wizard in
the WID. In addition to the tuning specified below, please follow the other WPS tuning
recommendations detailed in this document.

- For JMS based messaging used to communicate with legacy WBIA adapters or custom
  adapters, make use of non-persistent queues when possible.
- For JMS based messaging used to communicate with legacy WBIA adapters or custom
  adapters, make use of WebSphere MQ based queues if available. By default, the adapters
  use the MQ APIs to connect to the SIB based destinations via MQ Link. MQ Link is a
  protocol translation layer which converts messages to and from MQ based clients. By
  switching to WebSphere MQ based queues, MQ Link translation costs will be eliminated
  and therefore performance will be improved.
- Turn off server logs for verbose workloads. Some workloads emit log entries for every
  transaction, causing constant disk writes that reduce overall throughput. Explore the
  possibility of turning off server logs to reduce the throughput degradation for such
  workloads.

5 WebSphere Process Server 7.0.0.1 Performance Results
Each of the measurements presented in the following sections utilizes the SOABench 2008
Choreography Facet workload, described in Section 10.4. Note that SOABench 2008 is a
comprehensive workload that models the business processes of an automobile insurance
company. SOABench 2008 is intended to evaluate the performance of a distributed application
implemented using a Service Oriented Architecture (SOA). Elsewhere in the report, WESB
measurements are shown for another facet of SOABench 2008, the Mediation facet. See Chapter
6 for these results.
A common theme in all SOABench 2008 Choreography Facet results is the use of a metric named
CCPS, or Claims Completed Per Second. A claim in this context is an automobile insurance
claim. We define CCPS as the number of automobile insurance claims that are completed per
second. Note that this is separate and distinct from the number of business processes completed
per second, or the number of transactions (Commits) completed per second.
Following is a summary of the measurements included in this chapter. Sections 5.1.1 and 5.1.2
present data for the Windows 2008 and Red Hat Enterprise Linux (RHEL) platforms using a pure
microflow (Automated Approval Mode), and a microflow/macroflow pattern (OutSourced
Mode), respectively. Section 5.1.3 demonstrates the performance of WPS 7.0.0.1 for these 2
modes on an AIX system using POWER6 hardware. Section 5.1.4 demonstrates clustering
performance for systems using AIX on POWER6 hardware, for both Automated Approval and
OutSourced Mode. Finally, section 5.1.5 demonstrates the performance of an AIX POWER7
system in Automated Approval Mode. All AIX measurements are obtained with simultaneous
multi-threading (SMT) enabled.
Each data chart presented is followed by a table that identifies the measurement cell for the
particular workload. Note that the primary software and hardware systems are identified.
Hardware names are cross-referenced to the individual measurement systems descriptions in
Appendix A of this document, which documents detailed configuration information for each
measurement platform.


5.1 SOABench 2008 Choreography Facet


5.1.1 Automated Approval on Windows 2008 and RHE Linux 5.2
The automated approval workload of the Choreography facet, described in section 10.4.2, is
evaluated on an IBM xSeries 3950, 2.9GHz Xeon (4 quad-cores) running Windows Server 2008
or RedHat Enterprise Linux 5.2 on the same physical machine, to demonstrate the throughput
characteristics of WPS in this configuration. 3 KB requests and 3 KB responses are used. The
workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
The chart below shows that WPS performs very similarly on Windows and Linux in this
configuration. Throughput, in Claims Completed per Second (CCPS), and scaling data follows:

- 1 core Win2008: 109 CCPS, about the same as Linux (108 CCPS).
- 4 cores Win2008: 390 CCPS, 3% faster than Linux (379 CCPS), indicating an SMP
  scalability factor for Win2008 of 3.6x and for Linux a scalability factor of 3.5x.
- 8 cores Win2008: 694 CCPS, 4% faster than Linux (665 CCPS), indicating an SMP
  scalability factor for Win2008 of 6.4x and for Linux a scalability factor of 6.2x.

[Chart: SOABench 2008 Automated Mode - WPS 7.0.0.1, Win2008 vs. RHE Linux 5.2. Y-axis: Claims
Completed per Second for 1, 4, and 8 cores. CPU utilization 96% - 100% across all bars; scaling factor
shown on each multi-processor bar; hyperthreading not supported.]

Measurement Configuration:
- WPS: Intel 2.93 GHz - B
- Driver, SOABench Services1: Intel 3.5 GHz - C
- SOABench Services2: Intel 2.93 GHz - D


5.1.2 OutSourced on Windows 2008 and RHE Linux 5.2


The OutSourced Mode of the Choreography facet, described in section 10.4.3, is executed on an
IBM xSeries 3950, 2.9GHz Xeon (4 quad-core processors) running Windows Server 2008 or
RedHat Enterprise Linux 5.2, on the same physical machine, to demonstrate the throughput
characteristics of WebSphere Process Server in this configuration. The client driver issued 3 KB
requests and the server returned 3 KB responses. The workload is run in infrastructure mode,
making the processing behind service call invocations trivial.
The throughput, measured in Claims Completed per Second (CCPS), and SMP scaling data
follows:

- 1 core Win2008: 9.1 CCPS, 4% slower than Linux (9.5 CCPS).
- 4 cores Win2008: 26.6 CCPS, 17% slower than Linux (32.0 CCPS), indicating an SMP
  scalability factor for Win2008 of 2.9x and for Linux a scalability factor of 3.4x.

To achieve optimal throughput, changes were made to the indexes of the BPE DB by following
the recommendations of the DB2 Design advisor.

[Chart: SOABench 2008 OutSourced Mode - WPS 7.0.0.1, Win2008 vs. RHE Linux 5.2. Y-axis: Claims
Completed per Second for 1 and 4 cores. CPU utilization 97% - 100% across all bars; scaling factor
shown on each multi-processor bar; hyperthreading not supported.]

Measurement Configuration:
- WPS: Intel 2.93 GHz - B
- Driver, SOABench Services1: Intel 3.5 GHz - C
- SOABench Services2: Intel 2.93 GHz - D
- DB2: PPC 2.2 GHz - B


5.1.3 Vertical (SMP) scaling on AIX POWER6


5.1.3.1 Overview
This section shows SOABench 2008 vertical (SMP) scaling performance when the application
cluster is measured with a single cluster member using a varying number of POWER6 cores on
an AIX system. Horizontal (clustered) scaling performance is shown in the section directly
below. Further, a direct comparison between vertical and horizontal scaling performance is
shown in section 9.10.
5.1.3.2 Automated Approval Mode
The results below are obtained using SOABench 2008 Automated Approval Mode, described in
section 10.4.2. The topology used for these measurements is shown below the data chart.
As shown below, 4 core and 8 core SMP scaling is excellent: 4x and 7.3x, respectively. At 16
cores, an impressive throughput above 2000 Claims Completed per Second is achieved.
However, the scaling limitations of the single server JVM start to hinder SMP scaling. As
mentioned above, section 9.10 examines this further.

[Chart: SOABench 2008 Automated Mode - AIX, vertical (SMP) scaling. Y-axis: Claims Completed per
Second for 4, 8, and 16 cores; scaling of 4.0x, 7.3x, and 11.9x at CPU utilizations of 98%, 95%, and
90%, respectively. Simultaneous Multithreading (SMT) enabled.]

Measurement Configuration:
- HTTP Server, SOABench Driver: PPC 1.9 GHz - A
- SOABench Services, Active MEs, ME DB: POWER6 4.7 GHz - E
- BPE, WPS DBs: POWER6 4.7 GHz - G
- WPS Applications Cluster Members: POWER6 4.7 GHz - D


Topology: Vertical SOABench 2008 Automated Mode - AIX

[Topology diagram: WebSphere Network Deployment cell (1x application cluster member). Components
include the SOABench Automated Driver, an IBM HTTP Server with WebSphere Plugin, an AppCluster
hosting the SOABench BPEL App (microflow) and SOABench Services, an MECluster hosting the active
MEs, and DB2 servers for the ME, BPE, and WPS databases.]

5.1.3.3 OutSourced Mode


The results below are obtained using SOABench 2008 OutSourced Mode, described in section
10.4.3. The topology used for these measurements is shown below the data chart.
Note that SMP scaling is excellent, and when using 8 cores a throughput above 100 Claims
Completed per Second is achieved.
To achieve optimal throughput, the compensation service recovery log was located on a file
system on a RAID array behind a caching RAID adapter.


[Chart: SOABench 2008 OutSourced Mode - AIX, vertical (SMP) scaling. Y-axis: Claims Completed per
Second for 4 and 8 cores; scaling of 4.0x and 6.8x at CPU utilizations of 98% and 93%, respectively.
Simultaneous Multithreading (SMT) enabled.]

Measurement Configuration:
- SOABench Driver, HTTP Server, SOABench Agent/Outsourced Services: PPC 1.9 GHz - A
- SOABench Services, Active MEs, ME DB: POWER6 4.7 GHz - E
- BPE, WPS DBs: POWER6 4.7 GHz - G
- WPS Applications Cluster Members: POWER6 4.7 GHz - D


Topology: Vertical SOABench 2008 OutSourced Mode - AIX

[Topology diagram: WebSphere Network Deployment cell (1x application cluster member). Components
include the SOABench Outsourced Controller (Driver), the SOABench Agent and Outsourced Services, an
IBM HTTP Server with WebSphere Plugin, an AppCluster hosting the SOABench BPEL App (microflow
and macroflow, invoked asynchronously) and SOABench Services, an MECluster hosting the active MEs,
and DB2 servers for the ME, BPE, and WPS databases.]

5.1.4 Horizontal (clustered) scaling on AIX POWER6


5.1.4.1 Overview
This section shows SOABench 2008 horizontal scaling performance when the applications cluster
is measured on multiple 4 core POWER6 cluster members (nodes) using AIX systems. Vertical
(SMP) scaling performance is shown in the section directly above. Further, a direct comparison
between vertical and horizontal scaling performance is shown in section 9.10.
5.1.4.2 Automated Approval Mode
These are the results of SOABench 2008 Automated Approval Mode, described in section 10.4.2.
The topology used for these measurements is shown below the data chart.
Note that scaling is nearly perfectly linear. As additional nodes are added, throughput scales
proportionally. With 8 nodes of 4 cores each for the applications cluster, a throughput above
5,400 Claims Completed per Second is achieved.


[Chart: SOABench 2008 Automated Mode - AIX, horizontal scaling (4 cores per node). Y-axis: Claims
Completed per Second for 1, 2, 4, 6, and 8 nodes; scaling of 2.0x, 4.0x, 5.9x, and 7.8x at 2, 4, 6, and 8
nodes, with CPU utilizations of 98%, 98%, 98%, 97%, and 96%, respectively. Simultaneous
Multithreading (SMT) enabled.]

Measurement Configuration:
- HTTP Server, SOABench Driver: PPC 1.9 GHz - A
- SOABench Services, Active MEs, ME DB: POWER6 4.7 GHz - E
- WPS Applications Cluster Members: POWER6 4.7 GHz - A, B, C, D


Topology: Horizontal SOABench 2008 Automated Mode - AIX
[Topology diagram: a WebSphere Network Deployment cell containing IBM HTTP Servers with the WebSphere plug-in, the SOABench Automated Driver, a ServicesCluster for the SOABench services, an AppCluster (8x, 4 cores each) running the SOABench services and the SOABench BPEL application (microflow), an MECluster with the active messaging engines, and DB2 databases for BPE, the messaging engines (MEs), and WPS.]

5.1.4.3 Outsourced Workload


These are the results of SOABench 2008 OutSourced Mode, described in section 10.4.3. The
topology used for these measurements is shown below the data chart.
With 6 nodes of 4 cores each for the applications cluster, a throughput over 320 Claims
Completed per Second is achieved. A detailed analysis of performance at 6 nodes suggests that
scalability is limited by the machine running DB2 for the BPE database, which was running at
very high utilization.
To achieve optimal throughput, some changes were made to the indexes of the BPE DB, as
described in section 4.5.11.11.


[Chart: SOABench 2008 OutSourced Mode - AIX (4 cores per node). Claims Completed per Second for 1, 2, 4, and 6 nodes; CPU utilization and scaling shown above each bar (1 node: 98%; 2 nodes: 96%, 2.0x; 4 nodes: 95%, 3.9x; 6 nodes: 88%, 5.4x); Simultaneous Multithreading (SMT) enabled.]

Measurement Configuration
SOABench Driver, HTTP Server, SOABench Agent/Outsourced Services: PPC 1.9 GHz - A
SOABench Services, Active MEs, ME DB: POWER6 4.7 GHz - E
BPE, WPS DBs: POWER6 4.7 GHz - G
WPS Applications Cluster Members: POWER6 4.7 GHz - A, B, C


Topology: Horizontal SOABench 2008 OutSourced Mode - AIX
[Topology diagram: a WebSphere Network Deployment cell containing the SOABench OutSourced Controller (driver), the SOABench Agent and Outsourced services, an IBM HTTP Server with the WebSphere plug-in, an AppCluster (6x, 4 cores each) running the SOABench services and the SOABench BPEL application (microflow and macroflow, invoked asynchronously), an MECluster with the active messaging engines, and DB2 databases for BPE, the messaging engines (MEs), and WPS.]

5.1.5 Automated Approval on AIX POWER7


5.1.5.1 Introduction and Caveats
This section contains a study comparing the throughput performance of WPS v7.0.0.1 running on
POWER7 versus POWER6 systems. Both measurements use the AIX 6.1 operating system and
the 32-bit version of WPS 7.0.0.1. The workload used in this study is SOABench 2008
Automated Approval, described in section 10.4.2. With this workload, the POWER7 system
performed 40 to 50% better than a corresponding POWER6 system.
The hardware configuration of the corresponding systems is shown below:
POWER7: IBM pSeries 750, 1 to 6 core lpars, 3.55 GHz, 12 GB RAM, smt=4
POWER6: IBM pSeries 670, 1 to 6 core lpars, 4.7 GHz, 31 GB RAM, smt=2
There are significant differences between the POWER7 and POWER6 processor architectures. Each POWER7 processor contains 8 cores, while POWER6 processors contain only 2 cores. POWER7 provides 4 MB of on-chip L3 cache per core, while POWER6 shares 36 MB of off-chip L3 cache between the 2 cores in each processor. Each POWER7 core supports 4 concurrent hardware threads (SMT=4), while each POWER6 core supports 2 hardware threads (SMT=2). Note that TurboCore mode is not enabled on the POWER7 system used for the measurements shown below.
Since the POWER7 system became generally available very recently, relatively little tuning was done for the measurements shown below. For example, we tuned the smt_snooze_delay setting to


-1, but we did not use resource sets to bind processes to processors, and we also did not use
memory affinity. It is likely that further tuning will produce better POWER7 results.
5.1.5.2 Results
The SOABench 2008 Automated Approval workload was used in this study. Results are as follows:
- On a 1-core configuration, POWER7 is 50% better than POWER6; the POWER7 throughput is 261 claims completed per second compared with 174 on POWER6.
- On a 2-core configuration, the throughput is 501 on POWER7 versus 344 on POWER6, a 46% improvement for POWER7.
- On a 4-core configuration, the throughput is 970 on POWER7 versus 691 on POWER6, a 40% advantage for POWER7.
- Finally, on a 6-core configuration, the throughput is 1,408 on POWER7 versus 978 on POWER6, a 44% improvement.

[Chart: SOABench 2008 Automated Approval, WPS 7.0.0.1 - AIX. Claims Completed per second for POWER6 versus POWER7 at 1, 2, 4, and 6 cores; CPU utilization and scaling factors shown above each bar; Simultaneous Multithreading (SMT) enabled.]


Measurement Configuration
WebSphere Process Server: POWER6 4.7 GHz - E; POWER7 3.55 GHz - A
Driver: Intel Xeon 2.93GHz - A; POWER7 3.55 GHz - B


6 WebSphere ESB 7.0.0.1 Performance Results


Each of the measurements presented in the following sections is based on one of a number of performance measurement workloads used to simulate a production environment. A detailed description of each workload, including characteristics, configuration, and measurement specifics, can be found in Chapter 11. Any notes that are pertinent to a particular configuration and measurement are included directly beneath the chart.
Each set of charts sharing a common configuration is preceded by a table that identifies the measurement hardware and software for those particular workloads. Hardware names are cross-referenced to the individual measurement systems section of Appendix A of this document, which includes detailed configuration information for each measurement platform.


6.1 Windows results


6.1.1 Web Services Binding
The following charts show the throughput measured for various mediations using a range of request and response sizes. For details of the mediations and request/response sizes, see sections 11.3 and 11.4. All data is obtained using Web services bindings on a 16-core, non-hyper-threaded WESB server machine. For details of the topology used, see section 11.1.
The JAX-WS SOAP 1.1 binding for Web services was used throughout; this is the default Web services binding in WESB 7.0.0.1.
The following measurement configuration was used for all of these scenarios:

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 2.93GHz - C
Web Services Target: Intel 3.5GHz - B

[Chart: Xform Value Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation sets the value of a single element in the request message and copies all other elements unchanged using the XSL Transform primitive. The request message processing is eligible for deferred parsing. The response message is passed through unmediated and as a result is not parsed.


[Chart: Xform Namespace Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are largely the same but the name of an element differs and the
two schemas have different namespaces. The request and response flows are eligible for deferred
parsing.
This chart highlights a major performance improvement between the 6.2 and 7.0.0.1 releases. The improvement affects all mediations that have JAX-WS bindings on the Export and Import components and are eligible for deferred parsing.


[Chart: Xform Schema Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context header and set in the response message. The use of the context header means
that the transform operates on the root of the message rather than the body and hence the request
and response flows are not eligible for deferred parsing.


[Chart: Route on Body Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
The response message is passed through unmediated and as a result is not parsed.

[Chart: Route on Header Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a string in the SOAP or JMS header. The Web Services workload uses the
Internationalization Context header. The JMS workload uses the JMSCorrelationId header field.
The request message processing does not use the message body which is therefore not parsed.
The response message is passed through unmediated and as a result is not parsed.

[Chart: Message Element Setter Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation sets the value of a single element in the request message using the Message
Element Setter primitive. The request message processing is not eligible for deferred parsing. The
response message is passed through unmediated and as a result is not parsed.
If your mediation flow is otherwise eligible for deferred parsing, an alternative to using the Message Element Setter primitive is to use an XSL Transform primitive to set the required field. This would then be eligible for deferred parsing.
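As a minimal sketch of that alternative, the XSL Transform primitive could apply an identity stylesheet with one overriding template for the field being set. The element name and value below are hypothetical and are not taken from the workload schemas:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy the rest of the message unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override the single element whose value is being set (hypothetical element and value) -->
  <xsl:template match="approvalStatus">
    <approvalStatus>APPROVED</approvalStatus>
  </xsl:template>
</xsl:stylesheet>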


[Chart: BO Mapper Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation uses the Business Object Map primitive to map the body of the request message
into a new Business Object. The request message processing is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.
[Chart: Service Invoke Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation invokes the SOABench target from a Service Invoke primitive. The request
message is unprocessed but is wired to a service invoke which in turn is wired directly to the
inputResponse node. The request message processing is eligible for deferred parsing and there is
no separate response flow. A similar functional flow can be achieved by having separate request


and response flows with no mediation primitives. Testing has shown that these two approaches
are equivalent in performance terms.

[Chart: Fan Out/Fan In Mediation - Windows. Requests/sec for Single Fan, Two Fans, and 4 Fans configurations in V6.2 and V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation invokes multiple SOABench services, sets a field in each response, merges the responses, and transforms the merged response. The request message processing examines a field in the message to establish the number of fan outs (service calls) to invoke. Some additional processing primitives are wired into the flow (see section 11.1 for details), and the response from the fan-in is wired directly to the inputResponse node, as the service calls have already been made. There is no separate response flow. The mediation is not eligible for deferred parsing.


[Chart: Composite Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]

This mediation processes the request message with several mediation primitives. First, a message filter checks a field for authentication; next, a custom mediation logs the message to the console; this is followed by a routing filter that uses a value in the body of the message; finally, an XSLT primitive transforms the message. The request message is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.

[Chart: Chained Mediation - Windows. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 16 cores; CPU utilization shown above each bar.]


This mediation is identical in function to the preceding composite mediation but the primitives
are in separate modules linked by SCA bindings. The request message is not eligible for deferred
parsing.
The response message is passed through unmediated but unlike the composite mediation it is not
eligible for deferred parsing as a result of passing back through the SCA bindings.
A separate directed study in this report compares the performance of different methods of modularization; see section 9.17.


6.1.2 JMS Binding Non Persistent


The following charts show the non-persistent throughput measured for various mediations using a range of message sizes. For details of the mediations and message sizes, see sections 11.3 and 11.4. All data is obtained using JMS bindings on a 4-way hyper-threaded WESB server machine. For details of the topology used, see section 11.2.
The following measurement configuration was used for all of these scenarios:

Measurement Configuration
JMS Producer/Consumer: Intel 2.8GHz - B
WebSphere ESB: Intel 3.0GHz - D

[Chart: JMS Value Mediation - Non Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.


[Chart: JMS Body Routing Mediation - Non Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.

[Chart: JMS Header Routing Mediation - Non Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]


This mediation routes the request message to the target service using the Filter primitive. The
XPath queries the JMSCorrelationId property in the JMS header. The request message processing
does not use the message body which is therefore not parsed.

[Chart: JMS Schema Mediation - Non Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation transforms the request message from one schema to another using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context. The use of the context header means that the transform operates on the root of
the message rather than the body and hence the request flow is not eligible for deferred parsing.


[Chart: JMS Composite Mediation - Non Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation uses a Filter primitive to query a field in the body of the request message; this is followed by a Custom Mediation primitive which could be used for custom logging (no logging takes place in this scenario, to prevent I/O contention). A further Filter primitive is then used to route the message to one of two XSLT Transformation primitives. For this scenario, the transformation detailed in the Schema Mediation is used.
The request message processing is not eligible for deferred parsing (because the body is parsed for the initial Filter) or native form reuse (as the body may have been changed in the Custom Mediation).


6.1.3 JMS Binding Persistent


The following charts show the persistent throughput measured for various mediations using a range of message sizes. For details of the mediations and message sizes, see sections 11.3 and 11.4. All data is obtained using JMS bindings on a 4-way hyper-threaded WESB server machine. For details of the topology used, see section 11.2.
The following measurement configuration was used for all of these scenarios:

Measurement Configuration
JMS Producer/Consumer: Intel 2.8GHz - B
WebSphere ESB: Intel 3.0GHz - D
DB2: Intel 3.5GHz - A

[Chart: JMS Value Mediation - Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.


[Chart: JMS Body Routing Mediation - Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.

[Chart: JMS Header Routing Mediation - Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]


This mediation routes the request message to the target service using the Filter primitive. The
XPath queries the JMSCorrelationId property in the JMS header. The request message processing
does not use the message body which is therefore not parsed.

[Chart: JMS Schema Mediation - Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation transforms the request message from one schema to another using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context. The use of the context header means that the transform operates on the root of
the message rather than the body and hence the request flow is not eligible for deferred parsing.


[Chart: JMS Composite Mediation - Persistent. Requests/sec for 6.2.0.1 versus 7.0.0.1 at 1K, 10K, and 100K message sizes on 4 cores; CPU utilization shown above each bar; Hyper-Threading (HT) enabled.]

This mediation uses a Filter primitive to query a field in the body of the request message; this is followed by a Custom Mediation primitive which could be used for custom logging (no logging takes place in this scenario, to prevent I/O contention). A further Filter primitive is then used to route the message to one of two XSLT Transformation primitives. For this scenario, the transformation detailed in the Schema Mediation is used.
The request message processing is not eligible for deferred parsing (because the body is parsed for the initial Filter) or native form reuse (as the body may have been changed in the Custom Mediation).


6.2 AIX results


6.2.1 Web Services Binding
The following charts show the throughput measured for various mediations using a range of
request and response sizes. For details of the mediations and request/response sizes see sections
11.3 and 11.4. All data is obtained using the Web services bindings on an 8-way pSeries
POWER6 WESB server machine with 1 core enabled. For details of the topology used see section
11.1.
Note that there are some simple cases, where a small message is being passed and there is no processing of the response flow, in which V7.0.0.1 is slower than V6.2. The development team is currently working to resolve this issue.
The following measurement configuration was used for all of these scenarios:

Measurement Configuration
Web Services Client: Intel 3.67GHz - C
WebSphere ESB: PPC 4.2GHz - A
Web Services Target: PPC 4.2GHz - B

[Chart: Xform Value Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation sets the value of a single element in the request message and copies all other elements unchanged using the XSL Transform primitive. The request message processing is eligible for deferred parsing. The response message is passed through unmediated and as a result is not parsed.


[Chart: Xform Nmspc Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are largely the same but the name of an element differs and the
two schemas have different namespaces. The request and response flows are eligible for deferred
parsing.
This chart highlights a major performance improvement between the 6.2.0 and 7.0.0.1 releases. The improvement affects all mediations that have a deferred-parsing-eligible transform and use document-literal WSDL.


[Chart: Xform Schema Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation transforms the request message from one schema to another using the XSL
Transform primitive. It performs the reverse transform on the response message using the XSL
Transform primitive. The schemas are completely different but contain similar data which is
mapped from one to the other. In addition to the transform, a value from the request message is
stored in a context header and set in the response message. The use of the context header means
that the transform operates on the root of the message rather than the body and hence the request
and response flows are not eligible for deferred parsing.


[Chart: Route Header Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a string in the SOAP or JMS header. The Web Services workload uses the
Internationalization Context header. The JMS workload uses the JMSCorrelationId header field.
The request message processing does not use the message body which is therefore not parsed.
The response message is passed through unmediated and as a result is not parsed.


[Chart: Route Body Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.
The response message is passed through unmediated and as a result is not parsed.


[Chart: Message Element Setter Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation sets the value of a single element in the request message using the Message
Element Setter primitive. The request message processing is not eligible for deferred parsing. The
response message is passed through unmediated and as a result is not parsed.
If your mediation flow is otherwise eligible for deferred parsing, an alternative to using the Message Element Setter primitive is to use an XSL Transform primitive to set the required field. This would then be eligible for deferred parsing.


[Chart: Composite Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation processes the request message with several mediation primitives. First, a message filter checks a field for authentication; next, a custom mediation logs the message to the console; this is followed by a routing filter that uses a value in the body of the message; finally, an XSLT primitive transforms the message. The request message is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.


[Chart: Chained Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation is identical in function to the preceding composite mediation but the primitives
are in separate modules linked by SCA bindings. The request message is not eligible for deferred
parsing.
The response message is passed through unmediated but unlike the composite mediation it is not
eligible for deferred parsing as a result of passing back through the SCA bindings.


[Chart: BO Mapper Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation uses the Business Object Map primitive to map the body of the request message
into a new Business Object. The request message processing is not eligible for deferred parsing.
The response message is passed through unmediated and as a result is not parsed.


[Chart: Service Invoke Mediation - AIX. Requests/sec for V6.2 versus V7.0.0.1 across request/response sizes (Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, 100K in/100K out) on 1 core; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation invokes the SOABench target from a Service Invoke primitive. The request
message is unprocessed but is wired to a service invoke which in turn is wired directly to the
inputResponse node. The request message processing is eligible for deferred parsing and there is
no separate response flow. A similar functional flow can be achieved by having separate request
and response flows with no mediation primitives. Testing has shown that these two approaches
are equivalent in performance terms.


6.2.2 JMS Binding Non Persistent


The following charts show the throughput measured for the Transform Value and Body Routing mediations using a range of request sizes. For details of the mediations and request sizes, see sections 11.3 and 11.4. All data is obtained using JMS bindings on an 8-way pSeries POWER6 WESB server machine with 8 cores enabled. For details of the topology used, see section 11.2.1.
The following measurement configuration was used for this scenario:

Measurement Configuration
JMS Producer/Consumer: Intel 3.67GHz - A
WebSphere ESB: PPC 4.2GHz - A

[Chart: JMS Value Mediation Non Persistent - AIX. Requests/sec for 6.2 versus 7.0.0.1 across the measured message sizes on 8 cores; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.


[Chart: JMS Body Routing Mediation Non Persistent - AIX. Requests/sec for 6.2 versus 7.0.0.1 across the measured message sizes on 8 cores; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.


6.2.3 JMS Binding Persistent


The following charts show the throughput measured for the Transform Value and Body Routing mediations using a range of request sizes. For details of the mediations and request sizes, see sections 11.3 and 11.4. All data is obtained using JMS bindings on an 8-way pSeries POWER6 WESB server machine with 8 cores enabled. For details of the topology used, see section 11.2.1.
The following measurement configuration was used for this scenario:

Measurement Configuration
JMS Producer/Consumer: Intel 3.67GHz - A
WebSphere ESB: PPC 4.2GHz - A
DB2: PPC 4.2GHz - B

[Chart: JMS Value Mediation Persistent - AIX. Requests/sec for 6.2 versus 7.0.0.1 across the measured message sizes on 8 cores; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation sets the value of a single element in the request message and copies all other
elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing. The response message is passed through unmediated and as a result
is not parsed.


[Chart: JMS Body Routing Mediation Persistent - AIX. Requests/sec for 6.2 versus 7.0.0.1 across the measured message sizes on 8 cores; CPU utilization shown above each bar; Simultaneous Multithreading (SMT) enabled.]

This mediation routes the request message to the target service using the Filter primitive. The
XPath queries a field in the body of the request message. The request message processing is not
eligible for deferred parsing, however since the message is unchanged the native form is re-used.

6.2.4 Web Services Binding SMP scaling

The results below are for the SOABench 2008 Transform Schema workload running on AIX
using the 10K message lengths. They show the SMP scaling achieved when running the ESB
Server in 3 different configurations: 1-way, 4-way and 8-way. Note that simultaneous multithreading (SMT) is enabled for all measurements.
The measurement configuration below was used for all measurements:

Measurement Configuration
Web Services Client: Intel 3.67GHz - C
WebSphere ESB: PPC 4.2GHz - A
Web Services Target: PPC 4.2GHz - B

[Chart: SMP Scaling Web Services - AIX, Transform Schema scenario (10K in/10K out). Requests/sec for 1, 4, and 8 cores; SMP scaling factor and CPU utilization shown on each multi-processor bar (4 cores: 3.8x, 98%; 8 cores: 7.0x, 97%); Simultaneous Multithreading (SMT) enabled.]


7 WebSphere Business Monitor 7.0.0.0 Performance Results
As shown below, a major performance enhancement to interactive process design was delivered
in WebSphere Business (WB) Monitor V7.0.0.0.
While steps were taken during development to ensure that the performance of WB Monitor
7.0.0.0 in other areas kept pace with the levels achieved in V6.2, in the interest of space we do not
repeat previously published results here. For performance data regarding event delivery, event
processing, and dashboard access, please see the earlier report for more information:
BPM 6.2.0 Performance Report

7.1 Interactive Process Design Improvements


Interactive process design empowers business users to go directly from modeling to deployment
on WebSphere Process Server and WB Monitor runtimes for certain human-centric process
scenarios. A pre-configured business space is created as part of deployment that can be
immediately used for testing process execution, management, and monitoring in managed
environments.
In this study we focus on reductions in the time required to deploy a monitor model directly from
WB Modeler 7.0.0.0 to WB Monitor 7.0.0.0. The Vacation Process model and the associated
monitor model used in this study have these attributes:

- 3 human tasks
- 2 business rules tasks
- 7 KPIs
- 11 metrics
- 3 cubes

Durations reported here are averages of multiple measurements, gathered from an analysis of
messages logged in the server during deployment. The first deploy operation after startup is not
included in the average. This reflects the typical user experience during interactive process
design. We note that the first deploy operation after startup, while taking somewhat longer due to
one-time initialization costs, also benefits substantially from the improvements delivered in
V7.0.0.0.
In the topology used for these measurements, WB Modeler client and WB Monitor server
machines are connected to the same subnet of a shared (non-private) network at 100 Mbps.


[Chart: Interactive Process Design - Vacation Process Monitor Model Deployment Time, Monitor Server: Linux on IA-32. Deployment time in seconds (down is good) for V6.2 versus V7.0.]

Measurement Configuration
WB Modeler Client: Intel 2.2GHz D2D1
WB Monitor Server: Intel 3.0GHz D2D2

Deployment time is reduced in V7.0.0.0 to less than half of the time needed in V6.2 due to several improvements, notably:
- Exploiting the new EJB 3.0 support available in the WebSphere V7 Application Server, which underpins the runtime of WB Monitor V7. This eliminates the need for a separate EJB deploy step.
- Streamlining the EAR generation process by dramatically reducing disk I/O.


8 WID 7.0.0.1 and Modeler 7.0.0.1 Performance Results


The measurements presented in the following sections demonstrate results generated by
workloads used to simulate common activities in a development environment making use of
WebSphere Integration Developer (WID), WebSphere Business Modeler (WBM) and WebSphere
Process Server (WPS). A detailed description of the workloads can be found in Chapter 12.
Section 8.1 uses the Order Processing, Loan Processing & Customer Service workloads to
demonstrate the Response Time & Peak Memory consumption (within the Java Heap) when
executing a Clean & Build operation within the WID.
Section 8.2 uses the Loan Processing workload to demonstrate Response Time when deploying
the applications to WPS using the Add/Remove Project Dialog within the WID.
Section 8.3 uses the BPM@Work workload to demonstrate Response Time when deploying
business processes to WPS directly from WBM.
Each of these sections compares results from version 7.0.0.1 of WID and WPS with those from
previous releases.
Each data chart presented is followed by a table that identifies the measurement cell for the
particular workload. The primary software and hardware systems are identified. Hardware
names are cross-referenced to the individual measurement systems descriptions in Appendix A of
this document, which provide detailed configuration information for each measurement platform.

8.1 Build Activities


Execution of a Clean of all artifacts followed by a Build of the entire workspace is measured in order to demonstrate the relative performance of a broad spectrum of build-time operations within the WID. Both total response time and peak liveset within the Java Heap are measured for this operation. WID 7.0.0.1 shows improvements relative to previous versions of the product in both metrics and across a broad range of workspaces.


8.1.1 Order Processing Workload


Clean & Build of the Order Processing workspace in WID 7.0.0.1 completes in 69 seconds, a 30% improvement relative to version 6.2.0.1 and a 2.3x improvement relative to version 6.1.0.
Peak live data within the WID's Java Heap is 215 MB during the Clean & Build operations, a 12% improvement relative to WID 6.2.0 and a 22% improvement relative to WID 6.1.0.

[Chart: Order Processing Workload, Clean & Build Response Time - Windows, 2 Core. Seconds: WID 6.1.0.0: 156; WID 6.1.0.1: 124; WID 6.1.2: 99; WID 6.2.0.1: 98; WID 7.0.0.1: 69.]

[Chart: Order Processing Workload, Clean & Build Peak Java Memory - Windows, 2 Core. Megabytes, falling from 276 for WID 6.1.0.0 through 272, 245, and 240 across the 6.1.x and 6.2.0.1 releases to 215 for WID 7.0.0.1.]


8.1.2 Loan Processing Workload


Clean & Build of the Loan Processing workspace in WID 7.0.0.1 completes in 86 seconds, a 13% improvement relative to version 6.2.0.1 and a 4.7x improvement relative to version 6.1.0.
Peak live data within the WID's Java Heap is 227 MB during the Clean & Build operations, a 17% improvement relative to WID 6.2.0.1 and a 19% improvement relative to WID 6.1.0.

[Chart: Loan Processing Workload, Clean & Build Response Time - Windows, 2 Core. Seconds: WID 6.1.0.0: 407; WID 6.1.0.1: 203; WID 6.1.2: 182; WID 6.2.0.1: 99; WID 7.0.0.1: 86.]

[Chart: Loan Processing Workload, Clean & Build Peak Java Memory - Windows, 2 Core. Megabytes: WID 6.1.0.0: 281; WID 6.1.0.1: 281; WID 6.1.2: 278; WID 6.2.0.1: 273; WID 7.0.0.1: 227.]


8.1.3 Customer Service Workload


Clean & Build of the Customer Service workspace in WID 7.0.0.1 completes in 91 seconds, a 45% improvement relative to version 6.2.0.
Peak live data within the WID's Java Heap is 283 MB during the Clean & Build operations, a 32% improvement relative to WID 6.2.0.

[Chart: Customer Service Workload, Clean & Build Response Time - Windows, 2 Cores. Seconds: WID 6.2.0: 164; WID 6.2.0.1: 144; WID 7.0.0.1: 91.]

[Chart: Customer Service Workload, Clean & Build Peak Java Memory - Windows, 2 Cores. Megabytes: WID 6.2.0: 417; WID 6.2.0.1: 324; WID 7.0.0.1: 283.]


Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - A


8.2 Publish Activities


In this section we examine total response time when publishing the Loan Processing workload to
WPS via execution of the Add All Projects dialog within WID. During publish, if application
deploy code has not already been generated, WID will generate it as part of the publish operation.
However, there are some cases where WID is able to cache the generated deploy code within the
application. We will look at each of these scenarios separately.

8.2.1 Publish Including Generation of Deploy Code


When using WID & WPS version 6.2, we recommend publishing with Resources in the Workspace (and the minimize file copies checkbox enabled) where applicable, in order to improve application install responsiveness. In version 7.0.0.1, this option is no longer supported, so we publish the application with Resources on the Server.
Publishing the Loan Processing workspace completes in 611 seconds when using WID & WPS version 7.0.0.1, a 1.9x improvement compared with version 6.2.0 (with Resources on the Server) and a 40% improvement over 6.2.0 (with Resources in the Workspace).

[Chart: Loan Processing Workload, Publish Response Time - Windows, 2 Core. Seconds: BPM 6.2 with Resources on the Server (RoS): 1145; BPM 6.2 with Resources in the Workspace (RoW): 1018; BPM 7.0.0.1 with Resources on the Server: 611.]

Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - B


8.2.2 Publish with Deploy Code Cached in the Application


Beginning with version 7 of WID & WPS, generation of Deploy Code is much more efficient, greatly reducing the difference in publish response time due to having deploy code cached within the application.
In this case, publishing the Loan Processing workspace completes in 538 seconds when using WID & WPS version 7.0.0.1, a 25% improvement compared with version 6.2.0 (with Resources on the Server). Publish responsiveness in WID & WPS 7.0.0.1 (with Resources on the Server) is comparable to that seen in version 6.2.0 (with Resources in the Workspace).
Note: in these measurements, the 3% difference between the results using versions 7.0.0.1 and 6.2.0 is not statistically significant.

[Chart: Loan Processing Workload, Publish Response Time - Windows, 2 Core. Seconds: BPM 6.2 with Resources on the Server (RoS): 713; BPM 6.2 with Resources in the Workspace (RoW): 521; BPM 7.0.0.1 with Resources on the Server: 538.]

Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - B


8.3 Direct Deploy Activities


In this section we examine total response time when deploying the BPM@Work workload to
WPS via execution of the Verify Process Design dialog within WebSphere Business Modeler.
The BPM@Work workload is described in Section 12.4.
Deploying the BPM@Work workspace completes in 153 seconds when using WB Modeler &
WPS version 7.0.0.1, a 2.7x improvement compared with version 6.2.0.

[Chart: BPM@Work Workload, Deploy Response Time - Windows, 2 Core. Seconds: BPM 6.2: 412; BPM 7.0.0.1: 153.]

Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - B


9 Directed Studies
This section provides a more detailed exploration of some features, along with development and
deployment options, within WPS, WESB, and WID. Generally, these studies are motivated by
lessons learned in the course of performance analysis of these products, or direct interaction with
WebSphere Business Process Management customers. Each of these studies is meant to illustrate
a set of issues that may be of interest, but is not intended to provide an exhaustive analysis of the
component in question. Several of the studies also support points made in the Architecture Best
Practices and Development Best Practices sections above.
Note that some of the directed studies below contain the same information as was presented in
earlier versions of the performance report; these studies were not repeated using WebSphere
BPM 6.2.0 since the conclusion would not change significantly. The charts and section headers
are clearly labeled to indicate this.

9.1 Throughput for 32-bit JVM on 32-bit and 64-bit Windows


The Windows operating system has both 32-bit and 64-bit modes of operation. This is separate
and distinct from 32-bit and 64-bit JVMs, discussed in section 9.2. The chart below compares the
performance of a WPS 32-bit installation on both a 32-bit and 64-bit version of the Windows
2008 operating system. Equivalent hardware is used for each measurement. As shown in the
chart below, WPS delivers 6% better throughput on the 64-bit version of the Windows 2008
operating system, measured using the SOABench 2008 OutSourced workload described in
section 10.4.3. This is likely due to more efficient memory management in the 64-bit version of
Windows 2008.

[Chart: SOABench 2008 OutSourced Mode, WPS 7.0.0.1 32-bit JVM. Claims Completed per second at 4 cores: 26.6 on the 32-bit Windows 2008 operating system versus 28.3 on the 64-bit Windows 2008 operating system; CPU utilization 99% across all bars; hyperthreading not supported.]


Measurement Configuration
WPS: Intel 2.93 GHz - B
Driver, SOABench Services 1: Intel 3.5 GHz - C
SOABench Services 2: Intel 2.93 GHz - C
DB2: PPC 2.2 GHz - B

9.2 Throughput and memory usage for 64 bit JVM on AIX


9.2.1 Introduction
WPS offers both a 32 bit and a 64 bit version. An advantage of the 64 bit version is that additional heap space can be utilized in WPS; for example, a large heap was used for the 64 bit evaluation in the large object study in section 9.14.2 of this document.
A drawback of the 64 bit version in earlier WPS releases is the additional memory space used by objects on the heap. The memory liveset is greater after server startup and greater after applications are run, for instance, when application caches become populated. In addition, when applications are running, more frequent garbage collection occurs due to the additional space used by transient objects at runtime. This impacts throughput.
In the 64 bit version of WPS 7.0.0.1, the drawback of additional memory when compared to the 32 bit version is reduced due to an underlying JVM enhancement to use compressed object references. For 64 bit WPS 7.0.0.1, this is the default behavior due to the JVM argument -Xcompressedrefs. This improves both 64 bit throughput and memory utilization.

9.2.2 Throughput Results


The Automated Approval workload of SOABench 2008, described in section 10.4.2, is evaluated
on an IBM pSeries, 4.7 GHz POWER6, 4 core SMP system running AIX to demonstrate the
throughput characteristics of WPS business choreography in this configuration. The driver issued
3 KB requests and the server returned 3 KB responses. The workload was run in infrastructure
mode, making the processing behind service call invocations trivial.
With WPS 6.2.0.1, the workload runs at a rate of 441 Claims Completed per Second (CCPS).
WPS 7.0.0.1 achieved 502 CCPS with the same configuration, a 14% improvement. The WPS
Java heap size was set to 3000M and the GC policy and nursery size were set as follows:
-Xgcpolicy:gencon -Xmn2000M


[Chart: SOABench 2008 Automated Approval, 64 bit WPS Performance - AIX. Claims Completed per second for WPS 6.2.0.1 versus 7.0.0.1 on 4 cores; CPU utilization 99%; Simultaneous Multithreading (SMT) enabled.]

Measurement Configuration
WebSphere Process Server
POWER6 4.7 GHz D

Driver
Intel Xeon 2.93GHz -A

DB2
POWER6 4.7 GHz D

9.2.3 Memory Footprint Results


Using the same SOABench 2008 automated approval workload running on the same hardware
and software configuration as above, we measured the Java heap liveset memory footprint on
both 32-bit and 64-bit WPS systems. The memory footprint measured in this study is the average
heap occupancy after garbage collection as reported by verbosegc. Each measurement on the
chart below uses the following JVM parameters:
-Xgcpolicy:optthroughput -ms2048m -mx2048m
The chart below shows the 64-bit heap occupancy dropping from 338 MB for WPS 6.2.0.1 to 248
MB for WPS 7.0.0.1, an improvement of 26%. Also note that the 64-bit heap usage on WPS
7.0.0.1 is only 8% greater than the 32 bit version, which has a heap occupancy of 230 MB. By
contrast, the 64 bit heap occupancy for WPS 6.2.0.1 is 51% greater than the 32 bit version which
uses 223 MB.
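For readers who want to reproduce this style of footprint measurement, heap occupancy after collection can be observed by adding the standard verbose garbage collection arguments to the server JVM and averaging the occupancy reported after each global collection. A minimal sketch follows; how the log was captured and averaged for this report is not stated, so the log file name here is an assumption:
-verbose:gc -Xverbosegclog:verbosegc.log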


[Chart] SOABench 2008 Automated Approval, liveset memory on AIX (megabytes, lower is better): WPS 6.2.0.1 uses 223 MB (32-bit) and 338 MB (64-bit); WPS 7.0.0.1 uses 230 MB (32-bit) and 248 MB (64-bit).

Measurement Configuration
WebSphere Process Server: POWER6 4.7 GHz D
Driver: Intel Xeon 2.93 GHz - A
DB2: POWER6 4.7 GHz D

9.3 Throughput and response time for up to 10,000 concurrent users

9.3.1 Introduction
The SOABench 2008 InHouse Claim Processing workload, described in Section 10.4.4, is
evaluated on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core) running Windows 2008
Server. This workload is used to demonstrate the throughput and response time characteristics of
WebSphere Process Server business choreography as an increasing number of users are
concurrently processing insurance claims. Before the workload runs, 50,000 process instances
representing existing insurance claim activity are preloaded into the business process
choreography database. The insurance claims are divided equally among 125 regions. Users
belong to a single region and can only process insurance claims from their region, which is
enforced via authentication by a Tivoli Directory Server. Within a region, users are divided into two
groups: adjusters and underwriters. Of the four human tasks required to complete an insurance
claim, two are done by adjusters and two are done by underwriters.
Users query active process instances for a list of work that they can perform. A work item is
claimed (selected from the list) and then completed by the user. Users think between query,
claim, and complete activities. The think time is random but averages a total of 180 seconds per
human task. The time a user waits for responses to their human task queries, claims and
completes is recorded as response time. The rate at which insurance claims are completed is the
throughput. Once an entire insurance claim is finished, another is added to the region to maintain
active process instances at a constant level.
A multi-tier topology was used for this study:

A database server which holds the Choreography and Messaging databases.

A WPS Server which runs the processes involved in the application scenario.

A Tivoli Directory Server with LDAP database for user authentication.

2 support systems which each run workload generators (client agents) under the direction
of a single client controller. One support system handles asynchronous service requests
and the other handles synchronous service requests by the business processes running on
the WPS Server.

9.3.2 Results: 4 WPS server cores


Throughput and response time curves were first generated by running the workload on the WPS
server with 4 cores; they are presented in the chart below. 8-core results are presented in the next
subsection. The right y-axis is the throughput in Claims Completed per Second (CCPS). The left
y-axis is the average response time in milliseconds. The chart only shows query response time, as
the claim and complete response time averages were always faster.
Throughput increases steadily as additional load is applied, reaching 11.5 CCPS, and response
time remains flat, under 133 milliseconds, up to the 8400 user level. At this point the WPS CPU
utilization is 85%. The addition of 480 more users, 8880 total, causes the CPU utilization to
increase to 99%, throughput to rise slightly to 11.8 CCPS, and query response time to climb
abruptly to 1.5 seconds. WPS server CPU is clearly the bottleneck at this point. At the 9040 user
level CPU remains at 99%, throughput flattens and response time for queries increases to 2.3
seconds. This increase in response time is a byproduct of the WPS server CPU being saturated;
throughput is maintained constant but since there are additional users submitting requests,
average response times increase. In other words, the system is well behaved and continues to
process work efficiently even though it is being driven beyond its capacity. Expanding WPS
capacity to handle the additional user load is discussed in the following subsection.
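As a rough sanity check, the measured throughput is close to what the think time alone predicts. Each insurance claim requires four human tasks, each carrying an average of 180 seconds of think time, so an active user completes roughly one task every 180 seconds (response times of a few hundred milliseconds are negligible by comparison). At 8,400 users this gives about 8,400 / 180, or roughly 47 tasks per second, which is about 11.7 claims completed per second, in line with the measured 11.5 to 11.8 CCPS. The same arithmetic predicts about 13.9 CCPS at the 10,000 user level, consistent with the 13.8 CCPS measured on 8 cores in the next subsection.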


[Chart] SOABench 2008 In House Claim Processing, WPS 7.0.0.1 on Windows 2008 Server, 4 cores: throughput (Claims Completed per second) and average query response time (ms) versus user load. Annotations on the curves: 59% CPU at 6,000 users, 85% CPU at 8,400 users, 99% CPU at 8,880 users.
Preload: 125 regions with 400 tasks each = 50,000 total tasks; 80 users per region. Total task think time: 180 seconds.

Measurement Configuration
WebSphere Process Server: Intel Xeon 2.93 GHz - A
Driver 1: Intel 3.67 GHz B
Driver 2: Intel 3.0 GHz B
DB2: POWER6 4.7 GHz E

9.3.3 Results: 8 WPS server cores


The 4 core CPU bottleneck on the WPS server can easily be eliminated by running the workload
on an 8 core server. These results are presented in the chart below. Throughput increases steadily
as additional concurrent users are added, reaching 13.8 CCPS, and response time remains flat at
under 119 ms, up to the 10,000 user level. At this point the WPS CPU utilization is 56%.
Additional concurrent users beyond 10,000 were not evaluated. However, the 4 core results
above suggest that throughput would rise in concert with the increased user load while
maintaining flat response times up to the point where either the WPS CPU became saturated, or
some other system bottleneck was reached.


[Chart] SOABench 2008 In House Claim Processing, WPS 7.0.0.1 on Windows 2008 Server, 8 cores: throughput (Claims Completed per second) and average query response time (ms) versus user load. Annotations on the curves: 31% CPU at 6,000 users, 44% CPU at 8,400 users, 56% CPU at 10,000 users.
Preload: 125 regions with 400 tasks each = 50,000 total tasks; 80 users per region. Total task think time: 180 seconds.

Measurement Configuration
WebSphere Process Server: Intel Xeon 2.93 GHz - A
Driver 1: Intel 3.67 GHz B
Driver 2: Intel 3.0 GHz B
DB2: POWER6 4.7 GHz E

9.4 Business Space response time for Human Workflow


Page load times for Business Space for Human Workflow widgets improved by up to 55% in
WPS 7.0.0.1 compared to the 6.2.0.2-based Feature Pack.
All measurements were performed manually with a single browser user on a Business Space
deployment based on the Advanced Human Workflow template. The response time data is
obtained on a client machine which is connected to the server by a 1 Gigabit ethernet. After
opening the browser and loading the Business Space home page, the user performs the following
set of actions:
1. Log on (and load the My Work page)
2. Refresh the page (by clicking the My Work tab)
3. Switch to the page Manage Human Tasks

Copyright IBM Corporation 2005, 2010. All right reserved.

Directed Studies

135

4. Log out
The measurement for the initial iteration of the above steps is discarded, so the results below
utilized a primed browser cache. The results in this study show the average of the subsequent
eight measurement iterations in the browser.

Client hardware
OS: Windows XP (32-bit)
CPU: 1 x Intel Centrino Dual Core processor, 1.8 GHz
Memory: 2 GB RAM
Network: 1 Gigabit Ethernet connection

Server hardware environment (standalone configuration)
OS: Windows Server 2003 (32-bit)
CPU: 2 x Intel Xeon 5160 @ 3 GHz
FSB: 1333 MHz
Memory: 16 GB RAM
HDD: 8 internal disks (2.5", 68 GB SATA), IBM ServeRAID 8k (256 MB buffer)
Network: 1 Gigabit Ethernet connection

Software environment
WPS: Version 7.0.0.1; single server setup; local process database on DB2; all other databases on Derby
DB: DB2 version 9.1.301.314

9.5 Process Instance Migration Performance


This study evaluates the total time to migrate business process instances to WPS 7.0.0.1. The
summary of results follows (a short consistency check appears after the list):

All measured performance numbers (response time, throughput, and parallelism) remain
consistent for all scenarios evaluated (100, 1,000, and 10,000 migrated instances)

Response time: 450 milliseconds per process instance

Throughput: 22 instances migrated per second

Parallelism: 9 threads performed instance migration in parallel

The total elapsed time of the migration grows linearly with the number of migrated
instances, as expected: migrating 100 process instances takes 4.5 seconds, 1,000 process
instances takes 44.7 seconds, and 10,000 process instances takes 453.8 seconds.
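These figures are mutually consistent: with 9 threads migrating instances in parallel at roughly 450 milliseconds per instance, the expected rate is about 9 / 0.45, or roughly 20 instances per second, close to the observed 22 per second; and 10,000 instances at 22 per second works out to roughly 455 seconds, matching the measured 453.8 seconds.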

Test hardware environment (standalone configuration)
OS: Windows Server 2003 (32-bit)
CPU: 1 x Intel Xeon 5160 @ 3 GHz
FSB: 1333 MHz
Memory: 16 GB RAM
HDD: 8 internal disks (2.5", 68 GB SATA), IBM ServeRAID 8k (256 MB buffer)

Workload and measurement methodology

For each measurement run the given number of instances (100, 1,000, or 10,000) are created from
the original version of the process template. All instances navigate to a BPEL Parallel-Activity
that consists of a number of Task-Activities and remain in this process instance state. After all
instances reach this state, the migration of all instances is triggered. This migration is performed
by a number of parallel threads that invoke the migrate() method of the BusinessFlowManager's
remote EJB interface. The Total Response Time depicted on the chart below represents the
duration of all process instance migrations for a measurement run. All measurements are based
on the duration of the synchronous migrate() method call. A migration is considered complete
after this call completes.
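The following is a minimal sketch of such a measurement loop. It is illustrative only: the placeholder BusinessFlowManager interface and the migrate() signature shown here are assumptions standing in for the real BPC remote EJB API, and they do not reproduce the harness used for this report.

import java.util.*;
import java.util.concurrent.*;

// Hypothetical stand-in for the BPC remote EJB interface; the real API uses
// BPC-specific types (for example PIID and PTID), not plain strings.
interface BusinessFlowManager {
    void migrate(String processInstanceId, String targetTemplateId);
}

public class MigrationTimer {
    // Migrates all instances using the given number of parallel worker threads and
    // returns the total elapsed time in milliseconds (the "Total Response Time").
    static long timeMigration(final BusinessFlowManager bfm, List<String> instanceIds,
                              final String targetTemplateId, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads); // 9 threads in this study
        List<Future<?>> futures = new ArrayList<Future<?>>();
        long start = System.currentTimeMillis();
        for (final String piid : instanceIds) {
            futures.add(pool.submit(new Runnable() {
                public void run() {
                    // A migration is considered complete when this synchronous call returns.
                    bfm.migrate(piid, targetTemplateId);
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for every instance migration to finish
        }
        pool.shutdown();
        return System.currentTimeMillis() - start;
    }
}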

[Chart] Process Instance Migration Duration: total migration duration in seconds (logarithmic scale) versus the number of migrated process instances: 4.5 s for 100 instances, 44.7 s for 1,000 instances, and 453.8 s for 10,000 instances.

9.6 BPC Query Response Time


BPC queries for a particular object, such as a specific human task or a specific process instance,
are typically fast because the database can return the requested information without performing
complex joins and calculations. In contrast, task and process list queries require the database to
perform joins and calculations in order to apply filters and sort criteria.
To make task and process list queries faster, the database must be given the information it needs
to use an optimized access path. This can be achieved by different technologies, such as BPC
query tables, an optimized index and table structure, and up-to-date statistics. The following
measurements show performance improvements achieved in WPS 6.2.0 and later compared to previous
versions of WPS. The performance improvements are primarily attributable to:

changes to the physical representation of work items (that is, changes to the BPC
database schema)

changes to the index structure on BPC database tables

BPC query tables, which are introduced with WPS 6.2.0

The measurements in sections 9.6.1 and 9.6.2 have been made on the following machine setup:

Operating System: Microsoft Windows Server 2003 on all machines

Two physical machines: WPS server (standalone setup) and remote database (DB2 v9)

Relevant hardware details:


o IBM xSeries 3650, 4 x 3.0 GHz, 16 MB cache, 16 GB memory (DB2 server and WPS server)
o Gigabit Ethernet network connection

The measurements are CPU intensive and do not lead to an I/O bottleneck

All measurements have been made with a preloaded database with ~250,000 process instances.

9.6.1 Query Table Response Time


WPS 6.2.0 introduced query tables as a mechanism to achieve very good response times for
human task queries. Query tables are optimized for task and process list queries; they are
developed visually using the Query Table Builder and accessed using the query table API. Please
see the following link for more information:
Query Table Builder: http://www.ibm.com/support/docview.wss?uid=swg24021440
Query tables are a BPC-level concept (unlike database query tables), which only exist in the
context of BPC. BPC query tables do not have a process navigation performance impact.
Comprehensive documentation is published at the following location in the Info Center:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp?topic=/com.ibm.websphere.bpc.doc/doc/bpc/c6bpel_querytables.html
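To illustrate the difference between the two programming models, the sketch below contrasts a task-list query expressed against the standard query API with one against a query table. It is purely schematic: the placeholder interface, method names, parameter lists, and the query table name are assumptions and do not reproduce the exact BPC API signatures.

import java.util.List;

// Schematic stand-in for the BPC task/process query APIs; real code uses the
// HumanTaskManager / BusinessFlowManager EJB interfaces and their API types.
interface TaskQueryClient {
    // Standard query API style: the caller assembles select/where/order-by clauses,
    // which the database must join and filter for every request.
    List<String[]> query(String selectClause, String whereClause,
                         String orderByClause, int threshold);

    // Query table style: filter, authorization, and sort criteria were modeled in
    // Query Table Builder at development time; the call simply names the table.
    List<String[]> queryEntities(String queryTableName, int threshold);
}

class TaskListQueries {
    static void listReadyTasks(TaskQueryClient client) {
        // Standard query API (access path computed per request):
        List<String[]> viaStandardApi = client.query(
                "DISTINCT TASK.TKIID, TASK.NAME",
                "TASK.STATE = TASK.STATE.STATE_READY",
                "TASK.CREATED DESC",
                50);

        // Query table ("ACME.MY_TASKLIST" is a hypothetical query table name):
        List<String[]> viaQueryTable = client.queryEntities("ACME.MY_TASKLIST", 50);
    }
}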
In order to demonstrate the performance improvements that can be achieved by using query
tables, two workloads have been defined: the QueryProperties query workload and ExternalData
query workload. The QueryProperties query workload represents a task list which contains
human tasks along with 10 query properties that have been defined on the related business
processes. The ExternalData query workload represents a task list which contains human tasks
along with 10 properties from an externally defined table (business data) that has been filled
with one entry for each human task.
Both workloads were run using the standard query API as well as query tables. Each workload has the
following characteristics:

250,000 business processes with one human task in state ready are available in the
database. Group work items are used to assign human tasks. 1,000 users are defined,
divided into 200 groups. A limit of 50 human tasks is returned by each query.

10 simulated users continually execute queries during the measurement interval in order
to measure the average response time. Therefore, the database is executing 10 parallel
queries continuously during the measurement interval.

No special tuning has been applied to WPS beyond that recommended in this report.

Note that the database is the bottleneck for these measurements, running at 100% CPU
utilization. Standard BPC database tuning was applied, described in the following:
WebSphere Process Server V6.1 Business Process Choreographer: Performance
Tuning Automatic Business Processes for Production Scenarios with DB2
http://www.ibm.com/support/docview.wss?uid=swg27012639
Improving the performance of complex BPC API queries on DB2
http://www.ibm.com/support/docview.wss?uid=swg21299450

The following figure shows a screenshot of the query table used for the QueryProperties query
workload:


Figure 1: QueryProperties query workload Query Table Builder screenshot

The following chart summarizes the query response times achieved using WPS 7.0 with query
tables versus the response times achieved using WPS 6.1.2 with the standard query API. As
demonstrated below, WPS 7.0 queries are up to 20 times faster than WPS 6.1.2 due to the query
table optimization. In addition, these results were obtained without expert-level database
tuning; only the standard tuning described in this document and in the links above was applied.
[Chart] WPS 7.0 query tables versus WPS 6.1.2 standard query API: response times of 5.7 and 3.7 seconds with the WPS 6.1.2 standard query API versus 0.2 and 0.27 seconds with WPS 7.0 query tables for the ExternalData and QueryProperties query workloads.

Figure 2: Query workloads results (response time in seconds)

9.6.2 BPC Explorer Response Time (WPS 6.2.0 data)


The data presented below was obtained using WPS 6.2.0, but the authors expect similar
performance using WPS 7.0 for these scenarios.
WPS 6.2.0 delivers an optimized index structure for the BPC database, which improves the
response times of queries performed by the BPC Explorer. Changes to the physical
representation of work items in the database also produce a positive performance impact.
The following tuning was applied to achieve the results shown in Figure 3 below:

No tuning has been applied to the WPS server

No specific database tuning was applied other than that described in Section 9.6.1 above.

The default set of indexes provided with the WPS 6.2.0 installation was used; no
additional indexes were created.

Figure 3 shows BPC Explorer query response times obtained using a pre-filled BPC database
with the following characteristics:

250,000 processes in total

The processes have been navigated to 8 different states:


o

5,000 processes in state terminated

12,000 processes in state waiting

50,000 processes waiting for a sub-process to respond

100,000 processes with a human task assigned to a group (group work item)

3,000 processes with escalated human tasks

75,000 processes with 10 invoke activities executed

5,000 processes in state failed

These results demonstrate that BPC Explorer query response times are significantly improved in
WPS 6.2.0 by a factor of up to 7.5 times when compared to WPS 6.1.2.


[Chart] BPC Explorer query response time in seconds for the queries My ToDos (Tasks), Administered By Me (Tasks), Instance Details (Processes), and Template Details (Processes), comparing the 6.1.2 index structure with the 6.2 index structure; the largest improvement is roughly 7.5 times.

Figure 3: BPC Explorer improvements in WPS 6.2


9.7 WPS Release-to-Release improvements


9.7.1 SOABench 2008 Automated Approval (microflow)
Over a series of releases, significant performance improvements have been made in WPS. The
SOABench Choreography Facet, Automated Approval mode (a microflow) is used to
demonstrate the magnitude of these improvements. The results shown below indicate that on a
Windows system this workload is 250% faster in WPS 7.0.0.1 than in WPS 6.0.2.1, and 23%
faster in WPS 7.0.0.1 than in WPS 6.2.0.1.

[Chart] SOABench Choreography Facet - Automated Approval, Windows release history (4 cores): throughput as a percentage of WPS 6.0.2.1 throughput for WPS 6.0.2.1, WPS 6.1.0.0, WPS 6.2.0.1, and WPS 7.0.0.1.

Note that the chart above uses 2 versions of the SOABench workload; SOABench 2005
Automated Approval Mode and SOABench 2008 Automated Approval Mode. The 2005 version
was used previously to obtain the WPS 6.0.2.1 and 6.1.0 results. The bridge between the 2
different versions of the workload was built by running WPS 6.2.0.1 on both versions of the
workload, and then running WPS 7.0.0.1 on the 2008 version. Therefore, the results presented
above are normalized throughput rather than raw throughput, since the two versions of the workload
do not produce comparable throughput; SOABench 2008 is more complex, as shown in the
workload descriptions referenced above.

9.7.2 Banking (macroflow)


The throughput of long running processes (macroflows) in WPS has consistently improved
release to release since the initial 6.0.0 release. This is true across a number of different scenarios
and workloads; we demonstrate long running process improvements below using the Banking
(JMS) workload, a macroflow that models a mortgage loan application. We present results
measured using WPS 6.0.0.0, 6.0.1, 6.0.1.1, 6.0.2, 6.1.0, and 6.2.0. Using this workload, WPS
6.2.0 is 3.8 times faster than WPS 6.0.0. In addition, WPS 6.2.0 is 10% faster than WPS 6.1.0.
Note that we expect that WPS 7.0 performs similarly to WPS 6.2.0.
Tuning parameter settings for Banking are described in Appendix A - Banking Settings. One key
configuration difference starting with WPS 6.1.0 is the usage of filestores for the messaging
buses, as opposed to using local databases in previous releases. Another key difference is the use
of WorkManager based navigation and the gencon garbage collection policy in 6.2.0.

[Chart] WPS 6.0.0 -> 6.2.0 throughput improvement, Banking on Windows, 1 CPU, JMS: Business Transactions per second (BTPS) for WPS 6.0.0, 6.0.1, 6.0.1.1, 6.0.2, 6.1.0, and 6.2.0, with core utilization of 98-100% shown above each bar.

Measurement Configuration
WebSphere Process Server, DB2: Intel 3.0 GHz A

9.8 Impact of Varying Number of Active Business Process Instances

9.8.1 Throughput as Preloaded Process Instances Increase
The SOABench 2008 Outsourced Claim Processing workload, described in section 10.4.3, is
evaluated on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core) running Windows 2008
Server. This workload is used to demonstrate the throughput characteristics of WebSphere
Process Server business choreography as an increasing number of active interruptible business
process instances are preloaded into the business process choreography database. An active
process instance is defined as one not yet completed. It can be in-flight, but it can also be
persisted into the business process choreography database if it is waiting for a response from an
outbound service call. The client driver maintains a constant number of active process instances
by issuing new 3 KB requests as processes in the system are completed.
A three tier topology was used for this study:

A database server which holds the Choreography and Messaging databases.

A WPS Server which runs the processes involved in the application scenario.

Two client systems. One runs a client driver and an application to handle asynchronous
service requests. The other runs an application to handle synchronous service requests.

As shown below, throughput remains essentially constant as the active number of process
instances is varied between 2,500 and 1,000,000. With 2,500 and 25,000 preloaded process
instances, WPS 7.0.0.1 runs the workload at a rate of 28.4 Claims Completed per Second (CCPS).
With 125,000 and 250,000 process instances preloaded, the workload runs at nearly the same rate,
28.2 and 28.3 CCPS respectively. With 500,000 and 1,000,000 preloaded process instances, the
rate dips very slightly to 28.1 and 27.9 CCPS, respectively.

[Chart] SOABench 2008 Outsourced Claim Processing, increasing process instances preload, throughput on Windows: Claims Completed per second at preload levels of 2.5K, 25K, 125K, 250K, 500K, and 1000K process instances, essentially flat across all levels. CPU utilization was 99% at each preload level.

Measurement Configuration
WebSphere Process Server: Intel Xeon 2.93 GHz - A
Driver 1: Intel 3.67 GHz B
Driver 2: Intel 3.0 GHz B
DB2: POWER6 4.7 GHz E

9.8.2 Database System Behavior


The DB2 system hosts the Business Process Choreographer database, the Business Process
Choreographer Message Engine database, and the WPS System Message Engine database. It is an
8 core, 4.7 GHz POWER6 system running DB2 on AIX. It is configured with four RAID 10
arrays each with twelve disks. A file system for database containers is striped across 2 of the
arrays and a file system for database logs is striped across the other 2.
Preloaded process instances are stored in the Business Process Choreographer database. The
behavior of the database system is of interest because the preloaded process instances populate
this database. The following table shows the CPU utilization, disk utilization, and I/O wait during peak
throughput at various preload levels. The rise of database container disk utilization and I/O wait
slows after the 250,000 preload, indicating that the disk subsystem can provide adequate response
at these throughput and preload levels.

Preloaded Process Instances:            2.5K    25K     125K    250K    500K    1000K
Workload Throughput (CCPS):             28.4    28.4    28.2    28.3    28.1    27.9
Database System CPU Utilization:        19%     19%     19%     19%     20%     20%
Database Container Disk Utilization:    2%      5%      23%     39%     40%     41%
Database Logs Disk Utilization:         9%      9%      9%      9%      9%      9%
Database System I/O Wait:               1.4%    1.5%    3.7%    6.3%    6.5%    7.0%

The amount of disk storage needed for the Business Process Choreographer database as the
number of process instances increases is shown in the chart below. This information was obtained
using the DB2 Control Center Storage Manager. The second chart shows the size of a database
backup at various preloads. The backups were created using the command: DB2 BACKUP
DATABASE database.
For both charts a 2x growth in the preloaded tasks results in a 2x growth in the storage
requirements, reaching approximately 77 Gigabytes at 1,000,000 preloaded tasks.
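A minimal sketch of such a backup invocation follows; the database name and target path are illustrative assumptions, not the values used for this report:
db2 backup database BPEDB to /backup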


[Charts] SOABench 2008 Outsourced Claim Processing, increasing process instances preload (AIX, DB2): Business Process Choreographer database size and backup size in gigabytes of disk storage. As the preload doubles, the size approximately doubles. Database size was determined using a DB2 Storage Manager snapshot; backup size was determined from a backup of the database saved to disk.

Preloaded Process Instances:    2.5K    25K     125K    250K    500K    1000K
Database size (GB):             0.30    2.15    10.19   19.19   38.63   76.84
Backup size (GB):               0.32    2.17    11.14   19.11   39.71   77.15

The growth of the Business Process Choreographer database depends on the data passing through
the process. As seen above, since the requests passing into the system did not change, database
growth behavior is predictable as more requests are preloaded into the system.
An additional consideration for growth is the definition of the process being handled. A more
complex process can result in greater storage requirements. Numerous tables in the Business
Process Choreographer database are involved in process instance storage.
The pie chart below shows the kilobytes used per task by tables in the Business Process Choreographer
database. The data was extrapolated from a database with 25,000 preloaded SOABench
2008 Outsourced Claim Processing tasks. The storage per task is 91 KB. Thirteen tables make up
the majority of the storage used. The SCOPED_VARIABLE_INSTANCE_B_T table and the
ACTIVITY_INSTANCE_B_T table account for 58 KB (64%) of the storage used.

[Chart] SOABench 2008 Outsourced Processing, Process Choreographer database: physical storage per task in kilobytes (data, index, and LOB objects) by table, 91 KB total per task. The two largest segments, 36 KB and 22 KB, belong to SCOPED_VARIABLE_INSTANCE_B_T and ACTIVITY_INSTANCE_B_T; the remaining segments (PROCESS_CONTEXT_T, WORK_ITEM_T, RESTART_EVENT_B_T, EVENT_INSTANCE_B_T, CORRELATION_SET_PROPERTIES_B_T, TASK_INSTANCE_T, INVOKE_RESULT2_B_T, PROCESS_INSTANCE_B_T, QUERYABLE_VARIABLE_INSTANCE_T, SCOPE_INSTANCE_B_T, RETRIEVED_USER_T, and Other) account for the rest.

The number of rows in these tables depends on the process definition. The chart below shows the
number of rows in various database tables needed to store a single process instance for this study.
The ACTIVITY_INSTANCE_B_T table uses 16 rows to hold its portion of the process instance.
This corresponds to the 16 activity blocks in the process definition. The
SCOPED_VARIABLE_INSTANCE_B_T table uses 24 rows per process instance. This
corresponds to the number of assignments done by the process.


[Chart] SOABench 2008 Outsourced Processing, Process Choreographer database: table rows required to store a single process instance, expressed as a multiplier of the process instance population. SCOPED_VARIABLE_INSTANCE_B_T uses 24 rows per instance and ACTIVITY_INSTANCE_B_T uses 16; the remaining tables shown (CORRELATION_SET_INSTANCE_B_T, CORRELATION_SET_PROPERTIES_B_T, EVENT_INSTANCE_B_T, INVOKE_RESULT2_B_T, PARTNER_LINK_INSTANCE_B_T, PROCESS_CONTEXT_T, PROCESS_INSTANCE_B_T, QUERYABLE_VARIABLE_INSTANCE_T, RESTART_EVENT_B_T, RETRIEVED_USER_T, SCOPE_INSTANCE_B_T, TASK_INST_LDESC_T, TASK_INSTANCE_T, and WORK_ITEM_T) each use between 1 and 5 rows per instance.

9.9 Impact of Business Object size on throughput


The SOABench 2008 Choreography Facet Automated Approval workload (see section 10.4.2)
was used to explore the effect of Business Object (BO) size on throughput. BO size was varied by
specifying a variable amount of information in the customer detail fields in the claim request,
which is referred to as the "payload". Three BO sizes were used: 3 KB, 10 KB, and 100 KB, with
the same size used for requests and responses.
The chart below shows that the throughput advantage of WPS 7.0.0.1 over WPS 6.2.0.1 increases as
BO size is increased. The throughput for each BO size, measured in Claims Completed per
Second (CCPS), along with the percent improvement, follows:

3KB requests and 3KB responses: 390 CCPS, using WPS 7.0.0.1 which represents a 23%
improvement over WPS 6.2.0.1 (318 CCPS).

10KB requests and 10KB responses: 177 CCPS on WPS 7.0.0.1, demonstrating a 57%
improvement over WPS 6.2.0.1 (113 CCPS).

100KB requests and 100KB responses: 23.5 CCPS using WPS 7.0.0.1, which represents an
86% improvement over WPS 6.2.0.1 (12.6 CCPS).

In addition to the improvements delivered in WPS 7.0.0.1, the other conclusion to draw from the
above data is that throughput drops significantly as BO size increases.
The bar labels on the chart below show the throughput improvement delivered in WPS 7.0.0.1 vs.
6.2.0.1, rounded to the nearest tenth.


[Chart] SOABench 2008 Automated Mode, impact of BO size on throughput (4 cores): Claims Completed per second for WPS 6.2.0.1 versus WPS 7.0.0.1 at the 3k-3k, 10k-10k, and 100k-100k request-response sizes, with improvement factors of 1.2x, 1.6x, and 1.9x respectively. CPU utilization was 98-99% across all bars; hyperthreading not supported on this hardware.

Measurement Configuration
WPS: Intel 2.93 GHz B
Driver, SOABench Services1: Intel 3.5 GHz C
SOABench Services2: Intel 2.93 GHz D
DB2: PPC 2.2 GHz B


9.10 Topology Study: SMP vs. Clustered WPS


9.10.1 Overview

The ability of a single Java Virtual Machine to efficiently use processor cores at high utilization
diminishes as the number of cores increases. To demonstrate this, this study directly compares
the vertical (SMP) and horizontal (clustered) measurements of SOABench 2008 on POWER6
running AIX using data shown previously in sections 5.1.3 and 5.1.4 respectively.
The same numbers of processor cores were used to run both Automated Approval and
OutSourced Modes. Although impressive throughput and scaling rates were achieved in the
single server topology, both workloads demonstrated significant performance gains by applying a
clustered topology where the same number of cores were divided among separate hardware
partitions on which multiple WPS JVMs worked together as cluster members (nodes).
Note that when additional hardware partitions are added, underlying resources are also added
such as: Java heaps, WebSphere log streams, network adapters, TCP stacks, disk adapters, file
systems, etc.

9.10.2 Automated Approval Mode

Here is the comparison of vertical to horizontal topologies of SOABench 2008 Automated
Approval Mode, described in section 10.4.2.
Using 8 cores, horizontal throughput is only slightly faster than vertical. However when using 16
cores, horizontal performance is clearly faster than vertical as the scaling limitations of the single
server JVM start to show up.

[Chart] SOABench 2008 Automated Mode on AIX: Claims Completed per Second for four configurations: 8 cores as 1 node, 8 cores as 2 nodes x 4 cores, 16 cores as 1 node, and 16 cores as 4 nodes x 4 cores. CPU utilization (90-98%) is shown above each bar; Simultaneous Multithreading (SMT) was enabled.

9.10.3 OutSourced Mode

Here is the comparison of vertical to horizontal topologies of SOABench 2008 OutSourced
Mode, described in section 10.4.3.
As with Automated Approval Mode above, horizontal performance is clearly faster than vertical
as the scaling limitations of the single server JVM start to show up.

[Chart] SOABench 2008 OutSourced Mode on AIX: Claims Completed per Second for 8 cores as 1 node versus 8 cores as 2 nodes x 4 cores. CPU utilization (93% and 96%) is shown above each bar; Simultaneous Multithreading (SMT) was enabled.

9.11 Single Cluster Deployment Environment Pattern


9.11.1 Overview

It was recommended earlier in this report that the remote messaging and remote support
deployment environment pattern should be used for maximum flexibility in scaling. However,
there is a new capability in WAS 7.0 that affects message-driven bean (MDB) connection
behavior and that is worth examining here.
This section studies the impact of this MDB connection behavior on performance when measured
in the context of SOABench 2008 OutSourced Mode with a single cluster deployment
environment pattern. For comparison, measurements with a remote messaging and remote
support deployment environment pattern were shown in section 5.1.4.3.

9.11.2 MDB Connection Behavior

As happens with the single cluster deployment environment pattern, when an MDB
application is installed in the same cluster as the message engine, its MDB connection
behavior depends upon the value of the alwaysActivateAllMDBs property of the appropriate
activation specification.
See this link for more information:
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.pmc.nd.doc/concepts/cjn_mdb_endpt_overview.html
When this property has a value of false, the MDB will only connect to an active message engine
within the same JVM. When this property has a value of true, the MDB will also connect to an
active message engine on a separate JVM in the cluster. These two behaviors are depicted in the
following two charts.

[Diagram] Pre-V7 MDB connection behavior: a WebSphere Network Deployment cell with a single cluster of two members hosting the SOABench BPEL application (microflow, asynchronous, and macroflow components). The message engines are active on one member and in failover state on the other; the MDB on the member without an active message engine does not connect to the active message engines in the other JVM.

[Diagram] V7 MDB connection behavior: the same single-cluster topology, but the MDB on the member without an active message engine now also connects to the active message engines running in the other cluster member.

9.11.3 Topology

For this study, a single cluster contains the application and messaging engine, and this cluster has
two cluster members (nodes). The message engines run as failover on one node (the left node)
and active on the other node (the right node).
Depending on the property value, the MDB in the left node will or will not connect to the active
message engine in the other JVM. The MDB in the right node always connects to the active
message engine because it is within the same JVM.


[Diagram] Topology: single cluster, SOABench 2008 OutSourced Mode on AIX. A WebSphere Network Deployment cell contains a single cluster with two members (4 cores each) hosting the SOABench BPEL application (microflow, asynchronous, and macroflow components); the message engines are active on one member and in failover state on the other. Supporting systems include the SOABench OutSourced controller (driver), the SOABench agent and Outsourced services, the SOABench services, an IBM HTTP Server with the WebSphere plug-in, and DB2 servers hosting the BPE, ME, and WPS databases.

9.11.4 Workload

The SOABench 2008 OutSourced Mode workload is not purely MDB driven. A full description
of the workload can be found in section 10.4.3. A significant portion of load is driven via
WebServices invocations, which are sprayed across the nodes from the IBM HTTP server
pictured in the topology above.
This is an important point, because even when the MDB of a particular node is unable to connect
to an active message engine, there is still a significant amount of work for it to perform.

9.11.5 Results

Reading from left to right, the 1st bar in the chart below is provided as a baseline for comparison.
For this measurement bar, the left node is stopped and the right node is started and handling all
workload traffic.
The 2nd bar shows pre-WAS 7.0 MDB behavior, where the alwaysActivateAllMDBs property is
set to false. Again, because this workload is not purely MDB driven, the left node still handles
some workload traffic; however, its CPU utilization is only 59%, while the right node is
running at a very high 97% CPU utilization.
The 3rd bar shows the performance improvement achieved when the property is set to true and the
left node is now able to perform additional work via its MDB connection to the active ME in the
right node, raising the left node's CPU utilization to 81%. However, because of the very high
CPU utilization (98%) of the right node, the left node has trouble taking more work from the ME
to drive its CPU utilization even higher.
The 4th bar shows further performance gains obtained by adjusting the weights on the http sprayer
to favor the left node for the non-MDB traffic, thus driving higher overall workload throughput
and better balance of CPU utilization between the left and right nodes. However, if the input
traffic varies significantly, the CPU utilization could become imbalanced one way or the other
until the http sprayer weight is adjusted. In practice, this would need to be monitored closely and
adjusted accordingly.

[Chart] Single cluster, SOABench 2008 OutSourced Mode on AIX (4 cores per node): Claims Completed per Second for four configurations: 1 node (98% CPU, baseline); 2 nodes with pre-V7 MDB behavior (59%/97% CPU, 1.5x); 2 nodes (81%/98% CPU, 1.7x); and 2 nodes with the http sprayer weighted 5-4 (94%/97% CPU, 1.9x). CPU utilization and scaling are shown above each bar; Simultaneous Multithreading (SMT) was enabled.

Although not measured here, we predict that adding more nodes to this single cluster topology
would further increase performance as long as the http sprayer weights are adjusted to achieve
good balance and the active message engine node does not become the bottleneck due to
excessive CPU utilization. Potentially, with enough cluster members, the http sprayer weight for
the active message engine node would have to be set to 0 (lowest) so that it only handles
messaging engine related work. However, well before such maintenance-intensive adjustments of
the http sprayer's weights are made, an alternate cluster topology should be considered.

9.11.6 Summary

A single cluster deployment environment is now more viable due to WAS 7.0 MDB
enhancements, especially for workloads heavily dependent on MDBs.
However, as this study illustrates, due to the imbalance of CPU utilization across nodes related to
where the active message engines are running, such a configuration should be considered
carefully for anything but the simplest of implementations.


9.12 Scaling up production deployments


The WebSphere BPM products have been available for several years; as such, many customers
have developed very sophisticated and mature production deployments. The issues encountered
when scaling up a production deployment are often quite different from those faced when initially
developing a solution, including adding new applications, expanding existing applications with
more modules, bringing more concurrent users online, adding additional cluster members, etc.
Many of these issues are discussed in this performance report. However, the information is
located in several different sections of the report. The purpose of this section is to cross-reference
this information to make it easier for the reader to locate.
Here is a cross reference of Authoring information for scaling up deployments:

WID Considerations

Reduce the number of SCA Modules, and Modularity Impact at Runtime

Hardware matters: server and desktop

Utilize Shared Libraries

Utilize multi-threaded SCA clients

Following are sections which address clustering:

WPS measurements in a clustered environment

Topology Considerations

Clustering Best Practices

Clustering Tuning

Topology Directed Studies: SMP vs. Clustering and Single Cluster

Finally, here are discussions on issues for high volume runtime deployments:

Key Tuning and Deployment Guidelines

Tuning Checklist

Tuning Methodology

Tuning for High Concurrency

Thread Pool Tuning

Supporting up to 10,000 concurrent users

Using Query Tables to optimize query response time

WESB Client Scaling

Sample Tuning Settings WPS and WESB

9.13 WICS to WPS Migration


WebSphere Integration Developer (WID) provides a wizard and command line utility which
enable users to migrate WebSphere InterChange Server (WICS) content to equivalent artifacts on
WebSphere Process Server (WPS). This wizard can, with minimal developer input, generate
fully functional WPS artifacts. Please note that migration is a complex topic with many different
aspects; for a complete discussion please see the IBM WebSphere InterChange Server Migration
to WebSphere Process Server Redbook at the following location:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247415.pdf
This section evaluates the performance of WID 7.0.0.1-generated migration artifacts running on
WPS 7.0.0.1 by comparing it with the performance of an equivalent workload running on WICS
4.3.0.6 and an equivalent WPS workload run on previous versions of WID/WPS (6.1.0 & 6.2.0).
The workload used for evaluation is Contact Manager with a Web Services binding. The Contact
Manager workload is described in section 10.2. There are 4 workloads used to evaluate the
performance, each of which is different but semantically equivalent.

WICS version: utilizes the WebSphere Business Integration Adapters (WBIA) Web
Services adapter to act as the source of Business Objects, and an emulated Clarify adapter
as the destination. The Web Services adapter interacts with the WICS server using
WebSphere MQ and the emulated Clarify adapter is connected to the WICS server via
IIOP.

WPS 6.1.0 version: developed by making use of the WICS Migration Wizard in WID
6.1.0 to migrate the WICS workload described above. This wizard migrates the Web
Services adapter to still be the WBIA Web Services adapter (but to be run in a standalone
JMS mode) and migrates the emulated Clarify adapter to a custom adapter which
interfaces with WPS using JMS. The workload was subsequently modified to remove the
relationship map step from the maps and to post an async one way JMS message for each
interaction with the emulated Clarify adapter. This is to ensure that the workload driver
can drive enough work into the system to maximize throughput. The generated workload
is measured on WPS 6.1.0.

WPS 6.2.0 version: developed like the WPS 6.1.0 version by using the WID 6.2.0 WICS
migration wizard. This wizard differs from the 6.1.0 version in that it migrates the WBIA
Web Services adapter to an HTTP SCA binding with a custom data handler. Post-migration
modifications performed are the same as in the WPS 6.1.0 version. The workload
is then measured on WPS 6.2.0.

WPS 7.0.0.1 version: developed like the WPS 6.2.0 workload but using the WID 7.0.0.1
WICS migration wizard. The 7.0.0.1 wizard offers the option of merging the connector
and collaboration modules during migration. Post-migration, the workload was changed
to incorporate the Migration Development Best Practices and to post an async one way
JMS message for each interaction with the emulated Clarify adapter. The workload is
then measured on WPS 7.0.0.1.

All four workloads described above are evaluated on an IBM pSeries model 9117-MMA, 4.7
GHz (8-way SMP) running AIX 6.1 to demonstrate the throughput characteristics. Measurements
are shown in the chart below.
On the above specified setup with all eight cores enabled, the WID 7.0.0.1 migrated workload
runs on WPS 7.0.0.1 at a rate of 1004 Business Transactions Per Second (BTPS), which is a 54%
improvement over WPS 6.2.0. WPS 6.2.0 runs the WID 6.2.0 migrated workload at a rate of 650
BTPS, which is an 8.3x improvement over 6.1.0. The WID 6.1.0 migrated workload runs on WPS
6.1.0 at a rate of 78 BTPS.


On the same setup as above, WICS 4.3.0.6 runs its workload at a rate of 1,049 BTPS. A few notes
on this data are relevant:

WPS 7.0.0.1 delivers throughput comparable to WICS for the same workload.

WICS 4.3.0 only utilizes 54% of the available cores, even after comprehensive tuning
was done. This is due to limitations in the WICS runtime architecture, notably a single-threaded
listener path for processing incoming events. WPS does not have this limitation
and therefore has superior SMP scaling, as is demonstrated in the chart below.

The data presented below is for a single server configuration, since WICS does not
support clustering. WPS can deliver higher throughput rates than are shown below via
clustering.

[Chart] Contact Manager with Web Services binding, ICS migration performance on AIX (8 cores): Business Transactions per second of roughly 1,049 for WICS 4.3.0.6, 78 for the ICS-migrated workload on WPS 6.1.0 (7% of WICS), 650 on WPS 6.2.0 (62%), and 1,004 on WPS 7.0.0.1 (96%). The percentage of WICS performance is shown above each WPS bar. WICS CPU utilization was 54%; WPS CPU utilization was 93-94% for all releases; Simultaneous Multithreading (SMT) was enabled.

Measurement Configuration
WICS, WPS server: POWER6 4.7 GHz - F
DB2: PPC 2.2 GHz - C
Driver: Intel 3.5 GHz - D

9.14 Large Object size study

9.14.1 Introduction and Caveats

This section contains a series of studies exploring the behavior of a system in the presence of a
large input event (BO). Data is shown for WPS 7.0.0.1 and WESB 7.0.0.1.

For any application, the maximum size input object that it can support depends on a number of
factors. The amount of processing required to complete a transaction and the representation of the
input event internal to the application are clearly important as they affect the number of copies of
the event required to be held in memory and the nature of the objects held in the Java Heap
(whether they are contiguous or composed of a set of smaller, discrete objects).
Also, the ability to process large input events usually depends on the transactional nature of the
processing involved. Some data processing systems are able to break a large transaction into
multiple smaller transactions that are processed (or committed) independently, while others are
not. Whenever possible it is advisable to design a solution that does not depend on processing
input events of arbitrarily large size. Please refer to the Best Practices described in Section 2.5 for
more information related to processing Large Business Objects.
The sections that follow display a wide variety of results. While it may be tempting to do so,
please do not view the data as a fundamental product limit for the largest input event size. Rather,
these sections are a set of case studies intended to explore the factors affecting the ability of a
solution to successfully process a large input event.

9.14.2 Large Objects in WPS

The SOABench 2008 Automated Approval workload (see section 10.4.2) was used to explore the
ability to handle large objects within a business process running in WPS 7.0.0.1. The purpose of
this study was to find the maximum object size that the system can handle repeatedly (20 times
for this study) without exceptions. The system evaluated to find the maximum size is an AIX 6.1
system with 31 GB of RAM running a 32 bit version of WPS 7.0.0.1. In addition an evaluation of
an AIX 64 bit version of WPS 7.0.0.1 was done for a single 500 MB request.
Large Object requests were produced in the client driver by creating additional customer detail
fields in the claim request which is referred to as the "payload." Note that the charts below show
the client driver's input object size and not the actual size processed by WPS; the generation of
the payload results in an actual request size 6% larger than the client reports. For example a 100
MB request is actually 106 MB in WPS (110 MB on the wire with packet overhead).
Responses from the server are constant at 3 KB in size. The SOABench 2008 automated approval
workload implementation used for this study holds 7 copies of the payload for use during the
various steps of the process flow resulting in many large contiguous memory objects contending
for Java heap space. Note: the SOABench 2005 automated approval workload, used in previous
versions of this performance report, holds 5 copies of the payload so maximum object size should
not be compared between the two workload versions.
The maximum Java heap size required was determined by repeated experiments to balance the
memory needed for native memory versus the Java heap as large object sizes were increased. On
AIX the optimal maximum heap was determined to be 2600 MB but to achieve this it was
necessary to set an operating system variable:
"export LDR_CNTRL=MAXDATA=0xB0000000@DSA"
in the session starting the WPS server to provide additional memory segments for user processes.
For the AIX WPS 7.0.0.1 64 bit system study, the maximum Java heap was set to 9800 MB with
no additional AIX system variable settings required. In all cases, native heap space was
preserved by using type 4 JDBC drivers for WPS datasources. See reference:
http://www-128.ibm.com/developerworks/eserver/articles/aix4java1.html
The chart below shows the 32-bit WPS system object maximum was 150 MB for WPS 6.2.0.1
and 170 MB for WPS 7.0.0.1, a 20 MB improvement. The 64 bit WPS 7.0.0.1 was able to handle
the 500 MB object request submitted. Note that this was the largest size attempted; finding the
maximum size for this system was not attempted.
Transaction completion time also improves on large requests in WPS 7.0.0.1. 150 MB requests
on the IBM pSeries POWER6, 4.7 GHz 4 core, AIX 6.1 system took 542 seconds each on 32-bit
WPS 6.2.0.1, but the larger 170 MB request took only 490 seconds on 32-bit WPS 7.0.0.1. The
500 MB request on this hardware running 64-bit WPS 7.0.0.1 took 1,376 seconds to complete.
Due to the response times shown above, it was necessary to increase several timeout settings for
both the SOABench client driver and the WPS server running the workload. These include:

Increasing the Application Server Transaction Service timeouts for Total transaction
lifetime, Async response, Client inactivity, and Maximum transaction.

Increasing the SOABench BPEL EJB module web service client bindings request timeout.

Increasing socket read and write timeouts for both the SOABench client and server
invocations using the JVM properties (in seconds) -Dcom.ibm.ws.webservices.readTimeout and
-Dcom.ibm.ws.webservices.writeTimeout; a sketch of these arguments follows this list.
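For example, these timeouts might be raised through generic JVM arguments along the following lines; the values are illustrative assumptions, not the settings used for this report (both properties take values in seconds):
-Dcom.ibm.ws.webservices.readTimeout=3600 -Dcom.ibm.ws.webservices.writeTimeout=3600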

[Chart] SOABench 2008 Automated Approval, maximum large request size on AIX (32-bit): 150 MB for WPS 6.2.0.1 versus 170 MB for WPS 7.0.0.1; the request size is shown above each bar.

Measurement Configuration
WebSphere Process Server: POWER6 4.7 GHz D
Driver: Intel Xeon 2.93 GHz - A
DB2: POWER6 4.7 GHz D

[Chart] SOABench 2008 Automated Approval, 32-bit versus 64-bit large request size on AIX: a maximum of 170 MB with AIX 32-bit WPS versus 500 MB achieved with AIX 64-bit WPS (a larger request was possible on 64-bit but was not attempted); the request size is shown above each bar.

Measurement Configuration
WebSphere Process Server: POWER6 4.7 GHz D
Driver: Intel Xeon 2.93 GHz - A
DB2: POWER6 4.7 GHz D

9.14.3 Large Objects in WESB

The JMS binding and Web Services scenarios were evaluated with large messages to determine
the largest message which could be processed in sustained operation. The tests were run for a
period of 2 hours.
These tests were run using the Transform Value mediation and a Custom mediation which
transforms the value of a single field in the request message. These mediations were chosen as
they represent a simple case requiring little processing and a complex case which will cause the
request to be serialized, respectively. For details of the mediations see section 11.3. For details of
the topology used see section 11.1 and 11.2.


The Java heap was set to a fixed size of 1536 MB for these measurements.
9.14.3.1 Web Services Binding large messages
The chart below shows that the maximum request size ranges from 82 MB to 96 MB, and the
maximum response size ranges from 91 MB to 110 MB, depending on the processing done in the
mediation.

[Chart] Web Service large messages on Windows (16 cores): maximum message size in MB for V6.2 versus V7.0 for the Transform Value mediation (request and response) and the Custom mediation (request and response).

Measurement Configuration
Web Services Client: Intel 2.8 GHz - C
WebSphere ESB: Intel 2.93 GHz - C
Web Services Target: Intel 3.5 GHz - B


9.14.3.2 JMS Binding large messages


The chart below shows that the maximum message size ranges from 75 MB to 130 MB
depending on the processing done in the mediation and whether persistent or non-persistent
messages are utilized.

[Chart] JMS large messages on Windows (4 cores, Hyper-Threading enabled): maximum message size in MB for 6.2 versus 7.0.0.1 for the Transform Value and Custom mediations, non-persistent and persistent.

Measurement Configuration
JMS Producer/Consumer: Intel 2.8 GHz - B
WebSphere ESB: Intel 3.0 GHz - D
DB2: Intel 3.50 GHz - A

9.15 Messaging Binding Comparison using WESB

The following two sections illustrate the performance differences of the various messaging
bindings in both non-persistent and persistent modes of operation. Here is a summary of the
results shown below:

For non-persistent messaging, using the default messaging provider within WESB
(WebSphere Platform Messaging) is 38% faster than the MQ JMS provider using the
Base message size (1.2 KB). MQ JMS provides equivalent messaging performance to
the MQ binding for the same scenario.

For persistent messaging, the default messaging provider is 49% faster than the MQ JMS
provider using the Base message size. MQ JMS messaging outperforms the MQ binding
by 7% for the same scenario.

Note: Generic JMS was not tested in V7.0.0.1. V6.2 tests showed the performance to be identical
to MQ JMS.

9.15.1 Messaging Binding Comparison Non Persistent

The following charts compare the throughput for the different non-persistent messaging bindings
using the Transform Value mediation and a range of message sizes. For details of the mediations
and request sizes see sections 11.3 and 11.4. All data is obtained on a 4-way hyper-threaded
WESB server machine. For details of the topologies used see section 11.2.

[Chart: Xform Value Mediation Non Persistent - Windows, 4 core, Hyper-Threading (HT) enabled. Y-axis: Reqs/Sec. Bars compare the JMS, MQ JMS, and MQ bindings for the Base, 10 KB, and 100 KB message sizes; CPU utilization shown above each bar.]

Measurement Configuration
JMS Producer/Consumer: Intel 2.8GHz - B
WebSphere ESB: Intel 3.0GHz - D


9.15.2 Messaging Binding Comparison Persistent

The following charts compare the throughput for the different persistent messaging bindings
using the Transform Value mediation and a range of message sizes. For details of the mediations
and request sizes see sections 11.3 and 11.4. All data is obtained on a 4-way hyper-threaded
WESB server machine. For details of the topologies used see section 11.2.

[Chart: Xform Value Mediation Persistent - Windows, 4 core, Hyper-Threading (HT) enabled. Y-axis: Reqs/Sec. Bars compare the JMS, MQ JMS, and MQ bindings for the Base, 10 KB, and 100 KB message sizes; CPU utilization shown above each bar.]

Measurement Configuration
JMS Producer/Consumer: Intel 2.8GHz - B
WebSphere ESB: Intel 3.0GHz - D
DB2: Intel 3.50GHz - A


9.16 XSL Transform (XSLT) vs. BOMap primitives using WESB

The charts below compare the performance of the XSL Transform primitive and the Business
Object Map primitive. The charts show that in a mediation flow which is eligible for deferred
parsing, the XSL Transform primitive gives better performance; see section 3.8.1 for details on
which mediations are eligible for deferred parsing. However, in a mediation flow where the
message is already being parsed, the Business Object Map primitive gives better performance.
XSL Transforms are more efficient in mediation flows which are eligible for deferred parsing
because the message flowing through the mediation remains in serialized form throughout the
flow; thus, no serialization is required prior to executing the transformation associated with the
XSLT primitive. BO Maps are more efficient in mediations that do not leverage deferred parsing
because the message flowing through the mediation remains in object form during all processing.
If a BO Map is used in a mediation flow that is otherwise eligible for deferred parsing,
deserialization of the message will occur and the flow will no longer be eligible for deferred
parsing. If an XSL Transform is used in a mediation flow that is processing the message in object
form, the message will be serialized prior to performing the XSL transformation and deserialized
after the XSL Transform primitive has completed its processing.

[Chart: XSLT vs BOMap in deferred parsing flow - Windows, 16 core. Y-axis: Reqs/sec. Bars compare the XSLT and BOMap mediations for Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, and 100K in/100K out; CPU utilization shown above each bar.]

The XSLT mediation sets the value of a single element in the request message and copies all
other elements unchanged using the XSL Transform primitive. The request message processing is
eligible for deferred parsing.
The BOMap mediation uses the Business Object Map primitive to map the body of the request
message into a new Business Object and sets the value of a single element. The request message
processing is not eligible for deferred parsing.


[Chart: XSLT vs BOMap in non deferred parsing flow - Windows, 16 core. Y-axis: Reqs/sec. Bars compare the ElemSet XSLT and ElemSet BOMap mediations for Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, and 100K in/100K out; CPU utilization shown above each bar.]

The two mediation flows in this chart are the same as in the chart above but with a Message
Element Setter primitive inserted into the mediation flows before the XSL Transform and
BOMap primitives. The Message Element Setter primitive is included to force a parse of the
message so that the flow is not eligible for deferred parsing.

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 2.93GHz - C
Web Services Target: Intel 3.5GHz - B

9.17 Modularity Impact - Composite vs. Chained Mediations

The following charts show the throughput measured for a mediation flow comprised of several
primitives and connected either as a composite (all in one mediation module) or chained (separate
modules connected via SCA bindings), using a range of request and response sizes. For the
composite case there is an additional comparison where all the mediation primitives reside in one
mediation flow component (MFC) or are split into multiple MFCs (one primitive in each). For
details of the mediations and request/response sizes see sections 11.3 and 11.4. All data is obtained
using the JAX-WS (SOAP 1.1) Web services bindings on a 16-core WESB server machine. For
details of the topology used see section 11.1.
The purpose of this comparison is to show the scale of the overhead in modularizing a mediation,
either by using SCA bindings to link mediation primitives in separate modules (chained), or by
linking primitives in a single module using multiple MFCs. For each of the three cases all the
mediation code is still running in a single JVM.
As the chart below shows, using a composite mediation is significantly cheaper than the chained
variation, as less data conversion (with an associated reduction in heap usage) takes place.
Splitting the primitives across multiple MFCs in the same module has a lower overhead, with the
proportional cost decreasing with message size.
[Chart: Composite vs Chained Mediation - Windows, 16 core. Y-axis: Reqs/sec. Bars compare Composite, Composite (Multi MFC), and Chained for Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, and 100K in/100K out; CPU utilization shown above each bar.]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 2.93GHz - C
Web Services Target: Intel 3.5GHz - B


9.18 Throughput using JAX-WS vs. JAX-RPC for Web Services

JAX-WS became the default Web Services binding in WESB V7.0. The two charts below
compare the older JAX-RPC binding with the new default of JAX-WS (SOAP 1.1). Further
optimizations to JAX-WS processing have been introduced in WESB V7.0, resulting in this binding
outperforming JAX-RPC.
The first chart (composite mediation) shows that JAX-WS is on average 5% faster than JAX-RPC
over a range of message sizes.
The second chart shows results for the transform namespace mediation, which is eligible for
deferred parsing in both the request and response flows. In this case JAX-WS is on average 19%
faster than JAX-RPC over a range of message sizes.
For details of the mediations and request/response sizes see sections 11.3 and 11.4. All data is
obtained on a 16-core WESB server machine. For details of the topology used see section 11.1.

[Chart: Composite Mediation W/S Binding Comparison - Windows, 16 core. Y-axis: Reqs/sec. Bars compare JAX-RPC and JAX-WS for Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, and 100K in/100K out; CPU utilization shown above each bar.]


[Chart: Transform Namespace Mediation W/S Binding Comparison - Windows, 16 core. Y-axis: Reqs/sec. Bars compare JAX-RPC and JAX-WS for Base in/Base out, Base in/10K out, 10K in/Base out, 10K in/10K out, and 100K in/100K out; CPU utilization shown above each bar.]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 2.93GHz - C
Web Services Target: Intel 3.5GHz - B


9.19 Authoring Studies

9.19.1 Summary of Key Measurements

The studies presented in the following sections explore issues relevant to the performance of
WebSphere Process Server and WebSphere Integration Developer 7.0.0.1 when used in an
authoring environment.
From these studies, the following observations can be made:
1. Deployment to a production server is expected to be as much as twice as fast as what is
experienced in a development environment.
2. When using wsadmin to install SCA Modules, installing multiple modules in a WAS
session and then saving the configuration changes together is faster than installing (and
saving) each of the Modules individually.
3. In addition to memory savings, defining Shared Libraries according to the technote
(available at: http://www-01.ibm.com/support/docview.wss?uid=swg21298478) reduces
total application install time.

9.19.2 Hardware Study - Server vs. Desktop systems

For this study, we compare response time when publishing the Loan Processing workload from
WID 7.0.0.1 to WPS 7.0.0.1 on a variety of hardware configurations. Two different machine
types are used: a desktop system resembling a typical developer's workstation and a server
system resembling a typical production server. Additionally, each of the two systems is measured
in three different configurations, varying the number of cores available to the system as well as
the configuration of the disk subsystem.
The results from the Model 9196 desktop system indicate that addition of a second processing
core improves publish responsiveness from 738 seconds to 612 seconds (a 17% improvement).
Addition of a second physical disk drive (and installing WID and WPS on that drive, isolating their
activities from those associated with the operating system) delivers an additional 12%
improvement.
The results from the Model 7233 server system indicate that, even with only a single processing
core active, the presence of a fast disk subsystem (a RAID disk array combined with filesystem
improvements available in the server operating system) leads to improved publish responsiveness.
Addition of a second core further improves responsiveness. Additional cores beyond the second
would lead to only a small improvement in responsiveness.
From this data it would be reasonable to expect deployment to a production server to be as much
as twice as fast as deployments that developers experience on their workstations, due simply to
the hardware differences typical of the two environments.

Loan Processing Workload

[Chart: BPM 7.0.0.1 Publish Response Time - Windows. Y-axis: Time (Seconds); bar labels show response time and average CPU utilization. Model 9196: 1 Core, 1 Disk = 738 s (51%); 2 Core, 1 Disk = 612 s (35%); 2 Core, 2 Disk = 538 s (41%). Model 7233: 1 Core, RAID = 477 s (96%); 2 Core, RAID = 374 s (66%).]

Measurement Configuration
Model 9196: Intel 2.66GHz - B
Model 7233: Intel 2.8 GHz - D

9.19.3 Deployment Strategy Study

In this study we use the 60 Modules in the Loan Processing application to demonstrate the
relative performance of some of the options available when deploying Modules via the wsadmin
tool.
First, we use a wsadmin install script that saves the changes made under the configuration session
multiple times when executing the install. Each of the 60 Modules is installed, saved, and started
independently before proceeding to the next Module. This installation operation completes in
466 seconds, as shown in the "Multiple WS Saves" measurement in the chart below.
Second, we use a wsadmin install script that installs all 60 of the Modules, with a single save
operation after all of the Modules are installed. Then, each of the Modules is started. This
operation completes in 382 seconds, 18% faster than the "Multiple WS Saves" measurement. This
data appears as the "Single WS Save" measurement in the chart.
Finally, the shared libraries technique described in the technote
http://www-01.ibm.com/support/docview.wss?uid=swg21298478 is used in conjunction with the
"Single WS Save" technique described here. In addition to the memory savings that shared libraries
provide, it delivers an additional 15% savings in install response time, for a total install time of
326 seconds (30% faster than the "Multiple WS Saves" approach).
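The following is a minimal, illustrative wsadmin (Jython) sketch of the "Single WS Save" approach: install every module EAR in one configuration session, save once, and then start the applications. The directory, node, and server names are assumptions for illustration; this is not the script used for the measurements.

    import os

    earDir = '/apps/LoanProcessing'        # hypothetical directory holding the module EARs
    installed = []
    for earName in os.listdir(earDir):
        if earName.endswith('.ear'):
            appName = earName[:-4]
            # Install each module in the same configuration session; no save yet.
            AdminApp.install(os.path.join(earDir, earName), '[-appname %s]' % appName)
            installed.append(appName)

    # One save for the whole batch (the "Single WS Save" approach).
    AdminConfig.save()

    # Start each installed application on the target server (assumes a single matching MBean).
    appMgr = AdminControl.queryNames('type=ApplicationManager,node=myNode,process=server1,*')
    for appName in installed:
        AdminControl.invoke(appMgr, 'startApplication', appName)

The "Multiple WS Saves" variant differs only in that AdminConfig.save() is called inside the loop after each install.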

Loan Processing Workload

[Chart: BPM 7.0.0.1 Publish Response Time - Windows, 2 core. Y-axis: Time (Seconds). Multiple WS Saves = 466 s; Single WS Save = 382 s; Single WS Save & Shared Libraries = 326 s.]

Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - B


9.20 BPM 6.2.0 Directed Studies


The studies presented below utilize data from the 6.2.0 release of the WebSphere BPM products.
These studies were not repeated for WebSphere BPM 7.0.0.1 because the authors believe that the
messages conveyed by the studies would not be substantially different. Given that, please
continue to use these studies for guidance on the BPM 7.0.0.1 products.

9.20.1 Impact of Enabling Security at Runtime

In WebSphere Application Server Version 6.1, the Security Configuration Wizard enables you to
configure application or Java 2 security. For further information, please see the IBM InfoCenter:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.wsfep.multiplatform.doc/info/ae/ae/usec_secureadminappinfra.html
In order to run an application with Java 2 security enabled, the required permissions have to be
granted in the was.policy file of the application EARs. Please see the IBM InfoCenter for more
details:
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.wsfep.multiplatform.doc/info/ae/ae/csec_rsecmgr2.html
[Screen shot: administrative console page with Java 2 security enabled.]

9.20.1.1 SOABench 2005 Automated Approval

The automated approval workload of the Choreography facet, described in section 10.5.2, is
evaluated on an IBM xSeries 3950 M2 2.93 GHz Xeon (4 quad-core processors), running with 4
cores enabled on Windows Server 2008, to demonstrate the throughput characteristics of
WebSphere Process Server in this configuration. 3 KB requests and 3 KB responses are utilized.
The workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
With no security enabled, WPS 6.2.0 runs the workload at a rate of 556 Business Transactions per
Second (BTPS). With application security enabled, WPS runs the workload at a rate of 524 BTPS,
a degradation of 6% compared to the setting with no security enabled. When Java 2 security is
enabled in addition to application security, the rate drops to 360 BTPS, a degradation of 35%
compared to when no security is enabled.

[Chart: SOABench Choreography Facet - Windows 2008. Automated Approval - Impact of Application and Java 2 Security, 4 cores, hyperthreading not supported. Y-axis: Business Transactions per second. Bars compare WPS 6.2, WPS 6.2 + Application Security, and WPS 6.2 + Application Security + Java 2 Security; CPU utilization shown above each bar.]

Measurement Configuration
WebSphere Process Server: Intel 2.93 GHz - A
Driver: Intel 3.5 GHz - C
DB2: PPC 2.2 GHz - B

9.20.1.2 SOABench 2005 Manual Approval

The manual approval workload of the Choreography facet, described in section 10.5.3, is
executed on an IBM xSeries 3950 M2, 2.93 GHz Xeon (4x quad-core), running with 4 cores
enabled on Windows Server 2008, to demonstrate the throughput characteristics of WebSphere
Process Server in this configuration. 3 KB requests and 3 KB responses are utilized. The
workload is run in infrastructure mode, making the processing done during service call
invocations trivial.
With no security enabled, WPS 6.2.0 runs the workload at a rate of 44 Business Transactions per
Second (BTPS). With application security enabled, WPS runs the workload at a rate of 34 BTPS,
a degradation of 23% compared to having no security enabled. When Java 2 security is enabled
in addition to application security, the rate drops to 29 BTPS, a degradation of 34% compared to
having no security enabled.

[Chart: SOABench Choreography Facet - Windows 2008. Manual Approval - Impact of Application and Java 2 Security, 4 cores, hyperthreading not supported. Y-axis: Business Transactions per second. Bars compare WPS 6.2, WPS 6.2 + Application Security, and WPS 6.2 + Application Security + Java 2 Security; CPU utilization shown above each bar.]

Measurement Configuration
WebSphere Process Server: Intel 2.93 GHz - A
Driver: Intel 3.5 GHz - C
DB2: PPC 2.2 GHz - B

9.20.2 Remote Messaging Deployment Environment Startup Time and Footprint

The Loan Processing workload described in Section 12.2 was used to quantify the startup time
and footprint improvements in WPS 6.2.0 when running in a remote messaging deployment
environment with many application modules installed in the cell.
See this link for an overview of various deployment environment patterns, including remote
messaging:
http://publib.boulder.ibm.com/infocenter/dmndhelp/v6r2mx/index.jsp?topic=/com.ibm.websphere.wps.620.doc/doc/cpln_topologypat.html
There is a significant reduction in the time it takes to start the Message Engine associated with
WPS 6.2.0 when using this workload, as shown in the chart below. Message Engine startup time
is reduced by a factor of 6.4.

[Chart: Startup Time - Loan Processing Application, Simultaneous Multithreading (SMT) enabled. Y-axis: Time in Seconds. 64 bit WPS 6.1.0.1: WPS APP = 123 s, WPS ME = 1016 s. 64 bit WPS 6.2.0: WPS APP = 126 s, WPS ME = 159 s.]


There is also a significant reduction in memory footprint after startup in both the Message Engine
JVM and the WPS 6.2.0 JVM with this workload installed, as demonstrated in the chart below.
The system memory footprint is reduced from 903 MB to 624 MB, an improvement of 31%.

[Chart: Startup Memory Footprint - Loan Processing Application, Simultaneous Multithreading (SMT) enabled. Y-axis: Live bytes (millions). 64 bit WPS 6.1.0.1: WPS APP = 520 MB, WPS ME = 383 MB. 64 bit WPS 6.2.0: WPS APP = 423 MB, WPS ME = 201 MB.]

Measurement Configuration
APP, ME: 4 core LPAR on PPC 1.9 GHz - A
DB2: PPC 2.2 GHz - A

9.20.3 Authoring - Shared Libraries Study

When a WPS application makes use of data-type or interface definitions defined in a library
module, WID copies the artifacts from the library into the application module so that those types
are available to the runtime. If many application modules make use of a library, its artifacts
are copied many times, increasing the memory pressure on the server runtime. A technote
(available at: http://www-01.ibm.com/support/docview.wss?uid=swg21298478) describes a
technique that declares the library modules as WAS Shared Libraries and allows their artifacts to
be shared among WPS modules.
In this study, we examine the memory reduction realized when rebuilding the Loan Processing
application to make use of the technique described in the technote. We prepared deployment code
using Java EE Prepare For Deploy, exported the application from WID as a set of EAR files,
and then used a jacl script to deploy the EARs to the WPS server via wsadmin.
This application makes moderate use of sharing; 2 shared libraries are used by all 62 modules,
and 20 other shared libraries are used by approximately 5 modules each.
The chart below shows that the peak live memory within the WPS Java heap when publishing the
Loan Processing application via the standard mechanism is 378 MB. When using the WAS
Shared Library technique described in the technote, peak memory is reduced by 11% to 335 MB.

WAS Shared Library Study - Loan Processing Workload

[Chart: WPS 6.2 Publish Peak Java Memory, 2 core. Y-axis: Memory (MB). Standard Deployment = 378 MB; Shared Libraries Technique = 335 MB.]

Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - B

One of the steps described in the WAS Shared Library technote instructs the administrator to
copy the Shared Library files to the <WAS_HOME>/lib/ext directory for deployment and then to
delete those files when the deployment is complete. The chart below shows the importance of
deleting the Shared Library files from this temporary location. When using the standard
deployment technique, the WPS Java heap contains 339 MB of live data after restart. When using
the WAS Shared Library technique, the WPS live set is reduced by 19% to 275 MB. However, if the
temporary library files are not deleted, the memory reduction is only 13%.


WAS Shared Library Study - Loan Processing Workload

[Chart: WPS 6.2 Java Memory After Server Restart, 2 core. Y-axis: Memory (MB). Standard Deployment = 339 MB; Shared Libraries Technique without deleting the temporary files = 294 MB; Shared Libraries Technique = 275 MB.]

Measurement Configuration
WebSphere Integration Developer: Intel 2.66GHz - B

9.20.4 Authoring - Hardware Comparison Study

For this study, we selected four different machine types to run key measurements of the Loan
Processing workload. An additional run on one machine with no anti-virus software installed was
also made for ease of comparison with the measurements presented in Chapter 8 of this report.
Newer machines showed significant improvements. For each data chart in this section, the
percentage at the top of each bar indicates the average system CPU utilization during the
measurement.
9.20.4.1 Impact on Import time
Using a new workspace, WebSphere Integration Developer 6.2 was opened, the Build
automatically preference was disabled, and the Loan Processing workload was imported.
Measurement started when the Import began and stopped as soon as the Import was complete and
the processor cores became idle. This was done seven times on each machine, the result below
being the average.


As can be seen in the following chart, the newer machines can finish the import much more
quickly than the older machines. Comparing the laptops, the T60p completed the Import in 259
seconds, 2.1 times faster than the T42p. Among the desktops, the model 9196 completed the
Import in 215 seconds, 2.5 times faster than the model 8212.

Loan Processing Workload

[Chart: Import Response Time - Windows. Y-axis: Time (Seconds). Bars for the T42p, 8212, T60p, 9196, and 9196 with no anti-virus; response time and average CPU utilization shown for each bar.]

9.20.4.2 Impact on First Build time


After Importing the Loan Processing application, the Clean and Build operation was executed on
the entire workspace. Measurement started when the Clean began and stopped when the WID
reported the build complete and the processor core became idle. This was done seven times on
each machine; the results below show the average of these seven measurements.
The T60p laptop completed the first build of this workspace in 268 seconds, 35% faster than the
T42p. The model 9196 desktop completed this operation in 205 seconds, 52% faster than the
model 8212.


Loan Processing Workload

[Chart: First Build Response Time - Windows. Y-axis: Time (Seconds). T42p = 413 s; 8212 = 425 s; T60p = 268 s; 9196 = 205 s; 9196 no AV = 179 s. Average CPU utilization shown above each bar.]

9.20.4.3 Impact on Average Subsequent Build time

After completion of the First Build measurement, six additional Clean & Build operations were
performed and measured individually. For each measurement, the clock started when the Build
started and stopped when the Build was complete and the processor core became idle. The fastest
and slowest measurements were discarded and the remaining four measurements were averaged.
This was done seven times on each machine, the result below being the average.
The T60p laptop completed this warmed-up Clean & Build operation in 182 seconds, 32% faster
than the T42p laptop. The model 9196 desktop completed this operation in 136 seconds, 58%
faster than the model 8212 desktop.
Note that the T42p laptop (single-core and slower hard drive) completed this operation 17% faster
than the model 8212 desktop (dual-core and faster hard drive). This can be attributed to the more
efficient architecture of the T42p processor core (Pentium M) compared to that of the 8212
desktop (Pentium D).


Loan Processing Workload

[Chart: Average Build Response Time - Windows. Y-axis: Time (Seconds). T42p = 268 s; 8212 = 324 s; T60p = 182 s; 9196 = 136 s; 9196 no AV = 127 s. Average CPU utilization shown above each bar.]

Measurement Configuration
T42p: Intel 2.0 GHz - A
8212: Intel 2.8 GHz - D
T60p: Intel 2.16 GHz - A
9196: Intel 2.66 GHz - A

9.20.5 Dynamic/Static Routing Comparison using WESB

The following chart compares two types of routing based on a value in the SOAP header for a
Web Services scenario. In both cases the value retrieved from the header is used to determine the
target service endpoint. The Route on Header mediation selects the service endpoint by routing to
a hard-wired callout node based on the header value extracted in a filter primitive. For each
alternative endpoint a user would need to wire in additional nodes for the filter primitive to
access.
In contrast, the dynamic endpoint lookup mediation uses the value from the header (accessed by
the endpoint lookup primitive itself) to look up the endpoint from a WSRR repository. This value
is cached by WESB, so the performance data below does not show the cost of the WSRR lookup
but rather the performance of routing to the target service using the previously cached endpoint.
The chart shows that the cost of using the dynamic endpoint lookup primitive to route, rather than
wiring in alternative targets (a less flexible approach), is minimal.


[Chart: Route on Header vs Dynamic Endpoint Lookup Mediation - Windows, 4 core, Hyper-Threading (HT) enabled. Y-axis: Reqs/sec. Bars compare Route On Header and Dynamic Endpoint Lookup for Base in/Base out, Base in/10 out, 10 in/Base out, 10 in/10 out, and 100 in/100 out; CPU utilization shown above each bar.]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

9.20.6 WESB Client Scaling

In this study two WESB mediations (Transform Value and Route on Body) were driven by an
increasing client load to assess the following scaling characteristics:
1. Horizontal Client Scaling: an initial load of x clients each making y requests per second is
increased by adding more clients (increasing x).
2. Vertical Client Scaling: an initial load of x clients each making y requests per second is
increased by speeding up the clients (increasing y).
Warm-up periods were applied for all of the measurements described below to ensure that the
code had settled to a consistent level of performance.
All client scaling measurements were run with a message size combination of Base/10.


For details of the mediations and request/response sizes see sections 11.3 and 11.4. All data is
obtained using Web services bindings on a 4-core WESB server machine with Hyper-Threading
(HT) disabled. For details of the topology used see section 11.1.


9.20.6.1 Horizontal Client Scaling

For the horizontal scaling test, 1 to 1600 clients were used against a single WESB server. Each
client was configured with a think time of 1 second, so the theoretical rate of requests can be
defined as clients / (think time + response time).
The CPU utilization of the WESB machine was recorded along with response times and request
rate.
The following chart shows that both throughput and CPU follow a good linear trend as the number
of clients increases, up to just under 100% utilization of the server. As the number of clients is
increased beyond this, the CPU constraint begins to impact the workload, with throughput
peaking at around 1000 requests/sec.
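As a purely illustrative example of this formula (the figures below are not measured values): 1100 clients, each with a 1 second think time and a 0.1 second response time, would drive a theoretical maximum of 1100 / (1 + 0.1) = 1000 requests per second; once the server saturates, the response time term grows and the achieved rate falls below this theoretical value.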

[Chart: Horizontal Client Scaling - XformValue Mediation: Request Rate & CPU Utilization. X-axis: number of clients; left Y-axis: requests per second; right Y-axis: WESB server CPU%.]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

The following chart shows that CPU consumption per request remained consistent across the
evaluation (apart from a larger value at the lower throughput measurement which was probably
skewed by timer tasks). Response time increases in a linear fashion until the server system
approaches CPU saturation; at this point any further increase in clients causes a more direct
impact on latency.

[Chart: Horizontal Client Scaling - Xform Value Mediation: Response Time & CPU Per Request. X-axis: number of clients; left Y-axis: response time (s); right Y-axis: CPU per request (s).]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

The next 2 charts show that Server CPU consumption, request rates, and response times for the
Route On Body mediation result in a similar profile to the XformValue evaluation above.


[Chart: Horizontal Client Scaling - Route On Body Mediation: Request Rate & CPU Utilization. X-axis: number of clients; left Y-axis: requests per second; right Y-axis: WESB server CPU%.]

[Chart: Horizontal Client Scaling - Route On Body Mediation: Response Time & CPU Per Request. X-axis: number of clients; left Y-axis: response time (s); right Y-axis: CPU per request (s).]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

9.20.6.2 Vertical Client Scaling

For the vertical scaling evaluation the WESB machine was driven by 200 clients. The theoretical
rate of requests can be calculated as for the horizontal scaling.
Think time was initially set to 20 seconds and then reduced on each run, with the last run using a
0.001 second think time. CPU utilization of the WESB machine was recorded along with response
times and requests per second.
Note that since request rate (and hence CPU utilization) is proportional to 1/t (where t = think
time), plotting request rates and CPU utilization using logarithmic scales should display a
straight-line plot up to the point where CPU saturation occurs on the WESB machine. The data
shows this; a good linear trend is exhibited, with the rate of increase in requests degrading as the
WESB machine becomes CPU bound.
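As a purely illustrative example of vertical scaling arithmetic (the figures below are not measured values): at a 20 second think time, 200 clients drive at most 200 / 20 = 10 requests per second; at a 0.1 second think time with a 0.05 second response time, the same 200 clients drive a theoretical maximum of 200 / (0.1 + 0.05) ≈ 1333 requests per second. While think time dominates response time, halving the think time roughly doubles the offered load, which is why the log-log plot stays straight until the server saturates.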

[Chart: Vertical Client Scaling - XformValue Mediation: Request Rate & CPU Utilization. X-axis: think time (s), logarithmic; left Y-axis: requests per second, logarithmic; right Y-axis: WESB server CPU%, logarithmic.]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

The next chart shows that response times grow progressively, with a sharp increase at the CPU
saturation point. CPU per request is reasonably flat apart from the initial spike evident in some
of the scaling tests at very low utilization.


[Chart: Vertical Client Scaling - Xform Value Mediation: Response Time & CPU Per Request. X-axis: think time (s), logarithmic; left Y-axis: response time (s); right Y-axis: CPU per request (s).]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

The next two charts show that the Route on Body results for vertical scaling produced a similar
profile to the XformValue vertical tests above.


[Chart: Vertical Client Scaling - Route On Body Mediation: Request Rate & CPU Utilization. X-axis: think time (s), logarithmic; left Y-axis: requests per second, logarithmic; right Y-axis: WESB server CPU%, logarithmic.]

[Chart: Vertical Client Scaling - Route On Body Mediation: Response Time & CPU Per Request. X-axis: think time (s), logarithmic; left Y-axis: response time (s); right Y-axis: CPU per request (s).]

Measurement Configuration
Web Services Client: Intel 2.8GHz - C
WebSphere ESB: Intel 3.0GHz - C
Web Services Target: Intel 3.5GHz - B

9.20.7 Local versus remote SCA bindings - WPS 6.1.0 data

The results shown in this section compare local and remote bindings using the same hardware
configuration and the Contact Manager workload. For remote bindings, a total of 3 JVMs are
used, 2 of which are WPS instances while the third JVM hosts the Messaging Engine (not a factor
in this study). The SAP Emulator module runs on the first WPS instance, and the Contact
Manager and Clarify Emulator modules run on the other WPS instance. Therefore, the remote
binding between the SAP Emulator module and the Contact Manager module crosses the boundary
between two separate WPS instances. There are two key findings in this study.

There is a significant throughput difference between local and remote bindings. The
throughput of Contact Manager using the local Synchronous SCA binding is 198 BTPS,
over 3.1x better than the remote Synchronous SCA binding. The difference between
local and remote Web Services bindings is smaller, but still significant. The throughput
of Contact Manager with an optimized local Web Services binding is 110 BTPS,
compared with 88 BTPS for a remote Web Services binding, a difference of 25%.

There is a significant benefit due to the local Web Services binding optimization, as discussed
in Section 4.5.5, if the Web Services target is hosted on the same JVM. The optimized
throughput of 110 BTPS is 15% higher than the unoptimized throughput of 96 BTPS.
[Chart: ContactManager - Windows 2000, Local versus Remote Bindings. Y-axis: BTPS. Bars compare 1 WPS JVM (WS Opt), 1 WPS JVM, and 2 WPS JVMs for the SCA Sync and WebServices bindings; WPS system CPU utilization shown above each bar.]

Measurement Configuration
WebSphere Process Server: Intel 2.8GHz - A
DB2: PPC 2.2 GHz - B


10 WebSphere Process Server Core Workloads


10.1 Introduction
This chapter describes the workloads used to derive measurements of performance characteristics
of the server component of WPS, whether that characteristic is throughput, response time,
processor core utilization or memory consumption.


10.2 Contact Manager


The Contact Manager workload models a common Enterprise Application Integration scenario
known as real-time data synchronization, where a consistent view of data is kept across multiple
applications. In this scenario, key events (such as create, delete, and update) in the source
application need to propagate to a destination application. The goal is to keep data consistent
between the two applications and their associated data stores. The Contact Manager workload
measures the transaction throughput of the WPS as it synchronizes contact create events
between two simulated enterprise applications.
All of the Contact Manager implementations consist of three parts: the SAP Client, simulating the
source of our Business Events, the Clarify Client simulating the Business Event destination, and
the Contact Manager Application, which contains the business logic for the application under
evaluation. Any particular implementation may consist of purely synchronous invocations, or it
may contain asynchrony. Also, an implementation may be organized in a variety of ways within
the SCA Programming Model. The sections that follow explore all of the implementations
appearing in this Report.
This workload simulates SAP as the source application with a Clarify emulation as the destination
application. In an actual installation, the SAP application would place an event in an application
database event table that is monitored by an application Adapter. The SAP Client emulates this
behavior by generating SAP Contact Business Objects, and passing them directly to the Contact
Manager Application.
For each Business Transaction executed by the system, this workload consists of:

- a 7520 byte input object (this is the size of the message)
- 3 business object mapping operations with a total of 51 attributes mapped
- 3 roles, 1 managed and 2 not managed, so that 2 relationship cross-references are created
- 1 service call

A business transaction in this workload is a contact create event received by Clarify.

10.2.1 SCA Synchronous Binding

In this implementation of the Contact Manager workload, the Contact Manager Application
receives Business Objects (BOs) from the SAP Client Module via synchronous cross-module
SCA invocation, i.e., synchronously invoking an import bound to the corresponding export with
an SCA binding. Its first task is to transform the input BOs from the SAP format to a Generic
format via an Interface Map SCA Component. These generic BOs are then passed to a Business
Process component which contains logic responsible for determining whether the Business Event
requires creation of a new Contact, or updating an existing one and then routing the event to the
destination application. For all of the Business Events measured, a new contact was created. On
the way to the destination, the BO must be mapped again from generic format to the format
understood by the destination application. The destination application, simulated by the Clarify
Client Module, is also invoked via a cross-module, synchronous SCA binding. This module
simulates destination application work, including generation of a new unique identifier, and then
returns a modified BO to the Contact Manager Module. This return BO is mapped again from
Clarify to Generic format before the response is returned to the SAP Client Module.

[Figure 1: Contact Manager Workload Topology - SCA Synchronous Invocation. The SAP Emulator, Contact Manager, and Clarify Emulator modules are connected by synchronous SCA bindings; the Contact Manager module contains the SAPToGeneric, GenericTOClarify, and ClarifyTOGeneric interface maps and the Contact Manager process.]

10.2.2 Web Services Binding

SCA Components may expose their interfaces as Web Services via the Web Services binding.
This capability is modeled for performance purposes by changing the synchronous SCA binding
between the SAP Emulation Module and ContactManager Module to be WebServices, as
depicted in Figure 2. For measurement purposes, the Web Services client can be either local or
remote. The difference is that for the remote case the client resides on a different physical
machine from the remainder of the application.
[Figure 2: Contact Manager Workload with Web Services Binding. The SAP Emulator module invokes the Contact Manager module over a SOAP/HTTP Web Services binding; the Contact Manager and Clarify Emulator modules remain connected by a synchronous SCA binding.]


10.3 Banking

10.3.1 Banking Workload Description

Banking is a long running business process, or macroflow. Macroflows are executed as a
stratified transaction, i.e., a J2EE transaction encloses one or more of the steps in the business
process. The J2EE transactions are chained using persistent messaging. Steps within the process
are called activities. Note that after each transaction, the state of the business process is persisted
to a database. This allows the processes to be long running and able to survive system
failures. The Banking workload consists of multiple transactions.
The business process used in the Banking workload models a realistic business process used in a
bank's back office. It contains a subset of the steps necessary to process a mortgage loan via a
series of automated steps. The following diagram depicts the Banking workload.

[Diagram: Banking workload. A Transaction Generator drives the Banking business process via JMS; the process invokes Java (POJO) services either synchronously or asynchronously.]

The workload setup consists of a Transaction Generator, which generates the load, and a Banking
process, which contains a scenario and outbound services. The Banking measurement run starts
when the workload driver places a large number of mortgage request instances onto a JMS queue.
Instances of the Banking process are started via JMS messages. A Banking measurement run
concludes when the workload driver determines that all process instances have completed
processing.
A business transaction in this workload is a completed mortgage loan.

10.3.2 Banking Scenarios

The Banking scenarios differ in the setting of the transactional behavior of the invoke activities.
When using the synchronous SCA binding, the process component wired to sync services has the
transactional behavior flag on invokes set to "commit after". When using the SCA asynchronous
or JMS binding, the process component wired to async services has the transactional behavior
flag on invokes set to "participates".
The BPEL process is shown in the following diagrams:

[Diagram: Banking BPEL process, comprising Loop1, Loop2, and Loop3.]

The Banking process contains the following elements:

- Invoke activities
- 1 receive activity
- 1 reply activity
- 1 correlation set
- 3 loops with Java conditions

10.3.3 Banking Services

Depending upon which binding option is used, the Banking process component is wired in one
of the following fashions:
BankingProcessJMS: Banking process wired to a JMS MDB using an import with JMS binding
(transactional behavior flag set to "participates").
BankingProcessJavaSync: Banking process wired to a synchronous POJO (transactional behavior
flag set to "commit after").
BankingProcessJavaAsync: Banking process wired to an asynchronous POJO (transactional
behavior flag set to "participates").
BankingProcessEJBSOAP: Banking process wired to an EJB session bean wrapped as a SOAP web
service.
BankingProcessEJB: Banking process wired to an EJB session bean using a self-written mapper.
This is required because business process components always have w-typed references (this is a
BPEL restriction) and session bean imports always have j-typed interfaces. The self-written
mapper mediates between the j-typed and w-typed interfaces by calling the session bean import
and also handles data mapping.
The diagram which follows illustrates these choices. Note that in this report, measurements are
shown only for the JMS binding.

10.4 SOABench 2008 Choreography Facet

10.4.1 Overview

The SOABench 2008 workload is used in numerous studies in this report. It is an implementation
of the SOABench 2008 specification. SOABench 2008 replaces an earlier version, SOABench
2005, which was used in previous editions of the BPM Performance Report. Similar to the 2005
version, the 2008 version models the business processes of an automobile insurance company and
is intended to evaluate the performance of a distributed application implemented using a Service
Oriented Architecture (SOA).
The 2008 implementation extends the scope of the 2005 version in several ways. The Automated
Approval (microflow only) scenario performs more synchronous service calls than the previous
version. The Manual Approval (microflow plus macroflow pattern) scenario in the previous version
is now implemented in two ways: an Outsourced scenario which does claim approval via
asynchronous Web Service calls, and an InHouse scenario which uses human tasks to approve
claims. In addition, the InHouse scenario divides work among users and groups, adds think time to
user activity in human tasks, and tracks response time of human task actions as well as recording
throughput. This makes the InHouse scenario very useful for evaluating response time and
throughput using a range of active concurrent users. Finally, the 2008 version also includes the
use of preloaded Process Choreography tasks in both the Outsourced and InHouse scenarios.
The following diagram illustrates the workload architecture flow.

10.4.2 Automated Approval Scenario details

One of the modes of operation for SOABench 2008 in handling insurance claim requests is
automated approval. No human or asynchronous tasks take place in this scenario; the flow is
implemented as a microflow that makes synchronous service invocations. All of the service
invocations are to service providers that return cached responses; this prevents bottlenecks in the
service providers while exercising the process server.
A claim request is sent to the HandleClaimMicro business process, which performs an operation
called CreateClaim followed by FraudCheck. This scenario then follows the FastpathApproval
path, which performs synchronous service calls for ApproveClaim, InformPolicyHolder, and
CompleteClaim. The process finishes by sending a response back to the requestor.
The Business Object (BO) size for the input request is variable. By default, a 3 KB request size is
used. The BO size for the reply is fixed at 3 KB.
The BPEL process is shown in the following diagram.

The Automated Approval process contains the following activities:

- 1 Receive
- 1 Reply
- 1 Choice
- 5 web services invokes
- 42 variable assignments across 7 assignment blocks

10.4.3 Outsourced Scenario details

The SOABench 2008 Outsourced scenario is one of two scenarios that utilize long running
processes (macroflows) for manual approval of insurance claims. Outsourced mode uses both a
microflow and a macroflow; the microflow is the same process shown for Automated Approval
mode above, but in this mode the logic does not follow the fast path approval path. Instead a long
running business process called HandleClaimLongExternal is invoked by following the "NO"
path of the ProcessInHouse choice activity in the microflow.
As in the Automated Approval scenario, all of the service invocations are to service providers
that return cached responses, which prevents bottlenecks in that area while exercising the process
server.
Claims enter the system via client requests to the HandleClaimMicro process. Synchronous web
service invocations are then made to CreateClaim, FraudCheck, and RecoverVehicle. No human
or asynchronous tasks take place in the HandleClaimMicro process except for an invocation to
the InvokeExternalLong long running process for the Outsourced claim processing with
asynchronous calls workload. This process finishes by sending a response back to the requestor,
but the claim is not complete until the long running process invoked by InvokeExternalLong is
finished.
Before running this scenario the system is preloaded with a variable number of claim requests in
various stages of completion. The oldest claims in the system are worked on first. As claims are
completed, more claims are injected into the system to maintain the preloaded number. The
throughput rate for completed claims is reported.
The Business Object (BO) size for the input request is variable. By default, a 3 KB request size is
used. The BO size for the reply is fixed at 3 KB.
The BPEL for the HandleClaimMicro process is shown in the previous section and in this mode
uses the following activities:

- 3 web services invokes
- 1 long running process invoke
- 41 variable assignments across 6 assignment blocks

The second, long running, process named HandleClaimLongExternal is called via
InvokeExternalLong. Early in this process three parallel, asynchronous, one-way web service
invokes are performed to continue claim processing. At this point the process waits for
corresponding receives to be invoked from another application that processes the asynchronous
web service calls. When all three receives have been completed, the process continues to the
two-way UpdateClaim web service invocation, followed by (for this scenario) another asynchronous
one-way invoke of RequestManualApproval. The process waits again at this point for the
ReceiveManualApproval to be completed. For this scenario all claims then take the approval path,
where three more two-way calls to web services are performed to complete the claim and process.
The HandleClaimLongExternal process contains the following elements:

- 4 two-way web services invokes
- 4 one-way web service invokes which wait on 4 corresponding receives
- 32 variable assignments across 10 assignment blocks
- 1 parallel activity

The BPEL process is shown in the following diagram.


10.4.4 InHouse Scenario details

The SOABench 2008 InHouse scenario is one of two scenarios that utilize long running processes
(macroflows) for manual approval of insurance claims. InHouse mode uses both a microflow and
a macroflow; the microflow is the same process shown for Automated Approval mode above, but
in this mode the logic does not follow the fast path approval path. Instead it invokes a long
running business process called HandleClaimHuman.
As in the Automated Approval scenario, all of the service invocations are to service providers
that return cached responses, which prevents bottlenecks in that area while exercising the process
server.
Claims enter the system via client requests to the HandleClaimMicro process. Synchronous web
service invocations are then made to CreateClaim, FraudCheck, and RecoverVehicle. No human
or asynchronous tasks take place in the HandleClaimMicro process except for an invocation to
InvokeInHouseLong for the InHouse claim processing workload. This process finishes by
sending a response back to the requestor, but the claim is not complete until the long running
process invoked by InvokeInHouseLong is finished.
Before running this scenario the system is preloaded with insurance claim requests in various
stages of completion. The insurance claims are assigned equally to regions. Human task
processing is done by users belonging to a single region, and those users can only process
insurance claims from their region, which is enforced via authentication. Within a region, users
are divided into 2 groups, adjusters and underwriters. Of the four human tasks required to
complete an insurance claim, two are done by adjusters and two are done by underwriters.
Users query existing processes for a list of work that they can perform. A work item is claimed
(selected from the list) and then completed by the user. Users think between query, claiming,
and completing activities. The think time is random but averages a total of 180 seconds per
human task. The time a user waits for responses to their human task queries, claims, and
completes is recorded as response time. The rate at which insurance claims are completed is the
throughput. Once an entire insurance claim is finished, another is added to the region to maintain
its work at the preloaded level.
The BPEL for the HandleClaimMicro process is shown in the Automated Approval section. The
path to InvokeInHouseLong contains the following activities:

- 3 web services invokes
- 1 long running process invoke
- 41 variable assignments across 6 assignment blocks

The second, long running process, named HandleClaimHuman, is called via InvokeInHouseLong.
Early in this process three parallel activities take place: one asynchronous one-way web service
invoke and two human tasks performed by users in the adjusters group. When all three activities
complete, the process continues to the two-way UpdateClaim web service invocation, followed by
(for this scenario) two human tasks called FirstApprovalTask and SecondApprovalTask, which
are performed by users in the underwriters group. Upon completion of SecondApprovalTask, all
claims for this scenario take the approval path, where three more two-way calls to web services
are performed to complete the claim and the process.
The HandleClaimHuman process contains the following elements:
• 4 two-way web service invokes
• 1 one-way web service invoke, which waits on a corresponding receive
• 4 Human Tasks
• 36 variable assignments across 11 assignment blocks
• 1 parallel activity
The BPEL process is shown in the following diagram.


10.5 SOABench 2005 (Used in previous performance reports)


10.5.1 Overview

The SOABench 2005 workload was used in previous BPM performance reports; the description
is included in this report as a bridge, since this report contains the initial set of measurements for
the SOABench 2008 workload (described above).
The SOABench 2005 workload is an implementation of the SOABench 2005 specification and
models the business processes of an automobile insurance company. SOABench 2005 is intended
to evaluate the performance of a distributed application implemented using a Service Oriented
Architecture (SOA). SOABench 2005 uses a driver that produces a complex workload similar to
a real production system. The driver workload is made up of several technology subsets, called
facets, which can be included in or excluded from performance evaluations. Examples of
SOABench 2005 facets include Services (use of service components), Mediation (use of
mediation to transform requests and responses), and Choreography (application implementation
using service choreography).
By combining facets, SOABench 2005 implements two aspects of the IT systems of an insurance
company called SOAAssure. The first is the Claims application, which combines the
Choreography and Services facets to process insurance claims. The second is realized using the
Mediation and Services facets and provides a third-party gateway that enables another
company to establish whether coverage exists for an existing policy. The following diagram
illustrates the workload architecture flow.


SOABench Architecture (overview of the diagram): the SOABench Client simulates service
requestors that submit claim and coverage-check requests. Process choreography runs the Handle
Claim process as a macroflow and a microflow, together with the Fraud Check SCA component,
the Claim Approval business rule, and human task processing. The Enterprise Service Bus hosts
mediations that route, transform, and adapt requests (for example, Check Coverage) and calls the
claim service through a web service binding. Back-end service providers supply the claim service
implementations (web services) and business data, and a Human Tasks Simulator plays the
adjuster and underwriter roles.
The SOABench 2005 Client can drive the workload with mediation or business process claim
requests. The minimum request and response size is 3 KB, but this can be increased by the user.
The client driver also provides an infrastructure mode that makes interactions with the back-end
service providers trivial. The Human Tasks Simulator handles both adjuster and underwriter
tasks generated during the Choreography facet manual approval process.


10.5.2 Choreography facet: Automated Approval

One of the workloads in the SOABench 2005 Choreography facet is the handling of an insurance
claim using automated approval. No human or asynchronous tasks take place in this scenario; the
flow is implemented as a microflow. A claim request is sent to a business process which performs
an operation called HandleClaim. HandleClaim does Submit Claim to create the claim, checks
the claim for validity via FraudCheck_SCA, then approves it and invokes the Complete Claim
operation. The process finishes by sending a response back to the requestor.
The BPEL process is shown in the following diagram.

The Automated Approval process contains the following elements:

• 2 web service invokes
• 1 Java invoke
• 23 variable assignments across 6 assignment blocks
• 1 data map with 12 moves
A business transaction for this workload is a completed claim.

10.5.3 Choreography facet: Manual Approval

Another workload in the SOABench 2005 Choreography facet is the handling of an insurance
claim using manual approval. Depending on claim amount, either 1 or 2 human tasks are
performed. For data in this report the second task occurs for 40% of claim requests. The workload
starts in the process used in the Automated Approval Scenario (a microflow), as described in the
previous section. A claim request is sent to the process which performs HandleClaim.
HandleClaim performs Submit Claim to create the claim, skips the claim validity check, then
calls a long running (macroflow) process to perform more work on the claim.
The long running process does a fraud check on the claim via FraudCheck_SCA. A claims
adjuster also looks at the claim via the Adjuster human task, and the claim is updated through a
web service call. For the workload measured, all claims are marked valid and then checked by a
business rule to determine whether an underwriter needs to evaluate the claim. Forty percent of
the claims are checked by the Underwriter human task. At this point all claims are processed for
claim amount and approved using two more web service calls. The long running process then
calls back the microflow process to perform the FinishClaim operation, which performs a
web service call to complete the claim.
An adjuster and underwriter simulator is used to process human tasks for the long running
process.
The BPEL process is shown in the following diagram.



Manual Approval has 2 processes containing the following elements:
• 5 web service invokes
• 1 Java invoke
• 54 or 58 variable assignments across 16 assignment blocks (the additional 4 assignments occur 40% of the time)
• 1 or 2 human tasks (the second task occurs 40% of the time)
• 1 data map with 12 moves
• 2 process calls
• 1 business rule call
• 2 Java snippets

A business transaction for this workload is a completed claim.
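As a simple derived figure (plain arithmetic on the percentages above, not a separately stated workload parameter), an average claim therefore drives 1.4 human tasks (1 + 0.4) and 55.6 variable assignments (54 + 0.4 x 4).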


11 WebSphere ESB Core Workloads


This performance report includes measurements for WESB mediations. There are four basic
topologies covered:
• Web Services bindings
• JMS bindings
• MQ JMS bindings
• MQ bindings
The tests make use of the mediations and Web Services from the SOABench 2008 workload.
SOABench 2008 is a workload intended to evaluate the performance of a distributed application
implemented using a Service-Oriented Architecture. For a description of the SOABench 2008
workload, please see section 11.3.


11.1 Web Services Test Scenario


The Web Services tests use the following scenario:
• Standalone multithreaded HTTP client to produce SOAP requests
• Synchronous SOAP(XML)/HTTP request/response invocation
• WESB mediation
• SOABench 2008 as the target Web Service

Figure 1: Web Services Topology. 50 HTTP clients send requests to the WESB mediation, which invokes the SOABench 2008 target Web Service hosted on WebSphere 7.0.


11.1.1 Web Services Fan Out / Fan In Mediation

The Fan Out mediation iterates over a repeating element in the request message. For each
iteration it does the following:
• Performs a Service Invoke to call a target Web Service
• Uses a Message Element Setter to update the shared context (created on the Input node) with data from the response
The Fan In mediation then waits for all iterations to complete before using an XSLT mediation to
create a response message, which it returns.
The test was executed with different request messages to vary the number of Fan Out iterations;
requests resulting in 1, 2, and 4 iterations were measured. Note that each iteration runs
sequentially rather than in parallel.

Figure 2: Fan Out / Fan In Topology. 50 HTTP clients drive the Fan Out mediation, which for each iteration performs a Service Invoke against the SOABench 2008 service on WebSphere 7.0 and a Message Element Setter; a Fan In mediation and an XSLT mediation then build and return the response.


11.2 JMS Test Scenarios


The JMS, MQ JMS, and MQ binding measurements utilize the following test scenario:
• Standalone JMS producer and consumer
• WESB mediation
• BestEffort non-persistent and Assured persistent messaging
• One-way request scenario
The tests use a standalone JMS producer and consumer. The JMSPerfHarness workload program
is used for this, as it can be configured to run standalone JMS producers and consumers and to
measure the rate at which messages are processed by the consumers. The producer and consumer
run within the same JVM and are therefore co-located on one machine.

11.2.1 JMS Binding test topology

Figure 3: JMS Topology. A standalone JMS producer puts messages on a JMS queue; a JMS export passes them to the WESB XSLT transformation mediation, which sends them through a JMS import to a second JMS queue, where the standalone JMS consumer receives them. DB2 provides the data store used for persistent messaging.

11.2.2 MQ JMS and MQ Binding Test topology


The MQ JMS and MQ bindings both connect to an MQ Queue Manager. Messages are
delivered into WESB from the MQ inbound queue and sent to an MQ outbound queue. No
internal SIB queues are used in this scenario. The MQ Queue Manager is deployed on the same
machine as the WESB server.

Figure 4: MQ JMS and MQ Topology. A JMS producer puts messages on the inbound queue of the MQ Queue Manager; the MQ JMS (or MQ) export delivers them to the mediation, and the MQ JMS (or MQ) import sends them to the outbound queue on the same Queue Manager, where the JMS consumer receives them.


11.3 SOABench 2008 Mediation Facet


All of the test scenarios use mediations taken from the SOABench 2008 workload. The bindings
differ between the scenarios but the function of the mediations remains the same.

11.3.1 Transformation Mediations

These are mediations which transform requests and, in some cases, responses. Various levels of
transformation complexity are possible.
XSLT Value transform mediation: transforms the value of a single element in the request message
using XSLT.
XSLT Namespace transform mediation: transforms request and response messages from one
schema to another using XSLT. The schemas are largely the same, but the name of one element
differs and the two schemas have different namespaces.
XSLT Schema transform mediation: transforms request and response messages from one schema
to another using XSLT. The schemas are completely different but contain similar data, which is
mapped from one to the other. In addition to the transform, a value from the request is transferred
to the response by storing it in a context header.
Message element setter mediation: transforms the value of a single element in the request message
using the Message Element Setter primitive.
Business Object Mapper mediation: uses the Business Object Mapper primitive to map the entire
body of the request into a new Business Object.

11.3.2 Routing Mediations


These are mediations which route requests to different services based on content.
Route on header mediation: routes the request based on the presence of a string in the SOAP or
JMS header. The Web Services workload does not use any standard headers, so an optional one
called Internationalization Context is used. The JMS workload introspects the JMSCorrelationId
header field.
Route on body mediation: routes the request based on the content of a field in the body of the
request.
Service Invoke mediation: uses the Service Invoke primitive to invoke a Web Service and then
returns the response.

Topology: 50 HTTP clients drive the Service Invoke mediation, which invokes the SOABench 2008 service on WebSphere 7.0.

11.3.3 Composite mediation


The composite mediation consists of four mediation primitives wired together inside a single
mediation module. This avoids inter-module call overhead, but at the expense of the ability to
individually administer the pieces of the overall mediation. The Authorisation mediation is a
routing mediation which checks a password field in the request body.
No logging is performed in either the JMS or Web Services implementations of this scenario.

Topology: 50 HTTP clients drive the composite mediation module, whose wired primitives (Authorisation, Logging, Route Body, and Transform Schema) route each request to either the SOAAssure or the LegacySure service provided by SOABench 2008 on WebSphere 7.0.


11.3.4 Chained mediation

The chained mediation performs the same function as the composite mediation, but the four types
of mediation primitives are each packaged as separate mediation modules, which are then joined
together using bindings.

Topology: 50 HTTP clients drive the chained mediation, in which the Authorisation, Logging, Route Body, and Transform modules are joined by Web Services and SCA bindings and route each request to either the SOAAssure or the LegacySure service provided by SOABench 2008 on WebSphere 7.0.


11.4 SOABench 2008 Mediation Facet Message Sizes


The workloads used for the WESB tests were taken from the mediation facet of the SOABench
2008 Client. The actual size of the messages in the workloads is shown below.

Workload    Web Services SOAP Request    Web Services SOAP Response    JMS payload
Base        1.8 K                        0.8 K                         1.2 K
10          9.1 K                        8.3 K                         8.5 K
100         107.3 K                      106.5 K                       106.7 K


12 WID and Modeler Core Workloads


12.1 Order Processing
Order Processing is a workload based on a storyline derived from a fictitious furniture
manufacturing company. This workload is responsible for receiving and managing customer
orders for furniture, scheduling the orders for shipment, shipping the orders to the customer, and
maintaining the company's inventory.
Order Processing contains 25 business integration modules, 2 business integration libraries, 57
interfaces, and 150 data types, and makes use of the full spectrum of SCA component kinds
available in WPS.

12.2 Loan Processing


Loan Processing is a workload based on a storyline from the Financial Services Industry. It is a
collection of many applications with related functionality combined into a single project
interchange. When built, it results in 62 installable EARs, which are used to study performance
characteristics of Build, Publish & Startup operations. As such, it is useful for evaluating the
authoring performance for a relatively large and complex workload.
Loan Processing contains 62 business integration modules, 23 business integration libraries, 5
Java projects, 624 interfaces, 105 business processes, 140 data types, 602 imports & 563 exports.

12.3 Customer Service


Customer Service is a workload based on a storyline from the Telecommunications Services
Industry relative to satisfying customer service requests. It is a collection of many applications
with related functionality combined into a single project interchange. This workload is used to
study performance characteristics of Build operations within WID.
Customer Service contains 49 business integration modules, 15 business integration libraries, 184
interfaces, 97 business processes, 1212 data types, 186 imports & 99 exports.

12.4 BPM@Work
BPM@Work is a Business Process Modeler workload modeling a software development
storyline. It contains a single, complex business process that results in 11 independent process
models, which are installed via direct deployment from Modeler to the WPS server.


Appendix A - Measurement Configurations


This appendix lists the various components and settings used for the measurements presented in
this document; note that some workloads do not utilize all of these settings if they are not
applicable to that workload. Although the measurements were generated on separate platforms
(AIX and Windows), the tuning options for the common software modules (WPS, database, etc.)
were intentionally kept as similar as possible. System settings are listed first, followed by
detailed descriptions of the individual systems used to obtain measurements.

1.1 WPS Settings


Following are settings that were changed for the WPS performance measurements. Otherwise,
unless specifically noted in the workload description, the default settings as supplied by the
product installer were used.

1.1.1 SOABench 2008 Automated Approval and OutSourced Mode Settings: AIX

This table shows application cluster related settings that were modified from their default values,
as measured in sections 5.1.3, 5.1.4, and 9.10.
Setting                                                                        Value
Java Heap Megabytes                                                            1536
Java nursery Megabytes (-Xmn)                                                  768
Default Thread Pool Max                                                        100
BPEDB Data source > connection pool max                                        300
BPEDB Data source > WebSphere Application Server data source properties
  > Statement cache size                                                       300
BPC ME Data source > connection pool max                                       50
SCA SYSTEM ME Data source > connection pool max                                50
WPS Common Data source > connection pool max                                   500
J2C activation specifications > SOABenchBPELMod2_AS > Custom properties
  > maxConcurrency, maxBatchSize                                               50,
Resources > Asynchronous Beans > Work Managers > BPENavigationWorkManager
  > Work request queue size, max threads, growable                             400, 50, no
Application Cluster > Business Flow Manager > Message pool size,
  Intertransaction cache size                                                  5000, 400
Application Cluster > Business Flow Manager > Custom Properties
  > DataCompressionOptimization                                                False

These settings are common to the measurements at all core counts and all numbers of nodes,
except for the following additional changes made for the vertical scaling measurements:
WebContainer Thread Pool Min, Max                                              100, 100
com.ibm.websphere.webservices.http.maxConnection                               50

1.1.2 SOABench 2008 Automated Approval and OutSourced Mode Settings: Windows and Linux
Three systems were used for these SOABench measurements: the request driver, the WPS server,
and the DB2 database server. The WPS server with SOABench and the DB2 database server
were tuned extensively to maximize throughput; see below for details. Note that some tuning
varied due to the Operating System and the number of processor cores used for measurement.
These variations are presented in tabular format below, after the common tuning.
WPS server configuration (used for all measurements):
• Production Template
• Security disabled
• No default or sample applications installed
• Common database defined as local DB2 type 4
• Business Process support established with bpeconfig.jacl (note that this sets the Data sources > BPEDataSourceDb2 > WebSphere Application Server data source properties statement cache to 300)
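For reference, bpeconfig.jacl is run through the wsadmin scripting client; a typical invocation looks like the following sketch (the script path is illustrative and depends on the installation layout):
   wsadmin -f <install_root>/ProcessChoreographer/config/bpeconfig.jacl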

WPS server tuning (used for all measurements):
• PMI disabled
• HTTP maxPersistentRequests set to -1
• GC policy set to -Xgcpolicy:gencon (see the table below for the nursery setting, -Xmn)
• Remote DB2 databases (connection type 4) used for the BPE, SIB System, and SIB BPC databases
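To make the garbage collection settings concrete, the 1-core Automated Approval configuration in the table below corresponds to generic JVM arguments along the lines of the following (shown for illustration only; the maximum heap can equivalently be set through the admin console heap size fields):
   -Xmx1280m -Xmn640m -Xgcpolicy:gencon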


The following settings were varied by scenario (Automated Approval, OutSourced Approval) and
by the number of processor cores used for the measurement. Where several values are listed for a
setting, they correspond to the different scenario and core-count configurations; where a single
value is listed, it was used for all configurations.
• Java Heap Megabytes: 1280
• Java nursery Megabytes (-Xmn): 640, 640, 640, 768, 768
• Web Container Thread Pool Max: 100, 150, 150, 100, 300
• Default Thread Pool Max: 100, 200, 200, 100, 200
• BPE database connection pool max: 150, 250, 250, 150, 350
• BPC ME database connection pool max: 30, 30, 30, 30, 150
• SYSTEM ME database connection pool max: 30, 40, 40, 30, 100
• Common database connection pool max: 80, 80, 80, 80, 100
• J2C activation specifications > eis/BPEInternalActivationSpec > Custom properties > maxConcurrency: 160
• BPEInternalActivationSpec batch size: 10
• J2C activation specifications > SOABenchBPELMod2_AS > Custom properties > maxConcurrency: 200
• SOABenchBPELMod2_AS batch size: 32
• Java custom property com.ibm.websphere.webservices.http.maxConnection: 40, 100, 40, 200, 40, 200
• Application servers > server1 > Business Flow Manager > allowPerformanceOptimizations: Yes
• Application servers > server1 > Business Flow Manager > interTransactionCache.size: 400
• Application servers > server1 > Business Flow Manager > workManagerNavigation.messagePoolSize: 4000
• Resources > Asynchronous Beans > Work Managers > BPENavigationWorkManager > min threads, max threads, request queue size: 30, 30, 30
• Application servers > server1 > Business Process Choreographer > Business Flow Manager > Custom Properties > DataCompressionOptimization: false

The DB2 database server has 3 databases defined for use by the WPS server. The database logs
and tablespaces were spread across a RAID array to distribute disk utilization. The database used
for the BPC.cellname.Bus data store was not tuned. The SCA.SYSTEM.cellname.BUS database
and the BPE database were tuned as follows.
The SCA.SYSTEM.cellname.BUS database:
 o db2 update db cfg for sysdb using logbufsz 512 logfilsiz 8000 logprimary 20 logsecond 20 auto_runstats off
 o db2 alter bufferpool ibmdefaultbp size 30000
The BPE database was created and tuned as follows:
 o db2 CREATE DATABASE bpedb ON /raid USING CODESET UTF-8 TERRITORY en-us
 o db2 update db cfg for bpedb using logbufsz 512 logfilsiz 10000 logprimary 20 logsecond 10 auto_runstats off
 o Using the WPS generated script: db2 -tf createTablespace.sql
 o Using the WPS generated script: db2 -tf createSchema.sql
 o db2 alter bufferpool ibmdefaultbp size 132000
 o db2 alter bufferpool bpebp8k size 132000
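For a sense of scale, 30,000 pages in ibmdefaultbp is roughly 117 MB of buffer pool and 132,000 pages roughly 516 MB, assuming the default 4 KB page size; bpebp8k is an 8 KB-page buffer pool, so 132,000 pages there is roughly 1 GB.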

1.1.3 SOABench 2008 InHouse Settings


This workload was used in the study evaluating throughput and response time for up to 10,000
concurrent users (section 9.3). A multi-tier topology was used for this study:
• A database server, which holds the Choreography and Messaging databases.
• A WPS server, which runs the processes involved in the application scenario.
• A Tivoli Directory Server with an LDAP database for user authentication. This ran on a support system together with the client controller.
• 2 support systems, each running workload generators (client agents) under the direction of a single client controller. One support system handles asynchronous service requests and the other handles synchronous service requests made by the business processes running on the WPS server.

The database system was tuned in a similar fashion to the SOABench 2008 OutSourced scenario
measurements. In addition, unused indexes were deleted as recommended by the DB2 design
advisor.
The client systems were tuned with two considerations in mind. The first was maintaining load on
the WPS server running the workload, which involved Java, thread pool, and work manager
tuning. The second was avoiding problems while preloading the numerous process tasks into the
system, which involved increasing timeouts and resources to maintain connectivity during the
preload.
Client tuning:
 o Transaction Service > tran lifetime timeout: 9000
 o Transaction Service > async response timeout: 9000
 o Transaction Service > client inactivity timeout: 9000
 o Transaction Service > max tran timeout: 9000
 o Java > max Heap: 1280
 o Java > -Xgcpolicy: gencon
 o Java > -Xmn: 512M
 o Java Custom > com.ibm.websphere.webservices.http.maxConnection: unlimited
 o Java Custom > com.ibm.ws.webservices.writeTimeout: 9000
 o Java Custom > com.ibm.ws.webservices.readTimeout: 9000
 o port 9080 > TCP inbound > Max open connections: 30000
 o port 9080 > TCP inbound > Inactivity timeout: 60
 o port 9080 > HTTP inbound > Max persistent req: unlimited
 o port 9080 > HTTP inbound > read timeout: 6000
 o port 9080 > HTTP inbound > write timeout: 6000
 o port 9080 > HTTP inbound > persistent timeout: 3000
 o Thread Pool Default min, max: 50 to 300
 o Thread Pool ORB min, max: 10 to 100
 o Thread Pool WebContainer min, max: 100 to 400
 o Thread Pool TCPChannel min, max: 5 to 50
 o WebSphereSOABenchWM > Work request queue size: 800
 o WebSphereSOABenchWM > alarm threads: 800
 o WebSphereSOABenchWM > minimum threads: 600
 o WebSphereSOABenchWM > maximum threads: 800
 o Application > HSASC_WebSphereImplApp > WS client binding timeouts: 9000
 o Win OS CurrentControlSet\Services\Tcpip\Parameters TcpTimedWaitDelay: 20 sec
 o Win OS CurrentControlSet\Services\Tcpip\Parameters MaxUserPort: 52768
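As a sketch of how the Windows TCP/IP registry settings above can be applied from an administrative command prompt (a reboot is typically required before they take effect):
   reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 20 /f
   reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 52768 /f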

For the system running the directory server, the following setting was updated through the LDAP
server admin console:
 o Server Administration > Manage Server properties > Search Settings > Search Size Limit: "unlimited"

The WPS server tuning parameters for this workload are as follows:
 o Transaction Service > tran lifetime timeout: 900
 o Transaction Service > async response timeout: 900
 o Transaction Service > client inactivity timeout: 900
 o Transaction Service > max tran timeout: 900
 o Business Flow Manager > Allow Perf optimizations: yes
 o Business Flow Manager > Message Pool Size: 4000
 o Business Flow Manager > max age for stalled messages: 360
 o Business Flow Manager > max process time on thread: 360
 o Business Flow Manager > Intertransaction cache size: 400
 o Business Flow Manager > DataCompressionOptimization: false
 o Java > Heap: 1280
 o Java > -Xgcpolicy: gencon
 o Java > -Xmn: 768M
 o Java Custom > com.ibm.websphere.webservices.http.maxConnection: 150
 o Java Custom > com.ibm.ws.webservices.writeTimeout: 9000
 o Java Custom > com.ibm.ws.webservices.readTimeout: 9000
 o Java Custom > com.ibm.websphere.webservices.http.waitingThreadsThreshold
 o port 9080 > TCP inbound > pool > WebContainer: yes
 o port 9080 > TCP inbound > Max open connections: 20000
 o port 9080 > TCP inbound > Inactivity timeout: 60
 o port 9080 > HTTP inbound > Max persistent req: unlimited
 o port 9080 > HTTP inbound > read timeout: 60
 o port 9080 > HTTP inbound > write timeout: 60
 o port 9080 > HTTP inbound > persistent timeout: 60
 o Thread Pool Default: 50 to 200
 o Thread Pool ORB: 10 to 50
 o Thread Pool WebContainer: 10 to 300
 o Thread Pool TCPChannel: 5 to 20
 o connection pool BPE DB: 25 to 350
 o WebSphere Application Server data source properties > Stmt cache: 256
 o connection pool SIB BPC DB: 25 to 150
 o connection pool SIB SYS DB: 25 to 100
 o connection pool Common DB: 10 to 100
 o Common DB type/location: local/derby
 o BPECF connections: 10 to 100
 o BPECFC connections: 10 to 100
 o HTMCF connections: 10 to 100
 o BPEInt AS concurrency: 160
 o BPEInt AS batch size: 10
 o SOA BPEL App AS concurrency: 160
 o SOA BPEL App AS batch size: 32
 o BPENavigationWorkManager > Work request queue size: 30
 o BPENavigationWorkManager > alarm threads: 30
 o BPENavigationWorkManager > minimum threads: 30
 o BPENavigationWorkManager > maximum threads: 30
 o BPENavigationWorkManager > isGrowable: false
 o Win OS netsh int ipv4 set dynamicport tcp start=12000 num=52000: yes
 o Win OS CurrentControlSet\Services\Tcpip\Parameters TcpTimedWaitDelay: 30 sec

Security related tuning for WPS running the InHouse scenario is as follows:
 o Java custom property com.ibm.websphere.security.util.authCacheSize = 15000
 o authCache timeout = 420 minutes
 o LTPA timeout = 600 minutes
 o human task people query timeout = 25200 sec

1.1.4 Banking Settings


For Banking, BPE uses a DB2 database, while the (SIB) messaging engines have been configured
to use file stores. To select the file store option, start the Profile Management Tool, select
Advanced Profile Creation, and then on the Database Configuration screen select the checkbox
next to "Use a file store for Messaging Engines (MEs)". For the Banking workload, the BPE
database is located on the same machine as WPS.
Tuning parameter settings for the BPE database were initially derived using the DB2
Configuration Advisor. A few key parameter settings were then modified further. These include:
• MAXAPPLS, which must be large enough to accommodate connections from all possible JDBC connection pool threads, and
• the default buffer pool sizes (the number of 4 KB pages in IBMDEFAULTBP) for each database, which are set so that each pool is 256 MB in size.
The following table shows the parameter settings used for this report.

Parameter Name        BPEDB Setting
APP_CTL_HEAP_SZ       144
APPGROUP_MEM_SZ       13001
CATALOGCACHE_SZ       521
CHNGPGS_THRESH        55
DBHEAP                600
LOCKLIST              500
LOCKTIMEOUT           30
LOGBUFSZ              245
LOGFILSIZ             1024
LOGPRIMARY            11
LOGSECOND             10
MAXAPPLS              90
MAXLOCKS              57
MINCOMMIT
NUM_IOCLEANERS
NUM_IOSERVERS         10
PCKCACHESZ            915
SOFTMAX               440
SORTHEAP              228
STMTHEAP              2048
DFT_DEGREE
DFT_PREFETCH_SZ       32
UTIL_HEAP_SZ          11663
IBMDEFAULTBP          65536
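As an illustration of how a few of these values can be applied from the DB2 command line (following the same command pattern used for the other databases in this appendix, with the database assumed to be named BPEDB):
   db2 update db cfg for BPEDB using MAXAPPLS 90 LOGFILSIZ 1024 LOGPRIMARY 11 LOGSECOND 10
   db2 connect to BPEDB
   db2 alter bufferpool IBMDEFAULTBP size 65536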

In addition to these database level parameter settings, several other parameters were also
modified using the WAS Admin Console, mostly those affecting concurrency (i.e., thread
settings):
• The size of the default thread pool was set to 50 threads
• The database connection pool size for the BPEDB was increased to 60, and the statement cache size for the BPEDB was increased to 300
• The maximum connections property for JMS connection pools was set to 40
• Connectivity to the local database is via the DB2 JDBC Universal Driver Type 2 driver
• WPS JVM heap sizes were set to 512 MB

1.2 WESB Settings


Following are settings that were changed for the WESB performance measurements. Otherwise,
unless specifically noted in the workload description, the default settings as supplied by the
product installer were used.

1.2.1 WESB Common Settings


These settings are used for all the tests, Web Services and JMS:
• Tracing is disabled
• Security is disabled
• Java heap size is fixed at 1280 MB on both Windows and AIX
• The gencon garbage collection policy is enabled, with the nursery heap size set to 1024 MB

1.2.2 WESB Settings for Web Services measurements
• PMI monitoring is disabled
• WebContainer thread pool size set to a maximum of 50 and a minimum of 10
• WebContainer thread pool inactivity timeout set to 3500

1.2.3 WESB Settings for JMS measurements
• Activation specification: maximum concurrent endpoints set to 50
• Queue connection factory: maximum connection pool size set to 51
• DiscardableDataBufferSize set to 10 MB and CachedDataBufferSize set to 40 MB

1.2.4 DB2 Settings for JMS persistent measurements

These settings are only relevant to the JMS persistent tests as they make use of the database to
persist messages.
• Place database tablespaces and logs on a fast disk subsystem
• Place logs on a separate device from the table spaces
• Set the buffer pool size correctly
• Set the connection min and max to 30
• Set the statement cache size to 40
• Use a raw partition for the DB2 logs

Otherwise, unless specifically noted in the workload description, the default settings as supplied
by the product installer were used.


1.3 Individual Measurement System Descriptions


The individual configurations listed in this section were used in various combinations to form the
measurement and analysis environment for the workloads and studies documented in this report.
The combination of machines and software distributions used is identified with the data for each
individual workload reported.
The section headings for each of the following machines are used as identifiers in the datasheets
for various workloads. Included in each section here are settings which are specific to the
systems and any of the software modules running on that system.

1.3.1 Intel 2.0GHz - A


Hardware

IBM ThinkPad T42p, Type 2373-C61

1 x 2.0 GHz Intel Pentium M 755

Hyper-Threading not supported

2 GB RAM

L1 1 x 32KB (D) 1 x 32 KB (I), L2 1 x 2 MB caches

Hitachi Travelstar 80 GB Disk

100Mbit Ethernet

Software

Windows XP Professional, SP2

IBM WebSphere Integration Developer v6.2fFix001 - Build id: 6.2-20081216_1440

1.3.2 Intel 2.16GHz - A


Hardware

Lenovo ThinkPad T62p, Model 2007-F16

2 x 2.16 GHz Intel Mobile Core 2 Duo T7400

Hyper-Threading not supported

3 GB RAM

L1 2 x 32 KB (D) 2 x 32 KB (I), L2 1 x 4 MB caches

Hitachi Travelstar 100 GB

100Mbit Ethernet

Software

Windows XP Professional, SP2

IBM WebSphere Integration Developer v6.2fFix001 - Build id: 6.2-20081216_1440


1.3.3 Intel 2.2 GHz D2D1


Hardware

Lenovo T60p 2.16GHz Intel Core 2 T2600

3.0 GB RAM

100 GB 7200 RPM HDD

1Gbit Ethernet

Software

Microsoft Windows XP Professional with Service Pack 2

WB Modeler 6.2.0.1 or WB Modeler 7.0.0.0


1.3.4 Intel 2.66GHz - A


Hardware

Lenovo ThinkCentre, Model 9196-A49

2 x 2.66 GHz Intel Core2 Duo E6750

Hyper-Threading not supported

4 GB RAM

L1 2 x 32 KB (D) 2 x 32 KB (I), L2 1 x 4 MB caches

Seagate Barracuda 250GB Disk

100Mbit Ethernet

Software

Windows XP Professional, SP2

IBM WebSphere Integration Developer v7001 - Build id: 7.0.0.1_20091220_1924

1.3.5 Intel 2.66GHz - B


Hardware

Lenovo ThinkCentre, Model 9196-A49

2 x 2.66 GHz Intel Core2 Duo E6750

Hyper-Threading not supported

4 GB RAM

L1 2 x 32 KB (D) 2 x 32 KB (I), L2 1 x 4 MB caches

2 x Seagate Barracuda 250GB Disk

1Gbit Ethernet

Software

Windows XP Professional, SP2

IBM WebSphere Integration Developer v7001 - Build id: 7.0.0.1_20091220_1924

IBM WebSphere Process Server v7.0.0.1 - of0950.17

IBM WebSphere Business Modeler v7.0.0.1

IBM DB2 v8.1.16.429 s080111

1.3.6 Intel 2.8GHz - A


Hardware

8686-3RQ (IBM xSeries 360)

4 x 2.8GHz Pentium 4


Hyperthreading disabled

4 GB RAM

L1: 8KB per physical processor

L2: 512KB per physical processor

L3: 2MB per physical processor

14x18.2GB 15K U160 RAID 0 Disk Array

100Mbit Ethernet

Software

Windows 2000 Advanced Server SP4

IBM DB2 8.1 FP7

WPS 6.1.0

1.3.7 Intel 2.8GHz - B


Hardware

IBM xSeries x365 Xeon 2.8GHz Pentium 4 (4-way SMP)

2MB L3 cache

3.5 GB RAM

IBM ServeRaid Disk Subsystem 256MB Battery backed cache, write


back cache

1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 2

JMSPerfHarness


1.3.8 Intel 2.8GHz - C


Hardware

IBM xSeries x335 2.8GHz Pentium IV Xeon (1-way)

1.5 GB RAM

1 Gbit Ethernet

Software

Red Hat Enterprise Linux (RHEL 4)

HTTP traffic generator

1.3.9 Intel 2.8GHz - D


Hardware

IBM ThinkCentre, Model 8212-KUA

2 x 2.8 GHz Intel Pentium D 820

Hyper-Threading not supported

3 GB RAM

L1 2 x 16 KB, L2 2 x 1 MB caches

Western Digital 160 GB Disk

100Mbit Ethernet

Software

Windows XP Professional, SP2

IBM WebSphere Integration Developer v6.2fFix001 - Build id: 6.2-20081216_1440

1.3.10

Intel 2.93GHz A

Hardware

IBM xSeries 3950M2

4 Quad-core 2.93GHz Intel Xeon CPU X7350

Hyperthreading not supported

24GB RAM

L1 (Primary cache): 32K Instruction (I) + 32K Data (D) per processor, L2 (Secondary
cache): 8MB I+D per processor (4MB shared per 2 cores)


24 x 73.4 GB RAID 10 Disk Array

1 Gigabit Ethernet

Software

Windows 2008 Server Enterprise Edition

1.3.11

Intel 2.93GHz B

Hardware

7141-4SU (IBM xSeries 3950 M2)

16 x 2.93 GHz: Intel Xeon CPU X7350 (4 quad-core processors)

Hyperthreading not supported

24 GB RAM

L1: 32K Instruction (I) + 32K Data (D) per processor

L2: 8MB I+D per processor (4MB shared per 2 cores),

two 12 x 73.4 GB 15K IBM SAS RAID 10 Disk Array

1 Gigabit Ethernet

Software

Microsoft Windows Server 2008 Enterprise SP1

Red Hat Enterprise 5.2 Linux, version 2.6.18-92.el5PAE

WPS 7.0.0.1

1.3.12

Intel 2.93GHz C

Hardware

7141-4RG (IBM xSeries 3950 M2)

16 x 2.93 GHz: Intel Xeon CPU X7350 (4 quad-core processors)

Hyperthreading not supported

40 GB RAM

L1: 32K Instruction (I) + 32K Data (D) per processor

L2: 8MB I+D per processor (4MB shared per 2 cores),

One 1 x 73.4 GB 10K IBM SAS RAID 0 Disk Array

One 3 x 73.4 GB 10K IBM SAS RAID 0 Disk Array

1 Gigabit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 2

WESB 7.0.0.1

1.3.13

Intel 2.93GHz D

Hardware

7141-4RG (IBM xSeries 3950 M2)

16 x 2.93 GHz: Intel Xeon CPU X7350 (4 quad-core processors)

Hyperthreading not supported

24 GB RAM

L1: 32K Instruction (I) + 32K Data (D) per processor

L2: 8MB I+D per processor (4MB shared per 2 cores),

1 Gigabit Ethernet

Software

Windows 2008 Server Standard Edition Service Pack 1

WPS 6.2.0 (for SOABench 2008 Services)

1.3.14

Intel 3.0GHz A

Hardware

IBM xSeries 365, 4 x 3.0 GHz Pentium 4 Xeon

Hyper-threading disabled for measurements

6 GB RAM, 4 MB L3 Cache

14 x 34 GB RAID 1E Disk Array (for WPS and DB containers)

14 x 34 GB RAID 1E Disk Array (for DB logs)

Software

Windows 2003 Server SP1

IBM DB2 8.2 FP6

WPS 6.0.0 (GMo0537.08), WPS 6.0.1 (GMo0550.06), WPS 6.0.1.1 (m0612.02),


WPS 6.0.1.2 (o0621.07), WPS 6.0.2 (m0649.11), WPS 6.1.0 (o0748.03)

1.3.15

Intel 3.0GHz - B

Hardware

IBM xSeries 365

4 x 3.0 GHz Xeon

Hyper-threading disabled for measurements

6 GB RAM, 4 MB L3 Cache

14 x 36 GB RAID 1E Disk Array

100 Mbit Ethernet

Software

Windows 2003 Server Standard Edition SP1

IBM UDB ESE 8.1 Fix pack 13

IBM WebSphere Process Server, 6.0.2.0 Build m0649.11 with 6.0.2-WS-WPS-ESBWinX32-CritFixes.zip packaged 13 DEC 2006

SOABench 2005 (2005 specification) built on IBM WebSphere Integration


Developer 6.0.2


1.3.16

Intel 3.0GHz - C

Hardware

IBM xSeries x365 Xeon 3.0GHz Pentium 4 (4-way SMP)

4MB L3 cache

4 GB RAM

IBM ServeRaid Disk Subsystem 128MB Battery backed cache, write

back cache

1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 1

WebSphere ESB V6.2.0

WebSphere ESB V6.2.0 Fix packs :6.2.0.X-WS-WASJavaSDK-WindowsX32-IFPK58751.pak

Derby (default database)

1.3.17 Intel 3.0GHz - D

Hardware

IBM xSeries x365 Xeon 3.0GHz Pentium 4 (4-way SMP)

4MB L3 cache

3.5 GB RAM

IBM ServeRaid Disk Subsystem 128MB Battery backed cache, write


back cache

1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 2

WebSphere MQ V6.0.2.2

WebSphere ESB V7.0.0.1


1.3.18

Intel 3.0GHz D2D2

Hardware

Lenovo ThinkCentre 9482-FBU 3.00GHz Intel Core 2 Duo E8400

4.0 GB RAM

250GB 7200 RPM S-ATA HDD

1Gbit Ethernet

Software

Red Hat Linux, kernel 2.6.18-164.6.1.e15

WB Monitor 6.2.0.2 or WB Monitor 7.0.0.0

1.3.19

Intel 3.5GHz - A

Hardware

IBM xSeries x3850 Xeon 3.5GHz (4-way Dual Core)

3 GB RAM

IBM ServeRaid Disk Subsystem with 8i SAS Controller, 256MB Battery


backed cache, write back cache

1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 2

IBM DB2 9.5

1.3.20

Intel 3.5GHz - B

Hardware

IBM xSeries x3850 Xeon 3.5GHz (4-way Dual Core)

16 GB RAM

IBM ServeRaid Disk Subsystem with 8i SAS Controller, 256MB Battery


backed cache, write back cache

1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 2


WebSphere Application Server V7.0

1.3.21 Intel 3.5GHz C

Hardware

8864-5RU (IBM xSeries 3850)

8 x 3.5 GHz Pentium 4 Intel Xeon (4 dual-core processors)

Hyper-threading disabled for measurements

10 GB RAM

L1: 16 KB per core

L2: 1 MB per core

L3: 16 MB per physical processor

12 x 36.4 GB 15K IBM SAS RAID 10 Disk Array

1 Gb Ethernet

Software

Windows 2003 Server SP2 Enterprise Edition

WPS 6.2.0 (for the SOABench 2008 driver and services)

1.3.22

Intel 3.5 GHz D

Hardware

IBM x Series 3850, 4 dual-core 3.5GHz Pentium 4 Xeon cores

Hyperthreading disabled

16GB RAM

RAID Disk Subsystem

Software

Windows 2003 SP2

1.3.23

Intel 3.67GHz - A

Hardware

IBM xSeries x366 Xeon 3.67GHz Pentium 4 (4-way SMP)

3.25 GB RAM

IBM ServeRaid Disk Subsystem with 8i SAS Controller, 256MB Battery


backed cache, write back cache


1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 2

JMSPerfHarness

1.3.24

Intel 3.67 GHz - B

Hardware

IBM xSeries x366

4 x Xeon 3.67GHz Pentium 4

Hyper-threading disabled

4 GB RAM

L1 cache 16KB, L2 cache 1MB

1 Gbit Ethernet

Software

Windows 2003 Server Standard Edition Service Pack 1

SOABench 2005 Client and Human Task Simulator

1.3.25

Intel 3.67GHz - C

Hardware

IBM xSeries 366 3.66GHz Intel Xeon 1-way

2.0 GB RAM

1Gbit Ethernet

Software

Red Hat Linux 2.6.9-34.EL

HTTP traffic generator

1.3.26

PPC 1.9 GHz - A

Hardware

9117-570 (p5 570)

16 x 1.9GHz POWER5+ processor cores

Simultaneous Multithreading (SMT) enabled

64GB RAM

1.9 MB L2 cache shared per two cores

36 MB L3 cache shared per two cores

1 Gb Ethernet

Software

AIX 5300-11-01-0944

WPS 7.0.0.1

IBM HTTP Server 7

1.3.27

PPC 2.2 GHz A

Hardware

9117-570 (p5 570)

8 x 2.2GHz POWER5+ processor cores

Simultaneous Multithreading (SMT) enabled

16GB RAM

1.9 MB L2 cache shared per two cores

36 MB L3 cache shared per two cores

4 12 x 36GB 15K U320 RAID 10 arrays

1 Gb Ethernet

Software

AIX 5300-07-01-0748

DB2 8.1 FP13 or DB2 9.5 FP 3


1.3.28

PPC 2.2 GHz B

Hardware

9117-570 (p5 570)

8 x 2.2GHz POWER5+ processor cores

Simultaneous Multithreading (SMT) enabled

32GB RAM

1.9 MB L2 cache shared per two cores

36 MB L3 cache shared per two cores

4 12 x 36GB 15K U320 RAID 10 arrays

1 Gb Ethernet

Software

AIX 5300-07-01

IBM DB2 9.5 FP3

1.3.29

PPC 2.2 GHz C

Hardware

IBM POWER5 2.2 GHz 8 processor cores

16GB RAM

RAID Disk Subsystem

Software

AIX 5300-07-01-0748

DB2 9.5 Fix Pack 3

1.3.30

PPC 4.2GHz - A

Hardware

IBM pSeries 570, 4.2 GHz PPC POWER6 (8-way SMP)

SMT enabled for measurements

64 GB RAM

1 Gbit Ethernet

Software

AIX 6.1.0.0


WebSphere ESB V6.2

1.3.31 PPC 4.2GHz - B

Hardware

IBM pSeries 570, 4.2 GHz PPC POWER6 (8-way SMP)

SMT enabled for measurements

64 GB RAM

SAN Disk Subsystem

1 Gbit Ethernet

Software

AIX 6.1.0.0

IBM DB2 9.5 FP 2


1.3.32

POWER6 4.7 GHz - A

Hardware

9117-MMA (IBM Power 570)

8 x 4.7GHz POWER6 processor cores

Simultaneous Multithreading (SMT) enabled

64GB RAM

4 MB L2 cache per core

32 MB L3 cache shared per two cores

6 x 36GB 15K U320 RAID 10 array per lpar

1 Gb Ethernet per lpar

Software

AIX 6100-04-01-0944

WPS 7.0.0.1

1.3.33

POWER6 4.7 GHz - B

Hardware

9117-MMA (IBM Power 570)

8 x 4.7GHz POWER6 processor cores

Simultaneous Multithreading (SMT) enabled

64GB RAM

4 MB L2 cache per core

32 MB L3 cache shared per two cores

6 x 36GB 15K U320 RAID 10 array per lpar

1 Gb Ethernet per lpar

Software

AIX 6100-04-01-0944

WPS 7.0.0.1

1.3.34

POWER6 4.7 GHz - C

Hardware

9117-MMA (IBM Power 570)


8 x 4.7GHz POWER6 processor cores

Simultaneous Multithreading (SMT) enabled

32GB RAM

4 MB L2 cache per core

32 MB L3 cache shared per two cores

12 x 73GB 15K SAS RAID 10 array per lpar

1 Gb Ethernet per lpar

Software

AIX 6100-04-01-0944

WPS 7.0.0.1

1.3.35

POWER6 4.7 GHz - D

Hardware

9117-MMA (IBM Power 570)

16 x 4.7GHz POWER6 processor cores

Simultaneous Multithreading (SMT) enabled

64GB RAM

4 MB L2 cache per core

32 MB L3 cache shared per two cores

12 x 73 GB 15K SAS RAID 10 array

1 Gb Ethernet per lpar

Software

AIX 6100-04-01-0944

WPS 7.0.0.1

1.3.36

POWER6 4.7 GHz E

Hardware

9117-MMA (IBM Power 570)

8 x 4.7GHz POWER6 processor cores

Simultaneous Multithreading (SMT) enabled

32GB RAM

4 MB L2 cache per core


32 MB L3 cache shared per two cores

12 x 73 GB 15K SAS RAID 10 array and 12 x 73 GB 15K SAS RAID 0 array

1 Gb Ethernet

Software

AIX 6100-04-01-0944

WPS 7.0.0.1

DB2 9.5 FP 3

1.3.37

POWER6 4.7 GHz F

Hardware

IBM Power 570 POWER6 4.7GHz 8 processor cores

64GB RAM

RAID Disk Subsystem

Software

AIX 6100-00-03-0808

WPS 6.2.0

Websphere Interchange Server (WICS) 4.3.0.6

Websphere ADK 2.6.0.12

WICS Webservices adapter 3.4.7

WICS XML datahandler 2.7.3

Websphere MQ 6.0.2.5

1.3.38

POWER6 4.7 GHz G

Hardware

9117-MMA (IBM Power 570)

16 x 4.7GHz POWER6 processor cores

Simultaneous Multithreading (SMT) enabled

128 GB RAM

4 MB L2 cache per core

32 MB L3 cache shared per two cores

12 x 73GB 15K U320 RAID 10 arrays

1 Gb Ethernets configured as an 802.3ad Link Aggregation

Software


AIX 6100-04-01-0944

WPS 7.0.0.1

DB2 9.7 FP 1

1.3.39

POWER7 3.55 GHz A

Hardware

IBM pSeries 750

6 x 3.55 GHz POWER7 processor cores

12 GB RAM

Simultaneous Multithreading (SMT) enabled with 4 SMT threads

Software

AIX 6.1

WPS 7.0.0.1

1.3.40

POWER7 3.55 GHz B

Hardware

IBM pSeries 750

4 x 3.55 GHz POWER7 processor cores

12 GB RAM

Simultaneous Multithreading (SMT) enabled

Software

AIX 6.1

SOABench 2008 Client Driver and Services provider


Appendix B References
1. WebSphere BPM Performance References
https://w3quickplace.lotus.com/QuickPlace/wasperf/PageLibrary852569AF00670F15.nsf
/h_Toc/3648196DB48799C7852570EE00730294/?OpenDocument&Form=h_PageUI
2. WebSphere BPM Version 7.0 information center
http://publib.boulder.ibm.com/infocenter/dmndhelp/v7r0mx/index.jsp
3. WebSphere Application Server Performance Best Practices and Resources
https://w3quickplace.lotus.com/QuickPlace/wasperf/Main.nsf/h_Toc/e600a81c8a827220
85256efb000b5116/?OpenDocument
4. WebSphere Application Server Performance URL
http://www.ibm.com/software/webservers/appserv/was/performance.html
5. WebSphere Application Server 7.0 information center (including Tuning Guide)
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websph
ere.base.doc/info/aes/ae/welcome_base.html
6. Setting up a Data Store in the Messaging Engine
http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websph
ere.pmc.nd.multiplatform.doc/tasks/tjm0005_.html
7. DB2 Best Practices for Linux, UNIX, and Windows
http://www.ibm.com/developerworks/data/bestpractices/?&S_TACT=105AGX11&S_C
MP=FP
8. DB2 Version 9.7 Info Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp.
9. DB2 Version 9.5 Info Center
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp
10. Redbook: WebSphere BPM v7 Production Topologies
http://www.redbooks.ibm.com/redpieces/abstracts/sg247854.html
11. Redbook: IBM WebSphere InterChange Server Migration to WebSphere Process
Server
http://www.redbooks.ibm.com/redbooks/pdfs/sg247415.pdf


12. Red Paper: WebSphere Business Process Management v7 Performance Tuning


http://www.redbooks.ibm.com/redbooks.nsf/RedpieceAbstracts/redp4664.html?Open
13. WebSphere Adapters v7.0 Performance Report
https://w3quickplace.lotus.com/QuickPlace/wasperf/PageLibrary852569AF00670F15.nsf/
h_B5E907903D033001852570EE00732B7E/2EE6499DD40C9569852576FD00422032/?
OpenDocument

14. WPS Wiki


http://w3.tap.ibm.com/w3ki2/display/WPS/Home
15. Microflows vs. Long-running Processes : Tuning Transaction Boundaries
https://w3quickplace.lotus.com/QuickPlace/wasperf/PageLibrary852569AF00670F15.nsf
/h_0DD72A0FDA0EFC0785256E010040EEC1/74341DDA82E3C4F78525729000676B
D7/?OpenDocument

16. Using JCA Adapters with WPS and WESB


http://www-128.ibm.com/developerworks/library/ws-soa-j2caadapter/index.html?ca=drs-

17. WPS Support


http://www-306.ibm.com/software/integration/wps/support/
18. WESB Support
http://www-306.ibm.com/software/integration/wsesb/support/

19. IBM Java 6.0 Diagnostic Guide


http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp
20. Oracle Database 11g Release 1 documentation (includes a Performance Tuning Guide):
http://www.oracle.com/pls/db111/homepage
21. A white paper discussing Oracle on AIX:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883
22. Oracle 10g Release 2 documentation (includes a Performance Tuning Guide)
http://www.oracle.com/pls/db102/homepage
