
C24 White Paper — High performance scalable XML

XML is great, but…


How do you achieve High Performance Scalable XML processing?
A C24 White Paper by John Davies, Technical Director of C24

John Davies summarizes the key points that make the approach taken by the C24 Integration Objects
toolkit faster, more efficient, more scalable and more robust – and it works straight out of the box.

The background
We've had XML (Extensible Mark-up Language), as we know it, for almost 6 years. It is not a new idea,
having evolved from Standard Generalized Mark-up Language (SGML) which itself evolved from GML.
These two predecessors date from 1986 and 1969 respectively.

W3C finally produced the XML 1.0 standard in early 1998; at last we had a common path for HTML
(XHTML) and programmers could finally work towards a standard.

But somewhere along the line, things got a little out of hand.

XML hit the "dot com" boom in its heyday. Tens and even hundreds of millions were spent on promoting
technologies that used XML and it became the "must have" technology of the era. Products were laughed
at if they didn't support XML in some form or another.

New versions of Java came out with XML support; C# did the same a little later. XML was now the
mark-up language of choice for inter-process and inter-application communication. New standards like SOAP
and XML-RPC evolved and before we knew it there were suggestions that XML could even replace RMI
(Java's Remote Method Invocation), IIOP (CORBA's Internet Inter-ORB Protocol) and DCOM (Microsoft's
Distributed COM) as the "standard" for distributed computing.

XML is great, but…


XML is great. It has made so many previously flat tag/value forms actually usable. Certainly from a
data-oriented rather than a document-oriented perspective, there is no better way to store application
data and configuration files.

All this doesn't come without compromise and cost though. You can't, for example, execute XML (OK,
excluding one of my favourite technologies, Jelly, but that's basically tags for Java). Furthermore, how can
we specify in XML Schema that a number conforms to a checksum routine?

The main problem is complexity. In specialist vertical markets such as retail, telcos,
pharmaceuticals and banking, things get very complex. These verticals have their own “languages”, some
not too dissimilar to XML and many derived directly from it. The majority of these verticals are moving
towards XML as a standard; while this in itself doesn't give them much over the existing languages, it does
make third-party integration much easier given the vast number of tools available. The complexity of
these standards creates new problems: first the implementation of the standard itself, and then its
usage. Is XML Schema up to the job? If so, how do we manage the resulting XML instance
documents? We can define them, but they start to get complex and bloated.

Can we standardise on XML Schema?


Many of the industry standards have rules and conditions; these are often so complex that they can't be
represented in Schema. The number of different message and sub-message types is often so large
that each one needs its own schema. The end result is often an unmanageable tangle of schema
documents that implement only 95% of the original standard. We are actually losing definability by
moving to XML. Having said that, XML Schema is about the best language we have for defining
data models and constraints. All the others are proprietary; XML Schema just lacks precision.

But we need human readable messages. Or do we?


One much-quoted advantage of XML is that it is "human readable". While this is still technically true
for complex XML, your poor human will need a PhD in the particular industry vertical before being able
to "read" the instance document. The document will have dozens of namespace prefixes, frequently lack
“pretty” formatting and start to look like something out of The Matrix.

www.C24.biz
Applied Solutions in Finance

copyright © 2004 Century 24 Solutions. All rights reserved



Grinding to a halt
Finally the show stopper! You'll have watched with dismay as your XML documents grow in size and
complexity, and throughput declines proportionally. You need to parse each document and validate it
against the rules in the schema (at least the 95% you managed to implement), and these are dispersed
across a dozen or so .XSD files. These schema files themselves contain XML, and all have to be parsed.
The alternative, choosing not to validate, is pointless since XML without a schema has no concept of
types. Someone could easily put "NaN" into a numeric field and it will be perfectly well-formed without the
schema. Your unvalidated XML will parse but your program logic won't be so happy. Inevitably your
application source code would become polluted with the very constraint tests which XML Schema should
be doing for you.
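
The "NaN" point can be sketched with the standard JAXP APIs. This is an illustration only: the element name "amount" and its declaration as xs:decimal are assumptions made for the example (note that xs:double and xs:float would accept "NaN" as a legal value, so the choice of type matters).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NaNDemo {
    // Hypothetical one-element schema: "amount" must be an xs:decimal.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='amount' type='xs:decimal'/>"
      + "</xs:schema>";

    /** Parses without a schema: only well-formedness is checked. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Validates against the schema: the declared type is now enforced. */
    static boolean isValid(String xml, String xsd) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // the default error handling throws on a validity error
        }
    }

    public static void main(String[] args) {
        String suspect = "<amount>NaN</amount>";
        System.out.println("well-formed: " + isWellFormed(suspect));
        System.out.println("schema-valid: " + isValid(suspect, XSD));
    }
}
```

Without the schema the parser has no reason to object to the text "NaN"; the type check only happens in the second method, which is also where the schema itself has to be parsed.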

XML is legacy
XML data interchange is the equivalent of the fax machine of contemporary software architecture. "Fax?"
I hear you say, "surely XML is up to date and leading edge, not like the outmoded technology of fax
which has been superseded by email". Let’s backtrack for a moment and look at how XML got to its
position as the jewel in the crown of data interchange architecture that it is today, then see if I can
justify this analogy.

Using XML for complex message exchange is like using a fax machine to exchange Word documents
even though you've got email. You enter data into your document and then serialize the document on
the printer, the universal human readable "standard". You then fax the document to your colleague who
passes it through his OCR (Optical Character Recognition) reader. After what could be as quick as 5
minutes your colleague has the same document and can begin proof reading it. He makes the changes
on his version of Word (or something else for that matter) and faxes it back. Voilà, two-way document
flow! About the only advantage gained over email is that we've solved the problem of which of the
umpteen versions of Word to use and reduced the risk of getting a virus to virtually nothing.

This might sound like a daft way of working but it is an accurate description of the process. It’s very
inefficient on both sides of the message exchange.

The real world


I have spent most of my working life in large wholesale banks, architecting complex systems. About 5 to
6 years ago we started to hear "XML is our standard", but very few systems actually "spoke" XML. As
time moved on, we started to see XML message buses and even messaging frameworks built around
XML, e.g. OpenAdapter. Trade volume went up as efficiency improved, and XML complexity increased as we
got better tools. More applications came online, network volume went through the roof and servers
started to red-line. Throwing hardware at the problem rarely helped; the bottleneck was parsing and
validating the XML, and it's not easy to share that work across multiple machines without a framework in
place. It is not unusual to see thousands of messages a second come from the front or middle office in a
bank. A small change in a base rate can force a re-valuation of tens of thousands of trades, each one
having multiple "legs" (parts), and delays can cost millions. Speed can be absolutely business-critical.

The XML bus


Over the last 2 to 3 years entire architectures have now been built around an XML bus. A deal is entered
into the front office trading system; it is transformed into XML and sent on to the middle office. From
there we have risk systems, P&L, limit checking, counterparty confirmation, settlement instructions,
matching, ticket printing and anti-laundering checking, to mention just a few. Some results are sent
back up to the front, some stored, some sent to brokers as confirmation and some sent to the back
office. From the back office the documents are parsed (from XML again) and transformed once again for
confirmation, reconciliation and settlement. Volume and XML complexity are increasing faster than
Moore's law is helping (by producing faster hardware). We find ourselves closer and closer to gridlock: it
takes just one small surprise on the markets and tens of millions can be lost “in process” waiting for XML
messages to filter through the systems.

What about SOA as a solution?


Does a Service Oriented Architecture (SOA) help? Yes it can, but not on its own. By reducing the number
of discrete, stand-alone processes and placing them in one manageable "box", parts of the system
become easier to manage and, at the same time, less XML is passed from place to
place. What's happening though is that we are un-doing the XML-bus architecture and going back to a
single server model (or at least a cluster). It’s very hard to get these various parties to agree to put
everything into one box. Traditionally these systems have evolved over a number of years and each
contains decades of expertise. It’s not unusual for each application to be in a physically different location
and we’re back to having XML flowing between them. SOA is good for expressing the connections
between application clusters but it is not the silver bullet for large enterprise scenarios. Rather it is
important to see SOA as part of a wider architectural solution in which it plays a part but does not
dominate.


What is the solution to XML performance?


There are a lot of quick fixes. Hardware helps but it's an expensive way to fix performance problems and
it's unlikely that doubling up on the hardware will double your capacity. Move from .NET to J2EE or move
from J2EE to .NET? This is exactly what you're likely to hear if you speak to one of the big application
server vendors or a major consulting firm. If you just change vendor and not your architecture you are
unlikely to see much return other than empty pockets after you've paid the consulting firm for
recommending the change and implementing it for you.

There are a number of companies offering significant performance gains with XML. Confirmative Systems
claims “The company estimates that its solution provides a greater than 20X improvement over
conventional servers by addressing XML data processing including parsing, validation, and
transformation in its proprietary <CSXp> chipset.”

PolarLake claims “PolarLake overcomes the performance issues often associated with processing XML by
employing a number of innovative technologies, typically increasing throughput by 30-50 times
compared with other servers”, and goes on to list “XML-streaming, Multi-threading, Single scan and
Selective processing” as key factors.

So, these companies have obviously seen the problem but have different solutions: one will sell you yet
more hardware and the other will sell you a closed server using “innovative technologies”.

We have a simpler and cheaper solution!


Don't use XML for inter-process communication and data transfer. Use XML for what it was designed for,
document-oriented mark-up, and use Java objects for complex data-oriented messages. This isn’t a new
idea; we just provide the tools to facilitate it, so that you can devote more time to your business.

Replace XML with Java?


Document-oriented: no. Data-oriented: yes. What’s the difference?

The difference is simple. If your data was, is or at some time will be a document, e.g. a web form, web
page, report etc., then stick to XML. The tools around XML are mainly designed for documents.

If however your message is part of an exchange of data, e.g. FpML, FIX, SWIFT etc. then use Java to
exchange data and not XML. Note that XML/SOAP still has its place in inter-company messaging; it just
makes more sense to use Java objects in many internal scenarios.

How do I replace XML with Java?


The goal here is not to change something that works already, just fix the problems. XML works, but it's
just slow and inefficient. XML Schema is a good way to define data models and it has been the main
drive behind standardisation initiatives in the vertical industries (e.g. FpML, MDDL).

The solution is to generate Java classes from the Schema: a Model Driven Architecture (MDA) where XML
Schema is the model. The generated Java object model then not only functions as a template for the
Schema model but also undertakes the validation. It is type-safe, self-validating, in most cases quite a
bit smaller than the XML equivalent and requires no parsing. Since it has knowledge of its own structure
it knows how many instances of a particular element can be added as defined by the XML Schema
model. What's more, when you want to get XML back out of it you just ask the object to output itself as
XML.
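
To make the idea concrete, here is a minimal hand-written sketch of the kind of class such a generator might emit. It is not actual C24 output: the type name Trade, the "leg" element, its xs:decimal type and the minOccurs/maxOccurs values are all assumptions chosen for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical class for a schema type "trade" containing
 * <xs:element name="leg" type="xs:decimal" minOccurs="1" maxOccurs="2"/>.
 */
final class Trade {
    private static final int MIN_LEGS = 1; // taken from minOccurs
    private static final int MAX_LEGS = 2; // taken from maxOccurs

    private final List<BigDecimal> legs = new ArrayList<>();

    /** Type safety: only a BigDecimal fits, so the text "NaN" cannot get in. */
    void addLeg(BigDecimal notional) {
        Objects.requireNonNull(notional, "notional");
        if (legs.size() >= MAX_LEGS) {
            throw new IllegalStateException("at most " + MAX_LEGS + " leg elements allowed");
        }
        legs.add(notional);
    }

    /** Self-validation: checks the cardinality rule carried over from the schema. */
    boolean isValid() {
        return legs.size() >= MIN_LEGS;
    }

    /** XML is produced only on request, e.g. at the boundary to another company. */
    String toXml() {
        StringBuilder xml = new StringBuilder("<trade>");
        for (BigDecimal leg : legs) {
            xml.append("<leg>").append(leg.toPlainString()).append("</leg>");
        }
        return xml.append("</trade>").toString();
    }
}
```

The compiler enforces the element type, the object enforces the occurrence limits, and nothing is parsed until XML is actually needed.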

C24’s Integration Objects


Century 24 Solutions, C24, is a well established software house selling integration solutions,
predominantly for the financial services industry. We have tier one banks and clearing houses all over
the world as clients using our SWIFT, FIX, XML and other message format objects.

C24’s Integration Objects (IO) is quite simply a model-driven code generator. You can design the
data model using the rich Swing-based GUI, import it from an external source (e.g. XML Schema, DTD,
RDBMS etc.) or use one of the library models that we’ve painstakingly created from the original
specification (as was done with SWIFT for example).

Taking FpML as an example the user can simply load the main FpML 4.0 schema into the IO-Editor in a
second or two. From there the FpML IO model is an exact copy of the FpML Schema, so much so that if
re-exported it is binary identical to the original. Changes can now be made to the model under the
control of the version management system; although in the example of FpML it might not be the best
idea to do so. You can, of course, use the IO-Editor to create and manage the XML schema data models
rather than just import them.


The resultant models can be deployed as Java code; in the case of FpML this results in something between
350 and 1050 source files. The range is due to the number of deployment options. You can, for example,
produce an interface with each complex type implementation; this provides the user with a fixed API
that resists change, rather like XML without a schema does. Namespaces become packages, complex
types become classes, and schema restrictions and regular expressions are implemented as validation
methods.
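
The mapping just described can be sketched as follows. This is an illustrative guess at the shape of such code, not generated C24 output: the complex type Payment, its currency element and the pattern "[A-Z]{3}" are hypothetical.

```java
import java.util.regex.Pattern;

// In deployed code the schema's target namespace would map to a Java package;
// the package declaration is left out here to keep the sketch to a single file.

/** Interface giving callers a stable API for the hypothetical complex type "Payment". */
interface Payment {
    String getCurrency();
    void setCurrency(String currency);
    boolean isCurrencyValid();
}

/** Implementation class corresponding to the complex type. */
final class PaymentImpl implements Payment {
    // Carried over from an assumed schema restriction:
    // <xs:restriction base="xs:string"><xs:pattern value="[A-Z]{3}"/></xs:restriction>
    private static final Pattern CURRENCY_PATTERN = Pattern.compile("[A-Z]{3}");

    private String currency;

    @Override
    public String getCurrency() {
        return currency;
    }

    @Override
    public void setCurrency(String currency) {
        this.currency = currency;
    }

    /** The schema pattern becomes a validation method; matches() tests the whole value. */
    @Override
    public boolean isCurrencyValid() {
        return currency != null && CURRENCY_PATTERN.matcher(currency).matches();
    }
}
```

Callers that program against the Payment interface are insulated from changes in the implementation class when the model is redeployed.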

The deployed Java is not simply a directory full of source files; messages are deployed along with Ant
scripts, dependent JARs, JavaDoc (including the Schema annotations) and even a Maven project file for
the brave. In less than 5 minutes you can go from FpML Schema to a Java component in the form of a
JAR with a richly documented API. Anything simpler than FpML is obviously much quicker.

The classes implement “hand coded” externalization routines whereby they serialize themselves with
near perfect efficiency. All deployed IO components have utility classes for reading and writing XML
instances into and out of the IO component and all include an XPath implementation. This also works for
non-XML based models such as SWIFT and FIX.
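
In standard Java, "hand coded" externalization usually means implementing java.io.Externalizable so that the class writes its own fields instead of relying on reflective default serialization. The sketch below shows that general technique; whether the IO classes use exactly this interface is an assumption, and the Quote class and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class Quote implements Externalizable {
    private String symbol;
    private long priceInCents;

    /** Externalizable requires a public no-argument constructor. */
    public Quote() { }

    Quote(String symbol, long priceInCents) {
        this.symbol = symbol;
        this.priceInCents = priceInCents;
    }

    String getSymbol() { return symbol; }
    long getPriceInCents() { return priceInCents; }

    /** Writes exactly the fields needed, in a fixed order. */
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(symbol);
        out.writeLong(priceInCents);
    }

    /** Reads the fields back in the same order they were written. */
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        symbol = in.readUTF();
        priceInCents = in.readLong();
    }

    static byte[] toBytes(Quote quote) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(quote);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Quote fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Quote) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the read and write methods are plain code that knows the field layout, no text parsing or type conversion is needed when the object arrives at the other end.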

The serialized objects can be easily decoded to allow simple and effective debugging as well as XSL-style
facilities implemented directly in Java. This increases both performance and the coherence of your code.

XML and beyond…


With C24’s IO you now have a Java object model that is very close to what would have been written if
you had had to code it yourself. It is small, efficient and powerful and yet it retains all of the features of
XML. It goes a lot further though: rather than simply taking on XML Schema restrictions, it can be
extended. We can now write real Java code for checking the value of elements, and these checks can reference
other elements or even external sources.

We have clients, for example, that check counterparties and currencies against live databases, and these
checks are actually built into the deployed code. We can fully validate things like IBAN (ISO 13616) codes,
ISO currency, country and BIC codes, postcodes, zip codes, credit card numbers, payment dates,
holiday dates etc., all things that are impossible in XML Schema.
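
As an example of a check that is a few lines of Java, here is a minimal sketch of the IBAN mod-97 checksum test. It covers only the checksum: a full validation would also check the country-specific length and layout, which this sketch does not attempt. The class and method names are hypothetical.

```java
import java.util.Locale;

final class IbanCheck {
    /**
     * Checksum test for an IBAN: move the first four characters to the end,
     * read letters as numbers (A=10 ... Z=35) and require the result mod 97 to be 1.
     */
    static boolean hasValidChecksum(String iban) {
        String s = iban.replace(" ", "").toUpperCase(Locale.ROOT);
        if (s.length() < 5 || s.length() > 34) {
            return false;
        }
        String rearranged = s.substring(4) + s.substring(0, 4);
        int remainder = 0;
        for (int i = 0; i < rearranged.length(); i++) {
            char c = rearranged.charAt(i);
            if (c >= '0' && c <= '9') {
                // A digit shifts the running number by one decimal place.
                remainder = (remainder * 10 + (c - '0')) % 97;
            } else if (c >= 'A' && c <= 'Z') {
                // A letter stands for a two-digit number, so shift by two places.
                remainder = (remainder * 100 + (c - 'A' + 10)) % 97;
            } else {
                return false; // anything else is not allowed in an IBAN
            }
        }
        return remainder == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasValidChecksum("GB82 WEST 1234 5698 7654 32"));
    }
}
```

Keeping a running remainder avoids building the very large intermediate number, so plain int arithmetic is sufficient.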

Because IOs are small and totally self-contained a lot of other interesting possibilities arise. We can
apply rules to the components and send them off on their way. These rules can be executed remotely
without having to be centralised. Components can be truly distributed by using technologies like RuleML
and Enigmatec’s RIF.

C24’s IO provides the components needed for Grid-computing and Jini’s JavaSpaces. IOs and
JavaSpaces were made for each other. The IO-Editor can deploy code that implements (for example)
net.jini.core.entry.Entry, so the components can all be written into JavaSpaces. C24’s IO in JavaSpaces is like having a
database that works on native XML but with everything in memory – indeed shared memory across a
number of machines. This database is transactional, scalable and with IO can contain not only data but
executable validation and workflow rules.

This makes high-performance, scalable processing on a Grid computing model
practical. By applying matching rules to IO components and writing them into JavaSpaces we are able
to provide XML-to-XML matching and reconciliation orders of magnitude faster than traditional “flat”
matching engines.

Using GigaSpace’s Embedded Spaces for example we are able to achieve more than 2 and in some cases
3 orders of magnitude faster throughput than using “raw” XML messaging.

Conclusion
Using XML for standards publication and associated rules is good. But in message-based integration, as
XML becomes more popular, it becomes proportionally less practical. XML tends to
get bloated and inefficient when used in large, complex inter-process communications.
Parsing and validating standardised XML instances every time they move between systems is expensive in
computing power, which makes the required throughput and latency hard to achieve, and expensive
to deploy in terms of development resources.

This existing integration infrastructure can be made more efficient using open C24 IO components
without any investment in proprietary “go faster” solutions.

C24 IO also enables the practical application of Grid-type architectures. With the increasing availability of
low-cost, high-density Blade computing platforms, the Grid model provided by
JavaSpaces implementations comes of age.

Put simply, the C24 IO toolkit provides a more efficient, model-driven architecture approach to
XML integration.


Glossary of terms:
FpML (Financial products Markup Language) is the business information exchange standard for
electronic dealing and processing of financial derivatives instruments. (http://www.fpml.org)

OpenAdapter can be loosely classified as EAI (Enterprise Application Integration) software based on
Java and XML. (http://www.openadapter.org)

S.W.I.F.T. (Society for Worldwide Interbank Financial Telecommunication), one of the main standards
for financial messaging, handles over 2 billion messages per year (http://www.c24.biz/swift.htm)

FIX (Financial Information eXchange) is another de facto standard in the banking industry
(http://www.c24.biz/fix.htm)

Jelly is an excellent tool from Apache for turning XML into executable code
(http://jakarta.apache.org/commons/jelly/)

RuleML (Rule Markup Language) is an open standard for rules in XML (http://www.ruleml.org/)

CodeMesh is a company that provides leading edge C++ to Java integration tools
(http://www.codemesh.com)

Enigmatec is an innovative company working on leading-edge technologies like Grid computing and
Distributed Rules (http://www.enigmatec.net)

IBAN (International Bank Account Number) is an ISO standard that sounds simple but is actually
rather complex. (http://www.ecbs.org/iban.htm)

JavaSpaces is part of Jini and provides Grid computing for Java, enabling truly distributed systems with minimal
overhead. It is the perfect framework for Blade hardware.

Jini has been around since the 90s and has now finally come of age as the Grid technology for the future
(http://www.jini.org)
