
1 Aim of this document

This document presents ideas about what good performance-tailored automation should aim for, what parts comprise it, and how individuals or groups can use new developments in technology to enable sharing and reuse. Nothing herein is especially innovative; rather, it sets out the logical pieces of a good solution, many of which get overlooked. It is targeted at developers and maintainers of performance systems, new or existing, and should help direct decisions toward good long-term solutions. Each recommendation is based upon observation of strong points and shortcomings in various existing performance systems. This is a lessons-learnt document as much as a best-practice one. In the spirit of sharing, if you are interested in the points being made, please contact the author.

2 Terminology
The following terminology is used:

- Automation: the control and co-ordination system that runs a performance-focused test.
- Tooling: the scripts and tools which are run to set up resources, monitor resources, generate load etc. This includes tools that would come packaged with a generic performance automation package as well as those which are specific to a certain product or performance domain.
- System: the set of all machines, tooling and automation involved in a particular problem domain.
- Product: the system or software product that is the actual focus of performance measurements.

3 Performance systems should embrace reusability


Building new automation or tooling is expensive. The reasons for reuse need not be explained at length but, briefly, they fall into the following categories:

- Common problems and solutions
- Common skill sets
- Greater probability of good documentation

These points do not stop many from rushing into solutions-for-today, problems-for-tomorrow designs; clearly a balance must be drawn. Similarly, too many existing tools get reviewed but discarded because they had not considered future reuse. The educated choice is to spend resource upgrading an existing tool, if possible, to cover more aspects, thus solving your problem and improving the tool for the rest of the community. Developing tooling and automation for performance should be treated with the same rigour as product development. Even though this might not be immediately evident, end users will often not be the group developing the tools.

3.1 A modular automation example

1. Generic automation core
2. Performance-focussed automation core and modules
3. Product-specific automation modules (that know about the Product)
4. Generic tooling
5. Performance tooling
6. Product-specific tooling

The only areas a performance test developer needs to program and maintain are the (blue) user-domain-specific tooling (i.e. things involved directly with performance testing the product you work on) and, especially, the automation module that handles them. This module is aware of what is being tested, knows how to invoke the commands, can run different styles of test and can carry out any other custom actions. Where items are more general, they should be written as reusable modules (e.g. STAF services) and made available at a higher level where they can be shared by others. More detail on the elements making up reusable automation is given next.
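As an illustration, here is a minimal sketch (in Python) of what such a product-specific module might look like. The class name, the test styles and the myproduct-driver command are hypothetical, not taken from any real package; in practice the generic core, rather than the module itself, would dispatch the command to a remote machine.

```python
import subprocess


class MyProductModule:
    """Hypothetical product-specific module: knows how to drive 'MyProduct';
    everything else is delegated to the generic/performance cores."""

    # Each test style maps to the (made-up) command line that runs it.
    STYLES = {
        "throughput": ["myproduct-driver", "--mode", "throughput"],
        "response":   ["myproduct-driver", "--mode", "response"],
    }

    def run(self, style, duration_secs):
        if style not in self.STYLES:
            raise ValueError(f"unknown test style: {style}")
        cmd = self.STYLES[style] + ["--duration", str(duration_secs)]
        # A real module would hand this command to the automation core for
        # remote invocation; running it locally keeps the sketch short.
        return subprocess.run(cmd, capture_output=True, text=True)
```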

3.2 General automation should be handled through existing packages


Testing, in the general software scope, already provides many systems for invoking a remote process on an arbitrary platform, controlling it and collecting any results. These test harnesses are not always a perfect fit for performance work, but the services they provide cannot (sensibly) be ignored and re-implemented. Performance options and alternatives should be added to the existing systems if at all possible. In an organisation with good test automation (performance or otherwise), reuse should not be an insurmountable problem.

3.3 Performance automation should have a generic core


Having a common core set of tools allows the effort of maintenance to be shared. This is clearly true of all tooling, but in the case of performance automation it raises questions about the impact of generalising. The arguments for rolling your own performance system revolve around the efficiency of reporting information and co-ordinating multiple entities. At all times these overheads should be as low as possible on the machines under test. Having observed several attempts at this, my conclusion is that bespoke reporting systems will probably fall short in some aspect or other, even with respect to the performance impact! In particular, what works efficiently one day may become less so over time as the environment it operates in changes. If the overhead of a generic performance core is a worry then it can easily be quantified by having loosely-coupled tooling (see the later section on this) that can be run manually without any automation present.

Of course not everything can be generalised and, specifically in the performance field, not everything should be generalised. Here, briefly, are some features which could be implemented to make automation more performance focused.

3.3.1 Cross platform resource monitoring


Each operating system has its own unique ways of querying performance counters for the myriad of measurable items. The majority of these items can be found across all platforms, and a general interface to access them in an efficient manner is essential. Features of a good tool would be:

- Multiple platform coverage with a common input and output format
- Multiple metrics: CPU, disk, memory, network etc.
- Per-processor CPU usage
- Per-process resource usage
- Overhead of all these optional metrics taken up in an initialisation phase, not at runtime
- A standalone tool that can be shared by everyone in the performance community (and beyond, of course!)
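As a sketch of the common-output-format idea, the following assumes the third-party psutil library (which covers the major platforms). It is an illustration, not a replacement for a properly shared monitoring tool; any expensive handle setup would be done once at initialisation so the sampling loop stays cheap.

```python
# Emit one JSON record per interval: the same format on every platform.
import json
import time

import psutil


def sample():
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_per_processor": psutil.cpu_percent(percpu=True),
        "memory_used_percent": psutil.virtual_memory().percent,
        "disk_read_bytes": disk.read_bytes,
        "disk_write_bytes": disk.write_bytes,
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }


if __name__ == "__main__":
    for _ in range(5):
        print(json.dumps(sample()))
        time.sleep(1)
```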

3.3.2 Constraint monitoring rules


More complex performance tests will be designed to push the limits of a system and see what maximum values are achievable before the measurement environment itself becomes the constraint. In driving this workload it is desirable to keep track of any constraint conditions in the system. Examples of constraints include:

- Performance response time > 1 second
- CPU > 95%
- Memory usage > 80%
- Test duration > predicted (or previously observed average)
- Errors detected

A constraint monitor can take action (commonly just to end the test) if a particular rule is broken for more than a defined number of consecutive checks. Without constraint checking, machines can be left in a limbo state which is hard to recover from (particularly with operating system constraints such as paging space). It is often hard to determine where the bottleneck was after an event such as this.
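A minimal sketch of such a monitor is shown below; the sample keys, thresholds and the stop_test callback are illustrative assumptions rather than part of any existing harness.

```python
class ConstraintMonitor:
    def __init__(self, rules, consecutive_limit=3):
        self.rules = rules                 # name -> predicate(sample), True means breached
        self.limit = consecutive_limit
        self.counts = {name: 0 for name in rules}

    def check(self, sample):
        """Return the names of rules breached for too many consecutive checks."""
        tripped = []
        for name, breached in self.rules.items():
            if breached(sample):
                self.counts[name] += 1
                if self.counts[name] >= self.limit:
                    tripped.append(name)
            else:
                self.counts[name] = 0      # breaches must be consecutive
        return tripped


rules = {
    "response_time > 1s": lambda s: s["response_time_secs"] > 1.0,
    "cpu > 95%":          lambda s: s["cpu_percent"] > 95.0,
    "memory > 80%":       lambda s: s["memory_used_percent"] > 80.0,
}
monitor = ConstraintMonitor(rules, consecutive_limit=5)
# In the automation loop (hypothetical names):
#     if monitor.check(latest_sample): stop_test()
```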

3.3.3 Client apportioning algorithms (load balancing)


Some tests will only use one or two machines. Others require a set of machines playing host to a larger set of applications generating load. Precisely which machines to place which applications on is not something the test needs to pre-define. The simplest incarnation of an apportioning algorithm is a simple round-robin distribution. Extensions to this include:

- User-defined weightings of machines
- Weightings based on offline assessments of each machine's power and scalability
- Availability of machines (having a smaller pool does not always invalidate a test)
- Real-time feedback from the machines' resource monitoring

This ability allows new machines to be added easily and the entire test suite to be transported to a new location.
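Below is a minimal sketch of weighted apportioning (a smooth weighted round-robin). The host names and weights are made up; a real implementation could also fold in availability and live resource feedback as listed above.

```python
def apportion(clients, machine_weights):
    """Assign each client to a machine, roughly in proportion to its weight."""
    assignments = {m: [] for m in machine_weights}
    credit = {m: 0.0 for m in machine_weights}   # how far below its fair share each machine is
    total = sum(machine_weights.values())
    for client in clients:
        for m, w in machine_weights.items():
            credit[m] += w / total
        target = max(credit, key=credit.get)     # most under-served machine
        credit[target] -= 1.0
        assignments[target].append(client)
    return assignments


print(apportion([f"client{i}" for i in range(10)],
                {"hostA": 2, "hostB": 1, "hostC": 1}))
```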

3.3.4 Runtime test interaction


Runtime test interaction does not have to mean that human users can control everything in a running system; that would introduce significant performance overheads in any system which supported it. As a bare minimum, though, performance automation should be capable of starting and stopping processes during the test. Examples where this is useful:

- Vary the number of clients over time to support Performance Test objectives (see the later section on objectives)
- Allow statistics and traces to be run in flight to support Performance Development objectives (e.g. run a trace if errors are detected)
- Allow constraint rules or error detection to initiate relevant actions
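A minimal sketch of that bare minimum, assuming made-up load-client and trace-tool commands, might look like this:

```python
import subprocess


class ProcessController:
    def __init__(self):
        self.running = {}

    def start(self, name, cmd):
        """Launch a named process (extra load client, trace tool, ...)."""
        self.running[name] = subprocess.Popen(cmd)

    def stop(self, name):
        proc = self.running.pop(name, None)
        if proc is not None:
            proc.terminate()
            proc.wait(timeout=30)


controller = ProcessController()
controller.start("client-11", ["load-client", "--target", "hostA"])  # ramp up (hypothetical command)
controller.start("trace", ["trace-tool", "--output", "trace.log"])   # in-flight trace (hypothetical command)
controller.stop("client-11")                                         # ramp down
```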

3.3.5 Statistical comparison


Good automation and tooling technology handles critical issues such as experimental repeatability, statistical tests for convergence, correct computation of variability, and so on. Many people make errors in these areas, which can result in flawed analysis and a wasted investment. Once built, the automation and analysis tools tend to be accepted as correct; this is dangerous unless much care has been taken in their construction and validation.

Whilst it is not possible to predict how performance will be measured or what the observed variables will mean, templates can demonstrate how to post-process data from the automation.
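As one example of such a template, the sketch below (standard library only) summarises repeated runs with the mean, sample standard deviation and coefficient of variation, and applies a simple CV-based convergence check. The measurements and threshold are illustrative; the appropriate statistical test will depend on the metric being observed.

```python
import statistics


def summarise(measurements, cv_threshold=0.05):
    mean = statistics.mean(measurements)
    stdev = statistics.stdev(measurements)        # sample standard deviation
    cv = stdev / mean if mean else float("inf")   # relative variability
    return {
        "runs": len(measurements),
        "mean": mean,
        "stdev": stdev,
        "coefficient_of_variation": cv,
        "converged": cv <= cv_threshold,
    }


# Example: five repeats of the same throughput measurement (made-up numbers).
print(summarise([102.3, 99.8, 101.1, 100.4, 103.0]))
```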

3.3.6 Strict environment definition


It is becoming more common to see automation suites that take an environment definition (beyond simply the main product) and either choose a machine that matches or make the relevant changes to a named machine, or both. As with any test environment, repeatability and trust can only be improved by this feature.
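A minimal sketch of the matching half of this idea follows; the property names and values are illustrative only, and a real suite would also use the same definition to drive setup.

```python
# A declarative environment definition, matched against detected properties.
required = {
    "os": "linux",
    "cpus": 8,
    "java_version": "1.8",
}


def matches(definition, detected):
    """Return the requirements the machine does not meet (empty dict = match)."""
    return {k: v for k, v in definition.items() if detected.get(k) != v}


detected = {"os": "linux", "cpus": 4, "java_version": "1.8"}
mismatches = matches(required, detected)
if mismatches:
    print("machine rejected, differs on:", mismatches)
```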

3.3.7 Environment deviation detection


In performance work, especially in areas where multiple people share pools of machines for multiple tests, the environment on any particular machine will change over time. Some of these things can be controlled by environment definition and some cannot. The automation can quite easily list the environment and record it into the log for any particular test; no understanding of what it is logging is required. The logged environment can then be compared (diffed) with the last test of that kind and any deviation flagged for human operators to investigate. Examples:

- General scripts that detail the hardware and operating system constants (number of CPUs, OS levels, security patches etc.)
- Usage-specific scripts, e.g. java version, ipconfig

Recording the environment also gives a lot more variables which can be used to reference test results from a database. Detected settings are better than recorded defined settings simply because occasionally things don't match and some items may not be included in environment definitions.
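The sketch below illustrates the capture-and-diff approach. The probes shown (platform details, CPU count, java -version) are examples only, and the file name for the previous run's record is a placeholder.

```python
import json
import os
import platform
import subprocess


def capture_environment():
    """Record the environment with no interpretation of what is logged."""
    env = {
        "os": platform.platform(),
        "cpus": str(os.cpu_count()),
    }
    # Usage-specific probes; failures are recorded rather than fatal.
    for name, cmd in {"java_version": ["java", "-version"]}.items():
        try:
            out = subprocess.run(cmd, capture_output=True, text=True)
            env[name] = (out.stdout + out.stderr).strip()
        except OSError as err:
            env[name] = f"unavailable: {err}"
    return env


def deviations(previous, current):
    """Keys whose values changed since the last run of this test."""
    keys = set(previous) | set(current)
    return {k: (previous.get(k), current.get(k))
            for k in keys if previous.get(k) != current.get(k)}


current = capture_environment()
print(json.dumps(current, indent=2))
# previous = json.load(open("last_run_environment.json"))   # placeholder file name
# print(deviations(previous, current))
```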

3.4 Generic automation core should be shared


Apart from one or two highly expensive commercial products there is little pre-packaged performance tooling available for use. This is because very little generic automation is created for any particular task. IBM is in the privileged position of being large enough to challenge this status quo (dare I say, leverage our assets). We have already had great success in this area with the STAF product, which currently focuses on general test automation. The Rational family of products also provides performance tooling, based on the open source Hyades framework in Eclipse, although this is currently limited in functionality. A generic, shared performance harness is:

- Used and maintained by multiple groups.
- Documented. As soon as something is to be shared, it needs documentation; before this point, it should be documented and commonly isn't.
- Supportable. The combination of the above two points means issues are more likely to be noticed and there is more resource and impetus to address those issues in a timely fashion.
- Available. There is no point in obscuring good work within a single department when it could just as easily be made visible to the wider community. It is also important for uptake that involvement is encouraged and made simple.
